LIVIVO - The Search Portal for Life Sciences

Search results

Results 1 - 10 of 15

  1. Book ; Online: Error Probability Bounds for Coded-Index DNA Storage

    Weinberger, Nir

    2022  

    Abstract The DNA storage channel is considered, in which a codeword is comprised of $M$ unordered DNA molecules. At reading time, $N$ molecules are sampled with replacement, and then each molecule is sequenced. A coded-index concatenated-coding scheme is considered, in which the $m$th molecule of the codeword is restricted to a subset of all possible molecules (an inner code), which is unique for each $m$. The decoder has low complexity and is based on first decoding each molecule separately (the inner code), and then decoding the sequence of molecules (an outer code). Only mild assumptions are made on the sequencing channel, in the form of the existence of an inner code and decoder with vanishing error. The error probability of a random code as well as an expurgated code is analyzed and shown to decay exponentially with $N$. This establishes the importance of increasing the coverage depth $N/M$ in order to obtain low error probability.
    Keywords Computer Science - Information Theory
    Subject code 003
    Publishing date 2022-05-20
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

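    Not from the paper: a minimal Monte Carlo sketch of why the coverage depth $N/M$ governs the error probability in this sampling model. It estimates the fraction of the $M$ stored molecules that are never sampled when $N$ reads are drawn with replacement, and compares it with the asymptotic value $e^{-N/M}$; the value of $M$ and the depths are arbitrary illustrative choices.

      import numpy as np

      rng = np.random.default_rng(0)

      def unsampled_fraction(M, N, trials=2000):
          """Monte Carlo estimate of the fraction of molecules that receive zero reads
          when N reads are drawn with replacement from M stored molecules."""
          missed = 0.0
          for _ in range(trials):
              reads = rng.integers(0, M, size=N)           # indices of sampled molecules
              missed += (M - np.unique(reads).size) / M    # fraction never sampled
          return missed / trials

      M = 1000
      for depth in (1, 2, 4, 8):                           # coverage depth N/M
          est = unsampled_fraction(M, depth * M)
          print(f"N/M={depth}: unsampled fraction ~ {est:.4f} (e^(-N/M) = {np.exp(-depth):.4f})")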

  2. Book ; Online: Fundamental Limits of Reference-Based Sequence Reordering

    Weinberger, Nir / Shomorony, Ilan

    2023  

    Abstract The problem of reconstructing a sequence of independent and identically distributed symbols from a set of equal-size, consecutive fragments, as well as a dependent reference sequence, is considered. First, in the regime in which the fragments are relatively long, and typically no fragment appears more than once, the scaling of the failure probability of the maximum likelihood reconstruction algorithm is exactly determined for perfect reconstruction and bounded for partial reconstruction. Second, the regime in which the fragments are relatively short and repeating fragments abound is characterized. A trade-off is stated between the fraction of fragments that cannot be adequately reconstructed vs. the distortion level allowed for the reconstruction of each fragment, while still allowing vanishing failure probability.
    Keywords Computer Science - Information Theory
    Publishing date 2023-07-19
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

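    A toy sketch of the reconstruction setting described above, under my own simplifications rather than the paper's analysis: a binary i.i.d. sequence is cut into equal-size consecutive fragments, a dependent reference is formed by flipping each bit with a small probability, and each fragment is placed at the block offset whose reference window is closest in Hamming distance (a stand-in for maximum likelihood placement under symmetric noise). All parameter values are arbitrary.

      import numpy as np

      rng = np.random.default_rng(1)

      n, L, flip = 4000, 40, 0.05            # sequence length, fragment length, reference noise
      x = rng.integers(0, 2, size=n)         # i.i.d. source sequence
      ref = x ^ (rng.random(n) < flip).astype(int)   # dependent reference (bits flipped w.p. `flip`)

      frags = [x[i:i + L] for i in range(0, n, L)]
      rng.shuffle(frags)                     # the fragments arrive unordered

      # Place each fragment at the block offset whose reference window is closest in Hamming distance.
      placed = {}
      for f in frags:
          dists = [np.count_nonzero(f != ref[i:i + L]) for i in range(0, n, L)]
          placed[int(np.argmin(dists)) * L] = f

      y = np.concatenate([placed[i] for i in sorted(placed)])
      print("recovered fraction:", np.mean(y == x) if y.size == n else "placement collision occurred")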

  3. Book ; Online: Learning Maximum Margin Channel Decoders

    Tsvieli, Amit / Weinberger, Nir

    2022  

    Abstract The problem of learning a channel decoder is considered for two channel models. The first model is an additive noise channel whose noise distribution is unknown and nonparametric. The learner is provided with a fixed codebook and a dataset comprised of independent samples of the noise, and is required to select a precision matrix for a nearest neighbor decoder in terms of the Mahalanobis distance. The second model is a non-linear channel with additive white Gaussian noise and unknown channel transformation. The learner is provided with a fixed codebook and a dataset comprised of independent input-output samples of the channel, and is required to select a matrix for a nearest neighbor decoder with a linear kernel. For both models, the objective of maximizing the margin of the decoder is addressed. Accordingly, for each channel model, a regularized loss minimization problem with a codebook-related regularization term and hinge-like loss function is developed, which is inspired by the support vector machine paradigm for classification problems. Expected generalization error bounds for the error probability loss function are provided for both models, under optimal choice of the regularization parameter. For the additive noise channel, a theoretical guidance for choosing the training signal-to-noise ratio is proposed based on this bound. In addition, for the non-linear channel, a high probability uniform generalization error bound is provided for the hypothesis class. For each channel, a stochastic sub-gradient descent algorithm for solving the regularized loss minimization problem is proposed, and an optimization error bound is stated. The performance of the proposed algorithms is demonstrated through several examples.
    Keywords Computer Science - Information Theory
    Subject code 003
    Publishing date 2022-03-13
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

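    A small sketch of the decoder structure discussed above: nearest-neighbor decoding under the Mahalanobis distance defined by a precision matrix. For simplicity the precision matrix here is the plug-in inverse sample covariance of the noise rather than the paper's max-margin learning rule; the codebook, noise law, and sizes are arbitrary illustrative choices.

      import numpy as np

      rng = np.random.default_rng(2)
      d, K, n_train, n_test = 4, 8, 500, 2000

      codebook = rng.normal(size=(K, d))                     # fixed codebook of K codewords in R^d

      # Correlated (non-isotropic) additive noise whose distribution is unknown to the decoder.
      A = rng.normal(size=(d, d))
      noise = lambda m: rng.normal(size=(m, d)) @ A.T

      def nn_decode(y, P):
          """Nearest-neighbor decoding in the Mahalanobis distance induced by precision matrix P."""
          diff = y[:, None, :] - codebook[None, :, :]        # shape (m, K, d)
          dist = np.einsum('mkd,de,mke->mk', diff, P, diff)  # quadratic form per (sample, codeword)
          return dist.argmin(axis=1)

      # Plug-in precision matrix from noise-only training samples (not the max-margin rule).
      P_hat = np.linalg.inv(np.cov(noise(n_train), rowvar=False))

      msgs = rng.integers(0, K, size=n_test)
      y = codebook[msgs] + noise(n_test)
      for name, P in [("identity", np.eye(d)), ("inverse covariance", P_hat)]:
          print(name, "precision, error rate:", np.mean(nn_decode(y, P) != msgs))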

  4. Book ; Online: Quantifying the Loss of Acyclic Join Dependencies

    Kenig, Batya / Weinberger, Nir

    2022  

    Abstract Acyclic schemes possess known benefits for database design, speeding up queries, and reducing space requirements. An acyclic join dependency (AJD) is lossless with respect to a universal relation if joining the projections associated with the schema results in the original universal relation. An intuitive and standard measure of loss entailed by an AJD is the number of redundant tuples generated by the acyclic join. Recent work has shown that the loss of an AJD can also be characterized by an information-theoretic measure. Motivated by the problem of automatically fitting an acyclic schema to a universal relation, we investigate the connection between these two characterizations of loss. We first show that the loss of an AJD is captured using the notion of KL-Divergence. We then show that the KL-divergence can be used to bound the number of redundant tuples. We prove a deterministic lower bound on the percentage of redundant tuples. For an upper bound, we propose a random database model, and establish a high probability bound on the percentage of redundant tuples, which coincides with the lower bound for large databases.

    Comment: To appear in PODS 2023
    Keywords Computer Science - Databases ; Computer Science - Information Theory ; E.4 ; H.2.1
    Subject code 005
    Publishing date 2022-10-26
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

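    A toy illustration (mine, not taken from the paper) of the two loss measures the abstract relates, for the simplest acyclic schema {AB, BC} over a relation R(A, B, C): the number of redundant tuples produced by joining the projections, and the conditional mutual information I(A;C|B) under the empirical (uniform-over-tuples) distribution, which is the KL-type quantity involved. The relation below is an arbitrary example.

      from collections import Counter
      from itertools import product
      from math import log2

      # Toy relation R(A, B, C) and the acyclic schema {AB, BC}.
      R = [(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 1, 1), (2, 1, 1)]
      n = len(R)

      ab = {(a, b) for a, b, _ in R}
      bc = {(b, c) for _, b, c in R}
      join = [(a, b, c) for (a, b), (b2, c) in product(ab, bc) if b == b2]
      print("tuples in the acyclic join:", len(join), "| redundant tuples:", len(join) - n)

      # Conditional mutual information I(A;C|B) under the empirical (uniform over tuples)
      # distribution of R -- the KL-type measure of the loss of this AJD.
      p_abc = Counter(R)
      p_ab = Counter((a, b) for a, b, _ in R)
      p_bc = Counter((b, c) for _, b, c in R)
      p_b = Counter(b for _, b, _ in R)
      i_acb = sum((cnt / n) * log2(cnt * p_b[b] / (p_ab[(a, b)] * p_bc[(b, c)]))
                  for (a, b, c), cnt in p_abc.items())
      print("I(A;C|B) =", round(i_acb, 4), "bits")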

  5. Book ; Online: Mean Estimation in High-Dimensional Binary Markov Gaussian Mixture Models

    Zhang, Yihan / Weinberger, Nir

    2022  

    Abstract We consider a high-dimensional mean estimation problem over a binary hidden Markov model, which illuminates the interplay between memory in data, sample size, dimension, and signal strength in statistical inference. In this model, an estimator observes $n$ samples of a $d$-dimensional parameter vector $\theta_{*}\in\mathbb{R}^{d}$, multiplied by a random sign $ S_i $ ($1\le i\le n$), and corrupted by isotropic standard Gaussian noise. The sequence of signs $\{S_{i}\}_{i\in[n]}\in\{-1,1\}^{n}$ is drawn from a stationary homogeneous Markov chain with flip probability $\delta\in[0,1/2]$. As $\delta$ varies, this model smoothly interpolates two well-studied models: the Gaussian Location Model for which $\delta=0$ and the Gaussian Mixture Model for which $\delta=1/2$. Assuming that the estimator knows $\delta$, we establish a nearly minimax optimal (up to logarithmic factors) estimation error rate, as a function of $\|\theta_{*}\|,\delta,d,n$. We then provide an upper bound for the case of estimating $\delta$, assuming a (possibly inaccurate) knowledge of $\theta_{*}$. The bound is proved to be tight when $\theta_{*}$ is an accurately known constant. These results are then combined into an algorithm that estimates $\theta_{*}$ with $\delta$ unknown a priori, and theoretical guarantees on its error are stated.
    Keywords Mathematics - Statistics Theory ; Computer Science - Information Theory ; Computer Science - Machine Learning
    Subject code 519
    Publishing date 2022-06-06
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

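    A short sketch of the observation model in this abstract: signs drawn from a stationary Markov chain with flip probability delta, and observations $X_i = S_i\theta_{*} + Z_i$ with standard Gaussian noise. The estimates below are simple moment-based baselines (signal strength from the mean squared norm, direction from the top eigenvector), not the paper's nearly minimax procedure; all parameter values are arbitrary.

      import numpy as np

      rng = np.random.default_rng(3)
      d, n, delta = 20, 5000, 0.1
      theta = rng.normal(size=d)
      theta *= 2.0 / np.linalg.norm(theta)                # set ||theta*|| = 2

      # Signs from a stationary Markov chain with flip probability delta.
      S = np.empty(n)
      S[0] = rng.choice([-1.0, 1.0])
      for i in range(1, n):
          S[i] = -S[i - 1] if rng.random() < delta else S[i - 1]

      X = S[:, None] * theta + rng.normal(size=(n, d))    # observations

      # Simple moment-based estimates (agnostic to the Markov structure); the global
      # sign of the direction remains ambiguous, so only |cos angle| is reported.
      norm_hat = np.sqrt(max(np.mean(np.sum(X ** 2, axis=1)) - d, 0.0))
      _, evecs = np.linalg.eigh(X.T @ X / n - np.eye(d))
      dir_hat = evecs[:, -1]                              # top eigenvector ~ theta*/||theta*||
      print("||theta*|| =", round(float(np.linalg.norm(theta)), 3), "| estimate:", round(float(norm_hat), 3))
      print("|cos angle(dir_hat, theta*)| =", round(abs(float(dir_hat @ theta)) / float(np.linalg.norm(theta)), 3))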

  6. Book ; Online: On Mismatched Oblivious Relaying

    Dikshtein, Michael / Weinberger, Nir / Shamai, Shlomo

    2023  

    Abstract We consider the problem of reliable communication over a discrete memoryless channel (DMC) with the help of a relay, termed the information bottleneck (IB) channel. There is no direct link between the source and the destination, and the information flows in two hops. The first hop is a noisy channel from the source to the relay. The second hop is a noiseless but limited-capacity backhaul link from the relay to the decoder. We further assume that the relay is oblivious to the transmission codebook. We examine two mismatch scenarios. In the first setting, we assume the decoder is restricted to use some fixed decoding rule, which is mismatched to the actual channel. In the second setting, we assume that the relay is restricted to use some fixed compression metric, which is again mismatched to the statistics of the relay input. We establish bounds on the random-coding capacity of both settings, some of which are shown to be ensemble tight.

    Comment: Accepted to ISIT2023
    Keywords Computer Science - Information Theory
    Subject code 003
    Publishing date 2023-05-01
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

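    A toy numerical illustration of the two-hop structure and the fixed-compression constraint described above (not the paper's random-coding bounds): the first hop is a small DMC, the backhaul carries one bit per symbol, and the oblivious relay applies a fixed deterministic merge of the channel outputs; the end-to-end mutual information is computed for each possible merge. The channel matrix and alphabet sizes are arbitrary.

      import numpy as np
      from itertools import combinations

      def mutual_info(p_x, V):
          """I(X;Z) in bits for input pmf p_x and end-to-end channel matrix V[x, z]."""
          p_xz = p_x[:, None] * V
          p_z = p_xz.sum(axis=0)
          mask = p_xz > 0
          return float(np.sum(p_xz[mask] * np.log2(p_xz[mask] / np.outer(p_x, p_z)[mask])))

      # First hop: a 4-input / 4-output DMC.  The backhaul carries one bit per symbol,
      # so the oblivious relay applies a fixed merge of the 4 outputs into 2 symbols.
      W = np.array([[0.90, 0.05, 0.03, 0.02],
                    [0.05, 0.90, 0.03, 0.02],
                    [0.02, 0.03, 0.90, 0.05],
                    [0.02, 0.03, 0.05, 0.90]])
      p_x = np.full(4, 0.25)

      for group in (g for g in combinations(range(4), 2) if 0 in g):   # `group` -> 0, rest -> 1
          G = np.zeros((4, 2))
          for y_out in range(4):
              G[y_out, 0 if y_out in group else 1] = 1.0
          print(f"relay merges {group}: end-to-end I(X;Z) = {mutual_info(p_x, W @ G):.3f} bits")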

  7. Book ; Online: M-DAB

    Kobovich, Adir / Yaakobi, Eitan / Weinberger, Nir

    An Input-Distribution Optimization Algorithm for Composite DNA Storage by the Multinomial Channel

    2023  

    Abstract Recent experiments have shown that the capacity of DNA storage systems may be significantly increased by synthesizing composite DNA letters. In this work, we model a DNA storage channel with composite inputs as a \textit{multinomial channel}, and propose an optimization algorithm for its capacity-achieving input distribution, for an arbitrary number of output reads. The algorithm is termed multidimensional dynamic assignment Blahut-Arimoto (M-DAB), and is a generalized version of the DAB algorithm proposed by Wesel et al. for the binomial channel. We also empirically observe a scaling law behavior of the capacity as a function of the support size of the capacity-achieving input distribution.

    Comment: 6 pages, 3 figures
    Keywords Computer Science - Information Theory ; H.1.1
    Publishing date 2023-09-29
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

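    M-DAB itself is not reproduced here; as a reference point, below is a compact sketch of the classical Blahut-Arimoto iteration that it generalizes, computing the capacity and capacity-achieving input distribution of a small discrete memoryless channel. The example channel is arbitrary.

      import numpy as np

      def blahut_arimoto(W, iters=500):
          """Classical Blahut-Arimoto: capacity (bits) and capacity-achieving input pmf
          for a DMC with transition matrix W[x, y] = P(y | x)."""
          p = np.full(W.shape[0], 1.0 / W.shape[0])        # start from the uniform input
          logW = np.log(W, where=W > 0, out=np.zeros_like(W))
          for _ in range(iters):
              py = p @ W                                   # output distribution (pW)(y)
              d = np.sum(W * (logW - np.log(py)), axis=1, where=W > 0)  # D(W(.|x) || pW), nats
              p = p * np.exp(d)                            # multiplicative BA update
              p /= p.sum()
          py = p @ W
          d = np.sum(W * (logW - np.log(py)), axis=1, where=W > 0)
          return float(p @ d / np.log(2)), p

      W = np.array([[0.8, 0.1, 0.1],                       # toy 2-input, 3-output DMC
                    [0.1, 0.1, 0.8]])
      cap, p_opt = blahut_arimoto(W)
      print("capacity ~", round(cap, 4), "bits; capacity-achieving input:", np.round(p_opt, 4))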

  8. Article ; Online: Guessing with a Bit of Help.

    Weinberger, Nir / Shayevitz, Ofer

    Entropy (Basel, Switzerland)

    2019  Volume 22, Issue 1

    Abstract What is the value of just a few bits to a guesser? We study this problem in a setup where Alice wishes to guess an independent and identically distributed (i.i.d.) random vector and can procure a fixed number of ...
    Language English
    Publishing date 2019-12-26
    Publishing country Switzerland
    Document type Journal Article
    ZDB-ID 2014734-X
    ISSN (online) 1099-4300
    DOI 10.3390/e22010039
    Database MEDical Literature Analysis and Retrieval System OnLINE

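    The abstract above is truncated in this record. As a rough illustration of the underlying question, the sketch below computes the expected number of guesses for an i.i.d. binary vector under the optimal strategy of guessing in decreasing order of probability, with and without a naive form of help (revealing the first k coordinates), which is not the paper's optimal helper scheme. Parameters are arbitrary.

      import numpy as np
      from itertools import product

      p, n, k = 0.3, 10, 2                   # Bernoulli(p) i.i.d. vector of length n, k helper bits

      def expected_guesswork(probs):
          """Expected number of guesses when guessing in decreasing order of probability."""
          order = np.sort(np.asarray(probs))[::-1]
          return float(np.sum(np.arange(1, order.size + 1) * order))

      prob = lambda x: float(np.prod([p if b else 1 - p for b in x]))
      all_x = list(product((0, 1), repeat=n))

      no_help = expected_guesswork([prob(x) for x in all_x])

      # Naive one-shot help (not the paper's optimal scheme): the first k coordinates are
      # revealed, and the remaining n - k coordinates are guessed optimally.
      with_help = 0.0
      for head in product((0, 1), repeat=k):
          tails = [prob(head + tail) for tail in product((0, 1), repeat=n - k)]
          p_head = sum(tails)
          with_help += p_head * expected_guesswork([t / p_head for t in tails])

      print(f"E[guesses] without help: {no_help:.1f}; with {k} revealed bits: {with_help:.1f}")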

  9. Book ; Online: The EM Algorithm is Adaptively-Optimal for Unbalanced Symmetric Gaussian Mixtures

    Weinberger, Nir / Bresler, Guy

    2021  

    Abstract This paper studies the problem of estimating the means $\pm\theta_{*}\in\mathbb{R}^{d}$ of a symmetric two-component Gaussian mixture $\delta_{*}\cdot N(\theta_{*},I)+(1-\delta_{*})\cdot N(-\theta_{*},I)$ where the weights $\delta_{*}$ and $1-\delta_{*}$ are unequal. Assuming that $\delta_{*}$ is known, we show that the population version of the EM algorithm globally converges if the initial estimate has non-negative inner product with the mean of the larger weight component. This can be achieved by the trivial initialization $\theta_{0}=0$. For the empirical iteration based on $n$ samples, we show that when initialized at $\theta_{0}=0$, the EM algorithm adaptively achieves the minimax error rate $\tilde{O}\Big(\min\Big\{\frac{1}{(1-2\delta_{*})}\sqrt{\frac{d}{n}},\frac{1}{\|\theta_{*}\|}\sqrt{\frac{d}{n}},\left(\frac{d}{n}\right)^{1/4}\Big\}\Big)$ in no more than $O\Big(\frac{1}{\|\theta_{*}\|(1-2\delta_{*})}\Big)$ iterations (with high probability). We also consider the EM iteration for estimating the weight $\delta_{*}$, assuming a fixed mean $\theta$ (which is possibly mismatched to $\theta_{*}$). For the empirical iteration of $n$ samples, we show that the minimax error rate $\tilde{O}\Big(\frac{1}{\|\theta_{*}\|}\sqrt{\frac{d}{n}}\Big)$ is achieved in no more than $O\Big(\frac{1}{\|\theta_{*}\|^{2}}\Big)$ iterations. These results robustify and complement recent results of Wu and Zhou obtained for the equal weights case $\delta_{*}=1/2$.
    Keywords Mathematics - Statistics Theory ; Computer Science - Information Theory ; Computer Science - Machine Learning
    Subject code 519
    Publishing date 2021-03-29
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

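    A minimal sketch of the empirical EM iteration for the unbalanced symmetric two-component Gaussian mixture with the weight $\delta_{*}$ known and the trivial initialization $\theta_{0}=0$, as described above. The E-step and M-step below follow the standard EM derivation for this model (my own writing, not code from the paper); the dimension, sample size, and weight are arbitrary.

      import numpy as np

      rng = np.random.default_rng(4)
      d, n, delta = 10, 20000, 0.3                          # unequal weights: delta vs 1 - delta
      theta_star = rng.normal(size=d)
      theta_star *= 1.5 / np.linalg.norm(theta_star)

      S = np.where(rng.random(n) < delta, 1.0, -1.0)        # latent signs, P(S = +1) = delta
      X = S[:, None] * theta_star + rng.normal(size=(n, d))

      def sigmoid(t):
          return 1.0 / (1.0 + np.exp(-t))

      # Empirical EM for the mixture delta*N(theta, I) + (1 - delta)*N(-theta, I),
      # with delta known and the trivial initialization theta_0 = 0.
      theta = np.zeros(d)
      for _ in range(50):
          w = sigmoid(2.0 * X @ theta + np.log(delta / (1.0 - delta)))   # E-step: P(S_i = +1 | X_i, theta)
          theta = np.mean((2.0 * w - 1.0)[:, None] * X, axis=0)          # M-step

      print("||theta_hat - theta_star|| =", round(float(np.linalg.norm(theta - theta_star)), 3))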

  10. Book ; Online: The DNA Storage Channel

    Weinberger, Nir / Merhav, Neri

    Capacity and Error Probability

    2021  

    Abstract The DNA storage channel is considered, in which the $M$ Deoxyribonucleic acid (DNA) molecules comprising each codeword are stored without order, then sampled $N$ times with replacement, and then sequenced over a discrete memoryless channel. For a constant coverage depth $M/N$ and molecule length scaling $\Theta(\log M)$, lower (achievability) and upper (converse) bounds on the capacity of the channel, as well as a lower (achievability) bound on the reliability function of the channel are provided. Both the lower and upper bounds on the capacity generalize a bound which was previously known to hold only for the binary symmetric sequencing channel, and only under certain restrictions on the molecule length scaling and the crossover probability parameters. When specified to the binary symmetric sequencing channel, these restrictions are completely removed for the lower bound and are significantly relaxed for the upper bound. The lower bound on the reliability function is achieved under a universal decoder, and reveals that the dominant error event is that of outage -- the event in which the capacity of the channel induced by the DNA molecule sampling operation does not support the target rate.
    Keywords Computer Science - Information Theory
    Subject code 003
    Publishing date 2021-09-26
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

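    A rough Monte Carlo illustration of the outage event mentioned above, under a deliberate simplification rather than the paper's analysis: $N$ reads are drawn with replacement from $M$ molecules over a BSC(q) sequencing channel, a molecule is counted only if it is read at least once, the induced rate is approximated by (fraction of molecules sampled) x (1 - h2(q)), and an outage is declared when this falls below the target rate R. All parameter values are arbitrary.

      import numpy as np

      rng = np.random.default_rng(5)

      def h2(q):
          return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

      def outage_prob(M, N, q, R, trials=5000):
          """Crude proxy: induced rate = (fraction of molecules read at least once) * (1 - h2(q));
          an outage occurs when it drops below the target rate R."""
          outages = 0
          for _ in range(trials):
              sampled = np.unique(rng.integers(0, M, size=N)).size / M
              outages += sampled * (1 - h2(q)) < R
          return outages / trials

      M, q, R = 200, 0.05, 0.6
      for depth in (1, 2, 3, 4):                            # coverage depth N/M
          print(f"N/M={depth}: estimated outage probability ~ {outage_prob(M, depth * M, q, R):.4f}")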
