LIVIVO - The Search Portal for Life Sciences

Search results

Results 1 - 10 of 10

  1. Book ; Online: A Generalized Lottery Ticket Hypothesis

    Alabdulmohsin, Ibrahim / Markeeva, Larisa / Keysers, Daniel / Tolstikhin, Ilya

    2021  

    Abstract We introduce a generalization to the lottery ticket hypothesis in which the notion of "sparsity" is relaxed by choosing an arbitrary basis in the space of parameters. We present evidence that the original results reported for the canonical basis continue to hold in this broader setting. We describe how structured pruning methods, including pruning units or factorizing fully-connected layers into products of low-rank matrices, can be cast as particular instances of this "generalized" lottery ticket hypothesis. The investigations reported here are preliminary and are provided to encourage further research along this direction.

    Comment: Workshop on Sparsity in Neural Networks: Advancing Understanding and Practice (SNN'21). Updates: New curve on Figure 2(left) and discussion on Li et al.
    Keywords Computer Science - Machine Learning ; Computer Science - Computer Vision and Pattern Recognition ; 68T05 ; I.2.6 ; I.2.10
    Publishing date 2021-07-03
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

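    The relaxation described in the abstract above can be illustrated directly: express the parameters in some orthonormal basis, keep only the largest-magnitude coefficients, and map back. A minimal NumPy sketch (toy data; the choice of basis and the sparsity level are illustrative, not the paper's experimental setup):

        import numpy as np

        def prune_in_basis(w, U, sparsity=0.9):
            # Keep only the largest-magnitude coefficients of w expressed in basis U.
            # U is an orthonormal basis of parameter space (columns are basis vectors);
            # U = identity recovers ordinary magnitude pruning of individual weights.
            coeffs = U.T @ w                              # coordinates of w in the basis
            k = int(round((1 - sparsity) * len(coeffs)))  # number of coefficients to keep
            keep = np.argsort(np.abs(coeffs))[-k:]
            mask = np.zeros_like(coeffs)
            mask[keep] = 1.0
            return U @ (coeffs * mask)                    # back to the original coordinates

        rng = np.random.default_rng(0)
        w = rng.normal(size=512)                          # flattened layer weights (toy)
        U, _ = np.linalg.qr(rng.normal(size=(512, 512)))  # a random orthonormal basis
        w_pruned = prune_in_basis(w, U, sparsity=0.9)
        print("relative error:", np.linalg.norm(w - w_pruned) / np.linalg.norm(w))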

  2. Book ; Online: Fine-Grained Distribution-Dependent Learning Curves

    Bousquet, Olivier / Hanneke, Steve / Moran, Shay / Shafer, Jonathan / Tolstikhin, Ilya

    2022  

    Abstract Learning curves plot the expected error of a learning algorithm as a function of the number of labeled input samples. They are widely used by machine learning practitioners as a measure of an algorithm's performance, but classic PAC learning theory cannot explain their behavior. In this paper we introduce a new combinatorial characterization called the VCL dimension that improves and refines the recent results of Bousquet et al. (2021). Our characterization sheds new light on the structure of learning curves by providing fine-grained bounds, and showing that for classes with finite VCL, the rate of decay can be decomposed into a linear component that depends only on the hypothesis class and an exponential component that depends also on the target distribution. In particular, the finer nuance of the VCL dimension implies lower bounds that are quantitatively stronger than the bounds of Bousquet et al. (2021) and qualitatively stronger than classic 'no free lunch' lower bounds. The VCL characterization solves an open problem studied by Antos and Lugosi (1998), who asked in what cases such lower bounds exist. As a corollary, we recover their lower bound for half-spaces in $\mathbb{R}^d$, and we do so in a principled way that should be applicable to other cases as well. Finally, to provide another viewpoint on our work and how it compares to traditional PAC learning bounds, we also present an alternative formulation of our results in a language that is closer to the PAC setting.
    Keywords Computer Science - Machine Learning ; Computer Science - Computational Complexity ; Statistics - Machine Learning
    Subject code 006
    Publishing date 2022-08-30
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

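    As a purely schematic reading of the decomposition claimed in the abstract above (the constants $c_1, c_2, c_3$ are placeholders, not the paper's notation): for a class $\mathcal{H}$ with finite VCL dimension, the expected error after $n$ labeled samples behaves like $\mathbb{E}[\mathrm{err}(n)] \approx \frac{c_1(\mathcal{H})}{n} + c_2(\mathcal{H}, P)\, e^{-c_3(\mathcal{H}, P)\, n}$, where the linearly decaying term depends only on $\mathcal{H}$ and the exponentially decaying term also depends on the target distribution $P$.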

  3. Book ; Online: Predicting Neural Network Accuracy from Weights

    Unterthiner, Thomas / Keysers, Daniel / Gelly, Sylvain / Bousquet, Olivier / Tolstikhin, Ilya

    2020  

    Abstract We show experimentally that the accuracy of a trained neural network can be predicted surprisingly well by looking only at its weights, without evaluating it on input data. We motivate this task and introduce a formal setting for it. Even when using simple statistics of the weights, the predictors are able to rank neural networks by their performance with very high accuracy (R2 score more than 0.98). Furthermore, the predictors are able to rank networks trained on different, unobserved datasets and with different architectures. We release a collection of 120k convolutional neural networks trained on four different datasets to encourage further research in this area, with the goal of understanding network training and performance better.

    Comment: Updated the Small CNN Zoo dataset: reduced the maximal learning rate and got rid of multiple bad runs. Replaced all the experiments with the new numbers. Added MLP. Fixed typo in the abstract (R2 score instead of Kendall's tau). Added several earlier related works to the literature overview
    Keywords Statistics - Machine Learning ; Computer Science - Machine Learning
    Subject code 006
    Publishing date 2020-02-26
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

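    A minimal sketch of the kind of predictor the abstract above describes: compute simple statistics of each network's flattened weights and fit a standard regressor to predict held-out accuracy. The "model zoo" below is a random stand-in (the paper releases a real collection of 120k trained CNNs); only the overall recipe is illustrated.

        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor
        from sklearn.model_selection import train_test_split

        def weight_features(layers):
            # simple per-layer statistics of the flattened weights, concatenated
            feats = []
            for w in layers:
                w = np.asarray(w).ravel()
                feats += [w.mean(), w.std(), *np.percentile(w, [0, 25, 50, 75, 100])]
            return np.array(feats)

        # hypothetical stand-in zoo of (weights, test accuracy) pairs
        rng = np.random.default_rng(0)
        zoo = [([rng.normal(scale=s, size=(64, 64)), rng.normal(scale=s, size=(64, 10))],
                float(1.0 / (1.0 + s)))              # fake accuracy, correlated with weight scale
               for s in rng.uniform(0.1, 2.0, size=300)]

        X = np.stack([weight_features(ws) for ws, _ in zoo])
        y = np.array([acc for _, acc in zoo])
        Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
        reg = GradientBoostingRegressor().fit(Xtr, ytr)
        print("R2 on held-out networks:", reg.score(Xte, yte))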

  4. Book ; Online: Wasserstein Auto-Encoders

    Tolstikhin, Ilya / Bousquet, Olivier / Gelly, Sylvain / Schoelkopf, Bernhard

    2017  

    Abstract We propose the Wasserstein Auto-Encoder (WAE)---a new algorithm for building a generative model of the data distribution. WAE minimizes a penalized form of the Wasserstein distance between the model distribution and the target distribution, which leads to a different regularizer than the one used by the Variational Auto-Encoder (VAE). This regularizer encourages the encoded training distribution to match the prior. We compare our algorithm with several other techniques and show that it is a generalization of adversarial auto-encoders (AAE). Our experiments show that WAE shares many of the properties of VAEs (stable training, encoder-decoder architecture, nice latent manifold structure) while generating samples of better quality, as measured by the FID score.

    Comment: Published at ICLR 2018. Included a much wider hyperparameter sweep, resulting in significant improvements in FIDs on CelebA
    Keywords Statistics - Machine Learning ; Computer Science - Machine Learning
    Publishing date 2017-11-05
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

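    A minimal PyTorch sketch of the objective described above: a reconstruction term plus a penalty that pushes the distribution of encoded training points towards the prior. The penalty here is an RBF-kernel MMD, one possible instantiation (the paper also discusses an adversarial variant); the architecture, kernel bandwidth, penalty weight and batch are toy placeholders.

        import torch
        import torch.nn as nn

        def rbf_mmd2(x, y, sigma=1.0):
            # biased estimate of MMD^2 between two samples, RBF kernel
            def k(a, b):
                return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
            return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

        enc = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 8))
        dec = nn.Sequential(nn.Linear(8, 256), nn.ReLU(), nn.Linear(256, 784))
        opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
        lam = 10.0                                   # weight of the prior-matching penalty

        x = torch.rand(64, 784)                      # stand-in batch of flattened images
        z = enc(x)                                   # deterministic encoder (one WAE variant)
        x_rec = dec(z)
        loss = (x - x_rec).pow(2).sum(dim=1).mean() + lam * rbf_mmd2(z, torch.randn_like(z))
        opt.zero_grad()
        loss.backward()
        opt.step()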

  5. Book ; Online: GeNet: Deep Representations for Metagenomics

    Rojas-Carulla, Mateo / Tolstikhin, Ilya / Luque, Guillermo / Youngblut, Nicholas / Ley, Ruth / Schölkopf, Bernhard

    2019  

    Abstract We introduce GeNet, a method for shotgun metagenomic classification from raw DNA sequences that exploits the known hierarchical structure between labels for training. We provide a comparison with state-of-the-art methods Kraken and Centrifuge on datasets obtained from several sequencing technologies, in which dataset shift occurs. We show that GeNet obtains competitive precision and good recall, with orders of magnitude less memory requirements. Moreover, we show that a linear model trained on top of representations learned by GeNet achieves recall comparable to state-of-the-art methods on the aforementioned datasets, and achieves over 90% accuracy in a challenging pathogen detection problem. This provides evidence of the usefulness of the representations learned by GeNet for downstream biological tasks.
    Keywords Quantitative Biology - Genomics ; Computer Science - Machine Learning ; Statistics - Machine Learning
    Publishing date 2019-01-30
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

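    One way to read "exploits the known hierarchical structure between labels" is a shared read encoder with one classification head per taxonomic rank, trained with a summed loss. The sketch below is a simplified stand-in, not GeNet's actual architecture; the rank names, label counts and encoder are hypothetical.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        VOCAB = 5                                              # A, C, G, T, N (one-hot)
        RANKS = {"phylum": 40, "genus": 800, "species": 3000}  # hypothetical label counts

        class HierarchicalReadClassifier(nn.Module):
            def __init__(self, hidden=128):
                super().__init__()
                self.encoder = nn.Sequential(
                    nn.Conv1d(VOCAB, hidden, kernel_size=7, padding=3), nn.ReLU(),
                    nn.AdaptiveAvgPool1d(1), nn.Flatten())
                # one softmax head per taxonomic rank, all sharing the encoder
                self.heads = nn.ModuleDict({r: nn.Linear(hidden, n) for r, n in RANKS.items()})

            def forward(self, reads):                          # reads: (batch, VOCAB, read_len)
                h = self.encoder(reads)
                return {r: head(h) for r, head in self.heads.items()}

        model = HierarchicalReadClassifier()
        reads = F.one_hot(torch.randint(0, VOCAB, (8, 150)), VOCAB).float().transpose(1, 2)
        labels = {r: torch.randint(0, n, (8,)) for r, n in RANKS.items()}
        logits = model(reads)
        loss = sum(F.cross_entropy(logits[r], labels[r]) for r in RANKS)  # summed over ranks
        loss.backward()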

  6. Book ; Online: When can unlabeled data improve the learning rate?

    Göpfert, Christina / Ben-David, Shai / Bousquet, Olivier / Gelly, Sylvain / Tolstikhin, Ilya / Urner, Ruth

    2019  

    Abstract In semi-supervised classification, one is given access both to labeled and unlabeled data. As unlabeled data is typically cheaper to acquire than labeled data, this setup becomes advantageous as soon as one can exploit the unlabeled data in order to produce a better classifier than with labeled data alone. However, the conditions under which such an improvement is possible are not fully understood yet. Our analysis focuses on improvements in the minimax learning rate in terms of the number of labeled examples (with the number of unlabeled examples being allowed to depend on the number of labeled ones). We argue that for such improvements to be realistic and indisputable, certain specific conditions should be satisfied and previous analyses have failed to meet those conditions. We then demonstrate examples where these conditions can be met, in particular showing rate changes from $1/\sqrt{\ell}$ to $e^{-c\ell}$ and from $1/\sqrt{\ell}$ to $1/\ell$. These results improve our understanding of what is and isn't possible in semi-supervised learning.
    Keywords Computer Science - Machine Learning ; Statistics - Machine Learning
    Subject code 006
    Publishing date 2019-05-28
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  7. Book ; Online: Practical and Consistent Estimation of f-Divergences

    Rubenstein, Paul K. / Bousquet, Olivier / Djolonga, Josip / Riquelme, Carlos / Tolstikhin, Ilya

    2019  

    Abstract The estimation of an f-divergence between two probability distributions based on samples is a fundamental problem in statistics and machine learning. Most works study this problem under very weak assumptions, in which case it is provably hard. We consider the case of stronger structural assumptions that are commonly satisfied in modern machine learning, including representation learning and generative modelling with autoencoder architectures. Under these assumptions we propose and study an estimator that can be easily implemented, works well in high dimensions, and enjoys faster rates of convergence. We verify the behavior of our estimator empirically in both synthetic and real-data experiments, and discuss its direct implications for total correlation, entropy, and mutual information estimation.
    Keywords Statistics - Machine Learning ; Computer Science - Information Theory ; Computer Science - Machine Learning
    Publishing date 2019-05-27
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

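    For reference, the quantity being estimated above is the f-divergence $D_f(P\|Q) = \int f\!\left(\frac{dP}{dQ}\right) dQ$ for a convex $f$ with $f(1) = 0$; choosing $f(t) = t \log t$ gives the KL divergence, and $f(t) = \frac{1}{2}|t - 1|$ gives the total variation distance.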

  8. Book ; Online: What Do Neural Networks Learn When Trained With Random Labels?

    Maennel, Hartmut / Alabdulmohsin, Ibrahim / Tolstikhin, Ilya / Baldock, Robert J. N. / Bousquet, Olivier / Gelly, Sylvain / Keysers, Daniel

    2020  

    Abstract We study deep neural networks (DNNs) trained on natural image data with entirely random labels. Despite its popularity in the literature, where it is often used to study memorization, generalization, and other phenomena, little is known about what DNNs learn in this setting. In this paper, we show analytically for convolutional and fully connected networks that an alignment between the principal components of network parameters and data takes place when training with random labels. We study this alignment effect by investigating neural networks pre-trained on randomly labelled image data and subsequently fine-tuned on disjoint datasets with random or real labels. We show how this alignment produces a positive transfer: networks pre-trained with random labels train faster downstream compared to training from scratch even after accounting for simple effects, such as weight scaling. We analyze how competing effects, such as specialization at later layers, may hide the positive transfer. These effects are studied in several network architectures, including VGG16 and ResNet18, on CIFAR10 and ImageNet.

    Comment: Accepted, NeurIPS2020
    Keywords Statistics - Machine Learning ; Computer Science - Machine Learning
    Subject code 006
    Publishing date 2020-06-18
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

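    The alignment effect described above can be probed on toy data: train a small network on randomly labelled, anisotropic inputs and measure how much the top principal directions of the first-layer weights overlap with the top principal directions of the data. The sketch below only illustrates the measurement; the data, architecture and training budget are arbitrary stand-ins, not the paper's setup.

        import numpy as np
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        rng = np.random.default_rng(0)
        d, n, k = 64, 2000, 5
        basis, _ = np.linalg.qr(rng.normal(size=(d, d)))
        scales = np.concatenate([10.0 * np.ones(k), 0.1 * np.ones(d - k)])
        X = (rng.normal(size=(n, d)) * scales) @ basis.T     # data with k dominant directions
        y = rng.integers(0, 10, size=n)                      # entirely random labels

        net = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, 10))
        opt = torch.optim.SGD(net.parameters(), lr=0.05)
        Xt, yt = torch.tensor(X, dtype=torch.float32), torch.tensor(y)
        for _ in range(200):                                 # short random-label training
            opt.zero_grad()
            F.cross_entropy(net(Xt), yt).backward()
            opt.step()

        def top_dirs(M, k):
            # rows of the returned matrix span the top-k principal subspace of M's rows
            _, _, vt = np.linalg.svd(M - M.mean(axis=0), full_matrices=False)
            return vt[:k]

        D = top_dirs(X, k)                                   # data principal directions
        W = top_dirs(net[0].weight.detach().numpy(), k)      # first-layer weight directions
        print("subspace alignment:", np.linalg.norm(D @ W.T) ** 2 / k)  # 1.0 = fully aligned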

  9. Book ; Online: Competitive Training of Mixtures of Independent Deep Generative Models

    Locatello, Francesco / Vincent, Damien / Tolstikhin, Ilya / Rätsch, Gunnar / Gelly, Sylvain / Schölkopf, Bernhard

    2018  

    Abstract A common assumption in causal modeling posits that the data is generated by a set of independent mechanisms, and algorithms should aim to recover this structure. Standard unsupervised learning, however, is often concerned with training a single model to capture the overall distribution or aspects thereof. Inspired by clustering approaches, we consider mixtures of implicit generative models that "disentangle" the independent generative mechanisms underlying the data. Relying on an additional set of discriminators, we propose a competitive training procedure in which the models only need to capture the portion of the data distribution from which they can produce realistic samples. As a by-product, each model is simpler and faster to train. We empirically show that our approach splits the training distribution in a sensible way and increases the quality of the generated samples.
    Keywords Computer Science - Machine Learning ; Computer Science - Artificial Intelligence ; Statistics - Machine Learning
    Subject code 006
    Publishing date 2018-04-30
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  10. Book ; Online: MLP-Mixer: An all-MLP Architecture for Vision

    Tolstikhin, Ilya / Houlsby, Neil / Kolesnikov, Alexander / Beyer, Lucas / Zhai, Xiaohua / Unterthiner, Thomas / Yung, Jessica / Steiner, Andreas / Keysers, Daniel / Uszkoreit, Jakob / Lucic, Mario / Dosovitskiy, Alexey

    2021  

    Abstract Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In this paper we show that while convolutions and attention are both sufficient for good performance, neither of them are necessary. We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs). MLP-Mixer contains two types of layers: one with MLPs applied independently to image patches (i.e. "mixing" the per-location features), and one with MLPs applied across patches (i.e. "mixing" spatial information). When trained on large datasets, or with modern regularization schemes, MLP-Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models. We hope that these results spark further research beyond the realms of well established CNNs and Transformers.

    Comment: v2: Fixed parameter counts in Table 1. v3: Added results on JFT-3B in Figure 2(right); Added Section 3.4 on the input permutations. v4: Updated the x label in Figure 2(right)
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Artificial Intelligence ; Computer Science - Machine Learning
    Subject code 006
    Publishing date 2021-05-04
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

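    A compact PyTorch rendering of the two mixing steps the abstract describes: a token-mixing MLP applied across patches and a channel-mixing MLP applied within each patch, each with layer normalization and a skip connection. The hidden sizes, patch count and channel width below are illustrative.

        import torch
        import torch.nn as nn

        class MixerBlock(nn.Module):
            def __init__(self, n_patches, channels, token_hidden=256, channel_hidden=1024):
                super().__init__()
                self.norm1 = nn.LayerNorm(channels)
                self.norm2 = nn.LayerNorm(channels)
                self.token_mlp = nn.Sequential(     # mixes spatial information across patches
                    nn.Linear(n_patches, token_hidden), nn.GELU(), nn.Linear(token_hidden, n_patches))
                self.channel_mlp = nn.Sequential(   # mixes per-location features within each patch
                    nn.Linear(channels, channel_hidden), nn.GELU(), nn.Linear(channel_hidden, channels))

            def forward(self, x):                               # x: (batch, n_patches, channels)
                y = self.norm1(x).transpose(1, 2)               # (batch, channels, n_patches)
                x = x + self.token_mlp(y).transpose(1, 2)       # token mixing + skip connection
                return x + self.channel_mlp(self.norm2(x))      # channel mixing + skip connection

        patches = torch.randn(2, 196, 512)                      # e.g. 14x14 patches, 512 channels
        print(MixerBlock(196, 512)(patches).shape)              # torch.Size([2, 196, 512])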
