LIVIVO - The Search Portal for Life Sciences

Your last searches

  1. AU="Fehr, Fabio"
  2. AU="Sasirekha, R" AU="Sasirekha, R"
  3. AU="Rajendraprasad, Girish"
  4. AU="Golbek, Thaddeus W"
  5. AU="Pranav Keshan"

Search results

Results 1 - 3 of 3

  1. Book ; Online: Nonparametric Variational Regularisation of Pretrained Transformers

    Fehr, Fabio / Henderson, James

    2023  

    Abstract The current paradigm of large-scale pre-training and fine-tuning Transformer large language models has led to significant improvements across the board in natural language processing. However, such large models are susceptible to overfitting to their training data, and as a result the models perform poorly when the domain changes. Also, due to the model's scale, the cost of fine-tuning the model to the new domain is large. Nonparametric Variational Information Bottleneck (NVIB) has been proposed as a regulariser for training cross-attention in Transformers, potentially addressing the overfitting problem. We extend the NVIB framework to replace all types of attention functions in Transformers, and show that existing pretrained Transformers can be reinterpreted as Nonparametric Variational (NV) models using a proposed identity initialisation. We then show that changing the initialisation introduces a novel, information-theoretic post-training regularisation in the attention mechanism, which improves out-of-domain generalisation without any training. This success supports the hypothesis that pretrained Transformers are implicitly NV Bayesian models.
    Keywords Computer Science - Machine Learning ; Computer Science - Computation and Language
    Subject code 006
    Publishing date 2023-12-01
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

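    A rough illustration of the idea in this abstract: attention in a pretrained Transformer can be wrapped so that, at an "identity" setting, it reproduces the original model exactly, while shifting that setting mixes in a prior and regularises the attention distribution without any training. The sketch below is a hypothetical simplification, not the authors' NVIB implementation; the function name nv_attention, the zero-valued prior vector, and the prior_log_weight knob are assumptions.

```python
# Hypothetical sketch of post-training attention regularisation in the spirit of NVIB.
# Not the authors' code: the zero-valued prior and prior_log_weight are assumptions.
import math
import torch
import torch.nn.functional as F

def nv_attention(q, k, v, prior_log_weight=float("-inf")):
    """Scaled dot-product attention with one extra 'prior' pseudo-token.

    q: (batch, n_q, d); k, v: (batch, n_kv, d).
    prior_log_weight = -inf gives the prior zero mass, recovering standard
    attention (the 'identity' setting); larger values move attention mass
    onto a zero-valued prior vector, damping what queries read from k and v.
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)              # (batch, n_q, n_kv)
    prior = torch.full(scores.shape[:-1] + (1,), prior_log_weight,
                       dtype=scores.dtype, device=scores.device)
    weights = F.softmax(torch.cat([scores, prior], dim=-1), dim=-1)
    v_ext = torch.cat([v, torch.zeros_like(v[:, :1, :])], dim=1)  # append zero value
    return weights @ v_ext

q, k, v = torch.randn(2, 4, 8), torch.randn(2, 6, 8), torch.randn(2, 6, 8)
out_identity = nv_attention(q, k, v)                       # matches plain attention
out_regularised = nv_attention(q, k, v, prior_log_weight=0.0)
```

    At prior_log_weight = -inf the extra column receives zero probability mass, so the wrapper is exactly standard scaled dot-product attention; raising it damps how strongly queries commit to particular keys, which is one plausible reading of the "post-training regularisation" described above.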

  2. Book ; Online: A Variational AutoEncoder for Transformers with Nonparametric Variational Information Bottleneck

    Henderson, James / Fehr, Fabio

    2022  

    Abstract We propose a VAE for Transformers by developing a variational information bottleneck regulariser for Transformer embeddings. We formalise the embedding space of Transformer encoders as mixture probability distributions, and use Bayesian nonparametrics to derive a nonparametric variational information bottleneck (NVIB) for such attention-based embeddings. The variable number of mixture components supported by nonparametric methods captures the variable number of vectors supported by attention, and the exchangeability of our nonparametric distributions captures the permutation invariance of attention. This allows NVIB to regularise the number of vectors accessible with attention, as well as the amount of information in individual vectors. By regularising the cross-attention of a Transformer encoder-decoder with NVIB, we propose a nonparametric variational autoencoder (NVAE). Initial experiments on training a NVAE on natural language text show that the induced embedding space has the desired properties of a VAE for Transformers.

    Comment: 33 pages, 10 figures, 3 tables. First time this work has been made public
    Keywords Computer Science - Machine Learning ; Computer Science - Computation and Language
    Subject code 310
    Publishing date 2022-07-27
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

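    The abstract above places an information bottleneck on the cross-attention between a Transformer encoder and decoder. The nonparametric (Dirichlet-process mixture) construction is beyond a short sketch, so the fragment below substitutes a per-vector Gaussian bottleneck purely to show where such a regulariser sits in an encoder-decoder; VectorwiseBottleneck, the KL term, and the layer sizes are illustrative assumptions, not the paper's NVIB.

```python
# Hypothetical, simplified stand-in for an information bottleneck on cross-attention.
# The paper uses a *nonparametric* (Dirichlet-process) bottleneck; a per-vector
# Gaussian bottleneck here merely illustrates where the regulariser sits.
import torch
import torch.nn as nn

class VectorwiseBottleneck(nn.Module):
    """Maps each encoder vector to a Gaussian posterior and samples from it."""
    def __init__(self, d_model):
        super().__init__()
        self.to_mu = nn.Linear(d_model, d_model)
        self.to_logvar = nn.Linear(d_model, d_model)

    def forward(self, h):                           # h: (batch, seq, d_model)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)    # reparameterise
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1).mean()
        return z, kl

# Sketch of use: the decoder cross-attends to the sampled z instead of the raw
# encoder states, and kl is added to the training loss.
d_model = 32
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
bottleneck = VectorwiseBottleneck(d_model)
src = torch.randn(2, 10, d_model)
z, kl = bottleneck(encoder(src))                    # z feeds the cross-attention
```

    In a full VAE-style training loop, kl would be added (with a weight) to the reconstruction loss, giving the autoencoding behaviour the abstract describes for the NVAE.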

  3. Book ; Online: HyperMixer: An MLP-based Low Cost Alternative to Transformers

    Mai, Florian / Pannatier, Arnaud / Fehr, Fabio / Chen, Haolin / Marelli, Francois / Fleuret, Francois / Henderson, James

    2022

    Abstract Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning.

    Comment: Published at ACL 2023
    Keywords Computer Science - Computation and Language ; Computer Science - Artificial Intelligence ; Computer Science - Machine Learning
    Subject code 006
    Publishing date 2022-03-07
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

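    The HyperMixer abstract replaces the static token-mixing MLP of MLPMixer with one whose weights are generated from the tokens themselves by hypernetworks. The sketch below is a simplified reading of that idea, assuming a plain linear hypernetwork and omitting position information, normalisation, and the feature-mixing MLP; it is not the published architecture.

```python
# Simplified sketch of hypernetwork-generated token mixing (HyperMixer-style).
# Details (tied hypernetworks, position encodings, normalisation) are omitted.
import torch
import torch.nn as nn

class HyperTokenMixing(nn.Module):
    """Token mixing whose weights are produced from the tokens themselves."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        # Hypernetworks: map each token vector to one row of W1 / W2.
        self.hyper_w1 = nn.Linear(d_model, d_hidden)
        self.hyper_w2 = nn.Linear(d_model, d_hidden)

    def forward(self, x):                            # x: (batch, n_tokens, d_model)
        w1 = self.hyper_w1(x)                        # (batch, n_tokens, d_hidden)
        w2 = self.hyper_w2(x)                        # (batch, n_tokens, d_hidden)
        mixed = torch.relu(w1.transpose(1, 2) @ x)   # (batch, d_hidden, d_model)
        return w2 @ mixed                            # (batch, n_tokens, d_model)

layer = HyperTokenMixing(d_model=16, d_hidden=32)
y = layer(torch.randn(2, 50, 16))                    # same shape as the input tokens
```

    Because the generated weights have shape (n_tokens, d_hidden), the mixing step costs O(n_tokens x d_hidden x d_model), i.e. linear rather than quadratic in sequence length, which matches the low-cost motivation in the abstract.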
