LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 17

  1. Book ; Online: Accelerating Large Language Model Decoding with Speculative Sampling

    Chen, Charlie / Borgeaud, Sebastian / Irving, Geoffrey / Lespiau, Jean-Baptiste / Sifre, Laurent / Jumper, John

    2023  

    Abstract We present speculative sampling, an algorithm for accelerating transformer decoding by enabling the generation of multiple tokens from each transformer call. Our algorithm relies on the observation that the latency of parallel scoring of short continuations, generated by a faster but less powerful draft model, is comparable to that of sampling a single token from the larger target model. This is combined with a novel modified rejection sampling scheme which preserves the distribution of the target model within hardware numerics. We benchmark speculative sampling with Chinchilla, a 70 billion parameter language model, achieving a 2-2.5x decoding speedup in a distributed setup, without compromising the sample quality or making modifications to the model itself.
    Keywords Computer Science - Computation and Language
    Publishing date 2023-02-02
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
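
    The modified rejection sampling scheme described in the abstract can be sketched in a few lines. The sketch below is a toy, single-step illustration assuming per-position draft and target distributions are already available (draft_probs, target_probs, and drafted_tokens are hypothetical names); it shows the acceptance rule, not the paper's distributed, hardware-aware implementation.

        import numpy as np

        rng = np.random.default_rng(0)

        def speculative_step(draft_probs, target_probs, drafted_tokens):
            """Accept or reject K drafted tokens so that the accepted output
            follows the target model's distribution.

            draft_probs[t], target_probs[t]: vocabulary distributions at step t.
            drafted_tokens: the K tokens sampled from the draft model.
            """
            accepted = []
            for t, x in enumerate(drafted_tokens):
                p, q = draft_probs[t], target_probs[t]
                # Accept token x with probability min(1, q(x) / p(x)).
                if rng.random() < min(1.0, q[x] / p[x]):
                    accepted.append(x)
                else:
                    # On rejection, resample from the residual max(0, q - p),
                    # renormalised; this is what preserves the target distribution.
                    residual = np.maximum(q - p, 0.0)
                    residual /= residual.sum()
                    accepted.append(rng.choice(len(q), p=residual))
                    break
            return accepted

        # If all K tokens are accepted, one extra token can be sampled from the
        # target's next-step distribution, which the same parallel scoring call provides.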

  2. Book ; Online: Large-Scale Retrieval for Reinforcement Learning

    Humphreys, Peter C. / Guez, Arthur / Tieleman, Olivier / Sifre, Laurent / Weber, Théophane / Lillicrap, Timothy

    2022  

    Abstract Effective decision making involves flexibly relating past experiences and relevant contextual information to a novel situation. In deep reinforcement learning (RL), the dominant paradigm is for an agent to amortise information that helps decision making into its network weights via gradient descent on training losses. Here, we pursue an alternative approach in which agents can utilise large-scale context sensitive database lookups to support their parametric computations. This allows agents to directly learn in an end-to-end manner to utilise relevant information to inform their outputs. In addition, new information can be attended to by the agent, without retraining, by simply augmenting the retrieval dataset. We study this approach for offline RL in 9x9 Go, a challenging game for which the vast combinatorial state space privileges generalisation over direct matching to past experiences. We leverage fast, approximate nearest neighbor techniques in order to retrieve relevant data from a set of tens of millions of expert demonstration states. Attending to this information provides a significant boost to prediction accuracy and game-play performance over simply using these demonstrations as training trajectories, providing a compelling demonstration of the value of large-scale retrieval in offline RL agents.

    Comment: Thirty-sixth Annual Conference on Neural Information Processing Systems (NeurIPS 2022), 16 pages
    Keywords Computer Science - Machine Learning ; Computer Science - Artificial Intelligence
    Subject code 006
    Publishing date 2022-06-10
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
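
    The core retrieval step the abstract describes, looking up relevant records among tens of millions of stored demonstration states, can be pictured as a top-k search over state embeddings. The sketch below uses brute-force dot-product search for clarity; the paper relies on fast, approximate nearest-neighbour techniques at scale, and all names here are illustrative.

        import numpy as np

        def retrieve_neighbours(query_embedding, database_embeddings, database_values, k=8):
            """Return the k stored records whose embeddings best match the query.

            database_embeddings: (N, d) embeddings of expert demonstration states.
            database_values: per-record payload the agent attends to.
            """
            scores = database_embeddings @ query_embedding   # (N,) similarity scores
            top_k = np.argpartition(-scores, k)[:k]          # k best matches, unordered
            top_k = top_k[np.argsort(-scores[top_k])]        # order them by score
            return [database_values[i] for i in top_k]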

  3. Book ; Online: Machine Translation Decoding beyond Beam Search

    Leblond, Rémi / Alayrac, Jean-Baptiste / Sifre, Laurent / Pislar, Miruna / Lespiau, Jean-Baptiste / Antonoglou, Ioannis / Simonyan, Karen / Vinyals, Oriol

    2021  

    Abstract Beam search is the go-to method for decoding auto-regressive machine translation models. While it yields consistent improvements in terms of BLEU, it is only concerned with finding outputs with high model likelihood, and is thus agnostic to whatever end metric or score practitioners care about. Our aim is to establish whether beam search can be replaced by a more powerful metric-driven search technique. To this end, we explore numerous decoding algorithms, including some which rely on a value function parameterised by a neural network, and report results on a variety of metrics. Notably, we introduce a Monte-Carlo Tree Search (MCTS) based method and showcase its competitiveness. We provide a blueprint for how to use MCTS fruitfully in language applications, which opens promising future directions. We find that which algorithm is best heavily depends on the characteristics of the goal metric; we believe that our extensive experiments and analysis will inform further research in this area.

    Comment: 23 pages
    Keywords Computer Science - Computation and Language ; Computer Science - Machine Learning
    Subject code 006
    Publishing date 2021-04-12
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
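
    A minimal value-guided MCTS decoder in the spirit of the method above follows the standard select/expand/evaluate/backup loop over partial output sequences. Everything below is schematic: expand_fn stands in for the translation model's next-token policy and value_fn for a learned estimate of the goal metric, and none of the paper's engineering (batching, metric-specific values) is reproduced.

        import math

        class Node:
            def __init__(self, prior):
                self.prior = prior          # policy probability of reaching this node
                self.children = {}          # token -> Node
                self.visit_count = 0
                self.value_sum = 0.0

            def value(self):
                return self.value_sum / self.visit_count if self.visit_count else 0.0

        def ucb(parent, child, c_puct=1.0):
            # PUCT rule: exploit the running value, explore in proportion to the prior.
            explore = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
            return child.value() + explore

        def mcts_decode_step(prefix, expand_fn, value_fn, simulations=50):
            """One MCTS decision: run simulations, pick the most-visited next token.

            expand_fn(sequence) -> {token: prior}  (the model's next-token policy)
            value_fn(sequence) -> float            (estimate of the end metric)
            """
            root = Node(prior=1.0)
            root.children = {tok: Node(p) for tok, p in expand_fn(prefix).items()}
            for _ in range(simulations):
                node, sequence, path = root, list(prefix), [root]
                # Selection: walk down by UCB until reaching a leaf.
                while node.children:
                    token, node = max(node.children.items(),
                                      key=lambda kv: ucb(path[-1], kv[1]))
                    sequence.append(token)
                    path.append(node)
                # Expansion + evaluation: grow the leaf, score the partial sequence.
                node.children = {tok: Node(p) for tok, p in expand_fn(sequence).items()}
                leaf_value = value_fn(sequence)
                # Backup: propagate the value along the visited path.
                for n in path:
                    n.visit_count += 1
                    n.value_sum += leaf_value
            return max(root.children.items(), key=lambda kv: kv[1].visit_count)[0]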

  4. Book ; Online: Muesli: Combining Improvements in Policy Optimization

    Hessel, Matteo / Danihelka, Ivo / Viola, Fabio / Guez, Arthur / Schmitt, Simon / Sifre, Laurent / Weber, Theophane / Silver, David / van Hasselt, Hado

    2021  

    Abstract We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.
    Keywords Computer Science - Machine Learning ; Computer Science - Artificial Intelligence
    Publishing date 2021-04-13
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
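
    The combination the abstract describes, regularized policy optimization plus model learning as an auxiliary loss, has the schematic structure sketched below. This mirrors only the shape of the objective, not Muesli's exact update; all inputs and weights are placeholders.

        import numpy as np

        def composite_policy_loss(log_probs, advantages, kl_to_prior, model_loss,
                                  kl_weight=1.0, model_weight=1.0):
            # Advantage-weighted policy-gradient term (the policy optimization part).
            policy_term = -np.mean(log_probs * advantages)
            # A divergence penalty regularizing the policy toward a reference,
            # plus the model-learning auxiliary loss mentioned in the abstract.
            return policy_term + kl_weight * kl_to_prior + model_weight * model_loss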

  5. Article ; Online: Mastering Atari, Go, chess and shogi by planning with a learned model.

    Schrittwieser, Julian / Antonoglou, Ioannis / Hubert, Thomas / Simonyan, Karen / Sifre, Laurent / Schmitt, Simon / Guez, Arthur / Lockhart, Edward / Hassabis, Demis / Graepel, Thore / Lillicrap, Timothy / Silver, David

    Nature

    2020  Volume 588, Issue 7839, Page(s) 604–609

    Abstract Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess ...
    Language English
    Publishing date 2020-12-23
    Publishing country England
    Document type Journal Article
    ZDB-ID 120714-3
    ISSN (online) 1476-4687
    ISSN (print) 0028-0836
    DOI 10.1038/s41586-020-03051-4
    Database MEDical Literature Analysis and Retrieval System OnLINE

  6. Book ; Online: Retrieval-Augmented Reinforcement Learning

    Goyal, Anirudh / Friesen, Abram L. / Banino, Andrea / Weber, Theophane / Ke, Nan Rosemary / Badia, Adria Puigdomenech / Guez, Arthur / Mirza, Mehdi / Humphreys, Peter C. / Konyushkova, Ksenia / Sifre, Laurent / Valko, Michal / Osindero, Simon / Lillicrap, Timothy / Heess, Nicolas / Blundell, Charles

    2022  

    Abstract Most deep reinforcement learning (RL) algorithms distill experience into parametric behavior policies or value functions via gradient updates. While effective, this approach has several disadvantages: (1) it is computationally expensive, (2) it can take many updates to integrate experiences into the parametric model, (3) experiences that are not fully integrated do not appropriately influence the agent's behavior, and (4) behavior is limited by the capacity of the model. In this paper we explore an alternative paradigm in which we train a network to map a dataset of past experiences to optimal behavior. Specifically, we augment an RL agent with a retrieval process (parameterized as a neural network) that has direct access to a dataset of experiences. This dataset can come from the agent's past experiences, expert demonstrations, or any other relevant source. The retrieval process is trained to retrieve information from the dataset that may be useful in the current context, to help the agent achieve its goal faster and more efficiently. The proposed method facilitates learning agents that at test time can condition their behavior on the entire dataset, not only the current state or trajectory. We integrate our method into two different RL agents: an offline DQN agent and an online R2D2 agent. In offline multi-task problems, we show that the retrieval-augmented DQN agent avoids task interference and learns faster than the baseline DQN agent. On Atari, we show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores. We run extensive ablations to measure the contributions of the components of our proposed method.
    Keywords Computer Science - Machine Learning
    Subject code 006
    Publishing date 2022-02-16
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
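
    The conditioning step described above, an agent informing its output with retrieved experiences, can be pictured as attention over the retrieved set. The sketch below shows that step as plain dot-product attention; agent_state, retrieved_keys, and retrieved_values are hypothetical names, and the trained retrieval network itself is elided.

        import numpy as np

        def attend_to_retrieved(agent_state, retrieved_keys, retrieved_values):
            """Condition the agent's computation on retrieved experiences.

            agent_state: (d,) encoding of the current context.
            retrieved_keys, retrieved_values: (k, d) arrays for k retrieved records.
            """
            scores = retrieved_keys @ agent_state / np.sqrt(agent_state.size)
            weights = np.exp(scores - scores.max())          # numerically stable softmax
            weights /= weights.sum()
            context = weights @ retrieved_values             # (d,) summary of experience
            return np.concatenate([agent_state, context])    # both feed the policy head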

  7. Article ; Online: Improved protein structure prediction using potentials from deep learning.

    Senior, Andrew W / Evans, Richard / Jumper, John / Kirkpatrick, James / Sifre, Laurent / Green, Tim / Qin, Chongli / Žídek, Augustin / Nelson, Alexander W R / Bridgland, Alex / Penedones, Hugo / Petersen, Stig / Simonyan, Karen / Crossan, Steve / Kohli, Pushmeet / Jones, David T / Silver, David / Kavukcuoglu, Koray / Hassabis, Demis

    Nature

    2020  Volume 577, Issue 7792, Page(s) 706–710

    Abstract Protein structure prediction can be used to determine the three-dimensional shape of a protein from its amino acid sequence ...
    MeSH term(s) Amino Acid Sequence ; Caspases/chemistry ; Caspases/genetics ; Datasets as Topic ; Deep Learning ; Models, Molecular ; Protein Conformation ; Protein Folding ; Proteins/chemistry ; Proteins/genetics ; Software
    Chemical Substances Proteins ; Caspases (EC 3.4.22.-) ; caspase 13 (EC 3.4.22.-)
    Language English
    Publishing date 2020-01-15
    Publishing country England
    Document type Journal Article
    ZDB-ID 120714-3
    ISSN (online) 1476-4687
    ISSN (print) 0028-0836
    DOI 10.1038/s41586-019-1923-7
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Article ; Online: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.

    Silver, David / Hubert, Thomas / Schrittwieser, Julian / Antonoglou, Ioannis / Lai, Matthew / Guez, Arthur / Lanctot, Marc / Sifre, Laurent / Kumaran, Dharshan / Graepel, Thore / Lillicrap, Timothy / Simonyan, Karen / Hassabis, Demis

    Science (New York, N.Y.)

    2018  Volume 362, Issue 6419, Page(s) 1140–1144

    Abstract The game of chess is the longest-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. By contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go by reinforcement learning from self-play. In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games. Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.
    MeSH term(s) Algorithms ; Artificial Intelligence ; Humans ; Reinforcement (Psychology) ; Software ; Video Games
    Language English
    Publishing date 2018-12-06
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 128410-1
    ISSN (online) 1095-9203
    ISSN (print) 0036-8075
    DOI 10.1126/science.aar6404
    Database MEDical Literature Analysis and Retrieval System OnLINE
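
    The training regime the abstract describes, starting from random play with no domain knowledge beyond the rules, has the loop structure sketched below. play_game and train_step are placeholders (AlphaZero's versions select moves with network-guided MCTS), so this is a skeleton of self-play reinforcement learning, not the published system.

        def self_play_training(network, play_game, train_step,
                               num_iterations, games_per_iteration=100):
            """Skeleton self-play loop: generate games with the current network,
            then fit the network to the generated data.

            play_game(network) -> list of (state, move_policy, outcome) tuples
            train_step(network, data) -> updated network
            """
            replay_buffer = []
            for _ in range(num_iterations):
                # Fresh data comes only from the network playing itself.
                for _ in range(games_per_iteration):
                    replay_buffer.extend(play_game(network))
                # Learn to predict the recorded move policies and final outcomes.
                network = train_step(network, replay_buffer)
            return network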

  9. Article ; Online: Mastering the game of Stratego with model-free multiagent reinforcement learning.

    Perolat, Julien / De Vylder, Bart / Hennes, Daniel / Tarassov, Eugene / Strub, Florian / de Boer, Vincent / Muller, Paul / Connor, Jerome T / Burch, Neil / Anthony, Thomas / McAleer, Stephen / Elie, Romuald / Cen, Sarah H / Wang, Zhe / Gruslys, Audrunas / Malysheva, Aleksandra / Khan, Mina / Ozair, Sherjil / Timbers, Finbarr / Pohlen, Toby / Eccles, Tom / Rowland, Mark / Lanctot, Marc / Lespiau, Jean-Baptiste / Piot, Bilal / Omidshafiei, Shayegan / Lockhart, Edward / Sifre, Laurent / Beauguerlange, Nathalie / Munos, Remi / Silver, David / Singh, Satinder / Hassabis, Demis / Tuyls, Karl

    Science (New York, N.Y.)

    2022  Volume 378, Issue 6623, Page(s) 990–996

    Abstract We introduce DeepNash, an autonomous agent that plays the imperfect information game Stratego at a human expert level. Stratego is one of the few iconic board games that artificial intelligence (AI) has not yet mastered. It is a game characterized by a twin challenge: It requires long-term strategic thinking as in chess, but it also requires dealing with imperfect information as in poker. The technique underpinning DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego through self-play from scratch. DeepNash beat existing state-of-the-art AI methods in Stratego and achieved a year-to-date (2022) and all-time top-three ranking on the Gravon games platform, competing with human expert players.
    MeSH term(s) Humans ; Artificial Intelligence ; Reinforcement, Psychology ; Learning ; Acetates
    Chemical Substances Stratego ; Acetates
    Language English
    Publishing date 2022-12-01
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 128410-1
    ISSN (online) 1095-9203
    ISSN (print) 0036-8075
    DOI 10.1126/science.add4679
    Database MEDical Literature Analysis and Retrieval System OnLINE

  10. Book ; Online: Training Compute-Optimal Large Language Models

    Hoffmann, Jordan / Borgeaud, Sebastian / Mensch, Arthur / Buchatskaya, Elena / Cai, Trevor / Rutherford, Eliza / Casas, Diego de Las / Hendricks, Lisa Anne / Welbl, Johannes / Clark, Aidan / Hennigan, Tom / Noland, Eric / Millican, Katie / Driessche, George van den / Damoc, Bogdan / Guy, Aurelia / Osindero, Simon / Simonyan, Karen / Elsen, Erich / Rae, Jack W. / Vinyals, Oriol / Sifre, Laurent

    2022  

    Abstract We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant. By training over 400 language models ranging from 70 million to over 16 billion parameters on 5 to 500 billion tokens, we find that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size the number of training tokens should also be doubled. We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher.
    Keywords Computer Science - Computation and Language ; Computer Science - Machine Learning
    Subject code 006
    Publishing date 2022-03-29
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
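
    The scaling rule in the abstract, that model size and training tokens should be doubled together, can be turned into a back-of-the-envelope calculator. The sketch below assumes the common approximation that training costs about 6 FLOPs per parameter per token, and anchors the tokens-per-parameter ratio at roughly 20 (the widely quoted Chinchilla ratio; an assumption here, not a figure stated in this abstract).

        def compute_optimal_split(flops_budget, tokens_per_param=20.0):
            """Rough compute-optimal sizing under the assumption C ~ 6 * N * D.

            With tokens scaling linearly in parameters, D = r * N, so
            C = 6 * r * N**2  =>  N = sqrt(C / (6 * r)) and D = r * N.
            """
            n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
            n_tokens = tokens_per_param * n_params
            return n_params, n_tokens

        # Example: a 1e24 FLOP budget suggests roughly a 91B-parameter model
        # trained on roughly 1.8T tokens under these assumptions.
        params, tokens = compute_optimal_split(1e24)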
