LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 17

  1. Book ; Online: Accelerating Large Language Model Decoding with Speculative Sampling

    Chen, Charlie / Borgeaud, Sebastian / Irving, Geoffrey / Lespiau, Jean-Baptiste / Sifre, Laurent / Jumper, John

    2023  

    Abstract We present speculative sampling, an algorithm for accelerating transformer decoding by enabling the generation of multiple tokens from each transformer call. Our algorithm relies on the observation that the latency of parallel scoring of short continuations, generated by a faster but less powerful draft model, is comparable to that of sampling a single token from the larger target model. This is combined with a novel modified rejection sampling scheme which preserves the distribution of the target model within hardware numerics. We benchmark speculative sampling with Chinchilla, a 70 billion parameter language model, achieving a 2-2.5x decoding speedup in a distributed setup, without compromising the sample quality or making modifications to the model itself.
    Keywords Computer Science - Computation and Language
    Publishing date 2023-02-02
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
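
    The modified rejection sampling scheme described in the abstract can be sketched in a few lines. The sketch below is a toy, single-step illustration assuming per-position draft and target distributions are already available (draft_probs, target_probs, and drafted_tokens are hypothetical names); it shows the acceptance rule, not the paper's distributed, hardware-aware implementation.

        import numpy as np

        rng = np.random.default_rng(0)

        def speculative_step(draft_probs, target_probs, drafted_tokens):
            """Accept or reject K drafted tokens so that the accepted output
            follows the target model's distribution.

            draft_probs[t], target_probs[t]: vocabulary distributions at step t.
            drafted_tokens: the K tokens sampled from the draft model.
            """
            accepted = []
            for t, x in enumerate(drafted_tokens):
                p, q = draft_probs[t], target_probs[t]
                # Accept token x with probability min(1, q(x) / p(x)).
                if rng.random() < min(1.0, q[x] / p[x]):
                    accepted.append(x)
                else:
                    # On rejection, resample from the residual max(0, q - p),
                    # renormalised; this is what preserves the target distribution.
                    residual = np.maximum(q - p, 0.0)
                    residual /= residual.sum()
                    accepted.append(rng.choice(len(q), p=residual))
                    break
            return accepted

        # If all K tokens are accepted, one extra token can be sampled from the
        # target's next-step distribution, which the same parallel scoring call provides.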

  2. Book ; Online: Large-Scale Retrieval for Reinforcement Learning

    Humphreys, Peter C. / Guez, Arthur / Tieleman, Olivier / Sifre, Laurent / Weber, Théophane / Lillicrap, Timothy

    2022  

    Abstract Effective decision making involves flexibly relating past experiences and relevant contextual information to a novel situation. In deep reinforcement learning (RL), the dominant paradigm is for an agent to amortise information that helps decision making into its network weights via gradient descent on training losses. Here, we pursue an alternative approach in which agents can utilise large-scale context sensitive database lookups to support their parametric computations. This allows agents to directly learn in an end-to-end manner to utilise relevant information to inform their outputs. In addition, new information can be attended to by the agent, without retraining, by simply augmenting the retrieval dataset. We study this approach for offline RL in 9x9 Go, a challenging game for which the vast combinatorial state space privileges generalisation over direct matching to past experiences. We leverage fast, approximate nearest neighbor techniques in order to retrieve relevant data from a set of tens of millions of expert demonstration states. Attending to this information provides a significant boost to prediction accuracy and game-play performance over simply using these demonstrations as training trajectories, providing a compelling demonstration of the value of large-scale retrieval in offline RL agents.

    Comment: Thirty-sixth Annual Conference on Neural Information Processing Systems (NeurIPS 2022), 16 pages
    Keywords Computer Science - Machine Learning ; Computer Science - Artificial Intelligence
    Subject code 006
    Publishing date 2022-06-10
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
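
    The core retrieval step the abstract describes, looking up relevant records among tens of millions of stored demonstration states, can be pictured as a top-k search over state embeddings. The sketch below uses brute-force dot-product search for clarity; the paper relies on fast, approximate nearest-neighbour techniques at scale, and all names here are illustrative.

        import numpy as np

        def retrieve_neighbours(query_embedding, database_embeddings, database_values, k=8):
            """Return the k stored records whose embeddings best match the query.

            database_embeddings: (N, d) embeddings of expert demonstration states.
            database_values: per-record payload the agent attends to.
            """
            scores = database_embeddings @ query_embedding   # (N,) similarity scores
            top_k = np.argpartition(-scores, k)[:k]          # k best matches, unordered
            top_k = top_k[np.argsort(-scores[top_k])]        # order them by score
            return [database_values[i] for i in top_k]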

  3. Book ; Online: Machine Translation Decoding beyond Beam Search

    Leblond, Rémi / Alayrac, Jean-Baptiste / Sifre, Laurent / Pislar, Miruna / Lespiau, Jean-Baptiste / Antonoglou, Ioannis / Simonyan, Karen / Vinyals, Oriol

    2021  

    Abstract Beam search is the go-to method for decoding auto-regressive machine translation models. While it yields consistent improvements in terms of BLEU, it is only concerned with finding outputs with high model likelihood, and is thus agnostic to whatever end metric or score practitioners care about. Our aim is to establish whether beam search can be replaced by a more powerful metric-driven search technique. To this end, we explore numerous decoding algorithms, including some which rely on a value function parameterised by a neural network, and report results on a variety of metrics. Notably, we introduce a Monte-Carlo Tree Search (MCTS) based method and showcase its competitiveness. We provide a blueprint for how to use MCTS fruitfully in language applications, which opens promising future directions. We find that which algorithm is best heavily depends on the characteristics of the goal metric; we believe that our extensive experiments and analysis will inform further research in this area.

    Comment: 23 pages
    Keywords Computer Science - Computation and Language ; Computer Science - Machine Learning
    Subject code 006
    Publishing date 2021-04-12
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
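
    A minimal value-guided MCTS decoder in the spirit of the method above follows the standard select/expand/evaluate/backup loop over partial output sequences. Everything below is schematic: expand_fn stands in for the translation model's next-token policy and value_fn for a learned estimate of the goal metric, and none of the paper's engineering (batching, metric-specific values) is reproduced.

        import math

        class Node:
            def __init__(self, prior):
                self.prior = prior          # policy probability of reaching this node
                self.children = {}          # token -> Node
                self.visit_count = 0
                self.value_sum = 0.0

            def value(self):
                return self.value_sum / self.visit_count if self.visit_count else 0.0

        def ucb(parent, child, c_puct=1.0):
            # PUCT rule: exploit the running value, explore in proportion to the prior.
            explore = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
            return child.value() + explore

        def mcts_decode_step(prefix, expand_fn, value_fn, simulations=50):
            """One MCTS decision: run simulations, pick the most-visited next token.

            expand_fn(sequence) -> {token: prior}  (the model's next-token policy)
            value_fn(sequence) -> float            (estimate of the end metric)
            """
            root = Node(prior=1.0)
            root.children = {tok: Node(p) for tok, p in expand_fn(prefix).items()}
            for _ in range(simulations):
                node, sequence, path = root, list(prefix), [root]
                # Selection: walk down by UCB until reaching a leaf.
                while node.children:
                    token, node = max(node.children.items(),
                                      key=lambda kv: ucb(path[-1], kv[1]))
                    sequence.append(token)
                    path.append(node)
                # Expansion + evaluation: grow the leaf, score the partial sequence.
                node.children = {tok: Node(p) for tok, p in expand_fn(sequence).items()}
                leaf_value = value_fn(sequence)
                # Backup: propagate the value along the visited path.
                for n in path:
                    n.visit_count += 1
                    n.value_sum += leaf_value
            return max(root.children.items(), key=lambda kv: kv[1].visit_count)[0]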

  4. Book ; Online: Muesli: Combining Improvements in Policy Optimization

    Hessel, Matteo / Danihelka, Ivo / Viola, Fabio / Guez, Arthur / Schmitt, Simon / Sifre, Laurent / Weber, Theophane / Silver, David / van Hasselt, Hado

    2021  

    Abstract We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.
    Keywords Computer Science - Machine Learning ; Computer Science - Artificial Intelligence
    Publishing date 2021-04-13
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
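
    The combination the abstract describes, regularized policy optimization plus model learning as an auxiliary loss, has the schematic structure sketched below. This mirrors only the shape of the objective, not Muesli's exact update; all inputs and weights are placeholders.

        import numpy as np

        def composite_policy_loss(log_probs, advantages, kl_to_prior, model_loss,
                                  kl_weight=1.0, model_weight=1.0):
            # Advantage-weighted policy-gradient term (the policy optimization part).
            policy_term = -np.mean(log_probs * advantages)
            # A divergence penalty regularizing the policy toward a reference,
            # plus the model-learning auxiliary loss mentioned in the abstract.
            return policy_term + kl_weight * kl_to_prior + model_weight * model_loss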

  5. Article ; Online: Mastering Atari, Go, chess and shogi by planning with a learned model.

    Schrittwieser, Julian / Antonoglou, Ioannis / Hubert, Thomas / Simonyan, Karen / Sifre, Laurent / Schmitt, Simon / Guez, Arthur / Lockhart, Edward / Hassabis, Demis / Graepel, Thore / Lillicrap, Timothy / Silver, David

    Nature

    2020  Volume 588, Issue 7839, Page(s) 604–609

    Abstract Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess ...
    Language English
    Publishing date 2020-12-23
    Publishing country England
    Document type Journal Article
    ZDB-ID 120714-3
    ISSN (online) 1476-4687
    ISSN (print) 0028-0836
    DOI 10.1038/s41586-020-03051-4
    Database MEDical Literature Analysis and Retrieval System OnLINE

  6. Book ; Online: Retrieval-Augmented Reinforcement Learning

    Goyal, Anirudh / Friesen, Abram L. / Banino, Andrea / Weber, Theophane / Ke, Nan Rosemary / Badia, Adria Puigdomenech / Guez, Arthur / Mirza, Mehdi / Humphreys, Peter C. / Konyushkova, Ksenia / Sifre, Laurent / Valko, Michal / Osindero, Simon / Lillicrap, Timothy / Heess, Nicolas / Blundell, Charles

    2022  

    Abstract Most deep reinforcement learning (RL) algorithms distill experience into parametric behavior policies or value functions via gradient updates. While effective, this approach has several disadvantages: (1) it is computationally expensive, (2) it can take many updates to integrate experiences into the parametric model, (3) experiences that are not fully integrated do not appropriately influence the agent's behavior, and (4) behavior is limited by the capacity of the model. In this paper we explore an alternative paradigm in which we train a network to map a dataset of past experiences to optimal behavior. Specifically, we augment an RL agent with a retrieval process (parameterized as a neural network) that has direct access to a dataset of experiences. This dataset can come from the agent's past experiences, expert demonstrations, or any other relevant source. The retrieval process is trained to retrieve information from the dataset that may be useful in the current context, to help the agent achieve its goal faster and more efficiently. The proposed method facilitates learning agents that at test time can condition their behavior on the entire dataset, not only the current state or trajectory. We integrate our method into two different RL agents: an offline DQN agent and an online R2D2 agent. In offline multi-task problems, we show that the retrieval-augmented DQN agent avoids task interference and learns faster than the baseline DQN agent. On Atari, we show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores. We run extensive ablations to measure the contributions of the components of our proposed method.
    Keywords Computer Science - Machine Learning
    Subject code 006
    Publishing date 2022-02-16
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
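
    The conditioning step described above, an agent informing its output with retrieved experiences, can be pictured as attention over the retrieved set. The sketch below shows that step as plain dot-product attention; agent_state, retrieved_keys, and retrieved_values are hypothetical names, and the trained retrieval network itself is elided.

        import numpy as np

        def attend_to_retrieved(agent_state, retrieved_keys, retrieved_values):
            """Condition the agent's computation on retrieved experiences.

            agent_state: (d,) encoding of the current context.
            retrieved_keys, retrieved_values: (k, d) arrays for k retrieved records.
            """
            scores = retrieved_keys @ agent_state / np.sqrt(agent_state.size)
            weights = np.exp(scores - scores.max())          # numerically stable softmax
            weights /= weights.sum()
            context = weights @ retrieved_values             # (d,) summary of experience
            return np.concatenate([agent_state, context])    # both feed the policy head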

  7. Article ; Online: Improved protein structure prediction using potentials from deep learning.

    Senior, Andrew W / Evans, Richard / Jumper, John / Kirkpatrick, James / Sifre, Laurent / Green, Tim / Qin, Chongli / Žídek, Augustin / Nelson, Alexander W R / Bridgland, Alex / Penedones, Hugo / Petersen, Stig / Simonyan, Karen / Crossan, Steve / Kohli, Pushmeet / Jones, David T / Silver, David / Kavukcuoglu, Koray / Hassabis, Demis

    Nature

    2020  Volume 577, Issue 7792, Page(s) 706–710

    Abstract Protein structure prediction can be used to determine the three-dimensional shape of a protein from its amino acid sequence ...
    MeSH term(s) Amino Acid Sequence ; Caspases/chemistry ; Caspases/genetics ; Datasets as Topic ; Deep Learning ; Models, Molecular ; Protein Conformation ; Protein Folding ; Proteins/chemistry ; Proteins/genetics ; Software
    Chemical Substances Proteins ; Caspases (EC 3.4.22.-) ; caspase 13 (EC 3.4.22.-)
    Language English
    Publishing date 2020-01-15
    Publishing country England
    Document type Journal Article
    ZDB-ID 120714-3
    ISSN (online) 1476-4687
    ISSN (print) 0028-0836
    DOI 10.1038/s41586-019-1923-7
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Article ; Online: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.

    Silver, David / Hubert, Thomas / Schrittwieser, Julian / Antonoglou, Ioannis / Lai, Matthew / Guez, Arthur / Lanctot, Marc / Sifre, Laurent / Kumaran, Dharshan / Graepel, Thore / Lillicrap, Timothy / Simonyan, Karen / Hassabis, Demis

    Science (New York, N.Y.)

    2018  Volume 362, Issue 6419, Page(s) 1140–1144

    Abstract The game of chess is the longest-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. By contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go by reinforcement learning from self-play. In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games. Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.
    MeSH term(s) Algorithms ; Artificial Intelligence ; Humans ; Reinforcement (Psychology) ; Software ; Video Games
    Language English
    Publishing date 2018-12-06
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 128410-1
    ISSN (online) 1095-9203
    ISSN (print) 0036-8075
    DOI 10.1126/science.aar6404
    Database MEDical Literature Analysis and Retrieval System OnLINE
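
    The training regime the abstract describes, starting from random play with no domain knowledge beyond the rules, has the loop structure sketched below. play_game and train_step are placeholders (AlphaZero's versions select moves with network-guided MCTS), so this is a skeleton of self-play reinforcement learning, not the published system.

        def self_play_training(network, play_game, train_step,
                               num_iterations, games_per_iteration=100):
            """Skeleton self-play loop: generate games with the current network,
            then fit the network to the generated data.

            play_game(network) -> list of (state, move_policy, outcome) tuples
            train_step(network, data) -> updated network
            """
            replay_buffer = []
            for _ in range(num_iterations):
                # Fresh data comes only from the network playing itself.
                for _ in range(games_per_iteration):
                    replay_buffer.extend(play_game(network))
                # Learn to predict the recorded move policies and final outcomes.
                network = train_step(network, replay_buffer)
            return network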

  9. Article ; Online: Mastering the game of Stratego with model-free multiagent reinforcement learning.

    Perolat, Julien / De Vylder, Bart / Hennes, Daniel / Tarassov, Eugene / Strub, Florian / de Boer, Vincent / Muller, Paul / Connor, Jerome T / Burch, Neil / Anthony, Thomas / McAleer, Stephen / Elie, Romuald / Cen, Sarah H / Wang, Zhe / Gruslys, Audrunas / Malysheva, Aleksandra / Khan, Mina / Ozair, Sherjil / Timbers, Finbarr / Pohlen, Toby / Eccles, Tom / Rowland, Mark / Lanctot, Marc / Lespiau, Jean-Baptiste / Piot, Bilal / Omidshafiei, Shayegan / Lockhart, Edward / Sifre, Laurent / Beauguerlange, Nathalie / Munos, Remi / Silver, David / Singh, Satinder / Hassabis, Demis / Tuyls, Karl

    Science (New York, N.Y.)

    2022  Volume 378, Issue 6623, Page(s) 990–996

    Abstract We introduce DeepNash, an autonomous agent that plays the imperfect information game Stratego at a human expert level. Stratego is one of the few iconic board games that artificial intelligence (AI) has not yet mastered. It is a game characterized by a twin challenge: It requires long-term strategic thinking as in chess, but it also requires dealing with imperfect information as in poker. The technique underpinning DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego through self-play from scratch. DeepNash beat existing state-of-the-art AI methods in Stratego and achieved a year-to-date (2022) and all-time top-three ranking on the Gravon games platform, competing with human expert players.
    MeSH term(s) Humans ; Artificial Intelligence ; Reinforcement, Psychology ; Learning ; Acetates
    Chemical Substances Stratego ; Acetates
    Language English
    Publishing date 2022-12-01
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 128410-1
    ISSN (online) 1095-9203
    ISSN (print) 0036-8075
    DOI 10.1126/science.add4679
    Database MEDical Literature Analysis and Retrieval System OnLINE

  10. Book ; Online: Training Compute-Optimal Large Language Models

    Hoffmann, Jordan / Borgeaud, Sebastian / Mensch, Arthur / Buchatskaya, Elena / Cai, Trevor / Rutherford, Eliza / Casas, Diego de Las / Hendricks, Lisa Anne / Welbl, Johannes / Clark, Aidan / Hennigan, Tom / Noland, Eric / Millican, Katie / Driessche, George van den / Damoc, Bogdan / Guy, Aurelia / Osindero, Simon / Simonyan, Karen / Elsen, Erich / Rae, Jack W. / Vinyals, Oriol / Sifre, Laurent

    2022  

    Abstract We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant. By training over 400 language models ranging from 70 million to over 16 billion parameters on 5 to 500 billion tokens, we find that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size the number of training tokens should also be doubled. We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher.
    Keywords Computer Science - Computation and Language ; Computer Science - Machine Learning
    Subject code 006
    Publishing date 2022-03-29
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
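
    The scaling rule in the abstract, that model size and training tokens should be doubled together, can be turned into a back-of-the-envelope calculator. The sketch below assumes the common approximation that training costs about 6 FLOPs per parameter per token, and anchors the tokens-per-parameter ratio at roughly 20 (the widely quoted Chinchilla ratio; an assumption here, not a figure stated in this abstract).

        def compute_optimal_split(flops_budget, tokens_per_param=20.0):
            """Rough compute-optimal sizing under the assumption C ~ 6 * N * D.

            With tokens scaling linearly in parameters, D = r * N, so
            C = 6 * r * N**2  =>  N = sqrt(C / (6 * r)) and D = r * N.
            """
            n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
            n_tokens = tokens_per_param * n_params
            return n_params, n_tokens

        # Example: a 1e24 FLOP budget suggests roughly a 91B-parameter model
        # trained on roughly 1.8T tokens under these assumptions.
        params, tokens = compute_optimal_split(1e24)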
