LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 14

  1. Book ; Online: Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation

    Parisotto, Emilio / Salakhutdinov, Ruslan

    2021  

    Abstract: Many real-world applications such as robotics provide hard constraints on power and compute that limit the viable model complexity of Reinforcement Learning (RL) agents. Similarly, in many distributed RL settings, acting is done on un-accelerated hardware such as CPUs, which likewise restricts model size to prevent intractable experiment run times. These "actor-latency" constrained settings present a major obstruction to the scaling up of model complexity that has recently been extremely successful in supervised learning. To be able to utilize large model capacity while still operating within the limits imposed by the system during acting, we develop an "Actor-Learner Distillation" (ALD) procedure that leverages a continual form of distillation that transfers learning progress from a large capacity learner model to a small capacity actor model. As a case study, we develop this procedure in the context of partially-observable environments, where transformer models have had large improvements over LSTMs recently, at the cost of significantly higher computational complexity. With transformer models as the learner and LSTMs as the actor, we demonstrate in several challenging memory environments that using Actor-Learner Distillation recovers the clear sample-efficiency gains of the transformer learner model while maintaining the fast inference and reduced total training time of the LSTM actor model.

    Comment: Published at ICLR 2021
    Keywords Computer Science - Machine Learning
    Subject code 004
    Publishing date 2021-04-04
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
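
    The core of the ALD procedure described in this abstract is a continual distillation loss that regresses the small actor's policy onto the large learner's. Below is a minimal, hypothetical PyTorch sketch of such a loss; the function name, tensor shapes, and the plain cross-entropy form (equivalent to the KL term up to a constant) are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of an actor-learner distillation loss: the small actor
# is trained to match the large learner's action distribution.
import torch
import torch.nn.functional as F

def ald_distillation_loss(actor_logits: torch.Tensor,
                          learner_logits: torch.Tensor) -> torch.Tensor:
    """Cross-entropy of the actor against the (detached) learner policy,
    averaged over the batch. Equals KL(learner || actor) up to a constant."""
    learner_probs = F.softmax(learner_logits.detach(), dim=-1)  # teacher is fixed
    actor_log_probs = F.log_softmax(actor_logits, dim=-1)
    return -(learner_probs * actor_log_probs).sum(dim=-1).mean()

# Toy usage: a batch of 32 states with 6 discrete actions.
actor_logits = torch.randn(32, 6, requires_grad=True)   # small LSTM actor head
learner_logits = torch.randn(32, 6)                     # large transformer learner head
loss = ald_distillation_loss(actor_logits, learner_logits)
loss.backward()   # gradients flow only into the actor
```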

  2. Book ; Online: Structured State Space Models for In-Context Reinforcement Learning

    Lu, Chris / Schroecker, Yannick / Gu, Albert / Parisotto, Emilio / Foerster, Jakob / Singh, Satinder / Behbahani, Feryal

    2023  

    Abstract: Structured state space sequence (S4) models have recently achieved state-of-the-art performance on long-range sequence modeling tasks. These models also have fast inference speeds and parallelisable training, making them potentially useful in many reinforcement learning settings. We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel, allowing us to tackle reinforcement learning tasks. We show that our modified architecture runs asymptotically faster than Transformers in sequence length and performs better than RNNs on a simple memory-based task. We evaluate our modified architecture on a set of partially-observable environments and find that, in practice, our model outperforms RNNs while also running over five times faster. Then, by leveraging the model's ability to handle long-range sequences, we achieve strong performance on a challenging meta-learning task in which the agent is given a randomly-sampled continuous control environment, combined with a randomly-sampled linear projection of the environment's observations and actions. Furthermore, we show the resulting model can adapt to out-of-distribution held-out tasks. Overall, the results presented in this paper show that structured state space models are fast and performant for in-context reinforcement learning tasks. We provide code at https://github.com/luchris429/popjaxrl.
    Keywords Computer Science - Machine Learning
    Subject code 006
    Publishing date 2023-03-07
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
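
    The key modification this abstract describes is resetting the recurrent hidden state at episode boundaries. The sketch below illustrates only the reset semantics with a plain sequential linear recurrence; the paper performs the reset inside a parallel scan, and all names and shapes here are assumptions.

```python
# Hedged sketch: a linear recurrence h_t = A h_{t-1} + B x_t whose hidden
# state is zeroed wherever a `done` flag marks a new episode.
import torch

def recurrence_with_resets(x, done, A, B):
    """x: [T, d_in]; done: [T] bool (True = new episode starts at step t)."""
    T, _ = x.shape
    h = torch.zeros(A.shape[0])
    hs = []
    for t in range(T):
        if done[t]:
            h = torch.zeros_like(h)   # reset state at the episode boundary
        h = A @ h + B @ x[t]
        hs.append(h)
    return torch.stack(hs)

# Toy usage: 8 steps with episode resets at t=0 and t=4.
x = torch.randn(8, 3)
done = torch.tensor([1, 0, 0, 0, 1, 0, 0, 0], dtype=torch.bool)
A, B = 0.9 * torch.eye(4), 0.1 * torch.randn(4, 3)
out = recurrence_with_resets(x, done, A, B)
```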

  3. Book ; Online: A Generalist Dynamics Model for Control

    Schubert, Ingmar / Zhang, Jingwei / Bruce, Jake / Bechtle, Sarah / Parisotto, Emilio / Riedmiller, Martin / Springenberg, Jost Tobias / Byravan, Arunkumar / Hasenclever, Leonard / Heess, Nicolas

    2023  

    Abstract: We investigate the use of transformer sequence models as dynamics models (TDMs) for control. We find that TDMs exhibit strong generalization capabilities to unseen environments, both in a few-shot setting, where a generalist TDM is fine-tuned with small amounts of data from the target environment, and in a zero-shot setting, where a generalist TDM is applied to an unseen environment without any further training. Here, we demonstrate that generalizing system dynamics can work much better than generalizing optimal behavior directly as a policy. Additional results show that TDMs also perform well in a single-environment learning setting when compared to a number of baseline models. These properties make TDMs a promising ingredient for a foundation model of control.
    Keywords Computer Science - Artificial Intelligence ; Computer Science - Robotics
    Publishing date 2023-05-18
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
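
    As a rough illustration of the idea of a dynamics model as next-observation sequence prediction, here is a hedged sketch; a GRU stands in for the transformer purely to keep the example short, and all module names and shapes are hypothetical, not the paper's architecture.

```python
# Illustrative sketch: given interleaved observations and actions, the model
# is trained to predict the next observation (a dynamics model for control).
import torch
import torch.nn as nn

class TinyDynamicsModel(nn.Module):
    def __init__(self, obs_dim=4, act_dim=2, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, obs_dim)

    def forward(self, obs, act):
        # Predict obs[t+1] from the history of (obs, act) pairs up to t.
        h, _ = self.rnn(torch.cat([obs, act], dim=-1))
        return self.head(h)

model = TinyDynamicsModel()
obs = torch.randn(8, 10, 4)      # [batch, time, obs_dim]
act = torch.randn(8, 10, 2)
pred = model(obs, act)[:, :-1]   # predictions for observations 1..T-1
loss = ((pred - obs[:, 1:]) ** 2).mean()
loss.backward()
```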

  4. Book ; Online: On Proximal Policy Optimization's Heavy-tailed Gradients

    Garg, Saurabh / Zhanson, Joshua / Parisotto, Emilio / Prasad, Adarsh / Kolter, J. Zico / Lipton, Zachary C. / Balakrishnan, Sivaraman / Salakhutdinov, Ruslan / Ravikumar, Pradeep

    2021  

    Abstract: Modern policy gradient algorithms such as Proximal Policy Optimization (PPO) rely on an arsenal of heuristics, including loss clipping and gradient clipping, to ensure successful learning. These heuristics are reminiscent of techniques from robust statistics, commonly used for estimation in outlier-rich ("heavy-tailed") regimes. In this paper, we present a detailed empirical study to characterize the heavy-tailed nature of the gradients of the PPO surrogate reward function. We demonstrate that the gradients, especially for the actor network, exhibit pronounced heavy-tailedness and that it increases as the agent's policy diverges from the behavioral policy (i.e., as the agent goes further off policy). Further examination implicates the likelihood ratios and advantages in the surrogate reward as the main sources of the observed heavy-tailedness. We then highlight issues arising due to the heavy-tailed nature of the gradients. In this light, we study the effects of the standard PPO clipping heuristics, demonstrating that these tricks primarily serve to offset heavy-tailedness in gradients. Thus motivated, we propose incorporating GMOM, a high-dimensional robust estimator, into PPO as a substitute for three clipping tricks. Despite requiring less hyperparameter tuning, our method matches the performance of PPO (with all heuristics enabled) on a battery of MuJoCo continuous control tasks.

    Comment: ICML 2021
    Keywords Computer Science - Machine Learning ; Computer Science - Robotics ; Statistics - Machine Learning
    Publishing date 2021-02-20
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
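
    The proposed GMOM estimator replaces clipping heuristics with robust gradient aggregation. Below is a generic sketch of a geometric median-of-means estimate computed via Weiszfeld iterations, assuming per-sample gradients are available; the block count, iteration budget, and function name are illustrative assumptions, not the paper's implementation.

```python
# Median-of-means sketch: split per-sample gradients into blocks, average
# each block, then take the geometric median of the block means, which damps
# heavy-tailed outliers that would dominate a plain mean.
import torch

def geometric_median_of_means(grads: torch.Tensor, n_blocks: int = 5,
                              iters: int = 50, eps: float = 1e-8) -> torch.Tensor:
    """grads: [n_samples, dim] per-sample gradients -> robust [dim] estimate."""
    blocks = grads.chunk(n_blocks)                         # split into blocks
    means = torch.stack([b.mean(dim=0) for b in blocks])   # block means
    median = means.mean(dim=0)                             # Weiszfeld init
    for _ in range(iters):
        dists = (means - median).norm(dim=1).clamp_min(eps)
        weights = 1.0 / dists
        median = (weights[:, None] * means).sum(0) / weights.sum()
    return median

# Toy usage: 100 gradients where a few samples carry heavy-tailed noise.
g = torch.randn(100, 10)
g[:5] += 50.0                        # outliers
robust = geometric_median_of_means(g)
```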

  5. Book ; Online: Concurrent Meta Reinforcement Learning

    Parisotto, Emilio / Ghosh, Soham / Yalamanchi, Sai Bhargav / Chinnaobireddy, Varsha / Wu, Yuhuai / Salakhutdinov, Ruslan

    2019  

    Abstract: State-of-the-art meta reinforcement learning algorithms typically assume the setting of a single agent interacting with its environment in a sequential manner. A negative side-effect of this sequential execution paradigm is that, as the environment becomes more and more challenging and thus requires more interaction episodes for the meta-learner, the agent must reason over longer and longer time-scales. To combat the difficulty of long time-scale credit assignment, we propose an alternative parallel framework, which we name "Concurrent Meta-Reinforcement Learning" (CMRL), that transforms the temporal credit assignment problem into a multi-agent reinforcement learning one. In this multi-agent setting, a set of parallel agents is executed in the same environment and each of these "rollout" agents is given the means to communicate with the others. The goal of the communication is to coordinate, in a collaborative manner, the most efficient exploration of the shared task the agents are currently assigned. This coordination therefore represents the meta-learning aspect of the framework, as each agent can be assigned or assign itself a particular section of the current task's state space. This framework is in contrast to standard RL methods that assume that each parallel rollout occurs independently, which can potentially waste computation if many of the rollouts end up sampling the same part of the state space. Furthermore, the parallel setting enables us to define several reward sharing functions and auxiliary losses that are non-trivial to apply in the sequential setting. We demonstrate the effectiveness of our proposed CMRL at improving over sequential methods in a variety of challenging tasks.
    Keywords Computer Science - Artificial Intelligence
    Subject code 006
    Publishing date 2019-03-06
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
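
    One way to picture the reward sharing this abstract alludes to is an exploration bonus that discounts states already covered by the other parallel agents, discouraging redundant rollouts. The sketch below is a loose, hypothetical illustration; the counting scheme and the bonus form are assumptions, not taken from the paper.

```python
# Hypothetical overlap-penalized exploration bonus for parallel rollout
# agents sharing one task: an agent earns less for states the others visited.
import numpy as np

def overlap_penalized_bonus(visits: np.ndarray) -> np.ndarray:
    """visits: [n_agents, n_states] visit counts from parallel rollouts.
    Returns a per-agent bonus that shrinks where other agents overlap."""
    total = visits.sum(axis=0, keepdims=True)   # coverage by all agents
    others = total - visits                     # coverage by the other agents
    per_state = visits / np.sqrt(1.0 + others)  # discounted per-state credit
    return per_state.sum(axis=1)

# Toy usage: agent 0 and agent 1 both hammer state 0; only agent 1 saw state 1.
visits = np.array([[3, 0, 1],
                   [3, 2, 0]])
print(overlap_penalized_bonus(visits))
```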

  6. Book ; Online: Efficient Exploration via State Marginal Matching

    Lee, Lisa / Eysenbach, Benjamin / Parisotto, Emilio / Xing, Eric / Levine, Sergey / Salakhutdinov, Ruslan

    2019  

    Abstract: Exploration is critical to a reinforcement learning agent's performance in its given environment. Prior exploration methods are often based on using heuristic auxiliary predictions to guide policy behavior, lacking a mathematically-grounded objective with clear properties. In contrast, we recast exploration as a problem of State Marginal Matching (SMM), where we aim to learn a policy for which the state marginal distribution matches a given target state distribution. The target distribution is a uniform distribution in most cases, but can incorporate prior knowledge if available. In effect, SMM amortizes the cost of learning to explore in a given environment. The SMM objective can be viewed as a two-player, zero-sum game between a state density model and a parametric policy, an idea that we use to build an algorithm for optimizing the SMM objective. Using this formalism, we further demonstrate that prior work approximately maximizes the SMM objective, offering an explanation for the success of these methods. On both simulated and real-world tasks, we demonstrate that agents that directly optimize the SMM objective explore faster and adapt more quickly to new tasks as compared to prior exploration methods.

    Comment: Videos and code: https://sites.google.com/view/state-marginal-matching
    Keywords Computer Science - Machine Learning ; Computer Science - Artificial Intelligence ; Computer Science - Robotics ; Statistics - Machine Learning
    Subject code 006
    Publishing date 2019-06-12
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
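
    The SMM objective rewards the policy with the gap between the log target density and the log of a fitted density over states the policy actually visits. A minimal sketch, assuming a uniform target distribution and a kernel density estimate as the state density model; the density model choice and constants are illustrative:

```python
# Sketch of a state-marginal-matching intrinsic reward:
#   r(s) = log p*(s) - log q(s),
# high wherever the policy's own state density q is low relative to the target.
import numpy as np
from scipy.stats import gaussian_kde

def smm_intrinsic_reward(states: np.ndarray, visited: np.ndarray,
                         log_p_target: float) -> np.ndarray:
    """states: [n, d] states to score; visited: [m, d] recent on-policy states."""
    density = gaussian_kde(visited.T)           # q(s), fit to the policy's states
    log_q = np.log(density(states.T) + 1e-12)
    return log_p_target - log_q                 # reward rarely-visited states

# Toy usage: uniform target over [0, 1]^2 has log density 0 everywhere.
visited = np.random.rand(500, 2) * 0.3          # policy stuck in one corner
states = np.array([[0.1, 0.1], [0.9, 0.9]])
print(smm_intrinsic_reward(states, visited, log_p_target=0.0))
```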

  7. Book ; Online: Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning

    Parisotto, Emilio / Ba, Jimmy Lei / Salakhutdinov, Ruslan

    2015  

    Abstract: The ability to act in multiple environments and transfer previous knowledge to new situations can be considered a critical aspect of any intelligent agent. Towards this goal, we define a novel method of multitask and transfer learning that enables an autonomous agent to learn how to behave in multiple tasks simultaneously, and then generalize its knowledge to new domains. This method, termed "Actor-Mimic", exploits the use of deep reinforcement learning and model compression techniques to train a single policy network that learns how to act in a set of distinct tasks by using the guidance of several expert teachers. We then show that the representations learnt by the deep policy network are capable of generalizing to new tasks with no prior expert guidance, speeding up learning in novel environments. Although our method can in general be applied to a wide range of problems, we use Atari games as a testing environment to demonstrate these methods.

    Comment: Accepted as a conference paper at ICLR 2016
    Keywords Computer Science - Machine Learning
    Subject code 629
    Publishing date 2015-11-19
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
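
    The objective this abstract describes regresses a single multi-task student policy onto the policies of per-task expert teachers via cross-entropy, with the teacher policy formed by a softmax over the expert's action values. A short sketch follows; the temperature and tensor shapes are illustrative assumptions rather than the paper's exact setup.

```python
# Sketch of a policy-regression (distillation) loss: the multi-task student
# matches the Boltzmann policy induced by an expert's Q-values for one task.
import torch
import torch.nn.functional as F

def actor_mimic_loss(student_logits, expert_q_values, tau: float = 1.0):
    """student_logits, expert_q_values: [batch, n_actions] for one task."""
    teacher = F.softmax(expert_q_values / tau, dim=-1)      # expert policy
    log_student = F.log_softmax(student_logits, dim=-1)
    return -(teacher * log_student).sum(dim=-1).mean()      # cross-entropy

# Toy usage: sum the loss over two tasks' minibatches (one shared student
# network in practice; fresh random tensors here just to keep it runnable).
loss = sum(actor_mimic_loss(torch.randn(16, 4, requires_grad=True),
                            torch.randn(16, 4))
           for _ in range(2))
```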

  8. Book ; Online: Shaking the foundations: delusions in sequence models for interaction and control

    Ortega, Pedro A. / Kunesch, Markus / Delétang, Grégoire / Genewein, Tim / Grau-Moya, Jordi / Veness, Joel / Buchli, Jonas / Degrave, Jonas / Piot, Bilal / Perolat, Julien / Everitt, Tom / Tallec, Corentin / Parisotto, Emilio / Erez, Tom / Chen, Yutian / Reed, Scott / Hutter, Marcus / de Freitas, Nando / Legg, Shane

    2021  

    Abstract: The recent phenomenal success of language models has reinvigorated machine learning research, and large sequence models such as transformers are being applied to a variety of domains. One important problem class that has remained relatively elusive, however, is purposeful adaptive behavior. Currently there is a common perception that sequence models "lack the understanding of the cause and effect of their actions" leading them to draw incorrect inferences due to auto-suggestive delusions. In this report we explain where this mismatch originates, and show that it can be resolved by treating actions as causal interventions. Finally, we show that in supervised learning, one can teach a system to condition or intervene on data by training with factual and counterfactual error signals respectively.

    Comment: DeepMind Tech Report, 16 pages, 4 figures
    Keywords Computer Science - Machine Learning ; Computer Science - Artificial Intelligence
    Publishing date 2021-10-20
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
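
    A loose, hedged illustration of "treating actions as causal interventions" in sequence-model training: action tokens are fed to the model as inputs but excluded from the prediction loss, so the model learns to predict observations given a history in which actions enter as "do" operations rather than as predicted evidence. This masking scheme is an assumption made for illustration, not the report's exact recipe.

```python
# Sketch: mask the sequence-model loss at action positions, so actions are
# conditioned on as interventions rather than predicted as observations.
import torch
import torch.nn.functional as F

def masked_sequence_loss(logits, targets, is_action):
    """logits: [T, vocab]; targets: [T]; is_action: [T] bool action mask."""
    per_token = F.cross_entropy(logits, targets, reduction="none")
    mask = (~is_action).float()          # learn only from observation tokens
    return (per_token * mask).sum() / mask.sum()

# Toy usage: alternating observation/action tokens over a vocab of 10.
logits = torch.randn(6, 10, requires_grad=True)
targets = torch.randint(0, 10, (6,))
is_action = torch.tensor([0, 1, 0, 1, 0, 1], dtype=torch.bool)
loss = masked_sequence_loss(logits, targets, is_action)
loss.backward()
```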

  9. Book ; Online: A Generalist Agent

    Reed, Scott / Zolna, Konrad / Parisotto, Emilio / Colmenarejo, Sergio Gomez / Novikov, Alexander / Barth-Maron, Gabriel / Gimenez, Mai / Sulsky, Yury / Kay, Jackie / Springenberg, Jost Tobias / Eccles, Tom / Bruce, Jake / Razavi, Ali / Edwards, Ashley / Heess, Nicolas / Chen, Yutian / Hadsell, Raia / Vinyals, Oriol / Bordbar, Mahyar / de Freitas, Nando

    2022  

    Abstract: Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato.
    Keywords Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Computer Science - Machine Learning ; Computer Science - Robotics
    Publishing date 2022-05-12
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
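
    The single-network design this abstract describes rests on mapping every modality into one flat token vocabulary, so the same sequence model can emit text tokens, button presses, or discretized joint torques. The sketch below shows one hypothetical discretization scheme for continuous values; the bin count, value range, and vocabulary layout are assumptions, and the report's actual encoding may differ.

```python
# Sketch: discretize continuous action values into tokens placed after a
# (hypothetical) text-token range, so all modalities share one vocabulary.
import numpy as np

TEXT_VOCAB = 32_000          # assumed size of the text-token range
N_BINS = 1024                # assumed number of bins for continuous values

def tokenize_continuous(x: np.ndarray, lo=-1.0, hi=1.0) -> np.ndarray:
    """Map values in [lo, hi] to integer tokens beyond the text range."""
    bins = np.clip((x - lo) / (hi - lo), 0.0, 1.0) * (N_BINS - 1)
    return TEXT_VOCAB + bins.astype(np.int64)

def detokenize_continuous(tok: np.ndarray, lo=-1.0, hi=1.0) -> np.ndarray:
    """Invert tokenize_continuous back to (binned) continuous values."""
    return lo + (tok - TEXT_VOCAB) / (N_BINS - 1) * (hi - lo)

# Toy usage: joint torques become ordinary tokens in the same sequence as text.
torques = np.array([-0.5, 0.0, 0.73])
tokens = tokenize_continuous(torques)
round_trip = detokenize_continuous(tokens)
```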

  10. Book ; Online: Imitate and Repurpose: Learning Reusable Robot Movement Skills From Human and Animal Behaviors

    Bohez, Steven / Tunyasuvunakool, Saran / Brakel, Philemon / Sadeghi, Fereshteh / Hasenclever, Leonard / Tassa, Yuval / Parisotto, Emilio / Humplik, Jan / Haarnoja, Tuomas / Hafner, Roland / Wulfmeier, Markus / Neunert, Michael / Moran, Ben / Siegel, Noah / Huber, Andrea / Romano, Francesco / Batchelor, Nathan / Casarini, Federico / Merel, Josh / Hadsell, Raia / Heess, Nicolas

    2022  

    Abstract: We investigate the use of prior knowledge of human and animal movement to learn reusable locomotion skills for real legged robots. Our approach builds upon previous work on imitating human or dog Motion Capture (MoCap) data to learn a movement skill module. Once learned, this skill module can be reused for complex downstream tasks. Importantly, due to the prior imposed by the MoCap data, our approach does not require extensive reward engineering to produce sensible and natural-looking behavior at the time of reuse. This makes it easy to create well-regularized, task-oriented controllers that are suitable for deployment on real robots. We demonstrate how our skill module can be used for imitation, and train controllable walking and ball dribbling policies for both the ANYmal quadruped and OP3 humanoid. These policies are then deployed on hardware via zero-shot simulation-to-reality transfer. Accompanying videos are available at https://bit.ly/robot-npmp.

    Comment: 30 pages, 9 figures, 8 tables, 14 videos at https://bit.ly/robot-npmp , submitted to Science Robotics
    Keywords Computer Science - Robotics ; Computer Science - Artificial Intelligence ; Computer Science - Machine Learning
    Subject code 629
    Publishing date 2022-03-31
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
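
    The reuse pattern this abstract describes freezes a pretrained low-level skill module and trains only a new high-level, task-specific controller that drives it through a latent command. A minimal sketch, with all module shapes and the latent interface assumed for illustration:

```python
# Sketch of frozen-skill reuse: gradients from the task loss flow through the
# frozen skill module into the trainable high-level controller only.
import torch
import torch.nn as nn

# Low-level skill module: (proprioception, latent command) -> joint targets.
skill_module = nn.Sequential(nn.Linear(8 + 16, 64), nn.Tanh(),
                             nn.Linear(64, 12))
for p in skill_module.parameters():
    p.requires_grad_(False)          # frozen after MoCap imitation pretraining

# High-level controller: task observation -> latent skill command.
high_level = nn.Sequential(nn.Linear(20, 64), nn.Tanh(),
                           nn.Linear(64, 16))

task_obs = torch.randn(32, 20)
proprio = torch.randn(32, 8)
latent = high_level(task_obs)
action = skill_module(torch.cat([proprio, latent], dim=-1))
loss = (action ** 2).mean()          # placeholder for a real task objective
loss.backward()                      # updates reach only high_level's weights
```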
