LIVIVO - The Search Portal for Life Sciences

Search results

Results 1 - 8 of 8 in total

  1. Book ; Online: Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis

    Meulemans, Alexander / Schug, Simon / Kobayashi, Seijin / Daw, Nathaniel / Wayne, Gregory

    2023  

    Abstract To make reinforcement learning more sample efficient, we need better credit assignment methods that measure an action's influence on future rewards. Building upon Hindsight Credit Assignment (HCA), we introduce Counterfactual Contribution Analysis (COCOA), a new family of model-based credit assignment algorithms. Our algorithms achieve precise credit assignment by measuring the contribution of actions upon obtaining subsequent rewards, by quantifying a counterfactual query: 'Would the agent still have reached this reward if it had taken another action?'. We show that measuring contributions w.r.t. rewarding states, as is done in HCA, results in spurious estimates of contributions, causing HCA to degrade towards the high-variance REINFORCE estimator in many relevant environments. Instead, we measure contributions w.r.t. rewards or learned representations of the rewarding objects, resulting in gradient estimates with lower variance. We run experiments on a suite of problems specifically designed to evaluate long-term credit assignment capabilities. By using dynamic programming, we measure ground-truth policy gradients and show that the improved performance of our new model-based credit assignment methods is due to lower bias and variance compared to HCA and common baselines. Our results demonstrate how modeling action contributions towards rewarding outcomes can be leveraged for credit assignment, opening a new path towards sample-efficient reinforcement learning.
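
    A toy illustration of this idea (not the paper's algorithm): in a one-step bandit with known reward probabilities, credit for an obtained reward is spread over all actions in proportion to the counterfactual ratio P(reward | s, a) / P(reward | s). The reward table P_R, the Bernoulli policy and the learning rate are assumptions made for this sketch; the paper instead learns such quantities with a model of the rewarding outcomes.

      import numpy as np

      # Toy contribution-weighted policy gradient (illustrative assumptions throughout).
      rng = np.random.default_rng(0)
      P_R = {0: 0.1, 1: 0.9}         # P(reward | action), assumed known here
      theta = 0.0                    # logit of choosing action 1

      def pi1(th):
          return 1.0 / (1.0 + np.exp(-th))

      for _ in range(3000):
          p = pi1(theta)
          a = int(rng.random() < p)
          reached = rng.random() < P_R[a]              # was the rewarding outcome obtained?
          grad = 0.0
          if reached:
              p_reward = p * P_R[1] + (1 - p) * P_R[0]     # P(reward | s) under the policy
              # Credit every action by how much it contributes to reaching the reward.
              for a_prime, dpi in ((1, p * (1 - p)), (0, -p * (1 - p))):
                  grad += dpi * P_R[a_prime] / p_reward
          theta += 0.05 * grad

      print("P(action 1) after training:", round(pi1(theta), 3))   # expected to approach 1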

    Comment: NeurIPS 2023 spotlight
    Keywords Computer Science - Machine Learning ; Statistics - Machine Learning
    Subject/category (code) 006
    Publication date 2023-06-29
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (life sciences selection)

  2. Book ; Online: Challenges for Using Impact Regularizers to Avoid Negative Side Effects

    Lindner, David / Matoba, Kyle / Meulemans, Alexander

    2021  

    Abstract Designing reward functions for reinforcement learning is difficult: besides specifying which behavior is rewarded for a task, the reward also has to discourage undesired outcomes. Misspecified reward functions can lead to unintended negative side effects, and overall unsafe behavior. To overcome this problem, recent work proposed to augment the specified reward function with an impact regularizer that discourages behavior that has a big impact on the environment. Although initial results with impact regularizers seem promising in mitigating some types of side effects, important challenges remain. In this paper, we examine the main current challenges of impact regularizers and relate them to fundamental design decisions. We discuss in detail which challenges recent approaches address and which remain unsolved. Finally, we explore promising directions to overcome the unsolved challenges in preventing negative side effects with impact regularizers.
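
    A minimal sketch of the general recipe discussed here, assuming a simple deviation measure and a do-nothing baseline (both illustrative, not taken from the paper): the task reward is augmented with a penalty on how far the agent's state has drifted from the state a baseline policy would have produced.

      # Impact-regularized reward: r(s) - lam * d(state, baseline_state).
      # The deviation measure d (count of changed state components) and lam are assumptions.
      def impact_regularized_reward(task_reward, state, baseline_state, lam=0.1):
          d = sum(1 for a, b in zip(state, baseline_state) if a != b)
          return task_reward - lam * d

      # Example: the agent changed two things the do-nothing baseline would have left intact.
      state          = (1, 0, 0, 1)
      baseline_state = (1, 1, 1, 1)
      print(impact_regularized_reward(1.0, state, baseline_state))   # 0.8
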
    Keywords Computer Science - Machine Learning ; Computer Science - Artificial Intelligence
    Subject/category (code) 306
    Publication date 2021-01-29
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (life sciences selection)

  3. Book ; Online: The least-control principle for local learning at equilibrium

    Meulemans, Alexander / Zucchet, Nicolas / Kobayashi, Seijin / von Oswald, Johannes / Sacramento, João

    2022  

    Abstract Equilibrium systems are a powerful way to express neural computations. As special cases, they include models of great current interest in both neuroscience and machine learning, such as deep neural networks, equilibrium recurrent neural networks, deep equilibrium models, or meta-learning. Here, we present a new principle for learning such systems with a temporally- and spatially-local rule. Our principle casts learning as a least-control problem, where we first introduce an optimal controller to lead the system towards a solution state, and then define learning as reducing the amount of control needed to reach such a state. We show that incorporating learning signals within a dynamics as an optimal control enables transmitting activity-dependent credit assignment information, avoids storing intermediate states in memory, and does not rely on infinitesimal learning signals. In practice, our principle leads to strong performance matching that of leading gradient-based learning methods when applied to an array of problems involving recurrent neural networks and meta-learning. Our results shed light on how the brain might learn and offer new ways of approaching a broad class of machine learning problems.

    Comment: Published at NeurIPS 2022. 56 pages
    Keywords Computer Science - Machine Learning ; Computer Science - Neural and Evolutionary Computing ; 68T07 ; I.2.6
    Subject/category (code) 006
    Publication date 2022-07-04
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (life sciences selection)

  4. Book ; Online: Minimizing Control for Credit Assignment with Strong Feedback

    Meulemans, Alexander / Farinha, Matilde Tristany / Cervera, Maria R. / Sacramento, João / Grewe, Benjamin F.

    2022  

    Abstract The success of deep learning ignited interest in whether the brain learns hierarchical representations using gradient-based learning. However, current biologically plausible methods for gradient-based credit assignment in deep neural networks need infinitesimally small feedback signals, which is problematic in biologically realistic noisy environments and at odds with experimental evidence in neuroscience showing that top-down feedback can significantly influence neural activity. Building upon deep feedback control (DFC), a recently proposed credit assignment method, we combine strong feedback influences on neural activity with gradient-based learning and show that this naturally leads to a novel view on neural network optimization. Instead of gradually changing the network weights towards configurations with low output loss, weight updates gradually minimize the amount of feedback required from a controller that drives the network to the supervised output label. Moreover, we show that the use of strong feedback in DFC allows learning forward and feedback connections simultaneously, using learning rules fully local in space and time. We complement our theoretical results with experiments on standard computer-vision benchmarks, showing competitive performance to backpropagation as well as robustness to noise. Overall, our work presents a fundamentally novel view of learning as control minimization, while sidestepping biologically unrealistic assumptions.

    Comment: 26 pages, 4 figures
    Keywords Computer Science - Neural and Evolutionary Computing ; Computer Science - Machine Learning ; 68T07 ; I.2.6
    Subject/category (code) 629 ; 006
    Publication date 2022-04-14
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (life sciences selection)

  5. Book ; Online: Credit Assignment in Neural Networks through Deep Feedback Control

    Meulemans, Alexander / Farinha, Matilde Tristany / Ordóñez, Javier García / Aceituno, Pau Vilimelis / Sacramento, João / Grewe, Benjamin F.

    2021  

    Abstract The success of deep learning sparked interest in whether the brain learns by using similar techniques for assigning credit to each synaptic weight for its contribution to the network output. However, the majority of current attempts at biologically-plausible learning methods are either non-local in time, require highly specific connectivity motives, or have no clear link to any known mathematical optimization method. Here, we introduce Deep Feedback Control (DFC), a new learning method that uses a feedback controller to drive a deep neural network to match a desired output target and whose control signal can be used for credit assignment. The resulting learning rule is fully local in space and time and approximates Gauss-Newton optimization for a wide range of feedback connectivity patterns. To further underline its biological plausibility, we relate DFC to a multi-compartment model of cortical pyramidal neurons with a local voltage-dependent synaptic plasticity rule, consistent with recent theories of dendritic processing. By combining dynamical system theory with mathematical optimization theory, we provide a strong theoretical foundation for DFC that we corroborate with detailed results on toy experiments and standard computer-vision benchmarks.
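
    A small linear toy in the spirit of this abstract (illustrative only, not the authors' implementation): a controller drives the activities of a two-layer linear network until the output matches the target, and each layer then updates its weights from its own control signal. The layer sizes, the fixed feedback matrices Q1/Q2, the learning rate and the closed-form steady-state controller are assumptions made for the sketch.

      import numpy as np

      rng = np.random.default_rng(0)
      n_in, n_hid, n_out = 3, 4, 2
      W_true = rng.normal(size=(n_out, n_in))              # linear teacher to be matched
      W1 = rng.normal(scale=0.1, size=(n_hid, n_in))
      W2 = rng.normal(scale=0.1, size=(n_out, n_hid))
      Q1 = rng.normal(scale=0.5, size=(n_hid, n_out))      # fixed feedback into the hidden layer
      Q2 = np.eye(n_out)                                   # fixed feedback into the output layer
      lr = 0.01

      def test_error():
          X = rng.normal(size=(200, n_in))
          return float(np.mean((X @ W_true.T - X @ W1.T @ W2.T) ** 2))

      print("error before:", round(test_error(), 3))
      for _ in range(3000):
          x = rng.normal(size=n_in)
          y_target = W_true @ x
          h_ff = W1 @ x
          y_ff = W2 @ h_ff
          # Steady-state control that drives the controlled output onto the target:
          # y_target = y_ff + (W2 @ Q1 + Q2) @ u
          u = np.linalg.solve(W2 @ Q1 + Q2, y_target - y_ff)
          h = h_ff + Q1 @ u                                # controlled hidden activity
          # Local updates: controlled activity minus feedforward prediction, times presynaptic input.
          W1 += lr * np.outer(Q1 @ u, x)
          W2 += lr * np.outer(y_target - W2 @ h, h)
      print("error after: ", round(test_error(), 3))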

    Comment: 14 pages and 4 figures in the main manuscript; 49 pages and 15 figures in the supplementary materials
    Keywords Computer Science - Machine Learning ; 68T07 ; I.2.6
    Subject/category (code) 629
    Publication date 2021-06-15
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (life sciences selection)

  6. Book ; Online: Neural networks with late-phase weights

    von Oswald, Johannes / Kobayashi, Seijin / Meulemans, Alexander / Henning, Christian / Grewe, Benjamin F. / Sacramento, João

    2020  

    Abstract The largely successful method of training neural networks is to learn their weights using some variant of stochastic gradient descent (SGD). Here, we show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning. At the end of learning, we obtain back a single model by taking a spatial average in weight space. To avoid incurring increased computational costs, we investigate a family of low-dimensional late-phase weight models which interact multiplicatively with the remaining parameters. Our results show that augmenting standard models with late-phase weights improves generalization in established benchmarks such as CIFAR-10/100, ImageNet and enwik8. These findings are complemented with a theoretical analysis of a noisy quadratic problem which provides a simplified picture of the late phases of neural network learning.
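
    A structural sketch of the recipe described above, with illustrative assumptions (a linear model, per-unit gains as the low-dimensional late-phase weights, plain SGD): K copies of the gains are kept during the late phase, each step trains the shared weights together with one randomly chosen copy, and the copies are averaged in weight space at the end.

      import numpy as np

      rng = np.random.default_rng(0)
      n_in, n_out, K, lr = 5, 1, 4, 0.05
      W_true = rng.normal(size=(n_out, n_in))              # linear teacher (noisy targets below)
      W = rng.normal(scale=0.1, size=(n_out, n_in))        # shared base weights
      gains = [np.ones((n_out, 1)) for _ in range(K)]      # K late-phase copies

      def predict(W, g, x):
          return (g * W) @ x                               # multiplicative interaction

      for _ in range(4000):
          x = rng.normal(size=n_in)
          y = W_true @ x + 0.05 * rng.normal(size=n_out)
          k = rng.integers(K)                              # pick one late-phase member
          err = predict(W, gains[k], x) - y
          # SGD on 0.5 * err^2 w.r.t. the shared weights and the chosen gain copy.
          W        -= lr * gains[k] * np.outer(err, x)
          gains[k] -= lr * np.sum(err[:, None] * (W * x[None, :]), axis=1, keepdims=True)

      g_mean = np.mean(gains, axis=0)                      # spatial average in weight space
      X = rng.normal(size=(200, n_in))
      mse = np.mean((X @ (g_mean * W).T - X @ W_true.T) ** 2)
      print("test MSE of the averaged model:", round(float(mse), 4))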

    Comment: 25 pages, 6 figures
    Keywords Computer Science - Machine Learning ; Computer Science - Computer Vision and Pattern Recognition ; Statistics - Machine Learning
    Subject/category (code) 006
    Publication date 2020-07-25
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (life sciences selection)

  7. Book ; Online: A Theoretical Framework for Target Propagation

    Meulemans, Alexander / Carzaniga, Francesco S. / Suykens, Johan A. K. / Sacramento, João / Grewe, Benjamin F.

    2020  

    Abstract The success of deep learning, a brain-inspired form of AI, has sparked interest in understanding how the brain could similarly learn across multiple layers of neurons. However, the majority of biologically-plausible learning algorithms have not yet reached the performance of backpropagation (BP), nor are they built on strong theoretical foundations. Here, we analyze target propagation (TP), a popular but not yet fully understood alternative to BP, from the standpoint of mathematical optimization. Our theory shows that TP is closely related to Gauss-Newton optimization and thus substantially differs from BP. Furthermore, our analysis reveals a fundamental limitation of difference target propagation (DTP), a well-known variant of TP, in the realistic scenario of non-invertible neural networks. We provide a first solution to this problem through a novel reconstruction loss that improves feedback weight training, while simultaneously introducing architectural flexibility by allowing for direct feedback connections from the output to each hidden layer. Our theory is corroborated by experimental results that show significant improvements in performance and in the alignment of forward weight updates with loss gradients, compared to DTP.
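
    For orientation, a one-step sketch of plain difference target propagation, the baseline the abstract builds on; the paper's improved reconstruction loss and direct feedback connections are not reproduced here, and the layer sizes, tanh nonlinearity and step sizes are assumptions made for the sketch.

      import numpy as np

      rng = np.random.default_rng(0)
      sizes = [4, 6, 6, 2]
      Ws = [rng.normal(scale=0.5, size=(sizes[i + 1], sizes[i])) for i in range(3)]      # forward weights
      Vs = [rng.normal(scale=0.5, size=(sizes[i], sizes[i + 1])) for i in range(1, 3)]   # learned approximate inverses

      f = lambda W, h: np.tanh(W @ h)      # forward computation of one layer
      g = lambda V, h: np.tanh(V @ h)      # feedback (inverse) computation of one layer

      x = rng.normal(size=sizes[0])
      y_target = np.array([1.0, -1.0])

      h1 = f(Ws[0], x); h2 = f(Ws[1], h1); y = f(Ws[2], h2)     # forward pass
      t3 = y - 0.1 * (y - y_target)                             # output target: a small step toward lower loss

      # Difference target propagation: pull the target back through the learned inverses,
      # subtracting the reconstruction of the current activity.
      t2 = h2 + g(Vs[1], t3) - g(Vs[1], y)
      t1 = h1 + g(Vs[0], t2) - g(Vs[0], h2)

      # Layer-local forward updates: move each layer's output toward its target (tanh' = 1 - tanh^2).
      for W, inp, out, tgt in zip(Ws, [x, h1, h2], [h1, h2, y], [t1, t2, t3]):
          W += 0.1 * np.outer((tgt - out) * (1 - out ** 2), inp)

      # Feedback updates via a reconstruction loss on noise-corrupted activities,
      # so that g(V, f(W, h)) stays close to h.
      for V, W, h in zip(Vs, Ws[1:], [h1, h2]):
          hn = h + 0.1 * rng.normal(size=h.shape)
          rec = g(V, f(W, hn))
          V += 0.1 * np.outer((hn - rec) * (1 - rec ** 2), f(W, hn))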

    Comment: 13 pages and 4 figures in main manuscript; 41 pages and 8 figures in supplementary material
    Keywords Computer Science - Machine Learning ; Statistics - Machine Learning ; 68T07
    Subject/category (code) 006
    Publication date 2020-06-25
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (life sciences selection)

  8. Book ; Online: Continual Learning in Recurrent Neural Networks

    Ehret, Benjamin / Henning, Christian / Cervera, Maria R. / Meulemans, Alexander / von Oswald, Johannes / Grewe, Benjamin F.

    2020  

    Abstract While a diverse collection of continual learning (CL) methods has been proposed to prevent catastrophic forgetting, a thorough investigation of their effectiveness for processing sequential data with recurrent neural networks (RNNs) is lacking. Here, we provide the first comprehensive evaluation of established CL methods on a variety of sequential data benchmarks. Specifically, we shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs. In contrast to feedforward networks, RNNs iteratively reuse a shared set of weights and require working memory to process input samples. We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements, which lead to an increased need for stability at the cost of decreased plasticity for learning subsequent tasks. We additionally provide theoretical arguments supporting this interpretation by studying linear RNNs. Our study shows that established CL methods can be successfully ported to the recurrent case, and that a recent regularization approach based on hypernetworks outperforms weight-importance methods, thus emerging as a promising candidate for CL in RNNs. Overall, we provide insights on the differences between CL in feedforward networks and RNNs, while guiding towards effective solutions to tackle CL on sequential data.
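
    A minimal sketch of the weight-importance idea the study evaluates, using elastic weight consolidation as the example; the diagonal Fisher proxy, the lambda value and the stand-in gradient function are illustrative assumptions. The same penalty applies unchanged to the shared, reused weights of an RNN.

      import numpy as np

      def fisher_diagonal(grad_fn, params, data):
          """Diagonal Fisher proxy: mean squared per-sample gradient over task-A data."""
          return np.mean([grad_fn(params, x) ** 2 for x in data], axis=0)

      def ewc_penalty(params, params_task_a, fisher, lam=100.0):
          """Quadratic penalty added to the task-B loss."""
          return 0.5 * lam * np.sum(fisher * (params - params_task_a) ** 2)

      def ewc_penalty_grad(params, params_task_a, fisher, lam=100.0):
          """Gradient of the penalty, added to the task-B loss gradient at every step."""
          return lam * fisher * (params - params_task_a)

      # Call pattern with a dummy per-sample gradient function (illustrative only).
      params_a = np.array([1.0, -2.0, 0.5])
      grad_fn = lambda p, x: p - x
      F = fisher_diagonal(grad_fn, params_a, [np.zeros(3), np.ones(3)])
      print(ewc_penalty(params_a + 0.1, params_a, F))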

    Comment: Published at ICLR 2021
    Keywords Computer Science - Machine Learning ; Statistics - Machine Learning
    Subject/category (code) 006
    Publication date 2020-06-22
    Country of publication us
    Document type Book ; Online
    Data source BASE - Bielefeld Academic Search Engine (life sciences selection)
