LIVIVO - The Search Portal for Life Sciences

Search results

Results 1 - 10 of 169

  1. Article ; Online: Robust Losses for Learning Value Functions.

    Patterson, Andrew / Liao, Victor / White, Martha

    IEEE transactions on pattern analysis and machine intelligence

    2023  Volume 45, Issue 5, Page(s) 6157–6167

    Abstract Most value function learning algorithms in reinforcement learning are based on the mean squared (projected) Bellman error. However, squared errors are known to be sensitive to outliers, both skewing the solution of the objective and resulting in high-magnitude and high-variance gradients. To control these high-magnitude updates, typical strategies in RL involve clipping gradients, clipping rewards, rescaling rewards, or clipping errors. While these strategies appear to be related to robust losses, such as the Huber loss, they are built on semi-gradient update rules which do not minimize a known loss. In this work, we build on recent insights reformulating squared Bellman errors as a saddlepoint optimization problem and propose a saddlepoint reformulation for a Huber Bellman error and Absolute Bellman error. We start from a formalization of robust losses, then derive sound gradient-based approaches to minimize these losses in both the online off-policy prediction and control settings. We characterize the solutions of the robust losses, providing insight into the problem settings where the robust losses define notably better solutions than the mean squared Bellman error. Finally, we show that the resulting gradient-based algorithms are more stable, for both prediction and control, with less sensitivity to meta-parameters.
    Language English
    Publishing date 2023-04-03
    Publishing country United States
    Document type Journal Article
    ISSN 1939-3539
    ISSN (online) 1939-3539
    DOI 10.1109/TPAMI.2022.3213503
    Database MEDical Literature Analysis and Retrieval System OnLINE
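
    The robust losses discussed in this abstract swap the squared Bellman/TD error for a Huber-style penalty that grows only linearly for large errors. The snippet below is a rough illustration of that penalty applied to TD errors, not the authors' saddlepoint algorithm; the threshold `delta` and the toy error values are assumptions.

    ```python
    import numpy as np

    def huber(x, delta=1.0):
        """Huber penalty: quadratic for |x| <= delta, linear beyond."""
        quad = 0.5 * x ** 2
        lin = delta * (np.abs(x) - 0.5 * delta)
        return np.where(np.abs(x) <= delta, quad, lin)

    # Toy TD errors with one outlier; the squared loss is dominated by it,
    # the Huber loss is not.
    td_errors = np.array([0.1, -0.3, 0.2, 25.0])
    print("mean squared:", np.mean(0.5 * td_errors ** 2))
    print("mean Huber  :", np.mean(huber(td_errors)))
    ```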

  2. Article ; Online: Neighborhood risk and prenatal care utilization in Rhode Island, 2005-2014.

    Habtemariam, Helena / Schlichting, Lauren E / Kole-White, Martha B / Berger, Blythe / Vivier, Patrick

    Birth (Berkeley, Calif.)

    2024  

    Abstract Background: The importance of prenatal care is undeniable, as pregnant persons who receive on-time, adequate prenatal care have better maternal and infant health outcomes compared with those receiving late, less than adequate prenatal care. Previous studies assessing the relationship between neighborhood factors and maternal health outcomes have typically looked at singular neighborhood variables and their relationship with maternal health outcomes. In order to examine a greater number of place-based risk factors simultaneously, our analysis used a unique neighborhood risk index to assess the association between cumulative risk and prenatal care utilization, which no other studies have done.
    Methods: Data from Rhode Island Vital Statistics for births between 2005 and 2014 were used to assess the relationship between neighborhood risk and prenatal care utilization using two established indices. We assessed neighborhood risk with an index composed of eight socioeconomic block-group variables. A multivariate logistic regression model was used to examine the association between adequate use and neighborhood risk.
    Results: Individuals living in a high-risk neighborhood were less likely to have adequate or better prenatal care utilization according to both the APNCU Index (adjusted odds ratio [aOR] 0.91, 95% confidence interval [CI] 0.87-0.95) and the R-GINDEX (aOR 0.88, 95% CI 0.85-0.91) compared with those in low-risk neighborhoods.
    Conclusion: Understanding the impact of neighborhood-level factors on prenatal care use is a critical first step in ensuring that underserved neighborhoods are prioritized in interventions aimed at making access to prenatal care more equitable.
    Language English
    Publishing date 2024-01-11
    Publishing country United States
    Document type Journal Article
    ZDB-ID 604869-9
    ISSN 1523-536X ; 0730-7659
    ISSN (online) 1523-536X
    ISSN 0730-7659
    DOI 10.1111/birt.12810
    Database MEDical Literature Analysis and Retrieval System OnLINE
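
    The adjusted odds ratios reported above come from a multivariate logistic regression. The sketch below is a generic, hypothetical example of how such ratios are obtained by exponentiating fitted coefficients; the variable names and data are invented and have no connection to the study's Vital Statistics data.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical data: binary outcome (adequate prenatal care) and predictors.
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "adequate_care": rng.integers(0, 2, 500),
        "high_risk_neighborhood": rng.integers(0, 2, 500),
        "maternal_age": rng.normal(29, 6, 500),
    })

    X = sm.add_constant(df[["high_risk_neighborhood", "maternal_age"]])
    model = sm.Logit(df["adequate_care"], X).fit(disp=0)

    # exp(coefficient) = adjusted odds ratio; conf_int() gives the 95% CI bounds.
    print(np.exp(model.params))
    print(np.exp(model.conf_int()))
    ```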

  3. Article ; Online: Guidance for compassionate restraint of small children to prevent injuries with epinephrine autoinjectors.

    White, Martha V

    Allergy and asthma proceedings

    2017  Volume 39, Issue 2, Page(s) 161–165

    Abstract Background: If a child is not secured properly, injuries can occur with the use of pediatric epinephrine autoinjectors (EAI), and lacerations and embedded needles have been reported. Health care providers should ensure that instruction is provided to parents on how to hold a child during an injection with an EAI.
    Objective: To demonstrate the compassionate restraint of small children during an allergic emergency to ensure the safe use of an EAI.
    Methods: A patient was used to illustrate a compassionate restraint technique during a mock injection with an EAI.
    Results: One possible technique was illustrated here to reinforce the need for complete, yet compassionate restraint of small children during the use of an EAI. The exact position intended to be used by parents or caregivers will need to be practiced with their children to ensure a safe injection in the event of an allergic emergency.
    Conclusion: Reinforcement of proper EAI use and visual guidance that illustrate compassionate restraint can potentially prevent EAI-related injuries.
    MeSH term(s) Anaphylaxis/drug therapy ; Caregivers ; Child ; Child, Preschool ; Empathy ; Epinephrine/adverse effects ; Epinephrine/therapeutic use ; Female ; Humans ; Injections ; Male ; Parents ; Restraint, Physical/methods ; Self Administration ; Surveys and Questionnaires ; Wounds and Injuries/etiology ; Wounds and Injuries/prevention & control
    Chemical Substances Epinephrine (YKH834O4BH)
    Language English
    Publishing date 2017-11-29
    Publishing country United States
    Document type Journal Article
    ZDB-ID 1312445-6
    ISSN 1539-6304 ; 1088-5412
    ISSN (online) 1539-6304
    ISSN 1088-5412
    DOI 10.2500/aap.2018.39.4110
    Database MEDical Literature Analysis and Retrieval System OnLINE

  4. Book ; Online: Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments

    Liu, Vincent / Chandak, Yash / Thomas, Philip / White, Martha

    2023  

    Abstract In this work, we consider the off-policy policy evaluation problem for contextual bandits and finite horizon reinforcement learning in the nonstationary setting. Reusing old data is critical for policy evaluation, but existing estimators that reuse old data introduce large bias such that we cannot obtain a valid confidence interval. Inspired by a related field called survey sampling, we introduce a variant of the doubly robust (DR) estimator, called the regression-assisted DR estimator, that can incorporate the past data without introducing a large bias. The estimator unifies several existing off-policy policy evaluation methods and improves on them with the use of auxiliary information and a regression approach. We prove that the new estimator is asymptotically unbiased, and provide a consistent variance estimator to construct a large-sample confidence interval. Finally, we empirically show that the new estimator improves estimation for the current and future policy values, and provides a tight and valid interval estimation in several nonstationary recommendation environments.

    Comment: AISTATS 2023
    Keywords Computer Science - Machine Learning
    Subject code 310
    Publishing date 2023-02-22
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
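
    The estimator described here builds on the standard doubly robust (DR) estimator for contextual bandits. The sketch below shows only the plain DR estimate from logged data; the paper's regression-assisted variant and its survey-sampling machinery are not reproduced, and the function signature plus the reward model `q_hat` are assumed interfaces.

    ```python
    import numpy as np

    def doubly_robust_value(contexts, actions, rewards, behavior_probs,
                            target_probs, q_hat, n_actions):
        """Plain DR estimate of a target policy's value from logged bandit data.

        q_hat(x, a) -> estimated reward; target_probs[i, a] -> pi(a | x_i).
        """
        n = len(rewards)
        # Model-based term: expected estimated reward under the target policy.
        dm = np.array([
            sum(target_probs[i, a] * q_hat(contexts[i], a) for a in range(n_actions))
            for i in range(n)
        ])
        # Importance-weighted correction on the logged actions.
        rho = target_probs[np.arange(n), actions] / behavior_probs
        correction = rho * (rewards - np.array(
            [q_hat(contexts[i], actions[i]) for i in range(n)]))
        return np.mean(dm + correction)
    ```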

  5. Book ; Online: Empirical Design in Reinforcement Learning

    Patterson, Andrew / Neumann, Samuel / White, Martha / White, Adam

    2023  

    Abstract Empirical design in reinforcement learning is no small task. Running good experiments requires attention to detail and at times significant computational resources. While compute resources available per dollar have continued to grow rapidly, so has the scale of typical experiments in reinforcement learning. It is now common to benchmark agents with millions of parameters against dozens of tasks, each using the equivalent of 30 days of experience. The scale of these experiments often conflicts with the need for proper statistical evidence, especially when comparing algorithms. Recent studies have highlighted how popular algorithms are sensitive to hyper-parameter settings and implementation details, and that common empirical practice leads to weak statistical evidence (Machado et al., 2018; Henderson et al., 2018). Here we take this one step further. This manuscript represents both a call to action and a comprehensive resource for how to do good experiments in reinforcement learning. In particular, we cover: the statistical assumptions underlying common performance measures, how to properly characterize performance variation and stability, hypothesis testing, special considerations for comparing multiple agents, baseline and illustrative example construction, and how to deal with hyper-parameters and experimenter bias. Throughout we highlight common mistakes found in the literature and the statistical consequences of those in example experiments. The objective of this document is to provide answers on how we can use our unprecedented compute to do good science in reinforcement learning, as well as stay alert to potential pitfalls in our empirical design.

    Comment: In submission to JMLR
    Keywords Computer Science - Machine Learning ; Computer Science - Artificial Intelligence
    Subject code 006
    Publishing date 2023-04-03
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
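
    One theme of the manuscript is reporting proper statistical evidence across independent runs. The sketch below is illustrative only: a percentile bootstrap over per-seed returns is one option, not necessarily the authors' recommended protocol, and the numbers are invented.

    ```python
    import numpy as np

    def bootstrap_ci(per_seed_returns, n_boot=10_000, alpha=0.05, seed=0):
        """Percentile bootstrap CI for the mean return over independent seeds."""
        rng = np.random.default_rng(seed)
        scores = np.asarray(per_seed_returns)
        means = np.array([
            rng.choice(scores, size=len(scores), replace=True).mean()
            for _ in range(n_boot)
        ])
        return np.quantile(means, [alpha / 2, 1 - alpha / 2])

    # Hypothetical returns from 10 seeds of one agent on one task.
    print(bootstrap_ci([210.0, 195.5, 250.1, 188.0, 232.7,
                        205.3, 199.9, 241.2, 215.6, 190.4]))
    ```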

  6. Book ; Online: Scalable Real-Time Recurrent Learning Using Sparse Connections and Selective Learning

    Javed, Khurram / Shah, Haseeb / Sutton, Rich / White, Martha

    2023  

    Abstract State construction from sensory observations is an important component of a reinforcement learning agent. One solution for state construction is to use recurrent neural networks. Back-propagation through time (BPTT) and real-time recurrent learning (RTRL) are two popular gradient-based methods for recurrent learning. BPTT requires the complete sequence of observations before computing gradients and is unsuitable for online real-time updates. RTRL can do online updates but scales poorly to large networks. In this paper, we propose two constraints that make RTRL scalable. We show that by either decomposing the network into independent modules or learning the network incrementally, we can make RTRL scale linearly with the number of parameters. Unlike prior scalable gradient estimation algorithms, such as UORO and Truncated-BPTT, our algorithms do not add noise or bias to the gradient estimate. Instead, they trade off the functional capacity of the network to achieve scalable learning. We demonstrate the effectiveness of our approach over Truncated-BPTT on a benchmark inspired by animal learning and by doing policy evaluation for pre-trained Rainbow-DQN agents in the Arcade Learning Environment (ALE).

    Comment: Scalable recurrent learning, Online learning, RTRL, Cascade correlation networks, Agent-state construction
    Keywords Computer Science - Machine Learning ; Computer Science - Artificial Intelligence
    Subject code 006
    Publishing date 2023-01-20
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
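
    The linear-scaling claim can be made concrete with a quick parameter count (a back-of-the-envelope sketch, not the paper's algorithm): full RTRL tracks the sensitivity of every hidden unit to every recurrent weight, whereas splitting the layer into independent fixed-size modules keeps that table proportional to the number of parameters. The module size of 16 below is an arbitrary choice.

    ```python
    def rtrl_sensitivity_entries(n_units):
        """Full RTRL: n hidden units x n^2 recurrent weights."""
        return n_units * n_units ** 2

    def modular_rtrl_sensitivity_entries(n_units, module_size):
        """Independent modules: each tracks only its own units and weights."""
        n_modules = n_units // module_size
        return n_modules * (module_size * module_size ** 2)

    for n in (64, 128, 256):
        full = rtrl_sensitivity_entries(n)
        modular = modular_rtrl_sensitivity_entries(n, module_size=16)
        print(f"n={n}: full={full:,} entries, modular={modular:,} entries")
    ```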

  7. Book ; Online: Generalized Munchausen Reinforcement Learning using Tsallis KL Divergence

    Zhu, Lingwei / Chen, Zheng / Schlegel, Matthew / White, Martha

    2023  

    Abstract Many policy optimization approaches in reinforcement learning incorporate a Kullback-Leibler (KL) divergence to the previous policy, to prevent the policy from changing too quickly. This idea was initially proposed in a seminal paper on Conservative Policy Iteration, with approximations given by algorithms like TRPO and Munchausen Value Iteration (MVI). We continue this line of work by investigating a generalized KL divergence, called the Tsallis KL divergence, which uses the $q$-logarithm in the definition. The approach is a strict generalization, as $q = 1$ corresponds to the standard KL divergence; $q > 1$ provides a range of new options. We characterize the types of policies learned under the Tsallis KL, and motivate when $q > 1$ could be beneficial. To obtain a practical algorithm that incorporates Tsallis KL regularization, we extend MVI, which is one of the simplest approaches to incorporate KL regularization. We show that this generalized MVI($q$) obtains significant improvements over the standard MVI($q = 1$) across 35 Atari games.

    Comment: Accepted by NeurIPS 2023
    Keywords Computer Science - Machine Learning ; Computer Science - Artificial Intelligence
    Subject code 006
    Publishing date 2023-01-26
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
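
    The $q$-logarithm that generalizes the KL divergence here has a standard closed form. The sketch below uses one common convention for a Tsallis KL divergence; the abstract does not pin down the exact weighting or direction, so treat that choice as an assumption. Setting `q_param = 1` recovers the ordinary logarithm and hence the standard KL.

    ```python
    import numpy as np

    def log_q(x, q_param):
        """q-logarithm: ln_q(x) = (x**(1-q) - 1) / (1 - q); ln_1(x) = ln(x)."""
        if np.isclose(q_param, 1.0):
            return np.log(x)
        return (x ** (1.0 - q_param) - 1.0) / (1.0 - q_param)

    def tsallis_kl(p, m, q_param):
        """One common form of a Tsallis KL divergence between distributions p and m."""
        p, m = np.asarray(p, float), np.asarray(m, float)
        return float(np.sum(p * -log_q(m / p, q_param)))

    p = [0.7, 0.2, 0.1]
    m = [0.4, 0.4, 0.2]
    print(tsallis_kl(p, m, q_param=1.0))   # matches the standard KL(p || m)
    print(tsallis_kl(p, m, q_param=1.5))
    ```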

  8. Book ; Online: When is Offline Policy Selection Sample Efficient for Reinforcement Learning?

    Liu, Vincent / Nagarajan, Prabhat / Patterson, Andrew / White, Martha

    2023  

    Abstract Offline reinforcement learning algorithms often require careful hyperparameter tuning. Consequently, before deployment, we need to select amongst a set of candidate policies. As yet, however, there is little understanding about the fundamental limits of this offline policy selection (OPS) problem. In this work we aim to provide clarity on when sample efficient OPS is possible, primarily by connecting OPS to off-policy policy evaluation (OPE) and Bellman error (BE) estimation. We first show a hardness result: in the worst case, OPS is just as hard as OPE, which we establish by reducing OPE to OPS. As a result, no OPS method can be more sample efficient than OPE in the worst case. We then propose a BE method for OPS, called Identifiable BE Selection (IBES), that has a straightforward method for selecting its own hyperparameters. We highlight that using IBES for OPS generally has more requirements than OPE methods, but if satisfied, can be more sample efficient. We conclude with an empirical study comparing OPE and IBES, and by showing the difficulty of OPS on an offline Atari benchmark dataset.
    Keywords Computer Science - Machine Learning ; Computer Science - Artificial Intelligence
    Subject code 004
    Publishing date 2023-12-04
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
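
    A Bellman error (BE) score for offline policy selection can be illustrated with a naive empirical estimate. This is a toy sketch only: IBES itself is not reproduced, the `policy`/`q` interfaces are invented for illustration, and the naive squared TD residual below is biased under stochastic transitions, which is part of why dedicated estimators are studied.

    ```python
    import numpy as np

    def naive_bellman_error(q, policy, transitions, gamma=0.99):
        """Average squared empirical TD error of q under `policy` on logged data.

        transitions: iterable of (s, a, r, s_next). Note: this naive estimate is
        biased when transitions are stochastic (the double-sampling problem).
        """
        errs = []
        for s, a, r, s_next in transitions:
            target = r + gamma * sum(policy(s_next, a2) * q(s_next, a2)
                                     for a2 in policy.actions)
            errs.append((target - q(s, a)) ** 2)
        return float(np.mean(errs))

    # Offline policy selection sketch: pick the candidate with the lowest score.
    # best = min(candidates, key=lambda c: naive_bellman_error(c.q, c.policy, data))
    ```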

  9. Book ; Online: Unifying task specification in reinforcement learning

    White, Martha

    2016  

    Abstract Reinforcement learning tasks are typically specified as Markov decision processes. This formalism has been highly successful, though specifications often couple the dynamics of the environment and the learning objective. This lack of modularity can complicate generalization of the task specification, as well as obfuscate connections between different task settings, such as episodic and continuing. In this work, we introduce the RL task formalism, which provides a unification through simple constructs including a generalization to transition-based discounting. Through a series of examples, we demonstrate the generality and utility of this formalism. Finally, we extend standard learning constructs, including Bellman operators, and extend some seminal theoretical results, including approximation error bounds. Overall, we provide a well-understood and sound formalism on which to build theoretical results and simplify algorithm use and development.

    Comment: Published at the International Conference on Machine Learning, 2017. This version includes minor typo and error fixes
    Keywords Computer Science - Artificial Intelligence
    Subject code 006
    Publishing date 2016-09-07
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
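
    Transition-based discounting, the construct this formalism generalizes to, replaces a fixed discount with a function of the transition. The sketch below is a minimal tabular TD(0) illustration; the function names are illustrative, not the paper's notation. Returning a discount of 0 at terminal transitions recovers episodic tasks, while a constant recovers the continuing setting.

    ```python
    import numpy as np

    def td0_transition_discount(v, trajectory, gamma_fn, alpha=0.1):
        """Tabular TD(0) where the discount depends on the transition (s, a, s').

        gamma_fn(s, a, s_next) -> discount in [0, 1].
        """
        for s, a, r, s_next in trajectory:
            gamma = gamma_fn(s, a, s_next)
            td_error = r + gamma * v[s_next] - v[s]
            v[s] += alpha * td_error
        return v

    v = np.zeros(5)
    traj = [(0, 0, 1.0, 1), (1, 0, 0.0, 2), (2, 1, 2.0, 3), (3, 0, 0.0, 4)]
    v = td0_transition_discount(v, traj,
                                gamma_fn=lambda s, a, sn: 0.0 if sn == 4 else 0.9)
    print(v)
    ```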

  10. Book ; Online: Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning

    Daley, Brett / White, Martha / Amato, Christopher / Machado, Marlos C.

    2023  

    Abstract Off-policy learning from multistep returns is crucial for sample-efficient reinforcement learning, but counteracting off-policy bias without exacerbating variance is challenging. Classically, off-policy bias is corrected in a per-decision manner: past temporal-difference errors are re-weighted by the instantaneous Importance Sampling (IS) ratio after each action via eligibility traces. Many off-policy algorithms rely on this mechanism, along with differing protocols for cutting the IS ratios to combat the variance of the IS estimator. Unfortunately, once a trace has been fully cut, the effect cannot be reversed. This has led to the development of credit-assignment strategies that account for multiple past experiences at a time. These trajectory-aware methods have not been extensively analyzed, and their theoretical justification remains uncertain. In this paper, we propose a multistep operator that can express both per-decision and trajectory-aware methods. We prove convergence conditions for our operator in the tabular setting, establishing the first guarantees for several existing methods as well as many new ones. Finally, we introduce Recency-Bounded Importance Sampling (RBIS), which leverages trajectory awareness to perform robustly across $\lambda$-values in an off-policy control task.

    Comment: ICML 2023. 8 pages, 2 figures. arXiv admin note: text overlap with arXiv:2112.12281
    Keywords Computer Science - Machine Learning
    Publishing date 2023-01-26
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
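
    The per-decision correction described above scales an accumulating eligibility trace by the instantaneous importance-sampling ratio. Below is a bare-bones sketch of off-policy TD(lambda) prediction with linear features under that scheme; the trace-cutting protocols, the paper's trajectory-aware operator, and RBIS are not shown, and the `phi`/`pi`/`mu` interfaces are assumed inputs.

    ```python
    import numpy as np

    def offpolicy_td_lambda(w, transitions, phi, pi, mu, gamma=0.99,
                            lam=0.9, alpha=0.05):
        """Per-decision off-policy TD(lambda) with an accumulating trace.

        phi(s) -> feature vector; pi(a, s), mu(a, s) -> target/behaviour probs.
        The trace is re-weighted by the instantaneous IS ratio after each action.
        """
        z = np.zeros_like(w)
        for s, a, r, s_next in transitions:
            rho = pi(a, s) / mu(a, s)                  # instantaneous IS ratio
            delta = r + gamma * w @ phi(s_next) - w @ phi(s)
            z = rho * (gamma * lam * z + phi(s))       # per-decision trace update
            w = w + alpha * delta * z
        return w
    ```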
