LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-5 of 5

  1. Book ; Online: Gradient-Based Language Model Red Teaming

    Wichers, Nevan / Denison, Carson / Beirami, Ahmad

    2024  

    Abstract: Red teaming is a common strategy for identifying weaknesses in generative language models (LMs), where adversarial prompts are produced that trigger an LM to generate unsafe responses. Red teaming is instrumental for both model alignment and evaluation, but is labor-intensive and difficult to scale when done by humans. In this paper, we present Gradient-Based Red Teaming (GBRT), a red teaming method for automatically generating diverse prompts that are likely to cause an LM to output unsafe responses. GBRT is a form of prompt learning, trained by scoring an LM response with a safety classifier and then backpropagating through the frozen safety classifier and LM to update the prompt. To improve the coherence of input prompts, we introduce two variants that add a realism loss and fine-tune a pretrained model to generate the prompts instead of learning the prompts directly. Our experiments show that GBRT is more effective at finding prompts that trigger an LM to generate unsafe responses than a strong reinforcement learning-based red teaming approach, and succeeds even when the LM has been fine-tuned to produce safer outputs.

    Comment: EACL 2024 main conference
    Keywords: Computer Science - Computation and Language
    Subject code: 629
    Publishing date: 2024-01-29
    Publishing country: us
    Document type: Book ; Online
    Database: BASE - Bielefeld Academic Search Engine (life sciences selection)

  2. Article ; Online: Population structure across scales facilitates coexistence and spatial heterogeneity of antibiotic-resistant infections.

    Krieger, Madison S / Denison, Carson E / Anderson, Thayer L / Nowak, Martin A / Hill, Alison L

    PLoS computational biology

    2020  Volume 16, Issue 7, Page(s) e1008010

    Abstract: Antibiotic-resistant infections are a growing threat to human health, but basic features of the eco-evolutionary dynamics remain unexplained. Most prominently, there is no clear mechanism for the long-term coexistence of both drug-sensitive and resistant strains at intermediate levels, a ubiquitous pattern seen in surveillance data. Here we show that accounting for structured or spatially-heterogeneous host populations and variability in antibiotic consumption can lead to persistent coexistence over a wide range of treatment coverages, drug efficacies, costs of resistance, and mixing patterns. Moreover, this mechanism can explain other puzzling spatiotemporal features of drug-resistance epidemiology that have received less attention, such as large differences in the prevalence of resistance between geographical regions with similar antibiotic consumption or that neighbor one another. We find that the same amount of antibiotic use can lead to very different levels of resistance depending on how treatment is distributed in a transmission network. We also identify parameter regimes in which population structure alone cannot support coexistence, suggesting the need for other mechanisms to explain the epidemiology of antibiotic resistance. Our analysis identifies key features of host population structure that can be used to assess resistance risk and highlights the need to include spatial or demographic heterogeneity in models to guide resistance management.
    MeSH term(s): Algorithms ; Anti-Bacterial Agents/pharmacology ; Drug Resistance, Bacterial ; Evolution, Molecular ; Genetics, Population ; Geography ; Humans ; Models, Theoretical ; Prevalence ; Regression Analysis ; Risk ; Spain/epidemiology ; Streptococcal Infections/epidemiology ; Streptococcal Infections/microbiology ; Streptococcus pneumoniae/drug effects ; Streptococcus pneumoniae/genetics
    Chemical Substances: Anti-Bacterial Agents
    Language: English
    Publishing date: 2020-07-06
    Publishing country: United States
    Document type: Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't
    ZDB-ID: 2193340-6
    ISSN: 1553-734X
    ISSN (online): 1553-7358
    DOI: 10.1371/journal.pcbi.1008010
    Database: MEDLINE (Medical Literature Analysis and Retrieval System Online)

  3. Book ; Online: How to DP-fy ML

    Ponomareva, Natalia / Hazimeh, Hussein / Kurakin, Alex / Xu, Zheng / Denison, Carson / McMahan, H. Brendan / Vassilvitskii, Sergei / Chien, Steve / Thakurta, Abhradeep

    A Practical Guide to Machine Learning with Differential Privacy

    2023  

    Abstract: ML models are ubiquitous in real world applications and are a constant focus of research. At the same time, the community has started to realize the importance of protecting the privacy of ML training data. Differential Privacy (DP) has become a gold standard for making formal statements about data anonymization. However, while some adoption of DP has happened in industry, attempts to apply DP to real world complex ML models are still few and far between. The adoption of DP is hindered by limited practical guidance of what DP protection entails, what privacy guarantees to aim for, and the difficulty of achieving good privacy-utility-computation trade-offs for ML models. Tricks for tuning and maximizing performance are scattered among papers or stored in the heads of practitioners. Furthermore, the literature seems to present conflicting evidence on how and whether to apply architectural adjustments and which components are "safe" to use with DP. This work is a self-contained guide that gives an in-depth overview of the field of DP ML and presents information about achieving the best possible DP ML model with rigorous privacy guarantees. Our target audience is both researchers and practitioners. Researchers interested in DP for ML will benefit from a clear overview of current advances and areas for improvement. We include theory-focused sections that highlight important topics such as privacy accounting and its assumptions, and convergence. For a practitioner, we provide a background in DP theory and a clear step-by-step guide for choosing an appropriate privacy definition and approach, implementing DP training, potentially updating the model architecture, and tuning hyperparameters. For both researchers and practitioners, consistently and fully reporting privacy guarantees is critical, and so we propose a set of specific best practices for stating guarantees.
    Keywords: Computer Science - Machine Learning ; Computer Science - Cryptography and Security ; Statistics - Machine Learning
    Subject code: 330
    Publishing date: 2023-03-01
    Publishing country: us
    Document type: Book ; Online
    Database: BASE - Bielefeld Academic Search Engine (life sciences selection)

  4. Book ; Online: Private Ad Modeling with DP-SGD

    Denison, Carson / Ghazi, Badih / Kamath, Pritish / Kumar, Ravi / Manurangsi, Pasin / Narra, Krishna Giri / Sinha, Amer / Varadarajan, Avinash V / Zhang, Chiyuan

    2022  

    Abstract: A well-known algorithm in privacy-preserving ML is differentially private stochastic gradient descent (DP-SGD). While this algorithm has been evaluated on text and image data, it has not been previously applied to ads data, which are notorious for their high class imbalance and sparse gradient updates. In this work we apply DP-SGD to several ad modeling tasks including predicting click-through rates, conversion rates, and number of conversion events, and evaluate their privacy-utility trade-off on real-world datasets. Our work is the first to empirically demonstrate that DP-SGD can provide both privacy and utility for ad modeling tasks.

    Comment: AdKDD 2023, 8 pages, 5 figures
    Keywords: Computer Science - Machine Learning ; Computer Science - Cryptography and Security
    Publishing date: 2022-11-21
    Publishing country: us
    Document type: Book ; Online
    Database: BASE - Bielefeld Academic Search Engine (life sciences selection)

  5. Book ; Online: Sleeper Agents

    Hubinger, Evan / Denison, Carson / Mu, Jesse / Lambert, Mike / Tong, Meg / MacDiarmid, Monte / Lanham, Tamera / Ziegler, Daniel M. / Maxwell, Tim / Cheng, Newton / Jermyn, Adam / Askell, Amanda / Radhakrishnan, Ansh / Anil, Cem / Duvenaud, David / Ganguli, Deep / Barez, Fazl / Clark, Jack / Ndousse, Kamal / Sachan, Kshitij / Sellitto, Michael / Sharma, Mrinank / DasSarma, Nova / Grosse, Roger / Kravec, Shauna / Bai, Yuntao / Witten, Zachary / Favaro, Marina / Brauner, Jan / Karnofsky, Holden / Christiano, Paul / Bowman, Samuel R. / Graham, Logan / Kaplan, Jared / Mindermann, Sören / Greenblatt, Ryan / Shlegeris, Buck / Schiefer, Nicholas / Perez, Ethan

    Training Deceptive LLMs that Persist Through Safety Training

    2024  

    Abstract: Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques? To study this question, we construct proof-of-concept examples of deceptive behavior in large language models (LLMs). For example, we train models that write secure code when the prompt states that the year is 2023, but insert exploitable code when the stated year is 2024. We find that such backdoor behavior can be made persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training (eliciting unsafe behavior and then training to remove it). The backdoor behavior is most persistent in the largest models and in models trained to produce chain-of-thought reasoning about deceiving the training process, with the persistence remaining even when the chain-of-thought is distilled away. Furthermore, rather than removing backdoors, we find that adversarial training can teach models to better recognize their backdoor triggers, effectively hiding the unsafe behavior. Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety.

    Comment: updated to add missing acknowledgements
    Keywords: Computer Science - Cryptography and Security ; Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Computer Science - Machine Learning ; Computer Science - Software Engineering
    Subject code: 501
    Publishing date: 2024-01-10
    Publishing country: us
    Document type: Book ; Online
    Database: BASE - Bielefeld Academic Search Engine (life sciences selection)
