LIVIVO - Search results

Result 1 - 5 of total 5

Book ; Online: Gradient-Based Language Model Red Teaming

Wichers, Nevan / Denison, Carson / Beirami, Ahmad

2024

Abstract	Red teaming is a common strategy for identifying weaknesses in generative language models (LMs), where adversarial prompts are produced that trigger an LM to generate unsafe responses. Red teaming is instrumental for both model alignment and evaluation, but is labor-intensive and difficult to scale when done by humans. In this paper, we present Gradient-Based Red Teaming (GBRT), a red teaming method for automatically generating diverse prompts that are likely to cause an LM to output unsafe responses. GBRT is a form of prompt learning, trained by scoring an LM response with a safety classifier and then backpropagating through the frozen safety classifier and LM to update the prompt. To improve the coherence of input prompts, we introduce two variants that add a realism loss and fine-tune a pretrained model to generate the prompts instead of learning the prompts directly. Our experiments show that GBRT is more effective at finding prompts that trigger an LM to generate unsafe responses than a strong reinforcement learning-based red teaming approach, and succeeds even when the LM has been fine-tuned to produce safer outputs.
Comment	EACL 2024 main conference
Keywords	Computer Science - Computation and Language
Subject code	629
Publishing date	2024-01-29
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Article ; Online: Population structure across scales facilitates coexistence and spatial heterogeneity of antibiotic-resistant infections.

Krieger, Madison S / Denison, Carson E / Anderson, Thayer L / Nowak, Martin A / Hill, Alison L

PLoS computational biology

2020 Volume 16, Issue 7, Page(s) e1008010

Abstract	Antibiotic-resistant infections are a growing threat to human health, but basic features of the eco-evolutionary dynamics remain unexplained. Most prominently, there is no clear mechanism for the long-term coexistence of both drug-sensitive and resistant strains at intermediate levels, a ubiquitous pattern seen in surveillance data. Here we show that accounting for structured or spatially-heterogeneous host populations and variability in antibiotic consumption can lead to persistent coexistence over a wide range of treatment coverages, drug efficacies, costs of resistance, and mixing patterns. Moreover, this mechanism can explain other puzzling spatiotemporal features of drug-resistance epidemiology that have received less attention, such as large differences in the prevalence of resistance between geographical regions with similar antibiotic consumption or that neighbor one another. We find that the same amount of antibiotic use can lead to very different levels of resistance depending on how treatment is distributed in a transmission network. We also identify parameter regimes in which population structure alone cannot support coexistence, suggesting the need for other mechanisms to explain the epidemiology of antibiotic resistance. Our analysis identifies key features of host population structure that can be used to assess resistance risk and highlights the need to include spatial or demographic heterogeneity in models to guide resistance management.
MeSH term(s)	Algorithms ; Anti-Bacterial Agents/pharmacology ; Drug Resistance, Bacterial ; Evolution, Molecular ; Genetics, Population ; Geography ; Humans ; Models, Theoretical ; Prevalence ; Regression Analysis ; Risk ; Spain/epidemiology ; Streptococcal Infections/epidemiology ; Streptococcal Infections/microbiology ; Streptococcus pneumoniae/drug effects ; Streptococcus pneumoniae/genetics
Chemical Substances	Anti-Bacterial Agents
Language	English
Publishing date	2020-07-06
Publishing country	United States
Document type	Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't
ZDB-ID	2193340-6
ISSN	1553-734X
ISSN (online)	1553-7358
DOI	10.1371/journal.pcbi.1008010
Database	MEDical Literature Analysis and Retrieval System OnLINE

Book ; Online: How to DP-fy ML

Ponomareva, Natalia / Hazimeh, Hussein / Kurakin, Alex / Xu, Zheng / Denison, Carson / McMahan, H. Brendan / Vassilvitskii, Sergei / Chien, Steve / Thakurta, Abhradeep

A Practical Guide to Machine Learning with Differential Privacy

2023

Abstract	ML models are ubiquitous in real world applications and are a constant focus of research. At the same time, the community has started to realize the importance of protecting the privacy of ML training data. Differential Privacy (DP) has become a gold standard for making formal statements about data anonymization. However, while some adoption of DP has happened in industry, attempts to apply DP to real world complex ML models are still few and far between. The adoption of DP is hindered by limited practical guidance of what DP protection entails, what privacy guarantees to aim for, and the difficulty of achieving good privacy-utility-computation trade-offs for ML models. Tricks for tuning and maximizing performance are scattered among papers or stored in the heads of practitioners. Furthermore, the literature seems to present conflicting evidence on how and whether to apply architectural adjustments and which components are "safe" to use with DP. This work is a self-contained guide that gives an in-depth overview of the field of DP ML and presents information about achieving the best possible DP ML model with rigorous privacy guarantees. Our target audience is both researchers and practitioners. Researchers interested in DP for ML will benefit from a clear overview of current advances and areas for improvement. We include theory-focused sections that highlight important topics such as privacy accounting and its assumptions, and convergence. For a practitioner, we provide a background in DP theory and a clear step-by-step guide for choosing an appropriate privacy definition and approach, implementing DP training, potentially updating the model architecture, and tuning hyperparameters. For both researchers and practitioners, consistently and fully reporting privacy guarantees is critical, and so we propose a set of specific best practices for stating guarantees.
Keywords	Computer Science - Machine Learning ; Computer Science - Cryptography and Security ; Statistics - Machine Learning
Subject code	330
Publishing date	2023-03-01
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Book ; Online: Private Ad Modeling with DP-SGD

Denison, Carson / Ghazi, Badih / Kamath, Pritish / Kumar, Ravi / Manurangsi, Pasin / Narra, Krishna Giri / Sinha, Amer / Varadarajan, Avinash V / Zhang, Chiyuan

2022

Abstract	A well-known algorithm in privacy-preserving ML is differentially private stochastic gradient descent (DP-SGD). While this algorithm has been evaluated on text and image data, it has not been previously applied to ads data, which are notorious for their high class imbalance and sparse gradient updates. In this work we apply DP-SGD to several ad modeling tasks including predicting click-through rates, conversion rates, and number of conversion events, and evaluate their privacy-utility trade-off on real-world datasets. Our work is the first to empirically demonstrate that DP-SGD can provide both privacy and utility for ad modeling tasks.
Comment	AdKDD 2023, 8 pages, 5 figures
Keywords	Computer Science - Machine Learning ; Computer Science - Cryptography and Security
Publishing date	2022-11-21
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Book ; Online: Sleeper Agents

Hubinger, Evan / Denison, Carson / Mu, Jesse / Lambert, Mike / Tong, Meg / MacDiarmid, Monte / Lanham, Tamera / Ziegler, Daniel M. / Maxwell, Tim / Cheng, Newton / Jermyn, Adam / Askell, Amanda / Radhakrishnan, Ansh / Anil, Cem / Duvenaud, David / Ganguli, Deep / Barez, Fazl / Clark, Jack / Ndousse, Kamal /

Training Deceptive LLMs that Persist Through Safety Training

2024

Abstract	Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques? To study this question, we construct proof-of-concept examples of deceptive behavior in large language models (LLMs). For example, we train models that write secure code when the prompt states that the year is 2023, but insert exploitable code when the stated year is 2024. We find that such backdoor behavior can be made persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training (eliciting unsafe behavior and then training to remove it). The backdoor behavior is most persistent in the largest models and in models trained to produce chain-of-thought reasoning about deceiving the training process, with the persistence remaining even when the chain-of-thought is distilled away. Furthermore, rather than removing backdoors, we find that adversarial training can teach models to better recognize their backdoor triggers, effectively hiding the unsafe behavior. Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety.
Comment	updated to add missing acknowledgements
Keywords	Computer Science - Cryptography and Security ; Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Computer Science - Machine Learning ; Computer Science - Software Engineering
Subject code	501
Publishing date	2024-01-10
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)
