LIVIVO - The Search Portal for Life Sciences


Search results

Results 1-10 of 28

  1. Article ; Online: Generative artificial intelligence for de novo protein design.

    Winnifrith, Adam / Outeiral, Carlos / Hie, Brian L

    Current opinion in structural biology

    2024  Volume 86, Page(s) 102794

    Abstract Engineering new molecules with desirable functions and properties has the potential to extend our ability to engineer proteins beyond what nature has so far evolved. Advances in the so-called 'de novo' design problem have recently been brought forward by developments in artificial intelligence. Generative architectures, such as language models and diffusion processes, seem adept at generating novel, yet realistic proteins that display desirable properties and perform specified functions. State-of-the-art design protocols now achieve experimental success rates nearing 20%, thus widening the access to de novo designed proteins. Despite extensive progress, there are clear field-wide challenges, for example, in determining the best in silico metrics to prioritise designs for experimental testing, and in designing proteins that can undergo large conformational changes or be regulated by post-translational modifications. With an increase in the number of models being developed, this review provides a framework to understand how these tools fit into the overall process of de novo protein design. Throughout, we highlight the power of incorporating biochemical knowledge to improve performance and interpretability.
    Language English
    Publishing date 2024-04-24
    Publishing country England
    Document type Journal Article ; Review
    ZDB-ID 1068353-7
    ISSN 1879-033X ; 0959-440X
    ISSN (online) 1879-033X
    DOI 10.1016/j.sbi.2024.102794
    Database MEDical Literature Analysis and Retrieval System OnLINE


  2. Book ; Online: Data for "Learning the language of viral evolution and escape"

    Brian Hie

    2020  

    Abstract Training data from:
      - Influenza A HA protein sequences from the NIAID Influenza Research Database (IRD) (http://www.fludb.org)
      - HIV-1 Env protein sequences from the Los Alamos National Laboratory (LANL) HIV database (https://www.hiv.lanl.gov)
      - Coronaviridae spike protein sequences from the Virus Pathogen Resource (ViPR) database (https://www.viprbrc.org/brc/home.spg?decorator=corona)
      - SARS-CoV-2 Spike protein sequences from NCBI Virus (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/)
      - SARS-CoV-2 Spike and other Betacoronavirus spike protein sequences from GISAID (https://www.gisaid.org/)
    Datasets for fitness and escape validation:
      - Fitness single-residue DMS of HA H1 WSN33 from Doud and Bloom (2016)
      - Fitness combinatorial DMS of antigenic site B in six HA H3 strains from Wu et al. (2020)
      - Fitness single-residue DMS of Env BF520 and BG505 from Haddox et al. (2018)
      - ACE2 binding affinity combinatorial DMS of Spike from Starr et al. (2020)
      - Escape single-residue DMS of HA H1 WSN33 from Doud et al. (2018)
      - Escape single-residue DMS of HA H3 Perth09 from Lee et al. (2019)
      - Escape single-residue DMS of Env BG505 from Dingens et al. (2019)
      - Escape mutations of Spike from Baum et al. (2020)
      - Escape single-residue DMS of Spike from Greaney et al. (2020)
    Keywords covid19
    Publishing date 2020-09-14
    Publishing country eu
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  3. Article ; Online: Adaptive machine learning for protein engineering.

    Hie, Brian L / Yang, Kevin K

    Current opinion in structural biology

    2021  Volume 72, Page(s) 145–152

    Abstract Machine-learning models that learn from data to predict how protein sequence encodes function are emerging as a useful protein engineering tool. However, when using these models to suggest new protein designs, one must deal with the vast combinatorial complexity of protein sequences. Here, we review how to use a sequence-to-function machine-learning surrogate model to select sequences for experimental measurement. First, we discuss how to select sequences through a single round of machine-learning optimization. Then, we discuss sequential optimization, where the goal is to discover optimized sequences and improve the model across multiple rounds of training, optimization, and experimental measurement.
    MeSH term(s) Amino Acid Sequence ; Machine Learning ; Protein Engineering ; Proteins
    Chemical Substances Proteins
    Language English
    Publishing date 2021-12-09
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't ; Review
    ZDB-ID 1068353-7
    ISSN 1879-033X ; 0959-440X
    ISSN (online) 1879-033X
    DOI 10.1016/j.sbi.2021.11.002
    Database MEDical Literature Analysis and Retrieval System OnLINE


  4. Article ; Online: Large language models for science and medicine.

    Telenti, Amalio / Auli, Michael / Hie, Brian L / Maher, Cyrus / Saria, Suchi / Ioannidis, John P A

    European journal of clinical investigation

    2024  Page(s) e14183

    Abstract Large language models (LLMs) are a type of machine learning model that learn statistical patterns over text, such as predicting the next words in a sequence of text. Both general purpose and task-specific LLMs have demonstrated potential across diverse applications. Science and medicine have many data types that are highly suitable for LLMs, such as scientific texts (publications, patents and textbooks), electronic medical records, large databases of DNA and protein sequences and chemical compounds. Carefully validated systems that can understand and reason across all these modalities may maximize benefits. Despite the inevitable limitations and caveats of any new technology and some uncertainties specific to LLMs, LLMs have the potential to be transformative in science and medicine.
    Language English
    Publishing date 2024-02-21
    Publishing country England
    Document type Journal Article ; Review
    ZDB-ID 186196-7
    ISSN 1365-2362 ; 0014-2972 ; 0960-135X
    ISSN (online) 1365-2362
    DOI 10.1111/eci.14183
    Database MEDical Literature Analysis and Retrieval System OnLINE


  5. Article ; Online: Evolutionary velocity with protein language models predicts evolutionary dynamics of diverse proteins.

    Hie, Brian L / Yang, Kevin K / Kim, Peter S

    Cell systems

    2022  Volume 13, Issue 4, Page(s) 274–285.e6

    Abstract The degree to which evolution is predictable is a fundamental question in biology. Previous attempts to predict the evolution of protein sequences have been limited to specific proteins and to small changes, such as single-residue mutations. Here, we demonstrate that by using a protein language model to predict the local evolution within protein families, we recover a dynamic "vector field" of protein evolution that we call evolutionary velocity (evo-velocity). Evo-velocity generalizes to evolution over vastly different timescales, from viral proteins evolving over years to eukaryotic proteins evolving over geologic eons, and can predict the evolutionary dynamics of proteins that were not used to develop the original model. Evo-velocity also yields new evolutionary insights by predicting strategies of viral-host immune escape, resolving conflicting theories on the evolution of serpins, and revealing a key role of horizontal gene transfer in the evolution of eukaryotic glycolysis.
    MeSH term(s) Amino Acid Sequence ; Evolution, Molecular ; Language ; Mutation/genetics ; Proteins/genetics
    Chemical Substances Proteins
    Language English
    Publishing date 2022-02-03
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 2854138-8
    ISSN 2405-4720 ; 2405-4712
    ISSN (online) 2405-4720
    DOI 10.1016/j.cels.2022.01.003
    Database MEDical Literature Analysis and Retrieval System OnLINE


  6. Book ; Online: Generative artificial intelligence for de novo protein design

    Winnifrith, Adam / Outeiral, Carlos / Hie, Brian

    2023  

    Abstract Engineering new molecules with desirable functions and properties has the potential to extend our ability to engineer proteins beyond what nature has so far evolved. Advances in the so-called "de novo" design problem have recently been brought forward by developments in artificial intelligence. Generative architectures, such as language models and diffusion processes, seem adept at generating novel, yet realistic proteins that display desirable properties and perform specified functions. State-of-the-art design protocols now achieve experimental success rates nearing 20%, thus widening the access to de novo designed proteins. Despite extensive progress, there are clear field-wide challenges, for example in determining the best in silico metrics to prioritise designs for experimental testing, and in designing proteins that can undergo large conformational changes or be regulated by post-translational modifications and other cellular processes. With an increase in the number of models being developed, this review provides a framework to understand how these tools fit into the overall process of de novo protein design. Throughout, we highlight the power of incorporating biochemical knowledge to improve performance and interpretability.

    Comment: 32 pages, 5 figures, 1 table
    Keywords Computer Science - Machine Learning ; Computer Science - Artificial Intelligence ; Quantitative Biology - Biomolecules
    Subject code 006
    Publishing date 2023-10-14
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  7. Article ; Online: Machine Learning for Protein Engineering.

    Johnston, Kadina E / Fannjiang, Clara / Wittmann, Bruce J / Hie, Brian L / Yang, Kevin K / Wu, Zachary

    ArXiv

    2023  

    Abstract Directed evolution of proteins has been the most effective method for protein engineering. However, a new paradigm is emerging, fusing the library generation and screening approaches of traditional directed evolution with computation through the training of machine learning models on protein sequence fitness data. This chapter highlights successful applications of machine learning to protein engineering and directed evolution, organized by the improvements that have been made with respect to each step of the directed evolution cycle. Additionally, we provide an outlook for the future based on the current direction of the field, namely in the development of calibrated models and in incorporating other modalities, such as protein structure.
    Language English
    Publishing date 2023-05-26
    Publishing country United States
    Document type Preprint
    ISSN (online) 2331-8422
    Database MEDical Literature Analysis and Retrieval System OnLINE


  8. Article: Inverse folding of protein complexes with a structure-informed language model enables unsupervised antibody evolution.

    Shanker, Varun R / Bruun, Theodora U J / Hie, Brian L / Kim, Peter S

    bioRxiv : the preprint server for biology

    2023  

    Abstract Large language models trained on sequence information alone are capable of learning high level principles of protein design. However, beyond sequence, the three-dimensional structures of proteins determine their specific function, activity, and evolvability. Here we show that a general protein language model augmented with protein structure backbone coordinates and trained on the inverse folding problem can guide evolution for diverse proteins without needing to explicitly model individual functional tasks. We demonstrate inverse folding to be an effective unsupervised, structure-based sequence optimization strategy that also generalizes to multimeric complexes by implicitly learning features of binding and amino acid epistasis. Using this approach, we screened ~30 variants of two therapeutic clinical antibodies used to treat SARS-CoV-2 infection and achieved up to 26-fold improvement in neutralization and 37-fold improvement in affinity against antibody-escaped viral variants-of-concern BQ.1.1 and XBB.1.5, respectively. In addition to substantial overall improvements in protein function, we find inverse folding performs with leading experimental success rates among other reported machine learning-guided directed evolution methods, without requiring any task-specific training data.
    Language English
    Publishing date 2023-12-21
    Publishing country United States
    Document type Preprint
    DOI 10.1101/2023.12.19.572475
    Database MEDical Literature Analysis and Retrieval System OnLINE


  9. Article ; Online: Leveraging Uncertainty in Machine Learning Accelerates Biological Discovery and Design.

    Hie, Brian / Bryson, Bryan D / Berger, Bonnie

    Cell systems

    2020  Volume 11, Issue 5, Page(s) 461–477.e9

    Abstract Machine learning that generates biological hypotheses has transformative potential, but most learning algorithms are susceptible to pathological failure when exploring regimes beyond the training data distribution. A solution to address this issue is to quantify prediction uncertainty so that algorithms can gracefully handle novel phenomena that confound standard methods. Here, we demonstrate the broad utility of robust uncertainty prediction in biological discovery. By leveraging Gaussian process-based uncertainty prediction on modern pre-trained features, we train a model on just 72 compounds to make predictions over a 10,833-compound library, identifying and experimentally validating compounds with nanomolar affinity for diverse kinases and whole-cell growth inhibition of Mycobacterium tuberculosis. Uncertainty facilitates a tight iterative loop between computation and experimentation and generalizes across biological domains as diverse as protein engineering and single-cell transcriptomics. More broadly, our work demonstrates that uncertainty should play a key role in the increasing adoption of machine learning algorithms into the experimental lifecycle.
    MeSH term(s) Algorithms ; Computational Biology/methods ; Forecasting/methods ; Machine Learning/trends ; Normal Distribution ; Uncertainty
    Language English
    Publishing date 2020-10-15
    Publishing country United States
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't ; Research Support, U.S. Gov't, Non-P.H.S.
    ZDB-ID 2854138-8
    ISSN 2405-4720 ; 2405-4712
    ISSN (online) 2405-4720
    DOI 10.1016/j.cels.2020.09.007
    Database MEDical Literature Analysis and Retrieval System OnLINE


  10. Article ; Online: Schema: metric learning enables interpretable synthesis of heterogeneous single-cell modalities.

    Singh, Rohit / Hie, Brian L / Narayan, Ashwin / Berger, Bonnie

    Genome biology

    2021  Volume 22, Issue 1, Page(s) 131

    Abstract A complete understanding of biological processes requires synthesizing information across heterogeneous modalities, such as age, disease status, or gene expression. Technological advances in single-cell profiling have enabled researchers to assay multiple modalities simultaneously. We present Schema, which uses a principled metric learning strategy that identifies informative features in a modality to synthesize disparate modalities into a single coherent interpretation. We use Schema to infer cell types by integrating gene expression and chromatin accessibility data; demonstrate informative data visualizations that synthesize multiple modalities; perform differential gene expression analysis in the context of spatial variability; and estimate evolutionary pressure on peptide sequences.
    MeSH term(s) Chromatin/genetics ; Chromatin/metabolism ; Chromatin Assembly and Disassembly ; Computational Biology/methods ; Gene Expression Profiling/methods ; Gene Expression Regulation ; Machine Learning ; Organ Specificity/genetics ; Single-Cell Analysis/methods ; Transcriptome
    Chemical Substances Chromatin
    Language English
    Publishing date 2021-05-03
    Publishing country England
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, U.S. Gov't, Non-P.H.S.
    ZDB-ID 2040529-7
    ISSN (online) 1474-760X
    DOI 10.1186/s13059-021-02313-2
    Database MEDical Literature Analysis and Retrieval System OnLINE
