LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 129

  1. Book ; Online: Training Data Attribution for Diffusion Models

    Dai, Zheng / Gifford, David K

    2023  

    Abstract Diffusion models have become increasingly popular for synthesizing high-quality samples based on training datasets. However, given the oftentimes enormous sizes of the training datasets, it is difficult to assess how training data impact the samples produced by a trained diffusion model. The difficulty of relating diffusion model inputs and outputs poses significant challenges to model explainability and training data attribution. Here we propose a novel solution that reveals how training data influence the output of diffusion models through the use of ensembles. In our approach individual models in an encoded ensemble are trained on carefully engineered splits of the overall training data to permit the identification of influential training examples. The resulting model ensembles enable efficient ablation of training data influence, allowing us to assess the impact of training data on model outputs. We demonstrate the viability of these ensembles as generative models and the validity of our approach to assessing influence.

    Comment: 14 pages, 6 figures
    Keywords Statistics - Machine Learning ; Computer Science - Artificial Intelligence ; Computer Science - Machine Learning
    Subject code 006
    Publishing date 2023-06-03
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
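
    The split-and-ablate idea in this abstract can be illustrated with a toy sketch. It assumes a simple binary encoding (not necessarily the authors' scheme): training data are divided into groups, model i is trained only on groups whose code has bit i set, and a group's influence is estimated by comparing the full-ensemble output with the output of only those models that never saw the group. The `make_codes`, `train_ensemble` and `influence` helpers are hypothetical, and each "model" is a scalar mean standing in for a diffusion model.

```python
from statistics import mean

def make_codes(num_groups, num_models):
    """Give each data group a distinct binary code of length num_models (hypothetical scheme).

    Codes run from 1 to num_groups, so none is all zeros (every group is used by some model)
    and, given the size check, none is all ones (every group is left out of some model).
    """
    if num_groups > 2 ** num_models - 2:
        raise ValueError("too many groups for this many models")
    return {g: [((g + 1) >> i) & 1 for i in range(num_models)] for g in range(num_groups)}

def train_ensemble(groups, codes, num_models):
    """Train one stand-in 'model' per code bit; a model here is just the mean of its training values."""
    models = []
    for i in range(num_models):
        data = [x for g, xs in groups.items() if codes[g][i] for x in xs]
        if not data:
            raise ValueError(f"model {i} has no training data")
        models.append(mean(data))
    return models

def ablated_output(models, codes, group):
    """Ensemble output from only the models that never saw `group` (no retraining needed)."""
    return mean(m for i, m in enumerate(models) if not codes[group][i])

def influence(models, codes, group):
    """Full-ensemble output minus ablated output: a crude influence score for `group`."""
    return mean(models) - ablated_output(models, codes, group)

groups = {g: [1.0] for g in range(6)}
groups[2] = [10.0]  # one unusual group
codes = make_codes(len(groups), 3)
models = train_ensemble(groups, codes, 3)
print(influence(models, codes, 2), influence(models, codes, 0))  # prints 2.0 0.5
```

    Because every group is left out of at least one model, any single group can be ablated by re-averaging existing models rather than retraining, which is the kind of efficient ablation the abstract describes.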

  2. Article ; Online: Discovering differential genome sequence activity with interpretable and efficient deep learning.

    Hammelman, Jennifer / Gifford, David K

    PLoS computational biology

    2021  Volume 17, Issue 8, Page(s) e1009282

    Abstract Discovering sequence features that differentially direct cells to alternate fates is key to understanding both cellular development and the consequences of disease-related mutations. We introduce Expected Pattern Effect and Differential Expected Pattern Effect, two black-box methods that can interpret genome regulatory sequences for cell type-specific or condition-specific patterns. We show that these methods identify relevant transcription factor motifs and spacings that are predictive of cell state-specific chromatin accessibility. Finally, we integrate these methods into a framework that is readily accessible to non-experts and can be downloaded as a binary or installed via PyPI or bioconda at https://cgs.csail.mit.edu/deepaccess-package/.
    MeSH term(s) Deep Learning ; Genome, Human ; High-Throughput Nucleotide Sequencing ; Humans ; Neural Networks, Computer ; Sequence Analysis, DNA/methods ; Transcription Factors/metabolism
    Chemical Substances Transcription Factors
    Language English
    Publishing date 2021-08-09
    Publishing country United States
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, U.S. Gov't, Non-P.H.S.
    ZDB-ID 2193340-6
    ISSN 1553-7358 ; 1553-734X
    ISSN (online) 1553-7358
    ISSN 1553-734X
    DOI 10.1371/journal.pcbi.1009282
    Database MEDical Literature Analysis and Retrieval System OnLINE
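
    A minimal sketch of a black-box expected pattern effect, assuming it is the average change in a model's output when a pattern is inserted into background sequences, with the differential version contrasting two outputs such as two cell types. Inserting at the sequence centre is an arbitrary choice here, and `toy_model` is a hypothetical stand-in for a trained network, not the DeepAccess package or its API.

```python
import random

def insert_pattern(seq, pattern, pos):
    """Return seq with `pattern` written over the bases starting at `pos` (length is unchanged)."""
    return seq[:pos] + pattern + seq[pos + len(pattern):]

def expected_pattern_effect(model, backgrounds, pattern, task):
    """Mean change in model(seq)[task] when `pattern` is placed at the centre of each background."""
    deltas = []
    for seq in backgrounds:
        pos = (len(seq) - len(pattern)) // 2
        with_pattern = insert_pattern(seq, pattern, pos)
        deltas.append(model(with_pattern)[task] - model(seq)[task])
    return sum(deltas) / len(deltas)

def differential_expected_pattern_effect(model, backgrounds, pattern, task_a, task_b):
    """Expected pattern effect for task_a minus that for task_b (e.g. two cell types)."""
    return (expected_pattern_effect(model, backgrounds, pattern, task_a)
            - expected_pattern_effect(model, backgrounds, pattern, task_b))

def toy_model(seq):
    """Hypothetical stand-in for a trained network: output 0 reacts to GATA, output 1 to CACGTG."""
    return [float("GATA" in seq), float("CACGTG" in seq)]

rng = random.Random(0)
backgrounds = ["".join(rng.choice("ACGT") for _ in range(50)) for _ in range(100)]
print(expected_pattern_effect(toy_model, backgrounds, "GATA", 0))
print(differential_expected_pattern_effect(toy_model, backgrounds, "GATA", 0, 1))
```

    Treating the model purely as a function from sequence to outputs is what makes the approach black-box: the same code would work for any predictor with that interface.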

  3. Article ; Online: seqgra: principled selection of neural network architectures for genomics prediction tasks.

    Krismer, Konstantin / Hammelman, Jennifer / Gifford, David K

    Bioinformatics (Oxford, England)

    2022  Volume 38, Issue 9, Page(s) 2381–2388

    Abstract Motivation: Sequence models based on deep neural networks have achieved state-of-the-art performance on regulatory genomics prediction tasks, such as chromatin accessibility and transcription factor binding. But despite their high accuracy, their contributions to a mechanistic understanding of the biology of regulatory elements are often hindered by the complexity of the predictive model and thus the poor interpretability of its decision boundaries. To address this, we introduce seqgra, a deep learning pipeline that incorporates the rule-based simulation of biological sequence data and the training and evaluation of models whose decision boundaries mirror the rules from the simulation process.
    Results: We show that seqgra can be used to (i) generate data under the assumption of a hypothesized model of genome regulation, (ii) identify neural network architectures capable of recovering the rules of said model and (iii) analyze a model's predictive performance as a function of training set size and the complexity of the rules behind the simulated data.
    Availability and implementation: The source code of the seqgra package is hosted on GitHub (https://github.com/gifford-lab/seqgra). seqgra is a pip-installable Python package. Extensive documentation can be found at https://kkrismer.github.io/seqgra.
    Supplementary information: Supplementary data are available at Bioinformatics online.
    MeSH term(s) Genomics ; Neural Networks, Computer ; Software ; Chromatin ; Regulatory Sequences, Nucleic Acid
    Chemical Substances Chromatin
    Language English
    Publishing date 2022-02-19
    Publishing country England
    Document type Journal Article ; Research Support, U.S. Gov't, Non-P.H.S. ; Research Support, N.I.H., Extramural
    ZDB-ID 1422668-6
    ISSN 1367-4811 ; 1367-4803
    ISSN (online) 1367-4811
    ISSN 1367-4803
    DOI 10.1093/bioinformatics/btac101
    Database MEDical Literature Analysis and Retrieval System OnLINE
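
    The kind of rule-based simulation seqgra automates can be sketched by hand. The code below is not seqgra's API or data-definition format; it assumes a hypothetical rule in which class 1 requires motif `m1` followed by motif `m2` within `max_gap` bases, plants that motif pair in half of the random sequences, and labels every sequence by applying the rule.

```python
import random

def contains_pair(seq, m1, m2, max_gap):
    """Hypothetical rule: m1 occurs and m2 starts at most max_gap bases after m1 ends."""
    start = seq.find(m1)
    while start != -1:
        end = start + len(m1)
        if m2 in seq[end:end + max_gap + len(m2)]:
            return True
        start = seq.find(m1, start + 1)
    return False

def simulate_dataset(n, length, m1, m2, max_gap, seed=0):
    """Generate n (sequence, label) pairs; every other sequence gets the motif pair planted."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        seq = [rng.choice("ACGT") for _ in range(length)]
        if i % 2 == 0:
            pos = rng.randrange(length - len(m1) - max_gap - len(m2) + 1)
            gap = rng.randrange(max_gap + 1)
            seq[pos:pos + len(m1)] = m1
            m2_start = pos + len(m1) + gap
            seq[m2_start:m2_start + len(m2)] = m2
        s = "".join(seq)
        data.append((s, int(contains_pair(s, m1, m2, max_gap))))  # label comes from the rule
    return data

data = simulate_dataset(1000, 60, "GATA", "CACGTG", max_gap=5)
print(sum(label for _, label in data))
```

    Varying `n` and the rule's complexity (for example, the allowed gap) yields datasets for studying predictive performance as a function of training set size and rule complexity, as in point (iii) of the abstract.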

  4. Article ; Online: spatzie: an R package for identifying significant transcription factor motif co-enrichment from enhancer-promoter interactions.

    Hammelman, Jennifer / Krismer, Konstantin / Gifford, David K

    Nucleic acids research

    2022  Volume 50, Issue 9, Page(s) e52

    Abstract Genomic interactions provide important context to our understanding of the state of the genome. One question is whether specific transcription factor interactions give rise to genome organization. We introduce spatzie, an R package and a website that implement statistical tests for significant transcription factor motif cooperativity between enhancer-promoter interactions. We conducted controlled experiments on realistic simulated data from ChIP-seq to confirm that spatzie is capable of discovering co-enriched motif interactions even in noisy conditions. We then use spatzie to investigate cell type-specific transcription factor cooperativity within recent human ChIA-PET enhancer-promoter interaction data. The method is available online at https://spatzie.mit.edu.
    MeSH term(s) Chromatin Immunoprecipitation Sequencing ; Enhancer Elements, Genetic ; Genome ; Genomics ; Humans ; Promoter Regions, Genetic ; Software ; Transcription Factors/genetics ; Transcription Factors/metabolism
    Chemical Substances Transcription Factors
    Language English
    Publishing date 2022-01-05
    Publishing country England
    Document type Journal Article
    ZDB-ID 186809-3
    ISSN 1362-4962 ; 1362-4954 ; 0301-5610 ; 0305-1048
    ISSN (online) 1362-4962 ; 1362-4954
    ISSN 0301-5610 ; 0305-1048
    DOI 10.1093/nar/gkac036
    Database MEDical Literature Analysis and Retrieval System OnLINE
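
    One simple way to score motif co-enrichment across enhancer-promoter pairs is a one-sided hypergeometric test on how often motif A in the enhancer coincides with motif B in the promoter. This is an illustrative choice and an assumption; spatzie provides its own set of tests, which this sketch does not reproduce.

```python
from math import comb

def cooccurrence_pvalue(pairs):
    """One-sided hypergeometric p-value for motif A (enhancer side) co-occurring with
    motif B (promoter side) more often than chance.

    `pairs` holds one (a_in_enhancer, b_in_promoter) tuple of booleans per interaction.
    """
    n = len(pairs)
    k_a = sum(1 for a, _ in pairs if a)          # interactions whose enhancer has motif A
    k_b = sum(1 for _, b in pairs if b)          # interactions whose promoter has motif B
    both = sum(1 for a, b in pairs if a and b)   # interactions with both motifs
    # P(X >= both) for X ~ Hypergeometric(population n, k_b successes, k_a draws)
    tail = sum(comb(k_b, x) * comb(n - k_b, k_a - x) for x in range(both, min(k_a, k_b) + 1))
    return tail / comb(n, k_a)

pairs = [(True, True)] * 3 + [(False, False)] * 7
print(cooccurrence_pvalue(pairs))
```

    In the example every enhancer with motif A is paired with a promoter carrying motif B, so the p-value is small (1/120); a real analysis would also correct for testing many motif pairs.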

  5. Book ; Online: Image classifiers can not be made robust to small perturbations

    Dai, Zheng / Gifford, David K

    2021  

    Abstract The sensitivity of image classifiers to small perturbations in the input is often viewed as a defect of their construction. We demonstrate that this sensitivity is a fundamental property of classifiers. For any arbitrary classifier over the set of $n$-by-$n$ images, we show that for all but one class it is possible to change the classification of all but a tiny fraction of the images in that class with a perturbation of size $O(n^{1/\max{(p,1)}})$ when measured in any $p$-norm for $p \geq 0$. We then discuss how this phenomenon relates to human visual perception and the potential implications for the design considerations of computer vision systems.

    Comment: 8 pages, 2 figures
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Machine Learning ; Statistics - Machine Learning
    Publishing date 2021-12-07
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  6. Article ; Online: DeepLigand: accurate prediction of MHC class I ligands using peptide embedding.

    Zeng, Haoyang / Gifford, David K

    Bioinformatics (Oxford, England)

    2019  Volume 35, Issue 14, Page(s) i278–i283

    Abstract Motivation: The computational modeling of peptide display by class I major histocompatibility complexes (MHCs) is essential for peptide-based therapeutics design. Existing computational methods for peptide-display focus on modeling the peptide-MHC-binding affinity. However, such models are not able to characterize the sequence features for the other cellular processes in the peptide display pathway that determines MHC ligand selection.
    Results: We introduce a semi-supervised model, DeepLigand, which outperforms state-of-the-art models in MHC class I ligand prediction. DeepLigand combines a peptide language model and peptide binding affinity prediction to score MHC class I peptide presentation. The peptide language model characterizes sequence features that correspond to secondary factors in MHC ligand selection other than binding affinity. The peptide embedding is learned by pre-training on natural ligands and can discriminate between ligands and non-ligands in the absence of binding affinity prediction. Although conventional affinity-based models fail to classify peptides with moderate affinities, DeepLigand discriminates ligands from non-ligands with consistently high accuracy.
    Availability and implementation: We make DeepLigand available at https://github.com/gifford-lab/DeepLigand.
    Supplementary information: Supplementary data are available at Bioinformatics online.
    MeSH term(s) Histocompatibility Antigens Class I ; Ligands ; Peptides/analysis ; Protein Binding ; Software
    Chemical Substances Histocompatibility Antigens Class I ; Ligands ; Peptides
    Language English
    Publishing date 2019-09-10
    Publishing country England
    Document type Journal Article ; Research Support, N.I.H., Extramural
    ZDB-ID 1422668-6
    ISSN 1367-4811 ; 1367-4803
    ISSN (online) 1367-4811
    ISSN 1367-4803
    DOI 10.1093/bioinformatics/btz330
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Article ; Online: Quantification of Uncertainty in Peptide-MHC Binding Prediction Improves High-Affinity Peptide Selection for Therapeutic Design.

    Zeng, Haoyang / Gifford, David K

    Cell systems

    2019  Volume 9, Issue 2, Page(s) 159–166.e3

    Abstract The computational identification of peptides that can bind the major histocompatibility complex (MHC) with high affinity is an essential step in developing personal immunotherapies and vaccines. We introduce PUFFIN, a deep residual network-based computational approach that quantifies uncertainty in peptide-MHC affinity prediction that arises from observational noise and the lack of relevant training examples. With PUFFIN's uncertainty metrics, we define binding likelihood, the probability that a peptide binds to a given MHC allele at a specified affinity threshold. Compared to affinity point estimates, we find that binding likelihood correlates better with the observed affinity and reduces false positives in high-affinity peptide design. When applied to examine an existing peptide vaccine, PUFFIN identifies an alternative vaccine formulation with higher binding likelihood. PUFFIN is freely available for download at http://github.com/gifford-lab/PUFFIN.
    MeSH term(s) Algorithms ; Computational Biology/methods ; Databases, Protein ; Histocompatibility Antigens Class I/genetics ; Humans ; Major Histocompatibility Complex/physiology ; Peptides/metabolism ; Protein Binding/physiology ; Software ; Uncertainty
    Chemical Substances Histocompatibility Antigens Class I ; Peptides
    Language English
    Publishing date 2019-06-05
    Publishing country United States
    Document type Journal Article ; Research Support, N.I.H., Extramural
    ZDB-ID 2854138-8
    ISSN 2405-4720 ; 2405-4712
    ISSN (online) 2405-4720
    ISSN 2405-4712
    DOI 10.1016/j.cels.2019.05.004
    Database MEDical Literature Analysis and Retrieval System OnLINE
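
    Binding likelihood as defined in this abstract can be sketched under an assumed Gaussian predictive distribution: if a model reports a mean `mu` and standard deviation `sigma` for log10(IC50 in nM), the probability of binding below an affinity threshold is a normal CDF. The log10 IC50 scale, the 50 nM default threshold and the Gaussian form are assumptions for illustration, not necessarily PUFFIN's exact parameterization.

```python
from math import erf, log10, sqrt

def binding_likelihood(mu, sigma, threshold_nm=50.0):
    """P(IC50 < threshold_nm), assuming predicted log10(IC50 in nM) ~ Normal(mu, sigma**2).

    Lower IC50 means tighter binding, so this is the normal CDF evaluated at log10(threshold).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (log10(threshold_nm) - mu) / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Same point estimate (40 nM), different predictive uncertainty.
confident = binding_likelihood(log10(40.0), 0.1)
uncertain = binding_likelihood(log10(40.0), 1.0)
print(confident, uncertain)
```

    Two peptides with the same point estimate can then be separated by uncertainty: when the mean already lies below the threshold, the one with smaller `sigma` receives the higher binding likelihood.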

  8. Article ; Online: A high-throughput yeast display approach to profile pathogen proteomes for MHC-II binding.

    Huisman, Brooke D / Dai, Zheng / Gifford, David K / Birnbaum, Michael E

    eLife

    2022  Volume 11

    Abstract T cells play a critical role in the adaptive immune response, recognizing peptide antigens presented on the cell surface by major histocompatibility complex (MHC) proteins. While assessing peptides for MHC binding is an important component of probing these interactions, traditional assays for testing peptides of interest for MHC binding are limited in throughput. Here, we present a yeast display-based platform for assessing the binding of tens of thousands of user-defined peptides in a high-throughput manner. We apply this approach to assess a tiled library covering the SARS-CoV-2 proteome and four dengue virus serotypes for binding to human class II MHCs, including HLA-DR401, -DR402, and -DR404. While the peptide datasets show broad agreement with previously described MHC-binding motifs, they additionally reveal experimentally validated computational false positives and false negatives. We therefore present this approach as able to complement current experimental datasets and computational predictions. Further, our yeast display approach underlines design considerations for epitope identification experiments and serves as a framework for examining relationships between viral conservation and MHC binding, which can be used to identify potentially high-interest peptide binders from viral proteins. These results demonstrate the utility of our approach to determine peptide-MHC binding interactions in a manner that can supplement and potentially enhance current algorithm-based approaches.
    MeSH term(s) COVID-19 ; Humans ; Peptides/metabolism ; Protein Binding ; Proteome/metabolism ; SARS-CoV-2 ; Saccharomyces cerevisiae/metabolism
    Chemical Substances Peptides ; Proteome
    Language English
    Publishing date 2022-07-04
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't ; Research Support, U.S. Gov't, Non-P.H.S. ; Research Support, N.I.H., Extramural
    ZDB-ID 2687154-3
    ISSN 2050-084X
    ISSN (online) 2050-084X
    ISSN 2050-084X
    DOI 10.7554/eLife.78589
    Database MEDical Literature Analysis and Retrieval System OnLINE

  9. Article ; Online: Generative modeling of single-cell time series with PRESCIENT enables prediction of cell trajectories with interventions.

    Yeo, Grace Hui Ting / Saksena, Sachit D / Gifford, David K

    Nature communications

    2021  Volume 12, Issue 1, Page(s) 3222

    Abstract Existing computational methods that use single-cell RNA-sequencing (scRNA-seq) for cell fate prediction do not model how cells evolve stochastically and in physical time, nor can they predict how differentiation trajectories are altered by proposed interventions. We introduce PRESCIENT (Potential eneRgy undErlying Single Cell gradIENTs), a generative modeling framework that learns an underlying differentiation landscape from time-series scRNA-seq data. We validate PRESCIENT on an experimental lineage tracing dataset, where we show that PRESCIENT is able to predict the fate biases of progenitor cells in hematopoiesis when accounting for cell proliferation, improving upon the best-performing existing method. We demonstrate how PRESCIENT can simulate trajectories for perturbed cells, recovering the expected effects of known modulators of cell fate in hematopoiesis and pancreatic β cell differentiation. PRESCIENT is able to accommodate complex perturbations of multiple genes, at different time points and from different starting cell populations, and is available at https://github.com/gifford-lab/prescient.
    MeSH term(s) Animals ; Cell Differentiation/genetics ; Cell Proliferation/genetics ; Cells, Cultured ; Computer Simulation ; Datasets as Topic ; Deep Learning ; Hematopoiesis/genetics ; Humans ; Insulin-Secreting Cells/physiology ; Mice ; Models, Genetic ; RNA-Seq ; Single-Cell Analysis/methods ; Software ; Stem Cells/physiology ; Stochastic Processes
    Language English
    Publishing date 2021-05-28
    Publishing country England
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't
    ZDB-ID 2553671-0
    ISSN 2041-1723
    ISSN (online) 2041-1723
    ISSN 2041-1723
    DOI 10.1038/s41467-021-23518-w
    Database MEDical Literature Analysis and Retrieval System OnLINE
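
    Simulating trajectories on a potential landscape can be sketched with Euler-Maruyama steps of dx = -∇Φ(x) dt + σ dW. In PRESCIENT the potential Φ is learned from time-series scRNA-seq data; here it is a hypothetical quadratic bowl with a single attractor at `center`, and the step size and noise scale are arbitrary. A perturbation could be mimicked by changing the starting state `x0` before simulating.

```python
import math
import random

def grad_potential(x, center):
    """Gradient of the toy potential Phi(x) = 0.5 * ||x - center||^2."""
    return [xi - ci for xi, ci in zip(x, center)]

def simulate_trajectory(x0, center, steps=200, dt=0.05, sigma=0.1, seed=0):
    """Euler-Maruyama: x <- x - grad Phi(x) * dt + sigma * sqrt(dt) * N(0, I); returns every state."""
    rng = random.Random(seed)
    x = list(x0)
    path = [list(x)]
    for _ in range(steps):
        g = grad_potential(x, center)
        x = [xi - gi * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0) for xi, gi in zip(x, g)]
        path.append(list(x))
    return path

path = simulate_trajectory([5.0, -3.0], center=[1.0, 1.0])
print(path[-1])  # ends near the attractor at (1, 1), up to noise
```

    The drift term pulls each cell state downhill on Φ while the noise term keeps trajectories stochastic; running many seeds from the same start gives a distribution of end states rather than a single fate.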

  10. Article ; Online: IDR2D identifies reproducible genomic interactions.

    Krismer, Konstantin / Guo, Yuchun / Gifford, David K

    Nucleic acids research

    2020  Volume 48, Issue 6, Page(s) e31

    Abstract Chromatin interaction data from protocols such as ChIA-PET, HiChIP and Hi-C provide valuable insights into genome organization and gene regulation, but can include spurious interactions that do not reflect underlying genome biology. We introduce an extension of the Irreproducible Discovery Rate (IDR) method called IDR2D that identifies replicable interactions shared by chromatin interaction experiments. IDR2D provides a principled set of interactions and eliminates artifacts from single experiments. The method is available as a Bioconductor package for the R community, as well as an online service at https://idr2d.mit.edu.
    MeSH term(s) Chromatin/metabolism ; Chromatin Immunoprecipitation ; Chromosomes/genetics ; Genome ; Genomics/methods ; Reproducibility of Results ; Software
    Chemical Substances Chromatin
    Language English
    Publishing date 2020-02-01
    Publishing country England
    Document type Journal Article ; Research Support, N.I.H., Extramural
    ZDB-ID 186809-3
    ISSN 1362-4962 ; 1362-4954 ; 0301-5610 ; 0305-1048
    ISSN (online) 1362-4962 ; 1362-4954
    ISSN 0301-5610 ; 0305-1048
    DOI 10.1093/nar/gkaa030
    Database MEDical Literature Analysis and Retrieval System OnLINE
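
    Before reproducibility can be scored, interactions from two replicates must be paired up. The sketch below assumes a simple greedy rule in which two interactions match when both anchors overlap, in either orientation; IDR2D has its own rules for resolving ambiguous matches, and the matched pairs' scores would then be passed to an IDR analysis, which is not shown. `Anchor` and `match_interactions` are hypothetical names.

```python
from typing import NamedTuple

class Anchor(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive

def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def match_interactions(rep1, rep2):
    """Greedily pair each rep1 interaction with the first unused rep2 interaction whose
    anchors both overlap, allowing either anchor order. Returns (i, j) index pairs."""
    used = set()
    matches = []
    for i, (a1, b1) in enumerate(rep1):
        for j, (a2, b2) in enumerate(rep2):
            if j in used:
                continue
            same_order = overlaps(a1, a2) and overlaps(b1, b2)
            swapped = overlaps(a1, b2) and overlaps(b1, a2)
            if same_order or swapped:
                matches.append((i, j))
                used.add(j)
                break
    return matches

rep1 = [(Anchor("chr1", 100, 200), Anchor("chr1", 5000, 5100)),
        (Anchor("chr2", 0, 50), Anchor("chr2", 900, 950))]
rep2 = [(Anchor("chr1", 5050, 5150), Anchor("chr1", 150, 250)),
        (Anchor("chr3", 0, 50), Anchor("chr3", 900, 950))]
print(match_interactions(rep1, rep2))  # [(0, 0)]
```

    Requiring both anchors to overlap is what distinguishes matching two-dimensional interactions from matching one-dimensional peaks, where a single interval overlap suffices.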
