LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 23 in total

  1. Article ; Online: Robust chromatin state annotation.

    Foroozandeh Shahraki, Mehdi / Farahbod, Marjan / Libbrecht, Maxwell W

    Genome research

    2024  Volume 34, Issue 3, Page(s) 469–483

    Abstract With the goal of mapping genomic activity, international projects have recently measured epigenetic activity in hundreds of cell and tissue types. Chromatin state annotations produced by segmentation and genome annotation (SAGA) methods have emerged as the predominant way to summarize these epigenomic data sets in order to annotate the genome. These chromatin state annotations are essential for many genomic tasks, including identifying active regulatory elements and interpreting disease-associated genetic variation. However, despite the widespread applications of SAGA methods, no principled approach exists to evaluate the statistical significance of chromatin state assignments. Here, we propose the first method for assigning calibrated confidence scores to chromatin state annotations. Toward this goal, we performed a comprehensive evaluation of the reproducibility of the two most widely used existing SAGA methods, ChromHMM and Segway. We found that their predictions are frequently irreproducible. For example, when applying the same SAGA method on two sets of experimental replicates, 27%-69% of predicted enhancers fail to replicate. This suggests that a substantial fraction of predicted elements in existing chromatin state annotations cannot be relied upon. To remedy this problem, we introduce SAGAconf, a method for assigning a measure of confidence (
    MeSH term(s) Chromatin/genetics ; Chromatin/metabolism ; Molecular Sequence Annotation ; Humans ; Reproducibility of Results ; Genomics/methods
    Chemical Substances Chromatin
    Language English
    Publishing date 2024-04-25
    Publishing country United States
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't
    ZDB-ID 1284872-4
    ISSN 1549-5469 ; 1088-9051 ; 1054-9803
    ISSN (online) 1549-5469
    DOI 10.1101/gr.278343.123
    Database MEDical Literature Analysis and Retrieval System OnLINE

  2. Article ; Online: Latent Representation of the Human Pan-Celltype Epigenome Through a Deep Recurrent Neural Network.

    Dsouza, Kevin B / Li, Adam Y / Bhargava, Vijay K / Libbrecht, Maxwell W

    IEEE/ACM transactions on computational biology and bioinformatics

    2022  Volume 19, Issue 4, Page(s) 2313–2323

    Abstract The availability of thousands of assays of epigenetic activity necessitates compressed representations of these data sets that summarize the epigenetic landscape of the genome. Until recently, most such representations were cell type-specific, applying to a single tissue or cell state. Recently, neural networks have made it possible to summarize data across tissues to produce a pan-cell type representation. In this work, we propose Epi-LSTM, a deep long short-term memory (LSTM) recurrent neural network autoencoder to capture the long-term dependencies in the epigenomic data. The latent representations from Epi-LSTM capture a variety of genomic phenomena, including gene-expression, promoter-enhancer interactions, replication timing, frequently interacting regions, and evolutionary conservation. These representations outperform existing methods in a majority of cell types while yielding smoother representations along the genomic axis due to their sequential nature.
    MeSH term(s) Epigenome ; Humans ; Neural Networks, Computer
    Language English
    Publishing date 2022-08-08
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ISSN 1557-9964
    ISSN (online) 1557-9964
    DOI 10.1109/TCBB.2021.3084147
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Article ; Online: Segmentation and genome annotation algorithms for identifying chromatin state and other genomic patterns.

    Libbrecht, Maxwell W / Chan, Rachel C W / Hoffman, Michael M

    PLoS computational biology

    2021  Volume 17, Issue 10, Page(s) e1009423

    Abstract Segmentation and genome annotation (SAGA) algorithms are widely used to understand genome activity and gene regulation. These algorithms take as input epigenomic datasets, such as chromatin immunoprecipitation-sequencing (ChIP-seq) measurements of histone modifications or transcription factor binding. They partition the genome and assign a label to each segment such that positions with the same label exhibit similar patterns of input data. SAGA algorithms discover categories of activity such as promoters, enhancers, or parts of genes without prior knowledge of known genomic elements. In this sense, they generally act in an unsupervised fashion like clustering algorithms, but with the additional simultaneous function of segmenting the genome. Here, we review the common methodological framework that underlies these methods, review variants of and improvements upon this basic framework, and discuss the outlook for future work. This review is intended for those interested in applying SAGA methods and for computational researchers interested in improving upon them.
    MeSH term(s) Algorithms ; Chromatin/genetics ; Chromatin Immunoprecipitation Sequencing ; Genome/genetics ; Genomics/methods ; Histone Code ; Humans ; Molecular Sequence Annotation/methods ; Protein Binding
    Chemical Substances Chromatin
    Language English
    Publishing date 2021-10-14
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't ; Review
    ZDB-ID 2193340-6
    ISSN 1553-7358 ; 1553-734X
    ISSN (online) 1553-7358
    DOI 10.1371/journal.pcbi.1009423
    Database MEDical Literature Analysis and Retrieval System OnLINE

  4. Article ; Online: SigTools: exploratory visualization for genomic signals.

    Masoumi, Shohre / Libbrecht, Maxwell W / Wiese, Kay C

    Bioinformatics (Oxford, England)

    2021  Volume 38, Issue 4, Page(s) 1126–1128

    Abstract Motivation: With the advancement of sequencing technologies, genomic data sets are constantly being expanded by high volumes of different data types. One recently introduced data type in genomic science is genomic signals, which are usually short-read coverage measurements over the genome. To understand and evaluate the results of such studies, one needs to understand and analyze the characteristics of the input data.
    Results: SigTools is an R-based genomic signals visualization package developed with two objectives: (i) to facilitate genomic signals exploration in order to uncover insights for later model training, refinement and development by including distribution and autocorrelation plots; (ii) to enable genomic signals interpretation by including correlation and aggregation plots. In addition, our corresponding web application, SigTools-Shiny, extends the accessibility scope of these modules to people who are more comfortable working with graphical user interfaces instead of command-line tools.
    Availability and implementation: SigTools source code, installation guide and manual are freely available on http://github.com/shohre73.
    MeSH term(s) Humans ; Genomics/methods ; Genome ; Software ; Sequence Analysis
    Language English
    Publishing date 2021-04-26
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 1422668-6
    ISSN 1367-4811 ; 1367-4803
    ISSN (online) 1367-4811
    DOI 10.1093/bioinformatics/btab742
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Article ; Online: Automated identification of maximal differential cell populations in flow cytometry data.

    Yue, Alice / Chauve, Cedric / Libbrecht, Maxwell W / Brinkman, Ryan R

    Cytometry. Part A : the journal of the International Society for Analytical Cytology

    2021  Volume 101, Issue 2, Page(s) 177–184

    Abstract We introduce a new cell population score called SpecEnr (specific enrichment) and describe a method that discovers robust and accurate candidate biomarkers from flow cytometry data. Our approach identifies a new class of candidate biomarkers we define as driver cell populations, whose abundance is associated with a sample class (e.g., disease), but not as a result of a change in a related population. We show that the driver cell populations we find are also easily interpretable using a lattice-based visualization tool. Our method is implemented in the R package flowGraph, freely available on GitHub (github.com/aya49/flowGraph) and on BioConductor.
    MeSH term(s) Biomarkers ; Flow Cytometry/methods ; Software
    Chemical Substances Biomarkers
    Language English
    Publishing date 2021-10-22
    Publishing country United States
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't
    ZDB-ID 2099868-5
    ISSN 1552-4930 ; 0196-4763 ; 1552-4922
    ISSN (online) 1552-4930
    DOI 10.1002/cyto.a.24503
    Database MEDical Literature Analysis and Retrieval System OnLINE

  6. Article ; Online: Continuous chromatin state feature annotation of the human epigenome.

    Daneshpajouh, Habib / Chen, Bowen / Shokraneh, Neda / Masoumi, Shohre / Wiese, Kay C / Libbrecht, Maxwell W

    Bioinformatics (Oxford, England)

    2022  Volume 38, Issue 11, Page(s) 3029–3036

    Abstract Motivation: Segmentation and genome annotation (SAGA) algorithms are widely used to understand genome activity and gene regulation. These methods take as input a set of sequencing-based assays of epigenomic activity, such as ChIP-seq measurements of histone modification and transcription factor binding. They output an annotation of the genome that assigns a chromatin state label to each genomic position. Existing SAGA methods have several limitations caused by the discrete annotation framework: such annotations cannot easily represent varying strengths of genomic elements, and they cannot easily represent combinatorial elements that simultaneously exhibit multiple types of activity. To remedy these limitations, we propose an annotation strategy that instead outputs a vector of chromatin state features at each position rather than a single discrete label. Continuous modeling is common in other fields, such as in topic modeling of text documents. We propose a method, epigenome-ssm-nonneg, that uses a non-negative state space model to efficiently annotate the genome with chromatin state features. We also propose several measures of the quality of a chromatin state feature annotation and we compare the performance of several alternative methods according to these quality measures.
    Results: We show that chromatin state features from epigenome-ssm-nonneg are more useful for several downstream applications than both continuous and discrete alternatives, including their ability to identify expressed genes and enhancers. Therefore, we expect that these continuous chromatin state features will be valuable reference annotations to be used in visualization and downstream analysis.
    Availability and implementation: Source code for epigenome-ssm is available at https://github.com/habibdanesh/epigenome-ssm and Zenodo (DOI: 10.5281/zenodo.6507585).
    Supplementary information: Supplementary data are available at Bioinformatics online.
    MeSH term(s) Humans ; Chromatin ; Epigenome ; Epigenomics/methods ; Genomics/methods ; Software
    Chemical Substances Chromatin
    Language English
    Publishing date 2022-04-21
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 1422668-6
    ISSN 1367-4811 ; 1367-4803
    ISSN (online) 1367-4811
    DOI 10.1093/bioinformatics/btac283
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Article ; Online: Obtaining genetics insights from deep learning via explainable artificial intelligence.

    Novakovsky, Gherman / Dexter, Nick / Libbrecht, Maxwell W / Wasserman, Wyeth W / Mostafavi, Sara

    Nature reviews. Genetics

    2022  Volume 24, Issue 2, Page(s) 125–137

    Abstract Artificial intelligence (AI) models based on deep learning now represent the state of the art for making functional predictions in genomics research. However, the underlying basis on which predictive models make such predictions is often unknown. For genomics researchers, this missing explanatory information would frequently be of greater value than the predictions themselves, as it can enable new insights into genetic processes. We review progress in the emerging area of explainable AI (xAI), a field with the potential to empower life science researchers to gain mechanistic insights into complex deep learning models. We discuss and categorize approaches for model interpretation, including an intuitive understanding of how each approach works and their underlying assumptions and limitations in the context of typical high-throughput biological datasets.
    MeSH term(s) Artificial Intelligence ; Deep Learning ; Genomics
    Language English
    Publishing date 2022-10-03
    Publishing country England
    Document type Journal Article ; Review ; Research Support, Non-U.S. Gov't
    ZDB-ID 2035157-4
    ISSN 1471-0064 ; 1471-0056
    ISSN (online) 1471-0064
    DOI 10.1038/s41576-022-00532-2
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Book ; Online: Segmentation and genome annotation algorithms

    Libbrecht, Maxwell W / Chan, Rachel CW / Hoffman, Michael M

    2021  

    Abstract Segmentation and genome annotation (SAGA) algorithms are widely used to understand genome activity and gene regulation. These algorithms take as input epigenomic datasets, such as chromatin immunoprecipitation-sequencing (ChIP-seq) measurements of histone modifications or transcription factor binding. They partition the genome and assign a label to each segment such that positions with the same label exhibit similar patterns of input data. SAGA algorithms discover categories of activity such as promoters, enhancers, or parts of genes without prior knowledge of known genomic elements. In this sense, they generally act in an unsupervised fashion like clustering algorithms, but with the additional simultaneous function of segmenting the genome. Here, we review the common methodological framework that underlies these methods, review variants of and improvements upon this basic framework, catalogue existing large-scale reference annotations, and discuss the outlook for future work.
    Keywords Quantitative Biology - Genomics ; Computer Science - Machine Learning
    Subject code 004 ; 006
    Publishing date 2021-01-03
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  9. Article ; Online: Choosing non-redundant representative subsets of protein sequence data sets using submodular optimization.

    Libbrecht, Maxwell W / Bilmes, Jeffrey A / Noble, William Stafford

    Proteins

    2018  Volume 86, Issue 4, Page(s) 454–466

    Abstract Selecting a non-redundant representative subset of sequences is a common step in many bioinformatics workflows, such as the creation of non-redundant training sets for sequence and structural models or selection of "operational taxonomic units" from metagenomics data. Previous methods for this task, such as CD-HIT, PISCES, and UCLUST, apply a heuristic threshold-based algorithm that has no theoretical guarantees. We propose a new approach based on submodular optimization. Submodular optimization, a discrete analogue to continuous convex optimization, has been used with great success for other representative set selection problems. We demonstrate that the submodular optimization approach results in representative protein sequence subsets with greater structural diversity than sets chosen by existing methods, using as a gold standard the SCOPe library of protein domain structures. In this setting, submodular optimization consistently yields protein sequence subsets that include more SCOPe domain families than sets of the same size selected by competing approaches. We also show how the optimization framework allows us to design a mixture objective function that performs well for both large and small representative sets. The framework we describe is the best possible in polynomial time (under some assumptions), and it is flexible and intuitive because it applies a suite of generic methods to optimize one of a variety of objective functions.
    MeSH term(s) Algorithms ; Cluster Analysis ; Proteins/chemistry ; Proteomics/methods ; Sequence Analysis, Protein/methods
    Chemical Substances Proteins
    Language English
    Publishing date 2018-02-01
    Publishing country United States
    Document type Journal Article ; Research Support, N.I.H., Extramural
    ZDB-ID 806683-8
    ISSN 1097-0134 ; 0887-3585
    ISSN (online) 1097-0134
    DOI 10.1002/prot.25461
    Database MEDical Literature Analysis and Retrieval System OnLINE

  10. Article ; Online: Learning representations of chromatin contacts using a recurrent neural network identifies genomic drivers of conformation.

    Dsouza, Kevin B / Maslova, Alexandra / Al-Jibury, Ediem / Merkenschlager, Matthias / Bhargava, Vijay K / Libbrecht, Maxwell W

    Nature communications

    2022  Volume 13, Issue 1, Page(s) 3704

    Abstract Despite the availability of chromatin conformation capture experiments, discerning the relationship between the 1D genome and 3D conformation remains a challenge, which limits our understanding of their effect on gene expression and disease. We propose Hi-C-LSTM, a method that produces low-dimensional latent representations that summarize intra-chromosomal Hi-C contacts via a recurrent long short-term memory neural network model. We find that these representations contain all the information needed to recreate the observed Hi-C matrix with high accuracy, outperforming existing methods. These representations enable the identification of a variety of conformation-defining genomic elements, including nuclear compartments and conformation-related transcription factors. They furthermore enable in-silico perturbation experiments that measure the influence of cis-regulatory elements on conformation.
    MeSH term(s) Chromatin/genetics ; Genomics ; Learning ; Molecular Conformation ; Neural Networks, Computer
    Chemical Substances Chromatin
    Language English
    Publishing date 2022-06-28
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 2553671-0
    ISSN 2041-1723
    ISSN (online) 2041-1723
    DOI 10.1038/s41467-022-31337-w
    Database MEDical Literature Analysis and Retrieval System OnLINE
