LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 576 total

  1. Article ; Online: Sketching and sampling approaches for fast and accurate long read classification.

    Das, Arun / Schatz, Michael C

    BMC bioinformatics

    2022  Volume 23, Issue 1, Page(s) 452

    Abstract Background: In modern sequencing experiments, quickly and accurately identifying the sources of the reads is a crucial need. In metagenomics, where each read comes from one of potentially many members of a community, it can be important to identify the exact species the read is from. In other settings, it is important to distinguish which reads are from the targeted sample and which are from potential contaminants. In both cases, identification of the correct source of a read enables further investigation of relevant reads, while minimizing wasted work. This task is particularly challenging for long reads, which can have a substantial error rate that obscures the origins of each read.
    Results: Existing tools for the read classification problem are often alignment or index-based, but such methods can have large time and/or space overheads. In this work, we investigate the effectiveness of several sampling and sketching-based approaches for read classification. In these approaches, a chosen sampling or sketching algorithm is used to generate a reduced representation (a "screen") of potential source genomes for a query readset before reads are streamed in and compared against this screen. Using a query read's similarity to the elements of the screen, the methods predict the source of the read. Such an approach requires limited pre-processing, stores and works with only a subset of the input data, and is able to perform classification with a high degree of accuracy.
    Conclusions: The sampling and sketching approaches investigated include uniform sampling, methods based on MinHash and its weighted and order variants, a minimizer-based technique, and a novel clustering-based sketching approach. We demonstrate the effectiveness of these techniques both in identifying the source microbial genomes for reads from a metagenomic long read sequencing experiment, and in distinguishing between long reads from organisms of interest and potential contaminant reads. We then compare these approaches to existing alignment, index and sketching-based tools for read classification, and demonstrate how such a method is a viable alternative for determining the source of query reads. Finally, we present a reference implementation of these approaches at https://github.com/arun96/sketching .
    MeSH term(s) Sequence Analysis, DNA/methods ; High-Throughput Nucleotide Sequencing/methods ; Software ; Metagenomics/methods ; Metagenome ; Algorithms
    Language English
    Publishing date 2022-10-31
    Publishing country England
    Document type Journal Article
    ZDB-ID 2041484-5
    ISSN (online) 1471-2105
    DOI 10.1186/s12859-022-05014-0
    Database MEDical Literature Analysis and Retrieval System OnLINE
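
The screen-based classification described in the abstract of this record can be sketched as follows. This is a minimal Python illustration that assumes a bottom-s MinHash sketch over k-mers. The function names (`kmer_hashes`, `build_screen`, `classify`), the parameter values, and the toy genomes are hypothetical and are not taken from the reference implementation at https://github.com/arun96/sketching.

```python
import hashlib
import random

def kmer_hashes(seq, k):
    """Hash every k-mer of seq to a 64-bit integer."""
    return {
        int.from_bytes(
            hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big"
        )
        for i in range(len(seq) - k + 1)
    }

def build_screen(genomes, k, sketch_size):
    """Keep only the sketch_size smallest k-mer hashes per genome (bottom-s MinHash)."""
    return {
        name: set(sorted(kmer_hashes(seq, k))[:sketch_size])
        for name, seq in genomes.items()
    }

def classify(read, screen, k):
    """Predict the source whose sketch shares the most hashes with the read."""
    read_hashes = kmer_hashes(read, k)
    scores = {name: len(read_hashes & sketch) for name, sketch in screen.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores

# Toy data (assumption): two random 2,000-base "genomes".
rng = random.Random(0)
genomes = {
    name: "".join(rng.choice("ACGT") for _ in range(2000))
    for name in ("genome_A", "genome_B")
}
screen = build_screen(genomes, k=15, sketch_size=200)

# Simulated noisy long read: 500 bases of genome_A with a substitution
# at every 50th position to mimic sequencing errors.
swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
read = "".join(
    swap[base] if i % 50 == 25 else base
    for i, base in enumerate(genomes["genome_A"][700:1200])
)

best, scores = classify(read, screen, k=15)
print(best, scores)
```

Only the sketch (200 hashes per genome here) is stored, which is why an approach of this kind needs limited pre-processing and memory. The read is still assigned correctly as long as enough error-free k-mers fall into the sketch of its source.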

  2. Article: BEATRICE: Bayesian Fine-mapping from Summary Data using Deep Variational Inference.

    Ghosal, Sayan / Schatz, Michael C / Venkataraman, Archana

    bioRxiv : the preprint server for biology

    2023  

    Abstract We introduce a novel framework BEATRICE to identify putative causal variants from GWAS summary statistics (https://github.com/sayangsep/Beatrice-Finemapping). Identifying causal variants is challenging due to their sparsity and to highly correlated variants in the nearby regions. To account for these challenges, our approach relies on a hierarchical Bayesian model that imposes a binary concrete prior on the set of causal variants. We derive a variational algorithm for this fine-mapping problem by minimizing the KL divergence between an approximate density and the posterior probability distribution of the causal configurations. Correspondingly, we use a deep neural network as an inference machine to estimate the parameters of our proposal distribution. Our stochastic optimization procedure allows us to simultaneously sample from the space of causal configurations. We use these samples to compute the posterior inclusion probabilities and determine credible sets for each causal variant. We conduct a detailed simulation study to quantify the performance of our framework across different numbers of causal variants and different noise paradigms, as defined by the relative genetic contributions of causal and non-causal variants. Using this simulated data, we perform a comparative analysis against two state-of-the-art baseline methods for fine-mapping. We demonstrate that BEATRICE achieves uniformly better coverage with comparable power and set sizes, and that the performance gain increases with the number of causal variants. Thus, BEATRICE is a valuable tool to identify causal variants from eQTL and GWAS summary statistics across complex diseases and traits.
    Language English
    Publishing date 2023-12-14
    Publishing country United States
    Document type Preprint
    DOI 10.1101/2023.03.24.534116
    Database MEDical Literature Analysis and Retrieval System OnLINE
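
The binary concrete prior mentioned in the abstract of this record is a continuous relaxation of a Bernoulli variable. The sketch below shows one common way to draw such samples in Python; it is an illustration under stated assumptions, not the BEATRICE implementation. The function name `sample_binary_concrete`, the temperature, and the probability 0.3 are hypothetical choices.

```python
import math
import random

def sample_binary_concrete(logit, temperature, rng):
    """Draw one relaxed Bernoulli sample in [0, 1].

    logit is the log-odds of the underlying Bernoulli variable;
    lower temperatures push samples closer to 0 or 1.
    """
    # Clamp u away from 0 and 1 so the logarithms stay finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    logistic_noise = math.log(u) - math.log(1.0 - u)
    return 1.0 / (1.0 + math.exp(-(logit + logistic_noise) / temperature))

rng = random.Random(0)
p = 0.3  # assumed inclusion probability of a single variant
logit = math.log(p / (1.0 - p))
samples = [sample_binary_concrete(logit, 0.5, rng) for _ in range(20000)]

# A sample exceeds 0.5 exactly when logit + noise > 0, which happens
# with probability p, so this fraction estimates p.
frac_on = sum(s > 0.5 for s in samples) / len(samples)
print(round(frac_on, 3))
```

Because each sample is a smooth function of the logit, gradients can pass through the sampling step, which is what makes a relaxation of this kind usable inside stochastic optimization.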

  3. Article ; Online: Democratizing long-read genome assembly.

    Kirsche, Melanie / Schatz, Michael C

    Cell systems

    2021  Volume 12, Issue 10, Page(s) 945–947

    Abstract De novo assembled genomes serve as the backbone for modern genomics. In an article in this issue of Cell Systems, Ekim et al. present the mdBG assembler that can assemble genomes 100-fold faster than previous methods, including a human genome in under 10 min, which unlocks pan-genomics for many species.
    MeSH term(s) Genome, Human/genetics ; Genomics ; High-Throughput Nucleotide Sequencing ; Humans ; Sequence Analysis, DNA
    Language English
    Publishing date 2021-10-21
    Publishing country United States
    Document type Journal Article ; Research Support, U.S. Gov't, Non-P.H.S. ; Comment
    ZDB-ID 2854138-8
    ISSN 2405-4720 ; 2405-4712
    ISSN (online) 2405-4720
    ISSN 2405-4712
    DOI 10.1016/j.cels.2021.09.010
    Database MEDical Literature Analysis and Retrieval System OnLINE

  4. Article ; Online: Approaching complete genomes, transcriptomes and epi-omes with accurate long-read sequencing.

    Kovaka, Sam / Ou, Shujun / Jenike, Katharine M / Schatz, Michael C

    Nature methods

    2023  Volume 20, Issue 1, Page(s) 12–16

    MeSH term(s) Transcriptome ; Sequence Analysis, DNA ; High-Throughput Nucleotide Sequencing
    Language English
    Publishing date 2023-01-01
    Publishing country United States
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't ; Research Support, U.S. Gov't, Non-P.H.S.
    ZDB-ID 2169522-2
    ISSN 1548-7105 ; 1548-7091
    ISSN (online) 1548-7105
    ISSN 1548-7091
    DOI 10.1038/s41592-022-01716-8
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Article ; Online: Sketching and sampling approaches for fast and accurate long read classification

    Arun Das / Michael C. Schatz

    BMC Bioinformatics, Vol 23, Iss 1, Pp 1-

    2022  Volume 23

    Abstract Background: In modern sequencing experiments, quickly and accurately identifying the sources of the reads is a crucial need. In metagenomics, where each read comes from one of potentially many members of a community, it can be important to identify the exact species the read is from. In other settings, it is important to distinguish which reads are from the targeted sample and which are from potential contaminants. In both cases, identification of the correct source of a read enables further investigation of relevant reads, while minimizing wasted work. This task is particularly challenging for long reads, which can have a substantial error rate that obscures the origins of each read. Results: Existing tools for the read classification problem are often alignment or index-based, but such methods can have large time and/or space overheads. In this work, we investigate the effectiveness of several sampling and sketching-based approaches for read classification. In these approaches, a chosen sampling or sketching algorithm is used to generate a reduced representation (a “screen”) of potential source genomes for a query readset before reads are streamed in and compared against this screen. Using a query read’s similarity to the elements of the screen, the methods predict the source of the read. Such an approach requires limited pre-processing, stores and works with only a subset of the input data, and is able to perform classification with a high degree of accuracy. Conclusions: The sampling and sketching approaches investigated include uniform sampling, methods based on MinHash and its weighted and order variants, a minimizer-based technique, and a novel clustering-based sketching approach. We demonstrate the effectiveness of these techniques both in identifying the source microbial genomes for reads from a metagenomic long read sequencing experiment, and in distinguishing between long reads from organisms of interest and potential contaminant reads. We then compare these approaches to existing alignment, ...
    Keywords Sketching ; Sampling ; Classification ; MinHash ; Metagenomics ; Computer applications to medicine. Medical informatics ; R858-859.7 ; Biology (General) ; QH301-705.5
    Subject code 004
    Language English
    Publishing date 2022-10-01T00:00:00Z
    Publisher BMC
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  6. Article ; Online: A master regulator of regeneration.

    Alonge, Michael / Schatz, Michael C

    Science (New York, N.Y.)

    2019  Volume 363, Issue 6432, Page(s) 1152–1153

    MeSH term(s) Cell Differentiation ; Regeneration
    Language English
    Publishing date 2019-03-14
    Publishing country United States
    Document type Journal Article ; Comment
    ZDB-ID 128410-1
    ISSN 1095-9203 ; 0036-8075
    ISSN (online) 1095-9203
    ISSN 0036-8075
    DOI 10.1126/science.aaw6258
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Article ; Online: Nanopore sequencing meets epigenetics.

    Schatz, Michael C

    Nature methods

    2017  Volume 14, Issue 4, Page(s) 347–348

    MeSH term(s) DNA Methylation ; Epigenesis, Genetic ; Epigenomics ; Nanopores ; Sequence Analysis, DNA
    Language English
    Publishing date 2017-03-30
    Publishing country United States
    Document type Journal Article ; Comment
    ZDB-ID 2169522-2
    ISSN 1548-7105 ; 1548-7091
    ISSN (online) 1548-7105
    ISSN 1548-7091
    DOI 10.1038/nmeth.4240
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Article ; Online: In memory of James Taylor: the birth of Galaxy.

    Nekrutenko, Anton / Schatz, Michael C

    Genome biology

    2020  Volume 21, Issue 1, Page(s) 105

    MeSH term(s) Genomics/history ; History, 21st Century ; United States
    Language English
    Publishing date 2020-04-30
    Publishing country England
    Document type Biography ; Editorial ; Historical Article
    ZDB-ID 2040529-7
    ISSN (online) 1474-760X
    DOI 10.1186/s13059-020-02016-0
    Database MEDical Literature Analysis and Retrieval System OnLINE

  9. Article ; Online: Graph genomes article collection.

    Schatz, Michael C / Cosgrove, Andrew

    Genome biology

    2019  Volume 20, Issue 1, Page(s) 25

    MeSH term(s) Computer Graphics ; Genome ; Genomics ; Humans
    Language English
    Publishing date 2019-02-01
    Publishing country England
    Document type Editorial
    ZDB-ID 2040529-7
    ISSN 1474-760X ; 1465-6914 ; 1465-6906
    ISSN (online) 1474-760X ; 1465-6914
    ISSN 1465-6906
    DOI 10.1186/s13059-019-1636-0
    Database MEDical Literature Analysis and Retrieval System OnLINE

  10. Article: Uncalled4 improves nanopore DNA and RNA modification detection via fast and accurate signal alignment.

    Kovaka, Sam / Hook, Paul W / Jenike, Katharine M / Shivakumar, Vikram / Morina, Luke B / Razaghi, Roham / Timp, Winston / Schatz, Michael C

    bioRxiv : the preprint server for biology

    2024  

    Abstract Nanopore signal analysis enables detection of nucleotide modifications from native DNA and RNA sequencing, providing both accurate genetic/transcriptomic and epigenetic information without additional library preparation. Presently, only a limited set of modifications can be directly basecalled (e.g. 5-methylcytosine), while most others require exploratory methods that often begin with alignment of nanopore signal to a nucleotide reference. We present Uncalled4, a toolkit for nanopore signal alignment, analysis, and visualization. Uncalled4 features an efficient banded signal alignment algorithm, BAM signal alignment file format, statistics for comparing signal alignment methods, and a reproducible
    Language English
    Publishing date 2024-03-10
    Publishing country United States
    Document type Preprint
    DOI 10.1101/2024.03.05.583511
    Database MEDical Literature Analysis and Retrieval System OnLINE
