LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-3 of 3

  1. Article ; Online: ViPRA-Haplo: De Novo Reconstruction of Viral Populations Using Paired End Sequencing Data.

    Li, Weiling / Malhotra, Raunaq / Wu, Steven / Jha, Manjari / Rodrigo, Allen / Poss, Mary / Acharya, Raj

    IEEE/ACM transactions on computational biology and bioinformatics

    2024  Volume PP

    Abstract: We present ViPRA-Haplo, a de novo strain-specific assembly workflow for reconstructing viral haplotypes in a viral population from paired-end next-generation sequencing (NGS) data. The proposed Viral Path Reconstruction Algorithm (ViPRA) uses the pairing information of reads to generate a subset of paths from a De Bruijn graph of the reads. The paths generated by ViPRA overestimate the set of true contigs, so we propose two refinement methods to obtain an optimal set of contigs representing viral haplotypes. The first method clusters the paths reconstructed by ViPRA by sequence similarity using VSEARCH [1], while the second, MLEHaplo, generates a maximum likelihood estimate of the viral population. We evaluated our pipeline on simulated and real viral quasispecies data from HIV, and on real data from SARS-CoV-2. Experimental results show that ViPRA-Haplo, although it still overestimates the number of true contigs, outperforms the existing tool PEHaplo, providing up to 9% better genome coverage on real HIV data. ViPRA-Haplo also retains higher diversity of the viral population, as shown by a higher percentage of contigs shorter than 1000 base pairs (bp) that contain k-mers with counts below 100 (representing rarer sequences); such contigs are absent from PEHaplo's output. For SARS-CoV-2 sequencing data, ViPRA-Haplo reconstructs contigs that cover more than 90% of the reference genome and that could be used to validate known SARS-CoV-2 strains in the sequencing data.
    Language English
    Publishing date 2024-03-07
    Publishing country United States
    Document type Journal Article
    ISSN 1557-9964
    ISSN (online) 1557-9964
    DOI 10.1109/TCBB.2024.3374595
    Database MEDical Literature Analysis and Retrieval System OnLINE
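    The ViPRA-Haplo abstract builds on a De Bruijn graph of reads and treats k-mers with low counts as evidence of rarer sequences. The sketch below shows only that generic data structure, under stated assumptions: it is not ViPRA itself, it ignores the read-pairing information ViPRA uses, and the function names, the toy reads, k = 4, and the low-count threshold of 2 are hypothetical illustration values (the abstract's cutoff of 100 refers to real sequencing depth).

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer (length-k substring) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_de_bruijn(kmer_counts):
    """Build a De Bruijn graph: nodes are (k-1)-mers, and each distinct k-mer
    is a directed edge from its prefix to its suffix, weighted by its count."""
    graph = defaultdict(dict)
    for kmer, count in kmer_counts.items():
        graph[kmer[:-1]][kmer[1:]] = count
    return graph


def rare_kmers(kmer_counts, threshold):
    """Return k-mers seen fewer than `threshold` times (hypothetical cutoff)."""
    return {kmer for kmer, count in kmer_counts.items() if count < threshold}


# Toy reads from two hypothetical variants that differ at one position.
reads = ["ACGTACGA", "ACGTACGA", "ACGTTCGA"]
counts = count_kmers(reads, k=4)
graph = build_de_bruijn(counts)
print(sorted(graph["ACG"].items()))
print(sorted(rare_kmers(counts, threshold=2)))
```

    In this toy input, node CGT branches to GTA (weight 2) and GTT (weight 1); branches like this are where variant paths diverge, and path-based methods such as the one the abstract describes enumerate walks through them to form candidate haplotypes. The k-mers unique to the minority read fall below the threshold.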

  2. Article: A random forest classifier for detecting rare variants in NGS data from viral populations.

    Malhotra, Raunaq / Jha, Manjari / Poss, Mary / Acharya, Raj

    Computational and structural biotechnology journal

    2017  Volume 15, Page(s) 388–395

    Abstract: We propose a random forest classifier for distinguishing rare variants from sequencing errors in next-generation sequencing (NGS) data from viral populations. The method utilizes counts of varying length of …
    Language English
    Publishing date 2017-07-19
    Publishing country Netherlands
    Document type Journal Article
    ZDB-ID 2694435-2
    ISSN 2001-0370
    DOI 10.1016/j.csbj.2017.07.001
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Article ; Online: A Generalized Lattice based Probabilistic Approach for Metagenomic Clustering.

    Jha, Manjari / Malhotra, Raunaq / Acharya, Raj

    IEEE/ACM transactions on computational biology and bioinformatics

    2016  Volume 14, Issue 4, Page(s) 749–761

    Abstract: Metagenomics involves the analysis of genomes of microorganisms sampled directly from their environment. Next-generation sequencing allows high-throughput sampling of small segments of the genomes in a metagenome to generate reads. To study the properties and relationships of the microorganisms present, reads from unknown species can be clustered based on their inherent composition. We propose a two-dimensional lattice-based probabilistic model for clustering metagenomic datasets. The occurrence of a species in the metagenome is estimated using a lattice of probability distributions over small genomic sequences (words). The two dimensions represent distributions over different word sizes and different groups of words, respectively. The lattice structure lets a node draw additional support from its neighbors when the probabilistic support for a species under the current node's parameters is deemed insufficient. We also show that our algorithm converges. We test our algorithm on simulated metagenomic data containing bacterial species and observe more than 85% precision. We also evaluate it on an in vitro-simulated bacterial metagenome and on human patient data, and show better clustering than other algorithms, even for short reads and varied abundance. The software and datasets can be downloaded from https://github.com/lattclus/lattice-metage.
    Language English
    Publishing date 2016-05-05
    Publishing country United States
    Document type Journal Article
    ISSN 1557-9964
    ISSN (online) 1557-9964
    DOI 10.1109/TCBB.2016.2563422
    Database MEDical Literature Analysis and Retrieval System OnLINE
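    The lattice abstract estimates a species' presence from probability distributions over short words of several sizes. The sketch below is a deliberately simplified stand-in, not the authors' lattice model: it fits one smoothed multinomial distribution per word size for each candidate cluster and assigns a read to the cluster with the highest summed log-likelihood. It omits word groups and the neighbor support the lattice provides; the names WordModel, train, and assign, the pseudocount of 1, the ACGT-only alphabet, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter

ALPHABET_SIZE = 4  # assumption: reads contain only A, C, G, T


def kmers(seq, k):
    """All overlapping words of length k in a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class WordModel:
    """Smoothed multinomial distribution over words of a single size k."""

    def __init__(self, sequences, k, pseudocount=1.0):
        self.k = k
        self.pseudocount = pseudocount
        self.counts = Counter(w for s in sequences for w in kmers(s, k))
        # Add-pseudocount smoothing over all ALPHABET_SIZE**k possible words.
        self.total = sum(self.counts.values()) + pseudocount * ALPHABET_SIZE ** k

    def log_prob(self, word):
        return math.log((self.counts[word] + self.pseudocount) / self.total)

    def score(self, read):
        """Log-likelihood of a read's words, treated as independent draws."""
        return sum(self.log_prob(w) for w in kmers(read, self.k))


def train(clusters, word_sizes):
    """Fit one WordModel per word size for each labelled cluster."""
    return {label: [WordModel(seqs, k) for k in word_sizes]
            for label, seqs in clusters.items()}


def assign(read, models):
    """Pick the cluster whose models give the highest summed log-likelihood."""
    return max(models, key=lambda label: sum(m.score(read) for m in models[label]))


clusters = {"species_A": ["ATATATATATAT"], "species_B": ["GCGCGCGCGCGC"]}
models = train(clusters, word_sizes=[2, 3])
print(assign("ATATAT", models))
print(assign("GCGCGC", models))
```

    Summing log-likelihoods across word sizes treats each size as independent evidence, which is an assumption of this sketch; the lattice described in the abstract instead lets a node fall back on its neighbors when its own support is insufficient.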
