LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 109 in total

  1. Article: Stepwise evolution and exceptional conservation of ORF1a/b overlap in coronaviruses.

    Mei, Han / Nekrutenko, Anton

    bioRxiv : the preprint server for biology

    2021  

    Abstract The programmed frameshift element (PFE) rerouting translation from
    Language English
    Publishing date 2021-06-15
    Publishing country United States
    Document type Preprint
    DOI 10.1101/2021.06.14.448413
    Database MEDical Literature Analysis and Retrieval System OnLINE

  2. Article ; Online: Sequencing error profiles of Illumina sequencing instruments.

    Stoler, Nicholas / Nekrutenko, Anton

    NAR genomics and bioinformatics

    2021  Volume 3, Issue 1, Page(s) lqab019

    Abstract Sequencing technology has achieved great advances in the past decade. Studies have previously shown the quality of specific instruments in controlled conditions. Here, we developed a method able to retroactively determine the error rate of most public sequencing datasets. To do this, we utilized the overlaps between reads that are a feature of many sequencing libraries. With this method, we surveyed 1943 different datasets from seven different sequencing instruments produced by Illumina. We show that among public datasets, the more expensive platforms like HiSeq and NovaSeq have a lower error rate and less variation. But we also discovered that there is great variation within each platform, with the accuracy of a sequencing experiment depending greatly on the experimenter. We show the importance of sequence context, especially the phenomenon where preceding bases bias the following bases toward the same identity. We also show the difference in patterns of sequence bias between instruments. Contrary to expectations based on the underlying chemistry, HiSeq X Ten and NovaSeq 6000 share notable exceptions to the preceding-base bias. Our results demonstrate the importance of the specific circumstances of every sequencing experiment, and the importance of evaluating the quality of each one.
    Language English
    Publishing date 2021-03-27
    Publishing country England
    Document type Journal Article
    ISSN 2631-9268
    ISSN (online) 2631-9268
    DOI 10.1093/nargab/lqab019
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Article ; Online: In memory of James Taylor: the birth of Galaxy.

    Nekrutenko, Anton / Schatz, Michael C

    Genome biology

    2020  Volume 21, Issue 1, Page(s) 105

    MeSH term(s) Genomics/history ; History, 21st Century ; United States
    Language English
    Publishing date 2020-04-30
    Publishing country England
    Document type Biography ; Editorial ; Historical Article
    ZDB-ID 2040529-7
    ISSN 1474-760X
    ISSN (online) 1474-760X
    DOI 10.1186/s13059-020-02016-0
    Database MEDical Literature Analysis and Retrieval System OnLINE

  4. Article ; Online: Stepwise evolution and exceptional conservation of ORF1a/b overlap in coronaviruses

    Mei, Han / Nekrutenko, Anton

    bioRxiv

    Abstract The programmed frameshift element (PFE) rerouting translation from ORF1a to ORF1b is essential for propagation of coronaviruses. A combination of genomic features that make up PFE--the overlap between the two reading frames, a slippery sequence, as well as an ensemble of complex secondary structure elements--puts severe constraints on this region as most possible nucleotide substitution may disrupt one or more of these elements. The vast amount of SARS-CoV-2 sequencing data generated within the past year provides an opportunity to assess evolutionary dynamics of PFE in great detail. Here we performed a comparative analysis of all available coronaviral genomic data available to date. We show that the overlap between ORF1a and b evolved as a set of discrete 7, 16, 22, 25, and 31 nucleotide stretches with a well defined phylogenetic specificity. We further examined sequencing data from over 350,000 complete genomes and 55,000 raw read datasets to demonstrate exceptional conservation of the PFE region.
    Keywords covid19
    Language English
    Publishing date 2021-06-15
    Publisher Cold Spring Harbor Laboratory
    Document type Article ; Online
    DOI 10.1101/2021.06.14.448413
    Database COVID19

  5. Article ; Online: Reproducible and accessible analysis of transposon insertion sequencing in Galaxy for qualitative essentiality analyses.

    Larivière, Delphine / Wickham, Laura / Keiler, Kenneth / Nekrutenko, Anton

    BMC microbiology

    2021  Volume 21, Issue 1, Page(s) 168

    Abstract Background: Significant progress has been made in advancing and standardizing tools for human genomic and biomedical research. Yet, the field of next-generation sequencing (NGS) analysis for microorganisms (including multiple pathogens) remains fragmented, lacks accessible and reusable tools, is hindered by local computational resource limitations, and does not offer widely accepted standards. One such "problem area" is the analysis of Transposon Insertion Sequencing (TIS) data. TIS allows probing of almost the entire genome of a microorganism by introducing random insertions of transposon-derived constructs. The impact of the insertions on the survival and growth under specific conditions provides precise information about genes affecting specific phenotypic characteristics. A wide array of tools has been developed to analyze TIS data. Among the variety of options available, it is often difficult to identify which one can provide a reliable and reproducible analysis.
    Results: Here we sought to understand the challenges and propose reliable practices for the analysis of TIS experiments. Using data from two recent TIS studies, we have developed a series of workflows that include multiple tools for data de-multiplexing, promoter sequence identification, transposon flank alignment, and read count repartition across the genome. Particular attention was paid to quality control procedures, such as determining the optimal tool parameters for the analysis and removal of contamination.
    Conclusions: Our work provides an assessment of the currently available tools for TIS data analysis. It offers ready to use workflows that can be invoked by anyone in the world using our public Galaxy platform (https://usegalaxy.org). To lower the entry barriers, we have also developed interactive tutorials explaining details of TIS data analysis procedures at https://bit.ly/gxy-tis.
    MeSH term(s) Base Sequence ; DNA Transposable Elements ; Escherichia coli/genetics ; Gene Library ; Genome, Bacterial ; Genomics/instrumentation ; Genomics/methods ; Genomics/standards ; Mutagenesis, Insertional ; Promoter Regions, Genetic ; Software ; Staphylococcus aureus/genetics
    Chemical Substances DNA Transposable Elements
    Language English
    Publishing date 2021-06-05
    Publishing country England
    Document type Evaluation Study ; Journal Article ; Research Support, N.I.H., Extramural ; Research Support, U.S. Gov't, Non-P.H.S.
    ZDB-ID 2041505-9
    ISSN 1471-2180
    ISSN (online) 1471-2180
    DOI 10.1186/s12866-021-02184-4
    Database MEDical Literature Analysis and Retrieval System OnLINE

  6. Article ; Online: Stepwise Evolution and Exceptional Conservation of ORF1a/b Overlap in Coronaviruses.

    Mei, Han / Kosakovsky Pond, Sergei / Nekrutenko, Anton

    Molecular biology and evolution

    2021  Volume 38, Issue 12, Page(s) 5678–5684

    Abstract The programmed frameshift element (PFE) rerouting translation from ORF1a to ORF1b is essential for the propagation of coronaviruses. The combination of genomic features that make up PFE--the overlap between the two reading frames, a slippery sequence, as well as an ensemble of complex secondary structure elements--places severe constraints on this region as most possible nucleotide substitution may disrupt one or more of these elements. The vast amount of SARS-CoV-2 sequencing data generated within the past year provides an opportunity to assess the evolutionary dynamics of PFE in great detail. Here, we performed a comparative analysis of all available coronaviral genomic data available to date. We show that the overlap between ORF1a and ORF1b evolved as a set of discrete 7, 16, 22, 25, and 31 nucleotide stretches with a well-defined phylogenetic specificity. We further examined sequencing data from over 1,500,000 complete genomes and 55,000 raw read data sets to demonstrate exceptional conservation and detect signatures of selection within the PFE region.
    MeSH term(s) Coronavirus/genetics ; Nucleotides ; Open Reading Frames ; Phylogeny ; SARS-CoV-2/genetics
    Chemical Substances Nucleotides
    Language English
    Publishing date 2021-09-01
    Publishing country United States
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, U.S. Gov't, Non-P.H.S.
    ZDB-ID 998579-7
    ISSN 0737-4038
    ISSN (online) 1537-1719
    DOI 10.1093/molbev/msab265
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Article ; Online: Predicting runtimes of bioinformatics tools based on historical data: five years of Galaxy usage.

    Tyryshkina, Anastasia / Coraor, Nate / Nekrutenko, Anton

    Bioinformatics (Oxford, England)

    2019  Volume 35, Issue 18, Page(s) 3453–3460

    Abstract Motivation: One of the many technical challenges that arise when scheduling bioinformatics analyses at scale is determining the appropriate amount of memory and processing resources. Both over- and under-allocation lead to an inefficient use of computational infrastructure. Over-allocation locks resources that could otherwise be used for other analyses. Under-allocation causes job failure and requires analyses to be repeated with a larger memory or runtime allowance. We address this challenge by using a historical dataset of bioinformatics analyses run on the Galaxy platform to demonstrate the feasibility of an online service for resource requirement estimation.
    Results: Here we introduced the Galaxy job run dataset and tested popular machine learning models on the task of resource usage prediction. We include three popular forest models: the extra trees regressor, the gradient boosting regressor and the random forest regressor, and find that random forests perform best in the runtime prediction task. We also present two methods of choosing walltimes for previously unseen jobs. Quantile regression forests are more accurate in their predictions, and grant the ability to improve performance by changing the confidence of the estimates. However, the sizes of the confidence intervals are variable and cannot be absolutely constrained. Random forest classifiers address this problem by providing control over the size of the prediction intervals with an accuracy that is comparable to that of the regressor. We show that estimating the memory requirements of a job is possible using the same methods, which as far as we know, has not been done before. Such estimation can be highly beneficial for accurate resource allocation.
    Availability and implementation: Source code available at https://github.com/atyryshkina/algorithm-performance-analysis, implemented in Python.
    Supplementary information: Supplementary data are available at Bioinformatics online.
    MeSH term(s) Computational Biology ; Machine Learning ; Software
    Language English
    Publishing date 2019-02-19
    Publishing country England
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't ; Research Support, U.S. Gov't, Non-P.H.S.
    ZDB-ID 1422668-6
    ISSN 1367-4803
    ISSN (online) 1367-4811
    DOI 10.1093/bioinformatics/btz054
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Article ; Online: Erratum: Increased yields of duplex sequencing data by a series of quality control tools.

    Povysil, Gundula / Heinzl, Monika / Salazar, Renato / Stoler, Nicholas / Nekrutenko, Anton / Tiemann-Boege, Irene

    NAR genomics and bioinformatics

    2021  Volume 3, Issue 1, Page(s) lqab014

    Abstract [This corrects the article DOI: 10.1093/nargab/lqab002.].
    Language English
    Publishing date 2021-03-01
    Publishing country England
    Document type Published Erratum
    ISSN 2631-9268
    ISSN (online) 2631-9268
    DOI 10.1093/nargab/lqab014
    Database MEDical Literature Analysis and Retrieval System OnLINE

  9. Article ; Online: Increased yields of duplex sequencing data by a series of quality control tools.

    Povysil, Gundula / Heinzl, Monika / Salazar, Renato / Stoler, Nicholas / Nekrutenko, Anton / Tiemann-Boege, Irene

    NAR genomics and bioinformatics

    2021  Volume 3, Issue 1, Page(s) lqab002

    Abstract Duplex sequencing is currently the most reliable method to identify ultra-low frequency DNA variants by grouping sequence reads derived from the same DNA molecule into families with information on the forward and reverse strand. However, only a small proportion of reads are assembled into duplex consensus sequences (DCS), and reads with potentially valuable information are discarded at different steps of the bioinformatics pipeline, especially reads without a family. We developed a bioinformatics toolset that analyses the tag and family composition in order to understand data loss and implement modifications to maximize the data output for the variant calling. Specifically, our tools show that tags contain polymerase chain reaction and sequencing errors that contribute to data loss and lower DCS yields. Our tools also identified chimeras, which likely reflect barcode collisions. Finally, we also developed a tool that re-examines variant calls from raw reads and provides different summary data that categorizes the confidence level of a variant call by a tier-based system. With this tool, we can include reads without a family and check the reliability of the call, which substantially increases the sequencing depth for variant calling, a particularly important advantage for low-input samples or low-coverage regions.
    Language English
    Publishing date 2021-02-09
    Publishing country England
    Document type Journal Article
    ISSN 2631-9268
    ISSN (online) 2631-9268
    DOI 10.1093/nargab/lqab002
    Database MEDical Literature Analysis and Retrieval System OnLINE

  10. Article ; Online: No more business as usual: agile and effective responses to emerging pathogen threats require open data and open analytics

    Nekrutenko, Anton / Kosakovsky Pond, Sergei L

    bioRxiv

    Abstract The current state of much of the Wuhan pneumonia virus (COVID-19) research shows a regrettable lack of data sharing and considerable analytical obfuscation. This impedes global research cooperation, which is essential for tackling public health emergencies, and requires unimpeded access to data, analysis tools, and computational infrastructure. Here we show that community efforts in developing open analytical software tools over the past ten years, combined with national investments into scientific computational infrastructure, can overcome these deficiencies and provide an accessible platform for tackling global health emergencies in an open and transparent manner. Specifically, we use all COVID-19 genomic data available in the public domain so far to (1) underscore the importance of access to raw data and to (2) demonstrate that existing community efforts in curation and deployment of biomedical software can reliably support rapid, reproducible research during global health crises. All our analyses are fully documented at https://github.com/galaxyproject/SARS-CoV-2.
    Keywords covid19
    Publisher BioRxiv; MedRxiv
    Document type Article ; Online
    DOI 10.1101/2020.02.21.959973
    Database COVID19
