LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 44

  1. Article ; Online: Faster SARS-CoV-2 sequence validation and annotation for GenBank using VADR.

    Nawrocki, Eric P

    NAR genomics and bioinformatics

    2023  Volume 5, Issue 1, Page(s) lqad002

    Abstract In 2020 and 2021, >1.5 million SARS-CoV-2 sequences were submitted to GenBank. The initial version (v1.0) of the VADR (Viral Annotation DefineR) software package that GenBank uses to automatically validate and annotate incoming viral sequences is too slow and memory intensive to process many thousands of SARS-CoV-2 sequences in a reasonable amount of time. Additionally, long stretches of ambiguous N nucleotides, which are common in many SARS-CoV-2 sequences, prevent VADR from accurate validation and annotation. VADR has been updated to more accurately and rapidly annotate SARS-CoV-2 sequences. Stretches of consecutive Ns are now identified and temporarily replaced with expected nucleotides to facilitate processing, and the slowest steps have been overhauled using
    Language English
    Publishing date 2023-01-20
    Publishing country England
    Document type Journal Article
    ISSN 2631-9268
    ISSN (online) 2631-9268
    DOI 10.1093/nargab/lqad002
    Database MEDical Literature Analysis and Retrieval System OnLINE

  2. Article: Faster SARS-CoV-2 sequence validation and annotation for GenBank using VADR.

    Nawrocki, Eric P

    bioRxiv : the preprint server for biology

    2022  

    Abstract Background: In 2020 and 2021, more than 1.5 million SARS-CoV-2 sequences were submitted to GenBank. The initial version (v1.0) of the VADR (Viral Annotation DefineR) software package that GenBank uses to automatically validate and annotate incoming viral sequences is too slow and memory intensive to process many thousands of SARS-CoV-2 sequences in a reasonable amount of time. Additionally, long stretches of ambiguous N nucleotides, which are common in many SARS-CoV-2 sequences, prevent VADR from accurate validation and annotation.
    Results: VADR has been updated to more accurately and rapidly annotate SARS-CoV-2 sequences. Stretches of consecutive Ns are now identified and temporarily replaced with expected nucleotides to facilitate processing, and the slowest steps have been overhauled using
    Conclusion: VADR is now nearly 1000 times faster than it was in early 2020 for processing SARS-CoV-2 sequences submitted to GenBank. It has been used to screen and annotate more than 1.5 million SARS-CoV-2 sequences since June 2020, and it is now efficient enough to cope with the current rate of hundreds of thousands of submitted sequences per month. Version 1.4.1 is freely available ( https://github.com/ncbi/vadr ) for local installation and use.
    Language English
    Publishing date 2022-04-27
    Publishing country United States
    Document type Preprint
    DOI 10.1101/2022.04.25.489427
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Article ; Online: Faster SARS-CoV-2 sequence validation and annotation for GenBank using VADR

    Nawrocki, Eric P

    bioRxiv

    Abstract Background: In 2020 and 2021, more than 1.5 million SARS-CoV-2 sequences were submitted to GenBank. The initial version (v1.0) of the VADR (Viral Annotation DefineR) software package that GenBank uses to automatically validate and annotate incoming viral sequences is too slow and memory intensive to process many thousands of SARS-CoV-2 sequences in a reasonable amount of time. Additionally, long stretches of ambiguous N nucleotides, which are common in many SARS-CoV-2 sequences, prevent VADR from accurate validation and annotation.
    Results: VADR has been updated to more accurately and rapidly annotate SARS-CoV-2 sequences. Stretches of consecutive Ns are now identified and temporarily replaced with expected nucleotides to facilitate processing, and the slowest steps have been overhauled using blastn and glsearch, increasing speed, reducing the memory requirement from 64Gb to 2Gb per thread, and allowing simple, coarse-grained parallelization on multiple processors per host.
    Conclusion: VADR is now nearly 1000 times faster than it was in early 2020 for processing SARS-CoV-2 sequences submitted to GenBank. It has been used to screen and annotate more than 1.5 million SARS-CoV-2 sequences since June 2020, and it is now efficient enough to cope with the current rate of hundreds of thousands of submitted sequences per month. Version 1.4.1 is freely available (https://github.com/ncbi/vadr) for local installation and use.
    Keywords covid19
    Language English
    Publishing date 2022-04-27
    Publisher Cold Spring Harbor Laboratory
    Document type Article ; Online
    DOI 10.1101/2022.04.25.489427
    Database COVID19
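
The abstract above mentions that stretches of consecutive Ns are identified and temporarily replaced with expected nucleotides. The sketch below illustrates that general idea only; it is not VADR's implementation. The function names (mask_n_runs, restore_n_runs), the min_len threshold, and the simplifying assumption that the query and reference have the same length and line up position for position are all hypothetical choices made for this example.

```python
import re


def mask_n_runs(seq, ref, min_len=5):
    """Replace each run of at least min_len Ns in seq with the reference bases.

    Assumes (for illustration only) that seq and ref have the same length and
    that position i in seq corresponds to position i in ref.
    Returns the filled sequence and the list of (start, end) runs replaced.
    """
    if len(seq) != len(ref):
        raise ValueError("this sketch requires seq and ref of equal length")
    out = list(seq)
    runs = []
    for match in re.finditer(r"N{%d,}" % min_len, seq):
        start, end = match.span()
        out[start:end] = ref[start:end]  # borrow the expected nucleotides
        runs.append((start, end))
    return "".join(out), runs


def restore_n_runs(seq, runs):
    """Put the Ns back at the recorded positions once processing is done."""
    out = list(seq)
    for start, end in runs:
        out[start:end] = "N" * (end - start)
    return "".join(out)
```

Recording the replaced intervals is what makes the substitution temporary: downstream steps can work on the filled sequence, and the original Ns can be restored afterwards with restore_n_runs.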

  4. Article: Influenza sequence validation and annotation using VADR.

    Calhoun, Vincent C / Hatcher, Eneida L / Yankie, Linda / Nawrocki, Eric P

    bioRxiv : the preprint server for biology

    2024  

    Abstract Tens of thousands of influenza sequences are deposited into the GenBank database each year. The software tool FLAN has been used by GenBank since 2007 to validate and annotate incoming influenza sequence submissions, and has been publicly available as a webserver but not as a standalone tool. VADR is a general sequence validation and annotation software package used by GenBank for Norovirus, Dengue virus and SARS-CoV-2 virus sequence processing that is available as a standalone tool. We have created VADR influenza models based on the FLAN reference sequences and adapted VADR to accurately annotate influenza sequences. VADR and FLAN show consistent results on the vast majority of influenza sequences, and when they disagree VADR is usually correct. VADR can also accurately process influenza D sequences as well as influenza A H17, H18, H19, N10 and N11 subtype sequences, which FLAN cannot. VADR 1.6.3 and the associated influenza models are now freely available for users to download and use.
    Language English
    Publishing date 2024-03-25
    Publishing country United States
    Document type Preprint
    DOI 10.1101/2024.03.21.585980
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Article ; Online: Annotating functional RNAs in genomes using Infernal.

    Nawrocki, Eric P

    Methods in molecular biology (Clifton, N.J.)

    2014  Volume 1097, Page(s) 163–197

    Abstract Many different types of functional non-coding RNAs participate in a wide range of important cellular functions but the large majority of these RNAs are not routinely annotated in published genomes. Several programs have been developed for identifying RNAs, including specific tools tailored to a particular RNA family as well as more general ones designed to work for any family. Many of these tools utilize covariance models (CMs), statistical models of the conserved sequence, and structure of an RNA family. In this chapter, as an illustrative example, the Infernal software package and CMs from the Rfam database are used to identify RNAs in the genome of the archaeon Methanobrevibacter ruminantium, uncovering some additional RNAs not present in the genome's initial annotation. Analysis of the results and comparison with family-specific methods demonstrate some important strengths and weaknesses of this general approach.
    MeSH term(s) Computational Biology/methods ; Genomics/methods ; Molecular Sequence Annotation/methods ; RNA/chemistry ; RNA/genetics ; Sequence Analysis, RNA/methods ; Software
    Chemical Substances RNA (63231-63-0)
    Language English
    Publishing date 2014
    Publishing country United States
    Document type Journal Article
    ISSN 1940-6029
    ISSN (online) 1940-6029
    DOI 10.1007/978-1-62703-709-9_9
    Database MEDical Literature Analysis and Retrieval System OnLINE

  6. Article ; Online: Group I introns are widespread in archaea.

    Nawrocki, Eric P / Jones, Thomas A / Eddy, Sean R

    Nucleic acids research

    2018  Volume 46, Issue 15, Page(s) 7970–7976

    Abstract Group I catalytic introns have been found in bacterial, viral, organellar, and some eukaryotic genomes, but not in archaea. All known archaeal introns are bulge-helix-bulge (BHB) introns, with the exception of a few group II introns. It has been proposed that BHB introns arose from extinct group I intron ancestors, much like eukaryotic spliceosomal introns are thought to have descended from group II introns. However, group I introns have little sequence conservation, making them difficult to detect with standard sequence similarity searches. Taking advantage of recent improvements in a computational homology search method that accounts for both conserved sequence and RNA secondary structure, we have identified 39 group I introns in a wide range of archaeal phyla, including examples of group I introns and BHB introns in the same host gene.
    MeSH term(s) Archaea/classification ; Archaea/enzymology ; Archaea/genetics ; Base Sequence ; Introns/genetics ; Nucleic Acid Conformation ; Phylogeny ; RNA, Archaeal/chemistry ; RNA, Archaeal/classification ; RNA, Archaeal/genetics ; RNA, Catalytic/chemistry ; RNA, Catalytic/classification ; RNA, Catalytic/genetics ; Species Specificity
    Chemical Substances RNA, Archaeal ; RNA, Catalytic
    Language English
    Publishing date 2018-05-22
    Publishing country England
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, N.I.H., Intramural ; Research Support, Non-U.S. Gov't
    ZDB-ID 186809-3
    ISSN (online) 1362-4962 ; 1362-4954
    ISSN 0301-5610 ; 0305-1048
    DOI 10.1093/nar/gky414
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Article ; Online: Ribovore: ribosomal RNA sequence analysis for GenBank submissions and database curation.

    Schäffer, Alejandro A / McVeigh, Richard / Robbertse, Barbara / Schoch, Conrad L / Johnston, Anjanette / Underwood, Beverly A / Karsch-Mizrachi, Ilene / Nawrocki, Eric P

    BMC bioinformatics

    2021  Volume 22, Issue 1, Page(s) 400

    Abstract Background: The DNA sequences encoding ribosomal RNA genes (rRNAs) are commonly used as markers to identify species, including in metagenomics samples that may combine many organismal communities. The 16S small subunit ribosomal RNA (SSU rRNA) gene is typically used to identify bacterial and archaeal species. The nuclear 18S SSU rRNA gene, and 28S large subunit (LSU) rRNA gene have been used as DNA barcodes and for phylogenetic studies in different eukaryote taxonomic groups. Because of their popularity, the National Center for Biotechnology Information (NCBI) receives a disproportionate number of rRNA sequence submissions and BLAST queries. These sequences vary in quality, length, origin (nuclear, mitochondria, plastid), and organism source and can represent any region of the ribosomal cistron.
    Results: To improve the timely verification of quality, origin and loci boundaries, we developed Ribovore, a software package for sequence analysis of rRNA sequences. The ribotyper and ribosensor programs are used to validate incoming sequences of bacterial and archaeal SSU rRNA. The ribodbmaker program is used to create high-quality datasets of rRNAs from different taxonomic groups. Key algorithmic steps include comparing candidate sequences against rRNA sequence profile hidden Markov models (HMMs) and covariance models of rRNA sequence and secondary-structure conservation, as well as other tests. Nine freely available blastn rRNA databases created and maintained with Ribovore are used for checking incoming GenBank submissions and used by the blastn browser interface at NCBI. Since 2018, Ribovore has been used to analyze more than 50 million prokaryotic SSU rRNA sequences submitted to GenBank, and to select at least 10,435 fungal rRNA RefSeq records from type material of 8350 taxa.
    Conclusion: Ribovore combines single-sequence and profile-based methods to improve GenBank processing and analysis of rRNA sequences. It is a standalone, portable, and extensible software package for the alignment, classification and validation of rRNA sequences. Researchers planning on submitting SSU rRNA sequences to GenBank are encouraged to download and use Ribovore to analyze their sequences prior to submission to determine which sequences are likely to be automatically accepted into GenBank.
    MeSH term(s) DNA, Ribosomal ; Databases, Nucleic Acid ; Phylogeny ; RNA, Ribosomal ; RNA, Ribosomal, 16S/genetics ; RNA, Ribosomal, 18S/genetics ; Sequence Analysis, RNA
    Chemical Substances DNA, Ribosomal ; RNA, Ribosomal ; RNA, Ribosomal, 16S ; RNA, Ribosomal, 18S
    Language English
    Publishing date 2021-08-12
    Publishing country England
    Document type Journal Article
    ZDB-ID 2041484-5
    ISSN (online) 1471-2105
    ISSN 1471-2105
    DOI 10.1186/s12859-021-04316-z
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Article ; Online: Infernal 1.1: 100-fold faster RNA homology searches.

    Nawrocki, Eric P / Eddy, Sean R

    Bioinformatics (Oxford, England)

    2013  Volume 29, Issue 22, Page(s) 2933–2935

    Abstract Summary: Infernal builds probabilistic profiles of the sequence and secondary structure of an RNA family called covariance models (CMs) from structurally annotated multiple sequence alignments given as input. Infernal uses CMs to search for new family members in sequence databases and to create potentially large multiple sequence alignments. Version 1.1 of Infernal introduces a new filter pipeline for RNA homology search based on accelerated profile hidden Markov model (HMM) methods and HMM-banded CM alignment methods. This enables ∼100-fold acceleration over the previous version and ∼10 000-fold acceleration over exhaustive non-filtered CM searches.
    Availability: Source code, documentation and the benchmark are downloadable from http://infernal.janelia.org. Infernal is freely licensed under the GNU GPLv3 and should be portable to any POSIX-compliant operating system, including Linux and Mac OS/X. Documentation includes a user's guide with a tutorial, a discussion of file formats and user options and additional details on methods implemented in the software.
    Contact: nawrockie@janelia.hhmi.org
    MeSH term(s) Algorithms ; Nucleic Acid Conformation ; RNA/chemistry ; Sequence Alignment/methods ; Sequence Analysis, RNA ; Sequence Homology, Nucleic Acid ; Software
    Chemical Substances RNA (63231-63-0)
    Language English
    Publishing date 2013-09-04
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 1422668-6
    ISSN (online) 1367-4811
    ISSN 1367-4803
    DOI 10.1093/bioinformatics/btt509
    Database MEDical Literature Analysis and Retrieval System OnLINE

  9. Article ; Online: Special focus: bioinformatics.

    Nawrocki, Eric P / Burge, Sarah W

    RNA biology

    2013  Volume 10, Issue 7, Page(s) 1160

    MeSH term(s) Computational Biology/methods ; RNA/chemistry ; RNA/physiology
    Chemical Substances RNA (63231-63-0)
    Language English
    Publishing date 2013-08-17
    Publishing country United States
    Document type Editorial
    ISSN 1555-8584
    ISSN (online) 1555-8584
    DOI 10.4161/rna.25606
    Database MEDical Literature Analysis and Retrieval System OnLINE

  10. Article ; Online: Computational identification of functional RNA homologs in metagenomic data.

    Nawrocki, Eric P / Eddy, Sean R

    RNA biology

    2013  Volume 10, Issue 7, Page(s) 1170–1179

    Abstract A key step toward understanding a metagenomics data set is the identification of functional sequence elements within it, such as protein coding genes and structural RNAs. Relative to protein coding genes, structural RNAs are more difficult to identify because of their reduced alphabet size, lack of open reading frames, and short length. Infernal is a software package that implements "covariance models" (CMs) for RNA homology search, which harness both sequence and structural conservation when searching for RNA homologs. Thanks to the added statistical signal inherent in the secondary structure conservation of many RNA families, Infernal is more powerful than sequence-only based methods such as BLAST and profile HMMs. Together with the Rfam database of CMs, Infernal is a useful tool for identifying RNAs in metagenomics data sets.
    MeSH term(s) Algorithms ; Computational Biology/methods ; Databases, Nucleic Acid ; Metagenomics ; Nucleic Acid Conformation ; RNA/chemistry ; RNA/genetics ; Search Engine ; Sequence Homology, Nucleic Acid ; Software
    Chemical Substances RNA (63231-63-0)
    Language English
    Publishing date 2013-05-20
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't ; Review
    ISSN 1555-8584
    ISSN (online) 1555-8584
    DOI 10.4161/rna.25038
    Database MEDical Literature Analysis and Retrieval System OnLINE
