LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 18

  1. Article ; Online: Collection and curation of prokaryotic genome assemblies from type strains at NCBI.

    Kannan, Sivakumar / Sharma, Shobha / Ciufo, Stacy / Clark, Karen / Turner, Seán / Kitts, Paul A / Schoch, Conrad L / DiCuccio, Michael / Kimchi, Avi

    International journal of systematic and evolutionary microbiology

    2023  Volume 73, Issue 1

    Abstract The public sequence databases are entrusted with the dual responsibility of providing an accessible archive to all submitters and supporting data reliability and its re-use to all users. Genomes from type materials can act as an unambiguous reference for a taxonomic name and play an important role in comparative genomics, especially for taxon verification or reclassification. The National Center for Biotechnology Information (NCBI) collects and curates information on prokaryotic type strains and genomes from type strains. The average nucleotide identity (ANI)-based quality control processes introduced at NCBI to verify the genomes from type strains and improve related sequence records are detailed here. Using the curated genomes from type strains as reference, the taxonomy of over 1.1 million GenBank genomes were verified and the taxonomy of over 7000 new submissions before acceptance to GenBank and over 1800 existing genomes in GenBank were reclassified.
    MeSH term(s) Sequence Analysis, DNA ; Reproducibility of Results ; RNA, Ribosomal, 16S/genetics ; Phylogeny ; Base Composition ; DNA, Bacterial/genetics ; Bacterial Typing Techniques ; Fatty Acids/chemistry ; Databases, Nucleic Acid
    Chemical Substances RNA, Ribosomal, 16S ; DNA, Bacterial ; Fatty Acids
    Language English
    Publishing date 2023-02-20
    Publishing country England
    Document type Journal Article
    ZDB-ID 2002336-4
    ISSN 1466-5034 ; 1466-5026
    ISSN (online) 1466-5034
    ISSN 1466-5026
    DOI 10.1099/ijsem.0.005707
    Database MEDical Literature Analysis and Retrieval System OnLINE

  2. Article: VecScreen_plus_taxonomy: imposing a tax(onomy) increase on vector contamination screening

    Schäffer, Alejandro A / Nawrocki, Eric P / Choi, Yoon / Kitts, Paul A / Karsch-Mizrachi, Ilene / McVeigh, Richard / Hancock, John

    Bioinformatics. 2018 Mar. 01, v. 34, no. 5

    2018  

    Abstract Nucleic acid sequences in public databases should not contain vector contamination, but many sequences in GenBank do (or did) contain vectors. The National Center for Biotechnology Information uses the program VecScreen to screen submitted sequences for contamination. Additional tools are needed to distinguish true-positive (contamination) from false-positive (not contamination) VecScreen matches. A principal reason for false-positive VecScreen matches is that the sequence and the matching vector subsequence originate from closely related or identical organisms (for example, both originate in Escherichia coli). We collected information on the taxonomy of sources of vector segments in the UniVec database used by VecScreen. We used that information in two overlapping software pipelines for retrospective analysis of contamination in GenBank and for prospective analysis of contamination in new sequence submissions. Using the retrospective pipeline, we identified and corrected over 8000 contaminated sequences in the nonredundant nucleotide database. The prospective analysis pipeline has been in production use since April 2017 to evaluate some new GenBank submissions. Data on the sources of UniVec entries were included in release 10.0 (ftp://ftp.ncbi.nih.gov/pub/UniVec/). The main software is freely available at https://github.com/aaschaffer/vecscreen_plus_taxonomy. Supplementary data are available at Bioinformatics online.
    Keywords Escherichia coli ; National Center for Biotechnology Information ; bioinformatics ; computer software ; genetic databases ; genetic vectors ; nucleotide sequences ; retrospective studies ; screening
    Language English
    Dates of publication 2018-03-01
    Size p. 755-759.
    Publisher Oxford University Press
    Document type Article
    ZDB-ID 1468345-3
    ISSN 1460-2059 ; 1367-4811 ; 1367-4803
    ISSN (online) 1460-2059 ; 1367-4811
    ISSN 1367-4803
    DOI 10.1093/bioinformatics/btx669
    Database NAL-Catalogue (AGRICOLA)

  3. Article ; Online: VecScreen_plus_taxonomy: imposing a tax(onomy) increase on vector contamination screening.

    Schäffer, Alejandro A / Nawrocki, Eric P / Choi, Yoon / Kitts, Paul A / Karsch-Mizrachi, Ilene / McVeigh, Richard

    Bioinformatics (Oxford, England)

    2017  Volume 34, Issue 5, Page(s) 755–759

    Abstract Motivation: Nucleic acid sequences in public databases should not contain vector contamination, but many sequences in GenBank do (or did) contain vectors. The National Center for Biotechnology Information uses the program VecScreen to screen submitted sequences for contamination. Additional tools are needed to distinguish true-positive (contamination) from false-positive (not contamination) VecScreen matches.
    Results: A principal reason for false-positive VecScreen matches is that the sequence and the matching vector subsequence originate from closely related or identical organisms (for example, both originate in Escherichia coli). We collected information on the taxonomy of sources of vector segments in the UniVec database used by VecScreen. We used that information in two overlapping software pipelines for retrospective analysis of contamination in GenBank and for prospective analysis of contamination in new sequence submissions. Using the retrospective pipeline, we identified and corrected over 8000 contaminated sequences in the nonredundant nucleotide database. The prospective analysis pipeline has been in production use since April 2017 to evaluate some new GenBank submissions.
    Availability and implementation: Data on the sources of UniVec entries were included in release 10.0 (ftp://ftp.ncbi.nih.gov/pub/UniVec/). The main software is freely available at https://github.com/aaschaffer/vecscreen_plus_taxonomy.
    Contact: aschaffe@helix.nih.gov.
    Supplementary information: Supplementary data are available at Bioinformatics online.
    MeSH term(s) Bacteria ; Databases, Nucleic Acid/standards ; Eukaryota ; Sequence Analysis, DNA/methods ; Software
    Language English
    Publishing date 2017-10-25
    Publishing country England
    Document type Journal Article ; Research Support, N.I.H., Intramural
    ZDB-ID 1422668-6
    ISSN 1367-4811 ; 1367-4803
    ISSN (online) 1367-4811
    ISSN 1367-4803
    DOI 10.1093/bioinformatics/btx669
    Database MEDical Literature Analysis and Retrieval System OnLINE

  4. Article ; Online: Database resources of the National Center for Biotechnology Information.

    Sayers, Eric W / Beck, Jeff / Brister, J Rodney / Bolton, Evan E / Canese, Kathi / Comeau, Donald C / Funk, Kathryn / Ketter, Anne / Kim, Sunghwan / Kimchi, Avi / Kitts, Paul A / Kuznetsov, Anatoliy / Lathrop, Stacy / Lu, Zhiyong / McGarvey, Kelly / Madden, Thomas L / Murphy, Terence D / O'Leary, Nuala / Phan, Lon / Schneider, Valerie A / Thibaud-Nissen, Françoise / Trawick, Bart W / Pruitt, Kim D / Ostell, James

    Nucleic acids research

    2019  Volume 48, Issue D1, Page(s) D9–D16

    Abstract The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Custom implementations of the BLAST program provide sequence-based searching of many specialized datasets. New resources released in the past year include a new PubMed interface, a sequence database search and a gene orthologs page. Additional resources that were updated in the past year include PMC, Bookshelf, My Bibliography, Assembly, RefSeq, viral genomes, the prokaryotic genome annotation pipeline, Genome Workbench, dbSNP, BLAST, Primer-BLAST, IgBLAST and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
    MeSH term(s) Computational Biology/methods ; Computational Biology/organization & administration ; Databases, Genetic ; Databases, Nucleic Acid ; Genomics/methods ; Humans ; National Library of Medicine (U.S.) ; PubMed ; United States ; Web Browser
    Language English
    Publishing date 2019-10-11
    Publishing country England
    Document type Journal Article ; Research Support, N.I.H., Intramural
    ZDB-ID 186809-3
    ISSN 1362-4962 ; 1362-4954 ; 0301-5610 ; 0305-1048
    ISSN (online) 1362-4962 ; 1362-4954
    ISSN 0301-5610 ; 0305-1048
    DOI 10.1093/nar/gkz899
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Article ; Online: Database resources of the National Center for Biotechnology Information.

    Sayers, Eric W / Agarwala, Richa / Bolton, Evan E / Brister, J Rodney / Canese, Kathi / Clark, Karen / Connor, Ryan / Fiorini, Nicolas / Funk, Kathryn / Hefferon, Timothy / Holmes, J Bradley / Kim, Sunghwan / Kimchi, Avi / Kitts, Paul A / Lathrop, Stacy / Lu, Zhiyong / Madden, Thomas L / Marchler-Bauer, Aron / Phan, Lon / Schneider, Valerie A / Schoch, Conrad L / Pruitt, Kim D / Ostell, James

    Nucleic acids research

    2018  Volume 47, Issue D1, Page(s) D23–D28

    Abstract The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 38 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. New resources released in the past year include PubMed Labs and a new sequence database search. Resources that were updated in the past year include PubMed, PMC, Bookshelf, genome data viewer, Assembly, prokaryotic genomes, Genome, BioProject, dbSNP, dbVar, BLAST databases, igBLAST, iCn3D and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
    MeSH term(s) Animals ; Biotechnology/methods ; Biotechnology/organization & administration ; Databases, Chemical ; Databases, Genetic ; Humans ; Software ; United States/epidemiology ; Web Browser
    Language English
    Publishing date 2018-11-05
    Publishing country England
    Document type Journal Article ; Research Support, N.I.H., Intramural
    ZDB-ID 186809-3
    ISSN 1362-4962 ; 1362-4954 ; 0301-5610 ; 0305-1048
    ISSN (online) 1362-4962 ; 1362-4954
    ISSN 0301-5610 ; 0305-1048
    DOI 10.1093/nar/gky1069
    Database MEDical Literature Analysis and Retrieval System OnLINE

  6. Article ; Online: Assembly: a resource for assembled genomes at NCBI.

    Kitts, Paul A / Church, Deanna M / Thibaud-Nissen, Françoise / Choi, Jinna / Hem, Vichet / Sapojnikov, Victor / Smith, Robert G / Tatusova, Tatiana / Xiang, Charlie / Zherikov, Andrey / DiCuccio, Michael / Murphy, Terence D / Pruitt, Kim D / Kimchi, Avi

    Nucleic acids research

    2016  Volume 44, Issue D1, Page(s) D73–80

    Abstract The NCBI Assembly database (www.ncbi.nlm.nih.gov/assembly/) provides stable accessioning and data tracking for genome assembly data. The model underlying the database can accommodate a range of assembly structures, including sets of unordered contig or scaffold sequences, bacterial genomes consisting of a single complete chromosome, or complex structures such as a human genome with modeled allelic variation. The database provides an assembly accession and version to unambiguously identify the set of sequences that make up a particular version of an assembly, and tracks changes to updated genome assemblies. The Assembly database reports metadata such as assembly names, simple statistical reports of the assembly (number of contigs and scaffolds, contiguity metrics such as contig N50, total sequence length and total gap length) as well as the assembly update history. The Assembly database also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Consortium (INSDC) and the assembly represented in the NCBI RefSeq project. Users can find assemblies of interest by querying the Assembly Resource directly or by browsing available assemblies for a particular organism. Links in the Assembly Resource allow users to easily download sequence and annotations for current versions of genome assemblies from the NCBI genomes FTP site.
    MeSH term(s) Animals ; Databases, Nucleic Acid ; Genome ; Genomics ; Humans ; Internet ; Mice
    Language English
    Publishing date 2016-01-04
    Publishing country England
    Document type Journal Article ; Research Support, N.I.H., Intramural
    ZDB-ID 186809-3
    ISSN 1362-4962 ; 1362-4954 ; 0301-5610 ; 0305-1048
    ISSN (online) 1362-4962 ; 1362-4954
    ISSN 0301-5610 ; 0305-1048
    DOI 10.1093/nar/gkv1226
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Article ; Online: Extending reference assembly models.

    Church, Deanna M / Schneider, Valerie A / Steinberg, Karyn Meltz / Schatz, Michael C / Quinlan, Aaron R / Chin, Chen-Shan / Kitts, Paul A / Aken, Bronwen / Marth, Gabor T / Hoffman, Michael M / Herrero, Javier / Mendoza, M Lisandra Zepeda / Durbin, Richard / Flicek, Paul

    Genome biology

    2015  Volume 16, Page(s) 13

    Abstract The human genome reference assembly is crucial for aligning and analyzing sequence data, and for genome annotation, among other roles. However, the models and analysis assumptions that underlie the current assembly need revising to fully represent human sequence diversity. Improved analysis tools and updated data reporting formats are also required.
    MeSH term(s) Computational Biology/methods ; Databases, Genetic ; Genome, Human ; Genomics/methods ; Humans ; Software
    Language English
    Publishing date 2015-01-24
    Publishing country England
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, N.I.H., Intramural ; Research Support, Non-U.S. Gov't
    ZDB-ID 2040529-7
    ISSN 1474-760X
    ISSN (online) 1474-760X
    ISSN 1474-760X
    DOI 10.1186/s13059-015-0587-3
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Article: Extending reference assembly models

    Church, Deanna M / Schneider, Valerie A / Steinberg, Karyn Meltz / Schatz, Michael C / Quinlan, Aaron R / Chin, Chen-Shan / Kitts, Paul A / Aken, Bronwen / Marth, Gabor T / Hoffman, Michael M / Herrero, Javier / Mendoza, M Lisandra Zepeda / Durbin, Richard / Flicek, Paul

    Genome biology. 2015 Dec., v. 16, no. 1

    2015  

    Abstract The human genome reference assembly is crucial for aligning and analyzing sequence data, and for genome annotation, among other roles. However, the models and analysis assumptions that underlie the current assembly need revising to fully represent human sequence diversity. Improved analysis tools and updated data reporting formats are also required.
    Keywords genome ; humans ; models ; sequence diversity
    Language English
    Dates of publication 2015-12
    Size p. 13.
    Publisher BioMed Central
    Document type Article
    ZDB-ID 2040529-7
    ISSN 1474-760X ; 1465-6906
    ISSN (online) 1474-760X
    ISSN 1465-6906
    DOI 10.1186/s13059-015-0587-3
    Database NAL-Catalogue (AGRICOLA)

  9. Article ; Online: Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly.

    Schneider, Valerie A / Graves-Lindsay, Tina / Howe, Kerstin / Bouk, Nathan / Chen, Hsiu-Chuan / Kitts, Paul A / Murphy, Terence D / Pruitt, Kim D / Thibaud-Nissen, Françoise / Albracht, Derek / Fulton, Robert S / Kremitzki, Milinn / Magrini, Vincent / Markovic, Chris / McGrath, Sean / Steinberg, Karyn Meltz / Auger, Kate / Chow, William / Collins, Joanna / Harden, Glenn / Hubbard, Timothy / Pelan, Sarah / Simpson, Jared T / Threadgold, Glen / Torrance, James / Wood, Jonathan M / Clarke, Laura / Koren, Sergey / Boitano, Matthew / Peluso, Paul / Li, Heng / Chin, Chen-Shan / Phillippy, Adam M / Durbin, Richard / Wilson, Richard K / Flicek, Paul / Eichler, Evan E / Church, Deanna M

    Genome research

    2017  Volume 27, Issue 5, Page(s) 849–864

    Abstract The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.
    MeSH term(s) Contig Mapping/methods ; Contig Mapping/standards ; Genome, Human ; Genomics/methods ; Genomics/standards ; Haploidy ; Haplotypes ; Humans ; Polymorphism, Genetic ; Reference Standards ; Sequence Analysis, DNA/methods ; Sequence Analysis, DNA/standards ; Software
    Language English
    Publishing date 2017-04-10
    Publishing country United States
    Document type Journal Article ; Research Support, N.I.H., Intramural ; Research Support, Non-U.S. Gov't ; Research Support, N.I.H., Extramural
    ZDB-ID 1284872-4
    ISSN 1549-5469 ; 1088-9051 ; 1054-9803
    ISSN (online) 1549-5469
    ISSN 1088-9051 ; 1054-9803
    DOI 10.1101/gr.213611.116
    Database MEDical Literature Analysis and Retrieval System OnLINE

  10. Article ; Online: RefSeq: an update on mammalian reference sequences.

    Pruitt, Kim D / Brown, Garth R / Hiatt, Susan M / Thibaud-Nissen, Françoise / Astashyn, Alexander / Ermolaeva, Olga / Farrell, Catherine M / Hart, Jennifer / Landrum, Melissa J / McGarvey, Kelly M / Murphy, Michael R / O'Leary, Nuala A / Pujar, Shashikant / Rajput, Bhanu / Rangwala, Sanjida H / Riddick, Lillian D / Shkeda, Andrei / Sun, Hanzhen / Tamez, Pamela / Tully, Raymond E / Wallin, Craig / Webb, David / Weber, Janet / Wu, Wendy / DiCuccio, Michael / Kitts, Paul / Maglott, Donna R / Murphy, Terence D / Ostell, James M

    Nucleic acids research

    2013  Volume 42, Issue Database issue, Page(s) D756–63

    Abstract The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI's eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI's eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.
    MeSH term(s) Animals ; Databases, Genetic ; Eukaryota/genetics ; Exons ; Genome ; Genomics/standards ; Humans ; Internet ; Mammals/genetics ; Molecular Sequence Annotation ; Proteins/chemistry ; Proteins/genetics ; RNA/chemistry ; Reference Standards
    Chemical Substances Proteins ; RNA (63231-63-0)
    Language English
    Publishing date 2013-11-19
    Publishing country England
    Document type Journal Article ; Research Support, N.I.H., Intramural
    ZDB-ID 186809-3
    ISSN 1362-4962 ; 1362-4954 ; 0301-5610 ; 0305-1048
    ISSN (online) 1362-4962 ; 1362-4954
    ISSN 0301-5610 ; 0305-1048
    DOI 10.1093/nar/gkt1114
    Database MEDical Literature Analysis and Retrieval System OnLINE
