LIVIVO - The Search Portal for Life Sciences

Search results

Results 1–10 of 46 total

  1. Article ; Online: Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates.

    Kunin, Victor / Engelbrektson, Anna / Ochman, Howard / Hugenholtz, Philip

    Environmental microbiology

    2010  Volume 12, Issue 1, Page(s) 118–123

    Abstract Massively parallel pyrosequencing of the small subunit (16S) ribosomal RNA gene has revealed that the extent of rare microbial populations in several environments, the 'rare biosphere', is orders of magnitude higher than previously thought. One important caveat with this method is that sequencing error could artificially inflate diversity estimates. Although the per-base error of 16S rDNA amplicon pyrosequencing has been shown to be as good as or lower than Sanger sequencing, no direct assessments of pyrosequencing errors on diversity estimates have been reported. Using only Escherichia coli MG1655 as a reference template, we find that 16S rDNA diversity is grossly overestimated unless relatively stringent read quality filtering and low clustering thresholds are applied. In particular, the common practice of removing reads with unresolved bases and anomalous read lengths is insufficient to ensure accurate estimates of microbial diversity. Furthermore, common and reproducible homopolymer length errors can result in relatively abundant spurious phylotypes further confounding data interpretation. We suggest that stringent quality-based trimming of 16S pyrotags and clustering thresholds no greater than 97% identity should be used to avoid overestimates of the rare biosphere.
    MeSH term(s) Biodiversity ; Cluster Analysis ; DNA, Bacterial/genetics ; Escherichia coli/genetics ; Genes, Bacterial ; Genetic Variation ; RNA, Ribosomal, 16S/genetics ; Sequence Alignment ; Sequence Analysis, DNA/methods
    Chemical Substances DNA, Bacterial ; RNA, Ribosomal, 16S
    Language English
    Publishing date 2010-01
    Publishing country England
    Document type Journal Article ; Research Support, U.S. Gov't, Non-P.H.S.
    ZDB-ID 2020213-1
    ISSN (online) 1462-2920
    ISSN 1462-2912
    DOI 10.1111/j.1462-2920.2009.02051.x
    Database MEDical Literature Analysis and Retrieval System OnLINE

  2. Article ; Online: Effects of OTU clustering and PCR artifacts on microbial diversity estimates.

    Patin, Nastassia V / Kunin, Victor / Lidström, Ulrika / Ashby, Matthew N

    Microbial ecology

    2012  Volume 65, Issue 3, Page(s) 709–719

    Abstract Next-generation sequencing has increased the coverage of microbial diversity surveys by orders of magnitude, but differentiating artifacts from rare environmental sequences remains a challenge. Clustering 16S rRNA sequences into operational taxonomic units (OTUs) organizes sequence data into groups of 97 % identity, helping to reduce data volumes and avoid analyzing sequencing artifacts by grouping them with real sequences. Here, we analyze sequence abundance distributions across environmental samples and show that 16S rRNA sequences of >99 % identity can represent functionally distinct microorganisms, rendering OTU clustering problematic when the goal is an accurate analysis of organism distribution. Strict postsequencing quality control (QC) filters eliminated the most prevalent artifacts without clustering. Further experiments proved that DNA polymerase errors in polymerase chain reaction (PCR) generate a significant number of substitution errors, most of which pass QC filters. Based on our findings, we recommend minimizing the number of PCR cycles in DNA library preparation and applying strict postsequencing QC filters to reduce the most prevalent artifacts while maintaining a high level of accuracy in diversity estimates. We further recommend correlating rare and abundant sequences across environmental samples, rather than clustering into OTUs, to identify remaining sequence artifacts without losing the resolution afforded by high-throughput sequencing.
    MeSH term(s) Actinomycetales/classification ; Actinomycetales/genetics ; Actinomycetales/isolation & purification ; Biodiversity ; DNA Primers/genetics ; DNA, Bacterial/genetics ; High-Throughput Nucleotide Sequencing ; Polymerase Chain Reaction/methods ; Polymerase Chain Reaction/standards ; RNA, Ribosomal, 16S/genetics
    Chemical Substances DNA Primers ; DNA, Bacterial ; RNA, Ribosomal, 16S
    Language English
    Publishing date 2012-12-12
    Publishing country United States
    Document type Journal Article
    ZDB-ID 1462065-0
    ISSN (online) 1432-184X
    ISSN 0095-3628
    DOI 10.1007/s00248-012-0145-4
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Article ; Online: Clustering the annotation space of proteins.

    Kunin, Victor / Ouzounis, Christos A

    BMC bioinformatics

    2005  Volume 6, Page(s) 24

    Abstract Background: Current protein clustering methods rely on either sequence or functional similarities between proteins, thereby limiting inferences to one of these areas.
    Results: Here we report a new approach, named CLAN, which clusters proteins according to both annotation and sequence similarity. This approach is extremely fast, clustering the complete SwissProt database within minutes. It is also accurate, recovering consistent protein families agreeing on average in more than 97% with sequence-based protein families from Pfam. Discrepancies between sequence- and annotation-based clusters were scrutinized and the reasons reported. We demonstrate examples for each of these cases, and thoroughly discuss an example of a propagated error in SwissProt: a vacuolar ATPase subunit M9.2 erroneously annotated as vacuolar ATP synthase subunit H. CLAN algorithm is available from the authors and the CLAN database is accessible at http://maine.ebi.ac.uk:8000/cgi-bin/clan/ClanSearch.pl
    Conclusions: CLAN creates refined function-and-sequence specific protein families that can be used for identification and annotation of unknown family members. It also allows easy identification of erroneous annotations by spotting inconsistencies between similarities on annotation and sequence levels.
    MeSH term(s) Adenosine Triphosphatases/chemistry ; Adenosine Triphosphate/chemistry ; Algorithms ; Cluster Analysis ; Computational Biology/methods ; Computer Graphics ; Databases, Factual ; Databases, Genetic ; Databases, Protein ; False Negative Reactions ; Genome ; Humans ; Information Storage and Retrieval ; Internet ; Models, Statistical ; Programming Languages ; Protein Folding ; Proteins/chemistry ; Reproducibility of Results ; Sequence Alignment ; Sequence Analysis, Protein ; Software ; Structural Homology, Protein ; User-Computer Interface ; Vacuolar Proton-Translocating ATPases/chemistry
    Chemical Substances Proteins ; Adenosine Triphosphate (8L70Q75FXE) ; Adenosine Triphosphatases (EC 3.6.1.-) ; Vacuolar Proton-Translocating ATPases (EC 3.6.1.-)
    Language English
    Publishing date 2005-02-09
    Publishing country England
    Document type Journal Article
    ZDB-ID 2041484-5
    ISSN (online) 1471-2105
    ISSN 1471-2105
    DOI 10.1186/1471-2105-6-24
    Database MEDical Literature Analysis and Retrieval System OnLINE

  4. Article: Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates

    Kunin, Victor / Engelbrektson, Anna / Ochman, Howard / Hugenholtz, Philip

    Environmental microbiology. 2010 Jan., v. 12, no. 1

    2010  

    Abstract Massively parallel pyrosequencing of the small subunit (16S) ribosomal RNA gene has revealed that the extent of rare microbial populations in several environments, the 'rare biosphere', is orders of magnitude higher than previously thought. One important caveat with this method is that sequencing error could artificially inflate diversity estimates. Although the per-base error of 16S rDNA amplicon pyrosequencing has been shown to be as good as or lower than Sanger sequencing, no direct assessments of pyrosequencing errors on diversity estimates have been reported. Using only Escherichia coli MG1655 as a reference template, we find that 16S rDNA diversity is grossly overestimated unless relatively stringent read quality filtering and low clustering thresholds are applied. In particular, the common practice of removing reads with unresolved bases and anomalous read lengths is insufficient to ensure accurate estimates of microbial diversity. Furthermore, common and reproducible homopolymer length errors can result in relatively abundant spurious phylotypes further confounding data interpretation. We suggest that stringent quality-based trimming of 16S pyrotags and clustering thresholds no greater than 97% identity should be used to avoid overestimates of the rare biosphere.
    Keywords Escherichia coli ; filters ; genes ; ribosomal DNA ; ribosomal RNA
    Language English
    Dates of publication 2010-01
    Size p. 118-123.
    Publisher Blackwell Publishing Ltd
    Publishing place Oxford, UK
    Document type Article
    ZDB-ID 2020213-1
    ISSN (online) 1462-2920
    ISSN 1462-2912
    DOI 10.1111/j.1462-2920.2009.02051.x
    Database NAL-Catalogue (AGRICOLA)

  5. Article ; Online: CRISPR--a widespread system that provides acquired resistance against phages in bacteria and archaea.

    Sorek, Rotem / Kunin, Victor / Hugenholtz, Philip

    Nature reviews. Microbiology

    2008  Volume 6, Issue 3, Page(s) 181–186

    Abstract Arrays of clustered, regularly interspaced short palindromic repeats (CRISPRs) are widespread in the genomes of many bacteria and almost all archaea. These arrays are composed of direct repeats that are separated by similarly sized non-repetitive spacers. CRISPR arrays, together with a group of associated proteins, confer resistance to phages, possibly by an RNA-interference-like mechanism. This Progress discusses the structure and function of this newly recognized antiviral mechanism.
    MeSH term(s) Archaea/genetics ; Archaea/virology ; Bacteria/genetics ; Bacteria/virology ; Bacterial Proteins/genetics ; Bacteriophages/physiology ; DNA, Intergenic ; Gene Silencing ; Genome, Archaeal ; Genome, Bacterial ; Interspersed Repetitive Sequences/physiology ; Multigene Family/genetics ; Viral Interference
    Chemical Substances Bacterial Proteins ; DNA, Intergenic
    Language English
    Publishing date 2008-03
    Publishing country England
    Document type Journal Article ; Research Support, U.S. Gov't, Non-P.H.S. ; Review
    ZDB-ID 2139054-X
    ISSN (online) 1740-1534
    ISSN 1740-1526
    DOI 10.1038/nrmicro1793
    Database MEDical Literature Analysis and Retrieval System OnLINE

  6. Article: GeneTRACE-reconstruction of gene content of ancestral species.

    Kunin, Victor / Ouzounis, Christos A

    Bioinformatics (Oxford, England)

    2003  Volume 19, Issue 11, Page(s) 1412–1416

    Abstract While current computational methods allow the reconstruction of individual ancestral protein sequences, reconstruction of complete gene content of ancestral species is not yet an established task. In this paper, we describe GENETRACE, an efficient linear-time algorithm that allows the reconstruction of evolutionary history of individual protein families as well as the complete gene content of ancestral species. The performance of the method was validated with a simulated evolution program called SimulEv. Our results indicate that given a set of correct phylogenetic profiles and a correct species tree, ancestral gene content can be reconstructed with sensitivity and selectivity of more than 90%. SimulEv simulations were also used to evaluate performance of the reconstruction of gene content-based phylogenetic trees, suggesting that these trees may be accurate at the terminal branches but suffer from long branch attraction near the root of the tree.
    MeSH term(s) Algorithms ; Computer Simulation ; Evolution, Molecular ; Gene Expression Profiling/methods ; Genetic Variation/genetics ; Genome ; Linear Models ; Models, Genetic ; Models, Statistical ; Phylogeny ; Proteins/chemistry ; Proteins/genetics ; Reproducibility of Results ; Sensitivity and Specificity ; Sequence Alignment/methods ; Sequence Analysis, Protein/methods
    Chemical Substances Proteins
    Language English
    Publishing date 2003-05-16
    Publishing country England
    Document type Comparative Study ; Evaluation Studies ; Journal Article ; Research Support, Non-U.S. Gov't ; Validation Studies
    ZDB-ID 1422668-6
    ISSN 1367-4803
    DOI 10.1093/bioinformatics/btg174
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Article: The balance of driving forces during genome evolution in prokaryotes.

    Kunin, Victor / Ouzounis, Christos A

    Genome research

    2003  Volume 13, Issue 7, Page(s) 1589–1594

    Abstract Genomes are shaped by evolutionary processes such as gene genesis, horizontal gene transfer (HGT), and gene loss. To quantify the relative contributions of these processes, we analyze the distribution of 12,762 protein families on a phylogenetic tree, derived from entire genomes of 41 Bacteria and 10 Archaea. We show that gene loss is the most important factor in shaping genome content, being up to three times more frequent than HGT, followed by gene genesis, which may contribute up to twice as many genes as HGT. We suggest that gene gain and gene loss in prokaryotes are balanced; thus, on average, prokaryotic genome size is kept constant. Despite the importance of HGT, our results indicate that the majority of protein families have only been transmitted by vertical inheritance. To test our method, we present a study of strain-specific genes of Helicobacter pylori, and demonstrate correct predictions of gene loss and HGT for at least 81% of validated cases. This approach indicates that it is possible to trace genome content history and quantify the factors that shape contemporary prokaryotic genomes.
    MeSH term(s) Archaeal Proteins/genetics ; Bacterial Proteins/genetics ; Computational Biology/methods ; Computational Biology/statistics & numerical data ; Databases, Protein ; Evolution, Molecular ; Gene Amplification/genetics ; Gene Deletion ; Gene Transfer, Horizontal/genetics ; Genes, Archaeal/genetics ; Genes, Bacterial/genetics ; Genome, Archaeal ; Genome, Bacterial ; Helicobacter pylori/genetics ; Models, Genetic ; Phylogeny ; Prokaryotic Cells/chemistry ; Prokaryotic Cells/metabolism ; Species Specificity
    Chemical Substances Archaeal Proteins ; Bacterial Proteins
    Language English
    Publishing date 2003-06-25
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't ; Validation Study
    ZDB-ID 1284872-4
    ISSN (online) 1549-5469
    ISSN 1088-9051 ; 1054-9803
    DOI 10.1101/gr.1092603
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Article ; Online: Clustering the annotation space of proteins

    Ouzounis, Christos A / Kunin, Victor

    BMC Bioinformatics, Vol 6, Iss 1, p 24

    2005  Volume 6, Issue 1, Page(s) 24

    Abstract Background: Current protein clustering methods rely on either sequence or functional similarities between proteins, thereby limiting inferences to one of these areas. Results: Here we report a new approach, named CLAN, which clusters proteins according to both annotation and sequence similarity. This approach is extremely fast, clustering the complete SwissProt database within minutes. It is also accurate, recovering consistent protein families agreeing on average in more than 97% with sequence-based protein families from Pfam. Discrepancies between sequence- and annotation-based clusters were scrutinized and the reasons reported. We demonstrate examples for each of these cases, and thoroughly discuss an example of a propagated error in SwissProt: a vacuolar ATPase subunit M9.2 erroneously annotated as vacuolar ATP synthase subunit H. CLAN algorithm is available from the authors and the CLAN database is accessible at http://maine.ebi.ac.uk:8000/cgi-bin/clan/ClanSearch.pl Conclusions: CLAN creates refined function-and-sequence specific protein families that can be used for identification and annotation of unknown family members. It also allows easy identification of erroneous annotations by spotting inconsistencies between similarities on annotation and sequence levels.
    Keywords Computer applications to medicine. Medical informatics ; R858-859.7 ; Biology (General) ; QH301-705.5
    Subject code 005
    Language English
    Publishing date 2005-02-01
    Publisher BMC
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  9. Article ; Online: Evolutionary conservation of sequence and secondary structures in CRISPR repeats.

    Kunin, Victor / Sorek, Rotem / Hugenholtz, Philip

    Genome biology

    2007  Volume 8, Issue 4, Page(s) R61

    Abstract Background: Clustered regularly interspaced short palindromic repeats (CRISPRs) are a novel class of direct repeats, separated by unique spacer sequences of similar length, that are present in approximately 40% of bacterial and most archaeal genomes analyzed to date. More than 40 gene families, called CRISPR-associated sequences (CASs), appear in conjunction with these repeats and are thought to be involved in the propagation and functioning of CRISPRs. It has been recently shown that CRISPR provides acquired resistance against viruses in prokaryotes.
    Results: Here we analyze CRISPR repeats identified in 195 microbial genomes and show that they can be organized into multiple clusters based on sequence similarity. Some of the clusters present stable, highly conserved RNA secondary structures, while others lack detectable structures. Stable secondary structures exhibit multiple compensatory base changes in the stem region, indicating evolutionary and functional conservation.
    Conclusion: We show that the repeat-based classification corresponds to, and expands upon, a previously reported CAS gene-based classification, including specific relationships between CRISPR and CAS subtypes.
    MeSH term(s) Archaea/genetics ; Bacteria/genetics ; Base Sequence ; Cluster Analysis ; Conserved Sequence ; Evolution, Molecular ; Genome, Bacterial ; Multigene Family ; Nucleic Acid Conformation ; RNA, Archaeal/chemistry ; RNA, Bacterial/chemistry ; Repetitive Sequences, Nucleic Acid ; Sequence Analysis, RNA ; Software
    Chemical Substances RNA, Archaeal ; RNA, Bacterial
    Language English
    Publishing date 2007
    Publishing country England
    Document type Journal Article ; Research Support, U.S. Gov't, Non-P.H.S.
    ZDB-ID 2040529-7
    ISSN (online) 1474-760X ; 1465-6914
    ISSN 1465-6906
    DOI 10.1186/gb-2007-8-4-r61
    Database MEDical Literature Analysis and Retrieval System OnLINE

  10. Article: Evolutionary conservation of sequence and secondary structures in CRISPR repeats

    Kunin, Victor / Hugenholtz, Philip / Sorek, Rotem

    Genome biology. 2007 June, v. 8, no. 4

    2007  

    Abstract BACKGROUND: Clustered regularly interspaced short palindromic repeats (CRISPRs) are a novel class of direct repeats, separated by unique spacer sequences of similar length, that are present in approximately 40% of bacterial and most archaeal genomes analyzed to date. More than 40 gene families, called CRISPR-associated sequences (CASs), appear in conjunction with these repeats and are thought to be involved in the propagation and functioning of CRISPRs. It has been recently shown that CRISPR provides acquired resistance against viruses in prokaryotes. RESULTS: Here we analyze CRISPR repeats identified in 195 microbial genomes and show that they can be organized into multiple clusters based on sequence similarity. Some of the clusters present stable, highly conserved RNA secondary structures, while others lack detectable structures. Stable secondary structures exhibit multiple compensatory base changes in the stem region, indicating evolutionary and functional conservation. CONCLUSION: We show that the repeat-based classification corresponds to, and expands upon, a previously reported CAS gene-based classification, including specific relationships between CRISPR and CAS subtypes.
    Keywords genes ; germplasm conservation ; prokaryotic cells ; RNA ; sequence homology ; viruses
    Language English
    Dates of publication 2007-06
    Size p. 1543.
    Publisher Springer-Verlag
    Document type Article
    ZDB-ID 2040529-7
    ISSN (online) 1474-760X ; 1465-6914
    ISSN 1465-6906
    DOI 10.1186/gb-2007-8-4-r61
    Database NAL-Catalogue (AGRICOLA)
