LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 50

  1. Article ; Online: UniqTag: Content-Derived Unique and Stable Identifiers for Gene Annotation.

    Jackman, Shaun D / Bohlmann, Joerg / Birol, İnanç

    PloS one

    2015  Volume 10, Issue 5, Page(s) e0128026

    Abstract When working on an ongoing genome sequencing and assembly project, it is rather inconvenient when gene identifiers change from one build of the assembly to the next. The gene labelling system described here, UniqTag, addresses this common challenge. UniqTag assigns a unique identifier to each gene that is a representative k-mer, a string of length k, selected from the sequence of that gene. Unlike serial numbers, these identifiers are stable between different assemblies and annotations of the same data without requiring that previous annotations be lifted over by sequence alignment. We assign UniqTag identifiers to ten builds of the Ensembl human genome spanning eight years to demonstrate this stability. The implementation of UniqTag in Ruby and an R package are available at https://github.com/sjackman/uniqtag. The R package is also available from CRAN: install.packages("uniqtag"). Supplementary material and code to reproduce it is available at https://github.com/sjackman/uniqtag-paper.
    MeSH term(s) Genome, Human ; Humans ; Molecular Sequence Annotation/methods ; Sequence Analysis, DNA/methods ; Software
    Language English
    Publishing date 2015
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ISSN 1932-6203
    ISSN (online) 1932-6203
    DOI 10.1371/journal.pone.0128026
    Database MEDical Literature Analysis and Retrieval System OnLINE
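
The abstract of this record describes choosing one representative k-mer per gene as its identifier. The sketch below illustrates that idea in Python under a stated assumption: the selection rule used here (the k-mer found in the fewest genes, ties broken alphabetically) is one plausible choice and is not given in the abstract. The function names `kmers` and `assign_tags` and the toy sequences are hypothetical, and this is not the Ruby or R implementation the abstract refers to.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers in seq (the whole sequence if it is shorter than k)."""
    if len(seq) <= k:
        return {seq}
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_tags(genes, k=9):
    """Map each gene name to one representative k-mer of its sequence.

    Assumed rule (not from the abstract): prefer the k-mer that occurs in the
    fewest genes, and break ties by alphabetical order.
    """
    # Count in how many genes each distinct k-mer occurs.
    counts = Counter()
    for seq in genes.values():
        counts.update(kmers(seq, k))

    tags = {}
    for name, seq in genes.items():
        tags[name] = min(kmers(seq, k), key=lambda km: (counts[km], km))
    return tags


# Toy example with hypothetical sequences.
toy_genes = {"g1": "ACGTACGT", "g2": "ACGTTTTT"}
toy_tags = assign_tags(toy_genes, k=4)
print(toy_tags)
```

Because the tag depends only on sequence content, a gene whose sequence is unchanged in a later build tends to keep the same tag, which is the stability property the abstract emphasises.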

  2. Article ; Online: Assembling genomes using short-read sequencing technology.

    Jackman, Shaun D / Birol, Inanç

    Genome biology

    2010  Volume 11, Issue 1, Page(s) 202

    Abstract Gigabase-scale genome assemblies are now feasible using short-read sequencing technology, bringing the cost of such projects below the million-dollar mark.
    MeSH term(s) Animals ; Chromosome Mapping ; Computational Biology/methods ; Contig Mapping ; Genetic Techniques ; Genome ; Genome, Fungal ; Genome, Plant ; Humans ; Models, Genetic ; Pseudomonas/genetics ; Sequence Analysis, DNA/methods ; Ursidae/genetics ; Zea mays/genetics
    Language English
    Publishing date 2010-01-28
    Publishing country England
    Document type Journal Article ; Review
    ZDB-ID 2040529-7
    ISSN 1474-760X
    ISSN (online) 1474-760X
    DOI 10.1186/gb-2010-11-1-202
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Article ; Online: UniqTag: Content-Derived Unique and Stable Identifiers for Gene Annotation.

    Jackman, Shaun D / Bohlmann, Joerg / Birol, İnanç

    PLoS ONE

    2015  Volume 10, Issue 5, Page(s) e0128026

    Abstract When working on an ongoing genome sequencing and assembly project, it is rather inconvenient when gene identifiers change from one build of the assembly to the next. The gene labelling system described here, UniqTag, addresses this common challenge. UniqTag assigns a unique identifier to each gene that is a representative k-mer, a string of length k, selected from the sequence of that gene. Unlike serial numbers, these identifiers are stable between different assemblies and annotations of the same data without requiring that previous annotations be lifted over by sequence alignment. We assign UniqTag identifiers to ten builds of the Ensembl human genome spanning eight years to demonstrate this stability. The implementation of UniqTag in Ruby and an R package are available at https://github.com/sjackman/uniqtag. The R package is also available from CRAN: install.packages("uniqtag"). Supplementary material and code to reproduce it is available at https://github.com/sjackman/uniqtag-paper.
    Keywords Medicine (R) ; Science (Q)
    Subject code 612
    Language English
    Publishing date 2015
    Publisher Public Library of Science (PLoS)
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  4. Article ; Online: ORCA: a comprehensive bioinformatics container environment for education and research.

    Jackman, Shaun D / Mozgacheva, Tatyana / Chen, Susie / O'Huiginn, Brendan / Bailey, Lance / Birol, Inanc / Jones, Steven J M

    Bioinformatics (Oxford, England)

    2019  Volume 35, Issue 21, Page(s) 4448–4450

    Abstract Summary: The ORCA bioinformatics environment is a Docker image that contains hundreds of bioinformatics tools and their dependencies. The ORCA image and accompanying server infrastructure provide a comprehensive bioinformatics environment for education and research. The ORCA environment on a server is implemented using Docker containers, but without requiring users to interact directly with Docker, suitable for novices who may not yet have familiarity with managing containers. ORCA has been used successfully to provide a private bioinformatics environment to external collaborators at a large genome institute, for teaching an undergraduate class on bioinformatics targeted at biologists, and to provide a ready-to-go bioinformatics suite for a hackathon. Using ORCA eliminates time that would be spent debugging software installation issues, so that time may be better spent on education and research.
    Availability and implementation: The ORCA Docker image is available at https://hub.docker.com/r/bcgsc/orca/. The source code of ORCA is available at https://github.com/bcgsc/orca under the MIT license.
    MeSH term(s) Computational Biology ; Genome ; Software
    Language English
    Publishing date 2019-04-19
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 1422668-6
    ISSN 1367-4803
    ISSN (online) 1367-4811
    DOI 10.1093/bioinformatics/btz278
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Article ; Online: Complete Mitochondrial Genome of a Gymnosperm, Sitka Spruce (Picea sitchensis), Indicates a Complex Physical Structure.

    Jackman, Shaun D / Coombe, Lauren / Warren, René L / Kirk, Heather / Trinh, Eva / MacLeod, Tina / Pleasance, Stephen / Pandoh, Pawan / Zhao, Yongjun / Coope, Robin J / Bousquet, Jean / Bohlmann, Joerg / Jones, Steven J M / Birol, Inanc

    Genome biology and evolution

    2020  Volume 12, Issue 7, Page(s) 1174–1179

    Abstract Plant mitochondrial genomes vary widely in size. Although many plant mitochondrial genomes have been sequenced and assembled, the vast majority are of angiosperms, and few are of gymnosperms. Most plant mitochondrial genomes are smaller than a megabase, with a few notable exceptions. We have sequenced and assembled the complete 5.5-Mb mitochondrial genome of Sitka spruce (Picea sitchensis), to date, one of the largest mitochondrial genomes of a gymnosperm. We sequenced the whole genome using Oxford Nanopore MinION, and then identified contigs of mitochondrial origin assembled from these long reads based on sequence homology to the white spruce mitochondrial genome. The assembly graph shows a multipartite genome structure, composed of one smaller 168-kb circular segment of DNA, and a larger 5.4-Mb single component with a branching structure. The assembly graph gives insight into a putative complex physical genome structure, and its branching points may represent active sites of recombination.
    Language English
    Publishing date 2020-05-25
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ISSN 1759-6653
    ISSN (online) 1759-6653
    DOI 10.1093/gbe/evaa108
    Database MEDical Literature Analysis and Retrieval System OnLINE

  6. Article ; Online: The western redcedar genome reveals low genetic diversity in a self-compatible conifer.

    Shalev, Tal J / Gamal El-Dien, Omnia / Yuen, Macaire M S / Shengqiang, Shu / Jackman, Shaun D / Warren, René L / Coombe, Lauren / van der Merwe, Lise / Stewart, Ada / Boston, Lori B / Plott, Christopher / Jenkins, Jerry / He, Guifen / Yan, Juying / Yan, Mi / Guo, Jie / Breinholt, Jesse W / Neves, Leandro G / Grimwood, Jane / Rieseberg, Loren H / Schmutz, Jeremy / Birol, Inanc / Kirst, Matias / Yanchuk, Alvin D / Ritland, Carol / Russell, John H / Bohlmann, Joerg

    Genome research

    2022  Volume 32, Issue 10, Page(s) 1952–1964

    Abstract We assembled the 9.8-Gbp genome of western redcedar (WRC; ...
    MeSH term(s) Tracheophyta/genetics ; Self-Fertilization/genetics ; Alleles ; Heterozygote ; Polymorphism, Genetic ; Genetic Variation ; Selection, Genetic
    Language English
    Publishing date 2022-09-15
    Publishing country United States
    Document type Journal Article ; Research Support, U.S. Gov't, Non-P.H.S. ; Research Support, Non-U.S. Gov't
    ZDB-ID 1284872-4
    ISSN 1088-9051 ; 1054-9803
    ISSN (online) 1549-5469
    DOI 10.1101/gr.276358.121
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Article ; Online: ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers.

    Coombe, Lauren / Zhang, Jessica / Vandervalk, Benjamin P / Chu, Justin / Jackman, Shaun D / Birol, Inanc / Warren, René L

    BMC bioinformatics

    2018  Volume 19, Issue 1, Page(s) 234

    Abstract Background: The long-range sequencing information captured by linked reads, such as those available from 10× Genomics (10xG), helps resolve genome sequence repeats, and yields accurate and contiguous draft genome assemblies. We introduce ARKS, an alignment-free linked read genome scaffolding methodology that uses linked reads to organize genome assemblies further into contiguous drafts. Our approach departs from other read alignment-dependent linked read scaffolders, including our own (ARCS), and uses a kmer-based mapping approach. The kmer mapping strategy has several advantages over read alignment methods, including better usability and faster processing, as it precludes the need for input sequence formatting and draft sequence assembly indexing. The reliance on kmers instead of read alignments for pairing sequences relaxes the workflow requirements, and drastically reduces the run time.
    Results: Here, we show how linked reads, when used in conjunction with Hi-C data for scaffolding, improve a draft human genome assembly of PacBio long-read data five-fold (baseline vs. ARKS NG50 = 4.6 vs. 23.1 Mbp, respectively). We also demonstrate how the method provides further improvements of a megabase-scale Supernova human genome assembly (NG50 = 14.74 Mbp vs. 25.94 Mbp before and after ARKS), which itself exclusively uses linked read data for assembly, with an execution speed six to nine times faster than competitive linked read scaffolders (~ 10.5 h compared to 75.7 h, on average). Following ARKS scaffolding of a human genome 10xG Supernova assembly (of cell line NA12878), fewer than 9 scaffolds cover each chromosome, except the largest (chromosome 1, n = 13).
    Conclusions: ARKS uses a kmer mapping strategy instead of linked read alignments to record and associate the barcode information needed to order and orient draft assembly sequences. The simplified workflow, when compared to that of our initial implementation, ARCS, markedly improves run time performances on experimental human genome datasets. Furthermore, the novel distance estimator in ARKS utilizes barcoding information from linked reads to estimate gap sizes. It accomplishes this by modeling the relationship between known distances of a region within contigs and calculating associated Jaccard indices. ARKS has the potential to provide correct, chromosome-scale genome assemblies, promptly. We expect ARKS to have broad utility in helping refine draft genomes.
    MeSH term(s) Chromosomes, Human/genetics ; Genome, Human ; Genomics/methods ; High-Throughput Nucleotide Sequencing/methods ; Humans ; Sequence Analysis, DNA/methods ; Software
    Language English
    Publishing date 2018-06-20
    Publishing country England
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't
    ZDB-ID 2041484-5
    ISSN 1471-2105
    ISSN (online) 1471-2105
    DOI 10.1186/s12859-018-2243-x
    Database MEDical Literature Analysis and Retrieval System OnLINE
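
The conclusions of this record mention Jaccard indices computed from linked-read barcode information. As a minimal illustration of that quantity only, the Python sketch below computes the Jaccard index of two barcode sets. The function name `jaccard` and the example barcodes are hypothetical, and the way ARKS itself turns such indices into gap-size estimates is not reproduced here.

```python
def jaccard(barcodes_a, barcodes_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of barcodes."""
    a, b = set(barcodes_a), set(barcodes_b)
    union = a | b
    if not union:
        # Two empty sets share nothing; report 0.0 rather than divide by zero.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical barcode sets observed at two contig ends.
end_1 = {"BX01", "BX02", "BX03"}
end_2 = {"BX02", "BX03", "BX04"}
shared_fraction = jaccard(end_1, end_2)
print(shared_fraction)
```

A higher index means the two regions share more barcodes, which under the linked-read model suggests they lie closer together on the same molecule.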

  8. Article ; Online: Complete Chloroplast Genome Sequence of a White Spruce (Picea glauca, Genotype WS77111) from Eastern Canada.

    Lin, Diana / Coombe, Lauren / Jackman, Shaun D / Gagalova, Kristina K / Warren, René L / Hammond, S Austin / Kirk, Heather / Pandoh, Pawan / Zhao, Yongjun / Moore, Richard A / Mungall, Andrew J / Ritland, Carol / Jaquish, Barry / Isabel, Nathalie / Bousquet, Jean / Jones, Steven J M / Bohlmann, Joerg / Birol, Inanc

    Microbiology resource announcements

    2019  Volume 8, Issue 23

    Abstract Here, we present the complete chloroplast genome sequence of white spruce ( ...
    Language English
    Publishing date 2019-06-06
    Publishing country United States
    Document type Journal Article
    ISSN 2576-098X
    ISSN (online) 2576-098X
    DOI 10.1128/MRA.00381-19
    Database MEDical Literature Analysis and Retrieval System OnLINE

  9. Article ; Online: Sealer: a scalable gap-closing application for finishing draft genomes.

    Paulino, Daniel / Warren, René L / Vandervalk, Benjamin P / Raymond, Anthony / Jackman, Shaun D / Birol, Inanç

    BMC bioinformatics

    2015  Volume 16, Page(s) 230

    Abstract Background: While next-generation sequencing technologies have made sequencing genomes faster and more affordable, deciphering the complete genome sequence of an organism remains a significant bioinformatics challenge, especially for large genomes. Low sequence coverage, repetitive elements and short read length make de novo genome assembly difficult, often resulting in sequence and/or fragment "gaps" - uncharacterized nucleotide (N) stretches of unknown or estimated lengths. Some of these gaps can be closed by re-processing latent information in the raw reads. Even though there are several tools for closing gaps, they do not easily scale up to processing billion base pair genomes.
    Results: Here we describe Sealer, a tool designed to close gaps within assembly scaffolds by navigating de Bruijn graphs represented by space-efficient Bloom filter data structures. We demonstrate how it scales to successfully close 50.8% and 13.8% of gaps in human (3 Gbp) and white spruce (20 Gbp) draft assemblies in under 30 and 27 h, respectively - a feat that is not possible with other leading tools with the breadth of data used in our study.
    Conclusion: Sealer is an automated finishing application that uses the succinct Bloom filter representation of a de Bruijn graph to close gaps in draft assemblies, including that of very large genomes. We expect Sealer to have broad utility for finishing genomes across the tree of life, from bacterial genomes to large plant genomes and beyond. Sealer is available for download at https://github.com/bcgsc/abyss/tree/sealer-release.
    MeSH term(s) Algorithms ; Computational Biology/methods ; Genome, Human ; Genome, Plant ; High-Throughput Nucleotide Sequencing ; Humans ; Internet ; Pinaceae/genetics ; Sequence Analysis, DNA ; User-Computer Interface
    Language English
    Publishing date 2015-07-25
    Publishing country England
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't
    ZDB-ID 2041484-5
    ISSN 1471-2105
    ISSN (online) 1471-2105
    DOI 10.1186/s12859-015-0663-4
    Database MEDical Literature Analysis and Retrieval System OnLINE
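
The abstract of this record describes closing gaps by navigating a de Bruijn graph stored in a Bloom filter. The Python sketch below is a toy version of that general idea, not Sealer's implementation: the class `BloomFilter`, the functions `load_kmers` and `close_gap`, the filter size, the hash scheme and the breadth-first search are all assumptions made for illustration. It handles a single strand and a single k (k of at least 2), and ignores the false-positive handling a real tool would need.

```python
import hashlib
from collections import deque


class BloomFilter:
    """Fixed-size bit array with several hash positions per item (no false negatives)."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def load_kmers(bloom, reads, k):
    """Insert every k-mer of every read; the filter then implicitly encodes the graph."""
    for read in reads:
        for i in range(len(read) - k + 1):
            bloom.add(read[i:i + k])


def close_gap(bloom, left, right, max_len):
    """Breadth-first search from the left flanking k-mer to the right one.

    Returns the spanning sequence (flanks included), or None if no path of at
    most max_len bases is found.
    """
    k = len(left)
    queue = deque([left])
    seen = {left}
    while queue:
        path = queue.popleft()
        if path[-k:] == right:
            return path
        if len(path) >= max_len:
            continue
        for base in "ACGT":
            nxt = path[-(k - 1):] + base  # candidate successor k-mer
            if nxt in bloom and nxt not in seen:
                seen.add(nxt)
                queue.append(path + base)
    return None


# Toy example: one hypothetical read spans the gap between two flanking 4-mers.
bloom = BloomFilter()
load_kmers(bloom, ["ACGTTGCA"], k=4)
filled = close_gap(bloom, "ACGT", "TGCA", max_len=20)
print(filled)
```

The Bloom filter stores only membership bits rather than the k-mers themselves, which is the kind of space saving the abstract credits for scaling to very large genomes; the price is occasional false-positive k-mers that a production tool must tolerate.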

  10. Article: Assembling genomes using short-read sequencing technology

    Jackman, Shaun D / Birol, İnanç

    Genome biology

    2010  Volume 11, Issue 1

    Abstract Gigabase-scale genome assemblies are now feasible using short-read sequencing technology, bringing the cost of such projects below the million-dollar mark.
    Keywords genome ; genome assembly ; nucleotide sequences
    Language English
    Dates of publication 2010-01
    Size p. 2395.
    Publishing place Springer-Verlag
    Document type Article
    ZDB-ID 2040529-7
    ISSN 1465-6906
    ISSN (online) 1474-760X ; 1465-6914
    DOI 10.1186/gb-2010-11-1-202
    Database NAL-Catalogue (AGRICOLA)
