LIVIVO - The Search Portal for Life Sciences

Search results

Result 1 - 7 of total 7

  1. Article ; Online: Assessing graph-based read mappers against a baseline approach highlights strengths and weaknesses of current methods.

    Grytten, Ivar / Rand, Knut D / Nederbragt, Alexander J / Sandve, Geir K

    BMC genomics

    2020  Volume 21, Issue 1, Page(s) 282

    Abstract Background: Graph-based reference genomes have become popular as they allow read mapping and follow-up analyses in settings where the exact haplotypes underlying a high-throughput sequencing experiment are not precisely known. Two recent papers show that mapping to graph-based reference genomes can improve accuracy as compared to methods using linear references. Both of these methods index the sequences for most paths up to a certain length in the graph in order to enable direct mapping of reads containing common variants. However, the combinatorial explosion of possible paths through nearby variants also leads to a huge search space and an increased chance of false positive alignments to highly variable regions.
    Results: We here assess three prominent graph-based read mappers against a hybrid baseline approach that combines an initial path determination with a tuned linear read mapping method. We show, using a previously proposed benchmark, that this simple approach is able to improve overall accuracy of read-mapping to graph-based reference genomes.
    Conclusions: Our method is implemented in a tool, Two-step Graph Mapper, which is available at https://github.com/uio-bmi/two_step_graph_mapper along with data and scripts for reproducing the experiments. Our method highlights characteristics of the current generation of graph-based read mappers and shows potential for improvement for future graph-based read mappers.
    MeSH term(s) Computational Biology/methods ; Genome, Human ; High-Throughput Nucleotide Sequencing/methods ; Humans ; Sequence Alignment
    Language English
    Publishing date 2020-04-06
    Publishing country England
    Document type Journal Article
    ZDB-ID 2041499-7
    ISSN (online) 1471-2164
    ISSN 1471-2164
    DOI 10.1186/s12864-020-6685-y
    Database MEDical Literature Analysis and Retrieval System OnLINE

  2. Article ; Online: Graph Peak Caller: Calling ChIP-seq peaks on graph-based reference genomes.

    Grytten, Ivar / Rand, Knut D / Nederbragt, Alexander J / Storvik, Geir O / Glad, Ingrid K / Sandve, Geir K

    PLoS computational biology

    2019  Volume 15, Issue 2, Page(s) e1006731

    Abstract Graph-based representations are considered to be the future for reference genomes, as they allow integrated representation of the steadily increasing data on individual variation. Currently available tools allow de novo assembly of graph-based reference genomes, alignment of new read sets to the graph representation as well as certain analyses like variant calling and haplotyping. We here present a first method for calling ChIP-Seq peaks on read data aligned to a graph-based reference genome. The method is a graph generalization of the peak caller MACS2, and is implemented in an open source tool, Graph Peak Caller. By using the existing tool vg to build a pan-genome of Arabidopsis thaliana, we validate our approach by showing that Graph Peak Caller with a pan-genome reference graph can trace variants within peaks that are not part of the linear reference genome, and find peaks that in general are more motif-enriched than those found by MACS2.
    MeSH term(s) Algorithms ; Arabidopsis/genetics ; Chromatin Immunoprecipitation/methods ; Genome/genetics ; Genomics/methods ; Protein Binding ; Sequence Analysis, DNA/methods ; Software ; Transcription Factors
    Chemical Substances Transcription Factors
    Language English
    Publishing date 2019-02-19
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 2193340-6
    ISSN (online) 1553-7358
    ISSN 1553-734X
    DOI 10.1371/journal.pcbi.1006731
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Article ; Online: Graph Peak Caller: Calling ChIP-seq peaks on graph-based reference genomes.

    Ivar Grytten / Knut D Rand / Alexander J Nederbragt / Geir O Storvik / Ingrid K Glad / Geir K Sandve

    PLoS Computational Biology

    2019  Volume 15, Issue 2, Page(s) e1006731

    Abstract Graph-based representations are considered to be the future for reference genomes, as they allow integrated representation of the steadily increasing data on individual variation. Currently available tools allow de novo assembly of graph-based reference genomes, alignment of new read sets to the graph representation as well as certain analyses like variant calling and haplotyping. We here present a first method for calling ChIP-Seq peaks on read data aligned to a graph-based reference genome. The method is a graph generalization of the peak caller MACS2, and is implemented in an open source tool, Graph Peak Caller. By using the existing tool vg to build a pan-genome of Arabidopsis thaliana, we validate our approach by showing that Graph Peak Caller with a pan-genome reference graph can trace variants within peaks that are not part of the linear reference genome, and find peaks that in general are more motif-enriched than those found by MACS2.
    Keywords Biology (General) ; QH301-705.5
    Subject code 511
    Language English
    Publishing date 2019-02-01
    Publisher Public Library of Science (PLoS)
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  4. Article ; Online: Coordinates and intervals in graph-based reference genomes.

    Rand, Knut D / Grytten, Ivar / Nederbragt, Alexander J / Storvik, Geir O / Glad, Ingrid K / Sandve, Geir K

    BMC bioinformatics

    2017  Volume 18, Issue 1, Page(s) 263

    Abstract Background: It has been proposed that future reference genomes should be graph structures in order to better represent the sequence diversity present in a species. However, there is currently no standard method to represent genomic intervals, such as the positions of genes or transcription factor binding sites, on graph-based reference genomes.
    Results: We formalize offset-based coordinate systems on graph-based reference genomes and introduce methods for representing intervals on these reference structures. We show the advantage of our methods by representing genes on a graph-based representation of the newest assembly of the human genome (GRCh38) and its alternative loci for regions that are highly variable.
    Conclusion: More complex reference genomes, containing alternative loci, require methods to represent genomic data on these structures. Our proposed notation for genomic intervals makes it possible to fully utilize the alternative loci of the GRCh38 assembly and potential future graph-based reference genomes. We have made a Python package for representing such intervals on offset-based coordinate systems, available at https://github.com/uio-cels/offsetbasedgraph . An interactive web-tool using this Python package to visualize genes on a graph created from GRCh38 is available at https://github.com/uio-cels/genomicgraphcoords .
    MeSH term(s) Algorithms ; Computer Graphics ; Genetic Loci ; Genome, Human ; Genomics/methods ; Humans ; Internet ; RNA, Messenger/genetics ; RNA, Messenger/metabolism ; Sequence Analysis, DNA ; Software
    Chemical Substances RNA, Messenger
    Language English
    Publishing date 2017-05-18
    Publishing country England
    Document type Journal Article
    ZDB-ID 2041484-5
    ISSN (online) 1471-2105
    ISSN 1471-2105
    DOI 10.1186/s12859-017-1678-9
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Article ; Online: Beware the Jaccard: the choice of similarity measure is important and non-trivial in genomic colocalisation analysis.

    Salvatore, Stefania / Dagestad Rand, Knut / Grytten, Ivar / Ferkingstad, Egil / Domanska, Diana / Holden, Lars / Gheorghe, Marius / Mathelier, Anthony / Glad, Ingrid / Kjetil Sandve, Geir

    Briefings in bioinformatics

    2019  Volume 21, Issue 5, Page(s) 1523–1530

    Abstract The generation and systematic collection of genome-wide data is ever-increasing. This vast amount of data has enabled researchers to study relations between a variety of genomic and epigenomic features, including genetic variation, gene regulation and phenotypic traits. Such relations are typically investigated by comparatively assessing genomic co-occurrence. Technically, this corresponds to assessing the similarity of pairs of genome-wide binary vectors. A variety of similarity measures have been proposed for this problem in other fields like ecology. However, while several of these measures have been employed for assessing genomic co-occurrence, their appropriateness for the genomic setting has never been investigated. We show that the choice of similarity measure may strongly influence results and propose two alternative modelling assumptions that can be used to guide this choice. On both simulated and real genomic data, the Jaccard index is strongly altered by dataset size and should be used with caution. The Forbes coefficient (fold change) and tetrachoric correlation are less influenced by dataset size, but one should be aware of increased variance for small datasets. All results on simulated and real data can be inspected and reproduced at https://hyperbrowser.uio.no/sim-measure.
    MeSH term(s) Algorithms ; Datasets as Topic ; Gene Expression Regulation ; Genetic Variation ; Genomics/methods ; Humans
    Language English
    Publishing date 2019-10-10
    Publishing country England
    Document type Journal Article ; Review
    ZDB-ID 2068142-2
    ISSN (online) 1477-4054
    ISSN 1467-5463
    DOI 10.1093/bib/bbz083
    Database MEDical Literature Analysis and Retrieval System OnLINE
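
The abstract above treats genomic co-occurrence as the similarity of two genome-wide binary vectors. The following is a minimal sketch of two of the measures it names, assuming the common textbook definitions (Jaccard as intersection over union; Forbes as observed overlap divided by the overlap expected under independence). The function names and the toy tracks are hypothetical and are not taken from the paper or its software.

```python
def overlap_counts(a, b):
    """Count positions covered by both tracks and by at least one track."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both, either


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    both, either = overlap_counts(a, b)
    return both / either if either else 0.0


def forbes(a, b):
    """Forbes coefficient: observed overlap over the overlap expected
    if the two tracks were independent (a fold change)."""
    both, _ = overlap_counts(a, b)
    expected = sum(a) * sum(b) / len(a)
    return both / expected if expected else 0.0


# Toy tracks over 100 positions (illustrative data only).
# In each pair the overlap equals what independence would predict.
n = 100
dense_a = [1 if i % 2 == 0 else 0 for i in range(n)]   # 50 covered positions
dense_b = [1 if i % 4 < 2 else 0 for i in range(n)]    # 50 covered positions
sparse_a = [1 if i % 10 == 0 else 0 for i in range(n)]  # 10 covered positions
sparse_b = [1 if i < 10 else 0 for i in range(n)]       # 10 covered positions

print("dense :", jaccard(dense_a, dense_b), forbes(dense_a, dense_b))
print("sparse:", jaccard(sparse_a, sparse_b), forbes(sparse_a, sparse_b))
```

In this toy setting both pairs overlap exactly as much as chance predicts, so the Forbes coefficient is the same for both, while the Jaccard index differs between the dense and the sparse pair. This is only an illustration of the kind of dataset-size dependence the abstract describes, not a reproduction of its results.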

  6. Article ; Online: The immuneML ecosystem for machine learning analysis of adaptive immune receptor repertoires.

    Pavlović, Milena / Scheffer, Lonneke / Motwani, Keshav / Kanduri, Chakravarthi / Kompova, Radmila / Vazov, Nikolay / Waagan, Knut / Bernal, Fabian L M / Costa, Alexandre Almeida / Corrie, Brian / Akbar, Rahmad / Al Hajj, Ghadi S / Balaban, Gabriel / Brusko, Todd M / Chernigovskaya, Maria / Christley, Scott / Cowell, Lindsay G / Frank, Robert / Grytten, Ivar / Gundersen, Sveinung / Haff, Ingrid Hobæk / Hovig, Eivind / Hsieh, Ping-Han / Klambauer, Günter / Kuijjer, Marieke L / Lund-Andersen, Christin / Martini, Antonio / Minotto, Thomas / Pensar, Johan / Rand, Knut / Riccardi, Enrico / Robert, Philippe A / Rocha, Artur / Slabodkin, Andrei / Snapkov, Igor / Sollid, Ludvig M / Titov, Dmytro / Weber, Cédric R / Widrich, Michael / Yaari, Gur / Greiff, Victor / Sandve, Geir Kjetil

    Nature machine intelligence

    2021  Volume 3, Issue 11, Page(s) 936–944

    Abstract Adaptive immune receptor repertoires (AIRR) are key targets for biomedical research as they record past and ongoing adaptive immune responses. The capacity of machine learning (ML) to identify complex discriminative sequence patterns renders it an ideal approach for AIRR-based diagnostic and therapeutic discovery. To date, widespread adoption of AIRR ML has been inhibited by a lack of reproducibility, transparency, and interoperability. immuneML (immuneml.uio.no) addresses these concerns by implementing each step of the AIRR ML process in an extensible, open-source software ecosystem that is based on fully specified and shareable workflows. To facilitate widespread user adoption, immuneML is available as a command-line tool and through an intuitive Galaxy web interface, and extensive documentation of workflows is provided. We demonstrate the broad applicability of immuneML by (i) reproducing a large-scale study on immune state prediction, (ii) developing, integrating, and applying a novel deep learning method for antigen specificity prediction, and (iii) showcasing streamlined interpretability-focused benchmarking of AIRR ML.
    Language English
    Publishing date 2021-11-16
    Publishing country England
    Document type Journal Article
    ISSN 2522-5839
    ISSN (online) 2522-5839
    DOI 10.1038/s42256-021-00413-z
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Article ; Online: GSuite HyperBrowser: integrative analysis of dataset collections across the genome and epigenome.

    Simovski, Boris / Vodák, Daniel / Gundersen, Sveinung / Domanska, Diana / Azab, Abdulrahman / Holden, Lars / Holden, Marit / Grytten, Ivar / Rand, Knut / Drabløs, Finn / Johansen, Morten / Mora, Antonio / Lund-Andersen, Christin / Fromm, Bastian / Eskeland, Ragnhild / Gabrielsen, Odd Stokke / Ferkingstad, Egil / Nakken, Sigve / Bengtsen, Mads / Nederbragt, Alexander Johan / Thorarensen, Hildur Sif / Akse, Johannes Andreas / Glad, Ingrid / Hovig, Eivind / Sandve, Geir Kjetil

    GigaScience

    2017  Volume 6, Issue 7, Page(s) 1–12

    Abstract Background: Recent large-scale undertakings such as ENCODE and Roadmap Epigenomics have generated experimental data mapped to the human reference genome (as genomic tracks) representing a variety of functional elements across a large number of cell types. Despite the high potential value of these publicly available data for a broad variety of investigations, little attention has been given to the analytical methodology necessary for their widespread utilisation.
    Findings: We here present a first principled treatment of the analysis of collections of genomic tracks. We have developed novel computational and statistical methodology to permit comparative and confirmatory analyses across multiple and disparate data sources. We delineate a set of generic questions that are useful across a broad range of investigations and discuss the implications of choosing different statistical measures and null models. Examples include contrasting analyses across different tissues or diseases. The methodology has been implemented in a comprehensive open-source software system, the GSuite HyperBrowser. To make the functionality accessible to biologists, and to facilitate reproducible analysis, we have also developed a web-based interface providing an expertly guided and customizable way of utilizing the methodology. With this system, many novel biological questions can flexibly be posed and rapidly answered.
    Conclusions: Through a combination of streamlined data acquisition, interoperable representation of dataset collections, and customizable statistical analysis with guided setup and interpretation, the GSuite HyperBrowser represents a first comprehensive solution for integrative analysis of track collections across the genome and epigenome. The software is available at: https://hyperbrowser.uio.no.
    MeSH term(s) Datasets as Topic/standards ; Epigenesis, Genetic ; Epigenomics/methods ; Epigenomics/standards ; Genome, Human ; Humans ; Software ; Whole Genome Sequencing/methods ; Whole Genome Sequencing/standards
    Language English
    Publishing date 2017-04-29
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 2708999-X
    ISSN (online) 2047-217X
    ISSN 2047-217X
    DOI 10.1093/gigascience/gix032
    Database MEDical Literature Analysis and Retrieval System OnLINE
