LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 18

  1. Article ; Online: Detecting differential transcript usage in complex diseases with SPIT.

    Erdogdu, Beril / Varabyou, Ales / Hicks, Stephanie C / Salzberg, Steven L / Pertea, Mihaela

    Cell reports methods

    2024  Volume 4, Issue 3, Page(s) 100736

    Abstract Differential transcript usage (DTU) plays a crucial role in determining how gene expression differs among cells, tissues, and developmental stages, contributing to the complexity and diversity of biological systems. In abnormal cells, it can also lead to deficiencies in protein function and underpin disease pathogenesis. Analyzing DTU via RNA sequencing (RNA-seq) data is vital, but the genetic heterogeneity in populations with complex diseases presents an intricate challenge due to diverse causal events and undetermined subtypes. Although the majority of common diseases in humans are categorized as complex, state-of-the-art DTU analysis methods often overlook this heterogeneity in their models. We therefore developed SPIT, a statistical tool that identifies predominant subgroups in transcript usage within a population along with their distinctive sets of DTU events. This study provides comprehensive assessments of SPIT's methodology and applies it to analyze brain samples from individuals with schizophrenia, revealing previously unreported DTU events in six candidate genes.
    MeSH term(s) Humans ; Gene Expression Profiling/methods ; Sequence Analysis, RNA ; RNA
    Chemical Substances RNA (63231-63-0)
    Language English
    Publishing date 2024-03-19
    Publishing country United States
    Document type Journal Article
    ISSN 2667-2375
    ISSN (online) 2667-2375
    DOI 10.1016/j.crmeth.2024.100736
    Database MEDical Literature Analysis and Retrieval System OnLINE

  2. Article ; Online: Investigating Open Reading Frames in Known and Novel Transcripts using ORFanage.

    Varabyou, Ales / Erdogdu, Beril / Salzberg, Steven L / Pertea, Mihaela

    Nature computational science

    2023  Volume 3, Issue 8, Page(s) 700–708

    Abstract ORFanage is a system designed to assign open reading frames (ORFs) to known and novel gene transcripts while maximizing similarity to annotated proteins. The primary intended use of ORFanage is the identification of ORFs in the assembled results of RNA sequencing experiments, a capability that most transcriptome assembly methods do not have. Our experiments demonstrate how ORFanage can be used to find novel protein variants in RNA-seq datasets, and to improve the annotations of ORFs in tens of thousands of transcript models in the human annotation databases. Through its implementation of a highly accurate and efficient pseudo-alignment algorithm, ORFanage is substantially faster than other ORF annotation methods, enabling its application to very large datasets. When used to analyze transcriptome assemblies, ORFanage can aid in the separation of signal from transcriptional noise and the identification of likely functional transcript variants, ultimately advancing our understanding of biology and medicine.
    Language English
    Publishing date 2023-07-31
    Publishing country United States
    Document type Journal Article
    ISSN 2662-8457
    ISSN (online) 2662-8457
    DOI 10.1038/s43588-023-00496-1
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Article: Investigating Open Reading Frames in Known and Novel Transcripts using ORFanage.

    Varabyou, Ales / Erdogdu, Beril / Salzberg, Steven L / Pertea, Mihaela

    bioRxiv : the preprint server for biology

    2023  

    Abstract ORFanage is a system designed to assign open reading frames (ORFs) to both known and novel gene transcripts while maximizing similarity to annotated proteins. The primary intended use of ORFanage is the identification of ORFs in the assembled results of RNA sequencing (RNA-seq) experiments, a capability that most transcriptome assembly methods do not have. Our experiments demonstrate how ORFanage can be used to find novel protein variants in RNA-seq datasets, and to improve the annotations of ORFs in tens of thousands of transcript models in the RefSeq and GENCODE human annotation databases. Through its implementation of a highly accurate and efficient pseudo-alignment algorithm, ORFanage is substantially faster than other ORF annotation methods, enabling its application to very large datasets. When used to analyze transcriptome assemblies, ORFanage can aid in the separation of signal from transcriptional noise and the identification of likely functional transcript variants, ultimately advancing our understanding of biology and medicine.
    Language English
    Publishing date 2023-03-25
    Publishing country United States
    Document type Preprint
    DOI 10.1101/2023.03.23.533704
    Database MEDical Literature Analysis and Retrieval System OnLINE

  4. Article: Detecting differential transcript usage in complex diseases with SPIT.

    Erdogdu, Beril / Varabyou, Ales / Hicks, Stephanie C / Salzberg, Steven L / Pertea, Mihaela

    bioRxiv : the preprint server for biology

    2023  

    Abstract Differential transcript usage (DTU) plays a crucial role in determining how gene expression differs among cells, tissues, and different developmental stages, thereby contributing to the complexity and diversity of biological systems. In abnormal cells, it can also lead to deficiencies in protein function, potentially leading to pathogenesis of diseases. Detecting such events for single-gene genetic traits is relatively uncomplicated; however, the heterogeneity of populations with complex diseases presents an intricate challenge due to the presence of diverse causal events and undetermined subtypes. SPIT is the first statistical tool that quantifies the heterogeneity in transcript usage within a population and identifies predominant subgroups along with their distinctive sets of DTU events. We provide comprehensive assessments of SPIT's methodology in both single-gene and complex traits and report the results of applying SPIT to analyze brain samples from individuals with schizophrenia. Our analysis reveals previously unreported DTU events in six candidate genes.
    Language English
    Publishing date 2023-07-10
    Publishing country United States
    Document type Preprint
    DOI 10.1101/2023.07.10.548289
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Article ; Online: TieBrush: an efficient method for aggregating and summarizing mapped reads across large datasets.

    Varabyou, Ales / Pertea, Geo / Pockrandt, Christopher / Pertea, Mihaela

    Bioinformatics (Oxford, England)

    2021  Volume 37, Issue 20, Page(s) 3650–3651

    Abstract Summary: Although the ability to programmatically summarize and visually inspect sequencing data is an integral part of genome analysis, currently available methods are not capable of handling large numbers of samples. In particular, making a visual comparison of transcriptional landscapes between two sets of thousands of RNA-seq samples is limited by available computational resources, which can be overwhelmed due to the sheer size of the data. In this work, we present TieBrush, a software package designed to process very large sequencing datasets (RNA, whole-genome, exome, etc.) into a form that enables quick visual and computational inspection. TieBrush can also be used as a method for aggregating data for downstream computational analysis, and is compatible with most software tools that take aligned reads as input.
    Availability and implementation: TieBrush is provided as a C++ package under the MIT License. Precompiled binaries, source code and example data are available on GitHub (https://github.com/alevar/tiebrush).
    Supplementary information: Supplementary data are available at Bioinformatics online.
    Language English
    Publishing date 2021-05-07
    Publishing country England
    Document type Journal Article
    ZDB-ID 1422668-6
    ISSN (online) 1367-4811
    ISSN 1367-4803
    DOI 10.1093/bioinformatics/btab342
    Database MEDical Literature Analysis and Retrieval System OnLINE

  6. Article ; Online: Effects of transcriptional noise on estimates of gene and transcript expression in RNA sequencing experiments.

    Varabyou, Ales / Salzberg, Steven L / Pertea, Mihaela

    Genome research

    2020  Volume 31, Issue 2, Page(s) 301–308

    Abstract RNA sequencing is widely used to measure gene expression across a vast range of animal and plant tissues and conditions. Most studies of computational methods for gene expression analysis use simulated data to evaluate the accuracy of these methods. These simulations typically include reads generated from known genes at varying levels of expression. Until now, simulations did not include reads from noisy transcripts, which might include erroneous transcription, erroneous splicing, and other processes that affect transcription in living cells. Here we examine the effects of realistic amounts of transcriptional noise on the ability of leading computational methods to assemble and quantify the genes and transcripts in an RNA sequencing experiment. We show that the inclusion of noise leads to systematic errors in the ability of these programs to measure expression, including systematic underestimates of transcript abundance levels and large increases in the number of false-positive genes and transcripts. Our results also suggest that alignment-free computational methods sometimes fail to detect transcripts expressed at relatively low levels.
    Language English
    Publishing date 2020-12-23
    Publishing country United States
    Document type Journal Article
    ZDB-ID 1284872-4
    ISSN (online) 1549-5469
    ISSN 1088-9051 ; 1054-9803
    DOI 10.1101/gr.266213.120
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Article ; Online: Rapid detection of inter-clade recombination in SARS-CoV-2 with Bolotie.

    Varabyou, Ales / Pockrandt, Christopher / Salzberg, Steven L / Pertea, Mihaela

    Genetics

    2021  Volume 218, Issue 3

    Abstract The ability to detect recombination in pathogen genomes is crucial to the accuracy of phylogenetic analysis and consequently to forecasting the spread of infectious diseases and to developing therapeutics and public health policies. However, in the case of SARS-CoV-2, the low divergence of near-identical genomes sequenced over a short period of time makes conventional analysis infeasible. Using a novel method, we identified 225 anomalous SARS-CoV-2 genomes of likely recombinant origins out of the first 87,695 genomes to be released, several of which have persisted in the population. Bolotie is specifically designed to perform a rapid search for inter-clade recombination events over extremely large datasets, facilitating analysis of novel isolates in seconds. In cases where raw sequencing data were available, we were able to rule out the possibility that these samples represented co-infections by analyzing the underlying sequence reads. The Bolotie software and other data from our study are available at https://github.com/salzberg-lab/bolotie.
    MeSH term(s) Genome, Viral ; Phylogeny ; Recombination, Genetic ; SARS-CoV-2 ; Software
    Language English
    Publishing date 2021-05-12
    Publishing country United States
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't
    ZDB-ID 2167-2
    ISSN (online) 1943-2631
    ISSN 0016-6731
    DOI 10.1093/genetics/iyab074
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Article: Rapid detection of inter-clade recombination in SARS-CoV-2 with Bolotie.

    Varabyou, Ales / Pockrandt, Christopher / Salzberg, Steven L / Pertea, Mihaela

    bioRxiv : the preprint server for biology

    2020  

    Abstract The ability to detect recombination in pathogen genomes is crucial to the accuracy of phylogenetic analysis and consequently to forecasting the spread of infectious diseases and to developing therapeutics and public health policies. However, previous methods for detecting recombination and reassortment events cannot handle the computational requirements of analyzing tens of thousands of genomes, a scenario that has now emerged in the effort to track the spread of the SARS-CoV-2 virus. Furthermore, the low divergence of near-identical genomes sequenced in short periods of time presents a statistical challenge not addressed by available methods. In this work we present Bolotie, an efficient method designed to detect recombination and reassortment events between clades of viral genomes. We applied our method to a large collection of SARS-CoV-2 genomes and discovered hundreds of isolates that are likely of a recombinant origin. In cases where raw sequencing data was available, we were able to rule out the possibility that these samples represented co-infections by analyzing the underlying sequence reads. Our findings further show that several recombinants appear to have persisted in the population.
    Keywords covid19
    Language English
    Publishing date 2020-09-21
    Publishing country United States
    Document type Preprint
    DOI 10.1101/2020.09.21.300913
    Database MEDical Literature Analysis and Retrieval System OnLINE

  9. Article ; Online: Structure-guided isoform identification for the human transcriptome.

    Sommer, Markus J / Cha, Sooyoung / Varabyou, Ales / Rincon, Natalia / Park, Sukhwan / Minkin, Ilia / Pertea, Mihaela / Steinegger, Martin / Salzberg, Steven L

    eLife

    2022  Volume 11

    Abstract Recently developed methods to predict three-dimensional protein structure with high accuracy have opened new avenues for genome and proteome research. We explore a new hypothesis in genome annotation, namely whether computationally predicted structures can help to identify which of multiple possible gene isoforms represents a functional protein product. Guided by protein structure predictions, we evaluated over 230,000 isoforms of human protein-coding genes assembled from over 10,000 RNA sequencing experiments across many human tissues. From this set of assembled transcripts, we identified hundreds of isoforms with more confidently predicted structure and potentially superior function in comparison to canonical isoforms in the latest human gene database. We illustrate our new method with examples where structure provides a guide to function in combination with expression and evolutionary evidence. Additionally, we provide the complete set of structures as a resource to better understand the function of human genes and their isoforms. These results demonstrate the promise of protein structure prediction as a genome annotation tool, allowing us to refine even the most highly curated catalog of human proteins. More generally we demonstrate a practical, structure-guided approach that can be used to enhance the annotation of any genome.
    MeSH term(s) Humans ; Transcriptome ; Molecular Sequence Annotation ; Protein Isoforms/genetics ; Genome ; Sequence Analysis, RNA
    Chemical Substances Protein Isoforms
    Language English
    Publishing date 2022-12-15
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't ; Research Support, N.I.H., Extramural
    ZDB-ID 2687154-3
    ISSN (online) 2050-084X
    DOI 10.7554/eLife.82556
    Database MEDical Literature Analysis and Retrieval System OnLINE

  10. Article ; Online: CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure.

    Varabyou, Ales / Sommer, Markus J / Erdogdu, Beril / Shinder, Ida / Minkin, Ilia / Chao, Kuan-Hao / Park, Sukhwan / Heinz, Jakob / Pockrandt, Christopher / Shumate, Alaina / Rincon, Natalia / Puiu, Daniela / Steinegger, Martin / Salzberg, Steven L / Pertea, Mihaela

    Genome biology

    2023  Volume 24, Issue 1, Page(s) 249

    Abstract CHESS 3 represents an improved human gene catalog based on nearly 10,000 RNA-seq experiments across 54 body sites. It significantly improves current genome annotation by integrating the latest reference data and algorithms, machine learning techniques for noise filtering, and new protein structure prediction methods. CHESS 3 contains 41,356 genes, including 19,839 protein-coding genes and 158,377 transcripts, with 14,863 protein-coding transcripts not in other catalogs. It includes all MANE transcripts and at least one transcript for most RefSeq and GENCODE genes. On the CHM13 human genome, the CHESS 3 catalog contains an additional 129 protein-coding genes. CHESS 3 is available at http://ccb.jhu.edu/chess.
    MeSH term(s) Humans ; Phylogeny ; Proteins/genetics ; Genome, Human ; Algorithms ; Software ; Molecular Sequence Annotation
    Chemical Substances Proteins
    Language English
    Publishing date 2023-10-30
    Publishing country England
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, U.S. Gov't, Non-P.H.S. ; Research Support, Non-U.S. Gov't
    ZDB-ID 2040529-7
    ISSN (online) 1474-760X
    DOI 10.1186/s13059-023-03088-4
    Database MEDical Literature Analysis and Retrieval System OnLINE
