LIVIVO - The Search Portal for Life Sciences

Search results

Result 1 - 8 of total 8

  1. Article ; Online: Robust expansion of phylogeny for fast-growing genome sequence data.

    Ye, Yongtao / Shum, Marcus H / Tsui, Joseph L / Yu, Guangchuang / Smith, David K / Zhu, Huachen / Wu, Joseph T / Guan, Yi / Lam, Tommy Tsan-Yuk

    PLoS computational biology

    2024  Volume 20, Issue 2, Page(s) e1011871

    Abstract Massive sequencing of SARS-CoV-2 genomes has urged novel methods that employ existing phylogenies to add new samples efficiently instead of de novo inference. 'TIPars' was developed for such challenge integrating parsimony analysis with pre-computed ancestral sequences. It took about 21 seconds to insert 100 SARS-CoV-2 genomes into a 100k-taxa reference tree using 1.4 gigabytes. Benchmarking on four datasets, TIPars achieved the highest accuracy for phylogenies of moderately similar sequences. For highly similar and divergent scenarios, fully parsimony-based and likelihood-based phylogenetic placement methods performed the best respectively while TIPars was the second best. TIPars accomplished efficient and accurate expansion of phylogenies of both similar and divergent sequences, which would have broad biological applications beyond SARS-CoV-2. TIPars is accessible from https://tipars.hku.hk/ and source codes are available at https://github.com/id-bioinfo/TIPars.
    MeSH term(s) Phylogeny ; Likelihood Functions ; Genome ; Software ; SARS-CoV-2/genetics
    Language English
    Publishing date 2024-02-08
    Publishing country United States
    Document type Journal Article
    ZDB-ID 2193340-6
    ISSN (online) 1553-7358
    ISSN 1553-734X
    DOI 10.1371/journal.pcbi.1011871
    Database MEDical Literature Analysis and Retrieval System OnLINE

  2. Article ; Online: PnpProbs: a better multiple sequence alignment tool by better handling of guide trees.

    Ye, Yongtao / Lam, Tak-Wah / Ting, Hing-Fung

    BMC bioinformatics

    2016  Volume 17 Suppl 8, Page(s) 285

    Abstract Background: This paper describes a new MSA tool called PnpProbs, which constructs better multiple sequence alignments by better handling of guide trees. It classifies sequences into two types: normally related and distantly related. For normally related sequences, it uses an adaptive approach to construct the guide tree needed for progressive alignment; it first estimates the input's discrepancy by computing the standard deviation of their percent identities, and based on this estimate, it chooses the better method to construct the guide tree. For distantly related sequences, PnpProbs abandons the guide tree and uses instead some non-progressive alignment method to generate the alignment.
    Results: To evaluate PnpProbs, we have compared it with thirteen other popular MSA tools, and PnpProbs has the best alignment scores in all but one test. We have also used it for phylogenetic analysis, and found that the phylogenetic trees constructed from PnpProbs' alignments are closest to the model trees.
    Conclusions: By combining the strength of the progressive and non-progressive alignment methods, we have developed an MSA tool called PnpProbs. We have compared PnpProbs with thirteen other popular MSA tools and our results showed that our tool usually constructed the best alignments.
    MeSH term(s) Algorithms ; Amino Acid Sequence ; Computer Simulation ; Databases, Protein ; Phylogeny ; Sequence Alignment/methods ; Software ; Time Factors
    Language English
    Publishing date 2016-08-31
    Publishing country England
    Document type Journal Article
    ZDB-ID 2041484-5
    ISSN (online) 1471-2105
    DOI 10.1186/s12859-016-1121-7
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Article ; Online: Robust expansion of phylogeny for fast-growing genome sequence data

    Ye, Yongtao / Shum, Marcus / Tsui, Joseph / Yu, Guangchuang / Smith, David / Zhu, Huachen / Wu, Joseph / Guan, Yi / Lam, Tommy Tsan-Yuk

    bioRxiv

    Abstract Massive sequencing of SARS-CoV-2 genomes has led to a great demand for adding new samples to a reference phylogeny instead of building the tree from scratch. To address such challenge, we proposed an algorithm 'TIPars' by integrating parsimony analysis with pre-computed ancestral sequences. Compared to four state-of-the-art methods on four benchmark datasets (SARS-CoV-2, Influenza virus, Newcastle disease virus and 16S rRNA genes), TIPars achieved the best performance in most tests. It took only 21 seconds to insert 100 SARS-CoV-2 genomes to a 100k-taxa reference tree using near 1.4 gigabytes of memory. Its efficient and accurate phylogenetic placements and incrementation for phylogenies with highly similar and divergent sequences suggest that it will be useful in a wide range of studies including pathogen molecular epidemiology, microbiome diversity and systematics.
    Keywords covid19
    Language English
    Publishing date 2022-01-03
    Publisher Cold Spring Harbor Laboratory
    Document type Article ; Online
    DOI 10.1101/2021.12.30.474610
    Database COVID19

  4. Article ; Online: Improving multiple sequence alignment by using better guide trees.

    Zhan, Qing / Ye, Yongtao / Lam, Tak-Wah / Yiu, Siu-Ming / Wang, Yadong / Ting, Hing-Fung

    BMC bioinformatics

    2015  Volume 16 Suppl 5, Page(s) S4

    Abstract Progressive sequence alignment is one of the most commonly used methods for multiple sequence alignment. Roughly speaking, the method first builds a guide tree, and then aligns the sequences progressively according to the topology of the tree. It is believed that guide trees are very important to progressive alignment; a better guide tree will give an alignment with higher accuracy. Recently, we have proposed an adaptive method for constructing guide trees. This paper studies the quality of the guide trees constructed by this method. Our study showed that our adaptive method can be used to improve the accuracy of many different progressive MSA tools. In fact, we give evidence showing that the guide trees constructed by the adaptive method are among the best.
    MeSH term(s) Computational Biology/methods ; Computer Simulation ; Databases, Genetic ; Evolution, Molecular ; Humans ; Phylogeny ; Sequence Alignment/methods ; Sequence Analysis, DNA ; Software
    Language English
    Publishing date 2015-03-18
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 2041484-5
    ISSN (online) 1471-2105
    DOI 10.1186/1471-2105-16-S5-S4
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Article ; Online: GLProbs: Aligning Multiple Sequences Adaptively.

    Ye, Yongtao / Cheung, David Wai-lok / Wang, Yadong / Yiu, Siu-Ming / Zhan, Qing / Lam, Tak-Wah / Ting, Hing-Fung

    IEEE/ACM transactions on computational biology and bioinformatics

    2015  Volume 12, Issue 1, Page(s) 67–78

    Abstract This paper introduces a simple and effective approach to improve the accuracy of multiple sequence alignment. We use a natural measure to estimate the similarity of the input sequences, and based on this measure, we align the input sequences differently. For example, for inputs with high similarity, we consider the whole sequences and align them globally, while for those with moderately low similarity, we may ignore the flank regions and align them locally. To test the effectiveness of this approach, we have implemented a multiple sequence alignment tool called GLProbs and compared its performance with about one dozen leading alignment tools on three benchmark alignment databases, and GLProbs's alignments have the best scores in almost all testings. We have also evaluated the practicability of the alignments of GLProbs by applying the tool to three biological applications, namely phylogenetic trees construction, protein secondary structure prediction and the detection of high risk members for cervical cancer in the HPV-E6 family, and the results are very encouraging.
    MeSH term(s) Algorithms ; Amino Acid Sequence ; Computational Biology/methods ; Markov Chains ; Molecular Sequence Data ; Phylogeny ; Protein Structure, Secondary ; Proteins/chemistry ; Proteins/classification ; Sequence Alignment/methods ; Sequence Analysis, Protein/methods ; Software
    Chemical Substances Proteins
    Language English
    Publishing date 2015-01
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ISSN (online) 1557-9964
    DOI 10.1109/TCBB.2014.2316820
    Database MEDical Literature Analysis and Retrieval System OnLINE

  6. Article ; Online: Predicting RNA Secondary Structures: One-grammar-fits-all Solution

    Li, Menglu / Cheng, Micheal / Ye, Yongtao / Hon, Wk / Ting, Hf / Lam, Tw / Tang, Cy / Wong, Thomas / Yiu, Sm

    Bioinformatics Research and Applications

    Abstract RNA secondary structures are known to be important in many biological processes. Many available programs have been developed for RNA secondary structure prediction. Based on our knowledge, however, there still exist secondary structures of known RNA sequences which cannot be covered by these algorithms. In this paper, we provide an efficient algorithm that can handle all RNA secondary structures found in Rfam database. We designed a new stochastic context-free grammar named Rectangle Tree Grammar (RTG) which significantly expands the classes of structures that can be modelled. Our algorithm runs in O(n^6) time and the accuracy is reasonably high, with average PPV and sensitivity over 75%. In addition, the structures that RTG predicts are very similar to the real ones.
    Keywords covid19
    Publisher PMC
    Document type Article ; Online
    DOI 10.1007/978-3-319-19048-8_18
    Database COVID19

  7. Article ; Online: Redefining the structural motifs that determine RNA binding and RNA editing by pentatricopeptide repeat proteins in land plants.

    Cheng, Shifeng / Gutmann, Bernard / Zhong, Xiao / Ye, Yongtao / Fisher, Mark F / Bai, Fengqi / Castleden, Ian / Song, Yue / Song, Bo / Huang, Jiaying / Liu, Xin / Xu, Xun / Lim, Boon L / Bond, Charles S / Yiu, Siu-Ming / Small, Ian

    The Plant journal : for cell and molecular biology

    2016  Volume 85, Issue 4, Page(s) 532–547

    Abstract The pentatricopeptide repeat (PPR) proteins form one of the largest protein families in land plants. They are characterised by tandem 30-40 amino acid motifs that form an extended binding surface capable of sequence-specific recognition of RNA strands. Almost all of them are post-translationally targeted to plastids and mitochondria, where they play important roles in post-transcriptional processes including splicing, RNA editing and the initiation of translation. A code describing how PPR proteins recognise their RNA targets promises to accelerate research on these proteins, but making use of this code requires accurate definition and annotation of all of the various nucleotide-binding motifs in each protein. We have used a structural modelling approach to define 10 different variants of the PPR motif found in plant proteins, in addition to the putative deaminase motif that is found at the C-terminus of many RNA-editing factors. We show that the super-helical RNA-binding surface of RNA-editing factors is potentially longer than previously recognised. We used the redefined motifs to develop accurate and consistent annotations of PPR sequences from 109 genomes. We report a high error rate in PPR gene models in many public plant proteomes, due to gene fusions and insertions of spurious introns. These consistently annotated datasets across a wide range of species are valuable resources for future comparative genomics studies, and an essential pre-requisite for accurate large-scale computational predictions of PPR targets. We have created a web portal (http://www.plantppr.com) that provides open access to these resources for the community.
    MeSH term(s) Amino Acid Motifs ; Amino Acid Sequence ; Embryophyta/genetics ; Embryophyta/metabolism ; Mitochondria/metabolism ; Models, Molecular ; Models, Structural ; Molecular Sequence Annotation ; Plant Proteins/chemistry ; Plant Proteins/genetics ; Plant Proteins/metabolism ; Plastids/metabolism ; Protein Transport ; RNA Editing/genetics ; RNA Recognition Motif Proteins/chemistry ; RNA Recognition Motif Proteins/genetics ; RNA Recognition Motif Proteins/metabolism ; RNA, Plant/genetics ; Sequence Alignment
    Chemical Substances Plant Proteins ; RNA Recognition Motif Proteins ; RNA, Plant
    Language English
    Publishing date 2016-02
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 1088037-9
    ISSN (online) 1365-313X
    ISSN 0960-7412
    DOI 10.1111/tpj.13121
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Article: Redefining the structural motifs that determine RNA binding and RNA editing by pentatricopeptide repeat proteins in land plants

    Cheng, Shifeng / Gutmann, Bernard / Zhong, Xiao / Ye, Yongtao / Fisher, Mark F. / Bai, Fengqi / Castleden, Ian / Song, Yue / Song, Bo / Huang, Jiaying / Liu, Xin / Xu, Xun / Lim, Boon L. / Bond, Charles S. / Yiu, Siu‐Ming / Small, Ian

    plant journal

    Volume 85, Issue 4

    Abstract The pentatricopeptide repeat (PPR) proteins form one of the largest protein families in land plants. They are characterised by tandem 30–40 amino acid motifs that form an extended binding surface capable of sequence‐specific recognition of RNA strands. Almost all of them are post‐translationally targeted to plastids and mitochondria, where they play important roles in post‐transcriptional processes including splicing, RNA editing and the initiation of translation. A code describing how PPR proteins recognise their RNA targets promises to accelerate research on these proteins, but making use of this code requires accurate definition and annotation of all of the various nucleotide‐binding motifs in each protein. We have used a structural modelling approach to define 10 different variants of the PPR motif found in plant proteins, in addition to the putative deaminase motif that is found at the C‐terminus of many RNA‐editing factors. We show that the super‐helical RNA‐binding surface of RNA‐editing factors is potentially longer than previously recognised. We used the redefined motifs to develop accurate and consistent annotations of PPR sequences from 109 genomes. We report a high error rate in PPR gene models in many public plant proteomes, due to gene fusions and insertions of spurious introns. These consistently annotated datasets across a wide range of species are valuable resources for future comparative genomics studies, and an essential pre‐requisite for accurate large‐scale computational predictions of PPR targets. We have created a web portal (http://www.plantppr.com) that provides open access to these resources for the community.
    Keywords data collection ; embryophytes ; plastids ; genes ; prediction ; models ; mitochondria ; RNA ; introns ; amino acid motifs ; RNA editing ; genomics ; plant proteins ; Internet
    Language English
    Document type Article
    ISSN 0960-7412
    Database AGRIS - International Information System for the Agricultural Sciences and Technology
