LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 354

  1. Article ; Online: Harnessing the predicted maize pan-interactome for putative gene function prediction and prioritization of candidate genes for important traits.

    Poretsky, Elly / Cagirici, H Busra / Andorf, Carson M / Sen, Taner Z

    G3 (Bethesda, Md.)

    2024  

    Abstract The recent assembly and annotation of the 26 maize nested association mapping (NAM) population founder inbreds have enabled large-scale pan-genomic comparative studies. These studies have expanded our understanding of agronomically important traits by integrating pan-transcriptomic data with trait-specific gene candidates from previous association mapping results. In contrast to the availability of pan-transcriptomic data, obtaining reliable protein-protein interaction (PPI) data has remained a challenge due to its high cost and complexity. We generated predicted PPI networks for each of the 26 genomes using the established STRING database. The individual genome-interactomes were then integrated to generate core- and pan-interactomes. We deployed the PPI clustering algorithm ClusterONE to identify numerous PPI clusters that were functionally annotated using gene ontology (GO) functional enrichment, demonstrating a diverse range of enriched GO terms across different clusters. Additional cluster annotations were generated by integrating gene co-expression data and gene description annotations, providing additional useful information. We show that the functionally annotated PPI clusters establish a useful framework for protein function prediction and prioritization of candidate genes of interest. Our study not only provides a comprehensive resource of predicted PPI networks for 26 maize genomes, but also offers annotated interactome clusters for predicting protein functions and prioritizing gene candidates. The source code for the Python implementation of the analysis workflow and a standalone web application for accessing the analysis results are available at https://github.com/eporetsky/PanPPI.
    Language English
    Publishing date 2024-03-16
    Publishing country England
    Document type Journal Article
    ZDB-ID 2629978-1
    ISSN 2160-1836
    ISSN (online) 2160-1836
    DOI 10.1093/g3journal/jkae059
    Database MEDical Literature Analysis and Retrieval System OnLINE
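The integration of per-genome interactomes into core- and pan-interactomes described above can be pictured as set operations over edge lists. Below is a minimal sketch, assuming each genome's predicted interactome has already been reduced to undirected edges between shared pan-gene identifiers; the gene IDs and helper names are hypothetical, and this is not the PanPPI implementation. An edge is "core" if it occurs in every genome, and the pan-interactome is the union of all edges.

```python
from collections import Counter

def to_edges(pairs):
    """Turn (gene_a, gene_b) tuples into undirected edges; duplicates collapse."""
    return {frozenset(pair) for pair in pairs}

def edge_occupancy(interactomes):
    """Count in how many genomes each undirected edge occurs."""
    counts = Counter()
    for pairs in interactomes:
        counts.update(to_edges(pairs))  # each edge counted at most once per genome
    return counts

def core_and_pan(interactomes):
    """Core: edges present in every genome. Pan: edges present in any genome."""
    counts = edge_occupancy(interactomes)
    core = {edge for edge, n in counts.items() if n == len(interactomes)}
    pan = set(counts)
    return core, pan

# Hypothetical toy interactomes for two genomes, keyed by shared gene IDs.
genome_a = [("g1", "g2"), ("g2", "g3")]
genome_b = [("g2", "g1"), ("g3", "g4")]
core, pan = core_and_pan([genome_a, genome_b])
print(sorted(sorted(e) for e in core))  # [['g1', 'g2']]
print(len(pan))                         # 3
```

Keeping the occupancy counts, rather than only the intersection, also makes it easy to define intermediate "shell" sets by choosing a different cutoff.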

  2. Article ; Online: PhosBoost: Improved phosphorylation prediction recall using gradient boosting and protein language models.

    Poretsky, Elly / Andorf, Carson M / Sen, Taner Z

    Plant direct

    2023  Volume 7, Issue 12, Page(s) e554

    Abstract Protein phosphorylation is a dynamic and reversible post-translational modification that regulates a variety of essential biological processes. The regulatory role of phosphorylation in cellular signaling pathways, protein-protein interactions, and enzymatic activities has motivated extensive research efforts to understand its functional implications. Experimental protein phosphorylation data in plants remains limited to a few species, necessitating a scalable and accurate prediction method. Here, we present PhosBoost, a machine-learning approach that leverages protein language models and gradient-boosting trees to predict protein phosphorylation from experimentally derived data. Trained on data obtained from a comprehensive plant phosphorylation database, qPTMplants, we compared the performance of PhosBoost to existing protein phosphorylation prediction methods, PhosphoLingo and DeepPhos. For serine and threonine prediction, PhosBoost achieved higher recall than PhosphoLingo and DeepPhos (.78, .56, and .14, respectively) while maintaining a competitive area under the precision-recall curve (.54, .56, and .42, respectively). PhosphoLingo and DeepPhos failed to predict any tyrosine phosphorylation sites, while PhosBoost achieved a recall score of .6. Despite the precision-recall tradeoff, PhosBoost offers improved performance when recall is prioritized while consistently providing more confident probability scores. A sequence-based pairwise alignment step improved prediction results for all classifiers by effectively increasing the number of inferred positive phosphosites. We provide evidence to show that PhosBoost models are transferable across species and scalable for genome-wide protein phosphorylation predictions. PhosBoost is freely and publicly available on GitHub.
    Language English
    Publishing date 2023-12-20
    Publishing country England
    Document type Journal Article
    ISSN 2475-4455
    ISSN (online) 2475-4455
    DOI 10.1002/pld3.554
    Database MEDical Literature Analysis and Retrieval System OnLINE
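The recall and area-under-the-precision-recall-curve figures quoted above follow the usual definitions. This small sketch uses made-up labels and scores (hypothetical data, not PhosBoost output) to show how lowering the decision threshold trades precision for recall, and how a step-wise area under the precision-recall curve (average precision) can be computed. Tied scores are broken by input order here, which may differ from library implementations such as scikit-learn.

```python
def precision_recall(y_true, scores, threshold=0.5):
    """Precision and recall when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(y_true, scores):
    """Step-wise area under the PR curve: sum over positives of
    (recall increment) * (precision at that rank), ranking by descending score."""
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = 0
    ap = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            ap += (1 / n_pos) * (tp / rank)
    return ap

# Hypothetical labels (1 = phosphosite) and classifier scores.
y = [1, 0, 1, 1, 0, 0]
s = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
print(precision_recall(y, s, 0.5))        # precision 2/3, recall 2/3
print(precision_recall(y, s, 0.2))        # (0.6, 1.0)
print(round(average_precision(y, s), 4))  # 0.8056
```

Lowering the threshold from 0.5 to 0.2 raises recall to 1.0 while precision drops to 0.6, which is the kind of tradeoff the abstract describes when recall is prioritized.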

  3. Article ; Online: G4Boost: a machine learning-based tool for quadruplex identification and stability prediction.

    Cagirici, H Busra / Budak, Hikmet / Sen, Taner Z

    BMC bioinformatics

    2022  Volume 23, Issue 1, Page(s) 240

    Abstract Background: G-quadruplexes (G4s), formed within guanine-rich nucleic acids, are secondary structures involved in important biological processes. Although every G4 motif has the potential to form a stable G4 structure, not every G4 motif would, and accurate energy-based methods are needed to assess their structural stability. Here, we present a decision tree-based prediction tool, G4Boost, to identify G4 motifs and predict their secondary structure folding probability and thermodynamic stability based on their sequences, nucleotide compositions, and estimated structural topologies.
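    Results: G4Boost predicted the quadruplex folding state with an accuracy greater than 93% and an F1-score of 0.96, and the folding energy with an RMSE of 4.28 and an R² of 0.95 using only sequence-intrinsic features. G4Boost was successfully applied and validated to predict the stability of experimentally determined G4 structures, including those from plants and humans.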
    Conclusion: G4Boost outperformed the three machine-learning based prediction tools, DeepG4, Quadron, and G4RNA Screener, in terms of both accuracy and F1-score, and can be highly useful for G4 prediction to understand gene regulation across species including plants and humans.
    MeSH term(s) G-Quadruplexes ; Gene Expression Regulation ; Guanine/chemistry ; Humans ; Machine Learning ; Thermodynamics
    Chemical Substances Guanine (5Z93L87A1R)
    Language English
    Publishing date 2022-06-18
    Publishing country England
    Document type Journal Article
    ZDB-ID 2041484-5
    ISSN 1471-2105
    ISSN (online) 1471-2105
    DOI 10.1186/s12859-022-04782-z
    Database MEDical Literature Analysis and Retrieval System OnLINE
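Before G4Boost's models score folding state and stability, candidate G4 motifs have to be located in the sequence. A common way to find candidates is the canonical pattern of four runs of at least three guanines separated by loops of 1-7 nucleotides. The regex below is a sketch under that assumption and is not taken from G4Boost itself; the function name is hypothetical, and it reports only non-overlapping matches on the strand given.

```python
import re

# Assumed canonical putative-G4 pattern (not necessarily G4Boost's own rule):
# four runs of >= 3 G separated by loops of 1-7 nucleotides.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3,}G{3,}")

def find_g4_motifs(seq):
    """Return (start, end, motif) for non-overlapping G4 candidates on this strand."""
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]

print(find_g4_motifs("ttgggagggttgggcagggaa"))
# [(2, 19, 'GGGAGGGTTGGGCAGGG')]
print(find_g4_motifs("GGGAGGGTT"))  # [] (only two G-runs)
```

A pattern match only says that a motif could fold; predicting whether it actually folds and how stable it is remains the job of the trained models described in the abstract.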

  4. Article ; Online: G4Boost: a machine learning-based tool for quadruplex identification and stability prediction

    H. Busra Cagirici / Hikmet Budak / Taner Z. Sen

    BMC Bioinformatics, Vol 23, Iss 1, Pp 1-

    2022  Volume 18

    Abstract Background: G-quadruplexes (G4s), formed within guanine-rich nucleic acids, are secondary structures involved in important biological processes. Although every G4 motif has the potential to form a stable G4 structure, not every G4 motif would, and accurate energy-based methods are needed to assess their structural stability. Here, we present a decision tree-based prediction tool, G4Boost, to identify G4 motifs and predict their secondary structure folding probability and thermodynamic stability based on their sequences, nucleotide compositions, and estimated structural topologies. Results: G4Boost predicted the quadruplex folding state with an accuracy greater than 93% and an F1-score of 0.96, and the folding energy with an RMSE of 4.28 and an R² of 0.95 using only sequence-intrinsic features. G4Boost was successfully applied and validated to predict the stability of experimentally determined G4 structures, including those from plants and humans. Conclusion: G4Boost outperformed the three machine-learning based prediction tools, DeepG4, Quadron, and G4RNA Screener, in terms of both accuracy and F1-score, and can be highly useful for G4 prediction to understand gene regulation across species including plants and humans.
    Keywords G-quadruplex ; Machine learning ; Topology ; Stability ; Energy ; Plants ; Computer applications to medicine. Medical informatics ; R858-859.7 ; Biology (General) ; QH301-705.5
    Subject code 006
    Language English
    Publishing date 2022-06-01T00:00:00Z
    Publisher BMC
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  5. Article ; Online: Predicting Tissue-Specific mRNA and Protein Abundance in Maize: A Machine Learning Approach.

    Cho, Kyoung Tak / Sen, Taner Z / Andorf, Carson M

    Frontiers in artificial intelligence

    2022  Volume 5, Page(s) 830170

    Abstract Machine learning and modeling approaches have been used to classify protein sequences for a broad set of tasks including predicting protein function, structure, expression, and localization. Some recent studies have successfully predicted whether a given gene is expressed as mRNA or even potentially translated into protein, but given that not all genes are expressed in every condition and tissue, the challenge remains to predict condition-specific expression. To address this gap, we developed a machine learning approach to predict tissue-specific gene expression across 23 different tissues in maize, solely based on DNA promoter and protein sequences. For class labels, we defined high and low expression levels for mRNA and protein abundance and optimized classifiers by systematically exploring various methods and combinations of k-mer sequences in a two-phase approach. In the first phase, we developed Markov model classifiers for each tissue and built a feature vector based on the predictions. In the second phase, the feature vector was used as an input to a Bayesian network for final classification. Our results show that these methods can achieve high classification accuracy of up to 95% for predicting gene expression for individual tissues. By relying on sequence alone, our method works in settings where costly experimental data are unavailable and reveals useful insights into the functional, evolutionary, and regulatory characteristics of genes.
    Language English
    Publishing date 2022-05-26
    Publishing country Switzerland
    Document type Journal Article
    ISSN 2624-8212
    ISSN (online) 2624-8212
    DOI 10.3389/frai.2022.830170
    Database MEDical Literature Analysis and Retrieval System OnLINE
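The first-phase Markov model classifiers described above can be illustrated with one Markov chain per expression class, scored by log-likelihood. This sketch assumes a single order-k chain over DNA with add-one smoothing, equal class priors, and toy training sequences (all hypothetical). The published approach combines several k-mer settings, uses both promoter and protein sequences, and feeds per-tissue predictions into a Bayesian network; none of that is shown here.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

class MarkovModel:
    """Order-k Markov chain over DNA with add-one (Laplace) smoothing."""

    def __init__(self, k=1):
        self.k = k
        # context (k bases) -> next base -> count
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, seqs):
        for seq in seqs:
            seq = seq.upper()
            for i in range(self.k, len(seq)):
                self.counts[seq[i - self.k:i]][seq[i]] += 1
        return self

    def log_prob(self, seq):
        """Log-likelihood of seq (ignoring the first k bases) under the chain."""
        seq = seq.upper()
        total_lp = 0.0
        for i in range(self.k, len(seq)):
            row = self.counts.get(seq[i - self.k:i], {})
            denom = sum(row.values()) + len(ALPHABET)
            total_lp += math.log((row.get(seq[i], 0) + 1) / denom)
        return total_lp

def classify(seq, high_model, low_model):
    """Pick the class whose model gives the higher log-likelihood (equal priors)."""
    return "high" if high_model.log_prob(seq) > low_model.log_prob(seq) else "low"

# Hypothetical toy training sets standing in for high- and low-expression genes.
high = MarkovModel(k=1).fit(["ACACACACAC", "CACACACA"])
low = MarkovModel(k=1).fit(["GTGTGTGTGT", "TGTGTGTG"])
print(classify("ACACAC", high, low))  # high
print(classify("GTGTGT", high, low))  # low
```

In a two-phase design like the one in the abstract, the per-tissue log-likelihood ratios from such models would become the feature vector for the second-stage classifier.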

  6. Article ; Online: Genome-Wide Discovery of G-Quadruplexes in Wheat: Distribution and Putative Functional Roles.

    Cagirici, H Busra / Sen, Taner Z

    G3 (Bethesda, Md.)

    2020  Volume 10, Issue 6, Page(s) 2021–2032

    Abstract G-quadruplexes are nucleic acid secondary structures formed by a stack of square planar G-quartets. G-quadruplexes have been implicated in many biological functions, including telomere maintenance, replication, transcription, and translation, in many species including humans and plants. For wheat, however, although it is one of the world's most important staple foods, no G-quadruplex studies have been reported to date. Here, we computationally identify putative G4 structures (G4s) in the wheat genome for the first time and compare their distribution across the genome against five other genomes (human, maize, Arabidopsis, rice, and sorghum). We identified close to 1 million G4 motifs, with a density of 76 G4s/Mb across the whole genome and 93 G4s/Mb over genic regions. Remarkably, G4s were enriched around three regions, two on the antisense strand and one on the sense strand, at the following positions: 1) the transcription start site (TSS) (antisense), 2) the first coding domain sequence (CDS) (antisense), and 3) the start codon (sense). Functional enrichment analysis revealed that the gene models containing G4 motifs within these peaks were associated with specific gene ontology (GO) terms, such as developmental process, localization, and cellular component organization or biogenesis. We investigated genes encoding MADS-box transcription factors and showed examples of G4 motifs within critical regulatory regions in the VRN-1 genes in wheat. Furthermore, comparison with other plants showed that monocots share a similar distribution of G4s, whereas Arabidopsis shows a unique G4 distribution. Our study shows for the first time the prevalence and possible functional roles of G4s in wheat.
    MeSH term(s) G-Quadruplexes ; Humans ; Regulatory Sequences, Nucleic Acid ; Transcription Initiation Site ; Triticum/genetics ; Zea mays
    Language English
    Publishing date 2020-06-01
    Publishing country England
    Document type Journal Article ; Research Support, U.S. Gov't, Non-P.H.S.
    ZDB-ID 2629978-1
    ISSN 2160-1836
    ISSN (online) 2160-1836
    DOI 10.1534/g3.120.401288
    Database MEDical Literature Analysis and Retrieval System OnLINE
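A density such as 76 G4s/Mb is a motif count divided by sequence length in megabases. The sketch below assumes the commonly used canonical putative-G4 pattern (not necessarily the rule used in the study) and counts G-rich matches on the given strand plus C-rich matches, which correspond to G4s on the complementary strand. Classifying motifs as sense or antisense relative to a gene would additionally require gene annotation, which is not shown. The function names and the counts in the usage lines are hypothetical.

```python
import re

# Assumed canonical pattern; the C-rich twin finds G4s on the opposite strand.
G_PATTERN = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3,}G{3,}")
C_PATTERN = re.compile(r"(?:C{3,}[ACGTN]{1,7}){3,}C{3,}")

def count_g4(seq):
    """Return (motifs on this strand, motifs on the complementary strand)."""
    seq = seq.upper()
    plus = sum(1 for _ in G_PATTERN.finditer(seq))
    minus = sum(1 for _ in C_PATTERN.finditer(seq))
    return plus, minus

def density_per_mb(n_motifs, length_bp):
    """Motifs per megabase of sequence."""
    return n_motifs / (length_bp / 1_000_000)

toy = "GGGAGGGTTGGGCAGGG" + "TTTT" + "CCCTGCCCAACCCTCCC"
print(count_g4(toy))                   # (1, 1)
print(density_per_mb(380, 5_000_000))  # 76.0 (hypothetical counts)
```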

  7. Article: GrainGenes: Tools and Content to Assist Breeders Improving Oat Quality.

    Blake, Victoria C / Wight, Charlene P / Yao, Eric / Sen, Taner Z

    Foods (Basel, Switzerland)

    2022  Volume 11, Issue 7

    Abstract GrainGenes is the USDA-ARS database and Web resource for wheat, barley, oat, rye, and their relatives. As a community Web hub and database for small grains, GrainGenes strives to provide resources for researchers, students, and plant breeders to improve traits such as quality, yield, and disease resistance. Quantitative trait loci (QTL), genes, and genetic maps for quality attributes in GrainGenes represent the historical approach to mapping genes for groat percentage, test weight, protein, fat, and β-glucan content in oat (
    Language English
    Publishing date 2022-03-23
    Publishing country Switzerland
    Document type Journal Article
    ZDB-ID 2704223-6
    ISSN 2304-8158
    DOI 10.3390/foods11070914
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Article ; Online: Multiple Variant Calling Pipelines in Wheat Whole Exome Sequencing.

    Cagirici, H Busra / Akpinar, Bala Ani / Sen, Taner Z / Budak, Hikmet

    International journal of molecular sciences

    2021  Volume 22, Issue 19

    Abstract The highly challenging hexaploid wheat (
    MeSH term(s) Genome, Plant ; Polymorphism, Single Nucleotide ; Polyploidy ; Triticum/genetics ; Exome Sequencing
    Language English
    Publishing date 2021-09-27
    Publishing country Switzerland
    Document type Journal Article
    ZDB-ID 2019364-6
    ISSN 1422-0067 ; 1661-6596
    ISSN (online) 1422-0067
    DOI 10.3390/ijms221910400
    Database MEDical Literature Analysis and Retrieval System OnLINE

  9. Article ; Online: Predicting Tissue-Specific mRNA and Protein Abundance in Maize: A Machine Learning Approach

    Kyoung Tak Cho / Taner Z. Sen / Carson M. Andorf

    Frontiers in Artificial Intelligence, Vol

    2022  Volume 5

    Abstract Machine learning and modeling approaches have been used to classify protein sequences for a broad set of tasks including predicting protein function, structure, expression, and localization. Some recent studies have successfully predicted whether a given gene is expressed as mRNA or even potentially translated into protein, but given that not all genes are expressed in every condition and tissue, the challenge remains to predict condition-specific expression. To address this gap, we developed a machine learning approach to predict tissue-specific gene expression across 23 different tissues in maize, solely based on DNA promoter and protein sequences. For class labels, we defined high and low expression levels for mRNA and protein abundance and optimized classifiers by systematically exploring various methods and combinations of k-mer sequences in a two-phase approach. In the first phase, we developed Markov model classifiers for each tissue and built a feature vector based on the predictions. In the second phase, the feature vector was used as an input to a Bayesian network for final classification. Our results show that these methods can achieve high classification accuracy of up to 95% for predicting gene expression for individual tissues. By relying on sequence alone, our method works in settings where costly experimental data are unavailable and reveals useful insights into the functional, evolutionary, and regulatory characteristics of genes.
    Keywords maize genetics ; gene expression ; protein abundance ; mRNA abundance ; machine learning ; Electronic computers. Computer science ; QA75.5-76.95
    Subject code 612 ; 006
    Language English
    Publishing date 2022-05-01T00:00:00Z
    Publisher Frontiers Media S.A.
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  10. Article ; Online: Genome-wide discovery of G-quadruplexes in barley

    H. Busra Cagirici / Hikmet Budak / Taner Z. Sen

    Scientific Reports, Vol 11, Iss 1, Pp 1-

    2021  Volume 15

    Abstract G-quadruplexes (G4s) are four-stranded nucleic acid structures with closely spaced guanine bases forming square planar G-quartets. Aberrant formation of G4 structures has been associated with genomic instability. However, most plant species are lacking comprehensive studies of G4 motifs. In this study, genome-wide identification of G4 motifs in barley was performed, followed by a comparison of genomic distribution and molecular functions to other monocot species, such as wheat, maize, and rice. Similar to the reports in humans and some plants such as wheat, G4 motifs peaked around the 5′ untranslated region (5′ UTR), the first coding domain sequence, and the first intron start sites on antisense strands. Our comparative analyses in human, Arabidopsis, maize, rice, and sorghum demonstrated that the peak points could be erroneously merged into a single peak when large window sizes are used. We also showed that the G4 distributions around genic regions are relatively similar in the species studied, except in the case of Arabidopsis. G4-containing genes in monocots showed conserved molecular functions for transcription initiation and hydrolase activity. Additionally, we provided examples of imperfect G4 motifs.
    Keywords Medicine ; R ; Science ; Q
    Subject code 580
    Language English
    Publishing date 2021-04-01T00:00:00Z
    Publisher Nature Portfolio
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
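The window-size effect noted above, where nearby peaks merge into one when large windows are used, can be reproduced with a toy profile. Two clusters of motif positions relative to a landmark such as a TSS show up as two peaks with narrow bins but collapse into a single peak with wide bins. The positions, function names, and the simple local-maximum rule below are illustrative assumptions, not the study's data or method.

```python
def binned_profile(positions, window, start, end):
    """Count positions falling in consecutive windows covering [start, end)."""
    n_bins = (end - start) // window
    counts = [0] * n_bins
    for p in positions:
        if start <= p < end:
            counts[(p - start) // window] += 1
    return counts

def count_peaks(profile):
    """Count bins higher than the left neighbour and not lower than the right
    one (edges padded with 0). A simple illustrative rule; plateaus that keep
    rising can be miscounted."""
    padded = [0] + profile + [0]
    return sum(1 for i in range(1, len(padded) - 1)
               if padded[i] > padded[i - 1] and padded[i] >= padded[i + 1])

# Hypothetical motif positions (bp relative to a TSS): one cluster upstream,
# one downstream.
positions = [-30, -28, -25, -23, -21, 20, 24, 29]

narrow = binned_profile(positions, 20, -100, 100)
wide = binned_profile(positions, 100, -100, 100)
print(narrow, count_peaks(narrow))  # [0, 0, 0, 5, 0, 0, 3, 0, 0, 0] 2
print(wide, count_peaks(wide))      # [5, 3] 1
```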
