LIVIVO - The Search Portal for Life Sciences

Search results

Results 1 - 10 of 94

  1. Article ; Online: The field of protein function prediction as viewed by different domain scientists.

    Ramola, Rashika / Friedberg, Iddo / Radivojac, Predrag

    Bioinformatics advances

    2022  Volume 2, Issue 1, Page(s) vbac057

    Abstract Motivation: Experimental biologists, biocurators, and computational biologists all play a role in characterizing a protein's function. The discovery of protein function in the laboratory by experimental scientists is the foundation of our knowledge about proteins. Experimental findings are compiled in knowledgebases by biocurators to provide standardized, readily accessible, and computationally amenable information. Computational biologists train their methods using these data to predict protein function and guide subsequent experiments. To understand the state of affairs in this ecosystem, centered here around protein function prediction, we surveyed scientists from these three constituent communities.
    Results: We show that the three communities have common but also idiosyncratic perspectives on the field. Most strikingly, experimentalists rarely use state-of-the-art prediction software, but when presented with predictions, report many to be surprising and useful. Ontologies appear to be highly valued by biocurators, less so by experimentalists and computational biologists, yet controlled vocabularies bridge the communities and simplify the prediction task. Additionally, many software tools are not readily accessible and the predictions presented to the users can be broad and uninformative. We conclude that to meet both the social and technical challenges in the field, a more productive and meaningful interaction between members of the core communities is necessary.
    Availability and implementation: Data cannot be shared for ethical/privacy reasons.
    Supplementary information: Supplementary data are available at
    Language English
    Publishing date 2022-08-17
    Publishing country England
    Document type Journal Article
    ISSN 2635-0041
    ISSN (online) 2635-0041
    DOI 10.1093/bioadv/vbac057
    Database MEDical Literature Analysis and Retrieval System OnLINE

  2. Article ; Online: GOThresher: a program to remove annotation biases from protein function annotation datasets.

    Joshi, Parnal / Banerjee, Sagnik / Hu, Xiao / Khade, Pranav M / Friedberg, Iddo

    Bioinformatics (Oxford, England)

    2023  Volume 39, Issue 1

    Abstract Motivation: Advances in sequencing technologies have led to a surge in genomic data, although the functions of many gene products coded by these genes remain unknown. While in-depth, targeted experiments that determine the functions of these gene products are crucial and routinely performed, they fail to keep up with the inflow of novel genomic data. In an attempt to address this gap, high-throughput experiments are being conducted in which a large number of genes are investigated in a single study. The annotations generated as a result of these experiments are generally biased towards a small subset of less informative Gene Ontology (GO) terms. Identifying and removing biases from protein function annotation databases is important since biases impact our understanding of protein function by providing a poor picture of the annotation landscape. Additionally, as machine learning methods for predicting protein function are becoming increasingly prevalent, it is essential that they are trained on unbiased datasets. Therefore, it is not only crucial to be aware of biases, but also to judiciously remove them from annotation datasets.
    Results: We introduce GOThresher, a Python tool that identifies and removes biases in function annotations from protein function annotation databases.
    Availability and implementation: GOThresher is written in Python and released via PyPI https://pypi.org/project/gothresher/ and on the Bioconda Anaconda channel https://anaconda.org/bioconda/gothresher. The source code is hosted on GitHub https://github.com/FriedbergLab/GOThresher and distributed under the GPL 3.0 license.
    Supplementary information: Supplementary data are available at Bioinformatics online.
    MeSH term(s) Computational Biology/methods ; Genomics ; Molecular Sequence Annotation ; Software ; Proteins/genetics ; Proteins/metabolism ; Databases, Protein
    Chemical Substances Proteins
    Language English
    Publishing date 2023-01-23
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't ; Research Support, U.S. Gov't, Non-P.H.S.
    ZDB-ID 1422668-6
    ISSN (online) 1367-4811
    ISSN 1367-4803
    DOI 10.1093/bioinformatics/btad048
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Article ; Online: SwiftOrtho: A fast, memory-efficient, multiple genome orthology classifier.

    Hu, Xiao / Friedberg, Iddo

    GigaScience

    2019  Volume 8, Issue 10

    Abstract Background: Gene homology type classification is required for many types of genome analyses, including comparative genomics, phylogenetics, and protein function annotation. Consequently, a large variety of tools have been developed to perform homology classification across genomes of different species. However, when applied to large genomic data sets, these tools require high memory and CPU usage, typically available only in computational clusters.
    Findings: Here we present a new graph-based orthology analysis tool, SwiftOrtho, which is optimized for speed and memory usage when applied to large-scale data. SwiftOrtho uses long k-mers to speed up homology search, while using a reduced amino acid alphabet and spaced seeds to compensate for the loss of sensitivity due to long k-mers. In addition, it uses an affinity propagation algorithm to reduce the memory usage when clustering large-scale orthology relationships into orthologous groups. In our tests, SwiftOrtho was the only tool that completed orthology analysis of proteins from 1,760 bacterial genomes on a computer with only 4 GB RAM. Using various standard orthology data sets, we also show that SwiftOrtho has a high accuracy.
    Conclusions: SwiftOrtho enables the accurate comparative genomic analyses of thousands of genomes using low-memory computers. SwiftOrtho is available at https://github.com/Rinoahu/SwiftOrtho.
    MeSH term(s) Algorithms ; Bacterial Proteins/genetics ; Cluster Analysis ; Computer Storage Devices ; Genome, Bacterial ; Genomics/methods
    Chemical Substances Bacterial Proteins
    Language English
    Publishing date 2019-10-24
    Publishing country United States
    Document type Journal Article ; Research Support, U.S. Gov't, Non-P.H.S.
    ZDB-ID 2708999-X
    ISSN 2047-217X
    ISSN (online) 2047-217X
    DOI 10.1093/gigascience/giz118
    Database MEDical Literature Analysis and Retrieval System OnLINE

  4. Article: EpicTope: narrating protein sequence features to identify non-disruptive epitope tagging sites.

    Zinski, Joseph / Chung, Henri / Joshi, Parnal / Warrick, Finn / Berg, Brian D / Glova, Greg / McGrail, Maura / Balciunas, Darius / Friedberg, Iddo / Mullins, Mary

    bioRxiv : the preprint server for biology

    2024  

    Abstract Epitope tagging is an invaluable technique enabling the identification, tracking, and purification of proteins in vivo. We developed a tool, EpicTope, to facilitate this method by identifying amino acid positions suitable for epitope insertion. Our method uses a scoring function that considers multiple protein sequence and structural features to determine locations least disruptive to the protein's function. We validated our approach on the zebrafish Smad5 protein, showing that multiple predicted internally tagged Smad5 proteins rescue zebrafish
    Language English
    Publishing date 2024-03-11
    Publishing country United States
    Document type Preprint
    DOI 10.1101/2024.03.03.583232
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Article ; Online: CAFA-evaluator: a Python tool for benchmarking ontological classification methods.

    Piovesan, Damiano / Zago, Davide / Joshi, Parnal / De Paolis Kaluza, M Clara / Mehdiabadi, Mahta / Ramola, Rashika / Monzon, Alexander Miguel / Reade, Walter / Friedberg, Iddo / Radivojac, Predrag / Tosatto, Silvio C E

    Bioinformatics advances

    2024  Volume 4, Issue 1, Page(s) vbae043

    Abstract We present CAFA-evaluator, a powerful Python program designed to evaluate the performance of prediction methods on targets with hierarchical concept dependencies. It generalizes multi-label evaluation to modern ontologies where the prediction targets are drawn from a directed acyclic graph and achieves high efficiency by leveraging matrix computation and topological sorting. The program requirements include a small number of standard Python libraries, making CAFA-evaluator easy to maintain. The code replicates the Critical Assessment of protein Function Annotation (CAFA) benchmarking, which evaluates predictions of the consistent subgraphs in Gene Ontology. Owing to its reliability and accuracy, the organizers have selected CAFA-evaluator as the official CAFA evaluation software.
    Availability and implementation: https://pypi.org/project/cafaeval.
    Language English
    Publishing date 2024-03-14
    Publishing country England
    Document type Journal Article
    ISSN 2635-0041
    ISSN (online) 2635-0041
    DOI 10.1093/bioadv/vbae043
    Database MEDical Literature Analysis and Retrieval System OnLINE

  6. Article ; Online: Identifying antimicrobial peptides using word embedding with deep recurrent neural networks.

    Hamid, Md-Nafiz / Friedberg, Iddo

    Bioinformatics (Oxford, England)

    2018  Volume 35, Issue 12, Page(s) 2009–2016

    Abstract Motivation: Antibiotic resistance constitutes a major public health crisis, and finding new sources of antimicrobial drugs is crucial to solving it. Bacteriocins, which are bacterially produced antimicrobial peptide products, are candidates for broadening the available choices of antimicrobials. However, the discovery of new bacteriocins by genomic mining is hampered by their sequences' low complexity and high variance, which frustrates sequence similarity-based searches.
    Results: Here we use word embeddings of protein sequences to represent bacteriocins, and apply a word embedding method that accounts for amino acid order in protein sequences, to predict novel bacteriocins from protein sequences without using sequence similarity. Our method predicts, with a high probability, six yet unknown putative bacteriocins in Lactobacillus. Generalized, the representation of sequences with word embeddings preserving sequence order information can be applied to peptide and protein classification problems for which sequence similarity cannot be used.
    Availability and implementation: Data and source code for this project are freely available at: https://github.com/nafizh/NeuBI.
    Supplementary information: Supplementary data are available at Bioinformatics online.
    MeSH term(s) Anti-Infective Agents ; Computational Biology ; Neural Networks, Computer ; Peptides ; Software
    Chemical Substances Anti-Infective Agents ; Peptides
    Language English
    Publishing date 2018-12-18
    Publishing country England
    Document type Journal Article ; Research Support, U.S. Gov't, Non-P.H.S.
    ZDB-ID 1422668-6
    ISSN (online) 1367-4811
    ISSN 1367-4803
    DOI 10.1093/bioinformatics/bty937
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Article ; Online: Establishing the reliability of algorithms.

    Mangravite, Lara / Mooney, Sean D / Friedberg, Iddo / Guinney, Justin

    Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

    2021  Volume 26, Page(s) 341–345

    Abstract As rich biomedical data streams are accumulating across people and time, they provide a powerful opportunity to address limitations in our existing scientific knowledge and to overcome operational challenges in healthcare and life sciences. Yet the relative weighting of insights vs. methodologies in our current research ecosystem tends to skew the computational community away from algorithm evaluation and operationalization, resulting in a well-reported trend towards the proliferation of scientific outcomes of unknown reliability. Algorithm selection and use is hindered by several problems that persist across our field. One is the impact of the self-assessment bias, which can lead to mis-representations in the accuracy of research results. A second challenge is the impact of data context on algorithm performance. Biology and medicine are dynamic and heterogeneous. Data is collected under varying conditions. For algorithms, this means that performance is not universal - and need to be evaluated across a range of contexts. These issues are increasingly difficult as algorithms are trained and used on data collected in the real-world, outside of the traditional clinical research lab. In these cases, data collection is neither supervised nor well controlled and data access may be limited by privacy or proprietary reasons. Therefore, there is a risk that algorithms will be applied to data that are outside of the scope of the intent of the original training data provided. This workshop will focus on approaches that are emerging across the researcher community to quantify the accuracy of algorithms and the reliability of their outputs.
    MeSH term(s) Algorithms ; Computational Biology ; Data Collection ; Ecosystem ; Reproducibility of Results
    Language English
    Publishing date 2021-03-19
    Publishing country United States
    Document type Journal Article
    ISSN 2335-6936
    ISSN (online) 2335-6936
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Article ; Online: Learning from the unknown: exploring the range of bacterial functionality.

    Mahlich, Yannick / Zhu, Chengsheng / Chung, Henri / Velaga, Pavan K / De Paolis Kaluza, M Clara / Radivojac, Predrag / Friedberg, Iddo / Bromberg, Yana

    Nucleic acids research

    2023  Volume 51, Issue 19, Page(s) 10162–10175

    Abstract Determining the repertoire of a microbe's molecular functions is a central question in microbial biology. Modern techniques achieve this goal by comparing microbial genetic material against reference databases of functionally annotated genes/proteins or known taxonomic markers such as 16S rRNA. Here, we describe a novel approach to exploring bacterial functional repertoires without reference databases. Our Fusion scheme establishes functional relationships between bacteria and assigns organisms to Fusion-taxa that differ from otherwise defined taxonomic clades. Three key findings of our work stand out. First, bacterial functional comparisons outperform marker genes in assigning taxonomic clades. Fusion profiles are also better for this task than other functional annotation schemes. Second, Fusion-taxa are robust to addition of novel organisms and are, arguably, able to capture the environment-driven bacterial diversity. Finally, our alignment-free nucleic acid-based Siamese Neural Network model, created using Fusion functions, enables finding shared functionality of very distant, possibly structurally different, microbial homologs. Our work can thus help annotate functional repertoires of bacterial organisms and further guide our understanding of microbial communities.
    MeSH term(s) Bacteria/cytology ; Bacteria/genetics ; Databases, Factual ; Microbiota ; Phylogeny ; RNA, Ribosomal, 16S/genetics ; Bacterial Physiological Phenomena
    Chemical Substances RNA, Ribosomal, 16S
    Language English
    Publishing date 2023-09-20
    Publishing country England
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't ; Research Support, U.S. Gov't, Non-P.H.S.
    ZDB-ID 186809-3
    ISSN (online) 1362-4962 ; 1362-4954
    ISSN 0301-5610 ; 0305-1048
    DOI 10.1093/nar/gkad757
    Database MEDical Literature Analysis and Retrieval System OnLINE

  9. Article ; Online: An accurate and interpretable model for antimicrobial resistance in pathogenic Escherichia coli from livestock and companion animal species.

    Chung, Henri C / Foxx, Christine L / Hicks, Jessica A / Stuber, Tod P / Friedberg, Iddo / Dorman, Karin S / Harris, Beth

    PloS one

    2023  Volume 18, Issue 8, Page(s) e0290473

    Abstract Understanding the microbial genomic contributors to antimicrobial resistance (AMR) is essential for early detection of emerging AMR infections, a pressing global health threat in human and veterinary medicine. Here we used whole genome sequencing and antibiotic susceptibility test data from 980 disease causing Escherichia coli isolated from companion and farm animals to model AMR genotypes and phenotypes for 24 antibiotics. We determined the strength of genotype-to-phenotype relationships for 197 AMR genes with elastic net logistic regression. Model predictors were designed to evaluate different potential modes of AMR genotype translation into resistance phenotypes. Our results show a model that considers the presence of individual AMR genes and total number of AMR genes present from a set of genes known to confer resistance was able to accurately predict isolate resistance on average (mean F1 score = 98.0%, SD = 2.3%, mean accuracy = 98.2%, SD = 2.7%). However, fitted models sometimes varied for antibiotics in the same class and for the same antibiotic across animal hosts, suggesting heterogeneity in the genetic determinants of AMR resistance. We conclude that an interpretable AMR prediction model can be used to accurately predict resistance phenotypes across multiple host species and reveal testable hypotheses about how the mechanism of resistance may vary across antibiotics within the same class and across animal hosts for the same antibiotic.
    MeSH term(s) Animals ; Humans ; Livestock ; Anti-Bacterial Agents/pharmacology ; Pets ; Drug Resistance, Bacterial/genetics ; Escherichia coli/genetics
    Chemical Substances Anti-Bacterial Agents
    Language English
    Publishing date 2023-08-24
    Publishing country United States
    Document type Journal Article ; Research Support, U.S. Gov't, Non-P.H.S.
    ZDB-ID 2267670-3
    ISSN 1932-6203
    ISSN (online) 1932-6203
    DOI 10.1371/journal.pone.0290473
    Database MEDical Literature Analysis and Retrieval System OnLINE

  10. Article ; Online: Identifying antimicrobial peptides using word embedding with deep recurrent neural networks

    Hamid, Md-Nafiz / Friedberg, Iddo

    Bioinformatics. 2019 June 01, v. 35, no. 12, p. 2009-2016

    2019  Volume 35, Issue 12, Page(s) 2009–2016

    Abstract Antibiotic resistance constitutes a major public health crisis, and finding new sources of antimicrobial drugs is crucial to solving it. Bacteriocins, which are bacterially produced antimicrobial peptide products, are candidates for broadening the available choices of antimicrobials. However, the discovery of new bacteriocins by genomic mining is hampered by their sequences’ low complexity and high variance, which frustrates sequence similarity-based searches. Here we use word embeddings of protein sequences to represent bacteriocins, and apply a word embedding method that accounts for amino acid order in protein sequences, to predict novel bacteriocins from protein sequences without using sequence similarity. Our method predicts, with a high probability, six yet unknown putative bacteriocins in Lactobacillus. Generalized, the representation of sequences with word embeddings preserving sequence order information can be applied to peptide and protein classification problems for which sequence similarity cannot be used. Data and source code for this project are freely available at: https://github.com/nafizh/NeuBI. Supplementary data are available at Bioinformatics online.
    Keywords Lactobacillus ; amino acids ; anti-infective agents ; antibiotic resistance ; antimicrobial peptides ; bacteriocins ; bioinformatics ; probability ; public health ; sequence homology ; variance
    Language English
    Dates of publication 2019-06-01
    Size p. 2009-2016
    Publishing place Oxford University Press
    Document type Article ; Online
    Note Use and reproduction
    ZDB-ID 1468345-3
    ISSN 1367-4811 ; 1460-2059
    DOI 10.1093/bioinformatics/bty937
    Database NAL-Catalogue (AGRICOLA)
