LIVIVO - The Search Portal for Life Sciences


Search results

Results 1 - 10 of 25

  1. Article ; Online: Navigating the amino acid sequence space between functional proteins using a deep learning framework.

    Bitard-Feildel, Tristan

    PeerJ. Computer science

    2021  Volume 7, Page(s) e684

    Abstract Motivation: Shedding light on the relationships between protein sequences and functions is a challenging task with many implications for protein evolution, the understanding of diseases, and protein design. The mapping of the protein sequence space to specific functions is, however, hard to comprehend due to its complexity. Generative models help to decipher complex systems thanks to their ability to learn and recreate data specificity. Applied to proteins, they can capture the sequence patterns associated with functions and point out important relationships between sequence positions. By learning these dependencies between sequences and functions, they can ultimately be used to generate new sequences and navigate through uncharted areas of molecular evolution.
    Results: This study presents an Adversarial Auto-Encoder (AAE) approach, an unsupervised generative model, to generate new protein sequences. AAEs are tested on three protein families known for their multiple functions: the sulfatase, HUP, and TPP families. Clustering results on the encoded sequences from the latent space computed by AAEs display a high level of homogeneity regarding the protein sequence functions. The study also reports and analyzes, for the first time, two sampling strategies based on latent space interpolation and latent space arithmetic to generate intermediate protein sequences that share sequence properties of original sequences linked to known functional properties from different families and functions. Sequences generated by interpolation between latent space data points demonstrate the ability of the AAE to generalize and produce meaningful biological sequences from an evolutionarily uncharted area of the biological sequence space. Finally, 3D structure models computed by comparative modelling using generated sequences and templates of different sub-families point to the ability of latent space arithmetic to successfully transfer protein sequence properties linked to function between different sub-families. All in all, this study confirms the ability of deep learning frameworks to model biological complexity and bring new tools to explore amino acid sequence and functional spaces.
    Language English
    Publishing date 2021-09-17
    Publishing country United States
    Document type Journal Article
    ISSN 2376-5992
    ISSN (online) 2376-5992
    DOI 10.7717/peerj-cs.684
    Database MEDical Literature Analysis and Retrieval System OnLINE


  2. Article ; Online: Navigating the amino acid sequence space between functional proteins using a deep learning framework

    Tristan Bitard-Feildel

    PeerJ Computer Science, Vol 7, p e684

    2021  Volume 7, Page(s) e684

    Abstract Motivation: Shedding light on the relationships between protein sequences and functions is a challenging task with many implications for protein evolution, the understanding of diseases, and protein design. The mapping of the protein sequence space to specific functions is, however, hard to comprehend due to its complexity. Generative models help to decipher complex systems thanks to their ability to learn and recreate data specificity. Applied to proteins, they can capture the sequence patterns associated with functions and point out important relationships between sequence positions. By learning these dependencies between sequences and functions, they can ultimately be used to generate new sequences and navigate through uncharted areas of molecular evolution.
    Results: This study presents an Adversarial Auto-Encoder (AAE) approach, an unsupervised generative model, to generate new protein sequences. AAEs are tested on three protein families known for their multiple functions: the sulfatase, HUP, and TPP families. Clustering results on the encoded sequences from the latent space computed by AAEs display a high level of homogeneity regarding the protein sequence functions. The study also reports and analyzes, for the first time, two sampling strategies based on latent space interpolation and latent space arithmetic to generate intermediate protein sequences that share sequence properties of original sequences linked to known functional properties from different families and functions. Sequences generated by interpolation between latent space data points demonstrate the ability of the AAE to generalize and produce meaningful biological sequences from an evolutionarily uncharted area of the biological sequence space. Finally, 3D structure models computed by comparative modelling using generated sequences and templates of different sub-families point to the ability of latent space arithmetic to successfully transfer protein sequence properties linked to function between different sub-families. All in all, this study ...
    Keywords Latent space arithmetic ; Latent space exploration ; Protein sequence ; Protein function ; Electronic computers. Computer science ; QA75.5-76.95
    Subject code 612
    Language English
    Publishing date 2021-09-01
    Publisher PeerJ Inc.
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  3. Article ; Online: A sequence-based foldability score combined with AlphaFold2 predictions to disentangle the protein order/disorder continuum.

    Bruley, Apolline / Bitard-Feildel, Tristan / Callebaut, Isabelle / Duprat, Elodie

    Proteins

    2022  Volume 91, Issue 4, Page(s) 466–484

    Abstract Order and disorder govern protein functions, but there is a great diversity in disorder, from regions that are, and stay, fully disordered to conditional order. This diversity is still difficult to decipher even though it is encoded in the amino acid sequences. Here, we developed an analytic Python package, named pyHCA, to estimate the foldability of a protein segment from its amino acid sequence alone, based on a measure of its density in regular secondary structures associated with hydrophobic clusters, as defined by the hydrophobic cluster analysis (HCA) approach. The tool was designed by optimizing the separation between foldable segments from databases of disorder (DisProt) and order (SCOPe [soluble domains] and OPM [transmembrane domains]). It allows specifying the ratio between order, embodied by regular secondary structures (either participating in the hydrophobic core of well-folded 3D structures or conditionally formed in intrinsically disordered regions), and disorder. We illustrated the relevance of pyHCA with several examples and applied it to the sequences of the proteomes of 21 species ranging from prokaryotes and archaea to unicellular and multicellular eukaryotes, for which structure models are provided in the AlphaFold protein structure database. Cases of low-confidence scores related to disorder were distinguished from those of sequences that we identified as foldable but are still excluded from accurate modeling by AlphaFold2 due to a lack of sequence homologs or to compositional biases. Overall, our approach is complementary to AlphaFold2, providing guides to map structural innovations through evolutionary processes, at proteome and gene scales.
    MeSH term(s) Amino Acid Sequence ; Proteome/metabolism ; Protein Structure, Secondary ; Hydrophobic and Hydrophilic Interactions ; Protein Domains
    Chemical Substances Proteome
    Language English
    Publishing date 2022-11-09
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 806683-8
    ISSN 1097-0134 ; 0887-3585
    ISSN (online) 1097-0134
    ISSN 0887-3585
    DOI 10.1002/prot.26441
    Database MEDical Literature Analysis and Retrieval System OnLINE


  4. Article ; Online: Exploring the dark foldable proteome by considering hydrophobic amino acids topology.

    Bitard-Feildel, Tristan / Callebaut, Isabelle

    Scientific reports

    2017  Volume 7, Page(s) 41425

    Abstract The protein universe corresponds to the set of all proteins found in all organisms. A way to explore it is by taking into account the domain content of the proteins. However, some parts of sequences and many entire sequences remain un-annotated despite a converging number of domain families. The un-annotated part of the protein universe is referred to as the dark proteome and remains poorly characterized. In this study, we quantify the amount of foldable domains within the dark proteome by using the hydrophobic cluster analysis methodology. These un-annotated foldable domains were grouped using a combination of remote homology searches and domain annotations, leading to the definition of different levels of darkness. The dark foldable domains were analyzed to understand what makes them different from domains stored in databases and thus difficult to annotate. The un-annotated domains of the dark proteome universe display specific features relative to database domains: shorter length, non-canonical content and particular topology in hydrophobic residues, higher propensity for disorder, and a higher energy. These features make them hard to relate to known families. Based on these observations, we emphasize that domain annotation methodologies can still be improved to fully apprehend and decipher the molecular evolution of the protein universe.
    MeSH term(s) Amino Acid Sequence ; Amino Acids/chemistry ; Cluster Analysis ; Hydrophobic and Hydrophilic Interactions ; Molecular Sequence Annotation ; Proteome/metabolism ; Thermodynamics
    Chemical Substances Amino Acids ; Proteome
    Language English
    Publishing date 2017-01-30
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 2615211-3
    ISSN 2045-2322
    ISSN (online) 2045-2322
    DOI 10.1038/srep41425
    Database MEDical Literature Analysis and Retrieval System OnLINE


  5. Article ; Online: HCAtk and pyHCA: A Toolkit and Python API for the Hydrophobic Cluster Analysis of Protein Sequences

    Bitard-Feildel, Tristan / Callebaut, Isabelle

    bioRxiv

    Abstract Motivation: Detecting protein domains sharing no similarity to known domains, as stored in domain databases, is a challenging problem, particularly for unannotated proteomes, recently emerged domains, fast-diverging proteins, or domains with intrinsically disordered regions. Results: We developed pyHCA and HCAtk, a Python API and standalone tool that gather improved versions of previously developed methodologies together with new functionalities. The developed tools can be used either from the command line or from a Python API. Availability: HCAtk and pyHCA are available at https://github.com/T-B-F/pyHCA under the CeCILL-C license.
    Keywords covid19
    Publisher BioRxiv; MedRxiv
    Document type Article ; Online
    DOI 10.1101/249995
    Database COVID19

  6. Article ; Online: Toxoplasma membrane inositol phospholipid binding protein TgREMIND is essential for secretory organelle function and host infection.

    Houngue, Rodrigue / Sangaré, Lamba Omar / Alayi, Tchilabalo Dilezitoko / Dieng, Aissatou / Bitard-Feildel, Tristan / Boulogne, Claire / Slomianny, Christian / Atindehou, Cynthia Menonve / Fanou, Lucie Ayi / Hathout, Yetrib / Callebaut, Isabelle / Tomavo, Stanislas

    Cell reports

    2023  Volume 43, Issue 1, Page(s) 113601

    Abstract Apicomplexan parasites possess specialized secretory organelles called rhoptries, micronemes, and dense granules that play a vital role in host infection. In this study, we demonstrate that TgREMIND, a protein found in Toxoplasma gondii, is necessary for the biogenesis of rhoptries and dense granules. TgREMIND contains a Fes-CIP4 homology-Bin/Amphiphysin/Rvs (F-BAR) domain, which binds to membrane phospholipids, as well as a novel uncharacterized domain that we have named REMIND (regulator of membrane-interacting domain). Both the F-BAR domain and the REMIND are crucial for TgREMIND functions. When TgREMIND is depleted, there is a significant decrease in the abundance of dense granules and abnormal transparency of rhoptries, leading to a reduction in protein secretion from these organelles. The absence of TgREMIND inhibits host invasion and parasite dissemination, demonstrating that TgREMIND is essential for the proper function of critical secretory organelles required for successful infection by Toxoplasma.
    MeSH term(s) Animals ; Toxoplasma/metabolism ; Membrane Proteins/metabolism ; Protozoan Proteins/metabolism ; Organelles/metabolism ; Parasites/metabolism ; Phosphatidylinositols/metabolism
    Chemical Substances Membrane Proteins ; Protozoan Proteins ; Phosphatidylinositols
    Language English
    Publishing date 2023-12-28
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 2649101-1
    ISSN 2211-1247
    ISSN (online) 2211-1247
    DOI 10.1016/j.celrep.2023.113601
    Database MEDical Literature Analysis and Retrieval System OnLINE


  7. Article ; Online: "Infostery" analysis of short molecular dynamics simulations identifies highly sensitive residues and predicts deleterious mutations.

    Karami, Yasaman / Bitard-Feildel, Tristan / Laine, Elodie / Carbone, Alessandra

    Scientific reports

    2018  Volume 8, Issue 1, Page(s) 16126

    Abstract Characterizing a protein mutational landscape is a very challenging problem in biology. Many disease-associated mutations do not seem to produce any effect on the global shape or motions of the protein. Here, we use relatively short all-atom biomolecular simulations to predict mutational outcomes, and we quantitatively assess the predictions on several hundred mutants. We perform simulations of the wild type and 175 mutants of PSD95's third PDZ domain in complex with its cognate ligand. By recording residue displacement correlations and interactions, we identify "communication pathways" and quantify them to predict the severity of the mutations. Moreover, we show that by exploiting simulations of the wild type, one can detect 80% of the positions highly sensitive to mutations with a precision of 89%. Importantly, our analysis describes the role of these positions in the inter-residue communication and dynamical architecture of the complex. We assess our approach on three different systems using data from deep mutational scanning experiments and high-throughput exome sequencing. We refer to our analysis as "infostery", from "info" (information) and "steric" (arrangement of residues in space). We provide a fully automated tool, COMMA2 (www.lcqb.upmc.fr/COMMA2), that can be used to guide medicinal research by selecting important positions/mutations.
    MeSH term(s) Algorithms ; Amino Acids/chemistry ; Databases, Protein ; Molecular Dynamics Simulation ; Mutation/genetics ; Peptides/chemistry ; Point Mutation/genetics ; Proteins/chemistry ; Proteins/genetics
    Chemical Substances Amino Acids ; Peptides ; Proteins
    Language English
    Publishing date 2018-10-31
    Publishing country England
    Document type Journal Article
    ZDB-ID 2615211-3
    ISSN 2045-2322
    ISSN (online) 2045-2322
    DOI 10.1038/s41598-018-34508-2
    Database MEDical Literature Analysis and Retrieval System OnLINE


  8. Article ; Online: Origins and structural properties of novel and de novo protein domains during insect evolution.

    Klasberg, Steffen / Bitard-Feildel, Tristan / Callebaut, Isabelle / Bornberg-Bauer, Erich

    The FEBS journal

    2018  Volume 285, Issue 14, Page(s) 2605–2625

    Abstract Over long time scales, protein evolution is characterized by modular rearrangements of protein domains. Such rearrangements are mainly caused by gene duplication, fusion and terminal losses. To better understand domain emergence mechanisms, we investigated 32 insect genomes covering a speciation gradient ranging from ~2 to ~390 mya. We use established domain models and foldable domains delineated by hydrophobic cluster analysis (HCA), which does not require homologous sequences, to also identify domains which have likely arisen de novo, that is, from previously noncoding DNA. Our results indicate that most novel domains emerge terminally as they originate from ORF extensions, while fewer arise in middle arrangements, resulting from exonization of intronic or intergenic regions. Many novel domains rapidly migrate between terminal or middle positions and single- and multidomain arrangements. Young domains, such as most HCA-defined domains, are under strong selection pressure as they show signals of purifying selection. De novo domains, linked to ancient domains or defined by HCA, have higher degrees of intrinsic disorder and disorder-to-order transition upon binding than ancient domains. However, the corresponding DNA sequences of the novel domains of de novo origins could only rarely be found in sister genomes. We conclude that novel domains are often recruited by other proteins and undergo important structural modifications shortly after their emergence, but evolve too fast to be characterized by cross-species comparisons alone.
    MeSH term(s) Amino Acid Sequence ; Animals ; Base Sequence ; Cluster Analysis ; Computational Biology/methods ; Evolution, Molecular ; Exons ; Gene Duplication ; Gene Expression ; Gene Fusion ; Genome, Insect ; Hydrophobic and Hydrophilic Interactions ; Insect Proteins/chemistry ; Insect Proteins/genetics ; Insect Proteins/metabolism ; Insecta/classification ; Insecta/genetics ; Introns ; Phylogeny ; Protein Domains ; Selection, Genetic ; Sequence Deletion
    Chemical Substances Insect Proteins
    Language English
    Publishing date 2018-06-29
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 2173655-8
    ISSN 1742-4658 ; 1742-464X
    ISSN (online) 1742-4658
    ISSN 1742-464X
    DOI 10.1111/febs.14504
    Database MEDical Literature Analysis and Retrieval System OnLINE


  9. Article ; Online: Order in Disorder as Observed by the "Hydrophobic Cluster Analysis" of Protein Sequences.

    Bitard-Feildel, Tristan / Lamiable, Alexis / Mornon, Jean-Paul / Callebaut, Isabelle

    Proteomics

    2018  Volume 18, Issue 21-22, Page(s) e1800054

    Abstract Hydrophobic cluster analysis (HCA) is an original approach for protein sequence analysis, which provides access to the foldable repertoire of the protein universe, including as yet unannotated protein segments ("dark proteome"). Foldable segments correspond to ordered regions, as well as to intrinsically disordered regions (IDRs) undergoing disorder-to-order transitions. This review illustrates how HCA can be used to give insight into this last category of foldable segments, with examples matching known 3D structures. After reviewing the HCA principles, examples of short foldable segments are given, which often contain short linear motifs, typically matching hydrophobic clusters. These segments become ordered upon contact with partners, with secondary structure preferences generally corresponding to those observed in the 3D structures within the complexes. Such small foldable segments are sometimes larger than the segments of known 3D structures, including flanking hydrophobic clusters that may be critical for interaction specificity or regulation, as well as intervening sequences allowing fuzziness. Cases of larger conditionally disordered domains are also presented, with lower density in hydrophobic clusters than well-folded globular domains or with exposed hydrophobic patches, which are stabilized by interaction with partners.
    MeSH term(s) Cluster Analysis ; Hydrophobic and Hydrophilic Interactions ; Protein Structure, Secondary ; Sequence Analysis, Protein/methods
    Keywords covid19
    Language English
    Publishing date 2018-10-30
    Publishing country Germany
    Document type Journal Article ; Research Support, Non-U.S. Gov't ; Review
    ZDB-ID 2032093-0
    ISSN 1615-9861 ; 1615-9853
    ISSN (online) 1615-9861
    ISSN 1615-9853
    DOI 10.1002/pmic.201800054
    Database MEDical Literature Analysis and Retrieval System OnLINE


  10. Article: Computational Identification of Novel Genes: Current and Future Perspectives.

    Klasberg, Steffen / Bitard-Feildel, Tristan / Mallet, Ludovic

    Bioinformatics and biology insights

    2016  Volume 10, Page(s) 121–131

    Abstract While it has long been thought that all genomic novelties are derived from existing material, many genes lacking homology to known genes were found in recent genome projects. Some of these novel genes were proposed to have evolved de novo, i.e., out of noncoding sequences, whereas some have been shown to follow a duplication and divergence process. Their discovery called for an extension of the historical hypotheses about gene origination. Besides the theoretical breakthrough, increasing evidence has accumulated that novel genes play important roles in evolutionary processes, including adaptation and speciation events. Different techniques are available to identify genes and classify them as novel. Their classification as novel is usually based on their similarity to known genes, or lack thereof, detected by comparative genomics or against databases. Computational approaches are further prime methods that can be based on existing models or leverage biological evidence from experiments. Identification of novel genes remains, however, a challenging task. With constant software and technology updates, no gold standard, and no available benchmark, evaluation and characterization of genomic novelty is a vibrant field. In this review, the classical and state-of-the-art tools for gene prediction are introduced. The current methods for novel gene detection are presented; the methodological strategies and their limits are discussed along with prospective approaches for further studies.
    Language English
    Publishing date 2016-08-01
    Publishing country United States
    Document type Journal Article ; Review
    ZDB-ID 2423808-9
    ISSN 1177-9322
    DOI 10.4137/BBI.S39950
    Database MEDical Literature Analysis and Retrieval System OnLINE
