LIVIVO - The Search Portal for Life Sciences

Search results

Results 1–10 of 35

  1. Article ; Online: Markov State Models: To Optimize or Not to Optimize.

    Arbon, Robert E / Zhu, Yanchen / Mey, Antonia S J S

    Journal of chemical theory and computation

    2024  Volume 20, Issue 2, Page(s) 977–988

    Abstract Markov state models (MSM) are a popular statistical method for analyzing the conformational dynamics of proteins including protein folding. With all statistical and machine learning (ML) models, choices must be made about the modeling pipeline that cannot be directly learned from the data. These choices, or hyperparameters, are often evaluated by expert judgment or, in the case of MSMs, by maximizing variational scores such as the VAMP-2 score. Modern ML and statistical pipelines often use automatic hyperparameter selection techniques ranging from the simple, choosing the best score from a random selection of hyperparameters, to the complex, optimization via, e.g., Bayesian optimization. In this work, we ask whether it is possible to automatically select MSM models this way by estimating and analyzing over 16,000,000 observations from over 280,000 estimated MSMs. We find that differences in hyperparameters can change the physical interpretation of the optimization objective, making automatic selection difficult. In addition, we find that enforcing conditions of equilibrium in the VAMP scores can result in inconsistent model selection. However, other parameters that specify the VAMP-2 score (lag time and number of relaxation processes scored) have only a negligible influence on model selection. We suggest that model observables and variational scores should be only a guide to model selection and that a full investigation of the MSM properties should be undertaken when selecting hyperparameters.
    MeSH term(s) Bayes Theorem ; Vesicle-Associated Membrane Protein 2 ; Proteins ; Protein Folding ; Machine Learning ; Markov Chains
    Chemical Substances Vesicle-Associated Membrane Protein 2 ; Proteins
    Language English
    Publishing date 2024-01-01
    Publishing country United States
    Document type Journal Article
    ISSN 1549-9626
    ISSN (online) 1549-9626
    DOI 10.1021/acs.jctc.3c01134
    Database MEDical Literature Analysis and Retrieval System OnLINE

  2. Article ; Online: SILVR: Guided Diffusion for Molecule Generation.

    Runcie, Nicholas T / Mey, Antonia S J S

    Journal of chemical information and modeling

    2023  Volume 63, Issue 19, Page(s) 5996–6005

    Abstract Computationally generating new synthetically accessible compounds with high affinity and low toxicity is a great challenge in drug design. Machine learning models beyond conventional pharmacophoric methods have shown promise in the generation of novel small-molecule compounds but require significant tuning for a specific protein target. Here, we introduce a method called selective iterative latent variable refinement (SILVR) for conditioning an existing diffusion-based equivariant generative model without retraining. The model allows the generation of new molecules that fit into a binding site of a protein based on fragment hits. We use the SARS-CoV-2 main protease fragments from Diamond XChem that form part of the COVID Moonshot project as a reference dataset for conditioning the molecule generation. The SILVR rate controls the extent of conditioning, and we show that moderate SILVR rates make it possible to generate new molecules of similar shape to the original fragments, meaning that the new molecules fit the binding site without knowledge of the protein. We can also merge up to 3 fragments into a new molecule without affecting the quality of molecules generated by the underlying generative model. Our method is generalizable to any protein target with known fragments and any diffusion-based model for molecule generation.
    Language English
    Publishing date 2023-09-19
    Publishing country United States
    Document type Journal Article
    ZDB-ID 190019-5
    ISSN 1549-960X ; 0095-2338
    ISSN (online) 1549-960X
    ISSN 0095-2338
    DOI 10.1021/acs.jcim.3c00667
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Article ; Online: Benchmarking Active Learning Protocols for Ligand-Binding Affinity Prediction.

    Gorantla, Rohan / Kubincová, Alžbeta / Suutari, Benjamin / Cossins, Benjamin P / Mey, Antonia S J S

    Journal of chemical information and modeling

    2024  Volume 64, Issue 6, Page(s) 1955–1965

    Abstract Active learning (AL) has become a powerful tool in computational drug discovery, enabling the identification of top binders from vast molecular libraries. To design a robust AL protocol, it is important to understand the influence of AL parameters, as well as the features of the data sets on the outcomes. We use four affinity data sets for different targets (TYK2, USP7, D2R, Mpro) to systematically evaluate the performance of machine learning models [Gaussian process (GP) model and Chemprop model], sample selection protocols, and the batch size based on metrics describing the overall predictive power of the model (R², Spearman rank, root-mean-square error) as well as the accurate identification of top 2%/5% binders (Recall, F1 score). Both models have a comparable Recall of top binders on large data sets, but the GP model surpasses the Chemprop model when training data are sparse. A larger initial batch size, especially on diverse data sets, increased the Recall of both models as well as overall correlation metrics. However, for subsequent cycles, smaller batch sizes of 20 or 30 compounds proved to be desirable. Furthermore, adding artificial Gaussian noise to the data up to a certain threshold still allowed the model to identify clusters with top-scoring compounds. However, excessive noise (>1σ) did impact the model's predictive and exploitative capabilities.
    MeSH term(s) Benchmarking ; Ligands ; Machine Learning ; Drug Discovery/methods
    Chemical Substances Ligands
    Language English
    Publishing date 2024-03-06
    Publishing country United States
    Document type Journal Article
    ZDB-ID 190019-5
    ISSN 1549-960X ; 0095-2338
    ISSN (online) 1549-960X
    ISSN 0095-2338
    DOI 10.1021/acs.jcim.4c00220
    Database MEDical Literature Analysis and Retrieval System OnLINE

  4. Article ; Online: From Proteins to Ligands: Decoding Deep Learning Methods for Binding Affinity Prediction.

    Gorantla, Rohan / Kubincová, Alžbeta / Weiße, Andrea Y / Mey, Antonia S J S

    Journal of chemical information and modeling

    2023  Volume 64, Issue 7, Page(s) 2496–2507

    Abstract Accurate in silico prediction of protein-ligand binding affinity is important in the early stages of drug discovery. Deep learning-based methods exist but have yet to overtake more conventional methods such as giga-docking largely due to their lack of generalizability. To improve generalizability, we need to understand what these models learn from input protein and ligand data. We systematically investigated a sequence-based deep learning framework to assess the impact of protein and ligand encodings on predicting binding affinities for commonly used kinase data sets. The role of proteins is studied using convolutional neural network-based encodings obtained from sequences and graph neural network-based encodings enriched with structural information from contact maps. Ligand-based encodings are generated from graph neural networks. We test different ligand perturbations by randomizing node and edge properties. For proteins, we make use of 3 different protein contact generation methods (AlphaFold2, Pconsc4, and ESM-1b) and compare these with a random control. Our investigation shows that protein encodings do not substantially impact the binding predictions, with no statistically significant difference in binding affinity for KIBA in the investigated metrics (concordance index, Pearson's R, Spearman's Rank, and RMSE). Significant differences are seen for ligand encodings with random ligands and random ligand node properties, suggesting a much bigger reliance on ligand data for the learning tasks. Using different ways to combine protein and ligand encodings did not show a significant change in performance.
    MeSH term(s) Deep Learning ; Ligands ; Proteins/chemistry ; Neural Networks, Computer ; Protein Binding
    Chemical Substances Ligands ; Proteins
    Language English
    Publishing date 2023-11-20
    Publishing country United States
    Document type Journal Article
    ZDB-ID 190019-5
    ISSN 1549-960X ; 0095-2338
    ISSN (online) 1549-960X
    ISSN 0095-2338
    DOI 10.1021/acs.jcim.3c01208
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Book ; Online: SILVR

    Runcie, Nicholas T. / Mey, Antonia S. J. S.

    Guided Diffusion for Molecule Generation

    2023  

    Abstract Computationally generating novel synthetically accessible compounds with high affinity and low toxicity is a great challenge in drug design. Machine-learning models beyond conventional pharmacophoric methods have shown promise in generating novel small molecule compounds, but require significant tuning for a specific protein target. Here, we introduce a method called selective iterative latent variable refinement (SILVR) for conditioning an existing diffusion-based equivariant generative model without retraining. The model allows the generation of new molecules that fit into a binding site of a protein based on fragment hits. We use the SARS-CoV-2 Main protease fragments from Diamond X-Chem that form part of the COVID Moonshot project as a reference dataset for conditioning the molecule generation. The SILVR rate controls the extent of conditioning and we show that moderate SILVR rates make it possible to generate new molecules of similar shape to the original fragments, meaning that the new molecules fit the binding site without knowledge of the protein. We can also merge up to 3 fragments into a new molecule without affecting the quality of molecules generated by the underlying generative model. Our method is generalizable to any protein target with known fragments and any diffusion-based model for molecule generation.

    Comment: 20 pages, 11 figures
    Keywords Quantitative Biology - Biomolecules ; Statistics - Machine Learning
    Subject code 541
    Publishing date 2023-04-21
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  6. Article ; Online: What geometrically constrained models can tell us about real-world protein contact maps.

    Güven, J Jasmin / Molkenthin, Nora / Mühle, Steffen / Mey, Antonia S J S

    Physical biology

    2023  Volume 20, Issue 4

    Abstract The mechanisms by which a protein's 3D structure can be determined based on its amino acid sequence have long been one of the key mysteries of biophysics. Often simplistic models, such as those derived from geometric constraints, capture bulk real-world 3D protein-protein properties well. One approach is using protein contact maps (PCMs) to better understand proteins' properties. In this study, we explore the emergent behaviour of contact maps for different geometrically constrained models and compare them to real-world protein systems. Specifically, we derive an analytical approximation for the distribution of amino acid distances, denoted as
    MeSH term(s) Protein Conformation ; Protein Folding ; Proteins/chemistry ; Amino Acids/chemistry ; Amino Acid Sequence
    Chemical Substances Proteins ; Amino Acids
    Language English
    Publishing date 2023-05-26
    Publishing country England
    Document type Journal Article
    ZDB-ID 2133216-2
    ISSN 1478-3975 ; 1478-3967
    ISSN (online) 1478-3975
    ISSN 1478-3967
    DOI 10.1088/1478-3975/acd543
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Article: mRNA interactions with disordered regions control protein activity.

    Luo, Yang / Pratihar, Supriya / Horste, Ellen H / Mitschka, Sibylle / Mey, Antonia S J S / Al-Hashimi, Hashim M / Mayr, Christine

    bioRxiv : the preprint server for biology

    2023  

    Abstract The cytoplasm is compartmentalized into different translation environments. mRNAs use their 3'UTRs to localize to distinct cytoplasmic compartments, including TIS granules (TGs). Many transcription factors, including MYC, are translated in TGs. It was shown that translation of proteins in TGs enables the formation of protein complexes that cannot be established when these proteins are translated in the cytosol, but the mechanism is poorly understood. Here we show that MYC protein complexes that involve binding to the intrinsically disordered region (IDR) of MYC are only formed when MYC is translated in TGs. TG-dependent protein complexes require TG-enriched mRNAs for assembly. These mRNAs bind to a new and widespread RNA-binding domain in neutral or negatively charged IDRs in several transcription factors, including MYC. RNA-IDR interaction changes the conformational ensemble of the IDR, enabling the formation of MYC protein complexes that act in the nucleus and control functions that cannot be accomplished by cytosolically-translated MYC. We propose that certain mRNAs have IDR chaperone activity as they control IDR conformations. In addition to post-translational modifications, we found a novel mode of protein activity regulation. Since RNA-IDR interactions are prevalent, we suggest that mRNA-dependent control of protein functional states is widespread.
    Language English
    Publishing date 2023-02-18
    Publishing country United States
    Document type Preprint
    DOI 10.1101/2023.02.18.529068
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Book ; Online: What geometrically constrained folding models can tell us about real-world protein contact maps

    Molkenthin, Nora / Güven, J. J. / Mühle, Steffen / Mey, Antonia S. J. S.

    2022  

    Abstract The mechanisms by which a protein's 3D structure can be determined based on its amino acid sequence have long been one of the key mysteries of biophysics. Often simplistic models, such as those derived from geometric constraints, capture bulk real-world 3D protein-protein properties well. One approach is using protein contact maps to better understand proteins' properties. Here, we investigate the emergent behaviour of contact maps for different geometrically constrained models and real-world protein systems. We derive an analytical approximation for the distribution of model amino acid distances, $s$, by means of a mean-field approach. This approximation is then validated for simulations using a 2D and 3D random interaction model, as well as from contact maps of real-world protein data. Using data from the RCSB Protein Data Bank (PDB) and AlphaFold 2 database, the analytical approximation is fitted to protein chain lengths of $L\approx100$, $L\approx200$, and $L\approx300$. While a universal scaling behaviour for protein chains of different lengths could not be deduced, we present evidence that the amino acid distance distributions can be attributed to geometric constraints of protein chains in bulk and amino acid sequences only play a secondary role.

    Comment: 15 pages, 4 figures
    Keywords Physics - Biological Physics ; Quantitative Biology - Biomolecules
    Subject code 612
    Publishing date 2022-05-18
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  9. Article ; Online: Self-organized emergence of folded protein-like network structures from geometric constraints.

    Molkenthin, Nora / Mühle, Steffen / Mey, Antonia S J S / Timme, Marc

    PloS one

    2020  Volume 15, Issue 2, Page(s) e0229230

    Abstract The intricate three-dimensional geometries of protein tertiary structures underlie protein function and emerge through a folding process from one-dimensional chains of amino acids. The exact spatial sequence and configuration of amino acids, the biochemical environment and the temporal sequence of distinct interactions yield a complex folding process that cannot yet be easily tracked for all proteins. To gain qualitative insights into the fundamental mechanisms behind the folding dynamics and generic features of the folded structure, we propose a simple model of structure formation that takes into account only fundamental geometric constraints and otherwise assumes randomly paired connections. We find that despite its simplicity, the model results in a network ensemble consistent with key overall features of the ensemble of Protein Residue Networks we obtained from more than 1000 biological protein geometries as available through the Protein Data Bank. Specifically, the distribution of the number of interaction neighbors a unit (amino acid) has, the scaling of the structure's spatial extent with chain length, the eigenvalue spectrum and the scaling of the smallest relaxation time with chain length are all consistent between model and real proteins. These results indicate that geometric constraints alone may already account for a number of generic features of protein tertiary structures.
    MeSH term(s) Algorithms ; Amino Acids/chemistry ; Amino Acids/metabolism ; Humans ; Models, Molecular ; Protein Conformation ; Protein Folding ; Protein Interaction Domains and Motifs ; Proteins/chemistry ; Proteins/metabolism
    Chemical Substances Amino Acids ; Proteins
    Language English
    Publishing date 2020-02-27
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ISSN 1932-6203
    ISSN (online) 1932-6203
    DOI 10.1371/journal.pone.0229230
    Database MEDical Literature Analysis and Retrieval System OnLINE

  10. Article ; Online: Hybrid Alchemical Free Energy/Machine-Learning Methodology for the Computation of Hydration Free Energies.

    Scheen, Jenke / Wu, Wilson / Mey, Antonia S J S / Tosco, Paolo / Mackey, Mark / Michel, Julien

    Journal of chemical information and modeling

    2020  Volume 60, Issue 11, Page(s) 5331–5339

    Abstract A methodology that combines alchemical free energy calculations (FEP) with machine learning (ML) has been developed to compute accurate absolute hydration free energies. The hybrid FEP/ML methodology was trained on a subset of the FreeSolv database and retrospectively shown to outperform most submissions from the SAMPL4 competition. Compared to pure machine-learning approaches, FEP/ML yields more precise estimates of free energies of hydration and requires a fraction of the training set size to outperform standalone FEP calculations. The ML-derived correction terms are further shown to be transferable to a range of related FEP simulation protocols. The approach may be used to inexpensively improve the accuracy of FEP calculations and to flag molecules which will benefit the most from bespoke force field parametrization efforts.
    MeSH term(s) Computer Simulation ; Entropy ; Machine Learning ; Retrospective Studies ; Thermodynamics
    Language English
    Publishing date 2020-08-04
    Publishing country United States
    Document type Journal Article
    ZDB-ID 190019-5
    ISSN 1549-960X ; 0095-2338
    ISSN (online) 1549-960X
    ISSN 0095-2338
    DOI 10.1021/acs.jcim.0c00600
    Database MEDical Literature Analysis and Retrieval System OnLINE
