LIVIVO - The Search Portal for Life Sciences

Search results

Results 1 - 10 of 24

  1. Article ; Online: Pretraining Strategies for Structure Agnostic Material Property Prediction.

    Huang, Hongshuo / Magar, Rishikesh / Barati Farimani, Amir

    Journal of chemical information and modeling

    2024  Volume 64, Issue 3, Page(s) 627–637

    Abstract In recent years, machine learning (ML), especially graph neural network (GNN) models, has been successfully used for fast and accurate prediction of material properties. However, most ML models rely on relaxed crystal structures to develop descriptors for accurate predictions. Generating these relaxed crystal structures can be expensive and time-consuming, thus requiring an additional processing step for models that rely on them. To address this challenge, structure-agnostic methods have been developed, which use fixed-length descriptors engineered based on human knowledge about the material. However, these hand-engineered descriptors require extensive domain knowledge and are generally not used with learnable models, which are known to have superior performance. Recent advancements have proposed learnable frameworks that construct representations based on stoichiometry alone, combining the flexibility of deep learning frameworks with structure-agnostic learning. In this work, we propose three different pretraining strategies for these structure-agnostic, learnable frameworks to further improve downstream material property prediction performance. We incorporate strategies such as self-supervised learning (SSL), fingerprint learning (FL), and multimodal learning (ML) and demonstrate their efficacy on downstream tasks for the Roost architecture, a popular structure-agnostic framework. Our results show significant improvements on small data sets and better data efficiency on larger data sets, underscoring the potential of our pretraining strategies to effectively leverage unlabeled data for accurate material property prediction.
    MeSH term(s) Humans ; Machine Learning ; Neural Networks, Computer
    Language English
    Publishing date 2024-02-01
    Publishing country United States
    Document type Journal Article
    ZDB-ID 190019-5
    ISSN (online) 1549-960X
    ISSN 0095-2338
    DOI 10.1021/acs.jcim.3c00919
    Database MEDical Literature Analysis and Retrieval System OnLINE
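
    The fingerprint-learning (FL) strategy described above can be illustrated with a small, self-contained sketch: a learnable composition encoder is pretrained to regress a fixed-length fingerprint from stoichiometry alone, and its weights are then reused for downstream property prediction. The `CompositionEncoder` and `FingerprintHead` classes, the fingerprint length, and all dimensions below are illustrative placeholders, not the Roost implementation or the authors' code.

```python
# Minimal sketch of fingerprint-learning (FL) pretraining for a
# structure-agnostic composition encoder. Encoder, head, and sizes
# are placeholders, not the Roost architecture.
import torch
import torch.nn as nn

class CompositionEncoder(nn.Module):
    """Toy stand-in for a learnable stoichiometry encoder."""
    def __init__(self, n_elements=103, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_elements, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
    def forward(self, frac):          # frac: (batch, n_elements) element fractions
        return self.net(frac)

class FingerprintHead(nn.Module):
    """Predicts a fixed-length, hand-engineered fingerprint from the embedding."""
    def __init__(self, dim=128, fp_dim=145):
        super().__init__()
        self.proj = nn.Linear(dim, fp_dim)
    def forward(self, z):
        return self.proj(z)

encoder, head = CompositionEncoder(), FingerprintHead()
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-4)
loss_fn = nn.MSELoss()

# Pretraining step on unlabeled compositions: regress the fingerprint,
# then reuse `encoder` weights for downstream property prediction.
frac = torch.rand(32, 103)            # dummy element-fraction vectors
target_fp = torch.rand(32, 145)       # dummy precomputed fingerprints
loss = loss_fn(head(encoder(frac)), target_fp)
opt.zero_grad(); loss.backward(); opt.step()
```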


  2. Article ; Online: GPCR-BERT: Interpreting Sequential Design of G Protein-Coupled Receptors Using Protein Language Models.

    Kim, Seongwon / Mollaei, Parisa / Antony, Akshay / Magar, Rishikesh / Barati Farimani, Amir

    Journal of chemical information and modeling

    2024  Volume 64, Issue 4, Page(s) 1134–1144

    Abstract With the rise of transformers and large language models (LLMs) in chemistry and biology, new avenues for the design and understanding of therapeutics have been opened up to the scientific community. Protein sequences can be modeled as language and can take advantage of recent advances in LLMs, especially given the abundance of accessible protein sequence data sets. In this letter, we developed the GPCR-BERT model for understanding the sequential design of G protein-coupled receptors (GPCRs). GPCRs are the target of over one-third of Food and Drug Administration-approved pharmaceuticals. However, there is a lack of comprehensive understanding regarding the relationship among amino acid sequence, ligand selectivity, and conformational motifs (such as NPxxY, CWxP, and E/DRY). By utilizing a pretrained protein model (Prot-Bert) and fine-tuning it on prediction tasks for variations in these motifs, we were able to shed light on several relationships between residues in the binding pocket and some of the conserved motifs. To achieve this, we interpreted the attention weights and hidden states of the model to extract the extent to which each amino acid contributes to dictating the identity of the masked residues. The fine-tuned models demonstrated high accuracy in predicting hidden residues within the motifs. In addition, an analysis of the embeddings was performed over 3D structures to elucidate the higher-order interactions within the conformations of the receptors.
    MeSH term(s) Receptors, G-Protein-Coupled/chemistry ; Amino Acid Sequence ; Ligands
    Chemical Substances Receptors, G-Protein-Coupled ; Ligands
    Language English
    Publishing date 2024-02-10
    Publishing country United States
    Document type Journal Article
    ZDB-ID 190019-5
    ISSN (online) 1549-960X
    ISSN 0095-2338
    DOI 10.1021/acs.jcim.3c01706
    Database MEDical Literature Analysis and Retrieval System OnLINE
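
    The core fine-tuning step described above, recovering masked residues in conserved motifs with a pretrained protein language model, can be sketched with the Hugging Face transformers API. The toy sequence, the masking of a single residue, and the use of the public Rostlab/prot_bert checkpoint are assumptions for illustration; the authors' data pipeline and training loop are not reproduced here.

```python
# Hedged sketch: fine-tune a pretrained protein BERT to recover a masked
# residue in a conserved GPCR motif. Sequence and masking are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

name = "Rostlab/prot_bert"                      # public ProtBert checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)
model.train()

# ProtBert expects space-separated single-letter residues.
seq = "C W L P F F N P I I Y"                   # toy fragment with an NPxxY-like motif
masked = seq.replace("N", tokenizer.mask_token, 1)  # mask one motif residue

inputs = tokenizer(masked, return_tensors="pt")
labels = tokenizer(seq, return_tensors="pt")["input_ids"]
labels[inputs["input_ids"] != tokenizer.mask_token_id] = -100  # score masked position only

outputs = model(**inputs, labels=labels, output_attentions=True)
loss = outputs.loss                              # masked-residue prediction objective
attn = outputs.attentions                        # per-layer attention weights for analysis
loss.backward()                                  # an optimizer step would follow
```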


  3. Article ; Online: MOFormer: Self-Supervised Transformer Model for Metal-Organic Framework Property Prediction.

    Cao, Zhonglin / Magar, Rishikesh / Wang, Yuyang / Barati Farimani, Amir

    Journal of the American Chemical Society

    2023  Volume 145, Issue 5, Page(s) 2958–2967

    Abstract Metal-organic frameworks (MOFs) are materials with a high degree of porosity that can be used for many applications. However, the chemical space of MOFs is enormous due to the large variety of possible combinations of building blocks and topology. Discovering the optimal MOFs for specific applications requires an efficient and accurate search over countless potential candidates. Previous high-throughput screening methods using computational simulations like DFT can be time-consuming. Such methods also require the 3D atomic structures of MOFs, which adds an extra step when evaluating hypothetical MOFs. In this work, we propose a structure-agnostic deep learning method based on the Transformer model, named MOFormer, for property prediction of MOFs. MOFormer takes a text string representation of a MOF (MOFid) as input, thus circumventing the need to obtain the 3D structure of a hypothetical MOF and accelerating the screening process. By comparing with other descriptors such as Stoichiometric-120 and revised autocorrelations, we demonstrate that MOFormer can achieve state-of-the-art structure-agnostic prediction accuracy on all benchmarks. Furthermore, we introduce a self-supervised learning framework that pretrains the MOFormer by maximizing the cross-correlation between its structure-agnostic representations and the structure-based representations of the crystal graph convolutional neural network (CGCNN) on >400k publicly available MOFs. Benchmarks show that pretraining improves the prediction accuracy of both models on various downstream prediction tasks. We also reveal that MOFormer can be more data-efficient on quantum-chemical property prediction than the structure-based CGCNN when training data is limited. Overall, MOFormer provides a novel perspective on efficient MOF property prediction using deep learning.
    Language English
    Publishing date 2023-01-27
    Publishing country United States
    Document type Journal Article
    ZDB-ID 3155-0
    ISSN (online) 1520-5126
    ISSN 0002-7863
    DOI 10.1021/jacs.2c11420
    Database MEDical Literature Analysis and Retrieval System OnLINE
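
    The structure-agnostic idea above, predicting properties directly from the MOFid text string, can be sketched with a tiny character-level Transformer regressor. The toy tokenizer, the placeholder MOFid-style string, and the model sizes are illustrative assumptions; they are not the published MOFormer architecture or its pretraining setup.

```python
# Minimal sketch of structure-agnostic property regression from a MOF text
# string. Tokenizer and model are toy placeholders, not MOFormer itself.
import torch
import torch.nn as nn

mofid = "[Zn].[O-]C(=O)c1ccc(cc1)C(=O)[O-].pcu"   # placeholder MOFid-style string
vocab = {ch: i + 1 for i, ch in enumerate(sorted(set(mofid)))}  # 0 reserved for padding
tokens = torch.tensor([[vocab[ch] for ch in mofid]])            # (1, seq_len)

class TextPropertyModel(nn.Module):
    def __init__(self, vocab_size, dim=64, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        enc_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, layers)
        self.head = nn.Linear(dim, 1)             # e.g. band gap or gas uptake
    def forward(self, ids):
        h = self.encoder(self.embed(ids))
        return self.head(h.mean(dim=1))           # mean-pool over tokens

model = TextPropertyModel(vocab_size=len(vocab) + 1)
prediction = model(tokens)                        # no 3D structure required
```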


  4. Article ; Online: Forecasting COVID-19 new cases using deep learning methods.

    Xu, Lu / Magar, Rishikesh / Barati Farimani, Amir

    Computers in biology and medicine

    2022  Volume 144, Page(s) 105342

    Abstract After nearly two years since the first identification of the SARS-CoV-2 virus, the surge in cases caused by virus mutations is a cause of grave public health concern across the globe. As a result of this health crisis, predicting the transmission pattern of the virus is one of the most vital tasks for preparing for and controlling the pandemic. In addition to mathematical models, machine learning tools, especially deep learning models, have been developed with great success for forecasting the trend in the number of patients affected by SARS-CoV-2. In this paper, three deep learning models (CNN, LSTM, and CNN-LSTM) have been developed to predict the number of COVID-19 cases for Brazil, India, and Russia. We also compare the performance of our models with previously developed deep learning models and observe significant improvements in prediction performance. Although our models have been used only for forecasting cases in these three countries, they can easily be applied to datasets of other countries. Among the models developed in this work, the LSTM model performs best and improves forecasting accuracy compared with some existing models. The research will enable accurate forecasting of COVID-19 cases and support the global fight against the pandemic.
    MeSH term(s) COVID-19/epidemiology ; Deep Learning ; Forecasting ; Humans ; Pandemics ; SARS-CoV-2
    Language English
    Publishing date 2022-02-23
    Publishing country United States
    Document type Journal Article
    ZDB-ID 127557-4
    ISSN (online) 1879-0534
    ISSN 0010-4825
    DOI 10.1016/j.compbiomed.2022.105342
    Database MEDical Literature Analysis and Retrieval System OnLINE
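
    The LSTM forecaster described above can be sketched as a standard sliding-window regressor: each training example is a short window of daily case counts and the target is the next day's count. The window length, hidden size, and the synthetic series below are illustrative choices, not the paper's configuration or data.

```python
# Hedged sketch of a sliding-window LSTM case forecaster.
import torch
import torch.nn as nn

def make_windows(series, window=7):
    """Turn a 1-D case-count series into (window -> next value) pairs."""
    xs, ys = [], []
    for i in range(len(series) - window):
        xs.append(series[i:i + window])
        ys.append(series[i + window])
    return torch.tensor(xs).unsqueeze(-1).float(), torch.tensor(ys).float()

class CaseLSTM(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
    def forward(self, x):                 # x: (batch, window, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1]).squeeze(-1)

series = [float(i % 50) for i in range(200)]      # synthetic stand-in for daily cases
x, y = make_windows(series)
model, loss_fn = CaseLSTM(), nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = loss_fn(model(x), y)                       # one-step-ahead forecasting loss
opt.zero_grad(); loss.backward(); opt.step()
```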


  5. Book ; Online: Multimodal Language and Graph Learning of Adsorption Configuration in Catalysis

    Ock, Janghoon / Magar, Rishikesh / Antony, Akshay / Farimani, Amir Barati

    2024  

    Abstract Adsorption energy, a reactivity descriptor, should be accurately assessed for efficient catalyst screening. This evaluation requires determining the lowest energy across various adsorption configurations on the catalytic surface. While graph neural networks (GNNs) have gained popularity as a machine learning approach for computing the energy of catalyst systems, they rely heavily on atomic spatial coordinates and often lack clarity in their interpretations. Recent advancements in language models have broadened their applicability to predicting catalytic properties, allowing us to bypass the complexities of graph representation. These models are adept at handling textual data, making it possible to incorporate observable features in a human-readable format. However, language models encounter challenges in accurately predicting the energy of adsorption configurations, typically showing a high mean absolute error (MAE) of about 0.71 eV. Our study addresses this limitation by introducing a self-supervised multimodal learning approach, termed graph-assisted pretraining. This method, combined with data augmentation, significantly reduces the MAE to 0.35 eV, achieving accuracy comparable to DimeNet++ while using 0.4% of its training data size. Furthermore, the Transformer encoder at the core of the language model can provide insights into its feature focus through its attention scores. This analysis shows that our multimodal training effectively redirects the model's attention from adsorbate-related features toward relevant adsorption configurations, enhancing prediction accuracy and interpretability.

    Comment: 28 pages, 6 figures
    Keywords Computer Science - Computational Engineering, Finance, and Science
    Subject code 006
    Publishing date 2024-01-14
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
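
    The graph-assisted pretraining described above can be read as an embedding-alignment objective between a text encoder and a graph encoder that see the same adsorption configuration. The sketch below uses two toy MLP encoders and a cosine-alignment loss as stand-ins; the actual encoders, pairing strategy, and loss used in the paper are not reproduced here.

```python
# Hedged sketch of text/graph embedding alignment for multimodal pretraining.
# Both encoders are toy placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

text_encoder = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 128))
graph_encoder = nn.Sequential(nn.Linear(90, 256), nn.ReLU(), nn.Linear(256, 128))
for p in graph_encoder.parameters():      # assume the graph side is pretrained and frozen
    p.requires_grad_(False)

text_feats = torch.randn(16, 768)         # e.g. pooled language-model features of the text input
graph_feats = torch.randn(16, 90)         # e.g. pooled GNN features of the same systems

z_text = F.normalize(text_encoder(text_feats), dim=1)
z_graph = F.normalize(graph_encoder(graph_feats), dim=1)

# Alignment loss: maximize cosine similarity between matched text/graph pairs.
align_loss = (1.0 - (z_text * z_graph).sum(dim=1)).mean()
align_loss.backward()                     # an optimizer step on the text side would follow
```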


  6. Article ; Online: Improving Molecular Contrastive Learning via Faulty Negative Mitigation and Decomposed Fragment Contrast.

    Wang, Yuyang / Magar, Rishikesh / Liang, Chen / Barati Farimani, Amir

    Journal of chemical information and modeling

    2022  Volume 62, Issue 11, Page(s) 2713–2725

    Abstract Deep learning has become prevalent in computational chemistry and is widely implemented in molecular property prediction. Recently, self-supervised learning (SSL), especially contrastive learning (CL), has gathered growing attention for its potential to learn molecular representations that generalize to the gigantic chemical space. Unlike supervised learning, SSL can directly leverage large amounts of unlabeled data, which greatly reduces the effort to acquire molecular property labels through costly and time-consuming simulations or experiments. However, most molecular SSL methods borrow insights from the machine learning community but neglect the unique cheminformatics (e.g., molecular fingerprints) and multilevel graphical structures (e.g., functional groups) of molecules. In this work, we propose iMolCLR, an improvement to molecular contrastive learning of representations via faulty negative mitigation and decomposed fragment contrast ...
    MeSH term(s) Cheminformatics ; Computational Chemistry ; Machine Learning ; Neural Networks, Computer
    Language English
    Publishing date 2022-05-31
    Publishing country United States
    Document type Journal Article ; Review ; Research Support, Non-U.S. Gov't
    ZDB-ID 190019-5
    ISSN (online) 1549-960X
    ISSN 0095-2338
    DOI 10.1021/acs.jcim.2c00495
    Database MEDical Literature Analysis and Retrieval System OnLINE
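
    The faulty-negative mitigation described above can be sketched as a weighted contrastive (NT-Xent-style) loss in which negatives that are highly similar by a cheminformatics fingerprint are down-weighted rather than pushed apart at full strength. The weighting scheme, temperature, and random embeddings below are illustrative assumptions, not the exact iMolCLR formulation.

```python
# Hedged sketch: contrastive loss with fingerprint-weighted negatives.
import torch
import torch.nn.functional as F

def weighted_nt_xent(z1, z2, sim_fp, temperature=0.1):
    """z1, z2: (N, d) embeddings of two augmented views of N molecules.
    sim_fp: (N, N) pairwise fingerprint (e.g. Tanimoto) similarity in [0, 1]."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature            # (N, N) scaled cosine similarities
    pos = torch.diag(logits)                      # matched views are positives
    weights = 1.0 - sim_fp                        # very similar ("faulty") negatives count less
    weights.fill_diagonal_(0.0)
    neg = (weights * torch.exp(logits)).sum(dim=1)
    return (-pos + torch.log(torch.exp(pos) + neg)).mean()

z1 = torch.randn(8, 128, requires_grad=True)      # stand-in GNN embeddings, view 1
z2 = torch.randn(8, 128, requires_grad=True)      # stand-in GNN embeddings, view 2
sim_fp = torch.rand(8, 8)                         # stand-in for fingerprint similarities
loss = weighted_nt_xent(z1, z2, sim_fp)
loss.backward()
```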


  7. Article ; Online: Potential neutralizing antibodies discovered for novel corona virus using machine learning.

    Magar, Rishikesh / Yadav, Prakarsh / Barati Farimani, Amir

    Scientific reports

    2021  Volume 11, Issue 1, Page(s) 5261

    Abstract The fast and untraceable virus mutations take the lives of thousands of people before the immune system can produce the inhibitory antibody. The recent outbreak of COVID-19 infected and killed thousands of people in the world. Rapid methods for finding peptides or antibody sequences that can inhibit the viral epitopes of SARS-CoV-2 will save thousands of lives. To predict neutralizing antibodies for SARS-CoV-2 in a high-throughput manner, in this paper, we use different machine learning (ML) models to predict the possible inhibitory synthetic antibodies for SARS-CoV-2. We collected 1933 virus-antibody sequences and their clinical patient neutralization responses and trained an ML model to predict the antibody response. Using graph featurization with a variety of ML methods, such as XGBoost, Random Forest, Multilayered Perceptron, Support Vector Machine, and Logistic Regression, we screened thousands of hypothetical antibody sequences and found nine stable antibodies that potentially inhibit SARS-CoV-2. We combined bioinformatics, structural biology, and Molecular Dynamics (MD) simulations to verify the stability of the candidate antibodies that can inhibit SARS-CoV-2.
    MeSH term(s) Antibodies, Neutralizing ; High-Throughput Screening Assays/methods ; Machine Learning ; SARS-CoV-2/genetics ; SARS-CoV-2/immunology
    Chemical Substances Antibodies, Neutralizing
    Language English
    Publishing date 2021-03-04
    Publishing country England
    Document type Evaluation Study ; Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 2615211-3
    ISSN (online) 2045-2322
    DOI 10.1038/s41598-021-84637-4
    Database MEDical Literature Analysis and Retrieval System OnLINE
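
    The screening workflow described above, featurize antibody sequences, fit a classifier on labeled neutralization data, then rank hypothetical candidates, can be sketched with scikit-learn. The amino-acid-composition featurizer, the toy sequences, and the random-forest choice (one of the model families listed in the abstract) are stand-ins for the paper's graph featurization and data.

```python
# Hedged sketch of sequence-based antibody screening with a classical ML model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def featurize(seq):
    """Fractional amino-acid composition; a simple placeholder for graph features."""
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    return counts / max(len(seq), 1)

# Toy labeled data: (antibody sequence, neutralizes: 1/0).
train = [("ACDKWYLLG", 1), ("GGGSSAAKK", 0), ("WYCCHKLMD", 1), ("PPPGGSSTT", 0)]
X = np.stack([featurize(s) for s, _ in train])
y = np.array([label for _, label in train])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Screen hypothetical candidates and rank by predicted neutralization probability.
candidates = ["ACDWYKHLM", "GGSSTTPPA"]
scores = clf.predict_proba(np.stack([featurize(s) for s in candidates]))[:, 1]
ranked = sorted(zip(candidates, scores), key=lambda t: -t[1])
```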


  8. Article ; Online: Potential neutralizing antibodies discovered for novel corona virus using machine learning

    Rishikesh Magar / Prakarsh Yadav / Amir Barati Farimani

    Scientific Reports, Vol 11, Iss 1, Pp 1-

    2021  Volume 11

    Abstract The fast and untraceable virus mutations take the lives of thousands of people before the immune system can produce the inhibitory antibody. The recent outbreak of COVID-19 infected and killed thousands of people in the world. Rapid methods for finding peptides or antibody sequences that can inhibit the viral epitopes of SARS-CoV-2 will save thousands of lives. To predict neutralizing antibodies for SARS-CoV-2 in a high-throughput manner, in this paper, we use different machine learning (ML) models to predict the possible inhibitory synthetic antibodies for SARS-CoV-2. We collected 1933 virus-antibody sequences and their clinical patient neutralization responses and trained an ML model to predict the antibody response. Using graph featurization with a variety of ML methods, such as XGBoost, Random Forest, Multilayered Perceptron, Support Vector Machine, and Logistic Regression, we screened thousands of hypothetical antibody sequences and found nine stable antibodies that potentially inhibit SARS-CoV-2. We combined bioinformatics, structural biology, and Molecular Dynamics (MD) simulations to verify the stability of the candidate antibodies that can inhibit SARS-CoV-2.
    Keywords Medicine (R) ; Science (Q)
    Subject code 570
    Language English
    Publishing date 2021-03-01T00:00:00Z
    Publisher Nature Portfolio
    Document type Article ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  9. Book ; Online: GPT-MolBERTa: GPT Molecular Features Language Model for molecular property prediction

    Balaji, Suryanarayanan / Magar, Rishikesh / Jadhav, Yayati / Farimani, Amir Barati

    2023  

    Abstract With the emergence of Transformer architectures and their powerful understanding of textual data, a new horizon has opened up for predicting molecular properties based on text descriptions. While SMILES strings are the most common form of representation, they lack robustness, rich information, and canonicity, which limits their effectiveness as generalizable representations. Here, we present GPT-MolBERTa, a self-supervised large language model (LLM) which uses detailed textual descriptions of molecules to predict their properties. Text-based descriptions of 326,000 molecules were collected using ChatGPT and used to train the LLM to learn molecular representations. To predict properties for the downstream tasks, both BERT and RoBERTa models were used in the fine-tuning stage. Experiments show that GPT-MolBERTa performs well on various molecular property benchmarks and approaches state-of-the-art performance in regression tasks. Additionally, further analysis of the attention mechanisms shows that GPT-MolBERTa is able to pick up important information from the input textual data, demonstrating the interpretability of the model.

    Comment: Paper has 17 pages, 4 figures and 4 tables, along with 71 references
    Keywords Physics - Chemical Physics ; Computer Science - Machine Learning
    Publishing date 2023-09-20
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
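
    The fine-tuning stage described above, a BERT/RoBERTa encoder with a regression head trained on textual molecule descriptions, can be sketched with the Hugging Face transformers API. The roberta-base checkpoint, the example description, and the dummy label are assumptions for illustration; GPT-MolBERTa's own pretrained weights and training data are not used here.

```python
# Hedged sketch: RoBERTa with a single regression output, fine-tuned on a
# textual molecule description. Checkpoint, text, and label are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(
    name, num_labels=1, problem_type="regression"
)

description = ("Ethanol is a small polar molecule with a hydroxyl group, "
               "miscible with water and commonly used as a solvent.")
inputs = tokenizer(description, return_tensors="pt", truncation=True)
labels = torch.tensor([[0.65]])                   # dummy property value

outputs = model(**inputs, labels=labels)          # MSE loss for the regression head
outputs.loss.backward()                           # one fine-tuning step would follow
```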


  10. Book ; Online: Crystal Twins: Self-supervised Learning for Crystalline Material Property Prediction

    Magar, Rishikesh / Wang, Yuyang / Farimani, Amir Barati

    2022  

    Abstract Machine learning (ML) models have been widely successful in the prediction of material properties. However, large labeled datasets required for training accurate ML models are elusive and computationally expensive to generate. Recent advances in Self-Supervised Learning (SSL) frameworks capable of training ML models on unlabeled data have mitigated this problem and demonstrated superior performance in computer vision and natural language processing tasks. Drawing inspiration from the developments in SSL, we introduce Crystal Twins (CT): an SSL method for crystalline materials property prediction. Using a large unlabeled dataset, we pre-train a Graph Neural Network (GNN) by applying the redundancy reduction principle to the graph latent embeddings of augmented instances obtained from the same crystalline system. By sharing the pre-trained weights when fine-tuning the GNN for regression tasks, we significantly improve the performance for 7 challenging material property prediction benchmarks

    Comment: Preprint - Under review 20 pages, 3 figures
    Keywords Computer Science - Machine Learning ; Condensed Matter - Materials Science
    Subject code 006
    Publishing date 2022-05-04
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
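
    The redundancy-reduction objective described above (a Barlow Twins-style loss on graph embeddings of two augmented views of the same crystal) can be written compactly, as in the sketch below. The random tensors stand in for GNN embeddings, and the lambda, batch size, and embedding width are illustrative choices rather than the paper's settings.

```python
# Hedged sketch of a redundancy-reduction (Barlow Twins-style) loss on
# embeddings of two augmented views of the same crystals.
import torch

def barlow_twins_loss(z1, z2, lambd=5e-3):
    """Cross-correlation of batch-normalized embeddings should approach identity."""
    n, d = z1.shape
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = z1.t() @ z2 / n                           # (d, d) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambd * off_diag

# Stand-ins for GNN embeddings of two augmentations of the same crystals.
z1 = torch.randn(32, 128, requires_grad=True)
z2 = torch.randn(32, 128, requires_grad=True)
loss = barlow_twins_loss(z1, z2)
loss.backward()                                   # pretraining step; weights are later reused
```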
