LIVIVO - The Search Portal for Life Sciences



Search results

Results 1-10 of 33


  1. Book ; Online: Endowing Protein Language Models with Structural Knowledge

    Chen, Dexiong / Hartout, Philip / Pellizzoni, Paolo / Oliver, Carlos / Borgwardt, Karsten

    2024  

    Abstract Understanding the relationships between protein sequence, structure and function is a long-standing biological challenge with manifold implications from drug design to our understanding of evolution. Recently, protein language models have emerged as the preferred method for this challenge, thanks to their ability to harness large sequence databases. Yet, their reliance on expansive sequence data and parameter sets limits their flexibility and practicality in real-world scenarios. Concurrently, the recent surge in computationally predicted protein structures unlocks new opportunities in protein representation learning. While promising, the computational burden carried by such complex data still hinders widely-adopted practical applications. To address these limitations, we introduce a novel framework that enhances protein language models by integrating protein structural data. Drawing from recent advances in graph transformers, our approach refines the self-attention mechanisms of pretrained language transformers by integrating structural information with structure extractor modules. This refined model, termed Protein Structure Transformer (PST), is further pretrained on a small protein structure database, using the same masked language modeling objective as traditional protein language models. Empirical evaluations of PST demonstrate its superior parameter efficiency relative to protein language models, despite being pretrained on a dataset comprising only 542K structures. Notably, PST consistently outperforms the state-of-the-art foundation model for protein sequences, ESM-2, setting a new benchmark in protein function prediction. Our findings underscore the potential of integrating structural information into protein language models, paving the way for more effective and efficient protein modeling. Code and pretrained models are available at https://github.com/BorgwardtLab/PST.
    Keywords Quantitative Biology - Quantitative Methods ; Computer Science - Machine Learning ; Quantitative Biology - Biomolecules
    Subject code 612
    Publishing date 2024-01-26
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  2. Book ; Online: Fisher Information Embedding for Node and Graph Learning

    Chen, Dexiong / Pellizzoni, Paolo / Borgwardt, Karsten

    2023  

    Abstract Attention-based graph neural networks (GNNs), such as graph attention networks (GATs), have become popular neural architectures for processing graph-structured data and learning node embeddings. Despite their empirical success, these models rely on labeled data and the theoretical properties of these models have yet to be fully understood. In this work, we propose a novel attention-based node embedding framework for graphs. Our framework builds upon a hierarchical kernel for multisets of subgraphs around nodes (e.g. neighborhoods) and each kernel leverages the geometry of a smooth statistical manifold to compare pairs of multisets, by "projecting" the multisets onto the manifold. By explicitly computing node embeddings with a manifold of Gaussian mixtures, our method leads to a new attention mechanism for neighborhood aggregation. We provide theoretical insights into generalizability and expressivity of our embeddings, contributing to a deeper understanding of attention-based GNNs. We propose both efficient unsupervised and supervised methods for learning the embeddings. Through experiments on several node classification benchmarks, we demonstrate that our proposed method outperforms existing attention-based graph models like GATs. Our code is available at https://github.com/BorgwardtLab/fisher_information_embedding.

    Comment: ICML 2023
    Keywords Statistics - Machine Learning ; Computer Science - Machine Learning
    Subject code 006
    Publishing date 2023-05-12
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  3. Book ; Online: Unsupervised Manifold Alignment with Joint Multidimensional Scaling

    Chen, Dexiong / Fan, Bowen / Oliver, Carlos / Borgwardt, Karsten

    2022  

    Abstract We introduce Joint Multidimensional Scaling, a novel approach for unsupervised manifold alignment, which maps datasets from two different domains, without any known correspondences between data instances across the datasets, to a common low-dimensional Euclidean space. Our approach integrates Multidimensional Scaling (MDS) and Wasserstein Procrustes analysis into a joint optimization problem to simultaneously generate isometric embeddings of data and learn correspondences between instances from two different datasets, while only requiring intra-dataset pairwise dissimilarities as input. This unique characteristic makes our approach applicable to datasets without access to the input features, such as solving the inexact graph matching problem. We propose an alternating optimization scheme to solve the problem that can fully benefit from the optimization techniques for MDS and Wasserstein Procrustes. We demonstrate the effectiveness of our approach in several applications, including joint visualization of two datasets, unsupervised heterogeneous domain adaptation, graph matching, and protein structure alignment. The implementation of our work is available at https://github.com/BorgwardtLab/JointMDS

    Comment: ICLR 2023, see https://openreview.net/forum?id=lUpjsrKItz4
    Keywords Statistics - Machine Learning ; Computer Science - Machine Learning
    Subject code 004
    Publishing date 2022-07-06
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
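    The alternating scheme described in this abstract combines two classical building blocks. Below is a minimal numpy sketch of both steps, assuming the intra-dataset dissimilarities are Euclidean and a correspondence matrix P is already given; the actual method additionally learns P via Wasserstein Procrustes, and all function names here are illustrative, not from the paper's code.

```python
import numpy as np

def classical_mds(D, dim=2):
    # Torgerson (classical) MDS: isometric embedding of a pairwise-distance matrix
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J          # double-centred Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]      # top `dim` eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

def procrustes_align(X, Y, P):
    # best orthogonal map bringing Y onto X under a given (soft) correspondence P,
    # obtained in closed form from an SVD
    U, _, Vt = np.linalg.svd(Y.T @ P.T @ X)
    return Y @ U @ Vt
```

    In the full method these two steps would alternate with an update of the correspondence P, which is what makes the alignment unsupervised.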


  4. Book ; Online: Structure-Aware Transformer for Graph Representation Learning

    Chen, Dexiong / O'Bray, Leslie / Borgwardt, Karsten

    2022  

    Abstract The Transformer architecture has gained growing attention in graph representation learning recently, as it naturally overcomes several limitations of graph neural networks (GNNs) by avoiding their strict structural inductive biases and instead only encoding the graph structure via positional encoding. Here, we show that the node representations generated by the Transformer with positional encoding do not necessarily capture structural similarity between them. To address this issue, we propose the Structure-Aware Transformer, a class of simple and flexible graph Transformers built upon a new self-attention mechanism. This new self-attention incorporates structural information into the original self-attention by extracting a subgraph representation rooted at each node before computing the attention. We propose several methods for automatically generating the subgraph representation and show theoretically that the resulting representations are at least as expressive as the subgraph representations. Empirically, our method achieves state-of-the-art performance on five graph prediction benchmarks. Our structure-aware framework can leverage any existing GNN to extract the subgraph representation, and we show that it systematically improves performance relative to the base GNN model, successfully combining the advantages of GNNs and Transformers. Our code is available at https://github.com/BorgwardtLab/SAT.

    Comment: To appear in ICML 2022
    Keywords Statistics - Machine Learning ; Computer Science - Machine Learning
    Subject code 006
    Publishing date 2022-02-07
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
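    The mechanism described above forms queries and keys from subgraph representations rather than raw node features. A minimal numpy sketch, using a mean over each node's k-hop neighbourhood as a toy stand-in for the GNN-based structure extractors of the paper (all names and defaults are illustrative):

```python
import numpy as np

def softmax(S):
    e = np.exp(S - S.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def khop_mean(X, A, k=2):
    # toy structure extractor: average node features over the k-hop
    # neighbourhood (the paper uses a GNN here)
    reach = np.linalg.matrix_power(A + np.eye(len(A)), k) > 0
    return (reach @ X) / reach.sum(axis=1, keepdims=True)

def structure_aware_attention(X, A, Wq, Wk, Wv, k=2):
    S = khop_mean(X, A, k)               # subgraph representation per node
    Q, K, V = S @ Wq, S @ Wk, X @ Wv     # queries/keys see local structure
    attn = softmax(Q @ K.T / np.sqrt(Wk.shape[1]))
    return attn @ V
```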


  5. Article ; Online: MetaMixUp: Learning Adaptive Interpolation Policy of MixUp With Metalearning.

    Mai, Zhijun / Hu, Guosheng / Chen, Dexiong / Shen, Fumin / Shen, Heng Tao

    IEEE transactions on neural networks and learning systems

    2022  Volume 33, Issue 7, Page(s) 3050–3064

    Abstract MixUp is an effective data augmentation method to regularize deep neural networks via random linear interpolations between pairs of samples and their labels. It plays an important role in model regularization, semisupervised learning (SSL), and domain adaptation. However, despite its empirical success, the deficiency of randomly mixing samples has been poorly studied. Since deep networks are capable of memorizing the entire data set, the corrupted samples generated by vanilla MixUp with a badly chosen interpolation policy will degrade the performance of networks. To overcome overfitting to corrupted samples, inspired by metalearning (learning to learn), we propose a novel technique of learning to mix up in this work, namely, MetaMixUp. Unlike the vanilla MixUp that samples the interpolation policy from a predefined distribution, this article introduces a metalearning-based online optimization approach to dynamically learn the interpolation policy in a data-adaptive way (learning to learn better). The validation set performance via metalearning captures the noisy degree, which provides optimal directions for interpolation policy learning. Furthermore, we adapt our method for pseudolabel-based SSL along with a refined pseudolabeling strategy. In our experiments, our method achieves better performance than vanilla MixUp and its variants under the SL configuration. In particular, extensive experiments show that our MetaMixUp-adapted SSL greatly outperforms MixUp and many state-of-the-art methods on CIFAR-10 and SVHN benchmarks under the SSL configuration.
    Language English
    Publishing date 2022-07-06
    Publishing country United States
    Document type Journal Article
    ISSN 2162-2388
    ISSN (online) 2162-2388
    DOI 10.1109/TNNLS.2020.3049011
    Database MEDical Literature Analysis and Retrieval System OnLINE
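    For reference, the vanilla MixUp step that this work builds on fits in a few lines of numpy; MetaMixUp replaces the fixed Beta(alpha, alpha) sampling below with a meta-learned, data-adaptive interpolation policy (the function name and defaults here are illustrative):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    # vanilla MixUp: convex combination of a pair of samples and their
    # (one-hot) labels, with the weight drawn from Beta(alpha, alpha)
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2, lam
```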


  6. Article: Predicting

    Kim, Taehoon / Chen, Dexiong / Hornauer, Philipp / Emmenegger, Vishalini / Bartram, Julian / Ronchi, Silvia / Hierlemann, Andreas / Schröter, Manuel / Roqueiro, Damian

    Frontiers in neuroinformatics

    2023  Volume 16, Page(s) 1032538

    Abstract Modern Graph Neural Networks (GNNs) provide opportunities to study the determinants underlying the complex activity patterns of biological neuronal networks. In this study, we applied GNNs to a large-scale electrophysiological dataset of rodent primary neuronal networks obtained by means of high-density microelectrode arrays (HD-MEAs). HD-MEAs allow for long-term recording of extracellular spiking activity of individual neurons and networks and enable the extraction of physiologically relevant features at the single-neuron and population level. We employed established GNNs to generate a combined representation of single-neuron and connectivity features obtained from HD-MEA data, with the ultimate goal of predicting changes in single-neuron firing rate induced by a pharmacological perturbation. The aim of the main prediction task was to assess whether single-neuron and functional connectivity features, inferred under baseline conditions, were informative for predicting changes in neuronal activity in response to a perturbation with Bicuculline, a GABA-A receptor antagonist.
    Language English
    Publishing date 2023-01-11
    Publishing country Switzerland
    Document type Journal Article
    ZDB-ID 2452979-5
    ISSN 1662-5196
    DOI 10.3389/fninf.2022.1032538
    Database MEDical Literature Analysis and Retrieval System OnLINE


  7. Book ; Online: Approximate Network Motif Mining Via Graph Learning

    Oliver, Carlos / Chen, Dexiong / Mallet, Vincent / Philippopoulos, Pericles / Borgwardt, Karsten

    2022  

    Abstract Frequent and structurally related subgraphs, also known as network motifs, are valuable features of many graph datasets. However, the high computational complexity of identifying motif sets in arbitrary datasets (motif mining) has limited their use in many real-world datasets. By automatically leveraging statistical properties of datasets, machine learning approaches have shown promise in several tasks with combinatorial complexity and are therefore a promising candidate for network motif mining. In this work we seek to facilitate the development of machine learning approaches aimed at motif mining. We propose a formulation of the motif mining problem as a node labelling task. In addition, we build benchmark datasets and evaluation metrics which test the ability of models to capture different aspects of motif discovery such as motif number, size, topology, and scarcity. Next, we propose MotiFiesta, a first attempt at solving this problem in a fully differentiable manner with promising results on challenging baselines. Finally, we demonstrate through MotiFiesta that this learning setting can be applied simultaneously to general-purpose data mining and interpretable feature extraction for graph classification tasks.
    Keywords Computer Science - Machine Learning ; Statistics - Machine Learning
    Subject code 006 ; 004
    Publishing date 2022-06-02
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  8. Article ; Online: Biological sequence modeling with convolutional kernel networks.

    Chen, Dexiong / Jacob, Laurent / Mairal, Julien

    Bioinformatics (Oxford, England)

    2019  Volume 35, Issue 18, Page(s) 3294–3302

    Abstract Motivation: The growing number of annotated biological sequences available makes it possible to learn genotype-phenotype relationships from data with increasingly high accuracy. When large quantities of labeled samples are available for training a model, convolutional neural networks can be used to predict the phenotype of unannotated sequences with good accuracy. Unfortunately, their performance with medium- or small-scale datasets is mitigated, which requires inventing new data-efficient approaches.
    Results: We introduce a hybrid approach between convolutional neural networks and kernel methods to model biological sequences. Our method enjoys the ability of convolutional neural networks to learn data representations that are adapted to a specific task, while the kernel point of view yields algorithms that perform significantly better when the amount of training data is small. We illustrate these advantages for transcription factor binding prediction and protein homology detection, and we demonstrate that our model is also simple to interpret, which is crucial for discovering predictive motifs in sequences.
    Availability and implementation: Source code is freely available at https://gitlab.inria.fr/dchen/CKN-seq.
    Supplementary information: Supplementary data are available at Bioinformatics online.
    MeSH term(s) Algorithms ; Neural Networks, Computer ; Protein Binding ; Software
    Language English
    Publishing date 2019-02-12
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 1422668-6
    ISSN 1367-4811 ; 1367-4803
    ISSN (online) 1367-4811
    ISSN 1367-4803
    DOI 10.1093/bioinformatics/btz094
    Database MEDical Literature Analysis and Retrieval System OnLINE
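    A heavily simplified, single-layer sketch of the sequence-kernel idea described above, assuming one-hot DNA k-mers and a fixed set of anchor k-mers; the published model normalizes k-mers and learns the anchors by optimization, and all names here are illustrative, not taken from CKN-seq itself.

```python
import numpy as np

ALPHABET = "ACGT"

def kmer_matrix(seq, k):
    # one-hot encode every length-k window of the sequence
    onehot = np.eye(len(ALPHABET))
    X = onehot[[ALPHABET.index(c) for c in seq]]
    return np.stack([X[i:i + k].ravel() for i in range(len(seq) - k + 1)])

def ckn_embedding(seq, anchors, k=3, sigma=0.5):
    # one-layer convolutional kernel feature map: Gaussian similarity
    # between each k-mer and each anchor k-mer, mean-pooled over positions
    Z = kmer_matrix(seq, k)
    d2 = ((Z[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2)).mean(axis=0)
```

    The resulting fixed-length vector can be fed to any linear classifier, which is what makes the kernel view data-efficient on small labeled sets.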


  9. Article ; Online: ESCO2 promotes lung adenocarcinoma progression by regulating hnRNPA1 acetylation.

    Zhu, Hui-Er / Li, Tao / Shi, Shengnan / Chen, De-Xiong / Chen, Weiping / Chen, Hui

    Journal of experimental & clinical cancer research : CR

    2021  Volume 40, Issue 1, Page(s) 64

    Abstract Background: Emerging evidence indicates that metabolism reprogramming and abnormal acetylation modification play an important role in lung adenocarcinoma (LUAD) progression, although the mechanism is largely unknown.
    Methods: Here, we used three public databases (Oncomine, Gene Expression Omnibus [GEO], The Cancer Genome Atlas [TCGA]) to analyze ESCO2 (establishment of cohesion 1 homolog 2) expression in LUAD. The biological function of ESCO2 was studied using cell proliferation, colony formation, cell migration, and invasion assays in vitro, and mouse xenograft models in vivo. ESCO2-interacting proteins were searched using gene set enrichment analysis (GSEA) and mass spectrometry. Pyruvate kinase M1/2 (PKM) mRNA splicing assay was performed using RT-PCR together with restriction digestion. LUAD cell metabolism was studied using glucose uptake assays and lactate production. ESCO2 expression was significantly upregulated in LUAD tissues, and higher ESCO2 expression indicated worse prognosis for patients with LUAD.
    Results: We found that ESCO2 promoted LUAD cell proliferation, metastasis, and metabolic reprogramming in vitro and in vivo. Mechanistically, ESCO2 increased hnRNPA1 (heterogeneous nuclear ribonucleoprotein A1) binding to the intronic sequences flanking exon 9 (EI9) of PKM mRNA by inhibiting hnRNPA1 nuclear translocation, eventually inhibiting PKM1 isoform formation and inducing PKM2 isoform formation.
    Conclusions: Our findings confirm that ESCO2 is a key factor in promoting LUAD malignant progression and suggest that it is a new target for treating LUAD.
    MeSH term(s) Acetylation ; Acetyltransferases/metabolism ; Adenocarcinoma of Lung/genetics ; Adenocarcinoma of Lung/metabolism ; Adenocarcinoma of Lung/pathology ; Animals ; Cell Line, Tumor ; Chromosomal Proteins, Non-Histone/metabolism ; Disease Progression ; HEK293 Cells ; Heterogeneous Nuclear Ribonucleoprotein A1/genetics ; Heterogeneous Nuclear Ribonucleoprotein A1/metabolism ; Humans ; Lung Neoplasms/genetics ; Lung Neoplasms/metabolism ; Lung Neoplasms/pathology ; Male ; Mice ; Mice, Inbred BALB C ; Mice, Inbred NOD ; Middle Aged ; Transfection
    Chemical Substances Chromosomal Proteins, Non-Histone ; Heterogeneous Nuclear Ribonucleoprotein A1 ; hnRNPA1 protein, human ; Acetyltransferases (EC 2.3.1.-) ; ESCO2 protein, human (EC 2.3.1.-)
    Language English
    Publishing date 2021-02-11
    Publishing country England
    Document type Journal Article
    ZDB-ID 803138-1
    ISSN 1756-9966 ; 0392-9078
    ISSN (online) 1756-9966
    ISSN 0392-9078
    DOI 10.1186/s13046-021-01858-1
    Database MEDical Literature Analysis and Retrieval System OnLINE


  10. Book ; Online: Convolutional Kernel Networks for Graph-Structured Data

    Chen, Dexiong / Jacob, Laurent / Mairal, Julien

    2020  

    Abstract We introduce a family of multilayer graph kernels and establish new links between graph convolutional neural networks and kernel methods. Our approach generalizes convolutional kernel networks to graph-structured data, by representing graphs as a sequence of kernel feature maps, where each node carries information about local graph substructures. On the one hand, the kernel point of view offers an unsupervised, expressive, and easy-to-regularize data representation, which is useful when limited samples are available. On the other hand, our model can also be trained end-to-end on large-scale data, leading to new types of graph convolutional neural networks. We show that our method achieves competitive performance on several graph classification benchmarks, while offering simple model interpretation. Our code is freely available at https://github.com/claying/GCKN.
    Keywords Statistics - Machine Learning ; Computer Science - Machine Learning
    Publishing date 2020-03-11
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

