Article ; Online: Unsupervised construction of computational graphs for gene expression data with explicit structural inductive biases.
Bioinformatics (Oxford, England)
2021
Abstract | Motivation: Gene expression data is commonly used at the intersection of cancer research and machine learning for better understanding of the molecular status of tumour tissue. Deep learning predictive models have been employed for gene expression data due to their ability to scale and remove the need for manual feature engineering. However, gene expression data is often very high dimensional, noisy, and presented with a low number of samples. This poses significant problems for learning algorithms: models often overfit, learn noise, and struggle to capture biologically relevant information. In this article we utilise external biological knowledge embedded within structures of gene interaction graphs such as protein-protein interaction (PPI) networks to guide the construction of predictive models. Results: We present GINCCo (Gene Interaction Network Constrained Construction), an unsupervised method for automated construction of computational graph models for gene expression data that are structurally constrained by prior knowledge of gene interaction networks. We employ this methodology in a case study on incorporating a PPI network in cancer phenotype prediction tasks. Our computational graphs are structurally constructed using topological clustering algorithms on the PPI networks which incorporate inductive biases stemming from network biology research on protein complex discovery. Each entity in the GINCCo computational graph represents a biological entity such as a gene, candidate protein complex, or phenotype instead of an arbitrary hidden node of a neural network. This provides a biologically relevant mechanism for model regularisation, yielding strong predictive performance whilst drastically reducing the number of model parameters and enabling guided post-hoc enrichment analyses of influential gene sets with respect to target phenotypes. Our experiments analysing a variety of cancer phenotypes show that GINCCo often outperforms SVMs, Fully-Connected MLPs, and Randomly-Connected MLPs despite greatly reduced model complexity. Availability: https://github.com/paulmorio/gincco contains the source code for our approach. We also release a library with algorithms for protein complex discovery within protein-protein interaction networks at https://github.com/paulmorio/protclus. This repository contains implementations of the clustering algorithms used in this paper. |
---|---|
Language | English |
Publishing date | 2021-12-09 |
Publishing country | England |
Document type | Journal Article |
ZDB-ID | 1422668-6 |
ISSN (online) | 1367-4811 |
ISSN (print) | 1367-4803 |
DOI | 10.1093/bioinformatics/btab830 |
Database | MEDical Literature Analysis and Retrieval System OnLINE |
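
The abstract describes a computational graph in which genes connect only to the candidate protein complexes they belong to. The following is a minimal sketch of that idea as a masked gene-to-complex layer. It is not the authors' implementation: the gene names, cluster assignments, all-ones weights, and the helper functions `build_mask` and `masked_forward` are hypothetical and chosen only for illustration.

```python
# Hypothetical toy data: gene names and candidate complexes (clusters) are
# invented for illustration, not taken from the paper or its repositories.
genes = ["G1", "G2", "G3", "G4", "G5"]
clusters = {
    "complex_A": ["G1", "G2", "G3"],
    "complex_B": ["G3", "G4"],  # a gene may sit in more than one cluster
}


def build_mask(genes, clusters):
    """Return a genes x clusters 0/1 matrix: 1 where a gene is in a cluster."""
    names = sorted(clusters)
    return [[1 if g in clusters[c] else 0 for c in names] for g in genes]


def masked_forward(x, weights, mask):
    """One sparse layer: output_j = sum_i x_i * w_ij * mask_ij."""
    n_clusters = len(mask[0])
    return [
        sum(x[i] * weights[i][j] * mask[i][j] for i in range(len(x)))
        for j in range(n_clusters)
    ]


mask = build_mask(genes, clusters)
n_dense = len(genes) * len(clusters)      # weights in a fully connected layer
n_sparse = sum(sum(row) for row in mask)  # weights that survive the mask

# All-ones weights make the output easy to check by hand.
weights = [[1.0] * len(clusters) for _ in genes]
expression = [0.5, 1.0, 2.0, 4.0, 8.0]
activations = masked_forward(expression, weights, mask)
```

In this toy case the mask keeps 5 of the 10 possible gene-to-complex weights, which illustrates how constraining connectivity by cluster membership can reduce parameter count. A gene outside every cluster (here `G5`) simply contributes nothing in this sketch; how such genes are treated in GINCCo itself is not stated in the abstract.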