LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 91

  1. Article ; Online: Causal inference in drug discovery and development.

    Michoel, Tom / Zhang, Jitao David

    Drug discovery today

    2023  Volume 28, Issue 10, Page(s) 103737

    Abstract To discover new drugs is to seek and to prove causality. As an emerging approach leveraging human knowledge and creativity, data, and machine intelligence, causal inference holds the promise of reducing cognitive bias and improving decision-making in drug discovery. Although it has been applied across the value chain, the concepts and practice of causal inference remain obscure to many practitioners. This article offers a nontechnical introduction to causal inference, reviews its recent applications, and discusses opportunities and challenges of adopting the causal language in drug discovery and development.
    MeSH term(s) Humans ; Bias ; Causality ; Knowledge ; Drug Discovery
    Language English
    Publishing date 2023-08-15
    Publishing country England
    Document type Journal Article ; Review ; Research Support, Non-U.S. Gov't
    ZDB-ID 1324988-5
    ISSN 1878-5832 ; 1359-6446
    ISSN (online) 1878-5832
    ISSN 1359-6446
    DOI 10.1016/j.drudis.2023.103737
    Database MEDical Literature Analysis and Retrieval System OnLINE

  2. Article ; Online: Integration of graph neural networks and genome-scale metabolic models for predicting gene essentiality.

    Hasibi, Ramin / Michoel, Tom / Oyarzún, Diego A

    NPJ systems biology and applications

    2024  Volume 10, Issue 1, Page(s) 24

    Abstract Genome-scale metabolic models are powerful tools for understanding cellular physiology. Flux balance analysis (FBA), in particular, is an optimization-based approach widely employed for predicting metabolic phenotypes. In model microbes such as Escherichia coli, FBA has been successful at predicting essential genes, i.e. those genes that impair survival when deleted. A central assumption in this approach is that both wild type and deletion strains optimize the same fitness objective. Although the optimality assumption may hold for the wild type metabolic network, deletion strains are not subject to the same evolutionary pressures and knock-out mutants may steer their metabolism to meet other objectives for survival. Here, we present FlowGAT, a hybrid FBA-machine learning strategy for predicting essentiality directly from wild type metabolic phenotypes. The approach is based on graph-structured representation of metabolic fluxes predicted by FBA, where nodes correspond to enzymatic reactions and edges quantify the propagation of metabolite mass flow between a reaction and its neighbours. We integrate this information into a graph neural network that can be trained on knock-out fitness assay data. Comparisons across different model architectures reveal that FlowGAT predictions for E. coli are close to those of FBA for several growth conditions. This suggests that essentiality of enzymatic genes can be predicted by exploiting the inherent network structure of metabolism. Our approach demonstrates the benefits of combining the mechanistic insights afforded by genome-scale models with the ability of deep learning to infer patterns from complex datasets.
    MeSH term(s) Escherichia coli/genetics ; Machine Learning ; Neural Networks, Computer ; Phenotype
    Language English
    Publishing date 2024-03-06
    Publishing country England
    Document type Journal Article
    ISSN 2056-7189
    ISSN (online) 2056-7189
    DOI 10.1038/s41540-024-00348-2
    Database MEDical Literature Analysis and Retrieval System OnLINE
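
The FlowGAT abstract describes a graph in which nodes are enzymatic reactions and edges quantify metabolite mass flow between a reaction and its neighbours. A minimal sketch of such edges follows. It assumes a simple mass-flow weighting, not necessarily FlowGAT's exact definition: the flow of a metabolite from producer i to consumer j is i's production times j's consumption, divided by the metabolite's total production. The function name `mass_flow_graph`, the dictionary input format and the toy pathway are hypothetical. In practice the fluxes would come from an FBA solution computed elsewhere.

```python
from collections import defaultdict


def mass_flow_graph(S, v):
    """Weighted reaction-to-reaction edges from a stoichiometry and a flux vector.

    S: {metabolite: {reaction: stoichiometric coefficient}} (hypothetical format)
    v: {reaction: flux}; a negative flux is treated as the reaction running in reverse.
    """
    edges = defaultdict(float)
    for coeffs in S.values():
        # Amount of this metabolite each reaction produces or consumes at flux v.
        produced = {r: max(c * v[r], 0.0) for r, c in coeffs.items()}
        consumed = {r: max(-c * v[r], 0.0) for r, c in coeffs.items()}
        total = sum(produced.values())
        if total == 0.0:
            continue  # metabolite carries no flux
        # Split each producer's output across consumers in proportion to their uptake.
        for i, p in produced.items():
            for j, u in consumed.items():
                if p > 0.0 and u > 0.0:
                    edges[(i, j)] += p * u / total
    return dict(edges)


# Toy network: R1 makes A, R2 converts A to B, R3 consumes B, R4 also consumes A.
S = {
    "A": {"R1": 1, "R2": -1, "R4": -1},
    "B": {"R2": 1, "R3": -1},
}
v = {"R1": 2.0, "R2": 1.5, "R3": 1.5, "R4": 0.5}
G = mass_flow_graph(S, v)
```

Here R1's output of A is split 3:1 between R2 and R4, matching their uptake fluxes.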

  3. Article: eQTLs as causal instruments for the reconstruction of hormone linked gene networks.

    Bankier, Sean / Michoel, Tom

    Frontiers in endocrinology

    2022  Volume 13, Page(s) 949061

    Abstract Hormones act within highly dynamic systems and much of the phenotypic response to variation in hormone levels is mediated by changes in gene expression. The increase in the number and power of large genetic association studies has led to the identification of hormone linked genetic variants. However, the biological mechanisms underpinning the majority of these loci are poorly understood. The advent of affordable, high throughput next generation sequencing and readily available transcriptomic databases has shown that many of these genetic variants also associate with variation in gene expression levels as expression Quantitative Trait Loci (eQTLs). In addition to further dissecting complex genetic variation, eQTLs have been applied as tools for causal inference. Many hormone networks are driven by transcription factors, and many of these genes can be linked to eQTLs. In this mini-review, we demonstrate how causal inference and gene networks can be used to describe the impact of hormone linked genetic variation upon the transcriptome within an endocrinology context.
    MeSH term(s) Gene Regulatory Networks ; Hormones ; Polymorphism, Single Nucleotide ; Quantitative Trait Loci ; Transcriptome
    Chemical Substances Hormones
    Language English
    Publishing date 2022-08-17
    Publishing country Switzerland
    Document type Journal Article ; Review ; Research Support, Non-U.S. Gov't
    ZDB-ID 2592084-4
    ISSN 1664-2392
    DOI 10.3389/fendo.2022.949061
    Database MEDical Literature Analysis and Retrieval System OnLINE
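
The review describes eQTLs as causal instruments. The simulation below is a minimal sketch of that idea, using the simplest single-instrument ratio estimator rather than a method taken from the article. A genotype G shifts a gene's expression X, and an unobserved confounder U affects both X and a downstream trait Y. The sketch compares the naive regression slope with the instrumental-variable (Wald) ratio cov(G, Y) / cov(G, X). All effect sizes, the allele frequency and the helper names are hypothetical.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def simulate(n, effect, seed=1):
    """Hypothetical eQTL -> gene expression -> trait model with a hidden confounder."""
    rng = random.Random(seed)
    G, X, Y = [], [], []
    for _ in range(n):
        g = sum(rng.random() < 0.3 for _ in range(2))  # genotype 0/1/2, allele frequency 0.3
        u = rng.gauss(0.0, 1.0)                         # unobserved confounder
        x = 0.5 * g + u + rng.gauss(0.0, 1.0)           # expression of the instrumented gene
        y = effect * x + u + rng.gauss(0.0, 1.0)        # downstream trait
        G.append(g)
        X.append(x)
        Y.append(y)
    return G, X, Y


G, X, Y = simulate(20000, effect=0.8)
naive_slope = cov(X, Y) / cov(X, X)  # biased upwards by the confounder
wald_ratio = cov(G, Y) / cov(G, X)   # instrumental-variable estimate of the effect
```

With these settings the naive slope is inflated by U, while the ratio should land near the simulated effect of 0.8. The ratio is only valid if the genotype affects Y through X alone.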

  4. Article ; Online: Restricted maximum-likelihood method for learning latent variance components in gene expression data with known and unknown confounders.

    Malik, Muhammad Ammar / Michoel, Tom

    G3 (Bethesda, Md.)

    2021  Volume 12, Issue 2

    Abstract Random effects models are popular statistical models for detecting and correcting spurious sample correlations due to hidden confounders in genome-wide gene expression data. In applications where some confounding factors are known, estimating simultaneously the contribution of known and latent variance components in random effects models is a challenge that has so far relied on numerical gradient-based optimizers to maximize the likelihood function. This is unsatisfactory because the resulting solution is poorly characterized and the efficiency of the method may be suboptimal. Here, we prove analytically that maximum-likelihood latent variables can always be chosen orthogonal to the known confounding factors, in other words, that maximum-likelihood latent variables explain sample covariances not already explained by known factors. Based on this result, we propose a restricted maximum-likelihood (REML) method that estimates the latent variables by maximizing the likelihood on the restricted subspace orthogonal to the known confounding factors and show that this reduces to probabilistic principal component analysis on that subspace. The method then estimates the variance-covariance parameters by maximizing the remaining terms in the likelihood function given the latent variables, using a newly derived analytic solution for this problem. Compared to gradient-based optimizers, our method attains greater or equal likelihood values, can be computed using standard matrix operations, results in latent factors that do not overlap with any known factors, and has a runtime reduced by several orders of magnitude. Hence, the REML method facilitates the application of random effects modeling strategies for learning latent variance components to much larger gene expression datasets than possible with current methods.
    MeSH term(s) Gene Expression ; Genome ; Likelihood Functions ; Models, Statistical
    Language English
    Publishing date 2021-12-05
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 2629978-1
    ISSN 2160-1836 ; 2160-1836
    ISSN (online) 2160-1836
    ISSN 2160-1836
    DOI 10.1093/g3journal/jkab410
    Database MEDical Literature Analysis and Retrieval System OnLINE
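
The numpy sketch below covers only the first step described in the abstract, under stated assumptions. Y is a samples-by-genes expression matrix with centred columns. Z holds the known confounders and is assumed to have full column rank. Probabilistic PCA is applied to the sample covariance after projecting out the columns of Z. The function names are hypothetical. The second step, analytic estimation of the variance-covariance parameters, is not reproduced.

```python
import numpy as np


def orthogonal_projector(Z):
    """Projector onto the subspace orthogonal to the columns of Z."""
    Q, _ = np.linalg.qr(Z)
    return np.eye(Z.shape[0]) - Q @ Q.T


def latent_factors(Y, Z, q):
    """Probabilistic PCA restricted to the complement of the known confounders.

    Y: n_samples x n_genes (columns centred); Z: n_samples x d known factors.
    Returns q latent factors (n_samples x q) and the residual variance sigma2.
    """
    n, m = Y.shape
    d = Z.shape[1]
    Yp = orthogonal_projector(Z) @ Y
    C = Yp @ Yp.T / m                      # sample covariance, zero along the columns of Z
    evals, evecs = np.linalg.eigh(C)       # eigh returns ascending eigenvalues
    evals, evecs = evals[::-1], evecs[:, ::-1]
    r = n - d                              # dimension of the restricted subspace
    sigma2 = evals[q:r].mean()             # PPCA noise: mean of the discarded eigenvalues
    scale = np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return evecs[:, :q] * scale, sigma2


rng = np.random.default_rng(0)
n, m, d, q = 30, 200, 2, 3
Z = rng.normal(size=(n, d))
Y = rng.normal(size=(n, m))
Y -= Y.mean(axis=0)
X, sigma2 = latent_factors(Y, Z, q)
```

The product `Z.T @ X` should be numerically zero, mirroring the abstract's statement that the latent factors do not overlap with the known ones.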

  5. Article ; Online: Comparison between instrumental variable and mediation-based methods for reconstructing causal gene networks in yeast.

    Ludl, Adriaan-Alexander / Michoel, Tom

    Molecular omics

    2021  Volume 17, Issue 2, Page(s) 241–251

    Abstract Causal gene networks model the flow of information within a cell. Reconstructing causal networks from omics data is challenging because correlation does not imply causation. When genomics and transcriptomics data from a segregating population are combined, genomic variants can be used to orient the direction of causality between gene expression traits. Instrumental variable methods use a local expression quantitative trait locus (eQTL) as a randomized instrument for a gene's expression level, and assign target genes based on distal eQTL associations. Mediation-based methods additionally require that distal eQTL associations are mediated by the source gene. A detailed comparison between these methods has not yet been conducted, due to the lack of a standardized implementation of different methods, the limited sample size of most multi-omics datasets, and the absence of ground-truth networks for most organisms. Here we used Findr, a software package providing uniform implementations of instrumental variable, mediation, and coexpression-based methods, a recent dataset of 1012 segregants from a cross between two budding yeast strains, and the Yeastract database of known transcriptional interactions to compare causal gene network inference methods. We found that causal inference methods result in a significant overlap with the ground-truth, whereas coexpression did not perform better than random. A subsampling analysis revealed that the performance of mediation saturates at large sample sizes, due to a loss of sensitivity when residual correlations become significant. Instrumental variable methods, on the other hand, contain false positive predictions, due to genomic linkage between eQTL instruments. Instrumental variable and mediation-based methods also have complementary roles for identifying causal genes underlying transcriptional hotspots. Instrumental variable methods correctly predicted STB5 targets for a hotspot centred on the transcription factor STB5, whereas mediation failed due to Stb5p auto-regulating its own expression. Mediation suggests a new candidate gene, DNM1, for a hotspot on Chr XII, whereas instrumental variable methods could not distinguish between multiple genes located within the hotspot. In conclusion, causal inference from genomics and transcriptomics data is a powerful approach for reconstructing causal gene networks, which could be further improved by the development of methods to control for residual correlations in mediation analyses, and for genomic linkage and pleiotropic effects from transcriptional hotspots in instrumental variable analyses.
    MeSH term(s) Computational Biology ; Databases, Genetic ; Gene Expression Regulation, Fungal/genetics ; Gene Regulatory Networks/genetics ; Genetic Variation ; Genome, Fungal/genetics ; Genomics ; Models, Genetic ; Quantitative Trait Loci/genetics ; Saccharomyces cerevisiae/genetics ; Saccharomyces cerevisiae Proteins/genetics ; Transcription Factors/genetics
    Chemical Substances Saccharomyces cerevisiae Proteins ; Stb5 protein, S cerevisiae ; Transcription Factors
    Language English
    Publishing date 2021-01-13
    Publishing country England
    Document type Journal Article
    ISSN 2515-4184
    ISSN (online) 2515-4184
    DOI 10.1039/d0mo00140f
    Database MEDical Literature Analysis and Retrieval System OnLINE
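
The mediation criterion in this abstract asks whether the association between a distal eQTL E and a target gene B disappears once the source gene A is accounted for. A minimal sketch follows. It assumes linear-Gaussian relationships, so that conditional independence reduces to a vanishing partial correlation. Findr's actual test statistics are not reproduced here, and the simulation settings and helper names are hypothetical.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def partial_corr(e, b, a):
    """Correlation of e and b after linearly removing a from both."""
    r_eb, r_ea, r_ab = corr(e, b), corr(e, a), corr(a, b)
    return (r_eb - r_ea * r_ab) / math.sqrt((1 - r_ea ** 2) * (1 - r_ab ** 2))


def simulate(n, direct, seed=2):
    """Hypothetical eQTL E -> source gene A -> target gene B, plus an optional E -> B path."""
    rng = random.Random(seed)
    E, A, B = [], [], []
    for _ in range(n):
        e = sum(rng.random() < 0.3 for _ in range(2))  # genotype 0/1/2
        a = e + rng.gauss(0.0, 1.0)
        b = 0.5 * a + direct * e + rng.gauss(0.0, 1.0)
        E.append(e)
        A.append(a)
        B.append(b)
    return E, A, B


E1, A1, B1 = simulate(20000, direct=0.0)   # E affects B only through A
pc_mediated = partial_corr(E1, B1, A1)
E2, A2, B2 = simulate(20000, direct=0.5)   # E also affects B directly (pleiotropy)
pc_direct = partial_corr(E2, B2, A2)
```

Only the first scenario passes the mediation check. The second behaves like a pleiotropic eQTL whose effect on B is not fully carried by A.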

  6. Article ; Online: A Graph Feature Auto-Encoder for the prediction of unobserved node features on biological networks.

    Hasibi, Ramin / Michoel, Tom

    BMC bioinformatics

    2021  Volume 22, Issue 1, Page(s) 525

    Abstract Background: Molecular interaction networks summarize complex biological processes as graphs, whose structure is informative of biological function at multiple scales. Simultaneously, omics technologies measure the variation or activity of genes, proteins, or metabolites across individuals or experimental conditions. Integrating the complementary viewpoints of biological networks and omics data is an important task in bioinformatics, but existing methods treat networks as discrete structures, which are intrinsically difficult to integrate with continuous node features or activity measures. Graph neural networks map graph nodes into a low-dimensional vector space representation, and can be trained to preserve both the local graph structure and the similarity between node features.
    Results: We studied the representation of transcriptional, protein-protein and genetic interaction networks in E. coli and mouse using graph neural networks. We found that such representations explain a large proportion of variation in gene expression data, and that using gene expression data as node features improves the reconstruction of the graph from the embedding. We further proposed a new end-to-end Graph Feature Auto-Encoder framework for the prediction of node features utilizing the structure of the gene networks, which is trained on the feature prediction task, and showed that it performs better at predicting unobserved node features than regular MultiLayer Perceptrons. When applied to the problem of imputing missing data in single-cell RNAseq data, the Graph Feature Auto-Encoder utilizing our new graph convolution layer called FeatGraphConv outperformed a state-of-the-art imputation method that does not use protein interaction information, showing the benefit of integrating biological networks and omics data with our proposed approach.
    Conclusion: Our proposed Graph Feature Auto-Encoder framework is a powerful approach for integrating and exploiting the close relation between molecular interaction networks and functional genomics data.
    MeSH term(s) Animals ; Computational Biology ; Escherichia coli ; Gene Regulatory Networks ; Mice ; Neural Networks, Computer ; Proteins
    Chemical Substances Proteins
    Language English
    Publishing date 2021-10-27
    Publishing country England
    Document type Journal Article
    ZDB-ID 2041484-5
    ISSN 1471-2105 ; 1471-2105
    ISSN (online) 1471-2105
    ISSN 1471-2105
    DOI 10.1186/s12859-021-04447-3
    Database MEDical Literature Analysis and Retrieval System OnLINE
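
The abstract does not define the FeatGraphConv layer. The sketch below therefore uses the standard graph convolutional (GCN) propagation rule, H = ReLU(D^-1/2 (A + I) D^-1/2 X W), as a generic illustration of how a graph neural network layer mixes each node's features with those of its neighbours. The weight matrix W would normally be learned; here it is fixed, and the function names are hypothetical.

```python
import numpy as np


def normalized_adjacency(A):
    """Symmetrically normalised adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(A, X, W):
    """One graph convolution: aggregate neighbour features, apply weights, then ReLU."""
    return np.maximum(normalized_adjacency(A) @ X @ W, 0.0)


# Path graph 0 - 1 - 2 with a single feature that is non-zero only on node 0.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.array([[1.0], [0.0], [0.0]])
W = np.array([[1.0]])
H = gcn_layer(A, X, W)
```

Node 1 starts with a zero feature but receives a non-zero value from its neighbour. This neighbourhood smoothing lets network structure inform predictions of unobserved node features.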

  7. Article ; Online: Prediction of causal genes at GWAS loci with pleiotropic gene regulatory effects using sets of correlated instrumental variables.

    Khan, Mariyam / Ludl, Adriaan / Bankier, Sean / Björkegren, Johan Lm / Michoel, Tom

    ArXiv

    2024  

    Abstract Multivariate Mendelian randomization (MVMR) is a statistical technique that uses sets of genetic instruments to estimate the direct causal effects of multiple exposures on an outcome of interest. At genomic loci with pleiotropic gene regulatory effects, that is, loci where the same genetic variants are associated with multiple nearby genes, MVMR can potentially be used to predict candidate causal genes. However, consensus in the field dictates that the genetic instruments in MVMR must be independent (not in linkage disequilibrium), which is usually not possible when considering a group of candidate genes from the same locus. Here we used causal inference theory to show that MVMR with correlated instruments satisfies the instrumental set condition. This is a classical result by Brito and Pearl (2002) for structural equation models that guarantees the identifiability of individual causal effects in situations where multiple exposures collectively, but not individually, separate a set of instrumental variables from an outcome variable. Extensive simulations confirmed the validity and usefulness of these theoretical results even at modest sample sizes.
    Language English
    Publishing date 2024-01-11
    Publishing country United States
    Document type Preprint
    ISSN 2331-8422
    ISSN (online) 2331-8422
    Database MEDical Literature Analysis and Retrieval System OnLINE
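
The sketch below illustrates multivariable instrumental-variable estimation with correlated instruments, using two-stage least squares on simulated individual-level data. The setup is a hypothetical illustration. Two instruments have correlation 0.8, as between variants in linkage disequilibrium. Two exposures (candidate genes) are each affected by both instruments, and they share a hidden confounder with the outcome, which is caused by the first gene only. The estimator and the function name are assumptions; the article's own analyses may use a different MVMR estimator.

```python
import numpy as np


def two_stage_least_squares(Z, X, y):
    """Joint IV estimate of the direct effects of all columns of X on y."""
    # Stage 1: predict every exposure from the full set of (correlated) instruments.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Stage 2: regress the outcome on all predicted exposures at once.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


rng = np.random.default_rng(0)
n = 50000
# Two instruments in strong linkage disequilibrium (correlation 0.8).
Z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=n)
# Both instruments affect both genes (regulatory pleiotropy); Pi must have full rank.
Pi = np.array([[1.0, 0.3], [0.5, 1.0]])
U = rng.normal(size=n)                               # hidden confounder
X = Z @ Pi + U[:, None] + rng.normal(size=(n, 2))    # expression of two candidate genes
y = 1.0 * X[:, 0] + U + rng.normal(size=n)           # outcome caused by gene 1 only
beta = two_stage_least_squares(Z, X, y)
```

In this toy setting both direct effects (1 for the first gene, 0 for the second) should be recovered even though the instruments are strongly correlated.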

  8. Article ; Online: Use of big data and machine learning algorithms to extract possible treatment targets in neurodevelopmental disorders.

    Malik, Muhammad Ammar / Faraone, Stephen V / Michoel, Tom / Haavik, Jan

    Pharmacology & therapeutics

    2023  Volume 250, Page(s) 108530

    Abstract Neurodevelopmental disorders (NDDs) impact multiple aspects of an individual's functioning, including social interactions, communication, and behaviors. The underlying biological mechanisms of NDDs are not yet fully understood, and pharmacological treatments have been limited in their effectiveness, in part due to the complex nature of these disorders and the heterogeneity of symptoms across individuals. Identifying genetic loci associated with NDDs can help in understanding biological mechanisms and potentially lead to the development of new treatments. However, the polygenic nature of these complex disorders has made identifying new treatment targets from genome-wide association studies (GWAS) challenging. Recent advances in the fields of big data and high-throughput tools have provided radically new insights into the underlying biological mechanism of NDDs. This paper reviews various big data approaches, including classical and more recent techniques like deep learning, which can identify potential treatment targets from GWAS and other omics data, with a particular emphasis on NDDs. We also emphasize the increasing importance of explainable and causal machine learning (ML) methods that can aid in identifying genes, molecular pathways, and more complex biological processes that may be future targets of intervention in these disorders. We conclude that these new developments in genetics and ML hold promise for advancing our understanding of NDDs and identifying novel treatment targets.
    MeSH term(s) Humans ; Genome-Wide Association Study ; Big Data ; Neurodevelopmental Disorders/drug therapy ; Neurodevelopmental Disorders/genetics ; Algorithms ; Machine Learning
    Language English
    Publishing date 2023-09-12
    Publishing country England
    Document type Journal Article ; Review ; Research Support, Non-U.S. Gov't
    ZDB-ID 194735-7
    ISSN 1879-016X ; 0163-7258
    ISSN (online) 1879-016X
    ISSN 0163-7258
    DOI 10.1016/j.pharmthera.2023.108530
    Database MEDical Literature Analysis and Retrieval System OnLINE

  9. Book ; Online: Prediction of causal genes at GWAS loci with pleiotropic gene regulatory effects using sets of correlated instrumental variables

    Khan, Mariyam / Ludl, Adriaan / Bankier, Sean / Björkegren, Johan / Michoel, Tom

    2024  

    Abstract Multivariate Mendelian randomization (MVMR) is a statistical technique that uses sets of genetic instruments to estimate the direct causal effects of multiple exposures on an outcome of interest. At genomic loci with pleiotropic gene regulatory effects, that is, loci where the same genetic variants are associated with multiple nearby genes, MVMR can potentially be used to predict candidate causal genes. However, consensus in the field dictates that the genetic instruments in MVMR must be independent, which is usually not possible when considering a group of candidate genes from the same locus. We used causal inference theory to show that MVMR with correlated instruments satisfies the instrumental set condition. This is a classical result by Brito and Pearl (2002) for structural equation models that guarantees the identifiability of causal effects in situations where multiple exposures collectively, but not individually, separate a set of instrumental variables from an outcome variable. Extensive simulations confirmed the validity and usefulness of these theoretical results even at modest sample sizes. Importantly, the causal effect estimates remain unbiased and their variance small when instruments are highly correlated. We applied MVMR with correlated instrumental variable sets at risk loci from genome-wide association studies (GWAS) for coronary artery disease using eQTL data from the STARNET study. Our method predicts causal genes at twelve loci, each associated with multiple colocated genes in multiple tissues. However, the extensive degree of regulatory pleiotropy across tissues and the limited number of causal variants in each locus still require that MVMR is run on a tissue-by-tissue basis, and testing all gene-tissue pairs at a given locus in a single model to predict causal gene-tissue combinations remains infeasible.

    Comment: 26 pages, 5 figures, 3 supplementary figures. Code available at https://github.com/mariyam-khan/Causal_genes_GWAS_loci_CAD . Supporting data available at ...
    Keywords Statistics - Methodology ; Quantitative Biology - Quantitative Methods
    Subject code 310
    Publishing date 2024-01-11
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  10. Book ; Online: Analytic solution and stationary phase approximation for the Bayesian lasso and elastic net

    Michoel, Tom

    2017  

    Abstract The lasso and elastic net linear regression models impose a double-exponential prior distribution on the model parameters to achieve regression shrinkage and variable selection, allowing the inference of robust models from large data sets. However, there has been limited success in deriving estimates for the full posterior distribution of regression coefficients in these models, due to a need to evaluate analytically intractable partition function integrals. Here, the Fourier transform is used to express these integrals as complex-valued oscillatory integrals over "regression frequencies". This results in an analytic expansion and stationary phase approximation for the partition functions of the Bayesian lasso and elastic net, where the non-differentiability of the double-exponential prior has so far eluded such an approach. Use of this approximation leads to highly accurate numerical estimates for the expectation values and marginal posterior distributions of the regression coefficients, and allows for Bayesian inference of much higher dimensional models than previously possible.

    Comment: Switched to new NeurIPS style file; 11 pages, 3 figures + appendices 29 pages, 3 supplementary figures
    Keywords Statistics - Methodology ; Computer Science - Machine Learning ; Mathematics - Statistics Theory ; Quantitative Biology - Quantitative Methods
    Subject code 310
    Publishing date 2017-09-25
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
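
Consider a single regression coefficient with known noise variance sigma2. Its Bayesian lasso posterior is proportional to exp(-a*beta^2/2 + b*beta - lam*|beta|), with a = sum(x^2)/sigma2 and b = sum(x*y)/sigma2, and its partition function can simply be evaluated on a grid. The sketch below does this by brute-force trapezoidal quadrature. It is not the Fourier transform and stationary phase method of the article. It does show why such a method matters: a grid with k points per coordinate needs k^p evaluations in p dimensions. The function name and the known-variance simplification are assumptions.

```python
import math


def lasso_posterior_mean(a, b, lam, num=20001):
    """Posterior mean of beta for log p(beta) = -a*beta**2/2 + b*beta - lam*|beta| + const.

    a = sum(x**2)/sigma2 and b = sum(x*y)/sigma2 for one predictor with known noise
    variance sigma2 (a simplifying assumption); lam is the double-exponential prior rate.
    """
    centre = b / a                      # least-squares estimate
    half = 10.0 / math.sqrt(a)          # ten likelihood standard deviations
    lo, hi = min(0.0, centre) - half, max(0.0, centre) + half
    h = (hi - lo) / (num - 1)
    grid = [lo + k * h for k in range(num)]
    logp = [-0.5 * a * t * t + b * t - lam * abs(t) for t in grid]
    top = max(logp)                     # subtract the maximum to avoid overflow
    w = [math.exp(lp - top) for lp in logp]
    w[0] *= 0.5                         # trapezoidal end-point weights
    w[-1] *= 0.5
    partition = sum(w)                  # partition function up to a constant factor
    return sum(wi * t for wi, t in zip(w, grid)) / partition


ols = lasso_posterior_mean(4.0, 4.0, 0.0)     # no shrinkage: equals b / a
shrunk = lasso_posterior_mean(4.0, 4.0, 2.0)  # pulled towards zero by the prior
```

With lam = 0 the posterior is Gaussian and its mean equals the least-squares estimate b / a. With lam > 0 the mean lies between zero and that estimate.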
