LIVIVO - The Search Portal for Life Sciences


Search results

Results 1-6 of 6

  1. Article ; Online: Restricted maximum-likelihood method for learning latent variance components in gene expression data with known and unknown confounders.

    Malik, Muhammad Ammar / Michoel, Tom

    G3 (Bethesda, Md.)

    2021  Volume 12, Issue 2

    Abstract Random effects models are popular statistical models for detecting and correcting spurious sample correlations due to hidden confounders in genome-wide gene expression data. In applications where some confounding factors are known, estimating simultaneously the contribution of known and latent variance components in random effects models is a challenge that has so far relied on numerical gradient-based optimizers to maximize the likelihood function. This is unsatisfactory because the resulting solution is poorly characterized and the efficiency of the method may be suboptimal. Here, we prove analytically that maximum-likelihood latent variables can always be chosen orthogonal to the known confounding factors, in other words, that maximum-likelihood latent variables explain sample covariances not already explained by known factors. Based on this result, we propose a restricted maximum-likelihood (REML) method that estimates the latent variables by maximizing the likelihood on the restricted subspace orthogonal to the known confounding factors and show that this reduces to probabilistic principal component analysis on that subspace. The method then estimates the variance-covariance parameters by maximizing the remaining terms in the likelihood function given the latent variables, using a newly derived analytic solution for this problem. Compared to gradient-based optimizers, our method attains greater or equal likelihood values, can be computed using standard matrix operations, results in latent factors that do not overlap with any known factors, and has a runtime reduced by several orders of magnitude. Hence, the REML method facilitates the application of random effects modeling strategies for learning latent variance components to much larger gene expression datasets than possible with current methods.
    MeSH term(s) Gene Expression ; Genome ; Likelihood Functions ; Models, Statistical
    Language English
    Publishing date 2021-12-05
    Publishing country England
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 2629978-1
    ISSN (online) 2160-1836
    DOI 10.1093/g3journal/jkab410
    Database MEDical Literature Analysis and Retrieval System OnLINE
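
The first stage of the REML method described in this abstract (restrict the data to the subspace orthogonal to the known confounders, then run probabilistic PCA there) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the data layout (`Y` as samples x genes, column-centered; `Z` as samples x known factors with full column rank), the hypothetical helper names `project_out` and `ppca_on_subspace`, and the use of the standard closed-form PPCA solution of Tipping and Bishop are all assumptions. The paper's second stage (the analytic solution for the variance-covariance parameters given the latent variables) is not reproduced here.

```python
import numpy as np

def project_out(Z):
    """Projector onto the orthogonal complement of the column space of Z."""
    Q, _ = np.linalg.qr(Z)                 # orthonormal basis for span(Z), shape n x d
    return np.eye(Z.shape[0]) - Q @ Q.T

def ppca_on_subspace(Y, Z, k):
    """Hypothetical helper: closed-form PPCA restricted to the subspace orthogonal to Z.

    Y: n samples x m genes (assumed column-centered).
    Z: n x d known confounders (assumed full column rank).
    k: number of latent factors (assumed k < n - d).
    Returns latent factors X (n x k, orthogonal to Z) and the noise variance sigma2.
    """
    n, m = Y.shape
    P = project_out(Z)
    C = P @ (Y @ Y.T) @ P / m              # n x n sample covariance on the restricted subspace
    evals, evecs = np.linalg.eigh(C)       # eigh returns eigenvalues in ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]
    r = n - Z.shape[1]                     # dimension of the restricted subspace
    sigma2 = evals[k:r].mean()             # PPCA noise variance: mean of discarded eigenvalues
    # Scale the top-k eigenvectors by sqrt(lambda_i - sigma2), clipping at zero
    X = evecs[:, :k] * np.sqrt(np.maximum(evals[:k] - sigma2, 0.0))
    return X, sigma2
```

Because every column of `X` lies in the range of the projector `P`, `Z.T @ X` is zero up to rounding error, which is the orthogonality property between latent and known factors that the abstract states.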

  2. Article ; Online: Use of big data and machine learning algorithms to extract possible treatment targets in neurodevelopmental disorders.

    Malik, Muhammad Ammar / Faraone, Stephen V / Michoel, Tom / Haavik, Jan

    Pharmacology & therapeutics

    2023  Volume 250, Page(s) 108530

    Abstract Neurodevelopmental disorders (NDDs) impact multiple aspects of an individual's functioning, including social interactions, communication, and behaviors. The underlying biological mechanisms of NDDs are not yet fully understood, and pharmacological treatments have been limited in their effectiveness, in part due to the complex nature of these disorders and the heterogeneity of symptoms across individuals. Identifying genetic loci associated with NDDs can help in understanding biological mechanisms and potentially lead to the development of new treatments. However, the polygenic nature of these complex disorders has made identifying new treatment targets from genome-wide association studies (GWAS) challenging. Recent advances in the fields of big data and high-throughput tools have provided radically new insights into the underlying biological mechanisms of NDDs. This paper reviews various big data approaches, including classical and more recent techniques like deep learning, which can identify potential treatment targets from GWAS and other omics data, with a particular emphasis on NDDs. We also emphasize the increasing importance of explainable and causal machine learning (ML) methods that can aid in identifying genes, molecular pathways, and more complex biological processes that may be future targets of intervention in these disorders. We conclude that these new developments in genetics and ML hold promise for advancing our understanding of NDDs and identifying novel treatment targets.
    MeSH term(s) Humans ; Genome-Wide Association Study ; Big Data ; Neurodevelopmental Disorders/drug therapy ; Neurodevelopmental Disorders/genetics ; Algorithms ; Machine Learning
    Language English
    Publishing date 2023-09-12
    Publishing country England
    Document type Journal Article ; Review ; Research Support, Non-U.S. Gov't
    ZDB-ID 194735-7
    ISSN 0163-7258 ; ISSN (online) 1879-016X
    DOI 10.1016/j.pharmthera.2023.108530
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Article ; Online: Correction: Automatic segmentation of brain MRI using a novel patch-wise U-net deep architecture.

    Lee, Bumshik / Yamanakkanavar, Nagaraj / Malik, Muhammad Ammar / Choi, Jae Young

    PloS one

    2022  Volume 17, Issue 2, Page(s) e0264231

    Abstract [This corrects the article DOI: 10.1371/journal.pone.0236493.].
    Language English
    Publishing date 2022-02-14
    Publishing country United States
    Document type Published Erratum
    ZDB-ID 2267670-3
    ISSN (online) 1932-6203
    DOI 10.1371/journal.pone.0264231
    Database MEDical Literature Analysis and Retrieval System OnLINE

  4. Book ; Online: rfPhen2Gen

    Malik, Muhammad Ammar / Lundervold, Alexander S. / Michoel, Tom

    A machine learning based association study of brain imaging phenotypes to genotypes

    2022  

    Abstract Imaging genetic studies aim to find associations between genetic variants and imaging quantitative traits (QTs). Traditional genome-wide association studies (GWAS) are based on univariate statistical tests, but when multiple traits are analyzed together they suffer from a multiple-testing problem and do not take into account correlations among the traits. An alternative approach to multi-trait GWAS is to reverse the functional relation between genotypes and traits, by fitting a multivariate regression model to predict genotypes from multiple traits simultaneously. However, current reverse genotype prediction approaches are mostly based on linear models. Here, we evaluated random forest regression (RFR) as a method to predict SNPs from imaging QTs and identify biologically relevant associations. We trained machine learning models to predict 518,484 SNPs using 56 brain imaging QTs. We observed that genotype regression error is a better indicator of permutation p-value significance than genotype classification accuracy. SNPs within the known Alzheimer disease (AD) risk gene APOE had the lowest root mean squared error (RMSE) under lasso and random forest models, but not under ridge regression. Moreover, random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders. Feature selection identified well-known brain regions associated with AD, like the hippocampus and amygdala, as important predictors of the most significant SNPs. In summary, our results indicate that non-linear methods like random forests may offer additional insights into phenotype-genotype associations compared to traditional linear multivariate GWAS methods.
    Keywords Quantitative Biology - Genomics ; Computer Science - Machine Learning
    Subject code 310
    Publishing date 2022-03-31
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
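
The reverse-regression test summarized in this abstract (predict one SNP's genotype from many imaging QTs, score the model by cross-validated RMSE, and judge significance by permuting genotypes) can be sketched as follows. This is a hedged, NumPy-only illustration rather than the authors' pipeline: it swaps in closed-form ridge regression for the random forests used in the study so that it stays dependency-free, and the function names, the 5-fold split, the penalty `lam` and the number of permutations are illustrative assumptions, not the study's settings.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centered data; returns the coefficient vector."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_rmse(X, y, lam=1.0, n_folds=5, seed=0):
    """K-fold cross-validated RMSE of predicting genotype y (0/1/2) from traits X."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    sq_err = np.empty(n)
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        # Center using training-fold statistics only, to avoid leakage
        mu_x, mu_y = X[train].mean(axis=0), y[train].mean()
        beta = ridge_fit(X[train] - mu_x, y[train] - mu_y, lam)
        pred = (X[fold] - mu_x) @ beta + mu_y
        sq_err[fold] = (y[fold] - pred) ** 2
    return np.sqrt(sq_err.mean())

def permutation_pvalue(X, y, n_perm=200, lam=1.0, seed=1):
    """Observed CV RMSE and the fraction of permuted genotypes doing at least as well."""
    rng = np.random.default_rng(seed)
    observed = cv_rmse(X, y, lam)
    null = np.array([cv_rmse(X, rng.permutation(y), lam) for _ in range(n_perm)])
    # Lower RMSE is better; the +1 correction keeps the p-value strictly positive
    return observed, (1 + np.sum(null <= observed)) / (n_perm + 1)
```

A random forest regressor could replace `ridge_fit` inside `cv_rmse`; the permutation logic (counting null RMSEs at or below the observed RMSE) would stay the same.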

  5. Book ; Online: Restricted maximum-likelihood method for learning latent variance components in gene expression data with known and unknown confounders

    Malik, Muhammad Ammar / Michoel, Tom

    2020  

    Abstract Random effect models are popular statistical models for detecting and correcting spurious sample correlations due to hidden confounders in genome-wide gene expression data. In applications where some confounding factors are known, estimating simultaneously the contribution of known and latent variance components in random effect models is a challenge that has so far relied on numerical gradient-based optimizers to maximize the likelihood function. This is unsatisfactory because the resulting solution is poorly characterized and the efficiency of the method may be suboptimal. Here we prove analytically that maximum-likelihood latent variables can always be chosen orthogonal to the known confounding factors, in other words, that maximum-likelihood latent variables explain sample covariances not already explained by known factors. Based on this result we propose a restricted maximum-likelihood method which estimates the latent variables by maximizing the likelihood on the restricted subspace orthogonal to the known confounding factors, and show that this reduces to probabilistic PCA on that subspace. The method then estimates the variance-covariance parameters by maximizing the remaining terms in the likelihood function given the latent variables, using a newly derived analytic solution for this problem. Compared to gradient-based optimizers, our method attains greater or equal likelihood values, can be computed using standard matrix operations, results in latent factors that don't overlap with any known factors, and has a runtime reduced by several orders of magnitude. Hence the restricted maximum-likelihood method facilitates the application of random effect modelling strategies for learning latent variance components to much larger gene expression datasets than possible with current methods.

    Comment: 15 pages, 4 figures, 3 supplementary figures, 17 pages supplementary methods
    Keywords Statistics - Methodology ; Computer Science - Machine Learning ; Quantitative Biology - Genomics ; Quantitative Biology - Quantitative Methods ; Statistics - Machine Learning
    Subject code 310
    Publishing date 2020-05-06
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  6. Book ; Online: High-dimensional multi-trait GWAS by reverse prediction of genotypes

    Malik, Muhammad Ammar / Ludl, Adriaan-Alexander / Michoel, Tom

    2021  

    Abstract Multi-trait genome-wide association studies (GWAS) use multivariate statistical methods to identify associations between genetic variants and multiple correlated traits simultaneously, and have higher statistical power than independent univariate analysis of traits. Reverse regression, where genotypes of genetic variants are regressed on multiple traits simultaneously, has emerged as a promising approach to perform multi-trait GWAS in high-dimensional settings where the number of traits exceeds the number of samples. We extended this approach and analyzed different machine learning methods (ridge regression, random forests and support vector machines) for reverse regression in multi-trait GWAS, using genotypes, gene expression data and ground-truth transcriptional regulatory networks from the DREAM5 SysGen Challenge and from a cross between two yeast strains to evaluate methods. We found that genotype prediction performance, in terms of root mean squared error (RMSE), made it possible to distinguish between genomic regions with high and low transcriptional activity. Moreover, model feature coefficients correlated with the strength of association between variants and individual traits, and were predictive of true trans-eQTL target genes, with complementary findings across methods.
    Keywords Quantitative Biology - Genomics ; Computer Science - Machine Learning ; Quantitative Biology - Quantitative Methods ; Statistics - Methodology
    Subject code 006
    Publishing date 2021-10-29
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
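
For the high-dimensional setting in this abstract, where the number of traits can exceed the number of samples, ridge reverse regression can be fitted for many SNPs at once. The sketch below is an assumption-laden illustration, not the authors' code: it uses the dual (kernel) form of ridge, beta = X^T (X X^T + lam I)^(-1) G, which only inverts an n x n matrix over samples, and it returns a trait-by-SNP coefficient matrix plus per-SNP held-out RMSE, the two quantities the abstract relates to trans-eQTL targets and transcriptional activity. The names `reverse_ridge` and `per_snp_rmse` and the default `lam` are hypothetical.

```python
import numpy as np

def reverse_ridge(traits, genotypes, lam=10.0):
    """Hypothetical helper: multi-output ridge regression of genotypes on traits.

    traits: n samples x p traits; genotypes: n samples x s SNPs (coded 0/1/2).
    Uses the dual form beta = X^T (X X^T + lam I)^(-1) G, which is algebraically
    equal to (X^T X + lam I)^(-1) X^T G but only inverts an n x n matrix.
    Returns the p x s coefficient matrix and the centering means.
    """
    mu_x, mu_g = traits.mean(axis=0), genotypes.mean(axis=0)
    X, G = traits - mu_x, genotypes - mu_g
    K = X @ X.T                                       # n x n Gram matrix over samples
    alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), G)
    beta = X.T @ alpha                                # p traits x s SNPs
    return beta, mu_x, mu_g

def per_snp_rmse(traits, genotypes, beta, mu_x, mu_g):
    """Root mean squared error of predicted genotypes, one value per SNP."""
    pred = (traits - mu_x) @ beta + mu_g
    return np.sqrt(((genotypes - pred) ** 2).mean(axis=0))
```

Fitting on one set of samples and calling `per_snp_rmse` on held-out samples gives one RMSE per SNP; a column of `beta` can then be read as that SNP's trait-level association profile.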
