LIVIVO - The Search Portal for Life Sciences

Search results

Result 1 - 10 of total 55

  1. Article ; Online: Structure-informed clustering for population stratification in association studies.

    Bose, Aritra / Burch, Myson / Chowdhury, Agniva / Paschou, Peristera / Drineas, Petros

    BMC bioinformatics

    2023  Volume 24, Issue 1, Page(s) 411

    Abstract Background: Identifying variants associated with complex traits is a challenging task in genetic association studies due to linkage disequilibrium (LD) between genetic variants and population stratification, unrelated to the disease risk. Existing methods of population structure correction use principal component analysis or linear mixed models with a random effect when modeling associations between a trait of interest and genetic markers. However, due to stringent significance thresholds and latent interactions between the markers, these methods often fail to detect genuinely associated variants.
    Results: To overcome this, we propose CluStrat, which corrects for complex arbitrarily structured populations while leveraging the linkage disequilibrium induced distances between genetic markers. It performs an agglomerative hierarchical clustering using the Mahalanobis distance covariance matrix of the markers. In simulation studies, we show that our method outperforms existing methods in detecting true causal variants. Applying CluStrat on WTCCC2 and UK Biobank cohorts, we found biologically relevant associations in Schizophrenia and Myocardial Infarction. CluStrat was also able to correct for population structure in polygenic adaptation of height in Europeans.
    Conclusions: CluStrat highlights the advantages of biologically relevant distance metrics, such as the Mahalanobis distance, which captures the cryptic interactions within populations in the presence of LD better than the Euclidean distance.
    MeSH term(s) Humans ; Genetic Markers ; Polymorphism, Single Nucleotide ; Linkage Disequilibrium ; Phenotype ; Cluster Analysis
    Chemical Substances Genetic Markers
    Language English
    Publishing date 2023-10-31
    Publishing country England
    Document type Journal Article
    ZDB-ID 2041484-5
    ISSN (online) 1471-2105
    DOI 10.1186/s12859-023-05511-w
    Database MEDical Literature Analysis and Retrieval System OnLINE

  2. Article ; Online: Reconstructing SNP allele and genotype frequencies from GWAS summary statistics.

    Yang, Zhiyu / Paschou, Peristera / Drineas, Petros

    Scientific reports

    2022  Volume 12, Issue 1, Page(s) 8242

    Abstract The emergence of genome-wide association studies (GWAS) has led to the creation of large repositories of human genetic variation, creating enormous opportunities for genetic research and worldwide collaboration. Methods that are based on GWAS summary statistics seek to leverage such records, overcoming barriers that often exist in individual-level data access while also offering significant computational savings. Such summary-statistics-based applications include GWAS meta-analysis, with and without sample overlap, and case-case GWAS. We compare the performance of leading methods for summary-statistics-based genomic analysis and also introduce a novel framework that can unify usual summary-statistics-based implementations via the reconstruction of allelic and genotypic frequencies and counts (ReACt). First, we evaluate ASSET, METAL, and ReACt using both synthetic and real data for GWAS meta-analysis (with and without sample overlap) and find that, while all three methods are comparable in terms of power and error control, ReACt and METAL are faster than ASSET by a factor of at least a hundred. We then proceed to evaluate the performance of ReACt vs. an existing method for case-case GWAS and show comparable performance, with ReACt requiring minimal underlying assumptions and being more user-friendly. Finally, ReACt allows us to evaluate, for the first time, an implementation for calculating polygenic risk score (PRS) for groups of cases and controls based on summary statistics. Our work demonstrates the power of GWAS summary-statistics-based methodologies and the proposed novel method provides a unifying framework and allows further extension of possibilities for researchers seeking to understand the genetics of complex disease.
    MeSH term(s) Alleles ; Genome-Wide Association Study ; Genotype ; Humans ; Phenotype ; Polymorphism, Single Nucleotide
    Language English
    Publishing date 2022-05-17
    Publishing country England
    Document type Journal Article ; Meta-Analysis ; Research Support, U.S. Gov't, Non-P.H.S.
    ZDB-ID 2615211-3
    ISSN (online) 2045-2322
    DOI 10.1038/s41598-022-12185-6
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Book ; Online: Feature Space Sketching for Logistic Regression

    Dexter, Gregory / Khanna, Rajiv / Raheel, Jawad / Drineas, Petros

    2023  

    Abstract We present novel bounds for coreset construction, feature selection, and dimensionality reduction for logistic regression. All three approaches can be thought of as sketching the logistic regression inputs. On the coreset construction front, we resolve open problems from prior work and present novel bounds for the complexity of coreset construction methods. On the feature selection and dimensionality reduction front, we initiate the study of forward error bounds for logistic regression. Our bounds are tight up to constant factors and our forward error bounds can be extended to Generalized Linear Models.
    Keywords Computer Science - Machine Learning ; Statistics - Machine Learning
    Publishing date 2023-03-24
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  4. Article: Constructing Compact Signatures for Individual Fingerprinting of Brain Connectomes.

    Ravindra, Vikram / Drineas, Petros / Grama, Ananth

    Frontiers in neuroscience

    2021  Volume 15, Page(s) 549322

    Abstract Recent neuroimaging studies have shown that functional connectomes are unique to individuals, i.e., two distinct fMRIs taken over different sessions of the same subject are more similar in terms of their connectomes than those from two different subjects. In this study, we present new results that identify specific parts of resting state and task-specific connectomes that are responsible for the unique signatures. We show that a very small part of the connectome can be used to derive features for discriminating between individuals. A network of these features is shown to achieve excellent training and test accuracy in matching imaging datasets. We show that these features are statistically significant, robust to perturbations, invariant across populations, and are localized to a small number of structural regions of the brain. Furthermore, we show that for task-specific connectomes, the regions identified by our method are consistent with their known functional characterization. We present a new matrix sampling technique to derive computationally efficient and accurate methods for identifying the discriminating sub-connectome and support all of our claims using state-of-the-art statistical tests and computational techniques.
    Language English
    Publishing date 2021-04-06
    Publishing country Switzerland
    Document type Journal Article
    ZDB-ID 2411902-7
    ISSN (online) 1662-453X
    ISSN 1662-4548
    DOI 10.3389/fnins.2021.549322
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Book ; Online: Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile Streaming

    Dexter, Gregory / Drineas, Petros / Woodruff, David P. / Yasuda, Taisuke

    2023  

    Abstract Sketching algorithms have recently proven to be a powerful approach both for designing low-space streaming algorithms as well as fast polynomial time approximation schemes (PTAS). In this work, we develop new techniques to extend the applicability of sketching-based approaches to the sparse dictionary learning and the Euclidean $k$-means clustering problems. In particular, we initiate the study of the challenging setting where the dictionary/clustering assignment for each of the $n$ input points must be output, which has surprisingly received little attention in prior work. On the fast algorithms front, we obtain a new approach for designing PTAS's for the $k$-means clustering problem, which generalizes to the first PTAS for the sparse dictionary learning problem. On the streaming algorithms front, we obtain new upper bounds and lower bounds for dictionary learning and $k$-means clustering. In particular, given a design matrix $\mathbf A\in\mathbb R^{n\times d}$ in a turnstile stream, we show an $\tilde O(nr/\epsilon^2 + dk/\epsilon)$ space upper bound for $r$-sparse dictionary learning of size $k$, an $\tilde O(n/\epsilon^2 + dk/\epsilon)$ space upper bound for $k$-means clustering, as well as an $\tilde O(n)$ space upper bound for $k$-means clustering on random order row insertion streams with a natural "bounded sensitivity" assumption. On the lower bounds side, we obtain a general $\tilde\Omega(n/\epsilon + dk/\epsilon)$ lower bound for $k$-means clustering, as well as an $\tilde\Omega(n/\epsilon^2)$ lower bound for algorithms which can estimate the cost of a single fixed set of candidate centers.

    Comment: To appear in NeurIPS 2023
    Keywords Computer Science - Data Structures and Algorithms ; Computer Science - Machine Learning
    Subject code 006 ; 005
    Publishing date 2023-10-29
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  6. Book ; Online: Refined Mechanism Design for Approximately Structured Priors via Active Regression

    Boutsikas, Christos / Drineas, Petros / Mertzanidis, Marios / Psomas, Alexandros / Verma, Paritosh

    2023  

    Abstract We consider the problem of a revenue-maximizing seller with a large number of items $m$ for sale to $n$ strategic bidders, whose valuations are drawn independently from high-dimensional, unknown prior distributions. It is well-known that optimal and even approximately-optimal mechanisms for this setting are notoriously difficult to characterize or compute, and, even when they can be found, are often rife with various counter-intuitive properties. In this paper, following a model introduced recently by Cai and Daskalakis, we consider the case that bidders' prior distributions can be well-approximated by a topic model. We design an active learning component, responsible for interacting with the bidders and outputting low-dimensional approximations of their types, and a mechanism design component, responsible for robustifying mechanisms for the low-dimensional model to work for the approximate types of the former component. On the active learning front, we cast our problem in the framework of Randomized Linear Algebra (RLA) for regression problems, allowing us to import several breakthrough results from that line of research, and adapt them to our setting. On the mechanism design front, we remove many restrictive assumptions of prior work on the type of access needed to the underlying distributions and the associated mechanisms. To the best of our knowledge, our work is the first to formulate connections between mechanism design, and RLA for active learning of regression problems, opening the door for further applications of randomized linear algebra primitives to mechanism design.

    Comment: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)
    Keywords Computer Science - Computer Science and Game Theory ; Computer Science - Data Structures and Algorithms ; Computer Science - Information Retrieval ; Computer Science - Machine Learning
    Subject code 004
    Publishing date 2023-10-11
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  7. Book ; Online: Low-Rank Updates of Matrix Square Roots

    Shumeli, Shany / Drineas, Petros / Avron, Haim

    2022  

    Abstract Models in which the covariance matrix has the structure of a sparse matrix plus a low rank perturbation are ubiquitous in data science applications. It is often desirable for algorithms to take advantage of such structures, avoiding costly matrix computations that often require cubic time and quadratic storage. This is often accomplished by performing operations that maintain such structures, e.g. matrix inversion via the Sherman-Morrison-Woodbury formula. In this paper we consider the matrix square root and inverse square root operations. Given a low rank perturbation to a matrix, we argue that a low-rank approximate correction to the (inverse) square root exists. We do so by establishing a geometric decay bound on the true correction's eigenvalues. We then proceed to frame the correction as the solution of an algebraic Riccati equation, and discuss how a low-rank solution to that equation can be computed. We analyze the approximation error incurred when approximately solving the algebraic Riccati equation, providing spectral and Frobenius norm forward and backward error bounds. Finally, we describe several applications of our algorithms, and demonstrate their utility in numerical experiments.
    Keywords Mathematics - Numerical Analysis ; Computer Science - Machine Learning
    Subject code 518
    Publishing date 2022-01-31
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
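
For reference, the Sherman-Morrison-Woodbury formula named in the abstract above can be stated as follows. The symbols are assumptions introduced here, not notation from the record: $A$ is an invertible $n \times n$ matrix, $U$ is $n \times k$, $C$ is an invertible $k \times k$ matrix, and $V$ is $k \times n$, with $C^{-1} + V A^{-1} U$ assumed invertible.

```latex
\[
(A + U C V)^{-1} \;=\; A^{-1} \;-\; A^{-1} U \left( C^{-1} + V A^{-1} U \right)^{-1} V A^{-1}
\]
```

When $k \ll n$ and $A^{-1}$ is cheap to apply, the right-hand side only requires inverting a $k \times k$ matrix, which is the kind of structure-preserving update the abstract contrasts with cubic-time computations.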

  8. Article: A Fast, Provably Accurate Approximation Algorithm for Sparse Principal Component Analysis Reveals Human Genetic Variation Across the World.

    Chowdhury, Agniva / Bose, Aritra / Zhou, Samson / Woodruff, David P / Drineas, Petros

    Research in computational molecular biology : ... Annual International Conference, RECOMB ... : proceedings. RECOMB (Conference : 2005- )

    2022  Volume 13278, Page(s) 86–106

    Abstract Principal component analysis (PCA) is a widely used dimensionality reduction technique in machine learning and multivariate statistics. To improve the interpretability of PCA, various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis (SPCA). In this paper, we present ThreSPCA, a provably accurate algorithm based on thresholding the Singular Value Decomposition for the SPCA problem, without imposing any restrictive assumptions on the input covariance matrix. Our thresholding algorithm is conceptually simple; much faster than current state-of-the-art; and performs well in practice. When applied to genotype data from the 1000 Genomes Project, ThreSPCA is faster than previous benchmarks, at least as accurate, and leads to a set of interpretable biomarkers, revealing genetic diversity across the world.
    Language English
    Publishing date 2022-04-29
    Publishing country Germany
    Document type Journal Article
    DOI 10.1007/978-3-031-04749-7_6
    Database MEDical Literature Analysis and Retrieval System OnLINE
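
The abstract above describes obtaining sparse loadings by thresholding a spectral decomposition. The following is a much-simplified sketch of that general idea, not the ThreSPCA algorithm itself and without its accuracy guarantees: it estimates the leading eigenvector of a covariance matrix by power iteration, keeps the `k` largest-magnitude entries, and renormalizes. The function names, the parameter `k`, and the choice of power iteration are all assumptions made for illustration.

```python
from math import sqrt


def top_eigenvector(matrix, iterations=500):
    """Approximate the leading eigenvector of a symmetric PSD matrix.

    Uses plain power iteration from a uniform start vector; this assumes
    the start vector is not orthogonal to the leading eigenvector.
    """
    n = len(matrix)
    v = [1.0 / sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v
        v = [x / norm for x in w]
    return v


def thresholded_component(matrix, k):
    """Keep the k largest-magnitude loadings of the leading eigenvector."""
    v = top_eigenvector(matrix)
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    sparse = [v[i] if i in keep else 0.0 for i in range(len(v))]
    norm = sqrt(sum(x * x for x in sparse))
    return [x / norm for x in sparse] if norm else sparse


# Hypothetical 4x4 covariance matrix whose dominant direction
# involves only the first two variables.
covariance = [
    [4.0, 2.0, 0.0, 0.0],
    [2.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.5],
]
component = thresholded_component(covariance, k=2)
```

In this toy input the resulting component is supported on the first two variables only, which illustrates why sparse loadings are easier to interpret than dense ones.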

  9. Article ; Online: Multiomic approach and Mendelian randomization analysis identify causal associations between blood biomarkers and subcortical brain structure volumes.

    Jain, Pritesh R / Yates, Madison / de Celis, Carlos Rubin / Drineas, Petros / Jahanshad, Neda / Thompson, Paul / Paschou, Peristera

    NeuroImage

    2023  Volume 284, Page(s) 120466

    Abstract Alterations in subcortical brain structure volumes have been found to be associated with several neurodegenerative and psychiatric disorders. At the same time, genome-wide association studies (GWAS) have identified numerous common variants associated with brain structure. In this study, we integrate these findings, aiming to identify proteins, metabolites, or microbes that have a putative causal association with subcortical brain structure volumes via a two-sample Mendelian randomization approach. This method uses genetic variants as instrumental variables to identify potentially causal associations between an exposure and an outcome. The exposure data that we analyzed comprised genetic associations for 2994 plasma proteins, 237 metabolites, and 103 microbial genera. The outcome data included GWAS data for seven subcortical brain structure volumes including accumbens, amygdala, caudate, hippocampus, pallidum, putamen, and thalamus. Eleven proteins and six metabolites were found to have a significant association with subcortical structure volumes, with nine proteins and five metabolites replicated using independent exposure data. We found causal associations between accumbens volume and plasma protease c1 inhibitor as well as a strong association between putamen volume and Agouti signaling protein. Among metabolites, urate had the strongest association with thalamic volume. No significant associations were detected between the microbial genera and subcortical brain structure volumes. We also observed significant enrichment for biological processes such as proteolysis, regulation of the endoplasmic reticulum apoptotic signaling pathway, and negative regulation of DNA binding. Our findings provide insights into the mechanisms through which brain volumes may be affected in the pathogenesis of neurodevelopmental and psychiatric disorders and point to potential treatment targets for disorders that are associated with subcortical brain structure volumes.
    MeSH term(s) Humans ; Mendelian Randomization Analysis ; Genome-Wide Association Study/methods ; Multiomics ; Brain/diagnostic imaging ; Brain/pathology ; Biomarkers ; Magnetic Resonance Imaging/methods
    Chemical Substances Biomarkers
    Language English
    Publishing date 2023-11-22
    Publishing country United States
    Document type Journal Article
    ZDB-ID 1147767-2
    ISSN (online) 1095-9572
    ISSN 1053-8119
    DOI 10.1016/j.neuroimage.2023.120466
    Database MEDical Literature Analysis and Retrieval System OnLINE
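
The abstract above describes two-sample Mendelian randomization, in which per-variant associations with an exposure and with an outcome are combined into a causal estimate. The record does not say which estimator the authors used; the sketch below shows one common choice, the fixed-effect inverse-variance-weighted (IVW) estimator, under that assumption. The function name, argument names, and the toy numbers are hypothetical and are not data from the study.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the exposure-to-outcome effect.

    Each list holds one entry per genetic variant (instrument):
    variant-exposure effect, variant-outcome effect, and the standard
    error of the variant-outcome effect.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have one entry per variant")
    weights = [1.0 / (se * se) for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    if denominator == 0.0:
        raise ValueError("no variant is associated with the exposure")
    estimate = numerator / denominator
    std_error = sqrt(1.0 / denominator)
    return estimate, std_error


# Toy summary statistics for three hypothetical variants, constructed so
# that every variant-outcome effect is exactly half the exposure effect.
bx = [0.1, 0.2, 0.4]
by = [0.05, 0.1, 0.2]
se = [0.01, 0.02, 0.02]
effect, effect_se = ivw_estimate(bx, by, se)
```

Equivalently, the estimate is the slope of a weighted regression of outcome effects on exposure effects through the origin, so variants with more precise outcome estimates contribute more.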

  10. Article: Multiomic approach and Mendelian randomization analysis identify causal associations between blood biomarkers and subcortical brain structure volumes.

    Jain, Pritesh / Yates, Madison / de Celis, Carlos Rubin / Drineas, Petros / Jahanshad, Neda / Thompson, Paul / Paschou, Peristera

    medRxiv : the preprint server for health sciences

    2023  

    Abstract Alterations in subcortical brain structure volumes have been found to be associated with several neurodegenerative and psychiatric disorders. At the same time, genome-wide association studies (GWAS) have identified numerous common variants associated with brain structure. In this study, we integrate these findings, aiming to identify proteins, metabolites, or microbes that have a putative causal association with subcortical brain structure volumes via a two-sample Mendelian randomization approach. This method uses genetic variants as instrumental variables to identify potentially causal associations between an exposure and an outcome. The exposure data that we analyzed comprised genetic associations for 2,994 plasma proteins, 237 metabolites, and 103 microbial genera. The outcome data included GWAS data for seven subcortical brain structure volumes including accumbens, amygdala, caudate, hippocampus, pallidum, putamen, and thalamus. Eleven proteins and six metabolites were found to have a significant association with subcortical structure volumes. We found causal associations between amygdala volume and granzyme A as well as an association between accumbens volume and plasma protease c1 inhibitor. Among metabolites, urate had the strongest association with thalamic volume. No significant associations were detected between the microbial genera and subcortical brain structure volumes. We also observed significant enrichment for biological processes such as proteolysis, regulation of the endoplasmic reticulum apoptotic signaling pathway, and negative regulation of DNA binding. Our findings provide insights into the mechanisms through which brain volumes may be affected in the pathogenesis of neurodevelopmental and psychiatric disorders and point to potential treatment targets for disorders that are associated with subcortical brain structure volumes.
    Language English
    Publishing date 2023-04-03
    Publishing country United States
    Document type Preprint
    DOI 10.1101/2023.03.30.23287968
    Database MEDical Literature Analysis and Retrieval System OnLINE
