LIVIVO - The Search Portal for Life Sciences

Search results

Result 1 - 10 of total 27

  1. Article: Accelerated dimensionality reduction of single-cell RNA sequencing data with fastglmpca.

    Weine, Eric / Carbonetto, Peter / Stephens, Matthew

    bioRxiv : the preprint server for biology

    2024  

    Abstract Motivated by theoretical and practical issues that arise when applying Principal Components Analysis (PCA) to count data, Townes et al introduced "Poisson GLM-PCA", a variation of PCA adapted to count data, as a tool for dimensionality reduction of single-cell RNA sequencing (RNA-seq) data. However, fitting GLM-PCA is computationally challenging. Here we study this problem, and show that a simple algorithm, which we call "Alternating Poisson Regression" (APR), produces better quality fits, and in less time, than existing algorithms. APR is also memory-efficient, and lends itself to parallel implementation on multi-core processors, both of which are helpful for handling large single-cell RNA-seq data sets. We illustrate the benefits of this approach in two published single-cell RNA-seq data sets. The new algorithms are implemented in an R package, fastglmpca.
    Language English
    Publishing date 2024-03-27
    Publishing country United States
    Document type Preprint
    DOI 10.1101/2024.03.23.586420
    Database MEDical Literature Analysis and Retrieval System OnLINE

  2. Article: Fast and flexible joint fine-mapping of multiple traits via the Sum of Single Effects model.

    Zou, Yuxin / Carbonetto, Peter / Xie, Dongyue / Wang, Gao / Stephens, Matthew

    bioRxiv : the preprint server for biology

    2024  

    Abstract We introduce mvSuSiE, a multi-trait fine-mapping method for identifying putative causal variants from genetic association data (individual-level or summary data). mvSuSiE learns patterns of shared genetic effects from data, and exploits these patterns to improve power to identify causal SNPs. Comparisons on simulated data show that mvSuSiE is competitive in speed, power and precision with existing multi-trait methods, and uniformly improves on single-trait fine-mapping (SuSiE) in each trait separately. We applied mvSuSiE to jointly fine-map 16 blood cell traits using data from the UK Biobank. By jointly analyzing the traits and modeling heterogeneous effect sharing patterns, we discovered a much larger number of causal SNPs (>3,000) compared with single-trait fine-mapping, and with narrower credible sets. mvSuSiE also more comprehensively characterized the ways in which the genetic variants affect one or more blood cell traits; 68% of causal SNPs showed significant effects in more than one blood cell type.
    Language English
    Publishing date 2024-04-18
    Publishing country United States
    Document type Preprint
    DOI 10.1101/2023.04.14.536893
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Article: Flexible Signal Denoising via Flexible Empirical Bayes Shrinkage.

    Xing, Zhengrong / Carbonetto, Peter / Stephens, Matthew

    Journal of machine learning research : JMLR

    2023  Volume 22

    Abstract Signal denoising, also known as non-parametric regression, is often performed through shrinkage estimation in a transformed (e.g., wavelet) domain; shrinkage in the transformed domain corresponds to smoothing in the original domain. A key question in such applications is how much to shrink, or, equivalently, how much to smooth. Empirical Bayes shrinkage methods provide an attractive solution to this problem; they use the data to estimate a distribution of underlying "effects," hence automatically select an appropriate amount of shrinkage. However, most existing implementations of empirical Bayes shrinkage are less flexible than they could be, both in their assumptions on the underlying distribution of effects and in their ability to handle heteroskedasticity, which limits their signal denoising applications. Here we address this by adopting a particularly flexible, stable and computationally convenient empirical Bayes shrinkage method and applying it to several signal denoising problems. These applications include smoothing of Poisson data and heteroskedastic Gaussian data. We show through empirical comparisons that the results are competitive with other methods, including both simple thresholding rules and purpose-built empirical Bayes procedures. Our methods are implemented in the R package smashr, "SMoothing by Adaptive SHrinkage in R," available at https://www.github.com/stephenslab/smashr.
    Language English
    Publishing date 2023-12-19
    Publishing country United States
    Document type Journal Article
    ZDB-ID 2042762-1
    ISSN (online) 1533-7928
    ISSN 1532-4435
    Database MEDical Literature Analysis and Retrieval System OnLINE

  4. Article ; Online: Fine-mapping from summary data with the "Sum of Single Effects" model.

    Zou, Yuxin / Carbonetto, Peter / Wang, Gao / Stephens, Matthew

    PLoS genetics

    2022  Volume 18, Issue 7, Page(s) e1010299

    Abstract In recent work, Wang et al introduced the "Sum of Single Effects" (SuSiE) model, and showed that it provides a simple and efficient approach to fine-mapping genetic variants from individual-level data. Here we present new methods for fitting the SuSiE model to summary data, for example to single-SNP z-scores from an association study and linkage disequilibrium (LD) values estimated from a suitable reference panel. To develop these new methods, we first describe a simple, generic strategy for extending any individual-level data method to deal with summary data. The key idea is to replace the usual regression likelihood with an analogous likelihood based on summary data. We show that existing fine-mapping methods such as FINEMAP and CAVIAR also (implicitly) use this strategy, but in different ways, and so this provides a common framework for understanding different methods for fine-mapping. We investigate other common practical issues in fine-mapping with summary data, including problems caused by inconsistencies between the z-scores and LD estimates, and we develop diagnostics to identify these inconsistencies. We also present a new refinement procedure that improves model fits in some data sets, and hence improves overall reliability of the SuSiE fine-mapping results. Detailed evaluations of fine-mapping methods in a range of simulated data sets show that SuSiE applied to summary data is competitive, in both speed and accuracy, with the best available fine-mapping methods for summary data.
    MeSH term(s) Likelihood Functions ; Linkage Disequilibrium ; Models, Genetic ; Polymorphism, Single Nucleotide/genetics ; Reproducibility of Results
    Language English
    Publishing date 2022-07-19
    Publishing country United States
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't
    ZDB-ID 2186725-2
    ISSN (online) 1553-7404
    ISSN 1553-7390
    DOI 10.1371/journal.pgen.1010299
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Article ; Online: GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership.

    Carbonetto, Peter / Luo, Kaixuan / Sarkar, Abhishek / Hung, Anthony / Tayeb, Karl / Pott, Sebastian / Stephens, Matthew

    Genome biology

    2023  Volume 24, Issue 1, Page(s) 236

    Abstract Parts-based representations, such as non-negative matrix factorization and topic modeling, have been used to identify structure from single-cell sequencing data sets, in particular structure that is not as well captured by clustering or other dimensionality reduction methods. However, interpreting the individual parts remains a challenge. To address this challenge, we extend methods for differential expression analysis by allowing cells to have partial membership to multiple groups. We call this grade of membership differential expression (GoM DE). We illustrate the benefits of GoM DE for annotating topics identified in several single-cell RNA-seq and ATAC-seq data sets.
    MeSH term(s) Single-Cell Analysis/methods ; Chromatin Immunoprecipitation Sequencing ; Algorithms ; Cluster Analysis ; Sequence Analysis, RNA/methods ; Gene Expression Profiling/methods
    Language English
    Publishing date 2023-10-19
    Publishing country England
    Document type Journal Article ; Research Support, N.I.H., Extramural
    ZDB-ID 2040529-7
    ISSN (online) 1474-760X
    DOI 10.1186/s13059-023-03067-9
    Database MEDical Literature Analysis and Retrieval System OnLINE

  6. Article: Dissecting tumor transcriptional heterogeneity from single-cell RNA-seq data by generalized binary covariance decomposition.

    Liu, Yusha / Carbonetto, Peter / Willwerscheid, Jason / Oakes, Scott A / Macleod, Kay F / Stephens, Matthew

    bioRxiv : the preprint server for biology

    2023  

    Abstract Profiling tumors with single-cell RNA sequencing (scRNA-seq) has the potential to identify recurrent patterns of transcription variation related to cancer progression, and so produce new therapeutically-relevant insights. However, the presence of strong inter-tumor heterogeneity often obscures more subtle patterns that are shared across tumors, some of which may characterize clinically-relevant disease subtypes. Here we introduce a new statistical method to address this problem. We show that this method can help decompose transcriptional heterogeneity into interpretable components - including patient-specific, dataset-specific and shared components relevant to disease subtypes - and that, in the presence of strong inter-tumor heterogeneity, our method can produce more interpretable results than existing widely-used methods. Applied to data from three studies on pancreatic cancer adenocarcinoma (PDAC), our method produces a refined characterization of existing tumor subtypes (e.g. classical vs basal), and identifies a new gene expression program (GEP) that is prognostic of poor survival independent of established prognostic factors such as tumor stage and subtype. The new GEP is enriched for genes involved in a variety of stress responses, and suggests a potentially important role for the integrated stress response in PDAC development and prognosis.
    Language English
    Publishing date 2023-08-17
    Publishing country United States
    Document type Preprint
    DOI 10.1101/2023.08.15.553436
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Article: GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership.

    Carbonetto, Peter / Luo, Kaixuan / Sarkar, Abhishek / Hung, Anthony / Tayeb, Karl / Pott, Sebastian / Stephens, Matthew

    bioRxiv : the preprint server for biology

    2023  

    Abstract Parts-based representations, such as non-negative matrix factorization and topic modeling, have been used to identify structure from single-cell sequencing data sets, in particular structure that is not as well captured by clustering or other dimensionality reduction methods. However, interpreting the individual parts remains a challenge. To address this challenge, we extend methods for differential expression analysis by allowing cells to have partial membership to multiple groups. We call this grade of membership differential expression (GoM DE). We illustrate the benefits of GoM DE for annotating topics identified in several single-cell RNA-seq and ATAC-seq data sets.
    Language English
    Publishing date 2023-09-14
    Publishing country United States
    Document type Preprint
    DOI 10.1101/2023.03.03.531029
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Article ; Online: A flexible empirical Bayes approach to multivariate multiple regression, and its improved accuracy in predicting multi-tissue gene expression from genotypes.

    Morgante, Fabio / Carbonetto, Peter / Wang, Gao / Zou, Yuxin / Sarkar, Abhishek / Stephens, Matthew

    PLoS genetics

    2023  Volume 19, Issue 7, Page(s) e1010539

    Abstract Predicting phenotypes from genotypes is a fundamental task in quantitative genetics. With technological advances, it is now possible to measure multiple phenotypes in large samples. Multiple phenotypes can share their genetic component; therefore, modeling these phenotypes jointly may improve prediction accuracy by leveraging effects that are shared across phenotypes. However, effects can be shared across phenotypes in a variety of ways, so computationally efficient statistical methods are needed that can accurately and flexibly capture patterns of effect sharing. Here, we describe new Bayesian multivariate, multiple regression methods that, by using flexible priors, are able to model and adapt to different patterns of effect sharing and specificity across phenotypes. Simulation results show that these new methods are fast and improve prediction accuracy compared with existing methods in a wide range of settings where effects are shared. Further, in settings where effects are not shared, our methods still perform competitively with state-of-the-art methods. In real data analyses of expression data in the Genotype Tissue Expression (GTEx) project, our methods improve prediction performance on average for all tissues, with the greatest gains in tissues where effects are strongly shared, and in the tissues with smaller sample sizes. While we use gene expression prediction to illustrate our methods, the methods are generally applicable to any multi-phenotype applications, including prediction of polygenic scores and breeding values. Thus, our methods have the potential to provide improvements across fields and organisms.
    MeSH term(s) Bayes Theorem ; Genotype ; Phenotype ; Computer Simulation ; Models, Genetic ; Gene Expression ; Polymorphism, Single Nucleotide
    Language English
    Publishing date 2023-07-07
    Publishing country United States
    Document type Journal Article ; Research Support, N.I.H., Extramural
    ZDB-ID 2186725-2
    ISSN (online) 1553-7404
    ISSN 1553-7390
    DOI 10.1371/journal.pgen.1010539
    Database MEDical Literature Analysis and Retrieval System OnLINE

  9. Article ; Online: A simple new approach to variable selection in regression, with application to genetic fine mapping.

    Wang, Gao / Sarkar, Abhishek / Carbonetto, Peter / Stephens, Matthew

    Journal of the Royal Statistical Society. Series B, Statistical methodology

    2020  Volume 82, Issue 5, Page(s) 1273–1300

    Abstract We introduce a simple new approach to variable selection in linear regression, with a particular focus on
    Language English
    Publishing date 2020-07-10
    Publishing country England
    Document type Journal Article
    ZDB-ID 1490719-7
    ISSN (online) 1467-9868
    ISSN 0035-9246 ; 1369-7412
    DOI 10.1111/rssb.12388
    Database MEDical Literature Analysis and Retrieval System OnLINE

  10. Article: A Fast Algorithm for Maximum Likelihood Estimation of Mixture Proportions Using Sequential Quadratic Programming.

    Kim, Youngseok / Carbonetto, Peter / Stephens, Matthew / Anitescu, Mihai

    Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America

    2020  Volume 29, Issue 2, Page(s) 261–273

    Abstract Maximum likelihood estimation of mixture proportions has a long history, and continues to play an important role in modern statistics, including in development of nonparametric empirical Bayes methods. Maximum likelihood of mixture proportions has traditionally been solved using the expectation maximization (EM) algorithm, but recent work by Koenker & Mizera shows that modern convex optimization techniques, in particular interior point methods, are substantially faster and more accurate than EM. Here, we develop a new solution based on sequential quadratic programming (SQP). It is substantially faster than the interior point method, and just as accurate. Our approach combines several ideas: first, it solves a reformulation of the original problem; second, it uses an SQP approach to make the best use of the expensive gradient and Hessian computations; third, the SQP iterations are implemented using an active set method to exploit the sparse nature of the quadratic subproblems; fourth, it uses accurate low-rank approximations for more efficient gradient and Hessian computations. We illustrate the benefits of the SQP approach in experiments on synthetic data sets and a large genetic association data set. In large data sets (
    Language English
    Publishing date 2020-01-08
    Publishing country United States
    Document type Journal Article
    ZDB-ID 2014382-5
    ISSN (online) 1537-2715
    ISSN 1061-8600
    DOI 10.1080/10618600.2019.1689985
    Database MEDical Literature Analysis and Retrieval System OnLINE
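
    Note on result 10: the abstract refers to the traditional EM approach to estimating mixture proportions. For orientation only, the following is a minimal sketch of that textbook EM baseline, not the SQP method of the paper and not code from any of the listed packages. It assumes a precomputed n x m matrix L of positive component likelihoods (row i, column k holds the likelihood of observation i under component k); the function names are hypothetical.

```python
import numpy as np

def em_mixture_proportions(L, n_iter=200, tol=1e-8):
    """Textbook EM for mixture proportions (illustrative baseline only).

    Maximizes sum_i log(sum_k w[k] * L[i, k]) over the probability simplex.
    L is an assumed, precomputed n x m matrix of positive likelihoods.
    """
    L = np.asarray(L, dtype=float)
    n, m = L.shape
    w = np.full(m, 1.0 / m)  # start from uniform proportions
    for _ in range(n_iter):
        weighted = L * w  # broadcast w across rows
        # E-step: posterior membership probabilities for each observation
        post = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: new proportions are the average posterior memberships
        w_new = post.mean(axis=0)
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
    return w

def log_likelihood(L, w):
    """Mixture log-likelihood for proportions w."""
    return float(np.sum(np.log(np.asarray(L, dtype=float) @ w)))
```

    Each EM update cannot decrease the log-likelihood, but convergence can be slow, which is the motivation the abstract gives for interior point and SQP alternatives.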
