LIVIVO - The Search Portal for Life Sciences

Search results

Result 1 - 10 of total 65

  1. Article ; Online: Quantifying common and distinct information in single-cell multimodal data with Tilted Canonical Correlation Analysis.

    Lin, Kevin Z / Zhang, Nancy R

    Proceedings of the National Academy of Sciences of the United States of America

    2023  Volume 120, Issue 32, Page(s) e2303647120

    Abstract Multimodal single-cell technologies profile multiple modalities for each cell simultaneously, enabling a more thorough characterization of cell populations. Existing dimension-reduction methods for multimodal data capture the "union of information," producing a lower-dimensional embedding that combines the information across modalities. While these tools are useful, we focus on a fundamentally different task of separating and quantifying the information among cells that is shared between the two modalities as well as unique to only one modality. Hence, we develop Tilted Canonical Correlation Analysis (Tilted-CCA), a method that decomposes a paired multimodal dataset into three lower-dimensional embeddings: one embedding captures the "intersection of information," representing the geometric relations among the cells that is common to both modalities, while the remaining two embeddings capture the "distinct information for a modality," representing the modality-specific geometric relations. We analyze single-cell multimodal datasets sequencing RNA alongside surface antibodies (i.e., CITE-seq) as well as RNA alongside chromatin accessibility (i.e., 10x) for blood cells and developing neurons via Tilted-CCA. These analyses show that Tilted-CCA enables meaningful visualization and quantification of the cross-modal information. Finally, Tilted-CCA's framework allows us to perform two specific downstream analyses. First, for single-cell datasets that simultaneously profile transcriptome and surface antibody markers, we show that Tilted-CCA helps design the target antibody panel to complement the transcriptome best. Second, for developmental single-cell datasets that simultaneously profile transcriptome and chromatin accessibility, we show that Tilted-CCA helps identify development-informative genes and distinguish between transient versus terminal cell types.
    MeSH term(s) Algorithms ; Canonical Correlation Analysis ; Transcriptome ; Single-Cell Analysis/methods
    Language English
    Publishing date 2023-07-31
    Publishing country United States
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't
    ZDB-ID 209104-5
    ISSN (online) 1091-6490
    ISSN 0027-8424
    DOI 10.1073/pnas.2303647120
    Database MEDical Literature Analysis and Retrieval System OnLINE

  2. Article ; Online: Semblance: An empirical similarity kernel on probability spaces.

    Agarwal, Divyansh / Zhang, Nancy R

    Science advances

    2019  Volume 5, Issue 12, Page(s) eaau9630

    Abstract In data science, determining proximity between observations is critical to many downstream analyses such as clustering, classification, and prediction. However, when the data's underlying probability distribution is unclear, the function used to compute similarity between data points is often arbitrarily chosen. Here, we present a novel definition of proximity, Semblance, that uses the empirical distribution of a feature to inform the pair-wise similarity between observations. The advantage of Semblance lies in its distribution-free formulation and its ability to place greater emphasis on proximity between observation pairs that fall at the outskirts of the data distribution, as opposed to those toward the center. Semblance is a valid Mercer kernel, allowing its principled use in kernel-based learning algorithms, and for any data modality. We demonstrate its consistently improved performance against conventional methods through simulations and real case studies from diverse applications in single-cell transcriptomics, image reconstruction, and financial forecasting.
    MeSH term(s) Algorithms ; Computer Simulation ; Empirical Research ; Image Processing, Computer-Assisted ; Principal Component Analysis ; Probability ; Support Vector Machine
    Language English
    Publishing date 2019-12-04
    Publishing country United States
    Document type Journal Article ; Research Support, N.I.H., Extramural
    ZDB-ID 2810933-8
    ISSN (online) 2375-2548
    DOI 10.1126/sciadv.aau9630
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Article: Distribution-Free Multisample Tests Based on Optimal Matchings With Applications to Single Cell Genomics

    Mukherjee, Somabha / Agarwal, Divyansh / Zhang, Nancy R. / Bhattacharya, Bhaswar B.

    Journal of the American Statistical Association. 2022 Apr. 3, v. 117, no. 538

    2022  

    Abstract In this article, we propose a nonparametric graphical test based on optimal matching, for assessing the equality of multiple unknown multivariate probability distributions. Our procedure pools the data from the different classes to create a graph based on the minimum non-bipartite matching, and then utilizes the number of edges connecting data points from different classes to examine the closeness between the distributions. The proposed test is exactly distribution-free (the null distribution does not depend on the distribution of the data) and can be efficiently applied to multivariate as well as non-Euclidean data, whenever the inter-point distances are well-defined. We show that the test is universally consistent, and prove a distributional limit theorem for the test statistic under general alternatives. Through simulation studies, we demonstrate its superior performance against other common and well-known multisample tests. The method is applied to single cell transcriptomics data obtained from the peripheral blood, cancer tissue, and tumor-adjacent normal tissue of human subjects with hepatocellular carcinoma and non-small-cell lung cancer. Our method unveils patterns in how biochemical metabolic pathways are altered across immune cells in a cancer setting, depending on the tissue location. All of the methods described herein are implemented in the R package multicross. Supplementary materials for this article are available online.
    Keywords blood ; genomics ; hepatoma ; humans ; lung neoplasms ; probability ; transcriptomics
    Language English
    Dates of publication 2022-04-03
    Size p. 627-638.
    Publishing place Taylor & Francis
    Document type Article
    ZDB-ID 2064981-2
    ISSN 1537-274X
    DOI 10.1080/01621459.2020.1791131
    Database NAL-Catalogue (AGRICOLA)

  4. Article ; Online: Nonparametric single-cell multiomic characterization of trio relationships between transcription factors, target genes, and cis-regulatory regions.

    Jiang, Yuchao / Harigaya, Yuriko / Zhang, Zhaojun / Zhang, Hongpan / Zang, Chongzhi / Zhang, Nancy R

    Cell systems

    2022  Volume 13, Issue 9, Page(s) 737–751.e4

    Abstract The epigenetic control of gene expression is highly cell-type and context specific. Yet, despite its complexity, gene regulatory logic can be broken down into modular components consisting of a transcription factor (TF) activating or repressing the target gene expression through its binding to a cis-regulatory region. We propose a nonparametric approach, TRIPOD, to detect and characterize the three-way relationships between a TF, its target gene, and the accessibility of the TF's binding site using single-cell RNA and ATAC multiomic data. We apply TRIPOD to interrogate the cell-type-specific regulatory logic in peripheral blood mononuclear cells and contrast our results to detections from enhancer databases, cis-eQTL studies, ChIP-seq experiments, and TF knockdown/knockout studies. We then apply TRIPOD to mouse embryonic brain data and identify regulatory relationships, validated by ChIP-seq and PLAC-seq. Finally, we demonstrate TRIPOD on the SHARE-seq data of differentiating mouse hair follicle cells and identify lineage-specific regulation supported by histone marks and super-enhancer annotations. A record of this paper's transparent peer review process is included in the supplemental information.
    MeSH term(s) Animals ; Binding Sites/genetics ; Leukocytes, Mononuclear/metabolism ; Mice ; RNA ; Regulatory Sequences, Nucleic Acid ; Transcription Factors/genetics ; Transcription Factors/metabolism
    Chemical Substances Transcription Factors ; RNA (63231-63-0)
    Language English
    Publishing date 2022-09-01
    Publishing country United States
    Document type Journal Article ; Research Support, N.I.H., Extramural
    ZDB-ID 2854138-8
    ISSN (online) 2405-4720
    ISSN 2405-4712
    DOI 10.1016/j.cels.2022.08.004
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Book ; Online: Reply to "Issues arising from benchmarking single-cell RNA sequencing imputation methods"

    Huang, Mo / Zhang, Nancy R.

    2019  

    Abstract In our Brief Communication (DOI:10.1038/s41592-018-0033-z), we presented the method SAVER for recovering true gene expression levels in noisy single cell RNA sequencing data. We evaluated the performance of SAVER, along with comparable methods MAGIC and scImpute, in an RNA FISH validation experiment and a data downsampling experiment. In a Comment [arXiv:1908.07084v1], Li & Li were concerned with the use of the downsampled datasets, specifically focusing on clustering results obtained from the Zeisel et al. data. Here, we will address these comments and, furthermore, amend the data downsampling experiment to demonstrate that the findings from the data downsampling experiment in our Brief Communication are valid.
    Keywords Statistics - Applications ; Quantitative Biology - Genomics ; Quantitative Biology - Quantitative Methods
    Publishing date 2019-09-05
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  6. Article ; Online: Surface protein imputation from single cell transcriptomes by deep neural networks.

    Zhou, Zilu / Ye, Chengzhong / Wang, Jingshu / Zhang, Nancy R

    Nature communications

    2020  Volume 11, Issue 1, Page(s) 651

    Abstract While single cell RNA sequencing (scRNA-seq) is invaluable for studying cell populations, cell-surface proteins are often integral markers of cellular function and serve as primary targets for therapeutic intervention. Here we propose a transfer learning framework, single cell Transcriptome to Protein prediction with deep neural network (cTP-net), to impute surface protein abundances from scRNA-seq data by learning from existing single-cell multi-omic resources.
    MeSH term(s) Cells/cytology ; Cells/metabolism ; Gene Expression Profiling/methods ; Humans ; Membrane Proteins/genetics ; Membrane Proteins/metabolism ; Neural Networks, Computer ; Sequence Analysis, RNA ; Single-Cell Analysis/methods ; Transcriptome
    Chemical Substances Membrane Proteins
    Language English
    Publishing date 2020-01-31
    Publishing country England
    Document type Evaluation Study ; Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't
    ZDB-ID 2553671-0
    ISSN (online) 2041-1723
    DOI 10.1038/s41467-020-14391-0
    Database MEDical Literature Analysis and Retrieval System OnLINE

  7. Article ; Online: DENDRO: genetic heterogeneity profiling and subclone detection by single-cell RNA sequencing.

    Zhou, Zilu / Xu, Bihui / Minn, Andy / Zhang, Nancy R

    Genome biology

    2020  Volume 21, Issue 1, Page(s) 10

    Abstract Although scRNA-seq is now ubiquitously adopted in studies of intratumor heterogeneity, detection of somatic mutations and inference of clonal membership from scRNA-seq is currently unreliable. We propose DENDRO, an analysis method for scRNA-seq data that clusters single cells into genetically distinct subclones and reconstructs the phylogenetic tree relating the subclones. DENDRO utilizes transcribed point mutations and accounts for technical noise and expression stochasticity. We benchmark DENDRO and demonstrate its application on simulation data and real data from three cancer types. In particular, on a mouse melanoma model in response to immunotherapy, DENDRO delineates the role of neoantigens in treatment response.
    MeSH term(s) Animals ; Genetic Heterogeneity ; Genetic Techniques ; Humans ; Mice ; Neoplasms/genetics ; Phylogeny ; Single-Cell Analysis ; Software
    Language English
    Publishing date 2020-01-14
    Publishing country England
    Document type Evaluation Study ; Journal Article ; Research Support, N.I.H., Extramural
    ZDB-ID 2040529-7
    ISSN (online) 1474-760X
    DOI 10.1186/s13059-019-1922-x
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Article: Integration of spatial and single-cell data across modalities with weak linkage.

    Chen, Shuxiao / Zhu, Bokai / Huang, Sijia / Hickey, John W / Lin, Kevin Z / Snyder, Michael / Greenleaf, William J / Nolan, Garry P / Zhang, Nancy R / Ma, Zongming

    bioRxiv : the preprint server for biology

    2023  

    Abstract Single-cell sequencing methods have enabled the profiling of multiple types of molecular readouts at cellular resolution, and recent developments in spatial barcoding, in situ hybridization, and in situ sequencing allow such molecular readouts to retain their spatial context. Since no technology can provide complete characterization across all layers of biological modalities within the same cell, there is a pervasive need for computational cross-modal integration (also called diagonal integration) of single-cell and spatial omics data. For current methods, the feasibility of cross-modal integration relies on the existence of highly correlated, a priori "linked" features. When such linked features are few or uninformative, a scenario that we call "weak linkage", existing methods fail. We developed MaxFuse, a cross-modal data integration method that, through iterative co-embedding, data smoothing, and cell matching, leverages all information in each modality to obtain high-quality integration. MaxFuse is modality-agnostic and, through comprehensive benchmarks on single-cell and spatial ground-truth multiome datasets, demonstrates high robustness and accuracy in the weak linkage scenario. A prototypical example of weak linkage is the integration of spatial proteomic data with single-cell sequencing data. On two example analyses of this type, we demonstrate how MaxFuse enables the spatial consolidation of proteomic, transcriptomic and epigenomic information at single-cell resolution on the same tissue section.
    Language English
    Publishing date 2023-01-16
    Publishing country United States
    Document type Preprint
    DOI 10.1101/2023.01.12.523851
    Database MEDical Literature Analysis and Retrieval System OnLINE

  9. Article: Signal recovery in single cell batch integration.

    Zhang, Zhaojun / Mathew, Divij / Lim, Tristan / Mason, Kaishu / Martinez, Clara Morral / Huang, Sijia / Wherry, E John / Susztak, Katalin / Minn, Andy J / Ma, Zongming / Zhang, Nancy R

    bioRxiv : the preprint server for biology

    2023  

    Abstract Data integration to align cells across batches has become a cornerstone of single cell data analysis, critically affecting downstream results. Yet, how much biological signal is erased during integration? Currently, there are no guidelines for when the biological differences between samples are separable from batch effects, and thus, data integration usually involves a lot of guesswork: Cells across batches should be aligned to be "appropriately" mixed, while preserving "main cell type clusters". We show evidence that current paradigms for single cell data integration are unnecessarily aggressive, removing biologically meaningful variation. To remedy this, we present a novel statistical model and computationally scalable algorithm, CellANOVA, to recover biological signal that is lost during single cell data integration. CellANOVA utilizes a "pool-of-controls" design concept, applicable across diverse settings, to separate unwanted variation from biological variation of interest. When applied with existing integration methods, CellANOVA allows the recovery of subtle biological signals and corrects, to a large extent, the data distortion introduced by integration. Further, CellANOVA explicitly estimates cell- and gene-specific batch effect terms which can be used to identify the cell types and pathways exhibiting the largest batch variations, providing clarity as to which biological signals can be recovered. These concepts are illustrated on studies of diverse designs, where the biological signals that are recovered by CellANOVA are shown to be validated by orthogonal assays. In particular, we show that CellANOVA is effective in the challenging case of single-cell and single-nuclei data integration, where the recovered biological signals are replicated in an independent study.
    Language English
    Publishing date 2023-09-23
    Publishing country United States
    Document type Preprint
    DOI 10.1101/2023.05.05.539614
    Database MEDical Literature Analysis and Retrieval System OnLINE

  10. Article ; Online: SCALE: modeling allele-specific gene expression by single-cell RNA sequencing.

    Jiang, Yuchao / Zhang, Nancy R / Li, Mingyao

    Genome biology

    2017  Volume 18, Issue 1, Page(s) 74

    Abstract Allele-specific expression is traditionally studied by bulk RNA sequencing, which measures average expression across cells. Single-cell RNA sequencing allows the comparison of expression distribution between the two alleles of a diploid organism and the characterization of allele-specific bursting. Here, we propose SCALE to analyze genome-wide allele-specific bursting, with adjustment of technical variability. SCALE detects genes exhibiting allelic differences in bursting parameters and genes whose alleles burst non-independently. We apply SCALE to mouse blastocyst and human fibroblast cells and find that cis control in gene expression overwhelmingly manifests as differences in burst frequency.
    MeSH term(s) Algorithms ; Alleles ; Animals ; Blastocyst/cytology ; Cells, Cultured ; Diploidy ; Fibroblasts/cytology ; Gene Expression Profiling/methods ; Humans ; Mice ; Models, Genetic ; Sequence Analysis, RNA/methods ; Single-Cell Analysis/methods
    Language English
    Publishing date 2017-04-26
    Publishing country England
    Document type Journal Article
    ZDB-ID 2040529-7
    ISSN (online) 1474-760X ; 1465-6914
    ISSN 1465-6906
    DOI 10.1186/s13059-017-1200-8
    Database MEDical Literature Analysis and Retrieval System OnLINE
