LIVIVO - The Search Portal for Life Sciences

Search results

Results 1–9 of 9

  1. Book ; Online: Sharp finite-sample concentration of independent variables

    Balsubramani, Akshay

    2020  

    Abstract We show an extension of Sanov's theorem on large deviations, controlling the tail probabilities of i.i.d. random variables with matching concentration and anti-concentration bounds. This result has a general scope, applies to samples of any size, and has a short information-theoretic proof using elementary techniques.
    Keywords Computer Science - Machine Learning ; Computer Science - Information Theory ; Mathematics - Probability ; Statistics - Machine Learning
    Publishing date 2020-08-30
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  2. Article ; Online: Accelerating in silico saturation mutagenesis using compressed sensing.

    Schreiber, Jacob / Nair, Surag / Balsubramani, Akshay / Kundaje, Anshul

    Bioinformatics (Oxford, England)

    2022  Volume 38, Issue 14, Page(s) 3557–3564

    Abstract Motivation: In silico saturation mutagenesis (ISM) is a popular approach in computational genomics for calculating feature attributions on biological sequences that proceeds by systematically perturbing each position in a sequence and recording the difference in model output. However, this method can be slow because systematically perturbing each position requires performing a number of forward passes proportional to the length of the sequence being examined.
    Results: In this work, we propose a modification of ISM that leverages the principles of compressed sensing to require only a constant number of forward passes, regardless of sequence length, when applied to models that contain operations with a limited receptive field, such as convolutions. Our method, named Yuzu, can reduce the time that ISM spends in convolution operations by several orders of magnitude and, consequently, Yuzu can speed up ISM on several commonly used architectures in genomics by over an order of magnitude. Notably, we found that Yuzu provides speedups that increase with the complexity of the convolution operation and the length of the sequence being analyzed, suggesting that Yuzu provides large benefits in realistic settings.
    Availability and implementation: We have made this tool available at https://github.com/kundajelab/yuzu.
    MeSH term(s) Mutagenesis ; Genomics/methods
    Language English
    Publishing date 2022-05-18
    Publishing country England
    Document type Journal Article ; Research Support, N.I.H., Extramural
    ZDB-ID 1422668-6
    ISSN (online) 1367-4811
    ISSN 1367-4803
    DOI 10.1093/bioinformatics/btac385
    Database MEDical Literature Analysis and Retrieval System OnLINE

  3. Article ; Online: Domain-adaptive neural networks improve cross-species prediction of transcription factor binding.

    Cochran, Kelly / Srivastava, Divyanshi / Shrikumar, Avanti / Balsubramani, Akshay / Hardison, Ross C / Kundaje, Anshul / Mahony, Shaun

    Genome research

    2022  Volume 32, Issue 3, Page(s) 512–523

    Abstract The intrinsic DNA sequence preferences and cell type-specific cooperative partners of transcription factors (TFs) are typically highly conserved. Hence, despite the rapid evolutionary turnover of individual TF binding sites, predictive sequence models of cell type-specific genomic occupancy of a TF in one species should generalize to closely matched cell types in a related species. To assess the viability of cross-species TF binding prediction, we train neural networks to discriminate ChIP-seq peak locations from genomic background and evaluate their performance within and across species. Cross-species predictive performance is consistently worse than within-species performance, which we show is caused in part by species-specific repeats. To account for this domain shift, we use an augmented network architecture to automatically discourage learning of training species-specific sequence features. This domain adaptation approach corrects for prediction errors on species-specific repeats and improves overall cross-species model performance. Our results show that cross-species TF binding prediction is feasible when models account for domain shifts driven by species-specific repeats.
    MeSH term(s) Binding Sites ; Chromatin Immunoprecipitation Sequencing ; Computational Biology/methods ; Neural Networks, Computer ; Protein Binding ; Transcription Factors/metabolism
    Chemical Substances Transcription Factors
    Language English
    Publishing date 2022-01-18
    Publishing country United States
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't
    ZDB-ID 1284872-4
    ISSN (online) 1549-5469
    ISSN 1088-9051 ; 1054-9803
    DOI 10.1101/gr.275394.121
    Database MEDical Literature Analysis and Retrieval System OnLINE

  4. Article: Transcription factor stoichiometry, motif affinity and syntax regulate single-cell chromatin dynamics during fibroblast reprogramming to pluripotency.

    Nair, Surag / Ameen, Mohamed / Sundaram, Laksshman / Pampari, Anusri / Schreiber, Jacob / Balsubramani, Akshay / Wang, Yu Xin / Burns, David / Blau, Helen M / Karakikes, Ioannis / Wang, Kevin C / Kundaje, Anshul

    bioRxiv : the preprint server for biology

    2023  

    Abstract Ectopic expression of
    Language English
    Publishing date 2023-10-21
    Publishing country United States
    Document type Preprint
    DOI 10.1101/2023.10.04.560808
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Book ; Online: An adaptive nearest neighbor rule for classification

    Balsubramani, Akshay / Dasgupta, Sanjoy / Freund, Yoav / Moran, Shay

    2019  

    Abstract We introduce a variant of the $k$-nearest neighbor classifier in which $k$ is chosen adaptively for each query, rather than supplied as a parameter. The choice of $k$ depends on properties of each neighborhood, and therefore may significantly vary between different points. (For example, the algorithm will use larger $k$ for predicting the labels of points in noisy regions.) We provide theory and experiments that demonstrate that the algorithm performs comparably to, and sometimes better than, $k$-NN with an optimal choice of $k$. In particular, we derive bounds on the convergence rates of our classifier that depend on a local quantity we call the 'advantage', which is significantly weaker than the Lipschitz conditions used in previous convergence rate proofs. These generalization bounds hinge on a variant of the seminal Uniform Convergence Theorem due to Vapnik and Chervonenkis; this variant concerns conditional probabilities and may be of independent interest.
    Keywords Computer Science - Machine Learning ; Computer Science - Artificial Intelligence ; Statistics - Machine Learning
    Subject code 006
    Publishing date 2019-05-29
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  6. Book ; Online: Learning transport cost from subset correspondence

    Liu, Ruishan / Balsubramani, Akshay / Zou, James

    2019  

    Abstract Learning to align multiple datasets is an important problem with many applications, and it is especially useful when we need to integrate multiple experiments or correct for confounding. Optimal transport (OT) is a principled approach to align datasets, but a key challenge in applying OT is that we need to specify a transport cost function that accurately captures how the two datasets are related. Reliable cost functions are typically not available and practitioners often resort to using hand-crafted or Euclidean cost even if it may not be appropriate. In this work, we investigate how to learn the cost function using a small amount of side information which is often available. The side information we consider captures subset correspondence -- i.e. certain subsets of points in the two datasets are known to be related. For example, we may have some images labeled as cars in both datasets; or we may have a common annotated cell type in single-cell data from two batches. We develop an end-to-end optimizer (OT-SI) that differentiates through the Sinkhorn algorithm and effectively learns the suitable cost function from side information. On systematic experiments in images, marriage-matching and single-cell RNA-seq, our method substantially outperforms state-of-the-art benchmarks.
    Keywords Computer Science - Machine Learning ; Statistics - Machine Learning
    Subject code 006
    Publishing date 2019-09-29
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  7. Article ; Online: Integrative single-cell analysis of cardiogenesis identifies developmental trajectories and non-coding mutations in congenital heart disease.

    Ameen, Mohamed / Sundaram, Laksshman / Shen, Mengcheng / Banerjee, Abhimanyu / Kundu, Soumya / Nair, Surag / Shcherbina, Anna / Gu, Mingxia / Wilson, Kitchener D / Varadarajan, Avyay / Vadgama, Nirmal / Balsubramani, Akshay / Wu, Joseph C / Engreitz, Jesse M / Farh, Kyle / Karakikes, Ioannis / Wang, Kevin C / Quertermous, Thomas / Greenleaf, William J / Kundaje, Anshul

    Cell

    2023  Volume 185, Issue 26, Page(s) 4937–4953.e23

    Abstract To define the multi-cellular epigenomic and transcriptional landscape of cardiac cellular development, we generated single-cell chromatin accessibility maps of human fetal heart tissues. We identified eight major differentiation trajectories involving primary cardiac cell types, each associated with dynamic transcription factor (TF) activity signatures. We contrasted regulatory landscapes of iPSC-derived cardiac cell types and their in vivo counterparts, which enabled optimization of in vitro differentiation of epicardial cells. Further, we interpreted sequence based deep learning models of cell-type-resolved chromatin accessibility profiles to decipher underlying TF motif lexicons. De novo mutations predicted to affect chromatin accessibility in arterial endothelium were enriched in congenital heart disease (CHD) cases vs. controls. In vitro studies in iPSCs validated the functional impact of identified variation on the predicted developmental cell types. This work thus defines the cell-type-resolved cis-regulatory sequence determinants of heart development and identifies disruption of cell type-specific regulatory elements in CHD.
    MeSH term(s) Humans ; Chromatin/genetics ; Heart Defects, Congenital/genetics ; Heart ; Mutation ; Single-Cell Analysis
    Chemical Substances Chromatin
    Language English
    Publishing date 2023-01-07
    Publishing country United States
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, U.S. Gov't, Non-P.H.S. ; Research Support, Non-U.S. Gov't
    ZDB-ID 187009-9
    ISSN (online) 1097-4172
    ISSN 0092-8674
    DOI 10.1016/j.cell.2022.11.028
    Database MEDical Literature Analysis and Retrieval System OnLINE

  8. Article ; Online: A genome-wide atlas of co-essential modules assigns function to uncharacterized genes.

    Wainberg, Michael / Kamber, Roarke A / Balsubramani, Akshay / Meyers, Robin M / Sinnott-Armstrong, Nasa / Hornburg, Daniel / Jiang, Lihua / Chan, Joanne / Jian, Ruiqi / Gu, Mingxin / Shcherbina, Anna / Dubreuil, Michael M / Spees, Kaitlyn / Meuleman, Wouter / Snyder, Michael P / Bassik, Michael C / Kundaje, Anshul

    Nature genetics

    2021  Volume 53, Issue 5, Page(s) 638–649

    Abstract A central question in the post-genomic era is how genes interact to form biological pathways. Measurements of gene dependency across hundreds of cell lines have been used to cluster genes into 'co-essential' pathways, but this approach has been limited by ubiquitous false positives. In the present study, we develop a statistical method that enables robust identification of gene co-essentiality and yields a genome-wide set of functional modules. This atlas recapitulates diverse pathways and protein complexes, and predicts the functions of 108 uncharacterized genes. Validating top predictions, we show that TMEM189 encodes plasmanylethanolamine desaturase, a key enzyme for plasmalogen synthesis. We also show that C15orf57 encodes a protein that binds the AP2 complex, localizes to clathrin-coated pits and enables efficient transferrin uptake. Finally, we provide an interactive webtool for the community to explore our results, which establish co-essentiality profiling as a powerful resource for biological pathway identification and discovery of new gene functions.
    MeSH term(s) Clathrin/metabolism ; Endocytosis ; Epigenesis, Genetic ; Gene Expression Regulation ; Gene Regulatory Networks ; Genes ; Genome ; HeLa Cells ; Humans ; Molecular Sequence Annotation ; Neoplasms/genetics ; Plasmalogens/biosynthesis ; Signal Transduction/genetics
    Chemical Substances Clathrin ; Plasmalogens
    Language English
    Publishing date 2021-04-15
    Publishing country United States
    Document type Journal Article ; Research Support, N.I.H., Extramural ; Research Support, Non-U.S. Gov't
    ZDB-ID 1108734-1
    ISSN (online) 1546-1718
    ISSN 1061-4036
    DOI 10.1038/s41588-021-00840-z
    Database MEDical Literature Analysis and Retrieval System OnLINE

  9. Book ; Online: WILDS: A Benchmark of in-the-Wild Distribution Shifts

    Koh, Pang Wei / Sagawa, Shiori / Marklund, Henrik / Xie, Sang Michael / Zhang, Marvin / Balsubramani, Akshay / Hu, Weihua / Yasunaga, Michihiro / Phillips, Richard Lanas / Gao, Irena / Lee, Tony / David, Etienne / Stavness, Ian / Guo, Wei / Earnshaw, Berton A. / Haque, Imran S. / Beery, Sara / Leskovec, Jure / Kundaje, Anshul / Pierson, Emma / Levine, Sergey / Finn, Chelsea / Liang, Percy

    2020  

    Abstract Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild. Despite their ubiquity, these real-world distribution shifts are under-represented in the datasets widely used in the ML community today. To address this gap, we present WILDS, a curated collection of 8 benchmark datasets that reflect a diverse range of distribution shifts which naturally arise in real-world applications, such as shifts across hospitals for tumor identification; across camera traps for wildlife monitoring; and across time and location in satellite imaging and poverty mapping. On each dataset, we show that standard training results in substantially lower out-of-distribution than in-distribution performance, and that this gap remains even with models trained by existing methods for handling distribution shifts. This underscores the need for new training methods that produce models which are more robust to the types of distribution shifts that arise in practice. To facilitate method development, we provide an open-source package that automates dataset loading, contains default model architectures and hyperparameters, and standardizes evaluations. Code and leaderboards are available at https://wilds.stanford.edu.
    Keywords Computer Science - Machine Learning
    Subject code 519
    Publishing date 2020-12-14
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
