Article ; Online: Multivariate phenotype analysis enables genome-wide inference of mammalian gene function.
2022 Volume 20, Issue 8, Page(s) e3001723
Abstract | The function of the majority of genes in the human and mouse genomes is unknown. Investigating and illuminating this dark genome is a major challenge for the biomedical sciences. The International Mouse Phenotyping Consortium (IMPC) is addressing this through the generation and broad-based phenotyping of a knockout (KO) mouse line for every protein-coding gene, producing a multidimensional data set that underlies a genome-wide annotation map from genes to phenotypes. Here, we develop a multivariate (MV) statistical approach and apply it to IMPC data comprising 148 phenotypes measured across 4,548 KO lines. There are 4,256 (1.4% of 302,997 observed data measurements) hits called by the univariate (UV) model analysing each phenotype separately, compared to 31,843 (10.5%) hits in the observed data results of the MV model, corresponding to an estimated 7.5-fold increase in power of the MV model relative to the UV model. One key property of the data set is its 55.0% rate of missingness, resulting from quality control filters and incomplete measurement of some KO lines. This raises the question of whether it is possible to infer perturbations at phenotype-gene pairs at which data are not available, i.e., to infer some in vivo effects using statistical analysis rather than experimentation. We demonstrate that, even at missing phenotypes, the MV model can detect perturbations with power comparable to the single-phenotype analysis, thereby filling in the complete gene-phenotype map with good sensitivity. A factor analysis of the MV model's fitted covariance structure identifies 20 clusters of phenotypes, with each cluster tending to be perturbed collectively. These factors cumulatively explain 75% of the KO-induced variation in the data and facilitate biological interpretation of perturbations. We also demonstrate that the MV approach strengthens the correspondence between IMPC phenotypes and existing gene annotation databases. Analysis of a subset of KO lines measured in replicate across multiple laboratories confirms that the MV model increases power with high replicability. |
---|---|
MeSH term(s) | Animals ; Databases, Factual ; Genome/genetics ; Humans ; Mammals/genetics ; Mice ; Mice, Knockout ; Molecular Sequence Annotation ; Phenotype |
Language | English |
Publishing date | 2022-08-09 |
Publishing country | United States |
Document type | Journal Article ; Research Support, Non-U.S. Gov't ; Research Support, N.I.H., Extramural |
ZDB-ID | 2126776-5 |
ISSN (online) | 1545-7885 |
ISSN | 1544-9173 |
DOI | 10.1371/journal.pbio.3001723 |
Database | MEDical Literature Analysis and Retrieval System OnLINE |
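
The percentages quoted in the abstract can be cross-checked from the raw counts it gives. The short Python sketch below is only an arithmetic check, not the authors' analysis. It assumes that the full data grid is every combination of the 148 phenotypes and 4,548 KO lines, and that the reported fold increase is close to the simple ratio of MV to UV hit counts; the abstract states neither explicitly, and all variable names are illustrative.

```python
# Counts quoted in the abstract.
n_phenotypes = 148
n_ko_lines = 4_548
n_observed = 302_997   # observed data measurements
uv_hits = 4_256        # hits called by the univariate (UV) model
mv_hits = 31_843       # hits called by the multivariate (MV) model on observed data

# Assumption: the complete grid is one cell per phenotype-KO line pair.
n_pairs = n_phenotypes * n_ko_lines

missing_rate = 1 - n_observed / n_pairs   # share of grid cells without data
uv_rate = uv_hits / n_observed            # UV hit rate among observed cells
mv_rate = mv_hits / n_observed            # MV hit rate among observed cells

# Assumption: a plain ratio of hit counts, used here only as a rough check
# against the "estimated 7.5-fold increase in power" stated in the abstract.
hit_ratio = mv_hits / uv_hits

print(f"phenotype-gene pairs: {n_pairs:,}")
print(f"missingness:          {missing_rate:.1%}")
print(f"UV hit rate:          {uv_rate:.1%}")
print(f"MV hit rate:          {mv_rate:.1%}")
print(f"MV/UV hit ratio:      {hit_ratio:.1f}")
```

Under these assumptions the computed values round to the figures in the abstract (55.0% missingness, 1.4% and 10.5% hit rates, and a ratio of about 7.5).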