Article ; Online: Evaluation of critical data processing steps for reliable prediction of gene co-expression from large collections of RNA-seq data.
2022 Volume 17, Issue 1, Page(s) e0263344
Abstract | Motivation: Gene co-expression analysis is an attractive tool for leveraging enormous amounts of public RNA-seq datasets for the prediction of gene functions and regulatory mechanisms. However, the optimal data processing steps for the accurate prediction of gene co-expression from such large datasets remain unclear. Especially the importance of batch effect correction is understudied. Results: We processed RNA-seq data of 68 human and 76 mouse cell types and tissues using 50 different workflows into 7,200 genome-wide gene co-expression networks. We then conducted a systematic analysis of the factors that result in high-quality co-expression predictions, focusing on normalization, batch effect correction, and measure of correlation. We confirmed the key importance of high sample counts for high-quality predictions. However, choosing a suitable normalization approach and applying batch effect correction can further improve the quality of co-expression estimates, equivalent to a >80% and >40% increase in samples. In larger datasets, batch effect removal was equivalent to a more than doubling of the sample size. Finally, Pearson correlation appears more suitable than Spearman correlation, except for smaller datasets. Conclusion: A key point for accurate prediction of gene co-expression is the collection of many samples. However, paying attention to data normalization, batch effects, and the measure of correlation can significantly improve the quality of co-expression estimates. |
---|---|
MeSH term(s) | Animals ; Databases, Genetic ; Gene Expression Regulation ; Gene Ontology ; Gene Regulatory Networks ; Genome, Human ; Humans ; Linear Models ; Mice ; Models, Genetic ; RNA-Seq ; Reproducibility of Results ; Statistics, Nonparametric |
Language | English |
Publication date | 2022-01-28 |
Country of publication | United States |
Document type | Journal Article ; Research Support, Non-U.S. Gov't |
ZDB-ID | 2267670-3 |
ISSN | 1932-6203 |
ISSN (online) | 1932-6203 |
DOI | 10.1371/journal.pone.0263344 |
Data source | MEDical Literature Analysis and Retrieval System OnLINE |
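
The abstract compares Pearson and Spearman correlation as measures of gene co-expression. The following is a minimal illustrative sketch of how the two measures can differ, not the authors' workflow: the toy expression matrix, the function names `pearson_coexpression` and `spearman_coexpression`, and the random seed are all assumptions made for this example, and no normalization or batch effect correction is applied.

```python
import numpy as np


def pearson_coexpression(expr):
    """Gene-by-gene Pearson correlation; expr has shape (genes, samples)."""
    return np.corrcoef(expr)


def spearman_coexpression(expr):
    """Gene-by-gene Spearman correlation, computed as Pearson on per-gene ranks.

    Ties are not averaged here, which is acceptable for continuous toy values.
    """
    ranks = np.argsort(np.argsort(expr, axis=1), axis=1).astype(float)
    return np.corrcoef(ranks)


# Hypothetical toy data: 3 genes measured in 200 samples.
rng = np.random.default_rng(0)
n_samples = 200
base = rng.normal(size=n_samples)
expr = np.vstack([
    base,                         # gene 0
    np.exp(base),                 # gene 1: monotone, nonlinear function of gene 0
    rng.normal(size=n_samples),   # gene 2: unrelated to gene 0
])

pearson = pearson_coexpression(expr)
spearman = spearman_coexpression(expr)

print("Pearson  gene0 vs gene1:", round(float(pearson[0, 1]), 3))
print("Spearman gene0 vs gene1:", round(float(spearman[0, 1]), 3))
```

In this toy case Spearman treats the monotone but nonlinear pair as perfectly co-expressed, while Pearson reports a weaker association. Which measure performs better on real RNA-seq collections is an empirical question of the kind the article evaluates.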