Article ; Online: Data quality-aware genomic data integration
Computer Methods and Programs in Biomedicine Update, Vol 1, Iss , Pp 100009- (2021)
2021
Abstract | Genomic data are growing at an unprecedented pace, along with new protocols, update policies, formats and guidelines, terminologies and ontologies, which data providers make available every day. In this continuously evolving universe, enforcing quality on data and metadata is increasingly critical. While many aspects of data quality are addressed at each individual source, we focus on the need for a systematic approach when data from several sources are integrated, as such integration is an essential aspect of modern genomic data analysis. Data quality must be assessed from many perspectives, including accessibility, currency, representational consistency, specificity, and reliability. In this article we review the relevant literature and, based on the analysis of many datasets and platforms, we report on methods used to guarantee data quality while integrating heterogeneous data sources. We explore several real-world cases that exemplify more general underlying data quality problems, and we illustrate how they can be resolved with a structured method that is reasonably applicable to other biomedical domains as well. The reviewed methods are implemented in a large framework for the integration of processed genomic data, which is made available to the research community to support tertiary data analysis over Next Generation Sequencing datasets continuously loaded from many open data sources, bringing considerable added value to biological knowledge discovery. |
---|---|
Keywords | Data quality ; Data integration ; Data curation ; Genomic datasets ; Metadata ; Interoperability ; Computer applications to medicine. Medical informatics ; R858-859.7 |
Subject code | 004 |
Language | English |
Publishing date | 2021-01-01T00:00:00Z |
Publisher | Elsevier |
Document type | Article ; Online |
Database | BASE - Bielefeld Academic Search Engine (life sciences selection) |