Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R
Format: | Online Article Text |
---|---|
Language: | English |
Published: | BioMed Central, 2021 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8019177/ https://www.ncbi.nlm.nih.gov/pubmed/33810787 http://dx.doi.org/10.1186/s12874-021-01252-7 |
author | Schmidt, Carsten Oliver; Struckmann, Stephan; Enzenbach, Cornelia; Reineke, Achim; Stausberg, Jürgen; Damerow, Stefan; Huebner, Marianne; Schmidt, Börge; Sauerbrei, Willi; Richter, Adrian |
collection | PubMed |
description | BACKGROUND: No standards exist for the handling and reporting of data quality in health research. This work introduces a data quality framework for observational health research data collections, with supporting software implementations to facilitate harmonized data quality assessments. METHODS: Developments were guided by the evaluation of an existing data quality framework and by literature reviews. Functions for the computation of data quality indicators were written in R. The concept and implementations are illustrated with data from the population-based Study of Health in Pomerania (SHIP). RESULTS: The data quality framework comprises 34 data quality indicators. These target four aspects of data quality: compliance with pre-specified structural and technical requirements (integrity); presence of data values (completeness); inadmissible or uncertain data values and contradictions (consistency); and unexpected distributions and associations (accuracy). R functions calculate data quality metrics from the provided study data and metadata, and R Markdown reports are generated. Guidance on the concept and tools is available through a dedicated website. CONCLUSIONS: The presented data quality framework is the first of its kind for observational health research data collections to link a formal concept to implementations in R. The framework and tools facilitate harmonized data quality assessments in pursuit of transparent and reproducible research. Application scenarios include data quality monitoring while a study is being carried out and performing an initial data analysis before substantive scientific analyses begin; the developments are also relevant beyond research. |
id | pubmed-8019177 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
spelling | pubmed-8019177 (2021-04-05); BMC Med Res Methodol; Research Article; BioMed Central, 2021-04-02; /pmc/articles/PMC8019177/ /pubmed/33810787 http://dx.doi.org/10.1186/s12874-021-01252-7; Text; en; © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
topic | Research Article |
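The description field above states that R functions compute data quality metrics from the study data together with metadata. As a rough illustration only, the following Python sketch (not the authors' R implementation and not any of the framework's 34 indicators verbatim) runs one simple check for each of three dimensions: whether a variable declared in the metadata is present in the data (integrity), the share of missing values (completeness), and the share of observed values outside metadata-defined admissible limits (consistency). All names (`VariableMetadata`, `lower_limit`, `upper_limit`, `missing_rate`, `inadmissible_rate`, `assess`) and the toy values are hypothetical; they are not SHIP data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class VariableMetadata:
    """Hypothetical metadata for one study variable (illustration only)."""

    name: str
    lower_limit: Optional[float] = None  # observed values below this are inadmissible
    upper_limit: Optional[float] = None  # observed values above this are inadmissible


def missing_rate(values: Sequence[Optional[float]]) -> float:
    """Completeness sketch: share of entries that are None (0.0 for an empty sequence)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def inadmissible_rate(values: Sequence[Optional[float]], meta: VariableMetadata) -> float:
    """Consistency sketch: share of non-missing values outside the metadata limits."""
    observed = [v for v in values if v is not None]
    if not observed:
        return 0.0

    def out_of_range(v: float) -> bool:
        below = meta.lower_limit is not None and v < meta.lower_limit
        above = meta.upper_limit is not None and v > meta.upper_limit
        return below or above

    return sum(out_of_range(v) for v in observed) / len(observed)


def assess(
    data: Dict[str, List[Optional[float]]], metadata: List[VariableMetadata]
) -> Dict[str, Dict[str, object]]:
    """Run all three checks for every variable declared in the metadata."""
    report: Dict[str, Dict[str, object]] = {}
    for meta in metadata:
        if meta.name not in data:
            # Integrity sketch: a variable declared in the metadata is absent from the data.
            report[meta.name] = {"present": False}
            continue
        values = data[meta.name]
        report[meta.name] = {
            "present": True,
            "missing_rate": missing_rate(values),
            "inadmissible_rate": inadmissible_rate(values, meta),
        }
    return report


if __name__ == "__main__":
    # Toy values invented for illustration; these are not SHIP data.
    data = {
        "sbp": [120.0, None, 135.0, 400.0],
        "age": [34.0, 51.0, -2.0, None],
    }
    metadata = [
        VariableMetadata("sbp", lower_limit=60, upper_limit=260),
        VariableMetadata("age", lower_limit=0, upper_limit=120),
        VariableMetadata("bmi", lower_limit=10, upper_limit=80),
    ]
    for name, result in assess(data, metadata).items():
        print(name, result)
```

With the toy data, `sbp` has one missing entry out of four (missing rate 0.25) and one of its three observed values above the upper limit (inadmissible rate of one third), while `bmi` is reported as absent. Keeping the admissible limits in a separate metadata object, rather than hard-coding them in each check, reflects the description's point that metrics are computed from study data and metadata together.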