Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R

Bibliographic Details
Main authors: Schmidt, Carsten Oliver, Struckmann, Stephan, Enzenbach, Cornelia, Reineke, Achim, Stausberg, Jürgen, Damerow, Stefan, Huebner, Marianne, Schmidt, Börge, Sauerbrei, Willi, Richter, Adrian
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8019177/
https://www.ncbi.nlm.nih.gov/pubmed/33810787
http://dx.doi.org/10.1186/s12874-021-01252-7
author Schmidt, Carsten Oliver
Struckmann, Stephan
Enzenbach, Cornelia
Reineke, Achim
Stausberg, Jürgen
Damerow, Stefan
Huebner, Marianne
Schmidt, Börge
Sauerbrei, Willi
Richter, Adrian
collection PubMed
description BACKGROUND: No standards exist for the handling and reporting of data quality in health research. This work introduces a data quality framework for observational health research data collections with supporting software implementations to facilitate harmonized data quality assessments. METHODS: Developments were guided by the evaluation of an existing data quality framework and literature reviews. Functions for the computation of data quality indicators were written in R. The concept and implementations are illustrated based on data from the population-based Study of Health in Pomerania (SHIP). RESULTS: The data quality framework comprises 34 data quality indicators. These target four aspects of data quality: compliance with pre-specified structural and technical requirements (integrity); presence of data values (completeness); inadmissible or uncertain data values and contradictions (consistency); unexpected distributions and associations (accuracy). R functions calculate data quality metrics based on the provided study data and metadata, and R Markdown reports are generated. Guidance on the concept and tools is available through a dedicated website. CONCLUSIONS: The presented data quality framework is the first of its kind for observational health research data collections that links a formal concept to implementations in R. The framework and tools facilitate harmonized data quality assessments in pursuit of transparent and reproducible research. Application scenarios comprise data quality monitoring while a study is carried out as well as performing an initial data analysis before starting substantive scientific analyses, but the developments are also of relevance beyond research.
format Online
Article
Text
id pubmed-8019177
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8019177 2021-04-05; BMC Med Res Methodol; Research Article; BioMed Central 2021-04-02; /pmc/articles/PMC8019177/; /pubmed/33810787; http://dx.doi.org/10.1186/s12874-021-01252-7; Text; en; © The Author(s) 2021
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8019177/
https://www.ncbi.nlm.nih.gov/pubmed/33810787
http://dx.doi.org/10.1186/s12874-021-01252-7