Design and Refinement of a Data Quality Assessment Workflow for a Large Pediatric Research Network

BACKGROUND: Clinical data research networks (CDRNs) aggregate electronic health record data from multiple hospitals to enable large-scale research. A critical operation toward building a CDRN is conducting continual evaluations to optimize data quality. The key challenges include determining the ass...

Bibliographic Details
Main authors: Khare, Ritu, Utidjian, Levon H., Razzaghi, Hanieh, Soucek, Victoria, Burrows, Evanette, Eckrich, Daniel, Hoyt, Richard, Weinstein, Harris, Miller, Matthew W., Soler, David, Tucker, Joshua, Bailey, L. Charles
Format: Online Article Text
Language: English
Published: Ubiquity Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6676917/
https://www.ncbi.nlm.nih.gov/pubmed/31531382
http://dx.doi.org/10.5334/egems.294
_version_ 1783440853284421632
author Khare, Ritu
Utidjian, Levon H.
Razzaghi, Hanieh
Soucek, Victoria
Burrows, Evanette
Eckrich, Daniel
Hoyt, Richard
Weinstein, Harris
Miller, Matthew W.
Soler, David
Tucker, Joshua
Bailey, L. Charles
author_facet Khare, Ritu
Utidjian, Levon H.
Razzaghi, Hanieh
Soucek, Victoria
Burrows, Evanette
Eckrich, Daniel
Hoyt, Richard
Weinstein, Harris
Miller, Matthew W.
Soler, David
Tucker, Joshua
Bailey, L. Charles
author_sort Khare, Ritu
collection PubMed
description BACKGROUND: Clinical data research networks (CDRNs) aggregate electronic health record data from multiple hospitals to enable large-scale research. A critical operation toward building a CDRN is conducting continual evaluations to optimize data quality. The key challenges include determining the assessment coverage on big datasets, handling data variability over time, and facilitating communication with data teams. This study presents the evolution of a systematic workflow for data quality assessment in CDRNs. IMPLEMENTATION: Using a specific CDRN as a use case, the workflow was iteratively developed and packaged into a toolkit. The resultant toolkit comprises 685 data quality checks to identify any data quality issues, procedures to reconcile them with a history of known issues, and a contemporary GitHub-based reporting mechanism for organized tracking. RESULTS: During the first two years of network development, the toolkit assisted in discovering over 800 data characteristics and resolving over 1400 programming errors. Longitudinal analysis indicated that the variability in time to resolution (15-day mean, 24-day IQR) is due to the underlying cause of the issue, perceived importance of the domain, and the complexity of assessment. CONCLUSIONS: In the absence of a formalized data quality framework, CDRNs continue to face challenges in data management and query fulfillment. The proposed data quality toolkit was empirically validated on a particular network, and is publicly available for other networks. While the toolkit is user-friendly and effective, the usage statistics indicated that the data quality process is very time-intensive and sufficient resources should be dedicated to investigating problems and optimizing data for research.
format Online
Article
Text
id pubmed-6676917
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Ubiquity Press
record_format MEDLINE/PubMed
spelling pubmed-66769172019-09-17 Design and Refinement of a Data Quality Assessment Workflow for a Large Pediatric Research Network Khare, Ritu Utidjian, Levon H. Razzaghi, Hanieh Soucek, Victoria Burrows, Evanette Eckrich, Daniel Hoyt, Richard Weinstein, Harris Miller, Matthew W. Soler, David Tucker, Joshua Bailey, L. Charles EGEMS (Wash DC) Model/Framework BACKGROUND: Clinical data research networks (CDRNs) aggregate electronic health record data from multiple hospitals to enable large-scale research. A critical operation toward building a CDRN is conducting continual evaluations to optimize data quality. The key challenges include determining the assessment coverage on big datasets, handling data variability over time, and facilitating communication with data teams. This study presents the evolution of a systematic workflow for data quality assessment in CDRNs. IMPLEMENTATION: Using a specific CDRN as a use case, the workflow was iteratively developed and packaged into a toolkit. The resultant toolkit comprises 685 data quality checks to identify any data quality issues, procedures to reconcile them with a history of known issues, and a contemporary GitHub-based reporting mechanism for organized tracking. RESULTS: During the first two years of network development, the toolkit assisted in discovering over 800 data characteristics and resolving over 1400 programming errors. Longitudinal analysis indicated that the variability in time to resolution (15-day mean, 24-day IQR) is due to the underlying cause of the issue, perceived importance of the domain, and the complexity of assessment. CONCLUSIONS: In the absence of a formalized data quality framework, CDRNs continue to face challenges in data management and query fulfillment. The proposed data quality toolkit was empirically validated on a particular network, and is publicly available for other networks. 
While the toolkit is user-friendly and effective, the usage statistics indicated that the data quality process is very time-intensive and sufficient resources should be dedicated to investigating problems and optimizing data for research. Ubiquity Press 2019-08-01 /pmc/articles/PMC6676917/ /pubmed/31531382 http://dx.doi.org/10.5334/egems.294 Text en Copyright: © 2019 The Author(s) http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/.
spellingShingle Model/Framework
Khare, Ritu
Utidjian, Levon H.
Razzaghi, Hanieh
Soucek, Victoria
Burrows, Evanette
Eckrich, Daniel
Hoyt, Richard
Weinstein, Harris
Miller, Matthew W.
Soler, David
Tucker, Joshua
Bailey, L. Charles
Design and Refinement of a Data Quality Assessment Workflow for a Large Pediatric Research Network
title Design and Refinement of a Data Quality Assessment Workflow for a Large Pediatric Research Network
title_full Design and Refinement of a Data Quality Assessment Workflow for a Large Pediatric Research Network
title_fullStr Design and Refinement of a Data Quality Assessment Workflow for a Large Pediatric Research Network
title_full_unstemmed Design and Refinement of a Data Quality Assessment Workflow for a Large Pediatric Research Network
title_short Design and Refinement of a Data Quality Assessment Workflow for a Large Pediatric Research Network
title_sort design and refinement of a data quality assessment workflow for a large pediatric research network
topic Model/Framework
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6676917/
https://www.ncbi.nlm.nih.gov/pubmed/31531382
http://dx.doi.org/10.5334/egems.294