
Using a Data Quality Framework to Clean Data Extracted from the Electronic Health Record: A Case Study

OBJECTIVES: We examine the following: (1) the appropriateness of using a data quality (DQ) framework developed for relational databases as a data-cleaning tool for a data set extracted from two EPIC databases, and (2) the differences in statistical parameter estimates on a data set cleaned with the...


Bibliographic Details
Main authors: Dziadkowiec, Oliwier, Callahan, Tiffany, Ozkaynak, Mustafa, Reeder, Blaine, Welton, John
Format: Online Article Text
Language: English
Published: AcademyHealth 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4933574/
https://www.ncbi.nlm.nih.gov/pubmed/27429992
http://dx.doi.org/10.13063/2327-9214.1201
author Dziadkowiec, Oliwier
Callahan, Tiffany
Ozkaynak, Mustafa
Reeder, Blaine
Welton, John
author_facet Dziadkowiec, Oliwier
Callahan, Tiffany
Ozkaynak, Mustafa
Reeder, Blaine
Welton, John
author_sort Dziadkowiec, Oliwier
collection PubMed
description OBJECTIVES: We examine the following: (1) the appropriateness of using a data quality (DQ) framework developed for relational databases as a data-cleaning tool for a data set extracted from two EPIC databases, and (2) the differences in statistical parameter estimates between a data set cleaned with the DQ framework and a data set not cleaned with the DQ framework. BACKGROUND: The use of data contained within electronic health records (EHRs) has the potential to open doors for a new wave of innovative research. Without adequate preparation of such large data sets for analysis, the results might be erroneous, which might affect clinical decision-making or the results of Comparative Effectiveness Research studies. METHODS: Two emergency department (ED) data sets extracted from EPIC databases (adult ED and children ED) were used as examples for examining the five concepts of DQ based on a DQ assessment framework designed for EHR databases. The first data set contained 70,061 visits, and the second data set contained 2,815,550 visits. SPSS Syntax examples as well as step-by-step instructions on how to apply the five key DQ concepts to these EHR database extracts are provided. CONCLUSIONS: SPSS Syntax to address each of the DQ concepts proposed by Kahn et al. (2012) was developed. The data set cleaned using Kahn’s framework yielded more accurate results than the data set cleaned without this framework. Future plans involve creating functions in the R language for cleaning data extracted from the EHR as well as an R package that combines DQ checks with missing data analysis functions.
format Online
Article
Text
id pubmed-4933574
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher AcademyHealth
record_format MEDLINE/PubMed
spelling pubmed-4933574 2016-07-15 Using a Data Quality Framework to Clean Data Extracted from the Electronic Health Record: A Case Study Dziadkowiec, Oliwier Callahan, Tiffany Ozkaynak, Mustafa Reeder, Blaine Welton, John EGEMS (Wash DC) Articles OBJECTIVES: We examine the following: (1) the appropriateness of using a data quality (DQ) framework developed for relational databases as a data-cleaning tool for a data set extracted from two EPIC databases, and (2) the differences in statistical parameter estimates between a data set cleaned with the DQ framework and a data set not cleaned with the DQ framework. BACKGROUND: The use of data contained within electronic health records (EHRs) has the potential to open doors for a new wave of innovative research. Without adequate preparation of such large data sets for analysis, the results might be erroneous, which might affect clinical decision-making or the results of Comparative Effectiveness Research studies. METHODS: Two emergency department (ED) data sets extracted from EPIC databases (adult ED and children ED) were used as examples for examining the five concepts of DQ based on a DQ assessment framework designed for EHR databases. The first data set contained 70,061 visits, and the second data set contained 2,815,550 visits. SPSS Syntax examples as well as step-by-step instructions on how to apply the five key DQ concepts to these EHR database extracts are provided. CONCLUSIONS: SPSS Syntax to address each of the DQ concepts proposed by Kahn et al. (2012) was developed. The data set cleaned using Kahn’s framework yielded more accurate results than the data set cleaned without this framework. Future plans involve creating functions in the R language for cleaning data extracted from the EHR as well as an R package that combines DQ checks with missing data analysis functions.
AcademyHealth 2016-06-24 /pmc/articles/PMC4933574/ /pubmed/27429992 http://dx.doi.org/10.13063/2327-9214.1201 Text en All eGEMs publications are licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License http://creativecommons.org/licenses/by-nc-nd/3.0/
spellingShingle Articles
Dziadkowiec, Oliwier
Callahan, Tiffany
Ozkaynak, Mustafa
Reeder, Blaine
Welton, John
Using a Data Quality Framework to Clean Data Extracted from the Electronic Health Record: A Case Study
title Using a Data Quality Framework to Clean Data Extracted from the Electronic Health Record: A Case Study
title_sort using a data quality framework to clean data extracted from the electronic health record: a case study
topic Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4933574/
https://www.ncbi.nlm.nih.gov/pubmed/27429992
http://dx.doi.org/10.13063/2327-9214.1201