Using a Data Quality Framework to Clean Data Extracted from the Electronic Health Record: A Case Study
Main authors: | Dziadkowiec, Oliwier; Callahan, Tiffany; Ozkaynak, Mustafa; Reeder, Blaine; Welton, John |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | AcademyHealth, 2016 |
Subjects: | Articles |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4933574/ https://www.ncbi.nlm.nih.gov/pubmed/27429992 http://dx.doi.org/10.13063/2327-9214.1201 |
author | Dziadkowiec, Oliwier; Callahan, Tiffany; Ozkaynak, Mustafa; Reeder, Blaine; Welton, John |
---|---|
collection | PubMed |
description | OBJECTIVES: We examine (1) the appropriateness of using a data quality (DQ) framework developed for relational databases as a data-cleaning tool for a data set extracted from two EPIC databases, and (2) the differences in statistical parameter estimates between a data set cleaned with the DQ framework and a data set not cleaned with it. BACKGROUND: The use of data contained within electronic health records (EHRs) has the potential to open doors for a new wave of innovative research. Without adequate preparation of such large data sets for analysis, the results may be erroneous, which could affect clinical decision-making or the results of Comparative Effectiveness Research studies. METHODS: Two emergency department (ED) data sets extracted from EPIC databases (adult ED and children's ED) were used as examples for examining the five concepts of DQ defined in a DQ assessment framework designed for EHR databases. The first data set contained 70,061 visits and the second contained 2,815,550 visits. SPSS Syntax examples, as well as step-by-step instructions for applying the five key DQ concepts to these EHR database extracts, are provided. CONCLUSIONS: SPSS Syntax addressing each of the DQ concepts proposed by Kahn et al. (2012) was developed. The data set cleaned using Kahn’s framework yielded more accurate results than the data set cleaned without this framework. Future plans involve creating functions in the R language for cleaning data extracted from the EHR, as well as an R package that combines DQ checks with missing data analysis functions. |
format | Online Article Text |
id | pubmed-4933574 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | AcademyHealth |
record_format | MEDLINE/PubMed |
journal | EGEMS (Wash DC) |
publication date | 2016-06-24 |
record date | 2016-07-15 |
license | All eGEMs publications are licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License (http://creativecommons.org/licenses/by-nc-nd/3.0/) |
topic | Articles |
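The description states that the study implemented the DQ concepts of Kahn et al. (2012) as SPSS Syntax, which this record does not reproduce. As a rough illustration only, the Python sketch below flags ED visit records that fail a few checks of the general kind such frameworks describe. It includes a value-range check, a referential-integrity check, a temporal-ordering check, and a uniqueness check. The `Visit` fields, the 0 to 120 age bounds, and the `dq_flags` function are hypothetical. They are not the authors' code, schema, or rules.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Visit:
    # Hypothetical ED visit fields; not the schema used in the study.
    visit_id: str
    patient_id: str
    age_years: int
    arrival: datetime
    departure: datetime


def dq_flags(visits: list[Visit], known_patients: set[str]) -> dict[str, list[str]]:
    """Return, for each check name, the visit_ids that fail that check."""
    flags: dict[str, list[str]] = {
        "age_out_of_range": [],
        "unknown_patient": [],
        "departure_before_arrival": [],
        "duplicate_visit_id": [],
    }
    seen: set[str] = set()
    for v in visits:
        # Value-range check: assumed plausible age bounds of 0 to 120 years.
        if not 0 <= v.age_years <= 120:
            flags["age_out_of_range"].append(v.visit_id)
        # Referential-integrity check: the visit must point to a known patient.
        if v.patient_id not in known_patients:
            flags["unknown_patient"].append(v.visit_id)
        # Temporal-ordering check: a visit cannot end before it starts.
        if v.departure < v.arrival:
            flags["departure_before_arrival"].append(v.visit_id)
        # Uniqueness check: each visit_id should appear only once.
        if v.visit_id in seen:
            flags["duplicate_visit_id"].append(v.visit_id)
        seen.add(v.visit_id)
    return flags


if __name__ == "__main__":
    t0 = datetime(2016, 1, 1, 8, 0)
    t1 = datetime(2016, 1, 1, 10, 30)
    visits = [
        Visit("V1", "P1", 34, t0, t1),
        Visit("V2", "P9", 150, t1, t0),  # bad age, unknown patient, reversed times
        Visit("V1", "P1", 34, t0, t1),   # repeated visit_id
    ]
    for check, ids in dq_flags(visits, {"P1", "P2"}).items():
        print(check, ids)
```

Returning flagged IDs per check, rather than silently dropping rows, keeps detection separate from the cleaning decision. Each flagged record can then be reviewed, corrected, or excluded before analysis.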