
Strategies for Handling Missing Data in Electronic Health Record Derived Data

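Electronic health records (EHRs) present a wealth of data that are vital for improving patient-centered outcomes, although the data can present significant statistical challenges. In particular, EHR data contains substantial missing information that if left unaddressed could reduce the validity of conclusions drawn. Properly addressing the missing data issue in EHR data is complicated by the fact that it is sometimes difficult to differentiate between missing data and a negative value. For example, a patient without a documented history of heart failure may truly not have disease or the clinician may have simply not documented the condition. Approaches for reducing missing data in EHR systems come from multiple angles, including: increasing structured data documentation, reducing data input errors, and utilization of text parsing / natural language processing. This paper focuses on the analytical approaches for handling missing data, primarily multiple imputation. The broad range of variables available in typical EHR systems provide a wealth of information for mitigating potential biases caused by missing data. The probability of missing data may be linked to disease severity and healthcare utilization since unhealthier patients are more likely to have comorbidities and each interaction with the health care system provides an opportunity for documentation. Therefore, any imputation routine should include predictor variables that assess overall health status (e.g. Charlson Comorbidity Index) and healthcare utilization (e.g. number of encounters) even when these comorbidities and patient encounters are unrelated to the disease of interest. Linking the EHR data with other sources of information (e.g. National Death Index and census data) can also provide less biased variables for imputation. Additional methodological research with EHR data and improved epidemiological training of clinical investigators is warranted.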
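The full abstract of this record recommends that an imputation routine include predictors of overall health status (e.g. Charlson Comorbidity Index) and healthcare utilization (e.g. number of encounters). The following is a minimal sketch of that idea, not the authors' method: it uses bootstrap-based stochastic regression imputation as a simple stand-in for a multiple imputation routine and pools the results with Rubin's rules. The data are synthetic, and every name here (`simulate`, `impute_once`, `pool_mean`, the `lab` variable, all coefficients) is a hypothetical chosen for illustration only.

```python
import numpy as np


def simulate(n=2000, seed=1):
    """Synthetic cohort (invented for illustration, not real EHR data)."""
    rng = np.random.default_rng(seed)
    charlson = rng.poisson(1.5, size=n).astype(float)          # health status
    encounters = rng.poisson(2 + 2 * charlson).astype(float)   # utilization
    lab = 5.5 + 0.4 * charlson + 0.05 * encounters + rng.normal(0, 0.5, size=n)
    # Assumption: patients with fewer encounters are more likely to lack a value.
    p_missing = 1 / (1 + np.exp(0.5 * (encounters - 4)))
    y = lab.copy()
    y[rng.random(n) < p_missing] = np.nan
    return lab, y, np.column_stack([charlson, encounters])


def impute_once(y, X, rng):
    """One stochastic regression imputation of y from the predictors in X."""
    obs = ~np.isnan(y)
    # Bootstrap the observed rows so each imputation reflects parameter uncertainty.
    idx = rng.choice(np.flatnonzero(obs), size=obs.sum(), replace=True)
    A = np.column_stack([np.ones(len(idx)), X[idx]])
    beta, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
    sigma = (y[idx] - A @ beta).std(ddof=A.shape[1])
    A_mis = np.column_stack([np.ones((~obs).sum()), X[~obs]])
    filled = y.copy()
    # Add residual noise so imputed values are not artificially precise.
    filled[~obs] = A_mis @ beta + rng.normal(0.0, sigma, size=(~obs).sum())
    return filled


def pool_mean(y, X, m=20, seed=0):
    """Estimate the mean of y over m imputed datasets, pooled with Rubin's rules."""
    rng = np.random.default_rng(seed)
    estimates, variances = [], []
    for _ in range(m):
        filled = impute_once(y, X, rng)
        estimates.append(filled.mean())
        variances.append(filled.var(ddof=1) / len(filled))
    within = np.mean(variances)
    between = np.var(estimates, ddof=1)
    total = within + (1 + 1 / m) * between
    return float(np.mean(estimates)), float(total)


if __name__ == "__main__":
    lab, y, X = simulate()
    pooled, total_var = pool_mean(y, X)
    print("true mean:", lab.mean())
    print("complete-case mean:", np.nanmean(y))
    print("pooled imputed mean:", pooled, "+/-", total_var ** 0.5)
```

In this simulated setting, missingness depends on the number of encounters, so the complete-case mean would be expected to drift away from the true mean, while conditioning the imputation on both predictors should pull the pooled estimate back toward it. A real analysis would more likely rely on an established multiple imputation package than on this hand-rolled sketch.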


Bibliographic Details
Main Authors:	Wells, Brian J., Chagin, Kevin M., Nowacki, Amy S., Kattan, Michael W.
Format:	Online Article Text
Language:	English
Published:	AcademyHealth 2013
Subjects:	Methods
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4371484/ https://www.ncbi.nlm.nih.gov/pubmed/25848578 http://dx.doi.org/10.13063/2327-9214.1035

author	Wells, Brian J. Chagin, Kevin M. Nowacki, Amy S. Kattan, Michael W.
collection	PubMed
description	Electronic health records (EHRs) present a wealth of data that are vital for improving patient-centered outcomes, although the data can present significant statistical challenges. In particular, EHR data contains substantial missing information that if left unaddressed could reduce the validity of conclusions drawn. Properly addressing the missing data issue in EHR data is complicated by the fact that it is sometimes difficult to differentiate between missing data and a negative value. For example, a patient without a documented history of heart failure may truly not have disease or the clinician may have simply not documented the condition. Approaches for reducing missing data in EHR systems come from multiple angles, including: increasing structured data documentation, reducing data input errors, and utilization of text parsing / natural language processing. This paper focuses on the analytical approaches for handling missing data, primarily multiple imputation. The broad range of variables available in typical EHR systems provide a wealth of information for mitigating potential biases caused by missing data. The probability of missing data may be linked to disease severity and healthcare utilization since unhealthier patients are more likely to have comorbidities and each interaction with the health care system provides an opportunity for documentation. Therefore, any imputation routine should include predictor variables that assess overall health status (e.g. Charlson Comorbidity Index) and healthcare utilization (e.g. number of encounters) even when these comorbidities and patient encounters are unrelated to the disease of interest. Linking the EHR data with other sources of information (e.g. National Death Index and census data) can also provide less biased variables for imputation. Additional methodological research with EHR data and improved epidemiological training of clinical investigators is warranted.
format	Online Article Text
id	pubmed-4371484
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	AcademyHealth
record_format	MEDLINE/PubMed
spelling	pubmed-4371484 2015-04-06 Strategies for Handling Missing Data in Electronic Health Record Derived Data Wells, Brian J. Chagin, Kevin M. Nowacki, Amy S. Kattan, Michael W. EGEMS (Wash DC) Methods Electronic health records (EHRs) present a wealth of data that are vital for improving patient-centered outcomes, although the data can present significant statistical challenges. In particular, EHR data contains substantial missing information that if left unaddressed could reduce the validity of conclusions drawn. Properly addressing the missing data issue in EHR data is complicated by the fact that it is sometimes difficult to differentiate between missing data and a negative value. For example, a patient without a documented history of heart failure may truly not have disease or the clinician may have simply not documented the condition. Approaches for reducing missing data in EHR systems come from multiple angles, including: increasing structured data documentation, reducing data input errors, and utilization of text parsing / natural language processing. This paper focuses on the analytical approaches for handling missing data, primarily multiple imputation. The broad range of variables available in typical EHR systems provide a wealth of information for mitigating potential biases caused by missing data. The probability of missing data may be linked to disease severity and healthcare utilization since unhealthier patients are more likely to have comorbidities and each interaction with the health care system provides an opportunity for documentation. Therefore, any imputation routine should include predictor variables that assess overall health status (e.g. Charlson Comorbidity Index) and healthcare utilization (e.g. number of encounters) even when these comorbidities and patient encounters are unrelated to the disease of interest. Linking the EHR data with other sources of information (e.g. National Death Index and census data) can also provide less biased variables for imputation. Additional methodological research with EHR data and improved epidemiological training of clinical investigators is warranted. AcademyHealth 2013-12-17 /pmc/articles/PMC4371484/ /pubmed/25848578 http://dx.doi.org/10.13063/2327-9214.1035 Text en All eGEMs publications are licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License http://creativecommons.org/licenses/by-nc-nd/3.0/
title	Strategies for Handling Missing Data in Electronic Health Record Derived Data
topic	Methods
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4371484/ https://www.ncbi.nlm.nih.gov/pubmed/25848578 http://dx.doi.org/10.13063/2327-9214.1035

