
Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis

BACKGROUND: Missing data is a challenge for all studies; however, this is especially true for electronic health record (EHR)-based analyses. Failure to appropriately consider missing data can lead to biased results. While there has been extensive theoretical work on imputation, and many sophisticate...


Bibliographic Details
Main Authors:	Beaulieu-Jones, Brett K, Lavage, Daniel R, Snyder, John W, Moore, Jason H, Pendergrass, Sarah A, Bauer, Christopher R
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2018
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5845101/ https://www.ncbi.nlm.nih.gov/pubmed/29475824 http://dx.doi.org/10.2196/medinform.8960

_version_	1783305354896998400
author	Beaulieu-Jones, Brett K; Lavage, Daniel R; Snyder, John W; Moore, Jason H; Pendergrass, Sarah A; Bauer, Christopher R
author_sort	Beaulieu-Jones, Brett K
collection	PubMed
description	BACKGROUND: Missing data is a challenge for all studies; however, this is especially true for electronic health record (EHR)-based analyses. Failure to appropriately consider missing data can lead to biased results. While there has been extensive theoretical work on imputation, and many sophisticated methods are now available, it remains quite challenging for researchers to implement these methods appropriately. Here, we provide detailed procedures for when and how to conduct imputation of EHR laboratory results. OBJECTIVE: The objective of this study was to demonstrate how the mechanism of missingness can be assessed, evaluate the performance of a variety of imputation methods, and describe some of the most frequent problems that can be encountered. METHODS: We analyzed clinical laboratory measures from 602,366 patients in the EHR of Geisinger Health System in Pennsylvania, USA. Using these data, we constructed a representative set of complete cases and assessed the performance of 12 different imputation methods for missing data that was simulated based on 4 mechanisms of missingness (missing completely at random, missing not at random, missing at random, and real data modelling). RESULTS: Our results showed that several methods, including variations of Multivariate Imputation by Chained Equations (MICE) and softImpute, consistently imputed missing values with low error; however, only a subset of the MICE methods was suitable for multiple imputation. CONCLUSIONS: The analyses we describe provide an outline of considerations for dealing with missing EHR data, steps that researchers can perform to characterize missingness within their own data, and an evaluation of methods that can be applied to impute clinical data. While the performance of methods may vary between datasets, the process we describe can be generalized to the majority of structured data types that exist in EHRs, and all of our methods and code are publicly available.
format	Online Article Text
id	pubmed-5845101
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-5845101 2018-03-19 Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis Beaulieu-Jones, Brett K Lavage, Daniel R Snyder, John W Moore, Jason H Pendergrass, Sarah A Bauer, Christopher R JMIR Med Inform Original Paper BACKGROUND: Missing data is a challenge for all studies; however, this is especially true for electronic health record (EHR)-based analyses. Failure to appropriately consider missing data can lead to biased results. While there has been extensive theoretical work on imputation, and many sophisticated methods are now available, it remains quite challenging for researchers to implement these methods appropriately. Here, we provide detailed procedures for when and how to conduct imputation of EHR laboratory results. OBJECTIVE: The objective of this study was to demonstrate how the mechanism of missingness can be assessed, evaluate the performance of a variety of imputation methods, and describe some of the most frequent problems that can be encountered. METHODS: We analyzed clinical laboratory measures from 602,366 patients in the EHR of Geisinger Health System in Pennsylvania, USA. Using these data, we constructed a representative set of complete cases and assessed the performance of 12 different imputation methods for missing data that was simulated based on 4 mechanisms of missingness (missing completely at random, missing not at random, missing at random, and real data modelling). RESULTS: Our results showed that several methods, including variations of Multivariate Imputation by Chained Equations (MICE) and softImpute, consistently imputed missing values with low error; however, only a subset of the MICE methods was suitable for multiple imputation. CONCLUSIONS: The analyses we describe provide an outline of considerations for dealing with missing EHR data, steps that researchers can perform to characterize missingness within their own data, and an evaluation of methods that can be applied to impute clinical data. While the performance of methods may vary between datasets, the process we describe can be generalized to the majority of structured data types that exist in EHRs, and all of our methods and code are publicly available. JMIR Publications 2018-02-23 /pmc/articles/PMC5845101/ /pubmed/29475824 http://dx.doi.org/10.2196/medinform.8960 Text en ©Brett K Beaulieu-Jones, Daniel R Lavage, John W Snyder, Jason H Moore, Sarah A Pendergrass, Christopher R Bauer. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 23.02.2018. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
spellingShingle	Original Paper Beaulieu-Jones, Brett K Lavage, Daniel R Snyder, John W Moore, Jason H Pendergrass, Sarah A Bauer, Christopher R Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis
title	Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis
title_sort	characterizing and managing missing structured data in electronic health records: data analysis
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5845101/ https://www.ncbi.nlm.nih.gov/pubmed/29475824 http://dx.doi.org/10.2196/medinform.8960
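
The evaluation design summarized in the description (start from complete cases, simulate missingness under a known mechanism, impute, and score the imputed values against the held-back truth) can be illustrated with a minimal sketch. This is not the authors' published code: the function names, the synthetic three-column data, the 20% missingness fraction, and the use of column-mean imputation as a stand-in for the 12 methods compared in the paper are all assumptions made for illustration.

```python
import numpy as np

def mask_mcar(X, frac, rng):
    """MCAR: every cell has the same chance of being missing."""
    return rng.random(X.shape) < frac

def mask_mar(X, frac, rng, driver=0, target=1):
    """MAR: missingness in `target` depends only on the observed `driver` column."""
    mask = np.zeros(X.shape, dtype=bool)
    # Scaled rank of the driver in [0, 1]; higher driver -> more likely missing.
    ranks = X[:, driver].argsort().argsort() / (len(X) - 1)
    mask[:, target] = rng.random(len(X)) < 2 * frac * ranks  # assumes frac <= 0.5
    return mask

def mask_mnar(X, frac, target=1):
    """MNAR: the largest values of `target` are the ones that go missing."""
    mask = np.zeros(X.shape, dtype=bool)
    cutoff = np.quantile(X[:, target], 1 - frac)
    mask[:, target] = X[:, target] > cutoff
    return mask

def mean_impute(X_missing):
    """Placeholder imputer (assumption): fill each NaN with its column mean."""
    col_means = np.nanmean(X_missing, axis=0)
    return np.where(np.isnan(X_missing), col_means, X_missing)

def rmse_on_masked(X_true, X_imputed, mask):
    """Score only the cells that were hidden, since only those were imputed."""
    diff = (X_true - X_imputed)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

def evaluate(X, frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    masks = {
        "MCAR": mask_mcar(X, frac, rng),
        "MAR": mask_mar(X, frac, rng),
        "MNAR": mask_mnar(X, frac),
    }
    results = {}
    for name, mask in masks.items():
        X_missing = X.copy()
        X_missing[mask] = np.nan
        results[name] = rmse_on_masked(X, mean_impute(X_missing), mask)
    return results

# Synthetic, correlated stand-in for a complete-case matrix of lab measures.
data_rng = np.random.default_rng(42)
cov = [[1.0, 0.6, 0.3], [0.6, 1.0, 0.4], [0.3, 0.4, 1.0]]
X_complete = data_rng.multivariate_normal(np.zeros(3), cov, size=1000)

scores = evaluate(X_complete)
for mechanism, rmse in scores.items():
    print(f"{mechanism}: RMSE = {rmse:.3f}")
```

Swapping `mean_impute` for another imputer with the same input and output shape lets the same loop compare methods under each mechanism. Mean imputation is expected to do worst under the MNAR mask here, because the hidden values are systematically larger than the observed ones.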
