Cargando…

A multi-step approach to managing missing data in time and patient variant electronic health records

OBJECTIVE: Electronic health records (EHR) hold promise for conducting large-scale analyses linking individual characteristics to health outcomes. However, these data often contain a large number of missing values at both the patient and visit level due to variation in data collection across facilit...

Descripción completa

Detalles Bibliográficos
Autores principales: Cesare, Nina, Were, Lawrence P. O.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8851714/
https://www.ncbi.nlm.nih.gov/pubmed/35177096
http://dx.doi.org/10.1186/s13104-022-05911-w
Descripción
Sumario:OBJECTIVE: Electronic health records (EHR) hold promise for conducting large-scale analyses linking individual characteristics to health outcomes. However, these data often contain a large number of missing values at both the patient and visit level due to variation in data collection across facilities, providers, and clinical need. This study proposes a stepwise framework for imputing missing values within a visit-level EHR dataset that combines informative missingness and conditional imputation in a scalable manner that may be parallelized for efficiency. RESULTS: For this study we use a subset of data from AMPATH representing information from 530,812 clinic visits from 16,316 Human Immunodeficiency Virus (HIV) positive women across Western Kenya who have given birth. We apply this process to a set of 84 clinical, social and economic variables and are able to impute values for 84.6% of variables with missing data with an average reduction in missing data of approximately 35.6%. We validate the use of this imputed dataset by predicting National Hospital Insurance Fund (NHIF) enrollment with 94.8% accuracy.