A multi-step approach to managing missing data in time and patient variant electronic health records

OBJECTIVE: Electronic health records (EHR) hold promise for conducting large-scale analyses linking individual characteristics to health outcomes. However, these data often contain a large number of missing values at both the patient and visit level due to variation in data collection across facilit...

Bibliographic Details
Main authors: Cesare, Nina; Were, Lawrence P. O.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8851714/
https://www.ncbi.nlm.nih.gov/pubmed/35177096
http://dx.doi.org/10.1186/s13104-022-05911-w
author Cesare, Nina
Were, Lawrence P. O.
collection PubMed
description OBJECTIVE: Electronic health records (EHR) hold promise for conducting large-scale analyses linking individual characteristics to health outcomes. However, these data often contain a large number of missing values at both the patient and visit level due to variation in data collection across facilities, providers, and clinical need. This study proposes a stepwise framework for imputing missing values within a visit-level EHR dataset that combines informative missingness and conditional imputation in a scalable manner that may be parallelized for efficiency. RESULTS: For this study we use a subset of data from AMPATH representing information from 530,812 clinic visits from 16,316 Human Immunodeficiency Virus (HIV) positive women across Western Kenya who have given birth. We apply this process to a set of 84 clinical, social and economic variables and are able to impute values for 84.6% of variables with missing data with an average reduction in missing data of approximately 35.6%. We validate the use of this imputed dataset by predicting National Hospital Insurance Fund (NHIF) enrollment with 94.8% accuracy.
format Online
Article
Text
id pubmed-8851714
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8851714 2022-02-22 A multi-step approach to managing missing data in time and patient variant electronic health records. Cesare, Nina; Were, Lawrence P. O. BMC Res Notes. Research Note. BioMed Central 2022-02-17. /pmc/articles/PMC8851714/ /pubmed/35177096 http://dx.doi.org/10.1186/s13104-022-05911-w Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A multi-step approach to managing missing data in time and patient variant electronic health records
topic Research Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8851714/
https://www.ncbi.nlm.nih.gov/pubmed/35177096
http://dx.doi.org/10.1186/s13104-022-05911-w