Synthesizing Electronic Health Records for Predictive Models in Low-Middle-Income Countries (LMICs)

The spread of machine learning models, coupled with the growing adoption of electronic health records (EHRs), has opened the door for developing clinical decision support systems. However, despite the great promise of machine learning for healthcare in low-middle-income countries (LMICs), many da...

Bibliographic Details
Main Authors: Ghosheh, Ghadeer O., Thwaites, C. Louise, Zhu, Tingting
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10295936/
https://www.ncbi.nlm.nih.gov/pubmed/37371844
http://dx.doi.org/10.3390/biomedicines11061749
author Ghosheh, Ghadeer O.
Thwaites, C. Louise
Zhu, Tingting
collection PubMed
description The spread of machine learning models, coupled with the growing adoption of electronic health records (EHRs), has opened the door for developing clinical decision support systems. However, despite the great promise of machine learning for healthcare in low-middle-income countries (LMICs), many data-specific limitations, such as small dataset size and irregular sampling, hinder progress in such applications. Recently, deep generative models have been proposed to generate realistic-looking synthetic data, including EHRs, by learning the underlying data distribution without compromising patient privacy. In this study, we first use a deep generative model to generate synthetic data based on a small dataset (364 patients) from an LMIC setting. Next, we use the synthetic data to build models that predict the onset of hospital-acquired infections based on minimal information collected at patient ICU admission. The diagnostic model trained on the synthetic data outperformed models trained on the original data and on data oversampled using techniques such as SMOTE. We also experiment with varying the size of the synthetic data and observe the impact on the performance and interpretability of the models. Our results show the promise of using deep generative models in enabling healthcare data owners to develop and validate models that serve their needs and applications, despite limitations in dataset size.
format Online
Article
Text
id pubmed-10295936
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10295936 2023-06-28 Synthesizing Electronic Health Records for Predictive Models in Low-Middle-Income Countries (LMICs) Ghosheh, Ghadeer O. Thwaites, C. Louise Zhu, Tingting Biomedicines Article MDPI 2023-06-18 /pmc/articles/PMC10295936/ /pubmed/37371844 http://dx.doi.org/10.3390/biomedicines11061749 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Synthesizing Electronic Health Records for Predictive Models in Low-Middle-Income Countries (LMICs)
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10295936/
https://www.ncbi.nlm.nih.gov/pubmed/37371844
http://dx.doi.org/10.3390/biomedicines11061749