Synthesizing Electronic Health Records for Predictive Models in Low-Middle-Income Countries (LMICs)
The spread of machine learning models, coupled with the growing adoption of electronic health records (EHRs), has opened the door for developing clinical decision support systems. However, despite the great promise of machine learning for healthcare in low-middle-income countries (LMICs), many da...
Main authors: | Ghosheh, Ghadeer O.; Thwaites, C. Louise; Zhu, Tingting |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10295936/ https://www.ncbi.nlm.nih.gov/pubmed/37371844 http://dx.doi.org/10.3390/biomedicines11061749 |
author | Ghosheh, Ghadeer O.; Thwaites, C. Louise; Zhu, Tingting |
---|---|
collection | PubMed |
description | The spread of machine learning models, coupled with the growing adoption of electronic health records (EHRs), has opened the door for developing clinical decision support systems. However, despite the great promise of machine learning for healthcare in low-middle-income countries (LMICs), many data-specific limitations, such as small dataset size and irregular sampling, hinder progress in such applications. Recently, deep generative models have been proposed to generate realistic-looking synthetic data, including EHRs, by learning the underlying data distribution without compromising patient privacy. In this study, we first use a deep generative model to generate synthetic data based on a small dataset (364 patients) from an LMIC setting. Next, we use the synthetic data to build models that predict the onset of hospital-acquired infections based on minimal information collected at patient ICU admission. The diagnostic model trained on the synthetic data outperformed models trained on the original data and on data oversampled with techniques such as SMOTE. We also experiment with varying the size of the synthetic data and observe the impact on the performance and interpretability of the models. Our results show the promise of using deep generative models to enable healthcare data owners to develop and validate models that serve their needs and applications, despite limitations in dataset size. |
format | Online Article Text |
id | pubmed-10295936 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-102959362023-06-28 Synthesizing Electronic Health Records for Predictive Models in Low-Middle-Income Countries (LMICs) Ghosheh, Ghadeer O. Thwaites, C. Louise Zhu, Tingting Biomedicines Article The spread of machine learning models, coupled with the growing adoption of electronic health records (EHRs), has opened the door for developing clinical decision support systems. However, despite the great promise of machine learning for healthcare in low-middle-income countries (LMICs), many data-specific limitations, such as small dataset size and irregular sampling, hinder progress in such applications. Recently, deep generative models have been proposed to generate realistic-looking synthetic data, including EHRs, by learning the underlying data distribution without compromising patient privacy. In this study, we first use a deep generative model to generate synthetic data based on a small dataset (364 patients) from an LMIC setting. Next, we use the synthetic data to build models that predict the onset of hospital-acquired infections based on minimal information collected at patient ICU admission. The diagnostic model trained on the synthetic data outperformed models trained on the original data and on data oversampled with techniques such as SMOTE. We also experiment with varying the size of the synthetic data and observe the impact on the performance and interpretability of the models. Our results show the promise of using deep generative models to enable healthcare data owners to develop and validate models that serve their needs and applications, despite limitations in dataset size. MDPI 2023-06-18 /pmc/articles/PMC10295936/ /pubmed/37371844 http://dx.doi.org/10.3390/biomedicines11061749 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. 
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Synthesizing Electronic Health Records for Predictive Models in Low-Middle-Income Countries (LMICs) |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10295936/ https://www.ncbi.nlm.nih.gov/pubmed/37371844 http://dx.doi.org/10.3390/biomedicines11061749 |