Generating synthetic mixed-type longitudinal electronic health records for artificial intelligent applications

The recent availability of electronic health records (EHRs) has provided enormous opportunities to develop artificial intelligence (AI) algorithms. However, patient privacy has become a major concern that limits data sharing across hospital settings and subsequently hinders advances in AI. Synt...

Bibliographic Details
Main authors:	Li, Jin; Cairns, Benjamin J.; Li, Jingsong; Zhu, Tingting
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10224668/ https://www.ncbi.nlm.nih.gov/pubmed/37244963 http://dx.doi.org/10.1038/s41746-023-00834-7

author	Li, Jin; Cairns, Benjamin J.; Li, Jingsong; Zhu, Tingting
author_sort	Li, Jin
collection	PubMed
description	The recent availability of electronic health records (EHRs) has provided enormous opportunities to develop artificial intelligence (AI) algorithms. However, patient privacy has become a major concern that limits data sharing across hospital settings and subsequently hinders advances in AI. Synthetic data, which benefits from the development and proliferation of generative models, has served as a promising substitute for real patient EHR data. However, current generative models are limited in that they generate only a single type of clinical data for a synthetic patient, i.e., either continuous-valued or discrete-valued. To mimic the nature of clinical decision-making, which encompasses various data types and sources, in this study we propose a generative adversarial network (GAN) entitled EHR-M-GAN that simultaneously synthesizes mixed-type time-series EHR data. EHR-M-GAN is capable of capturing the multidimensional, heterogeneous, and correlated temporal dynamics in patient trajectories. We have validated EHR-M-GAN on three publicly available intensive care unit databases with records from a total of 141,488 unique patients, and performed a privacy risk evaluation of the proposed model. EHR-M-GAN has demonstrated its superiority over state-of-the-art benchmarks for synthesizing clinical time series with high fidelity, while addressing the limitations regarding data types and dimensionality in current generative models. Notably, prediction models for outcomes of intensive care performed significantly better when the training data was augmented with EHR-M-GAN-generated time series. EHR-M-GAN may have use in developing AI algorithms in resource-limited settings, lowering the barrier for data acquisition while preserving patient privacy.
format	Online Article Text
id	pubmed-10224668
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-10224668 2023-05-29 Generating synthetic mixed-type longitudinal electronic health records for artificial intelligent applications Li, Jin; Cairns, Benjamin J.; Li, Jingsong; Zhu, Tingting NPJ Digit Med Article Nature Publishing Group UK 2023-05-27 /pmc/articles/PMC10224668/ /pubmed/37244963 http://dx.doi.org/10.1038/s41746-023-00834-7 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/.
title	Generating synthetic mixed-type longitudinal electronic health records for artificial intelligent applications
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10224668/ https://www.ncbi.nlm.nih.gov/pubmed/37244963 http://dx.doi.org/10.1038/s41746-023-00834-7
