
Simulation of a machine learning enabled learning health system for risk prediction using synthetic patient data

When enabled by machine learning (ML), Learning Health Systems (LHS) hold promise for improving the effectiveness of healthcare delivery to patients. One major barrier to LHS research and development is the lack of access to EHR patient data. To overcome this challenge, this study demonstrated the f...


Bibliographic Details
Main authors: Chen, Anjun; Chen, Drake O.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9606301/
https://www.ncbi.nlm.nih.gov/pubmed/36289292
http://dx.doi.org/10.1038/s41598-022-23011-4
_version_ 1784818265982238720
author Chen, Anjun
Chen, Drake O.
author_facet Chen, Anjun
Chen, Drake O.
author_sort Chen, Anjun
collection PubMed
description When enabled by machine learning (ML), Learning Health Systems (LHS) hold promise for improving the effectiveness of healthcare delivery to patients. One major barrier to LHS research and development is the lack of access to EHR patient data. To overcome this challenge, this study demonstrated the feasibility of developing a simulated ML-enabled LHS using synthetic patient data. The ML-enabled LHS was initialized using a dataset of 30,000 synthetic Synthea patients and an XGBoost base model for lung cancer risk prediction. Four additional datasets of 30,000 patients each were generated and added sequentially to the previously updated dataset to simulate the addition of new patients, resulting in datasets of 60,000, 90,000, 120,000 and 150,000 patients. New XGBoost models were built at each step, and performance improved as data size increased, attaining 0.936 recall and 0.962 AUC (area under the curve) on the 150,000-patient dataset. The effectiveness of the new ML-enabled LHS process was verified by implementing XGBoost models for stroke risk prediction on the same Synthea patient populations. By making the ML code and synthetic patient data publicly available for testing and training, this first synthetic LHS process paves the way for more researchers to start developing LHS with real patient data.
format Online
Article
Text
id pubmed-9606301
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9606301 2022-10-28 Simulation of a machine learning enabled learning health system for risk prediction using synthetic patient data Chen, Anjun Chen, Drake O. Sci Rep Article
Nature Publishing Group UK 2022-10-26 /pmc/articles/PMC9606301/ /pubmed/36289292 http://dx.doi.org/10.1038/s41598-022-23011-4 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Chen, Anjun
Chen, Drake O.
Simulation of a machine learning enabled learning health system for risk prediction using synthetic patient data
title Simulation of a machine learning enabled learning health system for risk prediction using synthetic patient data
title_full Simulation of a machine learning enabled learning health system for risk prediction using synthetic patient data
title_fullStr Simulation of a machine learning enabled learning health system for risk prediction using synthetic patient data
title_full_unstemmed Simulation of a machine learning enabled learning health system for risk prediction using synthetic patient data
title_short Simulation of a machine learning enabled learning health system for risk prediction using synthetic patient data
title_sort simulation of a machine learning enabled learning health system for risk prediction using synthetic patient data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9606301/
https://www.ncbi.nlm.nih.gov/pubmed/36289292
http://dx.doi.org/10.1038/s41598-022-23011-4
work_keys_str_mv AT chenanjun simulationofamachinelearningenabledlearninghealthsystemforriskpredictionusingsyntheticpatientdata
AT chendrakeo simulationofamachinelearningenabledlearninghealthsystemforriskpredictionusingsyntheticpatientdata