Simulation of a machine learning enabled learning health system for risk prediction using synthetic patient data
Main authors: | Chen, Anjun; Chen, Drake O. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9606301/, https://www.ncbi.nlm.nih.gov/pubmed/36289292, http://dx.doi.org/10.1038/s41598-022-23011-4 |
author | Chen, Anjun; Chen, Drake O. |
---|---|
collection | PubMed |
description | When enabled by machine learning (ML), Learning Health Systems (LHS) hold promise for improving the effectiveness of healthcare delivery to patients. One major barrier to LHS research and development is the lack of access to electronic health record (EHR) patient data. To overcome this challenge, this study demonstrated the feasibility of developing a simulated ML-enabled LHS using synthetic patient data. The ML-enabled LHS was initialized using a dataset of 30,000 synthetic Synthea patients and an XGBoost base model for lung cancer risk prediction. Four additional datasets of 30,000 patients each were generated and added sequentially to the previously updated dataset to simulate the arrival of new patients, resulting in datasets of 60,000, 90,000, 120,000 and 150,000 patients. A new XGBoost model was built at each step, and performance improved as the dataset grew, reaching 0.936 recall and 0.962 AUC (area under the curve) on the 150,000-patient dataset. The effectiveness of the new ML-enabled LHS process was verified by implementing XGBoost models for stroke risk prediction on the same Synthea patient populations. By making the ML code and synthetic patient data publicly available for testing and training, this first synthetic LHS process paves the way for more researchers to start developing LHS with real patient data. |
format | Online Article Text |
id | pubmed-9606301 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9606301 (2022-10-28); Sci Rep; Article; Nature Publishing Group UK, 2022-10-26; /pmc/articles/PMC9606301/; /pubmed/36289292; http://dx.doi.org/10.1038/s41598-022-23011-4; Text; en. © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. |
title | Simulation of a machine learning enabled learning health system for risk prediction using synthetic patient data |
topic | Article |
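The abstract describes a sequential learning loop: start from 30,000 patients, append four more batches of 30,000, and build a new model on each cumulative dataset while tracking recall and AUC. The Python sketch below illustrates only that control flow, under stated assumptions. `generate_batch` and `train_and_evaluate` are hypothetical placeholders standing in for Synthea data generation and XGBoost training (neither is reproduced here). The metrics are computed from their textbook definitions rather than from any library, and "AUC" is assumed to mean ROC AUC, which the abstract does not specify.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# From the abstract: an initial 30,000-patient dataset plus 4 more batches of 30,000.
BATCH_SIZE = 30_000
N_BATCHES = 5


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall = TP / (TP + FN): share of actual positives the model flags."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC (assumed meaning of "AUC") as the probability that a random
    positive scores above a random negative; ties count 0.5. O(P*N) pairs."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def simulate_lhs(
    generate_batch: Callable[[int, int], list],
    train_and_evaluate: Callable[[list], Dict[str, float]],
) -> List[Tuple[int, Dict[str, float]]]:
    """Append one batch of new patients per round and build a fresh model on
    the cumulative dataset, recording (dataset size, metrics) each round.

    Hypothetical placeholders (not the paper's code):
      generate_batch(round_index, n) -> list of n patient records
      train_and_evaluate(dataset)    -> dict of evaluation metrics
    """
    dataset: list = []
    history: List[Tuple[int, Dict[str, float]]] = []
    for round_index in range(N_BATCHES):
        dataset.extend(generate_batch(round_index, BATCH_SIZE))  # new patients arrive
        metrics = train_and_evaluate(dataset)  # new model on all data so far
        history.append((len(dataset), metrics))
    return history


if __name__ == "__main__":
    # Dummy stand-ins: empty patient records and a "model" that only reports size.
    runs = simulate_lhs(lambda i, n: [None] * n, lambda data: {"n_patients": float(len(data))})
    print([size for size, _ in runs])  # [30000, 60000, 90000, 120000, 150000]
```

Retraining from scratch on the full cumulative dataset, rather than updating the previous model incrementally, mirrors the abstract's statement that a new model was built in each instance; a real pipeline would also hold out evaluation data at each step.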