Accurate Prediction of Stroke for Hypertensive Patients Based on Medical Big Data and Machine Learning Algorithms: Retrospective Study

BACKGROUND: Stroke risk assessment is an important means of primary prevention, but the applicability of existing stroke risk assessment scales in the Chinese population has always been controversial. A prospective study is a common method of medical research, but it is time-consuming and labor-inte...

Bibliographic Details
Main authors: Yang, Yujie; Zheng, Jing; Du, Zhenzhen; Li, Ye; Cai, Yunpeng
Format: Online Article Text
Language: English
Published: JMIR Publications 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8663532/
https://www.ncbi.nlm.nih.gov/pubmed/34757322
http://dx.doi.org/10.2196/30277
_version_ 1784613659150909440
author Yang, Yujie
Zheng, Jing
Du, Zhenzhen
Li, Ye
Cai, Yunpeng
collection PubMed
description BACKGROUND: Stroke risk assessment is an important means of primary prevention, but the applicability of existing stroke risk assessment scales in the Chinese population has always been controversial. A prospective study is a common method of medical research, but it is time-consuming and labor-intensive. Medical big data has been demonstrated to promote disease risk factor discovery and prognosis, attracting broad research interest.
OBJECTIVE: We aimed to establish a high-precision stroke risk prediction model for hypertensive patients based on historical electronic medical record data and machine learning algorithms.
METHODS: Based on the Shenzhen Health Information Big Data Platform, a total of 57,671 patients were screened from 250,788 registered patients with hypertension, of whom 9421 had stroke onset during the 3-year follow-up. In addition to baseline characteristics and historical symptoms, we constructed some trend characteristics from multitemporal medical records. Stratified sampling according to gender ratio and age stratification was implemented to balance the positive and negative cases, and the final 19,953 samples were randomly divided into a training set and test set according to a ratio of 7:3. We used 4 machine learning algorithms for modeling, and the risk prediction performance was compared with the traditional risk scales. We also analyzed the nonlinear effect of continuous characteristics on stroke onset.
RESULTS: The tree-based integration algorithm extreme gradient boosting achieved the optimal performance with an area under the receiver operating characteristic curve of 0.9220, surpassing the other 3 traditional machine learning algorithms. Compared with 2 traditional risk scales, the Framingham stroke risk profiles and the Chinese Multiprovincial Cohort Study, our proposed model achieved better performance on the independent validation set, and the area under the receiver operating characteristic value increased by 0.17. Further nonlinear effect analysis revealed the importance of multitemporal trend characteristics in stroke risk prediction, which will benefit the standardized management of hypertensive patients.
CONCLUSIONS: A high-precision 3-year stroke risk prediction model for hypertensive patients was established, and the model's performance was verified by comparing it with the traditional risk scales. Multitemporal trend characteristics played an important role in stroke onset, and thus the model could be deployed to electronic health record systems to assist in more pervasive, preemptive stroke risk screening, enabling higher efficiency of early disease prevention and intervention.
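The sampling and evaluation steps named in the abstract (balancing cases within sex and age strata, a random 7:3 split, and scoring by area under the ROC curve) can be illustrated with a minimal pure-Python sketch. This is not the authors' code. The record fields `sex`, `age`, and `stroke`, the 10-year age bands, the 1:1 case-to-control ratio within each stratum, and all function names are assumptions made for illustration; the abstract does not state the band width or the sampling ratio, and the toy records are not study data.

```python
import random
from collections import defaultdict


def age_band(age, width=10):
    """Lower bound of the age band, e.g. 67 -> 60 (band width is an assumption)."""
    return (age // width) * width


def balance_by_strata(records, seed=0):
    """Keep every positive case and draw up to as many negatives
    from the same (sex, age band) stratum (assumed 1:1 ratio)."""
    rng = random.Random(seed)
    pos, neg = defaultdict(list), defaultdict(list)
    for r in records:
        key = (r["sex"], age_band(r["age"]))
        (pos if r["stroke"] else neg)[key].append(r)
    balanced = []
    for key, cases in pos.items():
        controls = neg.get(key, [])
        k = min(len(cases), len(controls))
        balanced.extend(cases)
        balanced.extend(rng.sample(controls, k))
    return balanced


def split_7_3(records, seed=0):
    """Shuffle a copy and cut it into 70% training and 30% test."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy records only; not data from the study.
    toy = (
        [{"sex": "M", "age": 65, "stroke": 1} for _ in range(3)]
        + [{"sex": "M", "age": 61 + i, "stroke": 0} for i in range(8)]
        + [{"sex": "F", "age": 72, "stroke": 1} for _ in range(2)]
        + [{"sex": "F", "age": 70 + i, "stroke": 0} for i in range(5)]
    )
    balanced = balance_by_strata(toy)
    train, test = split_7_3(balanced)
    print(len(balanced), len(train), len(test))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a sketch; a rank-based formulation or a library routine would be the usual choice at the scale of the 19,953 samples reported in the abstract.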
format Online
Article
Text
id pubmed-8663532
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-8663532 2022-01-05 JMIR Med Inform Original Paper JMIR Publications 2021-11-10 /pmc/articles/PMC8663532/ /pubmed/34757322 http://dx.doi.org/10.2196/30277 Text en ©Yujie Yang, Jing Zheng, Zhenzhen Du, Ye Li, Yunpeng Cai. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 10.11.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title Accurate Prediction of Stroke for Hypertensive Patients Based on Medical Big Data and Machine Learning Algorithms: Retrospective Study
topic Original Paper