Novel Insights on Establishing Machine Learning-Based Stroke Prediction Models Among Hypertensive Adults

Bibliographic Details
Main authors: Huang, Xiao; Cao, Tianyu; Chen, Liangziqian; Li, Junpei; Tan, Ziheng; Xu, Benjamin; Xu, Richard; Song, Yun; Zhou, Ziyi; Wang, Zhuo; Wei, Yaping; Zhang, Yan; Li, Jianping; Huo, Yong; Qin, Xianhui; Wu, Yanqing; Wang, Xiaobin; Wang, Hong; Cheng, Xiaoshu; Xu, Xiping; Liu, Lishun
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Cardiovascular Medicine
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9120532/
https://www.ncbi.nlm.nih.gov/pubmed/35600480
http://dx.doi.org/10.3389/fcvm.2022.901240
collection PubMed
description BACKGROUND: Stroke is a major global health burden, and risk prediction is essential for its primary prevention. However, the optimal prediction model for analyzing stroke risk remains uncertain. In this study, we aimed to determine the most effective stroke prediction method in a Chinese hypertensive population using machine learning and to establish a general methodological pipeline for future analyses.
METHODS: The training set comprised 70% of the data (n = 14,491) from the China Stroke Primary Prevention Trial (CSPPT). Internal validation was performed on the remaining 30% of the CSPPT data (n = 6,211), and external validation was conducted using a nested case–control (NCC) dataset (n = 2,568). The primary outcome was first stroke. Four established analysis methods were applied and compared: logistic regression (LR), stepwise logistic regression (SLR), extreme gradient boosting (XGBoost), and random forest (RF). Population characteristic data were analyzed separately with and without laboratory variables. Models were assessed by accuracy, sensitivity, specificity, kappa, and the area under the receiver operating characteristic curve (AUC), with AUC as the primary criterion. Data balancing techniques, namely random under-sampling (RUS) and the synthetic minority over-sampling technique (SMOTE), were applied to the imbalanced training set.
RESULTS: The best performance was observed for the RF model with RUS and laboratory variables. Compared with the null models (sensitivity = 0, specificity = 100, and mean AUC = 0.643), data balancing improved overall performance, with RUS showing the more satisfactory effect in the current study (RUS: sensitivity = 63.9, specificity = 53.7, and mean AUC = 0.624). Adding laboratory variables improved the performance of the analysis methods. All results were reconfirmed in the validation sets. The 10 most important variables were identified using the best-performing method.
CONCLUSION: Among the tested methods, RF with RUS was the most effective stroke prediction model in the target population. Based on the insights from this study, we provide general frameworks for building machine learning-based prediction models.
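The METHODS summary names several standard techniques without detail. The Python sketch below, using only the standard library, illustrates three of them under stated assumptions: a simple random 70/30 split, random under-sampling (RUS) of the majority class, and the binary metrics used for model assessment (accuracy, sensitivity, specificity, Cohen's kappa, and AUC computed as the probability that a random positive case outscores a random negative one). It is not the authors' code: the function names `split_indices`, `random_under_sample`, `classification_metrics`, and `auc` are hypothetical, the abstract does not say whether the split was stratified (an unstratified split is assumed here), and the LR, SLR, XGBoost, RF, and SMOTE steps are not reproduced.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_indices(n: int, train_frac: float = 0.7, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Assumed simple random (unstratified) split of row indices into training and validation parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * train_frac)
    return idx[:cut], idx[cut:]


def random_under_sample(X: Sequence[T], y: Sequence[int], seed: int = 0) -> Tuple[List[T], List[int]]:
    """Random under-sampling (RUS): randomly drop majority-class rows until both binary classes
    have the same size. Apply to the training split only, never to validation data."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity (in percent) plus Cohen's kappa for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn
    p_o = (tp + tn) / n  # observed agreement
    # Agreement expected by chance: P(pred=1)P(true=1) + P(pred=0)P(true=0)
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return {
        "accuracy": 100 * p_o,
        "sensitivity": 100 * tp / (tp + fn) if tp + fn else 0.0,
        "specificity": 100 * tn / (tn + fp) if tn + fp else 0.0,
        # Kappa is undefined when p_e == 1; returning 0.0 there is a convention chosen for this sketch.
        "kappa": (p_o - p_e) / (1 - p_e) if p_e < 1 else 0.0,
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive case scores higher than a random
    negative case (ties count 1/2). O(n_pos * n_neg), so only suitable for small examples."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train, val = split_indices(20702)  # 20,702 = 14,491 + 6,211 CSPPT records in the abstract
    print(len(train), len(val))  # 14491 6211

    # Toy imbalanced labels: 6 non-stroke (0) vs 2 stroke (1) cases.
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    never_stroke = [0] * len(y)  # a model that never predicts stroke
    print(classification_metrics(y, never_stroke))
    # {'accuracy': 75.0, 'sensitivity': 0.0, 'specificity': 100.0, 'kappa': 0.0}

    X_bal, y_bal = random_under_sample(list(range(len(y))), y, seed=1)
    print(len(y_bal), sum(y_bal))  # 4 2
```

The toy example mirrors the null-model pattern reported in the abstract: on imbalanced data, a classifier that never predicts stroke reaches sensitivity 0 and specificity 100 while its accuracy still looks acceptable, which is why balancing and AUC-based assessment matter. In practice, tested implementations exist in imbalanced-learn (`RandomUnderSampler`, `SMOTE`) and scikit-learn (`roc_auc_score`, `cohen_kappa_score`).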
id pubmed-9120532
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Published in Front Cardiovasc Med (section: Cardiovascular Medicine), Frontiers Media S.A., 2022-05-06.
Copyright © 2022 Huang, Cao, Chen, Li, Tan, Xu, Xu, Song, Zhou, Wang, Wei, Zhang, Li, Huo, Qin, Wu, Wang, Wang, Cheng, Xu and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Cardiovascular Medicine