
Stroke risk prediction using machine learning: a prospective cohort study of 0.5 million Chinese adults

OBJECTIVE: To compare Cox models, machine learning (ML), and ensemble models combining both approaches, for prediction of stroke risk in a prospective study of Chinese adults. MATERIALS AND METHODS: We evaluated models for stroke risk at varying intervals of follow-up (<9 years, 0–3 years, 3–6 ye...


Bibliographic Details
Main Authors: Chun, Matthew, Clarke, Robert, Cairns, Benjamin J, Clifton, David, Bennett, Derrick, Chen, Yiping, Guo, Yu, Pei, Pei, Lv, Jun, Yu, Canqing, Yang, Ling, Li, Liming, Chen, Zhengming, Zhu, Tingting
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8324240/
https://www.ncbi.nlm.nih.gov/pubmed/33969418
http://dx.doi.org/10.1093/jamia/ocab068
_version_ 1783731366188285952
author Chun, Matthew
Clarke, Robert
Cairns, Benjamin J
Clifton, David
Bennett, Derrick
Chen, Yiping
Guo, Yu
Pei, Pei
Lv, Jun
Yu, Canqing
Yang, Ling
Li, Liming
Chen, Zhengming
Zhu, Tingting
author_sort Chun, Matthew
collection PubMed
description OBJECTIVE: To compare Cox models, machine learning (ML), and ensemble models combining both approaches, for prediction of stroke risk in a prospective study of Chinese adults. MATERIALS AND METHODS: We evaluated models for stroke risk at varying intervals of follow-up (<9 years, 0–3 years, 3–6 years, 6–9 years) in 503 842 adults without prior history of stroke recruited from 10 areas in China in 2004–2008. Inputs included sociodemographic factors, diet, medical history, physical activity, and physical measurements. We compared discrimination and calibration of Cox regression, logistic regression, support vector machines, random survival forests, gradient boosted trees (GBT), and multilayer perceptrons, benchmarking performance against the 2017 Framingham Stroke Risk Profile. We then developed an ensemble approach to identify individuals at high risk of stroke (>10% predicted 9-yr stroke risk) by selectively applying either a GBT or Cox model based on individual-level characteristics. RESULTS: For 9-yr stroke risk prediction, GBT provided the best discrimination (AUROC: 0.833 in men, 0.836 in women) and calibration, with consistent results in each interval of follow-up. The ensemble approach yielded incrementally higher accuracy (men: 76%, women: 80%), specificity (men: 76%, women: 81%), and positive predictive value (men: 26%, women: 24%) compared to any of the single-model approaches. DISCUSSION AND CONCLUSION: Among several approaches, an ensemble model combining both GBT and Cox models achieved the best performance for identifying individuals at high risk of stroke in a contemporary study of Chinese adults. The results highlight the potential value of expanding the use of ML in clinical practice.
format Online
Article
Text
id pubmed-8324240
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8324240 2021-08-02 Stroke risk prediction using machine learning: a prospective cohort study of 0.5 million Chinese adults Chun, Matthew Clarke, Robert Cairns, Benjamin J Clifton, David Bennett, Derrick Chen, Yiping Guo, Yu Pei, Pei Lv, Jun Yu, Canqing Yang, Ling Li, Liming Chen, Zhengming Zhu, Tingting J Am Med Inform Assoc Research and Applications OBJECTIVE: To compare Cox models, machine learning (ML), and ensemble models combining both approaches, for prediction of stroke risk in a prospective study of Chinese adults. MATERIALS AND METHODS: We evaluated models for stroke risk at varying intervals of follow-up (<9 years, 0–3 years, 3–6 years, 6–9 years) in 503 842 adults without prior history of stroke recruited from 10 areas in China in 2004–2008. Inputs included sociodemographic factors, diet, medical history, physical activity, and physical measurements. We compared discrimination and calibration of Cox regression, logistic regression, support vector machines, random survival forests, gradient boosted trees (GBT), and multilayer perceptrons, benchmarking performance against the 2017 Framingham Stroke Risk Profile. We then developed an ensemble approach to identify individuals at high risk of stroke (>10% predicted 9-yr stroke risk) by selectively applying either a GBT or Cox model based on individual-level characteristics. RESULTS: For 9-yr stroke risk prediction, GBT provided the best discrimination (AUROC: 0.833 in men, 0.836 in women) and calibration, with consistent results in each interval of follow-up. The ensemble approach yielded incrementally higher accuracy (men: 76%, women: 80%), specificity (men: 76%, women: 81%), and positive predictive value (men: 26%, women: 24%) compared to any of the single-model approaches.
DISCUSSION AND CONCLUSION: Among several approaches, an ensemble model combining both GBT and Cox models achieved the best performance for identifying individuals at high risk of stroke in a contemporary study of Chinese adults. The results highlight the potential value of expanding the use of ML in clinical practice. Oxford University Press 2021-05-09 /pmc/articles/PMC8324240/ /pubmed/33969418 http://dx.doi.org/10.1093/jamia/ocab068 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research and Applications
Chun, Matthew
Clarke, Robert
Cairns, Benjamin J
Clifton, David
Bennett, Derrick
Chen, Yiping
Guo, Yu
Pei, Pei
Lv, Jun
Yu, Canqing
Yang, Ling
Li, Liming
Chen, Zhengming
Zhu, Tingting
Stroke risk prediction using machine learning: a prospective cohort study of 0.5 million Chinese adults
title Stroke risk prediction using machine learning: a prospective cohort study of 0.5 million Chinese adults
topic Research and Applications
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8324240/
https://www.ncbi.nlm.nih.gov/pubmed/33969418
http://dx.doi.org/10.1093/jamia/ocab068