
Value of machine learning algorithms for predicting diabetes risk: A subset analysis from a real‐world retrospective cohort study

AIMS/INTRODUCTION: To compare the application value of different machine learning (ML) algorithms for diabetes risk prediction. MATERIALS AND METHODS: This is a 3‐year retrospective cohort study with a total of 3,687 participants being included in the data analysis. Modeling variable screening and p...


Bibliographic Details
Main Authors:	Mao, Yaqian, Zhu, Zheng, Pan, Shuyao, Lin, Wei, Liang, Jixing, Huang, Huibin, Li, Liantao, Wen, Junping, Chen, Gang
Format:	Online Article Text
Language:	English
Published:	John Wiley and Sons Inc. 2022
Subjects:	Articles
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9889616/ https://www.ncbi.nlm.nih.gov/pubmed/36345236 http://dx.doi.org/10.1111/jdi.13937

_version_	1784880768488570880
author	Mao, Yaqian Zhu, Zheng Pan, Shuyao Lin, Wei Liang, Jixing Huang, Huibin Li, Liantao Wen, Junping Chen, Gang
author_facet	Mao, Yaqian Zhu, Zheng Pan, Shuyao Lin, Wei Liang, Jixing Huang, Huibin Li, Liantao Wen, Junping Chen, Gang
author_sort	Mao, Yaqian
collection	PubMed
description	AIMS/INTRODUCTION: To compare the application value of different machine learning (ML) algorithms for diabetes risk prediction. MATERIALS AND METHODS: This is a 3‐year retrospective cohort study with a total of 3,687 participants being included in the data analysis. Modeling variable screening and predictive model building were carried out using logistic regression (LR) analysis and 10‐fold cross‐validation, respectively. In total, six different ML algorithms, including random forests, light gradient boosting machine, extreme gradient boosting, adaptive boosting (AdaBoost), multi‐layer perceptrons and gaussian naive bayes were used for model construction. Model performance was mainly evaluated by the area under the receiver operating characteristic curve. The best performing ML model was selected for comparison with the traditional LR model and visualized using Shapley additive explanations. RESULTS: A total of eight risk factors most associated with the development of diabetes were identified by univariate and multivariate LR analysis, and they were visualized in the form of a nomogram. Among the six different ML models, the random forests model had the best predictive performance. After 10‐fold cross‐validation, its optimal model has an area under the receiver operating characteristic value of 0.855 (95% confidence interval [CI] 0.823–0.886) in the training set and 0.835 (95% CI 0.779–0.892) in the test set. In the traditional LR model, its area under the receiver operating characteristic value is 0.840 (95% CI 0.814–0.866) in the training set and 0.834 (95% CI 0.785–0.884) in the test set. CONCLUSIONS: In the real‐world epidemiological research, the combination of traditional variable screening and ML algorithm to construct a diabetes risk prediction model has satisfactory clinical application value.
format	Online Article Text
id	pubmed-9889616
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	John Wiley and Sons Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-9889616 2023-02-02 Value of machine learning algorithms for predicting diabetes risk: A subset analysis from a real‐world retrospective cohort study Mao, Yaqian Zhu, Zheng Pan, Shuyao Lin, Wei Liang, Jixing Huang, Huibin Li, Liantao Wen, Junping Chen, Gang J Diabetes Investig Articles AIMS/INTRODUCTION: To compare the application value of different machine learning (ML) algorithms for diabetes risk prediction. MATERIALS AND METHODS: This is a 3‐year retrospective cohort study with a total of 3,687 participants being included in the data analysis. Modeling variable screening and predictive model building were carried out using logistic regression (LR) analysis and 10‐fold cross‐validation, respectively. In total, six different ML algorithms, including random forests, light gradient boosting machine, extreme gradient boosting, adaptive boosting (AdaBoost), multi‐layer perceptrons and gaussian naive bayes were used for model construction. Model performance was mainly evaluated by the area under the receiver operating characteristic curve. The best performing ML model was selected for comparison with the traditional LR model and visualized using Shapley additive explanations. RESULTS: A total of eight risk factors most associated with the development of diabetes were identified by univariate and multivariate LR analysis, and they were visualized in the form of a nomogram. Among the six different ML models, the random forests model had the best predictive performance. After 10‐fold cross‐validation, its optimal model has an area under the receiver operating characteristic value of 0.855 (95% confidence interval [CI] 0.823–0.886) in the training set and 0.835 (95% CI 0.779–0.892) in the test set. In the traditional LR model, its area under the receiver operating characteristic value is 0.840 (95% CI 0.814–0.866) in the training set and 0.834 (95% CI 0.785–0.884) in the test set. CONCLUSIONS: In the real‐world epidemiological research, the combination of traditional variable screening and ML algorithm to construct a diabetes risk prediction model has satisfactory clinical application value. John Wiley and Sons Inc. 2022-11-07 /pmc/articles/PMC9889616/ /pubmed/36345236 http://dx.doi.org/10.1111/jdi.13937 Text en © 2022 The Authors. Journal of Diabetes Investigation published by Asian Association for the Study of Diabetes (AASD) and John Wiley & Sons Australia, Ltd. This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
spellingShingle	Articles Mao, Yaqian Zhu, Zheng Pan, Shuyao Lin, Wei Liang, Jixing Huang, Huibin Li, Liantao Wen, Junping Chen, Gang Value of machine learning algorithms for predicting diabetes risk: A subset analysis from a real‐world retrospective cohort study
title	Value of machine learning algorithms for predicting diabetes risk: A subset analysis from a real‐world retrospective cohort study
title_sort	value of machine learning algorithms for predicting diabetes risk: a subset analysis from a real‐world retrospective cohort study
topic	Articles
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9889616/ https://www.ncbi.nlm.nih.gov/pubmed/36345236 http://dx.doi.org/10.1111/jdi.13937
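
The description field above names two evaluation tools: 10‐fold cross‐validation and the area under the receiver operating characteristic curve (AUC). The sketch below illustrates both in plain Python. It is not the authors' code: the function names `roc_auc` and `k_fold_indices`, the shuffling seed, and the toy labels and scores are assumptions made for illustration, and only the cohort size of 3,687 comes from the record.

```python
import random


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # hypothetical seed, for repeatability
    folds = [idx[i::k] for i in range(k)]     # k near-equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Toy data (not from the study): 1 = developed diabetes, 0 = did not.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))

# Fold sizes for a cohort of 3,687 participants split 10 ways.
print([len(test) for _, test in k_fold_indices(3687, k=10)])
```

In practice a model would be fitted on each training split and `roc_auc` applied to its predicted risks on the matching test split; the pairwise AUC shown here is quadratic in sample size and is meant for clarity rather than speed.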
