
Comparative study on risk prediction model of type 2 diabetes based on machine learning theory: a cross-sectional study


Bibliographic details
Main authors: Wang, Shu; Chen, Rong; Wang, Shuang; Kong, Danli; Cao, Rudai; Lin, Chunwen; Luo, Ling; Huang, Jialu; Zhang, Qiaoli; Yu, Haibing; Ding, Yuan Lin
Format: Online Article Text
Language: English
Published: BMJ Publishing Group, 2023
Subjects: Public Health
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10465890/
https://www.ncbi.nlm.nih.gov/pubmed/37643856
http://dx.doi.org/10.1136/bmjopen-2022-069018
collection PubMed
description OBJECTIVES: To compare the predictive performance of six models based on machine learning theory, in order to provide a methodological reference for predicting the risk of type 2 diabetes mellitus (T2DM). SETTING AND PARTICIPANTS: This study used surveillance data on chronic disease risk factors collected from Dongguan residents between 2016 and 2018. Multistage cluster random sampling was applied at each monitoring site, and 4157 people were selected. From this initial population, individuals with more than 20% missing data were excluded, leaving 4106 subjects. DESIGN: The k-nearest neighbour algorithm and the synthetic minority oversampling technique (SMOTE) were used to process the data. Univariate (single-factor) analysis was used for the preliminary selection of variables. Tenfold cross-validation was used to tune the parameters of some of the models. Accuracy, precision, recall and the area under the receiver operating characteristic curve (AUC) were used to evaluate predictive performance, and the DeLong test was used to compare the AUC values of the models. RESULTS: After balancing, the sample size increased to 8013, of whom 4023 were patients with T2DM and 3990 were controls. Among the six models, the back propagation neural network performed best, with 93.7% accuracy, 94.6% precision, 92.8% recall and an AUC of 0.977, followed by the logistic regression model, the support vector machine, the CART decision tree and the C4.5 decision tree. The deep neural network performed worst, with 84.5% accuracy, 86.1% precision, 82.9% recall and an AUC of 0.845. CONCLUSIONS: Six types of T2DM risk prediction model were constructed in this study, and their predictive performance was compared across several indicators. On the selected data set, the back propagation neural network had the best predictive performance.
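The abstract says only that the k-nearest neighbour (KNN) algorithm was used to "process the data" after records with more than 20% missing values were dropped; one common reading is KNN imputation of the remaining gaps. The sketch below assumes that reading and is not the authors' code. Each missing value is replaced by the mean of that feature over the `k` nearest records that have it observed, with distances computed only over features both records share and rescaled by the fraction shared (the same convention as scikit-learn's `nan_euclidean` distance). The names `nan_euclidean` and `knn_impute` and the default `k` are hypothetical.

```python
import math
from typing import List, Optional


def nan_euclidean(a: List[Optional[float]], b: List[Optional[float]]) -> float:
    """Euclidean distance over features observed in both rows, scaled up by
    (total features / shared features) so rows sharing fewer features stay comparable."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows: List[List[Optional[float]]], k: int = 3) -> List[List[float]]:
    """Replace each None with the mean of that feature over the k nearest rows
    (by nan_euclidean) in which the feature is observed.
    Hypothetical helper: the study's exact imputation settings are not reported."""
    imputed = []
    for i, row in enumerate(rows):
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: every other row that has feature j observed,
            # sorted by distance to the row being filled.
            donors = sorted(
                (nan_euclidean(row, other), other[j])
                for t, other in enumerate(rows)
                if t != i and other[j] is not None
            )
            if not donors:
                raise ValueError(f"feature {j} is missing in every other row")
            nearest = donors[:k]
            filled[j] = sum(v for _, v in nearest) / len(nearest)
        imputed.append(filled)
    return imputed
```

Donors are always taken from the original rows, so an imputed value never feeds into another imputation. In practice the features would also be standardised first so that no single variable dominates the distance.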
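SMOTE balances classes by creating synthetic minority records rather than duplicating existing ones. It picks a minority record, picks one of its `k` nearest minority neighbours, and places a new point at a random position on the line segment between them. In this study, balancing grew the data set from 4106 to 8013 records. The sketch below is a minimal standard-library version that assumes numeric, already-imputed features. It uses a random-selection variant rather than cycling through every minority record a fixed number of times as the original SMOTE formulation does. The function `smote` and its parameters are hypothetical, not the authors' implementation.

```python
import math
import random
from typing import List


def smote(minority: List[List[float]], n_new: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Return n_new synthetic minority records, each interpolated between a
    randomly chosen minority record and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority records")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority neighbours (Euclidean) of every minority record.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        neighbour = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic
```

To balance two classes, one would call it with `n_new = len(majority) - len(minority)` (hypothetical variable names) and add the result to the minority class. In a cross-validated setup, oversampling is usually applied inside each training fold so that synthetic points never appear in the data used for evaluation.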
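The four evaluation indicators and the DeLong comparison can all be computed from predicted labels and risk scores. The sketch below assumes binary labels coded 1 (T2DM) and 0 (control), and that a higher score means a higher predicted risk. All function names are hypothetical. AUC is computed as the Mann-Whitney probability that a randomly chosen case outscores a randomly chosen control, with ties counting one half. The DeLong test uses the per-subject "placement values" of that statistic to estimate the variance of the difference between two correlated AUCs measured on the same subjects.

```python
import math
from typing import List, Sequence, Tuple


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Accuracy, precision and recall for labels coded 1 = case, 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def _psi(case_score: float, control_score: float) -> float:
    # Mann-Whitney kernel: 1 if the case outscores the control, 0.5 on a tie, else 0.
    if case_score > control_score:
        return 1.0
    return 0.5 if case_score == control_score else 0.0


def _placements(scores: Sequence[float], y_true: Sequence[int]) -> Tuple[List[float], List[float]]:
    """DeLong placement values: V10 for each case, V01 for each control."""
    cases = [s for s, t in zip(scores, y_true) if t == 1]
    controls = [s for s, t in zip(scores, y_true) if t == 0]
    v10 = [sum(_psi(x, y) for y in controls) / len(controls) for x in cases]
    v01 = [sum(_psi(x, y) for x in cases) / len(cases) for y in controls]
    return v10, v01


def _cov(a: List[float], b: List[float]) -> float:
    # Sample covariance with an (n - 1) denominator.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def auc(scores: Sequence[float], y_true: Sequence[int]) -> float:
    v10, _ = _placements(scores, y_true)
    return sum(v10) / len(v10)


def delong_test(scores_a, scores_b, y_true) -> Tuple[float, float, float, float]:
    """Two-sided DeLong test for two AUCs estimated on the same subjects.
    Returns (auc_a, auc_b, z, p_value)."""
    a10, a01 = _placements(scores_a, y_true)
    b10, b01 = _placements(scores_b, y_true)
    m, n = len(a10), len(a01)  # number of cases, number of controls
    auc_a, auc_b = sum(a10) / m, sum(b10) / m
    var = ((_cov(a10, a10) + _cov(b10, b10) - 2 * _cov(a10, b10)) / m
           + (_cov(a01, a01) + _cov(b01, b01) - 2 * _cov(a01, b01)) / n)
    if var <= 0:
        raise ValueError("variance of the AUC difference is zero; the test is undefined")
    z = (auc_a - auc_b) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p_value
```

This version compares every case with every control, which is O(mn) and fine for illustration. A sort-based implementation produces the same placement values faster on data sets of this study's size.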
id pubmed-10465890
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: BMJ Open (Public Health); BMJ Publishing Group, published 2023-08-29.
License: © Author(s) (or their employer(s)) 2023. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: https://creativecommons.org/licenses/by-nc/4.0/
topic Public Health