Machine Learning Models for Data-Driven Prediction of Diabetes by Lifestyle Type

Bibliographic Details
Main authors: Qin, Yifan; Wu, Jinlong; Xiao, Wen; Wang, Kun; Huang, Anbing; Liu, Bowen; Yu, Jingxuan; Li, Chuhao; Yu, Fengyu; Ren, Zhanbing
Format: Online article, text
Language: English
Published: MDPI, 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9690067/
https://www.ncbi.nlm.nih.gov/pubmed/36429751
http://dx.doi.org/10.3390/ijerph192215027
Collection: PubMed
Abstract: The prevalence of diabetes has been increasing in recent years, and previous research has found that machine-learning models are effective tools for predicting diabetes. The purpose of this study was to compare the efficacy of five machine-learning models for diabetes prediction using lifestyle data from the National Health and Nutrition Examination Survey (NHANES) database. The 1999–2020 NHANES database yielded data on 17,833 individuals, covering demographic characteristics and lifestyle-related variables. To screen the variables used to train the machine-learning models, an Akaike Information Criterion (AIC) forward propagation algorithm was used. Five machine-learning models were developed to predict diabetes: CATBoost, XGBoost, Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM). Model performance was evaluated using accuracy, sensitivity, specificity, precision, F1 score, and the receiver operating characteristic (ROC) curve. Across the five models, dietary intake of energy, carbohydrate, and fat contributed the most to predicting diabetes. In terms of performance, CATBoost ranked higher than RF, LR, XGBoost, and SVM; it was the best-performing of the five, with an accuracy of 82.1% and an area under the ROC curve (AUC) of 0.83. Machine-learning models based on NHANES data can assist medical institutions in identifying patients with diabetes.
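The abstract names two procedures that are easy to state but easy to misread: AIC-based forward variable screening and the evaluation metrics. The Python sketch below, using only the standard library, illustrates both under stated assumptions; it is not the authors' code. It assumes that "AIC forward propagation" refers to greedy forward stepwise selection, that the outcome is coded 1 = diabetes and 0 = no diabetes, and that `fit_aic` is a hypothetical callback that fits a model (for example, a logistic regression) on a given variable subset and returns its AIC. The variable names, log-likelihood gains, labels, and scores in the usage section are invented for illustration and are not data from the study.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood


def forward_select(
    candidates: Sequence[str],
    fit_aic: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Greedy forward selection (assumed reading of the abstract): start from
    the intercept-only model, add the variable that lowers AIC the most, and
    stop once no remaining candidate lowers it further."""
    selected: List[str] = []
    best = fit_aic(frozenset())  # intercept-only baseline
    remaining = list(candidates)
    while remaining:
        scores = {v: fit_aic(frozenset(selected + [v])) for v in remaining}
        var = min(scores, key=scores.get)
        if scores[var] >= best:  # no candidate improves AIC: stop
            break
        selected.append(var)
        remaining.remove(var)
        best = scores[var]
    return selected, best


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-based metrics, with 1 = diabetes (positive), 0 = no diabetes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on positives
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive receives a higher
    score than a randomly chosen negative (ties count one half). O(P*N) pairs."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical usage: invented log-likelihood gains, labels and scores (not study data).
TOY_GAINS = {"energy": 10.0, "carbohydrate": 6.0, "fat": 4.0, "noise": 0.5}


def toy_fit_aic(subset: FrozenSet[str]) -> float:
    # Stand-in for fitting a model on `subset`: baseline ln(L) = -100, each
    # variable adds a fixed gain; parameters = intercept + one per variable.
    return aic(-100.0 + sum(TOY_GAINS[v] for v in subset), 1 + len(subset))


if __name__ == "__main__":
    print(forward_select(list(TOY_GAINS), toy_fit_aic))
    # (['energy', 'carbohydrate', 'fat'], 168.0)
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
    y_score = [0.9, 0.4, 0.8, 0.3, 0.2, 0.6, 0.1, 0.7]
    print(confusion_metrics(y_true, y_pred))  # every metric is 0.75 here
    print(roc_auc(y_true, y_score))  # 0.9375
```

In the toy run, the hypothetical "noise" variable is rejected because its log-likelihood gain of 0.5 lowers the -2 ln(L) term by only 1, less than the penalty of 2 for one extra parameter. In a real pipeline, `fit_aic` would refit a model for every candidate subset, so forward selection costs on the order of p² fits for p candidate variables; the pairwise AUC loop is quadratic in sample size and is meant only to make the definition explicit.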
ID: pubmed-9690067
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Int J Environ Res Public Health (Article). Publication date: 2022-11-15 (MDPI).
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).