Machine Learning Models for Data-Driven Prediction of Diabetes by Lifestyle Type
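Main authors: | Qin, Yifan; Wu, Jinlong; Xiao, Wen; Wang, Kun; Huang, Anbing; Liu, Bowen; Yu, Jingxuan; Li, Chuhao; Yu, Fengyu; Ren, Zhanbing |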
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9690067/ https://www.ncbi.nlm.nih.gov/pubmed/36429751 http://dx.doi.org/10.3390/ijerph192215027 |
author | Qin, Yifan; Wu, Jinlong; Xiao, Wen; Wang, Kun; Huang, Anbing; Liu, Bowen; Yu, Jingxuan; Li, Chuhao; Yu, Fengyu; Ren, Zhanbing |
collection | PubMed |
description | The prevalence of diabetes has been increasing in recent years, and previous research has found that machine-learning models are effective tools for predicting diabetes. The purpose of this study was to compare the efficacy of five machine-learning models for diabetes prediction using lifestyle data from the National Health and Nutrition Examination Survey (NHANES) database. The 1999–2020 NHANES database yielded data on 17,833 individuals, covering demographic characteristics and lifestyle-related variables. To screen the variables used to train the models, an Akaike Information Criterion (AIC)-based forward selection algorithm was used. Five machine-learning models (CatBoost, XGBoost, Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM)) were developed to predict diabetes. Model performance was evaluated using accuracy, sensitivity, specificity, precision, F1 score, and the receiver operating characteristic (ROC) curve. Across the five models, dietary intake of energy, carbohydrate, and fat contributed the most to identifying patients with diabetes. In terms of performance, CatBoost ranked above RF, LR, XGBoost, and SVM, making it the best-performing of the five models with an accuracy of 82.1% and an area under the ROC curve (AUC) of 0.83. Machine-learning models based on NHANES data can assist medical institutions in identifying patients with diabetes. |
format | Online Article Text |
id | pubmed-9690067 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
journal | Int J Environ Res Public Health |
date | 2022-11-15 |
rights | © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Machine Learning Models for Data-Driven Prediction of Diabetes by Lifestyle Type |
topic | Article |
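The abstract names the evaluation metrics (accuracy, sensitivity, specificity, precision, F1 score, and ROC AUC) but not their formulas. The Python sketch below computes them with their standard textbook definitions, assuming diabetes is coded as the positive class (label 1) and a 0.5 probability threshold; the function names, threshold, and toy data are hypothetical illustrations, not the study's code or results. AUC is computed with the rank-based (Mann-Whitney) formulation, which gives the same value as the trapezoidal area under the empirical ROC curve.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts, treating 'has diabetes' (label 1) as positive."""
    tp: int  # true positives: diabetic, predicted diabetic
    fp: int  # false positives: non-diabetic, predicted diabetic
    tn: int  # true negatives: non-diabetic, predicted non-diabetic
    fn: int  # false negatives: diabetic, predicted non-diabetic


def confusion_counts(y_true, y_pred):
    """Tally confusion-matrix cells from parallel lists of 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionCounts(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def _ratio(num, den):
    # Return 0.0 rather than raising when a denominator is empty (e.g. no positives predicted).
    return num / den if den else 0.0


def classification_metrics(c):
    """Accuracy, sensitivity, specificity, precision, and F1 from confusion counts."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # recall, true-positive rate
    precision = _ratio(c.tp, c.tp + c.fp)    # positive predictive value
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "sensitivity": sensitivity,
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true-negative rate
        "precision": precision,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    AUC is the probability that a randomly chosen positive receives a higher
    score than a randomly chosen negative; tied scores count as one half.
    The pairwise loop is O(n_pos * n_neg), so it is meant for illustration only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = diabetes, 0 = no diabetes; scores are model probabilities.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.2, 0.65, 0.4, 0.55, 0.1, 0.8, 0.3]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
    print(classification_metrics(confusion_counts(y_true, y_pred)))
    print(roc_auc(y_true, scores))
```

On this toy data the sketch gives 0.75 for all five threshold-based metrics and an AUC of 0.9375. For a sample the size of the NHANES extract (17,833 records), `sklearn.metrics.roc_auc_score` computes the same quantity far more efficiently than the pairwise loop.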