Identification of Potential Type II Diabetes in a Large-Scale Chinese Population Using a Systematic Machine Learning Framework

BACKGROUND: An estimated 425 million people worldwide have diabetes, which accounts for 12% of global health expenditure, and the number continues to grow, placing a heavy burden on healthcare systems, especially in remote, underserved areas. METHODS: A total of 584,168 adult subjects...

Bibliographic Details
Main Authors: Xue, Mingyue; Su, Yinxia; Li, Chen; Wang, Shuxia; Yao, Hua
Format: Online Article Text
Language: English
Published: Hindawi 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7532405/
https://www.ncbi.nlm.nih.gov/pubmed/33029536
http://dx.doi.org/10.1155/2020/6873891
_version_ 1783589916296347648
author Xue, Mingyue
Su, Yinxia
Li, Chen
Wang, Shuxia
Yao, Hua
collection PubMed
description BACKGROUND: An estimated 425 million people worldwide have diabetes, which accounts for 12% of global health expenditure, and the number continues to grow, placing a heavy burden on healthcare systems, especially in remote, underserved areas. METHODS: A total of 584,168 adult subjects who participated in the national physical examination were enrolled in this study. Risk factors for type II diabetes mellitus (T2DM) were identified by p values and odds ratios, using logistic regression (LR) on variables from physical measurements and a questionnaire. Using the risk factors selected by LR, we applied a decision tree, a random forest, AdaBoost with a decision tree (AdaBoost), and an extreme gradient boosting decision tree (XGBoost) to identify individuals with T2DM, compared the performance of the four machine learning classifiers, and used the best-performing classifier to output variable importance scores for T2DM. RESULTS: XGBoost had the best performance (accuracy = 0.906, precision = 0.910, recall = 0.902, F1 = 0.906, and AUC = 0.968). The variable importance scores from XGBoost showed that BMI was the most important feature, followed by age, waist circumference, systolic pressure, ethnicity, smoking amount, fatty liver, hypertension, physical activity, drinking status, dietary ratio (meat to vegetables), drinking amount, smoking status, and dietary habit (oil loving). CONCLUSIONS: We proposed an LR-XGBoost classifier that uses fourteen easily obtained, noninvasive patient variables as predictors to identify potential cases of T2DM. The classifier can accurately screen for diabetes risk in the early phase, and the variable importance scores offer clues for preventing the onset of diabetes.
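The performance figures in the RESULTS can be related to one another with the standard definitions of accuracy, precision, recall, and F1. The following is a minimal sketch, not the authors' code: the `ConfusionCounts` class, the helper names, and the demo counts are hypothetical and introduced only for illustration. The one figure taken from the record is the reported precision/recall pair, which is used to check that the reported F1 is consistent with it.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical container for binary-classifier confusion-matrix counts."""
    tp: int  # true positives: T2DM cases flagged as T2DM
    fp: int  # false positives: non-cases flagged as T2DM
    tn: int  # true negatives: non-cases flagged as non-cases
    fn: int  # false negatives: T2DM cases that were missed


def accuracy(c: ConfusionCounts) -> float:
    """Share of all subjects classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def precision(c: ConfusionCounts) -> float:
    """Share of flagged subjects who are true cases."""
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    """Share of true cases that were flagged."""
    return c.tp / (c.tp + c.fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


# Consistency check on the reported values (precision = 0.910, recall = 0.902).
reported_f1 = f1_score(0.910, 0.902)
print(round(reported_f1, 3))  # 0.906, matching the reported F1

# Made-up counts, NOT from the study, to show the helpers on raw counts.
demo = ConfusionCounts(tp=90, fp=10, tn=85, fn=15)
print(accuracy(demo), precision(demo), round(recall(demo), 3))
```

AUC is not covered here because it depends on the ranking of predicted scores across all subjects, not on a single confusion matrix.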
format Online
Article
Text
id pubmed-7532405
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-7532405 2020-10-06 Identification of Potential Type II Diabetes in a Large-Scale Chinese Population Using a Systematic Machine Learning Framework Xue, Mingyue Su, Yinxia Li, Chen Wang, Shuxia Yao, Hua J Diabetes Res Research Article Hindawi 2020-09-24 /pmc/articles/PMC7532405/ /pubmed/33029536 http://dx.doi.org/10.1155/2020/6873891 Text en Copyright © 2020 Mingyue Xue et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Identification of Potential Type II Diabetes in a Large-Scale Chinese Population Using a Systematic Machine Learning Framework
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7532405/
https://www.ncbi.nlm.nih.gov/pubmed/33029536
http://dx.doi.org/10.1155/2020/6873891