
Machine learning for characterizing risk of type 2 diabetes mellitus in a rural Chinese population: the Henan Rural Cohort Study


Bibliographic Details
Main authors:	Zhang, Liying, Wang, Yikang, Niu, Miaomiao, Wang, Chongjian, Wang, Zhenfei
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2020
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7064542/ https://www.ncbi.nlm.nih.gov/pubmed/32157171 http://dx.doi.org/10.1038/s41598-020-61123-x

author	Zhang, Liying Wang, Yikang Niu, Miaomiao Wang, Chongjian Wang, Zhenfei
collection	PubMed
description	With the development of data mining, machine learning offers opportunities to improve discrimination by analyzing complex interactions among large numbers of variables. To test the ability of machine learning algorithms to predict the risk of type 2 diabetes mellitus (T2DM) in a rural Chinese population, we focused on a total of 36,652 eligible participants from the Henan Rural Cohort Study. Risk assessment models for T2DM were developed using six machine learning algorithms: logistic regression (LR), classification and regression tree (CART), artificial neural networks (ANN), support vector machine (SVM), random forest (RF) and gradient boosting machine (GBM). Model performance was measured by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value, negative predictive value and the area under the precision-recall curve. Variable importance was identified based on each classifier and on the SHapley Additive exPlanations (SHAP) approach. Using all available variables, all models for predicting the risk of T2DM demonstrated strong predictive performance, with AUCs ranging from 0.811 to 0.872 with laboratory data and from 0.767 to 0.817 without laboratory data. Among them, the GBM model performed best (AUC: 0.872 with laboratory data and 0.817 without laboratory data). Except for the CART model, model performance plateaued once 30 variables were introduced. Among the top-10 variables across all methods were sweet flavor, urine glucose, age, heart rate, creatinine, waist circumference, uric acid, pulse pressure, insulin, and hypertension. New important risk factors (urinary indicators, sweet flavor) that had not been identified by previous risk prediction methods were identified by machine learning in our study. These results show that machine learning methods are competent at predicting the risk of T2DM and can yield greater insight into disease risk factors without an a priori assumption of causality.
format	Online Article Text
id	pubmed-7064542
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-70645422020-03-18 Machine learning for characterizing risk of type 2 diabetes mellitus in a rural Chinese population: the Henan Rural Cohort Study Zhang, Liying Wang, Yikang Niu, Miaomiao Wang, Chongjian Wang, Zhenfei Sci Rep Article Nature Publishing Group UK 2020-03-10 /pmc/articles/PMC7064542/ /pubmed/32157171 http://dx.doi.org/10.1038/s41598-020-61123-x Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title	Machine learning for characterizing risk of type 2 diabetes mellitus in a rural Chinese population: the Henan Rural Cohort Study
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7064542/ https://www.ncbi.nlm.nih.gov/pubmed/32157171 http://dx.doi.org/10.1038/s41598-020-61123-x
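The description above lists the evaluation metrics used to compare the six classifiers. As a rough illustration only, the following standard-library Python sketch shows how such metrics are commonly computed from binary labels and predicted risk scores. It is not the authors' code: the function names, the label coding (1 = T2DM, 0 = no T2DM), the 0.5 decision threshold and the toy data are all hypothetical assumptions, and average precision is used here as one common estimator of the area under the precision-recall curve, which may differ from the estimator used in the study.

```python
from collections.abc import Sequence


def _ratio(num: float, den: float) -> float:
    """Safe division: return NaN instead of raising when the denominator is zero."""
    return num / den if den else float("nan")


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN). Assumed coding: 1 = T2DM, 0 = no T2DM."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def threshold_metrics(y_true: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> dict[str, float]:
    """Sensitivity, specificity, PPV and NPV at a hypothetical decision threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "ppv": _ratio(tp, tp + fp),          # positive predictive value
        "npv": _ratio(tn, tn + fn),          # negative predictive value
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random case outscores a random non-case
    (ties count as 1/2). Pairwise O(n_pos * n_neg): fine for a sketch, slow for large cohorts."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision at the rank of each case, ranking by descending score.
    Tied scores keep their input order (sorted() is stable), so ties are not averaged."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for t in y_true if t == 1)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return _ratio(total, n_pos)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the Henan Rural Cohort Study.
    labels = [1, 1, 0, 0, 1, 0]
    risk = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
    print(threshold_metrics(labels, risk))
    print(round(roc_auc(labels, risk), 3))
    print(round(average_precision(labels, risk), 3))
```

The threshold-free metrics (ROC AUC, average precision) are computed from the raw scores, while sensitivity, specificity, PPV and NPV depend on the chosen cut-off, so in practice the threshold would be fixed before reporting them.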