
Predictive models for diabetes mellitus using machine learning techniques

BACKGROUND: Diabetes Mellitus is an increasingly prevalent chronic disease characterized by the body’s inability to metabolize glucose. The objective of this study was to build an effective predictive model with high sensitivity and selectivity to better identify Canadian patients at risk of having...


Bibliographic Details
Main authors: Lai, Hang, Huang, Huaxiong, Keshavjee, Karim, Guergachi, Aziz, Gao, Xin
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6794897/
https://www.ncbi.nlm.nih.gov/pubmed/31615566
http://dx.doi.org/10.1186/s12902-019-0436-6
author Lai, Hang
Huang, Huaxiong
Keshavjee, Karim
Guergachi, Aziz
Gao, Xin
collection PubMed
description BACKGROUND: Diabetes Mellitus is an increasingly prevalent chronic disease characterized by the body’s inability to metabolize glucose. The objective of this study was to build an effective predictive model with high sensitivity and selectivity to better identify Canadian patients at risk of having Diabetes Mellitus, based on patient demographic data and the laboratory results recorded during their visits to medical facilities. METHODS: Using the most recent records of 13,309 Canadian patients aged between 18 and 90 years, along with their laboratory information (age, sex, fasting blood glucose, body mass index, high-density lipoprotein, triglycerides, blood pressure, and low-density lipoprotein), we built predictive models using Logistic Regression and Gradient Boosting Machine (GBM) techniques. The area under the receiver operating characteristic curve (AROC) was used to evaluate the discriminatory capability of these models. We used the adjusted threshold method and the class weight method to improve sensitivity – the proportion of Diabetes Mellitus patients correctly predicted by the model. We also compared these models to other machine learning techniques such as Decision Tree and Random Forest. RESULTS: The AROC for the proposed GBM model is 84.7% with a sensitivity of 71.6%, and the AROC for the proposed Logistic Regression model is 84.0% with a sensitivity of 73.4%. The GBM and Logistic Regression models perform better than the Random Forest and Decision Tree models. CONCLUSIONS: Our models predict patients with Diabetes from commonly used lab results with high discriminatory ability and satisfactory sensitivity. These models can be built into an online computer program to help physicians predict which patients will develop diabetes in the future and provide the necessary preventive interventions. The model is developed and validated on the Canadian population, which makes it more specific and powerful when applied to Canadian patients than existing models developed from US or other populations. Fasting blood glucose, body mass index, high-density lipoprotein, and triglycerides were the most important predictors in these models.
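The abstract above evaluates models by AROC and by sensitivity, and raises sensitivity with an adjusted decision threshold. The following is a minimal sketch of those three quantities in plain Python. It is not the authors' code: the function names and the toy labels and scores are hypothetical, invented purely for illustration, and the study's data and fitted models are not reproduced here.

```python
from typing import Sequence

# Hypothetical toy data: 1 = has diabetes, 0 = does not.
# The scores stand in for predicted probabilities from any classifier.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.2, 0.7, 0.3, 0.3, 0.1, 0.1, 0.05]


def sensitivity(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Proportion of actual positives whose score is at or above the threshold."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    if not positives:
        raise ValueError("y_true contains no positive cases")
    return sum(s >= threshold for s in positives) / len(positives)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive outscores a randomly chosen negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_for_sensitivity(y_true: Sequence[int], scores: Sequence[float], target: float) -> float:
    """Highest observed score that, used as a threshold, reaches the target sensitivity."""
    for t in sorted(set(scores), reverse=True):
        if sensitivity(y_true, scores, t) >= target:
            return t
    raise ValueError("target sensitivity is not reachable")


if __name__ == "__main__":
    print("AROC:", auroc(toy_labels, toy_scores))
    print("sensitivity at 0.5:", sensitivity(toy_labels, toy_scores, 0.5))
    adjusted = threshold_for_sensitivity(toy_labels, toy_scores, 0.75)
    print("adjusted threshold:", adjusted)
```

On this toy data, lowering the threshold from the default 0.5 raises sensitivity at the cost of more false positives. That is the trade-off an adjusted threshold makes, and it does not change the AROC, which is independent of the threshold.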
format Online
Article
Text
id pubmed-6794897
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6794897 2019-10-21 Predictive models for diabetes mellitus using machine learning techniques Lai, Hang Huang, Huaxiong Keshavjee, Karim Guergachi, Aziz Gao, Xin BMC Endocr Disord Research Article BioMed Central 2019-10-15 /pmc/articles/PMC6794897/ /pubmed/31615566 http://dx.doi.org/10.1186/s12902-019-0436-6 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Predictive models for diabetes mellitus using machine learning techniques
topic Research Article