Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project

Machine learning is becoming a popular and important approach in the field of medical research. In this study, we investigate the relative performance of various machine learning methods such as Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree and Random Forests for predicting in...

Bibliographic Details
Main Authors: Alghamdi, Manal; Al-Mallah, Mouaz; Keteyian, Steven; Brawner, Clinton; Ehrman, Jonathan; Sakr, Sherif
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5524285/
https://www.ncbi.nlm.nih.gov/pubmed/28738059
http://dx.doi.org/10.1371/journal.pone.0179805
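
The SMOTE step named in the title addresses class imbalance: the abstract reports that 5,099 of 32,555 patients (about 15.7%) developed diabetes. The following is a minimal sketch of the interpolation idea behind SMOTE, not the authors' implementation. It assumes purely numeric features and picks base samples at random, which is a simplification; the function name `smote` and its parameters `n_synthetic`, `k`, and `seed` are hypothetical, as is the toy data.

```python
import random
from math import dist


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority-class points by interpolation.

    Each new point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours
    (Euclidean distance). Simplified sketch; numeric features only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of the base sample among the other samples
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy minority class with two numeric features (illustrative values only).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points), "synthetic samples generated")
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class. Oversampling of this kind is normally applied to the training split only, so that evaluation data contain no synthetic records.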
author Alghamdi, Manal
Al-Mallah, Mouaz
Keteyian, Steven
Brawner, Clinton
Ehrman, Jonathan
Sakr, Sherif
collection PubMed
description Machine learning is becoming a popular and important approach in the field of medical research. In this study, we investigate the relative performance of various machine learning methods, such as Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree, and Random Forests, for predicting incident diabetes using medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data from 32,555 patients free of any known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 5-year follow-up. By the end of the fifth year, 5,099 of those patients had developed diabetes. The dataset contained 62 attributes classified into four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an ensemble-based predictive model using 13 attributes that were selected based on their clinical importance, Multiple Linear Regression, and Information Gain Ranking methods. The negative effect of class imbalance on the constructed model was handled with the Synthetic Minority Oversampling Technique (SMOTE). The overall performance of the predictive classifier was improved by an ensemble machine learning approach using the Vote method with three Decision Trees (Naïve Bayes Tree, Random Forest, and Logistic Model Tree), which achieved high prediction accuracy (AUC = 0.92). The study shows the potential of ensembling and SMOTE approaches for predicting incident diabetes using cardiorespiratory fitness data.
format Online
Article
Text
id pubmed-5524285
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5524285 2017-08-07
Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project
Alghamdi, Manal; Al-Mallah, Mouaz; Keteyian, Steven; Brawner, Clinton; Ehrman, Jonathan; Sakr, Sherif
PLoS One
Research Article
Public Library of Science 2017-07-24
/pmc/articles/PMC5524285/ /pubmed/28738059 http://dx.doi.org/10.1371/journal.pone.0179805
Text en © 2017 Alghamdi et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5524285/
https://www.ncbi.nlm.nih.gov/pubmed/28738059
http://dx.doi.org/10.1371/journal.pone.0179805
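
The abstract describes combining three tree-based classifiers with "the Vote method" but does not state the combination rule. The sketch below illustrates one common choice, majority voting over predicted labels, as an assumption rather than the authors' configuration. The three rule functions are hypothetical toy stand-ins for trained models (they are not Naïve Bayes Tree, Random Forest, or Logistic Model Tree), and the feature values are invented for illustration.

```python
from collections import Counter


def majority_vote(classifiers, x):
    """Return the label predicted by most of the given classifiers for x.

    Each classifier is any callable that maps a feature vector to a label.
    """
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for three trained models: each one thresholds a
# different feature and returns 1 (positive class) or 0 (negative class).
def rule_a(x):
    return int(x[0] > 0.5)


def rule_b(x):
    return int(x[1] > 0.5)


def rule_c(x):
    return int(x[2] > 0.5)


models = [rule_a, rule_b, rule_c]

print(majority_vote(models, (0.9, 0.9, 0.1)))  # two of three vote 1
print(majority_vote(models, (0.1, 0.9, 0.1)))  # two of three vote 0
```

With an odd number of binary classifiers a tie cannot occur. An alternative combination rule averages the predicted class probabilities instead of counting labels, which requires each model to output probabilities.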