A Comparison between Decision Tree and Random Forest in Determining the Risk Factors Associated with Type 2 Diabetes

Background: We aimed to identify the risk factors associated with type 2 diabetes mellitus (T2DM) by applying two data mining techniques, decision tree and random forest, to data from the Mashhad Stroke and Heart Atherosclerotic Disorders (MASHAD) Study program. Study design: A cross-sectional study. Methods:...

Bibliographic Details
Main Authors: Esmaily, Habibollah; Tayefi, Maryam; Doosti, Hassan; Ghayour-Mobarhan, Majid; Nezami, Hossein; Amirabadizadeh, Alireza
Format: Online Article Text
Language: English
Published: Hamadan University of Medical Sciences 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7204421/
_version_ 1783530061572341760
author Esmaily, Habibollah
Tayefi, Maryam
Doosti, Hassan
Ghayour-Mobarhan, Majid
Nezami, Hossein
Amirabadizadeh, Alireza
collection PubMed
description Background: We aimed to identify the risk factors associated with type 2 diabetes mellitus (T2DM) by applying two data mining techniques, decision tree and random forest, to data from the Mashhad Stroke and Heart Atherosclerotic Disorders (MASHAD) Study program. Study design: A cross-sectional study. Methods: The MASHAD study started in 2010 and will continue until 2020. Two data mining tools, decision trees and random forests, were used to predict T2DM from other characteristics observed in 9528 subjects recruited from the MASHAD database. The two models were compared in terms of accuracy, sensitivity, specificity, and area under the ROC curve. Results: The prevalence of T2DM was 14% among these subjects. The decision tree model had 64.9% accuracy, 64.5% sensitivity, 66.8% specificity, and an area under the ROC curve of 68.6%, while the random forest model had 71.1% accuracy, 71.3% sensitivity, 69.9% specificity, and an area under the ROC curve of 77.3%. Conclusions: The random forest model, when used with demographic, clinical, anthropometric, and biochemical measurements, can provide a simple tool to identify risk factors associated with type 2 diabetes. Such identification can be of substantial use in shaping health policy to reduce the number of subjects with T2DM.
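The four measures named in the abstract (accuracy, sensitivity, specificity, and area under the ROC curve) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, the toy labels and scores are invented for illustration and are not MASHAD data, and the AUC is computed with the standard pairwise-ranking definition rather than whatever software the study used.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = T2DM and 0 = no T2DM."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores a random
    negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: 3 subjects with T2DM, 5 without, and model risk scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # classify at a 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, sensitivity, and specificity depend on the chosen classification threshold, whereas the area under the ROC curve summarizes ranking quality across all thresholds, which is why comparisons like the one in this record usually report all four.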
format Online
Article
Text
id pubmed-7204421
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Hamadan University of Medical Sciences
record_format MEDLINE/PubMed
spelling pubmed-7204421 2020-05-11 J Res Health Sci Original Article Hamadan University of Medical Sciences 2018-04-24 /pmc/articles/PMC7204421/ Text en © 2018 The Author(s); Published by Hamadan University of Medical Sciences.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A Comparison between Decision Tree and Random Forest in Determining the Risk Factors Associated with Type 2 Diabetes
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7204421/