
Comparing Three Data Mining Algorithms for Identifying the Associated Risk Factors of Type 2 Diabetes

BACKGROUND: The increasing prevalence of type 2 diabetes has given rise to a global health burden and a concern among health service providers and health administrators. The current study aimed to develop and compare statistical models for identifying the risk factors associated with type 2 di...


Bibliographic Details
Main Authors: Esmaeily, Habibollah; Tayefi, Maryam; Ghayour-Mobarhan, Majid; Amirabadizadeh, Alireza
Format: Online Article Text
Language: English
Published: Pasteur Institute 2018
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6058191/
https://www.ncbi.nlm.nih.gov/pubmed/29374085
http://dx.doi.org/10.29252/ibj.22.5.303
author Esmaeily, Habibollah
Tayefi, Maryam
Ghayour-Mobarhan, Majid
Amirabadizadeh, Alireza
collection PubMed
description BACKGROUND: The increasing prevalence of type 2 diabetes has given rise to a global health burden and a concern among health service providers and health administrators. The current study aimed to develop and compare statistical models for identifying the risk factors associated with type 2 diabetes. To this end, artificial neural network (ANN), support vector machine (SVM), and multiple logistic regression (MLR) models were applied, using demographic, anthropometric, and biochemical characteristics, to a sample of 9528 individuals from Mashhad City in Iran. METHODS: A total of 6654 (70%) cases were randomly selected for training, and the remaining 2874 (30%) cases were reserved for testing. The three methods were compared using ROC curves. RESULTS: The prevalence of type 2 diabetes was 14% in our population. The ANN model had 78.7% accuracy, 63.1% sensitivity, and 81.2% specificity. The corresponding values were 76.8%, 64.5%, and 78.9% for SVM and 77.7%, 60.1%, and 80.5% for MLR. The area under the ROC curve was 0.71 for ANN, 0.73 for SVM, and 0.70 for MLR. CONCLUSION: Our findings showed that ANN performs better than the other two models (SVM and MLR) and can be used effectively to identify the associated risk factors of type 2 diabetes.
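The accuracy, sensitivity, and specificity figures in the abstract are linked by a simple identity: accuracy = sensitivity × prevalence + specificity × (1 − prevalence). The sketch below is not the authors' code. It only checks the quoted numbers against that identity, under the assumption (not stated in the abstract) that the 14% prevalence also holds in the test set. The names `expected_accuracy`, `implied_accuracies`, and `reported` are hypothetical and introduced here for illustration.

```python
def expected_accuracy(sensitivity, specificity, prevalence):
    """Accuracy implied by sensitivity, specificity, and class prevalence."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Figures quoted in the abstract, written as fractions.
# ASSUMPTION: the 14% population prevalence also applies to the test set.
PREVALENCE = 0.14
reported = {
    "ANN": {"accuracy": 0.787, "sensitivity": 0.631, "specificity": 0.812},
    "SVM": {"accuracy": 0.768, "sensitivity": 0.645, "specificity": 0.789},
    "MLR": {"accuracy": 0.777, "sensitivity": 0.601, "specificity": 0.805},
}


def implied_accuracies(models, prevalence):
    """Map each model name to the accuracy implied by its other two metrics."""
    return {
        name: expected_accuracy(m["sensitivity"], m["specificity"], prevalence)
        for name, m in models.items()
    }


implied = implied_accuracies(reported, PREVALENCE)

# Train/test split sizes quoted in the abstract.
N_TOTAL, N_TRAIN, N_TEST = 9528, 6654, 2874
train_share = N_TRAIN / N_TOTAL

for name, value in implied.items():
    gap = value - reported[name]["accuracy"]
    print(f"{name}: implied {value:.4f}, reported {reported[name]['accuracy']:.3f}, gap {gap:+.4f}")
print(f"train share: {train_share:.4f}")
```

Under that assumption the implied accuracies land within about 0.1 percentage point of the reported ones, so the three metrics are mutually consistent for each model. The check says nothing about which model is preferable.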
format Online
Article
Text
id pubmed-6058191
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Pasteur Institute
record_format MEDLINE/PubMed
spelling pubmed-6058191 2018-09-01 Comparing Three Data Mining Algorithms for Identifying the Associated Risk Factors of Type 2 Diabetes Esmaeily, Habibollah; Tayefi, Maryam; Ghayour-Mobarhan, Majid; Amirabadizadeh, Alireza Iran Biomed J Full Length
Pasteur Institute 2018-09 /pmc/articles/PMC6058191/ /pubmed/29374085 http://dx.doi.org/10.29252/ibj.22.5.303 Text en Copyright: © Iranian Biomedical Journal. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Comparing Three Data Mining Algorithms for Identifying the Associated Risk Factors of Type 2 Diabetes
topic Full Length
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6058191/
https://www.ncbi.nlm.nih.gov/pubmed/29374085
http://dx.doi.org/10.29252/ibj.22.5.303