Comparing Three Data Mining Algorithms for Identifying the Associated Risk Factors of Type 2 Diabetes
BACKGROUND: Increasing the prevalence of type 2 diabetes has given rise to a global health burden and a concern among health service providers and health administrators. The current study aimed at developing and comparing some statistical models to identify the risk factors associated with type 2 di...
Main Authors: | Esmaeily, Habibollah; Tayefi, Maryam; Ghayour-Mobarhan, Majid; Amirabadizadeh, Alireza |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Pasteur Institute 2018 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6058191/ https://www.ncbi.nlm.nih.gov/pubmed/29374085 http://dx.doi.org/10.29252/ibj.22.5.303 |
_version_ | 1783341648568123392 |
---|---|
author | Esmaeily, Habibollah Tayefi, Maryam Ghayour-Mobarhan, Majid Amirabadizadeh, Alireza |
author_facet | Esmaeily, Habibollah Tayefi, Maryam Ghayour-Mobarhan, Majid Amirabadizadeh, Alireza |
author_sort | Esmaeily, Habibollah |
collection | PubMed |
description | BACKGROUND: Increasing the prevalence of type 2 diabetes has given rise to a global health burden and a concern among health service providers and health administrators. The current study aimed at developing and comparing some statistical models to identify the risk factors associated with type 2 diabetes. In this light, artificial neural network (ANN), support vector machines (SVMs), and multiple logistic regression (MLR) models were applied, using demographic, anthropometric, and biochemical characteristics, on a sample of 9528 individuals from Mashhad City in Iran. METHODS: This study has randomly selected 6654 (70%) cases for training and reserved the remaining 2874 (30%) cases for testing. The three methods were compared with the help of ROC curve. RESULTS: The prevalence rate of type 2 diabetes was 14% in our population. The ANN model had 78.7% accuracy, 63.1% sensitivity, and 81.2% specificity. Also, the values of these three parameters were 76.8%, 64.5%, and 78.9%, for SVM and 77.7%, 60.1%, and 80.5% for MLR. The area under the ROC curve was 0.71 for ANN, 0.73 for SVM, and 0.70 for MLR. CONCLUSION: Our findings showed that ANN performs better than the two models (SVM and MLR) and can be used effectively to identify the associated risk factors of type 2 diabetes. |
format | Online Article Text |
id | pubmed-6058191 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Pasteur Institute |
record_format | MEDLINE/PubMed |
spelling | pubmed-60581912018-09-01 Comparing Three Data Mining Algorithms for Identifying the Associated Risk Factors of Type 2 Diabetes Esmaeily, Habibollah Tayefi, Maryam Ghayour-Mobarhan, Majid Amirabadizadeh, Alireza Iran Biomed J Full Length BACKGROUND: Increasing the prevalence of type 2 diabetes has given rise to a global health burden and a concern among health service providers and health administrators. The current study aimed at developing and comparing some statistical models to identify the risk factors associated with type 2 diabetes. In this light, artificial neural network (ANN), support vector machines (SVMs), and multiple logistic regression (MLR) models were applied, using demographic, anthropometric, and biochemical characteristics, on a sample of 9528 individuals from Mashhad City in Iran. METHODS: This study has randomly selected 6654 (70%) cases for training and reserved the remaining 2874 (30%) cases for testing. The three methods were compared with the help of ROC curve. RESULTS: The prevalence rate of type 2 diabetes was 14% in our population. The ANN model had 78.7% accuracy, 63.1% sensitivity, and 81.2% specificity. Also, the values of these three parameters were 76.8%, 64.5%, and 78.9%, for SVM and 77.7%, 60.1%, and 80.5% for MLR. The area under the ROC curve was 0.71 for ANN, 0.73 for SVM, and 0.70 for MLR. CONCLUSION: Our findings showed that ANN performs better than the two models (SVM and MLR) and can be used effectively to identify the associated risk factors of type 2 diabetes. 
Pasteur Institute 2018-09 /pmc/articles/PMC6058191/ /pubmed/29374085 http://dx.doi.org/10.29252/ibj.22.5.303 Text en Copyright: © Iranian Biomedical Journal http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License, (http://creativecommons.org/licenses/by/3.0/) which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Full Length Esmaeily, Habibollah Tayefi, Maryam Ghayour-Mobarhan, Majid Amirabadizadeh, Alireza Comparing Three Data Mining Algorithms for Identifying the Associated Risk Factors of Type 2 Diabetes |
title | Comparing Three Data Mining Algorithms for Identifying the Associated Risk Factors of Type 2 Diabetes |
title_full | Comparing Three Data Mining Algorithms for Identifying the Associated Risk Factors of Type 2 Diabetes |
title_fullStr | Comparing Three Data Mining Algorithms for Identifying the Associated Risk Factors of Type 2 Diabetes |
title_full_unstemmed | Comparing Three Data Mining Algorithms for Identifying the Associated Risk Factors of Type 2 Diabetes |
title_short | Comparing Three Data Mining Algorithms for Identifying the Associated Risk Factors of Type 2 Diabetes |
title_sort | comparing three data mining algorithms for identifying the associated risk factors of type 2 diabetes |
topic | Full Length |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6058191/ https://www.ncbi.nlm.nih.gov/pubmed/29374085 http://dx.doi.org/10.29252/ibj.22.5.303 |
work_keys_str_mv | AT esmaeilyhabibollah comparingthreedataminingalgorithmsforidentifyingtheassociatedriskfactorsoftype2diabetes AT tayefimaryam comparingthreedataminingalgorithmsforidentifyingtheassociatedriskfactorsoftype2diabetes AT ghayourmobarhanmajid comparingthreedataminingalgorithmsforidentifyingtheassociatedriskfactorsoftype2diabetes AT amirabadizadehalireza comparingthreedataminingalgorithmsforidentifyingtheassociatedriskfactorsoftype2diabetes |
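
The figures quoted in the description field can be cross-checked with a little arithmetic. The following is a minimal sketch, not the authors' analysis: it assumes the test set has the same 14% prevalence as the whole population (the abstract does not say so), and the function and constant names are hypothetical. It uses the identity accuracy = sensitivity × prevalence + specificity × (1 − prevalence), so the results are only approximate given the rounded inputs.

```python
# Figures copied from the abstract in the description field.
TOTAL, TRAIN, TEST = 9528, 6654, 2874

# ASSUMPTION: the 14% population prevalence also holds in the test set.
PREVALENCE = 0.14

# model name -> (accuracy, sensitivity, specificity), as fractions
REPORTED = {
    "ANN": (0.787, 0.631, 0.812),
    "SVM": (0.768, 0.645, 0.789),
    "MLR": (0.777, 0.601, 0.805),
}


def implied_accuracy(sensitivity, specificity, prevalence):
    """Accuracy implied by sensitivity and specificity at a given prevalence."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


def split_fractions(train, test):
    """Return the (train, test) shares of the combined sample."""
    total = train + test
    return train / total, test / total


if __name__ == "__main__":
    train_share, test_share = split_fractions(TRAIN, TEST)
    print(f"train share: {train_share:.3f}, test share: {test_share:.3f}")

    for model, (accuracy, sensitivity, specificity) in REPORTED.items():
        implied = implied_accuracy(sensitivity, specificity, PREVALENCE)
        gap = abs(implied - accuracy)
        print(f"{model}: reported {accuracy:.3f}, implied {implied:.3f}, gap {gap:.4f}")
```

A small gap between the reported and implied accuracy suggests the three metrics for a model are mutually consistent under the stated prevalence assumption; it says nothing about which model is preferable.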