Real-Data Comparison of Data Mining Methods in Prediction of Diabetes in Iran

OBJECTIVES: Diabetes is one of the most common non-communicable diseases in developing countries. Early screening and diagnosis play an important role in effective prevention strategies. This study compared two traditional classification methods (logistic regression and Fisher linear discriminant an...

Bibliographic Details
Main Authors: Tapak, Lily; Mahjub, Hossein; Hamidi, Omid; Poorolajal, Jalal
Format: Online Article Text
Language: English
Published: Korean Society of Medical Informatics 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3810525/
https://www.ncbi.nlm.nih.gov/pubmed/24175116
http://dx.doi.org/10.4258/hir.2013.19.3.177
author Tapak, Lily
Mahjub, Hossein
Hamidi, Omid
Poorolajal, Jalal
collection PubMed
description OBJECTIVES: Diabetes is one of the most common non-communicable diseases in developing countries. Early screening and diagnosis play an important role in effective prevention strategies. This study compared two traditional classification methods (logistic regression and Fisher linear discriminant analysis) and four machine-learning classifiers (neural networks, support vector machines, fuzzy c-means, and random forests) to classify persons with and without diabetes. METHODS: The data set used in this study included 6,500 subjects from the Iranian national surveillance of non-communicable disease risk factors, obtained through a cross-sectional survey. The sample was based on cluster sampling of the Iranian population, conducted in 2005-2009 to assess the prevalence of major non-communicable disease risk factors. Ten risk factors that are commonly associated with diabetes were selected to compare the performance of the six classifiers in terms of sensitivity, specificity, total accuracy, and area under the receiver operating characteristic (ROC) curve. RESULTS: Support vector machines showed the highest total accuracy (0.986) as well as the highest area under the ROC curve (0.979). This method also showed high specificity (1.000) and sensitivity (0.820). All other methods produced total accuracy of more than 85%, but their sensitivity values were all very low (less than 0.350). CONCLUSIONS: The results of this study indicate that, in terms of sensitivity, specificity, and overall classification accuracy, the support vector machine model ranks first among all the classifiers tested in the prediction of diabetes. Therefore, this approach is a promising classifier for predicting diabetes, and it should be further investigated for the prediction of other diseases.
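The pattern reported in the abstract, total accuracy above 85% alongside sensitivity below 0.350, is typical when the positive class is rare. The following is a minimal sketch of how the three count-based criteria are computed from a confusion matrix. The function name and the counts are hypothetical illustrations and are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, and total accuracy
    from confusion-matrix counts (standard definitions)."""
    sensitivity = tp / (tp + fn)               # share of true cases detected
    specificity = tn / (tn + fp)               # share of non-cases correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all subjects classified correctly
    return sensitivity, specificity, accuracy


# Hypothetical counts, NOT from the study: 100 cases and 900 non-cases.
sens, spec, acc = classification_metrics(tp=30, fn=70, tn=890, fp=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
# prints: sensitivity=0.300 specificity=0.989 accuracy=0.920
```

With these illustrative counts, a classifier that misses 70% of cases still reaches 92% total accuracy, which is why sensitivity is reported separately from accuracy.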
format Online
Article
Text
id pubmed-3810525
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Korean Society of Medical Informatics
record_format MEDLINE/PubMed
spelling pubmed-3810525 2013-10-30 Healthc Inform Res Original Article Korean Society of Medical Informatics 2013-09 2013-09-30 /pmc/articles/PMC3810525/ /pubmed/24175116 http://dx.doi.org/10.4258/hir.2013.19.3.177 Text en © 2013 The Korean Society of Medical Informatics http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Real-Data Comparison of Data Mining Methods in Prediction of Diabetes in Iran
topic Original Article