Real-Data Comparison of Data Mining Methods in Prediction of Diabetes in Iran
Main authors: | Tapak, Lily; Mahjub, Hossein; Hamidi, Omid; Poorolajal, Jalal |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Korean Society of Medical Informatics, 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3810525/ https://www.ncbi.nlm.nih.gov/pubmed/24175116 http://dx.doi.org/10.4258/hir.2013.19.3.177 |
_version_ | 1782288803510091776 |
---|---|
author | Tapak, Lily; Mahjub, Hossein; Hamidi, Omid; Poorolajal, Jalal |
collection | PubMed |
description | OBJECTIVES: Diabetes is one of the most common non-communicable diseases in developing countries. Early screening and diagnosis play an important role in effective prevention strategies. This study compared two traditional classification methods (logistic regression and Fisher linear discriminant analysis) and four machine-learning classifiers (neural networks, support vector machines, fuzzy c-means, and random forests) to classify persons with and without diabetes. METHODS: The data set used in this study included 6,500 subjects from the Iranian national non-communicable diseases risk factors surveillance, obtained through a cross-sectional survey. The sample was drawn by cluster sampling of the Iranian population in a survey conducted in 2005-2009 to assess the prevalence of major non-communicable disease risk factors. Ten risk factors commonly associated with diabetes were selected to compare the performance of the six classifiers in terms of sensitivity, specificity, total accuracy, and area under the receiver operating characteristic (ROC) curve. RESULTS: Support vector machines showed the highest total accuracy (0.986) as well as the highest area under the ROC curve (0.979). This method also showed high specificity (1.000) and sensitivity (0.820). All other methods produced total accuracy above 85%, but their sensitivity values were very low (less than 0.350). CONCLUSIONS: The results of this study indicate that, in terms of sensitivity, specificity, and overall classification accuracy, the support vector machine model ranks first among all the classifiers tested in the prediction of diabetes. Therefore, this approach is a promising classifier for predicting diabetes, and it should be further investigated for the prediction of other diseases. |
format | Online Article Text |
id | pubmed-3810525 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Korean Society of Medical Informatics |
record_format | MEDLINE/PubMed |
spelling | pubmed-3810525 2013-10-30 Real-Data Comparison of Data Mining Methods in Prediction of Diabetes in Iran Tapak, Lily; Mahjub, Hossein; Hamidi, Omid; Poorolajal, Jalal Healthc Inform Res Original Article Korean Society of Medical Informatics 2013-09 2013-09-30 /pmc/articles/PMC3810525/ /pubmed/24175116 http://dx.doi.org/10.4258/hir.2013.19.3.177 Text en © 2013 The Korean Society of Medical Informatics http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Real-Data Comparison of Data Mining Methods in Prediction of Diabetes in Iran |
topic | Original Article |
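The abstract in the description field compares classifiers by sensitivity, specificity, total accuracy, and area under the ROC curve, and notes that several methods exceeded 85% accuracy while their sensitivity stayed below 0.350. The minimal Python sketch below shows how these criteria relate. It is an illustration under stated assumptions, not the authors' code. The confusion counts are hypothetical, not the study's data. `auc_rank` is one standard rank-based (Mann-Whitney) AUC estimator, and the record does not say how the authors computed their AUC. `implied_prevalence` assumes that accuracy, sensitivity, and specificity were all computed on the same evaluation set, so that accuracy = p·sensitivity + (1 - p)·specificity.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Counts from a binary confusion matrix (positive = has diabetes)."""
    tp: int  # diabetic, predicted diabetic
    fn: int  # diabetic, predicted non-diabetic
    tn: int  # non-diabetic, predicted non-diabetic
    fp: int  # non-diabetic, predicted diabetic


def sensitivity(c: Confusion) -> float:
    """True-positive rate: share of diabetic subjects correctly flagged."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """True-negative rate: share of non-diabetic subjects correctly cleared."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: Confusion) -> float:
    """Total accuracy: share of all subjects classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def implied_prevalence(acc: float, sens: float, spec: float) -> float:
    """Solve acc = p*sens + (1 - p)*spec for the positive-class share p.

    Assumption: all three rates come from the same evaluation set.
    """
    if sens == spec:
        raise ValueError("prevalence is not identifiable when sens == spec")
    return (spec - acc) / (spec - sens)


def auc_rank(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) AUC: probability that a random positive
    scores higher than a random negative, counting ties as one half.
    Quadratic in sample size, which is acceptable for a sketch."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not the study's data).
    c = Confusion(tp=41, fn=9, tn=550, fp=0)
    print(f"sensitivity={sensitivity(c):.3f} "
          f"specificity={specificity(c):.3f} accuracy={accuracy(c):.3f}")

    # Plug in the SVM figures reported in the abstract.
    p = implied_prevalence(acc=0.986, sens=0.820, spec=1.000)
    print(f"implied prevalence: {p:.3f}")

    # Hypothetical classifier with sensitivity 0.35 and specificity 1.0,
    # evaluated at that same prevalence.
    print(f"accuracy at sensitivity 0.35: {p * 0.35 + (1 - p) * 1.0:.3f}")
```

Under the same-set assumption, the SVM figures imply that roughly 7.8% of the evaluated subjects were diabetic ((1.000 - 0.986) / (1.000 - 0.820) ≈ 0.078). At that prevalence, accuracy is dominated by specificity. A hypothetical classifier with sensitivity 0.35 and perfect specificity would still reach about 0.95 accuracy. This is why the abstract reports sensitivity alongside total accuracy.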