Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests
BACKGROUND: Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but presently has limited value in the prediction of progression to dementia. We...
Main authors: | Maroco, João; Silva, Dina; Rodrigues, Ana; Guerreiro, Manuela; Santana, Isabel; de Mendonça, Alexandre |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3180705/ https://www.ncbi.nlm.nih.gov/pubmed/21849043 http://dx.doi.org/10.1186/1756-0500-4-299 |
_version_ | 1782212682637639680 |
---|---|
author | Maroco, João Silva, Dina Rodrigues, Ana Guerreiro, Manuela Santana, Isabel de Mendonça, Alexandre |
author_sort | Maroco, João |
collection | PubMed |
description | BACKGROUND: Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but presently has limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, such as Neural Networks, Support Vector Machines and Random Forests, can improve the accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven non-parametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees, and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. RESULTS: Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (Median (Me) = 0.76) and area under the ROC curve (Me = 0.90). However, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forests ranked second in overall accuracy (Me = 0.73), with high area under the ROC curve (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC curve (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most of them sensitivity was around or even lower than a median value of 0.5. CONCLUSIONS: When taking into account sensitivity, specificity and overall classification accuracy, Random Forests and Linear Discriminant Analysis rank first among all the classifiers tested in the prediction of dementia using several neuropsychological tests. These methods may be used to improve the accuracy, sensitivity and specificity of dementia predictions from neuropsychological testing. |
format | Online Article Text |
id | pubmed-3180705 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3180705 2011-09-28 BMC Res Notes Research Article BioMed Central 2011-08-17 /pmc/articles/PMC3180705/ /pubmed/21849043 http://dx.doi.org/10.1186/1756-0500-4-299 Text en Copyright ©2011 Maroco et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests |
title_sort | data mining methods in the prediction of dementia: a real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3180705/ https://www.ncbi.nlm.nih.gov/pubmed/21849043 http://dx.doi.org/10.1186/1756-0500-4-299 |
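
The description above compares classifiers by overall accuracy, sensitivity, specificity and Press' Q. As a minimal sketch of how these four quantities relate to a two-class confusion matrix, the Python below uses the standard textbook definitions, with Press' Q taken as (N - nK)^2 / (N(K - 1)) for N cases, n correct classifications and K groups. The record does not give the authors' exact computation, so the formula choice, the `Confusion` class, the function names and the counts in `example` are all assumptions made for illustration; the counts are invented and are not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from a two-class confusion matrix (hypothetical helper)."""
    tp: int  # converters correctly predicted as converters
    fn: int  # converters missed by the classifier
    tn: int  # non-converters correctly predicted
    fp: int  # non-converters wrongly flagged

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.tn + self.fp


def accuracy(c: Confusion) -> float:
    """Share of all cases classified correctly."""
    return (c.tp + c.tn) / c.total


def sensitivity(c: Confusion) -> float:
    """Share of true positives among all actual positives."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """Share of true negatives among all actual negatives."""
    return c.tn / (c.tn + c.fp)


def press_q(c: Confusion, n_groups: int = 2) -> float:
    """Press' Q statistic; compared against a chi-square with 1 df."""
    n = c.total
    correct = c.tp + c.tn
    return (n - correct * n_groups) ** 2 / (n * (n_groups - 1))


# Invented counts (NOT from the study): an imbalanced sample in which the
# classifier almost always predicts the majority (negative) class.
example = Confusion(tp=3, fn=7, tn=20, fp=0)

if __name__ == "__main__":
    print(f"accuracy    = {accuracy(example):.3f}")
    print(f"sensitivity = {sensitivity(example):.3f}")
    print(f"specificity = {specificity(example):.3f}")
    print(f"Press' Q    = {press_q(example):.3f}")
```

With these invented counts, specificity is perfect and sensitivity is low, yet overall accuracy stays moderately high and Press' Q exceeds 3.84, the 0.05 critical value of a chi-square with 1 degree of freedom. This is one way a classifier can beat chance and rank well on accuracy while still missing most positive cases, which is why the description reports sensitivity and specificity alongside accuracy.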