
Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

BACKGROUND: Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but currently has limited value in the prediction of progression to dementia. We...


Bibliographic Details
Main Authors: Maroco, João, Silva, Dina, Rodrigues, Ana, Guerreiro, Manuela, Santana, Isabel, de Mendonça, Alexandre
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3180705/
https://www.ncbi.nlm.nih.gov/pubmed/21849043
http://dx.doi.org/10.1186/1756-0500-4-299
author Maroco, João
Silva, Dina
Rodrigues, Ana
Guerreiro, Manuela
Santana, Isabel
de Mendonça, Alexandre
author_sort Maroco, João
collection PubMed
description BACKGROUND: Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but currently has limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, such as Neural Networks, Support Vector Machines and Random Forests, can improve the accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees, and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. RESULTS: Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (Median (Me) = 0.76) and area under the ROC curve (Me = 0.90). However, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forests ranked second in overall accuracy (Me = 0.73), with high area under the ROC curve (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC curve (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most of them sensitivity was around or even lower than a median value of 0.5. CONCLUSIONS: When sensitivity, specificity and overall classification accuracy are taken into account, Random Forests and Linear Discriminant Analysis rank first among all the classifiers tested in the prediction of dementia using several neuropsychological tests. These methods may be used to improve the accuracy, sensitivity and specificity of dementia predictions from neuropsychological testing.
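The abstract compares classifiers on accuracy, sensitivity, specificity and Press' Q. As a minimal sketch of how these quantities relate to a two-class confusion matrix, the Python below computes each of them. The class name `ConfusionMatrix`, the example counts and the constant name are hypothetical and are not taken from the study. Press' Q is computed with the usual textbook form Q = (N - nK)^2 / (N(K - 1)), where N is the number of cases, n the number correctly classified and K the number of classes; whether the authors used exactly this form is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a two-class problem (positive = progressed to dementia)."""
    tp: int  # true positives
    fn: int  # false negatives
    tn: int  # true negatives
    fp: int  # false positives

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.tn + self.fp

    def accuracy(self) -> float:
        # Share of all cases that were classified correctly.
        return (self.tp + self.tn) / self.total

    def sensitivity(self) -> float:
        # Share of true positives among all actual positives.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of true negatives among all actual negatives.
        return self.tn / (self.tn + self.fp)

    def press_q(self, n_classes: int = 2) -> float:
        # Press' Q = (N - n*K)^2 / (N*(K - 1)); compared against a
        # chi-square distribution with 1 degree of freedom.
        correct = self.tp + self.tn
        return (self.total - correct * n_classes) ** 2 / (
            self.total * (n_classes - 1)
        )


# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CHI2_CRIT_1DF_05 = 3.841

# Hypothetical counts for illustration only (not data from the paper).
cm = ConfusionMatrix(tp=16, fn=9, tn=55, fp=20)
print(f"accuracy    = {cm.accuracy():.2f}")
print(f"sensitivity = {cm.sensitivity():.2f}")
print(f"specificity = {cm.specificity():.2f}")
print(f"Press' Q    = {cm.press_q():.2f}")
print("better than chance:", cm.press_q() > CHI2_CRIT_1DF_05)
```

A classifier can score well on accuracy and specificity while still missing most positive cases, which is why the abstract reports sensitivity separately for each method.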
format Online
Article
Text
id pubmed-3180705
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3180705 2011-09-28 BMC Res Notes Research Article BioMed Central 2011-08-17 /pmc/articles/PMC3180705/ /pubmed/21849043 http://dx.doi.org/10.1186/1756-0500-4-299 Text en Copyright ©2011 Maroco et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests
topic Research Article