Diagnostic Classification and Biomarker Identification of Alzheimer’s Disease with Random Forest Algorithm †

Random Forest (RF) is a bagging ensemble model with several important advantages: robustness to noise, a structure well suited to complex multimodal data and parallel computing, and feature-importance estimates that help investigate biomarkers. Despite these benefits, RF is not used acti...

Bibliographic Details
Main Authors: Song, Minseok, Jung, Hyeyoom, Lee, Seungyong, Kim, Donghyeon, Ahn, Minkyu
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8065661/
https://www.ncbi.nlm.nih.gov/pubmed/33918453
http://dx.doi.org/10.3390/brainsci11040453
_version_ 1783682393219006464
author Song, Minseok
Jung, Hyeyoom
Lee, Seungyong
Kim, Donghyeon
Ahn, Minkyu
author_facet Song, Minseok
Jung, Hyeyoom
Lee, Seungyong
Kim, Donghyeon
Ahn, Minkyu
author_sort Song, Minseok
collection PubMed
description Random Forest (RF) is a bagging ensemble model with several important advantages: robustness to noise, a structure well suited to complex multimodal data and parallel computing, and feature-importance estimates that help investigate biomarkers. Despite these benefits, RF is not widely used to predict Alzheimer’s disease (AD) from brain MRIs. Recent studies have reported RF’s effectiveness in predicting AD, but the test sample sizes were too small to draw any solid conclusions. Thus, it is timely to compare RF with other learning methods, including deep learning, on a large dataset. In this study, we tested RF and various machine learning models on regional volumes from 2250 brain MRIs provided by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database: 687 normal controls (NC), 1094 mild cognitive impairment (MCI), and 469 AD. Three feature sets (63, 29, and 22 features) were selected, and classification accuracies were computed with RF, support vector machine (SVM), multi-layer perceptron (MLP), and convolutional neural network (CNN). RF, MLP, and CNN achieved high accuracies of 90.2%, 89.6%, and 90.5%, respectively, with 63 features. Interestingly, when 22 features were used, RF showed the smallest decrease in accuracy (−3.8%), and its standard deviation did not change significantly. MLP and CNN showed decreases of −6.8% and −4.5%, with the standard deviation rising from 3.3% to 4.0% for MLP and from 2.1% to 7.0% for CNN, indicating that RF predicts AD more reliably with fewer features. In addition, we investigated the feature importances that RF provides and identified the hippocampus, amygdala, and inferior lateral ventricle as the major contributors in classifying NC, MCI, and AD. On average, AD showed smaller hippocampus and amygdala volumes and a larger inferior lateral ventricle volume than MCI and NC.
format Online
Article
Text
id pubmed-8065661
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8065661 2021-04-25 Diagnostic Classification and Biomarker Identification of Alzheimer’s Disease with Random Forest Algorithm † Song, Minseok Jung, Hyeyoom Lee, Seungyong Kim, Donghyeon Ahn, Minkyu Brain Sci Article Random Forest (RF) is a bagging ensemble model with several important advantages: robustness to noise, a structure well suited to complex multimodal data and parallel computing, and feature-importance estimates that help investigate biomarkers. Despite these benefits, RF is not widely used to predict Alzheimer’s disease (AD) from brain MRIs. Recent studies have reported RF’s effectiveness in predicting AD, but the test sample sizes were too small to draw any solid conclusions. Thus, it is timely to compare RF with other learning methods, including deep learning, on a large dataset. In this study, we tested RF and various machine learning models on regional volumes from 2250 brain MRIs provided by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database: 687 normal controls (NC), 1094 mild cognitive impairment (MCI), and 469 AD. Three feature sets (63, 29, and 22 features) were selected, and classification accuracies were computed with RF, support vector machine (SVM), multi-layer perceptron (MLP), and convolutional neural network (CNN). RF, MLP, and CNN achieved high accuracies of 90.2%, 89.6%, and 90.5%, respectively, with 63 features. Interestingly, when 22 features were used, RF showed the smallest decrease in accuracy (−3.8%), and its standard deviation did not change significantly. MLP and CNN showed decreases of −6.8% and −4.5%, with the standard deviation rising from 3.3% to 4.0% for MLP and from 2.1% to 7.0% for CNN, indicating that RF predicts AD more reliably with fewer features.
In addition, we investigated the feature importances that RF provides and identified the hippocampus, amygdala, and inferior lateral ventricle as the major contributors in classifying NC, MCI, and AD. On average, AD showed smaller hippocampus and amygdala volumes and a larger inferior lateral ventricle volume than MCI and NC. MDPI 2021-04-02 /pmc/articles/PMC8065661/ /pubmed/33918453 http://dx.doi.org/10.3390/brainsci11040453 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Song, Minseok
Jung, Hyeyoom
Lee, Seungyong
Kim, Donghyeon
Ahn, Minkyu
Diagnostic Classification and Biomarker Identification of Alzheimer’s Disease with Random Forest Algorithm †
title Diagnostic Classification and Biomarker Identification of Alzheimer’s Disease with Random Forest Algorithm †
title_full Diagnostic Classification and Biomarker Identification of Alzheimer’s Disease with Random Forest Algorithm †
title_fullStr Diagnostic Classification and Biomarker Identification of Alzheimer’s Disease with Random Forest Algorithm †
title_full_unstemmed Diagnostic Classification and Biomarker Identification of Alzheimer’s Disease with Random Forest Algorithm †
title_short Diagnostic Classification and Biomarker Identification of Alzheimer’s Disease with Random Forest Algorithm †
title_sort diagnostic classification and biomarker identification of alzheimer’s disease with random forest algorithm †
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8065661/
https://www.ncbi.nlm.nih.gov/pubmed/33918453
http://dx.doi.org/10.3390/brainsci11040453
work_keys_str_mv AT songminseok diagnosticclassificationandbiomarkeridentificationofalzheimersdiseasewithrandomforestalgorithm
AT junghyeyoom diagnosticclassificationandbiomarkeridentificationofalzheimersdiseasewithrandomforestalgorithm
AT leeseungyong diagnosticclassificationandbiomarkeridentificationofalzheimersdiseasewithrandomforestalgorithm
AT kimdonghyeon diagnosticclassificationandbiomarkeridentificationofalzheimersdiseasewithrandomforestalgorithm
AT ahnminkyu diagnosticclassificationandbiomarkeridentificationofalzheimersdiseasewithrandomforestalgorithm
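
The figures quoted in the abstract of this record can be put in context with a little arithmetic. The sketch below is not taken from the article. It uses only the class counts and accuracies stated in the abstract to work out the class balance and the accuracy a trivial "always predict the most common class" rule would reach. It also derives the accuracies implied for the 22-feature set, under the assumption (not confirmed by the abstract) that the reported decreases are percentage points rather than relative changes. All variable names are illustrative.

```python
# Class counts as stated in the abstract (regional volumes from ADNI MRIs).
counts = {"NC": 687, "MCI": 1094, "AD": 469}
total = sum(counts.values())

# Share of each class. The largest share equals the accuracy of a trivial
# classifier that always predicts the most common label.
shares = {label: n / total for label, n in counts.items()}
majority_label = max(counts, key=counts.get)
majority_baseline = shares[majority_label]

# Accuracies with 63 features and reported changes with 22 features (in %).
# ASSUMPTION: the changes are percentage points, not relative changes.
acc_63 = {"RF": 90.2, "MLP": 89.6, "CNN": 90.5}
change_22 = {"RF": -3.8, "MLP": -6.8, "CNN": -4.5}
acc_22 = {model: round(acc_63[model] + change_22[model], 1) for model in acc_63}

if __name__ == "__main__":
    print(f"total scans: {total}")
    for label, share in shares.items():
        print(f"{label}: {share:.1%}")
    print(f"majority-class baseline ({majority_label}): {majority_baseline:.1%}")
    for model in acc_63:
        print(f"{model}: {acc_63[model]}% -> {acc_22[model]}% (assumed percentage points)")
```

On these numbers the three classes sum to the 2250 scans stated in the abstract, and MCI is the most common class at just under half of the scans, so the reported accuracies of about 90% sit well above that trivial baseline.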