XGBoost-SHAP-based interpretable diagnostic framework for Alzheimer’s disease

BACKGROUND: Due to the class imbalance issue faced when Alzheimer’s disease (AD) develops from normal cognition (NC) to mild cognitive impairment (MCI), present clinical practice is met with challenges regarding the auxiliary diagnosis of AD using machine learning (ML). This leads to low diagnosis p...

Bibliographic Details
Main Authors: Yi, Fuliang, Yang, Hui, Chen, Durong, Qin, Yao, Han, Hongjuan, Cui, Jing, Bai, Wenlin, Ma, Yifei, Zhang, Rong, Yu, Hongmei
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10369804/
https://www.ncbi.nlm.nih.gov/pubmed/37491248
http://dx.doi.org/10.1186/s12911-023-02238-9
collection PubMed
description BACKGROUND: Class imbalance among the stages through which Alzheimer’s disease (AD) develops, from normal cognition (NC) to mild cognitive impairment (MCI), challenges the machine learning (ML)-assisted auxiliary diagnosis of AD in present clinical practice and leads to low diagnostic performance. We aimed to construct an interpretable framework, extreme gradient boosting-Shapley additive explanations (XGBoost-SHAP), to handle the imbalance among different AD progression statuses at the algorithmic level. We also sought to achieve multiclassification of NC, MCI, and AD.
METHODS: We obtained patient data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database, including clinical information, neuropsychological test results, neuroimaging-derived biomarkers, and APOE-ε4 gene statuses. First, three feature selection algorithms were applied, and the selected features were then fed into the XGBoost algorithm. Because the three classes were imbalanced, we changed the sample weight distribution to achieve multiclassification of NC, MCI, and AD. Then, the SHAP method was linked to XGBoost to form an interpretable framework. This framework uses feature attribution: it quantifies each feature’s impact on the model’s predictions as a numerical value and analyses the direction and size of that impact. Subsequently, the top 10 features (optimal subset) were used to simplify the clinical decision-making process, and performance was compared with that of a random forest (RF), Bagging, AdaBoost, and a naive Bayes (NB) classifier. Finally, the National Alzheimer’s Coordinating Center (NACC) dataset was employed to assess the impact path consistency of the features within the optimal subset.
RESULTS: Compared to the RF, Bagging, AdaBoost, NB and XGBoost (unweighted), the interpretable framework had higher classification performance with accuracy improvements of 0.74%, 0.74%, 1.46%, 13.18%, and 0.83%, respectively. The framework achieved high sensitivity (81.21%/74.85%), specificity (92.18%/89.86%), accuracy (87.57%/80.52%), area under the receiver operating characteristic curve (AUC) (0.91/0.88), positive clinical utility index (0.71/0.56), and negative clinical utility index (0.75/0.68) on the ADNI and NACC datasets, respectively. In the ADNI dataset, the top 10 features were found to have varying associations with the risk of AD onset based on their SHAP values. Specifically, higher SHAP values of CDRSB, ADAS13, ADAS11, ventricle volume, ADASQ4, and FAQ were associated with higher risks of AD onset. Conversely, higher SHAP values of LDELTOTAL, mPACCdigit, RAVLT_immediate, and MMSE were associated with lower risks of AD onset. Similar results were found for the NACC dataset.
CONCLUSIONS: The proposed interpretable framework contributes to achieving excellent performance in imbalanced AD multiclassification tasks and provides scientific guidance (optimal subset) for clinical decision-making, thereby facilitating disease management and offering new research ideas for optimizing AD prevention and treatment programs.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-023-02238-9.
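Note: the description above mentions two calculations that can be illustrated briefly: reweighting samples to offset class imbalance, and the clinical utility index. The following is a minimal sketch, not the authors' implementation. The abstract does not state which weighting scheme was used, so the inverse-class-frequency formula here is an assumption. The clinical utility formulas (CUI+ = sensitivity × PPV, CUI− = specificity × NPV) are the commonly used definitions, applied to one class versus the rest. All function names, labels, and counts are hypothetical.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Return one weight per sample: n_samples / (n_classes * class_count).

    Assumed scheme (inverse class frequency); rarer classes get larger
    weights, so every class contributes the same total weight.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return [n_samples / (n_classes * counts[y]) for y in labels]


def clinical_utility_indices(tp, fp, fn, tn):
    """Return (CUI+, CUI-) from one-vs-rest confusion-matrix counts.

    CUI+ = sensitivity * positive predictive value
    CUI- = specificity * negative predictive value
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return sensitivity * ppv, specificity * npv


# Hypothetical toy data: an imbalanced three-class label list.
labels = ["NC"] * 6 + ["MCI"] * 3 + ["AD"] * 1
weights = balanced_sample_weights(labels)

# Hypothetical confusion counts for one class versus the rest.
cui_pos, cui_neg = clinical_utility_indices(tp=8, fp=2, fn=2, tn=8)

print(sorted(set(zip(labels, (round(w, 3) for w in weights)))))
print(round(cui_pos, 2), round(cui_neg, 2))
```

Weights of this kind can be passed to any classifier that accepts per-sample weights, for example through the `sample_weight` argument of `XGBClassifier.fit` in the xgboost Python package.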
format Online
Article
Text
id pubmed-10369804
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10369804 2023-07-27 XGBoost-SHAP-based interpretable diagnostic framework for Alzheimer’s disease BMC Med Inform Decis Mak Research
BioMed Central 2023-07-25 /pmc/articles/PMC10369804/ /pubmed/37491248 http://dx.doi.org/10.1186/s12911-023-02238-9 Text en © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title XGBoost-SHAP-based interpretable diagnostic framework for Alzheimer’s disease
topic Research