XGBoost-SHAP-based interpretable diagnostic framework for alzheimer’s disease
BACKGROUND: Due to the class imbalance issue faced when Alzheimer’s disease (AD) develops from normal cognition (NC) to mild cognitive impairment (MCI), present clinical practice is met with challenges regarding the auxiliary diagnosis of AD using machine learning (ML). This leads to low diagnosis p...
Main Authors: | Yi, Fuliang; Yang, Hui; Chen, Durong; Qin, Yao; Han, Hongjuan; Cui, Jing; Bai, Wenlin; Ma, Yifei; Zhang, Rong; Yu, Hongmei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10369804/ https://www.ncbi.nlm.nih.gov/pubmed/37491248 http://dx.doi.org/10.1186/s12911-023-02238-9 |
_version_ | 1785077838926315520 |
---|---|
author | Yi, Fuliang Yang, Hui Chen, Durong Qin, Yao Han, Hongjuan Cui, Jing Bai, Wenlin Ma, Yifei Zhang, Rong Yu, Hongmei |
author_sort | Yi, Fuliang |
collection | PubMed |
description | BACKGROUND: Due to the class imbalance issue faced when Alzheimer’s disease (AD) develops from normal cognition (NC) to mild cognitive impairment (MCI), present clinical practice is met with challenges regarding the auxiliary diagnosis of AD using machine learning (ML). This leads to low diagnosis performance. We aimed to construct an interpretable framework, extreme gradient boosting-Shapley additive explanations (XGBoost-SHAP), to handle the imbalance among different AD progression statuses at the algorithmic level. We also sought to achieve multiclassification of NC, MCI, and AD. METHODS: We obtained patient data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database, including clinical information, neuropsychological test results, neuroimaging-derived biomarkers, and APOE-ε4 gene statuses. First, three feature selection algorithms were applied, and they were then included in the XGBoost algorithm. Due to the imbalance among the three classes, we changed the sample weight distribution to achieve multiclassification of NC, MCI, and AD. Then, the SHAP method was linked to XGBoost to form an interpretable framework. This framework utilized attribution ideas that quantified the impacts of model predictions into numerical values and analysed them based on their directions and sizes. Subsequently, the top 10 features (optimal subset) were used to simplify the clinical decision-making process, and their performance was compared with that of a random forest (RF), Bagging, AdaBoost, and a naive Bayes (NB) classifier. Finally, the National Alzheimer’s Coordinating Center (NACC) dataset was employed to assess the impact path consistency of the features within the optimal subset. RESULTS: Compared to the RF, Bagging, AdaBoost, NB and XGBoost (unweighted), the interpretable framework had higher classification performance with accuracy improvements of 0.74%, 0.74%, 1.46%, 13.18%, and 0.83%, respectively. The framework achieved high sensitivity (81.21%/74.85%), specificity (92.18%/89.86%), accuracy (87.57%/80.52%), area under the receiver operating characteristic curve (AUC) (0.91/0.88), positive clinical utility index (0.71/0.56), and negative clinical utility index (0.75/0.68) on the ADNI and NACC datasets, respectively. In the ADNI dataset, the top 10 features were found to have varying associations with the risk of AD onset based on their SHAP values. Specifically, the higher SHAP values of CDRSB, ADAS13, ADAS11, ventricle volume, ADASQ4, and FAQ were associated with higher risks of AD onset. Conversely, the higher SHAP values of LDELTOTAL, mPACCdigit, RAVLT_immediate, and MMSE were associated with lower risks of AD onset. Similar results were found for the NACC dataset. CONCLUSIONS: The proposed interpretable framework contributes to achieving excellent performance in imbalanced AD multiclassification tasks and provides scientific guidance (optimal subset) for clinical decision-making, thereby facilitating disease management and offering new research ideas for optimizing AD prevention and treatment programs. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-023-02238-9. |
format | Online Article Text |
id | pubmed-10369804 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10369804 2023-07-27 XGBoost-SHAP-based interpretable diagnostic framework for alzheimer’s disease Yi, Fuliang Yang, Hui Chen, Durong Qin, Yao Han, Hongjuan Cui, Jing Bai, Wenlin Ma, Yifei Zhang, Rong Yu, Hongmei BMC Med Inform Decis Mak Research BioMed Central 2023-07-25 /pmc/articles/PMC10369804/ /pubmed/37491248 http://dx.doi.org/10.1186/s12911-023-02238-9 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Yi, Fuliang Yang, Hui Chen, Durong Qin, Yao Han, Hongjuan Cui, Jing Bai, Wenlin Ma, Yifei Zhang, Rong Yu, Hongmei XGBoost-SHAP-based interpretable diagnostic framework for alzheimer’s disease |
title | XGBoost-SHAP-based interpretable diagnostic framework for alzheimer’s disease |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10369804/ https://www.ncbi.nlm.nih.gov/pubmed/37491248 http://dx.doi.org/10.1186/s12911-023-02238-9 |
work_keys_str_mv | AT yifuliang xgboostshapbasedinterpretablediagnosticframeworkforalzheimersdisease AT yanghui xgboostshapbasedinterpretablediagnosticframeworkforalzheimersdisease AT chendurong xgboostshapbasedinterpretablediagnosticframeworkforalzheimersdisease AT qinyao xgboostshapbasedinterpretablediagnosticframeworkforalzheimersdisease AT hanhongjuan xgboostshapbasedinterpretablediagnosticframeworkforalzheimersdisease AT cuijing xgboostshapbasedinterpretablediagnosticframeworkforalzheimersdisease AT baiwenlin xgboostshapbasedinterpretablediagnosticframeworkforalzheimersdisease AT mayifei xgboostshapbasedinterpretablediagnosticframeworkforalzheimersdisease AT zhangrong xgboostshapbasedinterpretablediagnosticframeworkforalzheimersdisease AT yuhongmei xgboostshapbasedinterpretablediagnosticframeworkforalzheimersdisease |
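
The description above says the authors "changed the sample weight distribution" to handle the imbalance among NC, MCI, and AD, but this record does not give the exact weighting scheme. As a rough illustration only, the sketch below uses the common inverse-frequency ("balanced") heuristic, weight = n_samples / (n_classes * class_count). The function name `balanced_sample_weights` and the toy labels are hypothetical and are not taken from the article.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Return one weight per sample, inversely proportional to class frequency.

    Assumption: the inverse-frequency heuristic
    n_samples / (n_classes * class_count); the article's own
    weighting scheme is not stated in this record.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return [n_samples / (n_classes * counts[y]) for y in labels]


# Hypothetical toy labels, not ADNI data: AD is the rarest class.
labels = ["NC"] * 4 + ["MCI"] * 5 + ["AD"] * 1
weights = balanced_sample_weights(labels)

# Print the weight assigned to each class.
for cls in ("NC", "MCI", "AD"):
    print(cls, round(weights[labels.index(cls)], 3))
```

With weights of this kind, each class contributes the same total weight to the training loss, so rare classes are not swamped by common ones. In XGBoost's scikit-learn interface, such a list could be passed as the `sample_weight` argument of `XGBClassifier.fit`; whether the authors did it this way is not confirmed by the record.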