Application of SHAP for Explainable Machine Learning on Age-Based Subgrouping Mammography Questionnaire Data for Positive Mammography Prediction and Risk Factor Identification
Mammography is considered the gold standard for breast cancer screening. Multiple risk factors that affect breast cancer development have been identified; however, there is an ongoing debate regarding the significance of these factors. Machine learning (ML) models and Shapley Additive Explanation (S...
Main authors: | Sun, Jeffrey; Sun, Cheuk-Kay; Tang, Yun-Xuan; Liu, Tzu-Chi; Lu, Chi-Jie |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10379972/ https://www.ncbi.nlm.nih.gov/pubmed/37510441 http://dx.doi.org/10.3390/healthcare11142000 |
_version_ | 1785080092512223232 |
---|---|
author | Sun, Jeffrey Sun, Cheuk-Kay Tang, Yun-Xuan Liu, Tzu-Chi Lu, Chi-Jie |
author_sort | Sun, Jeffrey |
collection | PubMed |
description | Mammography is considered the gold standard for breast cancer screening. Multiple risk factors that affect breast cancer development have been identified; however, there is an ongoing debate regarding the significance of these factors. Machine learning (ML) models and Shapley Additive Explanation (SHAP) methodology can rank risk factors and provide explanatory model results. This study used ML algorithms with SHAP to analyze the risk factors between two different age groups and evaluate the impact of each factor in predicting positive mammography. The ML model was built using data from the risk factor questionnaires of women participating in a breast cancer screening program from 2017 to 2021. Three ML models, least absolute shrinkage and selection operator (lasso) logistic regression, extreme gradient boosting (XGBoost), and random forest (RF), were applied. RF generated the best performance. The SHAP values were then applied to the RF model for further analysis. The model identified age at menarche, education level, parity, breast self-examination, and BMI as the top five significant risk factors affecting mammography outcomes. The differences between age groups ranked by reproductive lifespan and BMI were higher in the younger and older age groups, respectively. The use of SHAP frameworks allows us to understand the relationships between risk factors and generate individualized risk factor rankings. This study provides avenues for further research and individualized medicine. |
format | Online Article Text |
id | pubmed-10379972 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10379972 2023-07-29 Healthcare (Basel) Article MDPI 2023-07-11 /pmc/articles/PMC10379972/ /pubmed/37510441 http://dx.doi.org/10.3390/healthcare11142000 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Application of SHAP for Explainable Machine Learning on Age-Based Subgrouping Mammography Questionnaire Data for Positive Mammography Prediction and Risk Factor Identification |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10379972/ https://www.ncbi.nlm.nih.gov/pubmed/37510441 http://dx.doi.org/10.3390/healthcare11142000 |
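
The description above says that SHAP values were applied to the fitted model to rank risk factors for individual predictions. As a rough illustration of the idea behind those values, the sketch below computes exact Shapley attributions by brute force for a toy scoring function. Everything in it is an assumption made for illustration: `toy_risk_score`, its weights and thresholds, the example respondent, and the baseline are hypothetical and are not taken from the study, which used a random forest and the SHAP framework. Only the three feature names echo factors named in the abstract. It also uses a single baseline respondent, whereas SHAP typically averages over a background dataset.

```python
from itertools import combinations
from math import factorial

# Feature names echo factors listed in the abstract; values below are made up.
FEATURES = ["age_at_menarche", "parity", "bmi"]


def toy_risk_score(x):
    """Hypothetical score with one interaction term (NOT the study's model)."""
    score = 0.0
    if x["age_at_menarche"] < 12:
        score += 0.3
    if x["parity"] == 0:
        score += 0.2
    if x["bmi"] >= 27:
        score += 0.1
    # Interaction: an extra contribution only when both conditions hold.
    if x["parity"] == 0 and x["bmi"] >= 27:
        score += 0.1
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values relative to a single baseline input.

    Features outside a coalition take their baseline value. The cost grows
    exponentially with the number of features, so this only suits toy cases.
    """
    n = len(FEATURES)
    phi = {}
    for feature in FEATURES:
        others = [g for g in FEATURES if g != feature]
        total = 0.0
        for k in range(len(others) + 1):
            # Standard Shapley weight for a coalition of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = {
                    g: instance[g] if (g in subset or g == feature) else baseline[g]
                    for g in FEATURES
                }
                without_f = {
                    g: instance[g] if g in subset else baseline[g]
                    for g in FEATURES
                }
                total += weight * (model(with_f) - model(without_f))
        phi[feature] = total
    return phi


# Hypothetical respondent and baseline, chosen only to exercise the code.
respondent = {"age_at_menarche": 11, "parity": 0, "bmi": 29}
baseline = {"age_at_menarche": 13, "parity": 2, "bmi": 22}

phi = shapley_values(toy_risk_score, respondent, baseline)
ranking = sorted(phi, key=lambda name: abs(phi[name]), reverse=True)
for name in ranking:
    print(f"{name}: {phi[name]:+.3f}")
```

The attributions sum to the difference between the respondent's score and the baseline score, which is the property that makes a per-person ranking of factors meaningful. In this toy case the interaction term is split evenly between `parity` and `bmi`.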