
Application of SHAP for Explainable Machine Learning on Age-Based Subgrouping Mammography Questionnaire Data for Positive Mammography Prediction and Risk Factor Identification

Mammography is considered the gold standard for breast cancer screening. Multiple risk factors that affect breast cancer development have been identified; however, there is an ongoing debate regarding the significance of these factors. Machine learning (ML) models and Shapley Additive Explanation (S...


Bibliographic Details
Main authors: Sun, Jeffrey, Sun, Cheuk-Kay, Tang, Yun-Xuan, Liu, Tzu-Chi, Lu, Chi-Jie
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10379972/
https://www.ncbi.nlm.nih.gov/pubmed/37510441
http://dx.doi.org/10.3390/healthcare11142000
author Sun, Jeffrey
Sun, Cheuk-Kay
Tang, Yun-Xuan
Liu, Tzu-Chi
Lu, Chi-Jie
collection PubMed
description Mammography is considered the gold standard for breast cancer screening. Multiple risk factors that affect breast cancer development have been identified; however, there is an ongoing debate regarding the significance of these factors. Machine learning (ML) models and Shapley Additive Explanation (SHAP) methodology can rank risk factors and provide explanatory model results. This study used ML algorithms with SHAP to analyze the risk factors between two different age groups and evaluate the impact of each factor in predicting positive mammography. The ML model was built using data from the risk factor questionnaires of women participating in a breast cancer screening program from 2017 to 2021. Three ML models, least absolute shrinkage and selection operator (lasso) logistic regression, extreme gradient boosting (XGBoost), and random forest (RF), were applied. RF generated the best performance. The SHAP values were then applied to the RF model for further analysis. The model identified age at menarche, education level, parity, breast self-examination, and BMI as the top five significant risk factors affecting mammography outcomes. The differences between age groups ranked by reproductive lifespan and BMI were higher in the younger and older age groups, respectively. The use of SHAP frameworks allows us to understand the relationships between risk factors and generate individualized risk factor rankings. This study provides avenues for further research and individualized medicine.
format Online
Article
Text
id pubmed-10379972
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10379972 2023-07-29 Healthcare (Basel) Article MDPI 2023-07-11 /pmc/articles/PMC10379972/ /pubmed/37510441 http://dx.doi.org/10.3390/healthcare11142000 Text en © 2023 by the authors.
Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Application of SHAP for Explainable Machine Learning on Age-Based Subgrouping Mammography Questionnaire Data for Positive Mammography Prediction and Risk Factor Identification
topic Article