Machine learning-based models for the prediction of breast cancer recurrence risk
Breast cancer is the most common malignancy diagnosed in women worldwide. The prevalence and incidence of breast cancer are increasing every year; therefore, early diagnosis along with suitable relapse detection is an important strategy for prognosis improvement. This study aimed to compare different...
Main Authors: | Zuo, Duo; Yang, Lexin; Jin, Yu; Qi, Huan; Liu, Yahui; Ren, Li |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10688055/ https://www.ncbi.nlm.nih.gov/pubmed/38031071 http://dx.doi.org/10.1186/s12911-023-02377-z |
_version_ | 1785152101941248000 |
---|---|
author | Zuo, Duo Yang, Lexin Jin, Yu Qi, Huan Liu, Yahui Ren, Li |
author_facet | Zuo, Duo Yang, Lexin Jin, Yu Qi, Huan Liu, Yahui Ren, Li |
author_sort | Zuo, Duo |
collection | PubMed |
description | Breast cancer is the most common malignancy diagnosed in women worldwide. The prevalence and incidence of breast cancer are increasing every year; therefore, early diagnosis along with suitable relapse detection is an important strategy for prognosis improvement. This study aimed to compare different machine learning algorithms to select the best model for predicting breast cancer recurrence. The prediction model was developed using eleven machine learning (ML) algorithms: logistic regression (LR), random forest (RF), support vector classification (SVC), extreme gradient boosting (XGBoost), gradient boosting decision tree (GBDT), decision tree, multilayer perceptron (MLP), linear discriminant analysis (LDA), adaptive boosting (AdaBoost), Gaussian naive Bayes (GaussianNB), and light gradient boosting machine (LightGBM). The area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and F1 score were used to evaluate the performance of the prognostic model. Based on performance, the optimal ML model was selected, and feature importance was ranked by Shapley Additive Explanations (SHAP) values. The results showed that, compared with the other 10 algorithms, the AdaBoost algorithm had the best performance in predicting breast cancer recurrence, and it was adopted to establish the prediction model. Moreover, CA125, CEA, Fbg, and tumor diameter were found to be the most important features in our dataset for predicting breast cancer recurrence. More importantly, our study is the first to use the SHAP method to make an AdaBoost-based breast cancer recurrence prediction model more interpretable for clinicians. The AdaBoost algorithm offers a clinical decision support model and successfully identifies the recurrence of breast cancer. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-023-02377-z. |
format | Online Article Text |
id | pubmed-10688055 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-106880552023-11-30 Machine learning-based models for the prediction of breast cancer recurrence risk Zuo, Duo Yang, Lexin Jin, Yu Qi, Huan Liu, Yahui Ren, Li BMC Med Inform Decis Mak Research Breast cancer is the most common malignancy diagnosed in women worldwide. The prevalence and incidence of breast cancer is increasing every year; therefore, early diagnosis along with suitable relapse detection is an important strategy for prognosis improvement. This study aimed to compare different machine algorithms to select the best model for predicting breast cancer recurrence. The prediction model was developed by using eleven different machine learning (ML) algorithms, including logistic regression (LR), random forest (RF), support vector classification (SVC), extreme gradient boosting (XGBoost), gradient boosting decision tree (GBDT), decision tree, multilayer perceptron (MLP), linear discriminant analysis (LDA), adaptive boosting (AdaBoost), Gaussian naive Bayes (GaussianNB), and light gradient boosting machine (LightGBM), to predict breast cancer recurrence. The area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and F1 score were used to evaluate the performance of the prognostic model. Based on performance, the optimal ML was selected, and feature importance was ranked by Shapley Additive Explanation (SHAP) values. Compared to the other 10 algorithms, the results showed that the AdaBoost algorithm had the best prediction performance for successfully predicting breast cancer recurrence and was adopted in the establishment of the prediction model. Moreover, CA125, CEA, Fbg, and tumor diameter were found to be the most important features in our dataset to predict breast cancer recurrence. 
More importantly, our study is the first to use the SHAP method to improve the interpretability of clinicians to predict the recurrence model of breast cancer based on the AdaBoost algorithm. The AdaBoost algorithm offers a clinical decision support model and successfully identifies the recurrence of breast cancer. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-023-02377-z. BioMed Central 2023-11-29 /pmc/articles/PMC10688055/ /pubmed/38031071 http://dx.doi.org/10.1186/s12911-023-02377-z Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Zuo, Duo Yang, Lexin Jin, Yu Qi, Huan Liu, Yahui Ren, Li Machine learning-based models for the prediction of breast cancer recurrence risk |
title | Machine learning-based models for the prediction of breast cancer recurrence risk |
title_full | Machine learning-based models for the prediction of breast cancer recurrence risk |
title_fullStr | Machine learning-based models for the prediction of breast cancer recurrence risk |
title_full_unstemmed | Machine learning-based models for the prediction of breast cancer recurrence risk |
title_short | Machine learning-based models for the prediction of breast cancer recurrence risk |
title_sort | machine learning-based models for the prediction of breast cancer recurrence risk |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10688055/ https://www.ncbi.nlm.nih.gov/pubmed/38031071 http://dx.doi.org/10.1186/s12911-023-02377-z |
work_keys_str_mv | AT zuoduo machinelearningbasedmodelsforthepredictionofbreastcancerrecurrencerisk AT yanglexin machinelearningbasedmodelsforthepredictionofbreastcancerrecurrencerisk AT jinyu machinelearningbasedmodelsforthepredictionofbreastcancerrecurrencerisk AT qihuan machinelearningbasedmodelsforthepredictionofbreastcancerrecurrencerisk AT liuyahui machinelearningbasedmodelsforthepredictionofbreastcancerrecurrencerisk AT renli machinelearningbasedmodelsforthepredictionofbreastcancerrecurrencerisk |
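
The description field above names accuracy, sensitivity, specificity, PPV, NPV and F1 score as evaluation metrics. As a reader's aid, the following is a minimal sketch of how those quantities are conventionally derived from a binary confusion matrix. It is not the authors' code: the function name `binary_metrics`, the `BinaryMetrics` container, and the example labels are hypothetical, and AUC is left out because it requires predicted scores rather than hard labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    """Hypothetical container for the label-based metrics named in the record."""
    accuracy: float
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    f1: float


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a metric is undefined (zero denominator).
    return numerator / denominator if denominator else float("nan")


def binary_metrics(y_true, y_pred):
    """Compute standard metrics from 0/1 labels, where 1 means recurrence."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    return BinaryMetrics(
        accuracy=_ratio(tp + tn, tp + tn + fp + fn),
        sensitivity=_ratio(tp, tp + fn),   # recall on the recurrence class
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),           # precision
        npv=_ratio(tn, tn + fn),
        f1=_ratio(2 * tp, 2 * tp + fp + fn),
    )


if __name__ == "__main__":
    # Made-up labels for illustration only; not data from the study.
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    preds = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
    print(binary_metrics(truth, preds))
```

Returning NaN for undefined ratios is one possible design choice; it keeps a comparison loop over several models from aborting when one model predicts a single class only.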