Machine learning-based models for the prediction of breast cancer recurrence risk

Breast cancer is the most common malignancy diagnosed in women worldwide. The prevalence and incidence of breast cancer are increasing every year; therefore, early diagnosis along with suitable relapse detection is an important strategy for improving prognosis. This study aimed to compare different...

Bibliographic Details
Main authors: Zuo, Duo; Yang, Lexin; Jin, Yu; Qi, Huan; Liu, Yahui; Ren, Li
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10688055/
https://www.ncbi.nlm.nih.gov/pubmed/38031071
http://dx.doi.org/10.1186/s12911-023-02377-z
_version_ 1785152101941248000
author Zuo, Duo
Yang, Lexin
Jin, Yu
Qi, Huan
Liu, Yahui
Ren, Li
author_facet Zuo, Duo
Yang, Lexin
Jin, Yu
Qi, Huan
Liu, Yahui
Ren, Li
author_sort Zuo, Duo
collection PubMed
description Breast cancer is the most common malignancy diagnosed in women worldwide. The prevalence and incidence of breast cancer are increasing every year; therefore, early diagnosis along with suitable relapse detection is an important strategy for improving prognosis. This study aimed to compare different machine learning algorithms and select the best model for predicting breast cancer recurrence. The prediction model was developed using eleven machine learning (ML) algorithms: logistic regression (LR), random forest (RF), support vector classification (SVC), extreme gradient boosting (XGBoost), gradient boosting decision tree (GBDT), decision tree, multilayer perceptron (MLP), linear discriminant analysis (LDA), adaptive boosting (AdaBoost), Gaussian naive Bayes (GaussianNB), and light gradient boosting machine (LightGBM). The area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and F1 score were used to evaluate the performance of the prognostic model. Based on performance, the optimal ML algorithm was selected, and feature importance was ranked by Shapley Additive Explanation (SHAP) values. Compared with the other 10 algorithms, AdaBoost showed the best performance in predicting breast cancer recurrence and was adopted to establish the prediction model. Moreover, CA125, CEA, Fbg, and tumor diameter were found to be the most important features in our dataset for predicting breast cancer recurrence. More importantly, our study is the first to use the SHAP method to make an AdaBoost-based model for predicting breast cancer recurrence more interpretable for clinicians. The AdaBoost algorithm offers a clinical decision support model and successfully identifies the recurrence of breast cancer.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-023-02377-z.
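The evaluation metrics named in the description can be computed from predicted recurrence probabilities roughly as sketched below. This is a minimal illustration and not the authors' code: the function names (`roc_auc`, `evaluate`), the 0.5 decision threshold, and the toy labels and scores are assumptions introduced here, and no values from the study are reproduced.

```python
from typing import Dict, Sequence


def safe_div(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    AUC equals the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def evaluate(
    y_true: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the study
) -> Dict[str, float]:
    """Compute AUC, accuracy, sensitivity, specificity, PPV, NPV and F1."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "auc": roc_auc(y_true, scores),
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "sensitivity": safe_div(tp, tp + fn),  # recall for recurrence
        "specificity": safe_div(tn, tn + fp),
        "ppv": safe_div(tp, tp + fp),          # precision
        "npv": safe_div(tn, tn + fn),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Hypothetical toy data: 1 = recurrence, 0 = no recurrence.
    labels = [1, 1, 1, 0, 0, 0, 0, 1]
    probabilities = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
    for name, value in evaluate(labels, probabilities).items():
        print(f"{name}: {value:.4f}")
```

Running the same `evaluate` function on the held-out predictions of each candidate classifier gives one row of metrics per algorithm, which is one way the eleven models could be compared side by side.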
format Online
Article
Text
id pubmed-10688055
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10688055 2023-11-30 Machine learning-based models for the prediction of breast cancer recurrence risk Zuo, Duo Yang, Lexin Jin, Yu Qi, Huan Liu, Yahui Ren, Li BMC Med Inform Decis Mak Research Breast cancer is the most common malignancy diagnosed in women worldwide. The prevalence and incidence of breast cancer are increasing every year; therefore, early diagnosis along with suitable relapse detection is an important strategy for improving prognosis. This study aimed to compare different machine learning algorithms and select the best model for predicting breast cancer recurrence. The prediction model was developed using eleven machine learning (ML) algorithms: logistic regression (LR), random forest (RF), support vector classification (SVC), extreme gradient boosting (XGBoost), gradient boosting decision tree (GBDT), decision tree, multilayer perceptron (MLP), linear discriminant analysis (LDA), adaptive boosting (AdaBoost), Gaussian naive Bayes (GaussianNB), and light gradient boosting machine (LightGBM). The area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and F1 score were used to evaluate the performance of the prognostic model. Based on performance, the optimal ML algorithm was selected, and feature importance was ranked by Shapley Additive Explanation (SHAP) values. Compared with the other 10 algorithms, AdaBoost showed the best performance in predicting breast cancer recurrence and was adopted to establish the prediction model. Moreover, CA125, CEA, Fbg, and tumor diameter were found to be the most important features in our dataset for predicting breast cancer recurrence. More importantly, our study is the first to use the SHAP method to make an AdaBoost-based model for predicting breast cancer recurrence more interpretable for clinicians.
The AdaBoost algorithm offers a clinical decision support model and successfully identifies the recurrence of breast cancer. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-023-02377-z. BioMed Central 2023-11-29 /pmc/articles/PMC10688055/ /pubmed/38031071 http://dx.doi.org/10.1186/s12911-023-02377-z Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Zuo, Duo
Yang, Lexin
Jin, Yu
Qi, Huan
Liu, Yahui
Ren, Li
Machine learning-based models for the prediction of breast cancer recurrence risk
title Machine learning-based models for the prediction of breast cancer recurrence risk
title_full Machine learning-based models for the prediction of breast cancer recurrence risk
title_fullStr Machine learning-based models for the prediction of breast cancer recurrence risk
title_full_unstemmed Machine learning-based models for the prediction of breast cancer recurrence risk
title_short Machine learning-based models for the prediction of breast cancer recurrence risk
title_sort machine learning-based models for the prediction of breast cancer recurrence risk
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10688055/
https://www.ncbi.nlm.nih.gov/pubmed/38031071
http://dx.doi.org/10.1186/s12911-023-02377-z
work_keys_str_mv AT zuoduo machinelearningbasedmodelsforthepredictionofbreastcancerrecurrencerisk
AT yanglexin machinelearningbasedmodelsforthepredictionofbreastcancerrecurrencerisk
AT jinyu machinelearningbasedmodelsforthepredictionofbreastcancerrecurrencerisk
AT qihuan machinelearningbasedmodelsforthepredictionofbreastcancerrecurrencerisk
AT liuyahui machinelearningbasedmodelsforthepredictionofbreastcancerrecurrencerisk
AT renli machinelearningbasedmodelsforthepredictionofbreastcancerrecurrencerisk