
The Application and Comparison of Machine Learning Models for the Prediction of Breast Cancer Prognosis: Retrospective Cohort Study

BACKGROUND: Over the recent years, machine learning methods have been increasingly explored in cancer prognosis because of the appearance of improved machine learning algorithms. These algorithms can use censored data for modeling, such as support vector machines for survival analysis and random sur...


Bibliographic Details
Main Authors:	Xiao, Jialong; Mo, Miao; Wang, Zezhou; Zhou, Changming; Shen, Jie; Yuan, Jing; He, Yulian; Zheng, Ying
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2022
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8900909/ https://www.ncbi.nlm.nih.gov/pubmed/35179504 http://dx.doi.org/10.2196/33440

_version_	1784664230083952640
author	Xiao, Jialong Mo, Miao Wang, Zezhou Zhou, Changming Shen, Jie Yuan, Jing He, Yulian Zheng, Ying
author_facet	Xiao, Jialong Mo, Miao Wang, Zezhou Zhou, Changming Shen, Jie Yuan, Jing He, Yulian Zheng, Ying
author_sort	Xiao, Jialong
collection	PubMed
description	BACKGROUND: Over the recent years, machine learning methods have been increasingly explored in cancer prognosis because of the appearance of improved machine learning algorithms. These algorithms can use censored data for modeling, such as support vector machines for survival analysis and random survival forest (RSF). However, it is still debated whether traditional (Cox proportional hazard regression) or machine learning-based prognostic models have better predictive performance. OBJECTIVE: This study aimed to compare the performance of breast cancer prognostic prediction models based on machine learning and Cox regression. METHODS: This retrospective cohort study included all patients diagnosed with breast cancer and subsequently hospitalized in Fudan University Shanghai Cancer Center between January 1, 2008, and December 31, 2016. After all exclusions, a total of 22,176 cases with 21 features were eligible for model development. The data set was randomly split into a training set (15,523 cases, 70%) and a test set (6653 cases, 30%) for developing 4 models and predicting the overall survival of patients diagnosed with breast cancer. The discriminative ability of models was evaluated by the concordance index (C-index), the time-dependent area under the curve, and D-index; the calibration ability of models was evaluated by the Brier score. RESULTS: The RSF model revealed the best discriminative performance among the 4 models with 3-year, 5-year, and 10-year time-dependent area under the curve of 0.857, 0.838, and 0.781, a D-index of 7.643 (95% CI 6.542, 8.930) and a C-index of 0.827 (95% CI 0.809, 0.845). The statistical difference of the C-index was tested, and the RSF model significantly outperformed the Cox-EN (elastic net) model (C-index 0.816, 95% CI 0.796, 0.836; P=.01), the Cox model (C-index 0.814, 95% CI 0.794, 0.835; P=.003), and the support vector machine model (C-index 0.812, 95% CI 0.793, 0.832; P<.001). 
The 4 models’ 3-year, 5-year, and 10-year Brier scores were very close, ranging from 0.027 to 0.094 and less than 0.1, which meant all models had good calibration. In the context of feature importance, elastic net and RSF both indicated that TNM staging, neoadjuvant therapy, number of lymph node metastases, age, and tumor diameter were the top 5 important features for predicting the prognosis of breast cancer. A final online tool was developed to predict the overall survival of patients with breast cancer. CONCLUSIONS: The RSF model slightly outperformed the other models on discriminative ability, revealing the potential of the RSF method as an effective approach to building prognostic prediction models in the context of survival analysis.
format	Online Article Text
id	pubmed-8900909
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-8900909 2022-03-10 JMIR Med Inform Original Paper JMIR Publications 2022-02-18 /pmc/articles/PMC8900909/ /pubmed/35179504 http://dx.doi.org/10.2196/33440 Text en ©Jialong Xiao, Miao Mo, Zezhou Wang, Changming Zhou, Jie Shen, Jing Yuan, Yulian He, Ying Zheng. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 18.02.2022. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title	The Application and Comparison of Machine Learning Models for the Prediction of Breast Cancer Prognosis: Retrospective Cohort Study
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8900909/ https://www.ncbi.nlm.nih.gov/pubmed/35179504 http://dx.doi.org/10.2196/33440
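The abstract above evaluates discrimination with the concordance index (C-index). As a rough illustration of what that metric measures, here is a minimal pure-Python sketch of Harrell's C-index for right-censored data. It is not the authors' code: the function name `concordance_index`, its arguments, and the toy data are assumptions made for this example, and pairs with tied follow-up times are simply skipped, which is a simplification.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times:       observed follow-up time for each patient
    events:      1 if the event (death) was observed, 0 if censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Pairs with tied times are skipped here (simplifying assumption).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # model ranks the earlier death as higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
perfect = concordance_index(toy_times, toy_events, [0.9, 0.7, 0.4, 0.1])
mixed = concordance_index(toy_times, toy_events, [0.9, 0.1, 0.4, 0.7])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported C-indexes of 0.812 to 0.827 should be read. For real analyses, an established survival-analysis library would be a safer choice than this quadratic-time sketch.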


Similar Items