Application of machine learning approaches to predict the 5-year survival status of patients with esophageal cancer

BACKGROUND: Accurate prognostic estimation for esophageal cancer (EC) patients plays an important role in the process of clinical decision-making. The objective of this study was to develop an effective model to predict the 5-year survival status of EC patients using machine learning (ML) algorithms...

Bibliographic Details
Main Authors:	Gong, Xian, Zheng, Bin, Xu, Guobing, Chen, Hao, Chen, Chun
Format:	Online Article Text
Language:	English
Published:	AME Publishing Company 2021
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8662490/ https://www.ncbi.nlm.nih.gov/pubmed/34992804 http://dx.doi.org/10.21037/jtd-21-1107

_version_	1784613448989016064
author	Gong, Xian Zheng, Bin Xu, Guobing Chen, Hao Chen, Chun
author_facet	Gong, Xian Zheng, Bin Xu, Guobing Chen, Hao Chen, Chun
author_sort	Gong, Xian
collection	PubMed
description	BACKGROUND: Accurate prognostic estimation for esophageal cancer (EC) patients plays an important role in clinical decision-making. The objective of this study was to develop an effective model to predict the 5-year survival status of EC patients using machine learning (ML) algorithms. METHODS: We retrieved information on patients diagnosed with EC between 2010 and 2015 from the Surveillance, Epidemiology, and End Results (SEER) Program, including 24 features. A total of 8 ML models were applied to the selected dataset to classify the EC patients in terms of 5-year survival status: 3 newly developed gradient boosting models (GBM), namely XGBoost, CatBoost, and LightGBM; 2 commonly used tree-based models, gradient boosting decision trees (GBDT) and random forest (RF); and 3 other ML models, artificial neural networks (ANN), naive Bayes (NB), and support vector machines (SVM). Five-fold cross-validation was used to measure model performance. RESULTS: After excluding records with missing data, the final study population comprised 10,588 patients. Feature selection was conducted based on the χ² test; however, the experimental results showed that the complete dataset provided better prediction of outcomes than the dataset with non-significant features removed. Among the 8 models, XGBoost had the best performance [area under the receiver operating characteristic (ROC) curve (AUC): 0.852 for XGBoost, 0.849 for CatBoost, 0.850 for LightGBM, 0.846 for GBDT, 0.838 for RF, 0.844 for ANN, 0.833 for NB, and 0.789 for SVM]. The accuracy and logistic loss of XGBoost were 0.875 and 0.301, respectively, which were also the best among the models. In the XGBoost model, SHapley Additive exPlanations (SHAP) values were calculated, and the results indicated that four features (reason no cancer-directed surgery, Surg Prim Site, age, and stage group) had the greatest impact on predicting the outcomes. CONCLUSIONS: The XGBoost model and the complete dataset can be used to construct an accurate prognostic model for patients diagnosed with EC, which may be applicable in clinical practice in the future.
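The evaluation protocol named in the abstract (5-fold cross-validation scored by AUC, accuracy, and logistic loss) can be illustrated with a minimal sketch. This is not the authors' code: the helper names and the toy labels and probabilities below are hypothetical, the folds are contiguous and unshuffled for simplicity, and neither the SEER data nor the fitted models are reproduced.

```python
import math


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of cases whose thresholded probability matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive outranks a random
    negative (ties count as one half)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous folds over range(n)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size


# Hypothetical toy data: 1 = one outcome class, 0 = the other.
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
toy_probs = [0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.8, 0.1, 0.55, 0.45]

print("accuracy:", accuracy(toy_labels, toy_probs))
print("log loss:", log_loss(toy_labels, toy_probs))
print("AUC:", roc_auc(toy_labels, toy_probs))
for fold, (train_idx, test_idx) in enumerate(k_fold_indices(len(toy_labels)), 1):
    print("fold", fold, "test indices:", test_idx)
```

In an actual experiment, each fold's training indices would be used to fit a model and the three metrics would be computed on that fold's held-out predictions, then averaged across the 5 folds.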
format	Online Article Text
id	pubmed-8662490
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	AME Publishing Company
record_format	MEDLINE/PubMed
spelling	pubmed-8662490 2022-01-05 Application of machine learning approaches to predict the 5-year survival status of patients with esophageal cancer Gong, Xian Zheng, Bin Xu, Guobing Chen, Hao Chen, Chun J Thorac Dis Original Article AME Publishing Company 2021-11 /pmc/articles/PMC8662490/ /pubmed/34992804 http://dx.doi.org/10.21037/jtd-21-1107 Text en 2021 Journal of Thoracic Disease. All rights reserved. https://creativecommons.org/licenses/by-nc-nd/4.0/ Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
title	Application of machine learning approaches to predict the 5-year survival status of patients with esophageal cancer
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8662490/ https://www.ncbi.nlm.nih.gov/pubmed/34992804 http://dx.doi.org/10.21037/jtd-21-1107
