Machine Learning-Based Prognostic Prediction Models of Non-Metastatic Colon Cancer: Analyses Based on Surveillance, Epidemiology and End Results Database and a Chinese Cohort

PURPOSE: The present study aimed to develop prognostic prediction models based on machine learning (ML) for non-metastatic colon cancer (CRC), which can provide a precise quantitative risk assessment and serve as an assistive method for treatment strategy development. The possibility of improving pr...

Bibliographic Details
Main Authors: Tang, Mo; Gao, Lihao; He, Bin; Yang, Yufei
Format: Online Article Text
Language: English
Published: Dove 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8742582/
https://www.ncbi.nlm.nih.gov/pubmed/35018119
http://dx.doi.org/10.2147/CMAR.S340739
author Tang, Mo
Gao, Lihao
He, Bin
Yang, Yufei
collection PubMed
description PURPOSE: The present study aimed to develop prognostic prediction models based on machine learning (ML) for non-metastatic colon cancer (CRC), which can provide a precise quantitative risk assessment and serve as an assistive method for treatment strategy development. The possibility of improving prediction accuracy using nonlinear methods compared to linear methods was investigated.
PATIENTS AND METHODS: A cancer-specific survival (CSS) model constructed using logistic regression, extreme gradient boosting (XGBoost), and random forest algorithms was trained on the Surveillance, Epidemiology, and End Results datasets for 15,254 patients with non-metastatic CRC (split into training [70%] and internal validation [30%] datasets) and externally validated with an outpatient cohort of 311 cases from Xiyuan Hospital in China. A Chinese cohort was also used to develop recurrence and metastasis (R&M) models for CRC patients. The experiments for each model were performed 100 times to obtain average scores and 95% confidence intervals. The model performance was evaluated using the area under the receiver operating characteristic curve (AUC) values.
RESULTS: The XGBoost approach showed the highest AUC values of 0.86 (0.84–0.88), 0.82 (0.81–0.83), and 0.81 (0.79–0.82) for one-, three-, and five-year CSS cohorts, respectively, along with a relatively high generalization ability. The XGBoost approach also performed best for the R&M model, with the AUC values of 0.71 (0.64–0.79), 0.79 (0.74–0.86), and 0.89 (0.82–0.95) for one-, three-, and five-year R&M cohorts, respectively. The rankings of predictor importance for the CSS and R&M models were different, and the higher model accuracy was associated with more prognostic predictors.
CONCLUSION: Three different ML algorithms for developing prognostic prediction models for non-metastatic CRC were compared. The predictive performance results showed that the nonlinear XGBoost approach performed best, suggesting that it can be used for quantifying the prognostic risk. It was also demonstrated that the model performance can be improved when more prognostic predictors are considered.
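The evaluation protocol summarized in the abstract (a 70%/30% split, 100 repeated experiments, mean AUC with a 95% interval) can be illustrated with a minimal sketch. This is not the authors' code. It assumes the split is redrawn at random on every repeat and that the interval is a simple percentile interval over the 100 scores; the abstract specifies neither. The function names, the synthetic data, and the one-feature `fit_toy_model` are hypothetical stand-ins for the logistic regression, XGBoost, and random forest models actually used.

```python
import random
import statistics


def auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Over all positive/negative pairs, counts how often the positive case
    receives the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fit_toy_model(rows):
    """Hypothetical stand-in for a real learner.

    Orients a single feature so that higher scores point toward the
    positive class, based on the class means in the training rows.
    """
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    sign = 1.0 if statistics.mean(pos) >= statistics.mean(neg) else -1.0
    return lambda x: sign * x


def repeated_split_auc(rows, n_repeats=100, train_frac=0.7, seed=0):
    """Repeat a random train/validation split and summarize validation AUC."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_repeats):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        train, valid = shuffled[:cut], shuffled[cut:]
        model = fit_toy_model(train)
        aucs.append(auc([y for _, y in valid], [model(x) for x, _ in valid]))
    aucs.sort()
    # Assumed interval: approximate 2.5th and 97.5th percentiles of the scores.
    lo = aucs[int(0.025 * (len(aucs) - 1))]
    hi = aucs[int(0.975 * (len(aucs) - 1))]
    return statistics.mean(aucs), lo, hi


def make_synthetic_rows(n=400, seed=1):
    """Synthetic (feature, label) pairs; not patient data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = 1 if rng.random() < 0.5 else 0
        x = rng.gauss(1.0 if y == 1 else 0.0, 1.0)
        rows.append((x, y))
    return rows


if __name__ == "__main__":
    mean_auc, lo, hi = repeated_split_auc(make_synthetic_rows())
    print(f"AUC {mean_auc:.2f} ({lo:.2f}-{hi:.2f})")
```

Swapping `fit_toy_model` for a real classifier would leave the rest of the loop unchanged, which is why the three algorithms can be compared on the same footing.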
format Online
Article
Text
id pubmed-8742582
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Dove
record_format MEDLINE/PubMed
spelling pubmed-8742582 2022-01-10 Cancer Manag Res Original Research Dove 2022-01-04 /pmc/articles/PMC8742582/ /pubmed/35018119 http://dx.doi.org/10.2147/CMAR.S340739 Text en © 2022 Tang et al. This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution – Non Commercial (unported, v3.0) License (https://creativecommons.org/licenses/by-nc/3.0/). By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms (https://www.dovepress.com/terms.php).
title Machine Learning-Based Prognostic Prediction Models of Non-Metastatic Colon Cancer: Analyses Based on Surveillance, Epidemiology and End Results Database and a Chinese Cohort
topic Original Research