Comparison of the Tree-Based Machine Learning Algorithms to Cox Regression in Predicting the Survival of Oral and Pharyngeal Cancers: Analyses Based on SEER Database
SIMPLE SUMMARY: Formulating accurate survival prediction models of oral and pharyngeal cancers (OPCs) is important, as they might impact the decisions of clinicians and patients. Improving the quality of these clinical prediction modelling studies can benefit the reliability of the developed models...
Main authors: | Du, Mi; Haag, Dandara G.; Lynch, John W.; Mittinty, Murthy N. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7600270/ https://www.ncbi.nlm.nih.gov/pubmed/33003533 http://dx.doi.org/10.3390/cancers12102802 |
_version_ | 1783603103728140288 |
---|---|
author | Du, Mi Haag, Dandara G. Lynch, John W. Mittinty, Murthy N. |
author_facet | Du, Mi Haag, Dandara G. Lynch, John W. Mittinty, Murthy N. |
author_sort | Du, Mi |
collection | PubMed |
description | SIMPLE SUMMARY: Formulating accurate survival prediction models of oral and pharyngeal cancers (OPCs) is important, as they might impact the decisions of clinicians and patients. Improving the quality of these clinical prediction modelling studies can benefit the reliability of the developed models and facilitate their implementations in clinical practice. Given the growing trend on the application of machine learning methods in cancer research, we present the use of popular tree-based machine learning algorithms and compare them to the standard Cox regression as an aim to predict OPCs survival. The predictive models discussed here are based on a large cancer registry dataset incorporating various prognosis factors and different forms of bias. The comparable predictive performance between Cox and tree-based models suggested that these machine learning algorithms provide non-parametric alternatives to Cox regression and are of clinical use for estimating the survival probability of OPCs patients. ABSTRACT: This study aims to demonstrate the use of the tree-based machine learning algorithms to predict the 3- and 5-year disease-specific survival of oral and pharyngeal cancers (OPCs) and compare their performance with the traditional Cox regression. A total of 21,154 individuals diagnosed with OPCs between 2004 and 2009 were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. Three tree-based machine learning algorithms (survival tree (ST), random forest (RF) and conditional inference forest (CF)), together with a reference technique (Cox proportional hazard models (Cox)), were used to develop the survival prediction models. To handle the missing values in predictors, we applied the substantive model compatible version of the fully conditional specification imputation approach to the Cox model, whereas we used RF to impute missing data for the ST, RF and CF models. 
For internal validation, we used 10-fold cross-validation with 50 iterations in the model development datasets. Following this, model performance was evaluated using the C-index, integrated Brier score (IBS) and calibration curves in the test datasets. For predicting the 3-year survival of OPCs with the complete cases, the C-index in the development sets were 0.77 (0.77, 0.77), 0.70 (0.70, 0.70), 0.83 (0.83, 0.84) and 0.83 (0.83, 0.86) for Cox, ST, RF and CF, respectively. Similar results were observed in the 5-year survival prediction models, with C-index for Cox, ST, RF and CF being 0.76 (0.76, 0.76), 0.69 (0.69, 0.70), 0.83 (0.83, 0.83) and 0.85 (0.84, 0.86), respectively, in development datasets. The prediction error curves based on IBS showed a similar pattern for these models. The predictive performance remained unchanged in the analyses with imputed data. Additionally, a free web-based calculator was developed for potential clinical use. In conclusion, compared to Cox regression, ST had a lower and RF and CF had a higher predictive accuracy in predicting the 3- and 5-year OPCs survival using SEER data. The RF and CF algorithms provide non-parametric alternatives to Cox regression to be of clinical use for estimating the survival probability of OPCs patients. |
format | Online Article Text |
id | pubmed-7600270 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-76002702020-11-01 Comparison of the Tree-Based Machine Learning Algorithms to Cox Regression in Predicting the Survival of Oral and Pharyngeal Cancers: Analyses Based on SEER Database Du, Mi Haag, Dandara G. Lynch, John W. Mittinty, Murthy N. Cancers (Basel) Article SIMPLE SUMMARY: Formulating accurate survival prediction models of oral and pharyngeal cancers (OPCs) is important, as they might impact the decisions of clinicians and patients. Improving the quality of these clinical prediction modelling studies can benefit the reliability of the developed models and facilitate their implementations in clinical practice. Given the growing trend on the application of machine learning methods in cancer research, we present the use of popular tree-based machine learning algorithms and compare them to the standard Cox regression as an aim to predict OPCs survival. The predictive models discussed here are based on a large cancer registry dataset incorporating various prognosis factors and different forms of bias. The comparable predictive performance between Cox and tree-based models suggested that these machine learning algorithms provide non-parametric alternatives to Cox regression and are of clinical use for estimating the survival probability of OPCs patients. ABSTRACT: This study aims to demonstrate the use of the tree-based machine learning algorithms to predict the 3- and 5-year disease-specific survival of oral and pharyngeal cancers (OPCs) and compare their performance with the traditional Cox regression. A total of 21,154 individuals diagnosed with OPCs between 2004 and 2009 were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. Three tree-based machine learning algorithms (survival tree (ST), random forest (RF) and conditional inference forest (CF)), together with a reference technique (Cox proportional hazard models (Cox)), were used to develop the survival prediction models. 
To handle the missing values in predictors, we applied the substantive model compatible version of the fully conditional specification imputation approach to the Cox model, whereas we used RF to impute missing data for the ST, RF and CF models. For internal validation, we used 10-fold cross-validation with 50 iterations in the model development datasets. Following this, model performance was evaluated using the C-index, integrated Brier score (IBS) and calibration curves in the test datasets. For predicting the 3-year survival of OPCs with the complete cases, the C-index in the development sets were 0.77 (0.77, 0.77), 0.70 (0.70, 0.70), 0.83 (0.83, 0.84) and 0.83 (0.83, 0.86) for Cox, ST, RF and CF, respectively. Similar results were observed in the 5-year survival prediction models, with C-index for Cox, ST, RF and CF being 0.76 (0.76, 0.76), 0.69 (0.69, 0.70), 0.83 (0.83, 0.83) and 0.85 (0.84, 0.86), respectively, in development datasets. The prediction error curves based on IBS showed a similar pattern for these models. The predictive performance remained unchanged in the analyses with imputed data. Additionally, a free web-based calculator was developed for potential clinical use. In conclusion, compared to Cox regression, ST had a lower and RF and CF had a higher predictive accuracy in predicting the 3- and 5-year OPCs survival using SEER data. The RF and CF algorithms provide non-parametric alternatives to Cox regression to be of clinical use for estimating the survival probability of OPCs patients. MDPI 2020-09-29 /pmc/articles/PMC7600270/ /pubmed/33003533 http://dx.doi.org/10.3390/cancers12102802 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Du, Mi Haag, Dandara G. Lynch, John W. Mittinty, Murthy N. Comparison of the Tree-Based Machine Learning Algorithms to Cox Regression in Predicting the Survival of Oral and Pharyngeal Cancers: Analyses Based on SEER Database |
title | Comparison of the Tree-Based Machine Learning Algorithms to Cox Regression in Predicting the Survival of Oral and Pharyngeal Cancers: Analyses Based on SEER Database |
title_sort | comparison of the tree-based machine learning algorithms to cox regression in predicting the survival of oral and pharyngeal cancers: analyses based on seer database |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7600270/ https://www.ncbi.nlm.nih.gov/pubmed/33003533 http://dx.doi.org/10.3390/cancers12102802 |
work_keys_str_mv | AT dumi comparisonofthetreebasedmachinelearningalgorithmstocoxregressioninpredictingthesurvivaloforalandpharyngealcancersanalysesbasedonseerdatabase AT haagdandarag comparisonofthetreebasedmachinelearningalgorithmstocoxregressioninpredictingthesurvivaloforalandpharyngealcancersanalysesbasedonseerdatabase AT lynchjohnw comparisonofthetreebasedmachinelearningalgorithmstocoxregressioninpredictingthesurvivaloforalandpharyngealcancersanalysesbasedonseerdatabase AT mittintymurthyn comparisonofthetreebasedmachinelearningalgorithmstocoxregressioninpredictingthesurvivaloforalandpharyngealcancersanalysesbasedonseerdatabase |
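
The abstract in this record reports model discrimination as a C-index (for example, 0.77 for Cox and 0.83 for RF at 3 years). As a rough illustration of what that number measures, the sketch below computes Harrell's concordance index on a tiny made-up dataset. The function name `concordance_index`, the toy values, and the simplified handling of tied survival times (such pairs are skipped) are assumptions for illustration only; this is not the authors' code, and the record does not state which software they used.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times       -- observed follow-up time for each subject
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output; a higher score means higher predicted risk

    A pair is comparable when the subject with the shorter time had an
    observed event. Pairs with tied times are skipped for simplicity
    (an assumption of this sketch, not a universal convention).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times: skipped in this simplified version
        # a is the subject with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier subject was censored: ordering unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk died earlier: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four subjects, the second one is censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 0, 1, 1]
toy_risk = [0.9, 0.4, 0.2, 0.6]

c_index = concordance_index(toy_times, toy_events, toy_risk)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of subjects by risk. On the toy data, four pairs are comparable and three are ordered correctly.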