
Oropharyngeal cancer patient stratification using random forest based-learning over high-dimensional radiomic features

Bibliographic Details
Main authors: Patel, Harsh; Vock, David M.; Marai, G. Elisabeta; Fuller, Clifton D.; Mohamed, Abdallah S. R.; Canahuate, Guadalupe
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8263609/
https://www.ncbi.nlm.nih.gov/pubmed/34234160
http://dx.doi.org/10.1038/s41598-021-92072-8
_version_ 1783719421135552512
author Patel, Harsh
Vock, David M.
Marai, G. Elisabeta
Fuller, Clifton D.
Mohamed, Abdallah S. R.
Canahuate, Guadalupe
collection PubMed
description This study aims to improve risk prediction for oropharyngeal cancer (OPC) patients using cluster analysis on radiomic features extracted from pre-treatment computed tomography (CT) scans. A total of 553 OPC patients, randomly split into training (80%) and validation (20%) sets, were classified into 2 or 3 risk groups by applying hierarchical clustering over the co-occurrence matrix obtained from a random survival forest (RSF) trained over 301 radiomic features. The cluster label was included together with other clinical data to train an ensemble of five predictive models (Cox, random forest, RSF, logistic regression, and logistic-elastic net). Ensemble performance was evaluated on the independent test set for both recurrence-free survival (RFS) and overall survival (OS). The Kaplan–Meier curves for OS stratified by cluster label show significant differences for both training and testing (p < 0.0001). Compared with models trained using clinical data only, including the cluster label improves test AUC from 0.62 to 0.79 for OS and from 0.66 to 0.80 for RFS. Extracting a single feature, namely a cluster label, to represent the high-dimensional radiomic feature space reduces the dimensionality and sparsity of the data. Moreover, including the cluster label improves model performance compared to clinical data only and offers performance comparable to models that include the raw radiomic features.
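The clustering step in the description can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the co-occurrence matrix records the fraction of trees in which two patients fall into the same leaf, it uses average linkage (the abstract does not state the linkage), and the leaf assignments in `leaves` are hand-written, hypothetical values rather than output from a trained random survival forest. The function names `co_occurrence` and `hierarchical_clusters` are likewise illustrative.

```python
from itertools import combinations


def co_occurrence(leaves):
    """leaves[i][t] is the leaf id reached by patient i in tree t.

    Returns an n x n matrix whose (i, j) entry is the fraction of trees
    in which patients i and j end up in the same leaf.
    """
    n, n_trees = len(leaves), len(leaves[0])
    prox = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        shared = sum(a == b for a, b in zip(leaves[i], leaves[j]))
        prox[i][j] = prox[j][i] = shared / n_trees
    return prox


def hierarchical_clusters(prox, k):
    """Average-linkage agglomerative clustering on distance = 1 - proximity.

    Merges the two closest clusters until k clusters remain and returns
    one integer cluster label per patient.
    """
    clusters = [[i] for i in range(len(prox))]

    def dist(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(1.0 - prox[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b

    labels = [0] * len(prox)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical leaf assignments: 6 patients, 3 trees.
leaves = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 2],
    [3, 4, 5],
    [3, 4, 5],
    [3, 2, 5],
]
prox = co_occurrence(leaves)
labels = hierarchical_clusters(prox, k=2)
print(labels)
```

In the workflow the description outlines, the resulting label would then be added as a single categorical feature alongside the clinical variables, in place of the 301 raw radiomic features.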
format Online
Article
Text
id pubmed-8263609
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8263609 2021-07-09 Sci Rep Article
Nature Publishing Group UK 2021-07-07 /pmc/articles/PMC8263609/ /pubmed/34234160 http://dx.doi.org/10.1038/s41598-021-92072-8 Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title Oropharyngeal cancer patient stratification using random forest based-learning over high-dimensional radiomic features
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8263609/
https://www.ncbi.nlm.nih.gov/pubmed/34234160
http://dx.doi.org/10.1038/s41598-021-92072-8