An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction

Although lung cancer survival status and survival length predictions have primarily been studied individually, a scheme that leverages both fields in an interpretable way for physicians remains elusive. We propose a two-phase data analytic framework that is capable of classifying survival status for...

Bibliographic Details
Main authors: Sedighi-Maman, Zahra, Heath, Jonathan J.
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9503480/
https://www.ncbi.nlm.nih.gov/pubmed/36146145
http://dx.doi.org/10.3390/s22186783
_version_ 1784795973734629376
author Sedighi-Maman, Zahra
Heath, Jonathan J.
author_facet Sedighi-Maman, Zahra
Heath, Jonathan J.
author_sort Sedighi-Maman, Zahra
collection PubMed
description Although lung cancer survival status and survival length predictions have primarily been studied individually, a scheme that leverages both fields in an interpretable way for physicians remains elusive. We propose a two-phase data analytic framework that is capable of classifying survival status for 0.5-, 1-, 1.5-, 2-, 2.5-, and 3-year time-points (phase I) and predicting the number of survival months within 3 years (phase II) using recent Surveillance, Epidemiology, and End Results data from 2010 to 2017. In this study, we employ three analytical models (general linear model, extreme gradient boosting, and artificial neural networks), five data balancing techniques (synthetic minority oversampling technique (SMOTE), relocating safe level SMOTE, borderline SMOTE, adaptive synthetic sampling, and majority weighted minority oversampling technique), two feature selection methods (least absolute shrinkage and selection operator (LASSO) and random forest), and the one-hot encoding approach. By implementing a comprehensive data preparation phase, we demonstrate that a computationally efficient and interpretable method such as GLM performs comparably to more complex models. Moreover, we quantify the effects of individual features in phase I and II by exploiting GLM coefficients. To the best of our knowledge, this study is the first to (a) implement a comprehensive data processing approach to develop performant, computationally efficient, and interpretable methods in comparison to black-box models, (b) visualize top factors impacting survival odds by utilizing the change in odds ratio, and (c) comprehensively explore short-term lung cancer survival using a two-phase approach.
format Online
Article
Text
id pubmed-9503480
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9503480 2022-09-24 An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction Sedighi-Maman, Zahra Heath, Jonathan J. Sensors (Basel) Article
MDPI 2022-09-08 /pmc/articles/PMC9503480/ /pubmed/36146145 http://dx.doi.org/10.3390/s22186783 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Sedighi-Maman, Zahra
Heath, Jonathan J.
An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction
title An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction
title_full An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction
title_fullStr An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction
title_full_unstemmed An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction
title_short An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction
title_sort interpretable two-phase modeling approach for lung cancer survivability prediction
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9503480/
https://www.ncbi.nlm.nih.gov/pubmed/36146145
http://dx.doi.org/10.3390/s22186783
work_keys_str_mv AT sedighimamanzahra aninterpretabletwophasemodelingapproachforlungcancersurvivabilityprediction
AT heathjonathanj aninterpretabletwophasemodelingapproachforlungcancersurvivabilityprediction
AT sedighimamanzahra interpretabletwophasemodelingapproachforlungcancersurvivabilityprediction
AT heathjonathanj interpretabletwophasemodelingapproachforlungcancersurvivabilityprediction
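
Note on the description field: the abstract says that feature effects are quantified from GLM coefficients and visualized as a change in odds ratio. The sketch below illustrates that general idea only. It assumes the phase I classifier uses a logistic link, which the record does not state, and the function names and coefficient values are hypothetical, invented for illustration rather than taken from the article.

```python
import math


def odds_ratio(coefficient: float) -> float:
    """Odds ratio for a one-unit increase in a feature.

    Assumes a logistic-link model, where a coefficient is a change in log-odds.
    """
    return math.exp(coefficient)


def percent_change_in_odds(coefficient: float) -> float:
    """Percentage change in the odds for a one-unit increase in a feature."""
    return (math.exp(coefficient) - 1.0) * 100.0


# Hypothetical coefficients, invented for illustration; not results from the article.
example_coefficients = {
    "feature_a": 0.0,             # no effect on the odds
    "feature_b": math.log(2.0),   # doubles the odds
    "feature_c": -math.log(2.0),  # halves the odds
}

for name, beta in example_coefficients.items():
    ratio = odds_ratio(beta)
    change = percent_change_in_odds(beta)
    print(f"{name}: odds ratio = {ratio:.2f}, change in odds = {change:+.1f}%")
```

For a one-hot encoded category, the same reading applies relative to the omitted reference level: the odds ratio compares that category with the baseline rather than describing a one-unit numeric increase.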