An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction
Although lung cancer survival status and survival length predictions have primarily been studied individually, a scheme that leverages both fields in an interpretable way for physicians remains elusive. We propose a two-phase data analytic framework that is capable of classifying survival status for...
Main authors: | Sedighi-Maman, Zahra; Heath, Jonathan J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9503480/ https://www.ncbi.nlm.nih.gov/pubmed/36146145 http://dx.doi.org/10.3390/s22186783 |
_version_ | 1784795973734629376 |
---|---|
author | Sedighi-Maman, Zahra; Heath, Jonathan J. |
author_facet | Sedighi-Maman, Zahra; Heath, Jonathan J. |
author_sort | Sedighi-Maman, Zahra |
collection | PubMed |
description | Although lung cancer survival status and survival length predictions have primarily been studied individually, a scheme that leverages both fields in an interpretable way for physicians remains elusive. We propose a two-phase data analytic framework that is capable of classifying survival status for 0.5-, 1-, 1.5-, 2-, 2.5-, and 3-year time-points (phase I) and predicting the number of survival months within 3 years (phase II) using recent Surveillance, Epidemiology, and End Results data from 2010 to 2017. In this study, we employ three analytical models (general linear model, extreme gradient boosting, and artificial neural networks), five data balancing techniques (synthetic minority oversampling technique (SMOTE), relocating safe level SMOTE, borderline SMOTE, adaptive synthetic sampling, and majority weighted minority oversampling technique), two feature selection methods (least absolute shrinkage and selection operator (LASSO) and random forest), and the one-hot encoding approach. By implementing a comprehensive data preparation phase, we demonstrate that a computationally efficient and interpretable method such as GLM performs comparably to more complex models. Moreover, we quantify the effects of individual features in phase I and II by exploiting GLM coefficients. To the best of our knowledge, this study is the first to (a) implement a comprehensive data processing approach to develop performant, computationally efficient, and interpretable methods in comparison to black-box models, (b) visualize top factors impacting survival odds by utilizing the change in odds ratio, and (c) comprehensively explore short-term lung cancer survival using a two-phase approach. |
format | Online Article Text |
id | pubmed-9503480 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9503480 2022-09-24 An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction Sedighi-Maman, Zahra; Heath, Jonathan J. Sensors (Basel) Article MDPI 2022-09-08 /pmc/articles/PMC9503480/ /pubmed/36146145 http://dx.doi.org/10.3390/s22186783 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Sedighi-Maman, Zahra Heath, Jonathan J. An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction |
title | An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction |
title_full | An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction |
title_fullStr | An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction |
title_full_unstemmed | An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction |
title_short | An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction |
title_sort | interpretable two-phase modeling approach for lung cancer survivability prediction |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9503480/ https://www.ncbi.nlm.nih.gov/pubmed/36146145 http://dx.doi.org/10.3390/s22186783 |
work_keys_str_mv | AT sedighimamanzahra aninterpretabletwophasemodelingapproachforlungcancersurvivabilityprediction AT heathjonathanj aninterpretabletwophasemodelingapproachforlungcancersurvivabilityprediction AT sedighimamanzahra interpretabletwophasemodelingapproachforlungcancersurvivabilityprediction AT heathjonathanj interpretabletwophasemodelingapproachforlungcancersurvivabilityprediction |
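
The description in this record says that feature effects are quantified from GLM coefficients and visualized as a change in odds ratio. As a minimal sketch of that general idea, assuming a logistic-link model (the record does not state the exact link function or formula used in the paper), a coefficient can be converted to an odds ratio by exponentiation. The function names and the coefficient values below are hypothetical and are not taken from the article.

```python
import math


def odds_ratio(coefficient: float, delta: float = 1.0) -> float:
    """Multiplicative change in the odds when a feature increases by `delta`.

    Assumes a logistic-regression-style model, where the log-odds are a
    linear function of the features.
    """
    return math.exp(coefficient * delta)


def percent_change_in_odds(coefficient: float, delta: float = 1.0) -> float:
    """Percentage change in the odds for a `delta` increase in the feature."""
    return (odds_ratio(coefficient, delta) - 1.0) * 100.0


# Hypothetical coefficients for illustration only (not results from the paper).
example_coefficients = {"feature_a": 0.40, "feature_b": -0.25}

for name, beta in example_coefficients.items():
    print(
        f"{name}: odds ratio = {odds_ratio(beta):.3f}, "
        f"change in odds = {percent_change_in_odds(beta):+.1f}%"
    )
```

A positive coefficient yields an odds ratio above 1 (higher odds of the modeled outcome), and a negative coefficient yields an odds ratio below 1. For a one-hot encoded feature, `delta = 1.0` corresponds to the category being present versus absent.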