Statistical learning methods as a preprocessing step for survival analysis: evaluation of concept using lung cancer data
BACKGROUND: Statistical learning (SL) techniques can address non-linear relationships and small datasets but do not provide an output that has an epidemiologic interpretation. METHODS: A small set of clinical variables (CVs) for stage-1 non-small cell lung cancer patients was used to evaluate an app...
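Main authors: | Behera, Madhusmita; Fowler, Erin E; Owonikoko, Taofeek K; Land, Walker H; Mayfield, William; Chen, Zhengjia; Khuri, Fadlo R; Ramalingam, Suresh S; Heine, John J |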
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3280940/ https://www.ncbi.nlm.nih.gov/pubmed/22067671 http://dx.doi.org/10.1186/1475-925X-10-97 |
_version_ | 1782223884451315712 |
---|---|
author | Behera, Madhusmita; Fowler, Erin E; Owonikoko, Taofeek K; Land, Walker H; Mayfield, William; Chen, Zhengjia; Khuri, Fadlo R; Ramalingam, Suresh S; Heine, John J |
author_sort | Behera, Madhusmita |
collection | PubMed |
description | BACKGROUND: Statistical learning (SL) techniques can address non-linear relationships and small datasets but do not provide an output that has an epidemiologic interpretation. METHODS: A small set of clinical variables (CVs) for stage-1 non-small cell lung cancer patients was used to evaluate an approach for using SL methods as a preprocessing step for survival analysis. A stochastic method of training a probabilistic neural network (PNN) was used with differential evolution (DE) optimization. Survival scores were derived stochastically by combining CVs with the PNN. Patients (n = 151) were dichotomized into favorable (n = 92) and unfavorable (n = 59) survival outcome groups. These PNN derived scores were used with logistic regression (LR) modeling to predict favorable survival outcome and were integrated into the survival analysis (i.e. Kaplan-Meier analysis and Cox regression). The hybrid modeling was compared with the respective modeling using raw CVs. The area under the receiver operating characteristic curve (Az) was used to compare model predictive capability. Odds ratios (ORs) and hazard ratios (HRs) were used to compare disease associations with 95% confidence intervals (CIs). RESULTS: The LR model with the best predictive capability gave Az = 0.703. While controlling for gender and tumor grade, the OR = 0.63 (CI: 0.43, 0.91) per standard deviation (SD) increase in age indicates increasing age confers unfavorable outcome. The hybrid LR model gave Az = 0.778 by combining age and tumor grade with the PNN and controlling for gender. The PNN score and age translate inversely with respect to risk. The OR = 0.27 (CI: 0.14, 0.53) per SD increase in PNN score indicates those patients with decreased score confer unfavorable outcome. The tumor grade adjusted hazard for patients above the median age compared with those below the median was HR = 1.78 (CI: 1.06, 3.02), whereas the hazard for those patients below the median PNN score compared to those above the median was HR = 4.0 (CI: 2.13, 7.14). CONCLUSION: We have provided preliminary evidence showing that the SL preprocessing may provide benefits in comparison with accepted approaches. The work will require further evaluation with varying datasets to confirm these findings. |
format | Online Article Text |
id | pubmed-3280940 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3280940 2012-02-17 Biomed Eng Online Research BioMed Central 2011-11-08 /pmc/articles/PMC3280940/ /pubmed/22067671 http://dx.doi.org/10.1186/1475-925X-10-97 Text en Copyright ©2011 Behera et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Statistical learning methods as a preprocessing step for survival analysis: evaluation of concept using lung cancer data |
topic | Research |
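
The abstract reports its ratios in opposite directions (age: above versus below the median; PNN score: below versus above the median) and notes that the two measures "translate inversely with respect to risk". The sketch below shows one way a reader might put such ratios on a common footing: taking the reciprocal of a ratio and swapping its confidence bounds, and recovering an approximate log-scale coefficient and standard error. The function names are hypothetical, and the standard-error step assumes a Wald-type interval that is symmetric on the log scale, which the abstract does not state.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def invert_ratio(estimate, lower, upper):
    """Re-express a ratio (OR or HR) for the opposite comparison direction.

    Taking reciprocals reverses the ordering, so the two bounds swap places.
    """
    return 1.0 / estimate, 1.0 / upper, 1.0 / lower


def log_scale_summary(estimate, lower, upper, z=Z_95):
    """Return (beta, se) on the log scale.

    ASSUMPTION: the reported interval is a Wald interval exp(beta +/- z * se).
    The abstract does not say how its intervals were computed.
    """
    beta = math.log(estimate)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se


# Values quoted in the abstract.
or_age = (0.63, 0.43, 0.91)  # OR per SD increase in age
hr_pnn = (4.0, 2.13, 7.14)   # HR, below-median versus above-median PNN score

print(invert_ratio(*or_age))       # ratio per SD decrease in age
print(invert_ratio(*hr_pnn))       # above-median versus below-median PNN score
print(log_scale_summary(*hr_pnn))  # approximate Cox coefficient and its SE
```

Inverting the age OR gives roughly 1.59 per SD decrease in age, which makes it easier to compare in magnitude with ratios that are reported above 1.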