Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression

With widespread availability of omics profiling techniques, the analysis and interpretation of high-dimensional omics data, for example, for biomarkers, is becoming an increasingly important part of clinical medicine because such datasets constitute a promising resource for predicting survival outco...

Bibliographic Details
Main Authors: Laimighofer, Michael, Krumsiek, Jan, Buettner, Florian, Theis, Fabian J.
Format: Online Article Text
Language: English
Published: Mary Ann Liebert, Inc. 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4827277/
https://www.ncbi.nlm.nih.gov/pubmed/26894327
http://dx.doi.org/10.1089/cmb.2015.0192
_version_ 1782426450650988544
author Laimighofer, Michael
Krumsiek, Jan
Buettner, Florian
Theis, Fabian J.
author_facet Laimighofer, Michael
Krumsiek, Jan
Buettner, Florian
Theis, Fabian J.
author_sort Laimighofer, Michael
collection PubMed
description With widespread availability of omics profiling techniques, the analysis and interpretation of high-dimensional omics data, for example, for biomarkers, is becoming an increasingly important part of clinical medicine because such datasets constitute a promising resource for predicting survival outcomes. However, early experience has shown that biomarkers often generalize poorly. Thus, it is crucial that models are not overfitted and give accurate results with new data. In addition, reliable detection of multivariate biomarkers with high predictive power (feature selection) is of particular interest in clinical settings. We present an approach that addresses both aspects in high-dimensional survival models. Within a nested cross-validation (CV), we fit a survival model, evaluate a dataset in an unbiased fashion, and select features with the best predictive power by applying a weighted combination of CV runs. We evaluate our approach using simulated toy data, as well as three breast cancer datasets, to predict the survival of breast cancer patients after treatment. In all datasets, we achieve more reliable estimation of predictive power for unseen cases and better predictive performance compared to the standard CoxLasso model. Taken together, we present a comprehensive and flexible framework for survival models, including performance estimation, final feature selection, and final model construction. The proposed algorithm is implemented in an open source R package (SurvRank) available on CRAN.
format Online
Article
Text
id pubmed-4827277
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Mary Ann Liebert, Inc.
record_format MEDLINE/PubMed
spelling pubmed-4827277 2016-04-20 Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression Laimighofer, Michael Krumsiek, Jan Buettner, Florian Theis, Fabian J. J Comput Biol Research Articles Mary Ann Liebert, Inc. 2016-04-01 /pmc/articles/PMC4827277/ /pubmed/26894327 http://dx.doi.org/10.1089/cmb.2015.0192 Text en © Michael Laimighofer, et al., 2016. Published by Mary Ann Liebert, Inc.
This Open Access article is distributed under the terms of the Creative Commons Attribution Noncommercial License (http://creativecommons.org/licenses/by/4.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
spellingShingle Research Articles
Laimighofer, Michael
Krumsiek, Jan
Buettner, Florian
Theis, Fabian J.
Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression
title Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression
title_full Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression
title_fullStr Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression
title_full_unstemmed Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression
title_short Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression
title_sort unbiased prediction and feature selection in high-dimensional survival regression
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4827277/
https://www.ncbi.nlm.nih.gov/pubmed/26894327
http://dx.doi.org/10.1089/cmb.2015.0192
work_keys_str_mv AT laimighofermichael unbiasedpredictionandfeatureselectioninhighdimensionalsurvivalregression
AT krumsiekjan unbiasedpredictionandfeatureselectioninhighdimensionalsurvivalregression
AT buettnerflorian unbiasedpredictionandfeatureselectioninhighdimensionalsurvivalregression
AT theisfabianj unbiasedpredictionandfeatureselectioninhighdimensionalsurvivalregression