Robust Selection of Cancer Survival Signatures from High-Throughput Genomic Data Using Two-Fold Subsampling

Identifying relevant signatures for clinical patient outcome is a fundamental task in high-throughput studies. Signatures, composed of features such as mRNAs, miRNAs, SNPs or other molecular variables, are often non-overlapping, even though they have been identified from similar experiments consider...

Bibliographic Details
Main authors: Lee, Sangkyun; Rahnenführer, Jörg; Lang, Michel; De Preter, Katleen; Mestdagh, Pieter; Koster, Jan; Versteeg, Rogier; Stallings, Raymond L.; Varesio, Luigi; Asgharzadeh, Shahab; Schulte, Johannes H.; Fielitz, Kathrin; Schwermer, Melanie; Morik, Katharina; Schramm, Alexander
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4190101/
https://www.ncbi.nlm.nih.gov/pubmed/25295525
http://dx.doi.org/10.1371/journal.pone.0108818
_version_ 1782338459615100928
author Lee, Sangkyun
Rahnenführer, Jörg
Lang, Michel
De Preter, Katleen
Mestdagh, Pieter
Koster, Jan
Versteeg, Rogier
Stallings, Raymond L.
Varesio, Luigi
Asgharzadeh, Shahab
Schulte, Johannes H.
Fielitz, Kathrin
Schwermer, Melanie
Morik, Katharina
Schramm, Alexander
author_sort Lee, Sangkyun
collection PubMed
description Identifying signatures relevant to clinical patient outcome is a fundamental task in high-throughput studies. Signatures, composed of features such as mRNAs, miRNAs, SNPs or other molecular variables, are often non-overlapping, even though they have been identified from similar experiments considering samples with the same type of disease. The lack of a consensus is mostly due to the fact that sample sizes are far smaller than the numbers of candidate features to be considered, and therefore signature selection suffers from large variation. We propose a robust signature selection method that enhances the selection stability of penalized regression algorithms for predicting survival risk. Our method is based on an aggregation of multiple, possibly unstable, signatures obtained with the preconditioned lasso algorithm applied to random (internal) subsamples of the data of a given cohort, where the aggregated signature is shrunk by a simple thresholding strategy. The resulting method, RS-PL, is conceptually simple and easy to apply, relying on parameters automatically tuned by cross-validation. Robust signature selection using RS-PL operates within an (external) subsampling framework to estimate the selection probabilities of features in multiple trials of RS-PL. These probabilities are used for identifying reliable features to be included in a signature. Our method was evaluated on microarray data sets from neuroblastoma, lung adenocarcinoma, and breast cancer patients, extracting robust and relevant signatures for predicting survival risk. Signatures obtained by our method achieved high prediction performance and robustness, consistently across the three data sets. Genes with high selection probability in our robust signatures have been reported as cancer-relevant. The ordering of predictor coefficients associated with signatures was well-preserved across multiple trials of RS-PL, demonstrating the capability of our method for identifying a transferable consensus signature. The software is available as an R package rsig at CRAN (http://cran.r-project.org).
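The two-level subsampling idea in the description can be illustrated with a small sketch. This is not the RS-PL implementation and does not use the rsig package: the preconditioned lasso is replaced by a toy selector that ranks features by absolute correlation with an uncensored risk score, and all names and parameter values (`toy_select`, `aggregated_signature`, `selection_probabilities`, `frac`, `threshold`) are hypothetical choices made only for illustration.

```python
import random
from statistics import mean


def abs_corr(xs, ys):
    """Absolute Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def toy_select(X, y, rows, k):
    """Stand-in for a sparse selector (ASSUMPTION: not the preconditioned
    lasso). Returns the k features most correlated with y on `rows`."""
    ys = [y[i] for i in rows]
    n_features = len(X[0])
    scores = [abs_corr([X[i][j] for i in rows], ys) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: -scores[j])
    return set(ranked[:k])


def aggregated_signature(X, y, rows, k, n_inner, frac, threshold, rng):
    """Inner level: select on random subsamples of `rows`, aggregate the
    selections, and keep features chosen in at least `threshold` of them."""
    counts = [0] * len(X[0])
    m = max(2, int(frac * len(rows)))
    for _ in range(n_inner):
        sub = rng.sample(rows, m)
        for j in toy_select(X, y, sub, k):
            counts[j] += 1
    return {j for j, c in enumerate(counts) if c / n_inner >= threshold}


def selection_probabilities(X, y, k=2, n_outer=20, n_inner=10,
                            frac=0.7, threshold=0.5, seed=0):
    """Outer level: repeat the inner procedure on random subsamples of the
    cohort and report how often each feature enters the signature."""
    rng = random.Random(seed)
    rows = list(range(len(X)))
    counts = [0] * len(X[0])
    m = max(2, int(frac * len(rows)))
    for _ in range(n_outer):
        outer = rng.sample(rows, m)
        sig = aggregated_signature(X, y, outer, k, n_inner, frac, threshold, rng)
        for j in sig:
            counts[j] += 1
    return [c / n_outer for c in counts]


# Synthetic cohort (ASSUMPTION: made-up data, no censoring): 40 samples,
# 8 features; features 0 and 1 are noisy copies of the risk score.
data_rng = random.Random(42)
X, y = [], []
for _ in range(40):
    risk = data_rng.gauss(0, 1)
    informative = [risk + data_rng.gauss(0, 0.05) for _ in range(2)]
    noise = [data_rng.gauss(0, 1) for _ in range(6)]
    X.append(informative + noise)
    y.append(risk)

probs = selection_probabilities(X, y)
print([round(p, 2) for p in probs])
```

Features whose estimated selection probability exceeds a chosen cutoff would form the reported signature. With real survival data, the selector would have to handle censored outcomes, which this toy ranking does not.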
format Online
Article
Text
id pubmed-4190101
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4190101 2014-10-10 PLoS One Research Article Public Library of Science 2014-10-08 /pmc/articles/PMC4190101/ /pubmed/25295525 http://dx.doi.org/10.1371/journal.pone.0108818 Text en © 2014 Lee et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Robust Selection of Cancer Survival Signatures from High-Throughput Genomic Data Using Two-Fold Subsampling
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4190101/
https://www.ncbi.nlm.nih.gov/pubmed/25295525
http://dx.doi.org/10.1371/journal.pone.0108818