
Transforming RNA-Seq Data to Improve the Performance of Prognostic Gene Signatures

Gene expression measurements have successfully been used for building prognostic signatures, i.e., for identifying a short list of important genes that can predict patient outcome. Mostly microarray measurements have been considered, and there is little advice available for building multivariable risk...


Bibliographic Details
Main authors: Zwiener, Isabella; Frisch, Barbara; Binder, Harald
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3885686/
https://www.ncbi.nlm.nih.gov/pubmed/24416353
http://dx.doi.org/10.1371/journal.pone.0085150
_version_ 1782298796413157376
author Zwiener, Isabella
Frisch, Barbara
Binder, Harald
author_facet Zwiener, Isabella
Frisch, Barbara
Binder, Harald
author_sort Zwiener, Isabella
collection PubMed
description Gene expression measurements have successfully been used for building prognostic signatures, i.e., for identifying a short list of important genes that can predict patient outcome. Mostly microarray measurements have been considered, and there is little advice available for building multivariable risk prediction models from RNA-Seq data. We specifically consider penalized regression techniques, such as the lasso and componentwise boosting, which can simultaneously consider all measurements and provide both multivariable regression models for prediction and automated variable selection. However, they might be affected by the typical skewness, mean-variance dependency, or extreme values of RNA-Seq covariates and therefore could benefit from transformations of the latter. In an analytical part, we highlight preferential selection of covariates with large variances, which is problematic due to the mean-variance dependency of RNA-Seq data. In a simulation study, we compare different transformations of RNA-Seq data for potentially improving detection of important genes. Specifically, we consider standardization, the log transformation, a variance-stabilizing transformation, the Box-Cox transformation, and rank-based transformations. In addition, the prediction performance for real data from patients with kidney cancer and acute myeloid leukemia is considered. We show that signature size, identification performance, and prediction performance critically depend on the choice of a suitable transformation. Rank-based transformations perform well in all scenarios and can even outperform complex variance-stabilizing approaches. Generally, the results illustrate that the distribution and potential transformations of RNA-Seq data need to be considered as a critical step when building risk prediction models by penalized regression techniques.
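As an illustration of three of the transformations named in the abstract (standardization, the log transformation, and a rank-based transformation), the following minimal Python sketch applies them to a single count-valued covariate. It is not the authors' implementation: the function names, the pseudo-count of 1 in the log transformation, the average-rank handling of ties, and the normal-scores variant are all assumptions made here for illustration, and the example counts are invented.

```python
import math
import statistics


def standardize(x):
    """Center to mean 0 and scale to unit sample standard deviation."""
    mu = statistics.fmean(x)
    sd = statistics.stdev(x)
    return [(v - mu) / sd for v in x]


def log_transform(x, pseudo_count=1.0):
    """log(x + pseudo_count); the pseudo-count (an assumption) keeps zero counts finite."""
    return [math.log(v + pseudo_count) for v in x]


def rank_transform(x):
    """Replace each value by its rank (1..n); tied values share the average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def normal_scores(x):
    """Map ranks to standard normal quantiles (one possible rank-based variant)."""
    n = len(x)
    nd = statistics.NormalDist()
    return [nd.inv_cdf(r / (n + 1)) for r in rank_transform(x)]


if __name__ == "__main__":
    # Hypothetical counts for one gene across six samples, with one extreme value.
    counts = [0, 3, 3, 12, 150, 98000]
    print("standardized:", [round(v, 2) for v in standardize(counts)])
    print("log:         ", [round(v, 2) for v in log_transform(counts)])
    print("ranks:       ", rank_transform(counts))
    print("normal score:", [round(v, 2) for v in normal_scores(counts)])
```

In this sketch, the ranks depend only on the ordering of the samples, so the extreme count has no more influence than any other largest value, whereas standardization alone leaves the skewness in place.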
format Online
Article
Text
id pubmed-3885686
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3885686 2014-01-10 Transforming RNA-Seq Data to Improve the Performance of Prognostic Gene Signatures Zwiener, Isabella Frisch, Barbara Binder, Harald PLoS One Research Article Public Library of Science 2014-01-08 /pmc/articles/PMC3885686/ /pubmed/24416353 http://dx.doi.org/10.1371/journal.pone.0085150 Text en © 2014 Zwiener et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Zwiener, Isabella
Frisch, Barbara
Binder, Harald
Transforming RNA-Seq Data to Improve the Performance of Prognostic Gene Signatures
title Transforming RNA-Seq Data to Improve the Performance of Prognostic Gene Signatures
title_full Transforming RNA-Seq Data to Improve the Performance of Prognostic Gene Signatures
title_fullStr Transforming RNA-Seq Data to Improve the Performance of Prognostic Gene Signatures
title_full_unstemmed Transforming RNA-Seq Data to Improve the Performance of Prognostic Gene Signatures
title_short Transforming RNA-Seq Data to Improve the Performance of Prognostic Gene Signatures
title_sort transforming rna-seq data to improve the performance of prognostic gene signatures
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3885686/
https://www.ncbi.nlm.nih.gov/pubmed/24416353
http://dx.doi.org/10.1371/journal.pone.0085150
work_keys_str_mv AT zwienerisabella transformingrnaseqdatatoimprovetheperformanceofprognosticgenesignatures
AT frischbarbara transformingrnaseqdatatoimprovetheperformanceofprognosticgenesignatures
AT binderharald transformingrnaseqdatatoimprovetheperformanceofprognosticgenesignatures