Comparison of scores for bimodality of gene expression distributions and genome-wide evaluation of the prognostic relevance of high-scoring genes

BACKGROUND: A major goal of the analysis of high-dimensional RNA expression data from tumor tissue is to identify prognostic signatures for discriminating patient subgroups. For this purpose genome-wide identification of bimodally expressed genes from gene array data is relevant because distinguisha...

Bibliographic Details
Main authors: Hellwig, Birte; Hengstler, Jan G; Schmidt, Marcus; Gehrmann, Mathias C; Schormann, Wiebke; Rahnenführer, Jörg
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2892466/
https://www.ncbi.nlm.nih.gov/pubmed/20500820
http://dx.doi.org/10.1186/1471-2105-11-276
_version_ 1782182950794690560
author Hellwig, Birte
Hengstler, Jan G
Schmidt, Marcus
Gehrmann, Mathias C
Schormann, Wiebke
Rahnenführer, Jörg
author_facet Hellwig, Birte
Hengstler, Jan G
Schmidt, Marcus
Gehrmann, Mathias C
Schormann, Wiebke
Rahnenführer, Jörg
author_sort Hellwig, Birte
collection PubMed
description BACKGROUND: A major goal of the analysis of high-dimensional RNA expression data from tumor tissue is to identify prognostic signatures for discriminating patient subgroups. For this purpose genome-wide identification of bimodally expressed genes from gene array data is relevant because distinguishability of high and low expression groups is easier compared to genes with unimodal expression distributions. Recently, several methods for the identification of genes with bimodal distributions have been introduced. A straightforward approach is to cluster the expression values and score the distance between the two distributions. Other scores directly measure properties of the distribution. The kurtosis, e.g., measures divergence from a normal distribution. An alternative is the outlier-sum statistic that identifies genes with extremely high or low expression values in a subset of the samples. RESULTS: We compare and discuss scores for bimodality for expression data. For the genome-wide identification of bimodal genes we apply all scores to expression data from 194 patients with node-negative breast cancer. Further, we present the first comprehensive genome-wide evaluation of the prognostic relevance of bimodal genes. We first rank genes according to bimodality scores and define two patient subgroups based on expression values. Then we assess the prognostic significance of the top ranking bimodal genes by comparing the survival functions of the two patient subgroups. We also evaluate the global association between the bimodal shape of expression distributions and survival times with an enrichment type analysis. Various cluster-based methods lead to a significant overrepresentation of prognostic genes. A striking result is obtained with the outlier-sum statistic (p < 10^-12). Many genes with heavy tails generate subgroups of patients with different prognosis.
CONCLUSIONS: Genes with high bimodality scores are promising candidates for defining prognostic patient subgroups from expression data. We discuss advantages and disadvantages of the different scores for prognostic purposes. The outlier-sum statistic may be particularly valuable for the identification of genes to be included in prognostic signatures. Among the genes identified as bimodal in the breast cancer data set several have not yet previously been recognized to be prognostic and bimodally expressed in breast cancer.
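The abstract names two distribution-based scores, the kurtosis and the outlier-sum statistic. As a rough illustration only, the Python sketch below computes an excess kurtosis and a simplified one-group outlier sum for a single gene's expression values. The function names, the median/MAD standardization, the "third quartile plus interquartile range" threshold, and the quantile convention are assumptions made for this sketch; the record does not state the exact definitions used in the paper.

```python
import statistics


def excess_kurtosis(values):
    """Moment-based excess kurtosis (0 for a normal distribution).

    Two well-separated, equally sized expression groups give a clearly
    negative value, so low kurtosis can hint at bimodality.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; kurtosis is undefined")
    return m4 / (m2 ** 2) - 3.0


def outlier_sum(values):
    """Simplified one-group outlier sum (hypothetical helper).

    Values are standardized by median and median absolute deviation (MAD);
    standardized values above Q3 + IQR are treated as outliers and summed.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; values cannot be standardized")
    z = [(v - med) / mad for v in values]
    # Assumed quantile convention: inclusive method from the stdlib.
    q1, _, q3 = statistics.quantiles(z, n=4, method="inclusive")
    upper = q3 + (q3 - q1)
    return sum(s for s in z if s > upper)


if __name__ == "__main__":
    two_groups = [0, 0, 0, 1, 1, 1]                  # toy bimodal gene
    heavy_tail = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]    # one extreme sample
    print(excess_kurtosis(two_groups))
    print(outlier_sum(heavy_tail))
```

In a genome-wide setting, such a score would be computed per gene and the genes ranked by it; a mirrored version (summing standardized values below Q1 minus IQR) would capture extremely low expression.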
format Text
id pubmed-2892466
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-28924662010-06-26 Comparison of scores for bimodality of gene expression distributions and genome-wide evaluation of the prognostic relevance of high-scoring genes Hellwig, Birte Hengstler, Jan G Schmidt, Marcus Gehrmann, Mathias C Schormann, Wiebke Rahnenführer, Jörg BMC Bioinformatics Research article BACKGROUND: A major goal of the analysis of high-dimensional RNA expression data from tumor tissue is to identify prognostic signatures for discriminating patient subgroups. For this purpose genome-wide identification of bimodally expressed genes from gene array data is relevant because distinguishability of high and low expression groups is easier compared to genes with unimodal expression distributions. Recently, several methods for the identification of genes with bimodal distributions have been introduced. A straightforward approach is to cluster the expression values and score the distance between the two distributions. Other scores directly measure properties of the distribution. The kurtosis, e.g., measures divergence from a normal distribution. An alternative is the outlier-sum statistic that identifies genes with extremely high or low expression values in a subset of the samples. RESULTS: We compare and discuss scores for bimodality for expression data. For the genome-wide identification of bimodal genes we apply all scores to expression data from 194 patients with node-negative breast cancer. Further, we present the first comprehensive genome-wide evaluation of the prognostic relevance of bimodal genes. We first rank genes according to bimodality scores and define two patient subgroups based on expression values. Then we assess the prognostic significance of the top ranking bimodal genes by comparing the survival functions of the two patient subgroups. We also evaluate the global association between the bimodal shape of expression distributions and survival times with an enrichment type analysis. 
Various cluster-based methods lead to a significant overrepresentation of prognostic genes. A striking result is obtained with the outlier-sum statistic (p < 10^-12). Many genes with heavy tails generate subgroups of patients with different prognosis. CONCLUSIONS: Genes with high bimodality scores are promising candidates for defining prognostic patient subgroups from expression data. We discuss advantages and disadvantages of the different scores for prognostic purposes. The outlier-sum statistic may be particularly valuable for the identification of genes to be included in prognostic signatures. Among the genes identified as bimodal in the breast cancer data set several have not yet previously been recognized to be prognostic and bimodally expressed in breast cancer. BioMed Central 2010-05-25 /pmc/articles/PMC2892466/ /pubmed/20500820 http://dx.doi.org/10.1186/1471-2105-11-276 Text en Copyright ©2010 Hellwig et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research article
Hellwig, Birte
Hengstler, Jan G
Schmidt, Marcus
Gehrmann, Mathias C
Schormann, Wiebke
Rahnenführer, Jörg
Comparison of scores for bimodality of gene expression distributions and genome-wide evaluation of the prognostic relevance of high-scoring genes
title Comparison of scores for bimodality of gene expression distributions and genome-wide evaluation of the prognostic relevance of high-scoring genes
title_full Comparison of scores for bimodality of gene expression distributions and genome-wide evaluation of the prognostic relevance of high-scoring genes
title_fullStr Comparison of scores for bimodality of gene expression distributions and genome-wide evaluation of the prognostic relevance of high-scoring genes
title_full_unstemmed Comparison of scores for bimodality of gene expression distributions and genome-wide evaluation of the prognostic relevance of high-scoring genes
title_short Comparison of scores for bimodality of gene expression distributions and genome-wide evaluation of the prognostic relevance of high-scoring genes
title_sort comparison of scores for bimodality of gene expression distributions and genome-wide evaluation of the prognostic relevance of high-scoring genes
topic Research article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2892466/
https://www.ncbi.nlm.nih.gov/pubmed/20500820
http://dx.doi.org/10.1186/1471-2105-11-276