Comparison of scores for bimodality of gene expression distributions and genome-wide evaluation of the prognostic relevance of high-scoring genes
Main authors: | Hellwig, Birte; Hengstler, Jan G; Schmidt, Marcus; Gehrmann, Mathias C; Schormann, Wiebke; Rahnenführer, Jörg |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2010 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2892466/ https://www.ncbi.nlm.nih.gov/pubmed/20500820 http://dx.doi.org/10.1186/1471-2105-11-276 |
author | Hellwig, Birte; Hengstler, Jan G; Schmidt, Marcus; Gehrmann, Mathias C; Schormann, Wiebke; Rahnenführer, Jörg |
collection | PubMed |
description | BACKGROUND: A major goal of the analysis of high-dimensional RNA expression data from tumor tissue is to identify prognostic signatures for discriminating patient subgroups. For this purpose, genome-wide identification of bimodally expressed genes from gene array data is relevant, because high and low expression groups are easier to distinguish than for genes with unimodal expression distributions. Recently, several methods for identifying genes with bimodal distributions have been introduced. A straightforward approach is to cluster the expression values and score the distance between the two resulting distributions. Other scores directly measure properties of the distribution; the kurtosis, for example, measures divergence from a normal distribution. An alternative is the outlier-sum statistic, which identifies genes with extremely high or low expression values in a subset of the samples.
RESULTS: We compare and discuss bimodality scores for expression data. For the genome-wide identification of bimodal genes, we apply all scores to expression data from 194 patients with node-negative breast cancer. Further, we present the first comprehensive genome-wide evaluation of the prognostic relevance of bimodal genes. We first rank genes according to bimodality scores and define two patient subgroups based on expression values. Then we assess the prognostic significance of the top-ranking bimodal genes by comparing the survival functions of the two patient subgroups. We also evaluate the global association between the bimodal shape of expression distributions and survival times with an enrichment-type analysis. Various cluster-based methods lead to a significant overrepresentation of prognostic genes. A striking result is obtained with the outlier-sum statistic (p < 10⁻¹²). Many genes with heavy tails generate subgroups of patients with different prognoses.
CONCLUSIONS: Genes with high bimodality scores are promising candidates for defining prognostic patient subgroups from expression data. We discuss advantages and disadvantages of the different scores for prognostic purposes. The outlier-sum statistic may be particularly valuable for identifying genes to be included in prognostic signatures. Several of the genes identified as bimodal in the breast cancer data set have not previously been recognized as prognostic and bimodally expressed in breast cancer. |
id | pubmed-2892466 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
source | BMC Bioinformatics (Research article); BioMed Central, 2010-05-25 |
rights | Copyright ©2010 Hellwig et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
topic | Research article |
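The description names three kinds of bimodality score: a cluster-based distance score, the kurtosis and the outlier-sum statistic. The Python sketch below shows one way to compute each score for a single gene's expression vector. It is an illustration under stated assumptions, not the authors' code, and all function names are hypothetical.

- The cluster score assumes an exact one-dimensional two-means split. The split is scored with a bimodality-index-style formula: sqrt(p(1 - p)) times the distance between the cluster means divided by the pooled within-cluster standard deviation.
- The outlier-sum score is an assumed one-sample, upper-tail adaptation of Tibshirani and Hastie's statistic. It uses median/MAD standardization with the usual 1.4826 scaling constant.
- The scores in the paper may be defined differently.

```python
import math
import statistics
from typing import List, Tuple


def excess_kurtosis(x: List[float]) -> float:
    """Moment-based excess kurtosis m4 / m2**2 - 3 (about 0 for normal data).

    Strongly negative values suggest a flat or bimodal shape; large positive
    values suggest heavy tails.
    """
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3.0


def outlier_sum(x: List[float]) -> float:
    """One-sample, upper-tail outlier-sum score (assumed adaptation)."""
    med = statistics.median(x)
    # Median absolute deviation, scaled to estimate the SD under normality.
    mad = 1.4826 * statistics.median([abs(v - med) for v in x])
    if mad == 0.0:
        return 0.0
    z = [(v - med) / mad for v in x]
    q1, _, q3 = statistics.quantiles(z, n=4)
    cutoff = q3 + (q3 - q1)  # values beyond Q3 + IQR count as outliers
    return sum(v for v in z if v > cutoff)


def two_means_split(x: List[float]) -> Tuple[float, List[int]]:
    """Exact 1-D two-means clustering plus a bimodality-index-style score.

    Returns (score, labels) with label 0 = low and 1 = high expression group.
    Requires at least three values and nonzero within-cluster spread.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    xs = [x[i] for i in order]
    best = None
    # In one dimension the optimal two-cluster partition is a cut of the sorted values.
    for k in range(1, n):
        lo, hi = xs[:k], xs[k:]
        m_lo, m_hi = statistics.fmean(lo), statistics.fmean(hi)
        ss = sum((v - m_lo) ** 2 for v in lo) + sum((v - m_hi) ** 2 for v in hi)
        if best is None or ss < best[0]:
            best = (ss, k, m_lo, m_hi)
    ss, k, m_lo, m_hi = best
    sd = math.sqrt(ss / (n - 2))  # pooled within-cluster SD
    if sd == 0.0:
        raise ValueError("clusters have zero within-cluster variance")
    p = k / n
    score = math.sqrt(p * (1 - p)) * (m_hi - m_lo) / sd
    labels = [0] * n
    for i in order[k:]:
        labels[i] = 1
    return score, labels
```

For example, two_means_split([1.0, 1.1, 0.9, 1.2, 0.8, 5.0, 5.1, 4.9, 5.2, 4.8]) places the first five values in the low group (label 0) and the last five in the high group (label 1). These labels are one way to define the two patient subgroups for a gene. For the same vector, excess_kurtosis is close to -2, as expected for two equally sized, well-separated clusters.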
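The description states that each gene's prognostic relevance is assessed by comparing the survival functions of the two patient subgroups defined by its expression values. The record does not name the test. The sketch below assumes the standard two-sample log-rank test, with the p-value taken from a chi-square distribution with one degree of freedom. logrank_test is a hypothetical helper written without external dependencies.

```python
import math
from typing import List, Tuple


def logrank_test(time: List[float], event: List[int], group: List[int]) -> Tuple[float, float]:
    """Two-sample log-rank test (assumed choice of test).

    time:  follow-up time per patient
    event: 1 if the endpoint (e.g. relapse) was observed, 0 if censored
    group: 0 or 1, e.g. low vs. high expression subgroup of one gene
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    event_times = sorted({t for t, e in zip(time, event) if e})
    obs_minus_exp = 0.0  # observed minus expected events in group 1
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t.
        at_risk = [g for ti, g in zip(time, group) if ti >= t]
        n = len(at_risk)
        n1 = sum(at_risk)  # valid because groups are coded 0/1
        # Events at time t, overall and in group 1.
        d = sum(1 for ti, e in zip(time, event) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(time, event, group) if ti == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0.0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of chi-square(1): P(Z**2 > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Applied to one gene, group would hold the low/high labels from a two-group split of that gene's expression values. Time and event would come from the patients' follow-up data. Swapping the 0/1 coding of the groups leaves the statistic unchanged.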