Modelling p-value distributions to improve theme-driven survival analysis of cancer transcriptome datasets

BACKGROUND: Theme-driven cancer survival studies address whether the expression signature of genes related to a biological process can predict patient survival time. Although this should ideally be achieved by testing two separate null hypotheses, current methods treat both hypotheses as one. The fi...

Bibliographic Details
Main authors: Czwan, Esteban; Brors, Benedikt; Kipling, David
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2824674/
https://www.ncbi.nlm.nih.gov/pubmed/20064243
http://dx.doi.org/10.1186/1471-2105-11-19
_version_ 1782177715115261952
author Czwan, Esteban
Brors, Benedikt
Kipling, David
author_facet Czwan, Esteban
Brors, Benedikt
Kipling, David
author_sort Czwan, Esteban
collection PubMed
description BACKGROUND: Theme-driven cancer survival studies address whether the expression signature of genes related to a biological process can predict patient survival time. Although this should ideally be achieved by testing two separate null hypotheses, current methods treat both hypotheses as one. The first test should assess whether a geneset, independent of its composition, is associated with prognosis (frequently done with a survival test). The second test then verifies whether the theme of the geneset is relevant (usually done with an empirical test that compares the geneset of interest with random genesets). Current methods do not test this second null hypothesis because it has been assumed that the distribution of p-values for random genesets (when tested against the first null hypothesis) is uniform. Here we demonstrate that this assumption is generally incorrect and that, consequently, such methods may erroneously associate the biology of a particular geneset with cancer prognosis.
RESULTS: To assess the impact of non-uniform distributions for random genesets in such studies, an automated theme-driven method was developed. This method empirically approximates the p-value distribution of sets of unrelated genes using a permutation approach, and tests whether predefined sets of biologically related genes are associated with survival. A comparison with a published theme-driven approach revealed non-uniform distributions, suggesting that the original study has a significant problem with false positive rates. When applied to two public cancer datasets, our technique revealed novel ontological categories with prognostic power, including significant correlations of "fatty acid metabolism" with overall survival in breast cancer, and of "receptor mediated endocytosis", "brain development", "apical plasma membrane" and "MAPK signaling pathway" with overall survival in lung cancer.
CONCLUSIONS: Current methods of theme-driven survival studies assume uniformity of p-values for random genesets, which can lead to false conclusions. Our approach provides a method to correct for this pitfall, and provides a novel route to identifying higher-level biological themes and pathways with prognostic power in clinical microarray datasets.
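The two-level test described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `survival_test` callable, and the toy per-gene p-values are all hypothetical, and the toy test (minimum of per-gene p-values) merely stands in for a real survival test such as a log-rank or Cox test. The sketch draws random genesets of the same size as the geneset of interest, collects their first-level p-values, and reports the fraction that are at least as small as the observed one.

```python
import random
from typing import Callable, List, Sequence


def random_geneset_pvalues(
    genes: Sequence[str],
    size: int,
    survival_test: Callable[[List[str]], float],
    n_sets: int = 1000,
    seed: int = 0,
) -> List[float]:
    """First-level p-values for n_sets random genesets of the given size."""
    rng = random.Random(seed)
    pool = list(genes)
    return [survival_test(rng.sample(pool, size)) for _ in range(n_sets)]


def empirical_theme_pvalue(observed_p: float, random_ps: Sequence[float]) -> float:
    """Second-level p-value: share of random genesets whose first-level
    p-value is at least as small as the observed one (add-one correction)."""
    hits = sum(1 for p in random_ps if p <= observed_p)
    return (hits + 1) / (len(random_ps) + 1)


# Hypothetical toy data: 100 genes with made-up per-gene p-values.
TOY_GENE_P = {f"g{i}": (i + 1) / 100 for i in range(100)}


def toy_survival_test(geneset: List[str]) -> float:
    """Stand-in for a real survival test (assumption): smallest per-gene p."""
    return min(TOY_GENE_P[g] for g in geneset)


# Null distribution for genesets of size 10; it is skewed towards small
# values, so it is clearly not uniform on [0, 1].
null_ps = random_geneset_pvalues(
    list(TOY_GENE_P), size=10, survival_test=toy_survival_test, n_sets=500
)

# A geneset with first-level p = 0.05 is judged against that null
# distribution instead of against an assumed uniform one.
theme_p = empirical_theme_pvalue(0.05, null_ps)

# Small fixed example: 2 of 4 random p-values are <= 0.01, so (2 + 1) / (4 + 1).
print(empirical_theme_pvalue(0.01, [0.001, 0.02, 0.5, 0.005]))  # prints 0.6
```

In the fixed example, a first-level p-value of 0.01 would look significant if random genesets were assumed to give uniform p-values, yet the empirical second-level p-value is 0.6, which is the kind of false positive the abstract warns about.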
format Text
id pubmed-2824674
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2824674 2010-02-19 Modelling p-value distributions to improve theme-driven survival analysis of cancer transcriptome datasets Czwan, Esteban Brors, Benedikt Kipling, David BMC Bioinformatics Methodology article BioMed Central 2010-01-11 /pmc/articles/PMC2824674/ /pubmed/20064243 http://dx.doi.org/10.1186/1471-2105-11-19 Text en Copyright ©2010 Czwan et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology article
Czwan, Esteban
Brors, Benedikt
Kipling, David
Modelling p-value distributions to improve theme-driven survival analysis of cancer transcriptome datasets
title Modelling p-value distributions to improve theme-driven survival analysis of cancer transcriptome datasets
title_full Modelling p-value distributions to improve theme-driven survival analysis of cancer transcriptome datasets
title_fullStr Modelling p-value distributions to improve theme-driven survival analysis of cancer transcriptome datasets
title_full_unstemmed Modelling p-value distributions to improve theme-driven survival analysis of cancer transcriptome datasets
title_short Modelling p-value distributions to improve theme-driven survival analysis of cancer transcriptome datasets
title_sort modelling p-value distributions to improve theme-driven survival analysis of cancer transcriptome datasets
topic Methodology article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2824674/
https://www.ncbi.nlm.nih.gov/pubmed/20064243
http://dx.doi.org/10.1186/1471-2105-11-19
work_keys_str_mv AT czwanesteban modellingpvaluedistributionstoimprovethemedrivensurvivalanalysisofcancertranscriptomedatasets
AT brorsbenedikt modellingpvaluedistributionstoimprovethemedrivensurvivalanalysisofcancertranscriptomedatasets
AT kiplingdavid modellingpvaluedistributionstoimprovethemedrivensurvivalanalysisofcancertranscriptomedatasets