Modelling p-value distributions to improve theme-driven survival analysis of cancer transcriptome datasets

BACKGROUND: Theme-driven cancer survival studies address whether the expression signature of genes related to a biological process can predict patient survival time. Although this should ideally be achieved by testing two separate null hypotheses, current methods treat both hypotheses as one. The fi...

Bibliographic Details
Main authors:	Czwan, Esteban; Brors, Benedikt; Kipling, David
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Methodology article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2824674/ https://www.ncbi.nlm.nih.gov/pubmed/20064243 http://dx.doi.org/10.1186/1471-2105-11-19

_version_	1782177715115261952
author	Czwan, Esteban Brors, Benedikt Kipling, David
author_sort	Czwan, Esteban
collection	PubMed
description	BACKGROUND: Theme-driven cancer survival studies address whether the expression signature of genes related to a biological process can predict patient survival time. Although this should ideally be achieved by testing two separate null hypotheses, current methods treat both hypotheses as one. The first test should assess whether a geneset, independent of its composition, is associated with prognosis (frequently done with a survival test). The second test then verifies whether the theme of the geneset is relevant (usually done with an empirical test that compares the geneset of interest with random genesets). Current methods do not test this second null hypothesis because it has been assumed that the distribution of p-values for random genesets (when tested against the first null hypothesis) is uniform. Here we demonstrate that this assumption is generally incorrect and that, consequently, such methods may erroneously associate the biology of a particular geneset with cancer prognosis. RESULTS: To assess the impact of non-uniform distributions for random genesets in such studies, an automated theme-driven method was developed. This method empirically approximates the p-value distribution of sets of unrelated genes based on a permutation approach, and tests whether predefined sets of biologically related genes are associated with survival. A comparison with a published theme-driven approach revealed non-uniform distributions, suggesting that a significant problem with false positive rates exists in the original study. When applied to two public cancer datasets, our technique revealed novel ontological categories with prognostic power, including significant correlations of "fatty acid metabolism" with overall survival in breast cancer, and of "receptor mediated endocytosis", "brain development", "apical plasma membrane" and "MAPK signaling pathway" with overall survival in lung cancer.
CONCLUSIONS: Current methods of theme-driven survival studies assume uniformity of p-values for random genesets, which can lead to false conclusions. Our approach provides a method to correct for this pitfall and offers a novel route to identifying higher-level biological themes and pathways with prognostic power in clinical microarray datasets.
format	Text
id	pubmed-2824674
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2824674 2010-02-19 Modelling p-value distributions to improve theme-driven survival analysis of cancer transcriptome datasets Czwan, Esteban Brors, Benedikt Kipling, David BMC Bioinformatics Methodology article BioMed Central 2010-01-11 /pmc/articles/PMC2824674/ /pubmed/20064243 http://dx.doi.org/10.1186/1471-2105-11-19 Text en Copyright ©2010 Czwan et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Modelling p-value distributions to improve theme-driven survival analysis of cancer transcriptome datasets
topic	Methodology article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2824674/ https://www.ncbi.nlm.nih.gov/pubmed/20064243 http://dx.doi.org/10.1186/1471-2105-11-19
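
The second test described in the abstract (comparing a geneset of interest with random genesets) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `empirical_theme_pvalue`, the pluggable `survival_pvalue` callback, the toy scoring function, and all parameter values are assumptions made for illustration only. In a real analysis the callback would run an actual survival test (for example a log-rank or Cox test) on the expression data.

```python
import math
import random
from typing import Callable, Sequence


def empirical_theme_pvalue(
    geneset: Sequence[str],
    all_genes: Sequence[str],
    survival_pvalue: Callable[[Sequence[str]], float],
    n_random: int = 1000,
    seed: int = 0,
) -> float:
    """Empirical p-value for the theme of a geneset.

    The observed survival p-value of `geneset` is compared with the
    survival p-values of `n_random` random genesets of the same size,
    so no uniform null distribution is assumed.
    """
    rng = random.Random(seed)
    pool = list(all_genes)
    observed = survival_pvalue(geneset)

    # Approximate the null distribution of p-values for unrelated genes.
    null_pvalues = [
        survival_pvalue(rng.sample(pool, len(geneset)))
        for _ in range(n_random)
    ]

    # Count random genesets that look at least as prognostic as the
    # geneset of interest; the +1 terms avoid an empirical p-value of 0.
    as_extreme = sum(p <= observed for p in null_pvalues)
    return (1 + as_extreme) / (1 + n_random)


# --- Toy stand-in for a real survival test (purely hypothetical) ---
ALL_GENES = [f"g{i}" for i in range(100)]
TOY_SCORE = {gene: i / 100 for i, gene in enumerate(ALL_GENES)}


def toy_survival_pvalue(genes: Sequence[str]) -> float:
    """Pretend p-value: genesets with a higher mean score get smaller values."""
    mean_score = sum(TOY_SCORE[g] for g in genes) / len(genes)
    return math.exp(-5.0 * mean_score)


strong_set = ALL_GENES[-5:]  # the five highest-scoring toy genes
weak_set = ALL_GENES[:5]     # the five lowest-scoring toy genes

p_strong = empirical_theme_pvalue(strong_set, ALL_GENES, toy_survival_pvalue, n_random=200)
p_weak = empirical_theme_pvalue(weak_set, ALL_GENES, toy_survival_pvalue, n_random=200)
print(p_strong, p_weak)
```

Because the null distribution is built from random genesets drawn from the same dataset, the result reflects how unusual the geneset of interest is relative to unrelated genes, rather than relying on the assumption that random genesets yield uniformly distributed p-values.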
