
Comparative evaluation of gene-set analysis methods

BACKGROUND: Multiple data-analytic methods have been proposed for evaluating gene-expression levels in specific biological pathways, assessing differential expression associated with a binary phenotype. Following Goeman and Bühlmann's recent review, we compared statistical performance of three...


Bibliographic Details
Main authors:	Liu, Qi; Dinu, Irina; Adewale, Adeniyi J; Potter, John D; Yasui, Yutaka
Format:	Text
Language:	English
Published:	BioMed Central 2007
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2238724/ https://www.ncbi.nlm.nih.gov/pubmed/17988400 http://dx.doi.org/10.1186/1471-2105-8-431

collection	PubMed
description	BACKGROUND: Multiple data-analytic methods have been proposed for evaluating gene-expression levels in specific biological pathways, assessing differential expression associated with a binary phenotype. Following Goeman and Bühlmann's recent review, we compared the statistical performance of three methods, namely Global Test, ANCOVA Global Test, and SAM-GS, that test "self-contained null hypotheses" via subject sampling. The three methods were compared based on a simulation experiment and analyses of three real-world microarray datasets.
RESULTS: In the simulation experiment, we found that the use of the asymptotic distribution in the two Global Tests leads to a statistical test with an incorrect size. Specifically, p-values calculated by the scaled χ² distribution of Global Test and the asymptotic distribution of ANCOVA Global Test are too liberal, while the asymptotic distribution with a quadratic form of the Global Test results in p-values that are too conservative. The two Global Tests with permutation-based inference, however, gave a correct size. While the three methods showed similar power using permutation inference after a proper standardization of gene expression data, SAM-GS showed slightly higher power than the Global Tests. In the analysis of a real-world microarray dataset, the two Global Tests gave markedly different results, compared to SAM-GS, in identifying pathways whose gene expressions are associated with p53 mutation in cancer cell lines. A proper standardization of gene expression variances is necessary for the two Global Tests in order to produce biologically sensible results. After the standardization, the three methods gave very similar, biologically sensible results, with slightly higher statistical significance given by SAM-GS. The three methods gave similar patterns of results in the analysis of the other two microarray datasets.
CONCLUSION: An appropriate standardization makes the performance of all three methods similar, given the use of permutation-based inference. SAM-GS tends to have slightly higher power in the lower α-level region (i.e., gene sets that are of the greatest interest). Global Test and ANCOVA Global Test have the important advantage of being able to analyze continuous and survival phenotypes and to adjust for covariates. A free Microsoft Excel Add-In to perform SAM-GS is available.
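
The permutation-based inference mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation of Global Test, ANCOVA Global Test, or SAM-GS. It assumes a simple gene-set statistic (the sum over genes of squared standardized mean differences between the two phenotype groups) and obtains a p-value by shuffling the phenotype labels across subjects. The function names, the `s0` constant, and the defaults are hypothetical choices made for illustration only.

```python
import random
from statistics import mean, stdev


def gene_set_statistic(expr, labels, s0=0.0):
    """Sum over genes of squared standardized mean differences.

    expr:   list of genes, each a list of expression values (one per subject).
    labels: list of 0/1 phenotype labels, one per subject.
    s0:     constant added to each denominator (hypothetical, default 0).
    Assumes each phenotype group has at least two subjects.
    """
    total = 0.0
    for gene in expr:
        g0 = [x for x, y in zip(gene, labels) if y == 0]
        g1 = [x for x, y in zip(gene, labels) if y == 1]
        n0, n1 = len(g0), len(g1)
        # Pooled variance and standard error of the difference in means.
        pooled_var = ((n0 - 1) * stdev(g0) ** 2
                      + (n1 - 1) * stdev(g1) ** 2) / (n0 + n1 - 2)
        denom = (pooled_var * (1 / n0 + 1 / n1)) ** 0.5 + s0
        if denom == 0:
            continue  # a constant gene contributes nothing
        d = (mean(g1) - mean(g0)) / denom
        total += d * d
    return total


def permutation_p_value(expr, labels, n_perm=1000, seed=0):
    """Permutation p-value: shuffle subject labels, recompute the statistic."""
    rng = random.Random(seed)
    observed = gene_set_statistic(expr, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if gene_set_statistic(expr, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Because each gene's difference is divided by its own standard error, genes with large raw variances do not dominate the sum, which is the kind of per-gene standardization the abstract reports as necessary for sensible results. Shuffling labels across whole subjects keeps the correlation among genes in the set intact under the null.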
id	pubmed-2238724
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-2238724 2008-02-12 Comparative evaluation of gene-set analysis methods. Liu, Qi; Dinu, Irina; Adewale, Adeniyi J; Potter, John D; Yasui, Yutaka. BMC Bioinformatics. Methodology Article. BioMed Central 2007-11-07 /pmc/articles/PMC2238724/ /pubmed/17988400 http://dx.doi.org/10.1186/1471-2105-8-431 Text en Copyright © 2007 Liu et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
