Comparative study of gene set enrichment methods

BACKGROUND: The analysis of high-throughput gene expression data with respect to sets of genes rather than individual genes has many advantages. A variety of methods have been developed for assessing the enrichment of sets of genes with respect to differential expression. In this paper we provide a...

Bibliographic Details
Main authors: Abatangelo, Luca, Maglietta, Rosalia, Distaso, Angela, D'Addabbo, Annarita, Creanza, Teresa Maria, Mukherjee, Sayan, Ancona, Nicola
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2746222/
https://www.ncbi.nlm.nih.gov/pubmed/19725948
http://dx.doi.org/10.1186/1471-2105-10-275
author Abatangelo, Luca
Maglietta, Rosalia
Distaso, Angela
D'Addabbo, Annarita
Creanza, Teresa Maria
Mukherjee, Sayan
Ancona, Nicola
collection PubMed
description BACKGROUND: The analysis of high-throughput gene expression data with respect to sets of genes rather than individual genes has many advantages. A variety of methods have been developed for assessing the enrichment of sets of genes with respect to differential expression. In this paper we provide a comparative study of four of these methods: Fisher's exact test, Gene Set Enrichment Analysis (GSEA), Random-Sets (RS), and Gene List Analysis with Prediction Accuracy (GLAPA). The first three methods use associative statistics, while the fourth uses predictive statistics. We first compare all four methods on simulated data sets to verify that Fisher's exact test is markedly worse than the other three approaches. We then validate the other three methods on seven real data sets with known genetic perturbations and then compare the methods on two cancer data sets where our a priori knowledge is limited. RESULTS: The simulation study highlights that none of the three methods consistently outperforms the others. GSEA and RS are able to detect weak signals of deregulation, and they perform differently when genes in a gene set are both differentially up- and down-regulated. GLAPA is more conservative, and large differences between the two phenotypes are required for the method to detect differential deregulation in gene sets. This is because the enrichment statistic in GLAPA is prediction error, which is a stronger criterion than the classical two-sample statistics used in RS and GSEA. This was reflected in the analysis on real data sets, where GSEA and RS were significant for particular gene sets while GLAPA was not, suggesting a small effect size. We find that the ranking of gene set enrichment induced by GLAPA is more similar to that of RS than to that of GSEA. More importantly, the rankings of the three methods share significant overlap.
CONCLUSION: The three methods considered in our study recover relevant gene sets known to be deregulated in the experimental conditions and pathologies analyzed. There are differences between the three methods, and GSEA seems to be more consistent in finding enriched gene sets, although no method uniformly dominates over all data sets. Our analysis highlights the deep difference between associative and predictive methods for detecting enrichment and the value of using both to better interpret the results of pathway analysis. We close with suggestions for users of gene set methods.
format Text
id pubmed-2746222
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2746222 2009-09-18 Comparative study of gene set enrichment methods Abatangelo, Luca Maglietta, Rosalia Distaso, Angela D'Addabbo, Annarita Creanza, Teresa Maria Mukherjee, Sayan Ancona, Nicola BMC Bioinformatics Research Article BioMed Central 2009-09-02 /pmc/articles/PMC2746222/ /pubmed/19725948 http://dx.doi.org/10.1186/1471-2105-10-275 Text en Copyright © 2009 Abatangelo et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Comparative study of gene set enrichment methods
topic Research Article