Improving gene set analysis of microarray data by SAM-GS

Bibliographic Details
Main authors: Dinu, Irina, Potter, John D, Mueller, Thomas, Liu, Qi, Adewale, Adeniyi J, Jhangri, Gian S, Einecke, Gunilla, Famulski, Konrad S, Halloran, Philip, Yasui, Yutaka
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1931607/
https://www.ncbi.nlm.nih.gov/pubmed/17612399
http://dx.doi.org/10.1186/1471-2105-8-242
author Dinu, Irina
Potter, John D
Mueller, Thomas
Liu, Qi
Adewale, Adeniyi J
Jhangri, Gian S
Einecke, Gunilla
Famulski, Konrad S
Halloran, Philip
Yasui, Yutaka
collection PubMed
description BACKGROUND: Gene-set analysis evaluates the expression of biological pathways, or a priori defined gene sets, rather than that of individual genes, in association with a binary phenotype, and is of great biologic interest in many DNA microarray studies. Gene Set Enrichment Analysis (GSEA) has been applied widely as a tool for gene-set analyses. We describe here some critical problems with GSEA and propose an alternative method by extending the individual-gene analysis method, Significance Analysis of Microarray (SAM), to gene-set analyses (SAM-GS).
RESULTS: Using a mouse microarray dataset with simulated gene sets, we illustrate that GSEA gives statistical significance to gene sets that have no gene associated with the phenotype (null gene sets), and has very low power to detect gene sets in which half the genes are moderately or strongly associated with the phenotype (truly-associated gene sets). SAM-GS, on the other hand, performs very well. The two methods are also compared in the analyses of three real microarray datasets and relevant pathways, the diverging results of which clearly show advantages of SAM-GS over GSEA, both statistically and biologically. In a microarray study for identifying biological pathways whose gene expressions are associated with p53 mutation in cancer cell lines, we found biologically relevant performance differences between the two methods. Specifically, there are 31 additional pathways identified as significant by SAM-GS over GSEA, that are associated with the presence vs. absence of p53. Of the 31 gene sets, 11 actually involve p53 directly as a member. A further 6 gene sets directly involve the extrinsic and intrinsic apoptosis pathways, 3 involve the cell-cycle machinery, and 3 involve cytokines and/or JAK/STAT signaling. Each of these 12 gene sets, then, is in a direct, well-established relationship with aspects of p53 signaling. Of the remaining 8 gene sets, 6 have plausible, if less well established, links with p53.
CONCLUSION: We conclude that GSEA has important limitations as a gene-set analysis approach for microarray experiments for identifying biological pathways associated with a binary phenotype. As an alternative statistically-sound method, we propose SAM-GS. A free Excel Add-In for performing SAM-GS is available for public use.
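The abstract describes SAM-GS only as an extension of the per-gene SAM statistic to gene sets. A minimal sketch of that idea follows, assuming the set-level statistic is the sum of squared SAM-style d-statistics over the genes in the set, with a p-value obtained by permuting the phenotype labels. The function names, the fixed fudge constant `s0`, and the `(hits + 1) / (n_perm + 1)` p-value convention are illustrative assumptions, not details taken from this record or from the authors' Excel Add-In.

```python
import random
from statistics import mean, variance


def sam_d(x, y, s0=0.1):
    """SAM-style d-statistic for one gene: mean difference / (pooled SE + s0).

    s0 is a small positive constant; it is fixed here for illustration only.
    """
    n1, n2 = len(x), len(y)
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = (pooled * (1 / n1 + 1 / n2)) ** 0.5
    return (mean(x) - mean(y)) / (se + s0)


def set_statistic(expr, labels, gene_set, s0=0.1):
    """Sum of squared d-statistics over the genes in gene_set.

    expr maps gene name -> list of expression values (one per sample);
    labels is a list of 0/1 phenotype labels in the same sample order.
    """
    total = 0.0
    for gene in gene_set:
        group1 = [v for v, lab in zip(expr[gene], labels) if lab == 1]
        group0 = [v for v, lab in zip(expr[gene], labels) if lab == 0]
        total += sam_d(group1, group0, s0) ** 2
    return total


def permutation_p_value(expr, labels, gene_set, n_perm=1000, s0=0.1, seed=0):
    """Permutation p-value: share of label shuffles with a statistic >= observed."""
    rng = random.Random(seed)
    observed = set_statistic(expr, labels, gene_set, s0)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # shuffling keeps the two group sizes unchanged
        if set_statistic(expr, shuffled, gene_set, s0) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because the per-gene statistics are squared before summing, up-regulated and down-regulated genes in the same set both add to the score, and permuting sample labels (rather than genes) preserves the correlation among genes within the set.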
format Text
id pubmed-1931607
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1931607 2007-07-25 Improving gene set analysis of microarray data by SAM-GS Dinu, Irina Potter, John D Mueller, Thomas Liu, Qi Adewale, Adeniyi J Jhangri, Gian S Einecke, Gunilla Famulski, Konrad S Halloran, Philip Yasui, Yutaka BMC Bioinformatics Methodology Article BioMed Central 2007-07-05 /pmc/articles/PMC1931607/ /pubmed/17612399 http://dx.doi.org/10.1186/1471-2105-8-242 Text en Copyright © 2007 Dinu et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Improving gene set analysis of microarray data by SAM-GS
topic Methodology Article