Appearance frequency modulated gene set enrichment testing

Bibliographic Details
Main authors: Ma, Jun; Sartor, Maureen A; Jagadish, HV
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3213687/
https://www.ncbi.nlm.nih.gov/pubmed/21418606
http://dx.doi.org/10.1186/1471-2105-12-81
_version_ 1782216171653693440
author Ma, Jun
Sartor, Maureen A
Jagadish, HV
collection PubMed
description BACKGROUND: Gene set enrichment testing has helped bridge the gap from an individual gene to a systems biology interpretation of microarray data. Although gene sets are defined a priori based on biological knowledge, current methods for gene set enrichment testing treat all genes equally. It is well known that some genes, such as those responsible for housekeeping functions, appear in many pathways, whereas other genes are more specialized and play a unique role in a single pathway. Drawing inspiration from the field of information retrieval, we have developed and present here an approach to incorporate gene appearance frequency (in KEGG pathways) into two current methods, Gene Set Enrichment Analysis (GSEA) and the logistic regression-based LRpath framework, to generate more reproducible and biologically meaningful results. RESULTS: Two breast cancer microarray datasets were analyzed to identify gene sets differentially expressed between histological grade 1 and grade 3 breast cancer. The correlation of Normalized Enrichment Scores (NES) between gene sets, generated by the original GSEA and by GSEA with the appearance frequency of genes incorporated (GSEA-AF), was compared. GSEA-AF resulted in higher correlation between experiments and more overlapping top gene sets. Several cancer-related gene sets achieved higher NES in GSEA-AF as well. The same datasets were also analyzed by LRpath and by LRpath with the appearance frequency of genes incorporated (LRpath-AF). Two well-studied lung cancer datasets were also analyzed in the same manner to demonstrate the validity of the method, and similar results were obtained. CONCLUSIONS: We introduce an alternative way to integrate KEGG PATHWAY information into gene set enrichment testing. The performance of GSEA and LRpath can be enhanced by integrating the appearance frequency of genes. We conclude that, generally, gene set analysis methods that integrate information from KEGG PATHWAY perform better both statistically and biologically.
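The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea: count how many gene sets each gene appears in, then down-weight widely shared genes. The inverse-document-frequency-style weight log(N / af), the function names, and the toy pathway and gene names are all assumptions made for illustration, not the authors' GSEA-AF or LRpath-AF implementation.

```python
import math
from collections import Counter


def appearance_frequency(gene_sets):
    """Count, for each gene, the number of gene sets it appears in."""
    counts = Counter()
    for genes in gene_sets.values():
        # set() ensures a gene listed twice in one set is counted once
        counts.update(set(genes))
    return counts


def idf_style_weights(gene_sets):
    """Hypothetical weight log(N / af): rarer genes get larger weights.

    This formula is an assumption borrowed from information retrieval,
    not the weighting defined in the paper.
    """
    n_sets = len(gene_sets)
    af = appearance_frequency(gene_sets)
    return {gene: math.log(n_sets / count) for gene, count in af.items()}


# Toy gene sets with invented names (not real KEGG pathways)
toy_sets = {
    "pathway_A": ["GENE1", "GENE2", "GENE3"],
    "pathway_B": ["GENE1", "GENE4"],
    "pathway_C": ["GENE1", "GENE2"],
}

weights = idf_style_weights(toy_sets)
for gene in sorted(weights):
    print(gene, round(weights[gene], 3))
```

Under this assumed weighting, a gene present in every set (GENE1 here) receives weight zero, while a gene unique to one set receives the largest weight. A real analysis would need the weighting actually defined in the paper.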
format Online
Article
Text
id pubmed-3213687
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3213687 2011-11-12 Appearance frequency modulated gene set enrichment testing Ma, Jun Sartor, Maureen A Jagadish, HV BMC Bioinformatics Research Article BioMed Central 2011-03-20 /pmc/articles/PMC3213687/ /pubmed/21418606 http://dx.doi.org/10.1186/1471-2105-12-81 Text en Copyright ©2011 Ma et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Appearance frequency modulated gene set enrichment testing
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3213687/
https://www.ncbi.nlm.nih.gov/pubmed/21418606
http://dx.doi.org/10.1186/1471-2105-12-81