Gene set enrichment analysis for non-monotone association and multiple experimental categories

BACKGROUND: Recently, microarray data analyses using functional pathway information, e.g., gene set enrichment analysis (GSEA) and significance analysis of function and expression (SAFE), have gained recognition as a way to identify biological pathways/processes associated with a phenotypic endpoint...

Bibliographic Details
Main Authors: Lin, Rongheng; Dai, Shuangshuang; Irwin, Richard D; Heinloth, Alexandra N; Boorman, Gary A; Li, Leping
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2636811/
https://www.ncbi.nlm.nih.gov/pubmed/19014579
http://dx.doi.org/10.1186/1471-2105-9-481
author Lin, Rongheng
Dai, Shuangshuang
Irwin, Richard D
Heinloth, Alexandra N
Boorman, Gary A
Li, Leping
collection PubMed
description BACKGROUND: Recently, microarray data analyses using functional pathway information, e.g., gene set enrichment analysis (GSEA) and significance analysis of function and expression (SAFE), have gained recognition as a way to identify biological pathways/processes associated with a phenotypic endpoint. In these analyses, a local statistic is used to assess the association between the expression level of a gene and the value of a phenotypic endpoint. Then these gene-specific local statistics are combined to evaluate association for pre-selected sets of genes. Commonly used local statistics include t-statistics for binary phenotypes and correlation coefficients that assume a linear or monotone relationship between a continuous phenotype and gene expression level. Methods applicable to continuous non-monotone relationships are needed. Furthermore, for multiple experimental categories, methods that combine multiple GSEA/SAFE analyses are needed. RESULTS: For continuous or ordinal phenotypic outcome, we propose to use as the local statistic the coefficient of multiple determination (i.e., the square of the multiple correlation coefficient) R² from fitting natural cubic spline models to the phenotype-expression relationship. Next, we incorporate this association measure into the GSEA/SAFE framework to identify significant gene sets. Unsigned local statistics, signed global statistics and one-sided p-values are used to reflect our inferential interest. Furthermore, we describe a procedure for inference across multiple GSEA/SAFE analyses. We illustrate our approach using gene expression and liver injury data from liver and blood samples from rats treated with eight hepatotoxicants under multiple time and dose combinations. We set out to identify biological pathways/processes associated with liver injury as manifested by increased blood levels of alanine transaminase in common for most of the eight compounds. Potential statistical dependency resulting from the experimental design is addressed in permutation-based hypothesis testing. CONCLUSION: The proposed framework captures both linear and non-linear association between gene expression level and a phenotypic endpoint and thus can be viewed as extending the current GSEA/SAFE methodology. The framework for combining results from multiple GSEA/SAFE analyses is flexible to address practical inference interests. Our methods can be applied to microarray data with continuous phenotypes with multi-level design or the meta-analysis of multiple microarray data sets.
format Text
id pubmed-2636811
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2636811 2009-02-06 Gene set enrichment analysis for non-monotone association and multiple experimental categories Lin, Rongheng Dai, Shuangshuang Irwin, Richard D Heinloth, Alexandra N Boorman, Gary A Li, Leping BMC Bioinformatics Methodology Article BioMed Central 2008-11-14 /pmc/articles/PMC2636811/ /pubmed/19014579 http://dx.doi.org/10.1186/1471-2105-9-481 Text en Copyright © 2008 Lin et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Gene set enrichment analysis for non-monotone association and multiple experimental categories
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2636811/
https://www.ncbi.nlm.nih.gov/pubmed/19014579
http://dx.doi.org/10.1186/1471-2105-9-481