Spectral gene set enrichment (SGSE)

Bibliographic Details
Main authors: Frost, H Robert, Li, Zhigang, Moore, Jason H
Format: Online, Article, Text
Language: English
Published: BioMed Central 2015
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4365810/
https://www.ncbi.nlm.nih.gov/pubmed/25879888
http://dx.doi.org/10.1186/s12859-015-0490-7
Collection: PubMed
Description:
BACKGROUND: Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters.

RESULTS: We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data.

CONCLUSIONS: Unsupervised gene set testing can provide important information about the biological signal held in high-dimensional genomic data sets. Because it uses the association between gene sets and sample PCs to generate a measure of unsupervised enrichment, the SGSE method is independent of cluster or network creation algorithms and, most importantly, is able to utilize the statistical significance of PC eigenvalues to ignore elements of the data most likely to represent noise.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0490-7) contains supplementary material, which is available to authorized users.
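The abstract's two-stage pipeline (PC-level gene set p-values, then a weighted Z-method combination across PCs) can be illustrated with a short sketch. Everything below is an assumption for illustration, not the authors' PCGSE/SGSE code: the gene-level statistic (|Fisher z| of the gene-PC correlation) and the two-sample z-test are simplified stand-ins for PCGSE's gene set test; the rule w_k = variance_k * (1 - TW p-value_k) is only one possible reading of "PC variance scaled by Tracy-Widom test p-values"; and the Tracy-Widom p-values are hypothetical placeholders rather than computed. The combination step uses the standard weighted Z-method, Z = sum(w_k z_k) / sqrt(sum(w_k^2)) with z_k = Phi^-1(1 - p_k). The helper names `pc_gene_set_pvalues`, `tw_scaled_weights` and `weighted_z` are hypothetical.

```python
# Minimal SGSE-style sketch (simplified; NOT the authors' PCGSE/SGSE implementation).
# Assumptions: a two-sample z-test on |Fisher-z correlations| stands in for PCGSE's
# gene set test, and Tracy-Widom p-values are supplied as (hypothetical) inputs.
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()


def pc_gene_set_pvalues(data, gene_sets, n_pcs):
    """Return (p-values of shape [n_sets, n_pcs], variances of the first n_pcs PCs).

    data is a samples x genes matrix; each gene set is a sequence of column indices.
    """
    n_samples, n_genes = data.shape
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_pcs] * s[:n_pcs]              # sample PC scores
    variances = s[:n_pcs] ** 2 / (n_samples - 1)   # PC variances (eigenvalues)
    gene_sd = centered.std(axis=0, ddof=1)

    pvals = np.empty((len(gene_sets), n_pcs))
    for k in range(n_pcs):
        # Gene-level statistic: |Fisher z| of the gene-PC Pearson correlation.
        cov = centered.T @ scores[:, k] / (n_samples - 1)
        r = cov / (gene_sd * scores[:, k].std(ddof=1))
        stat = np.abs(np.arctanh(np.clip(r, -0.999999, 0.999999)))
        for g, idx in enumerate(gene_sets):
            inside = np.zeros(n_genes, dtype=bool)
            inside[np.asarray(idx)] = True
            a, b = stat[inside], stat[~inside]
            se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
            z = (a.mean() - b.mean()) / se
            # One-sided competitive p-value: set genes more PC-associated than the rest.
            pvals[g, k] = _STD_NORMAL.cdf(-z)
    return pvals, variances


def tw_scaled_weights(variances, tw_pvalues):
    """Hypothetical weighting rule: w_k = variance_k * (1 - TW p-value_k)."""
    return [float(v) * (1.0 - float(p)) for v, p in zip(variances, tw_pvalues)]


def weighted_z(pvalues, weights):
    """Weighted Z-method: Z = sum(w_k z_k) / sqrt(sum(w_k^2)), z_k = Phi^-1(1 - p_k)."""
    num = 0.0
    den = 0.0
    for p, w in zip(pvalues, weights):
        p = min(max(float(p), 1e-300), 1.0 - 1e-15)  # keep inv_cdf inside (0, 1)
        num += w * -_STD_NORMAL.inv_cdf(p)           # -Phi^-1(p) == Phi^-1(1 - p)
        den += w * w
    if den == 0.0:
        raise ValueError("at least one weight must be non-zero")
    return _STD_NORMAL.cdf(-num / den ** 0.5)


# Usage on simulated data: genes 0-19 share one latent factor; genes 100-119 do not.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 200))
data[:, :20] += 3.0 * rng.normal(size=50)[:, None]
gene_sets = [list(range(20)), list(range(100, 120))]

pvals, variances = pc_gene_set_pvalues(data, gene_sets, n_pcs=5)
tw_pvalues = [0.001, 0.4, 0.7, 0.9, 0.95]   # hypothetical placeholder values
weights = tw_scaled_weights(variances, tw_pvalues)
combined = [weighted_z(row, weights) for row in pvals]
print(combined)
```

In this sketch a PC whose weight is zero (for example, a PC with a Tracy-Widom p-value of 1 under the assumed rule) drops out of the combination entirely, which mirrors the abstract's point that eigenvalue significance lets the method ignore noise-dominated PCs. Very small p-values may round to 0.0 because `NormalDist.cdf` works in ordinary double precision.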
ID: pubmed-4365810
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics, Methodology Article (BioMed Central, 2015-03-03)
Record: pubmed-4365810, 2015-03-20
Rights: © Frost et al.; licensee BioMed Central. 2015. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.