Spectral gene set enrichment (SGSE)
BACKGROUND: Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist...
Main authors: | Frost, H Robert; Li, Zhigang; Moore, Jason H |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4365810/ https://www.ncbi.nlm.nih.gov/pubmed/25879888 http://dx.doi.org/10.1186/s12859-015-0490-7 |
_version_ | 1782362283983241216 |
---|---|
author | Frost, H Robert Li, Zhigang Moore, Jason H |
author_facet | Frost, H Robert Li, Zhigang Moore, Jason H |
author_sort | Frost, H Robert |
collection | PubMed |
description | BACKGROUND: Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. RESULTS: We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. CONCLUSIONS: Unsupervised gene set testing can provide important information about the biological signal held in high-dimensional genomic data sets. Because it uses the association between gene sets and samples PCs to generate a measure of unsupervised enrichment, the SGSE method is independent of cluster or network creation algorithms and, most importantly, is able to utilize the statistical significance of PC eigenvalues to ignore elements of the data most likely to represent noise. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0490-7) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4365810 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-43658102015-03-20 Spectral gene set enrichment (SGSE) Frost, H Robert Li, Zhigang Moore, Jason H BMC Bioinformatics Methodology Article BACKGROUND: Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. RESULTS: We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. CONCLUSIONS: Unsupervised gene set testing can provide important information about the biological signal held in high-dimensional genomic data sets. Because it uses the association between gene sets and samples PCs to generate a measure of unsupervised enrichment, the SGSE method is independent of cluster or network creation algorithms and, most importantly, is able to utilize the statistical significance of PC eigenvalues to ignore elements of the data most likely to represent noise. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0490-7) contains supplementary material, which is available to authorized users. BioMed Central 2015-03-03 /pmc/articles/PMC4365810/ /pubmed/25879888 http://dx.doi.org/10.1186/s12859-015-0490-7 Text en © Frost et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Frost, H Robert Li, Zhigang Moore, Jason H Spectral gene set enrichment (SGSE) |
title | Spectral gene set enrichment (SGSE) |
title_full | Spectral gene set enrichment (SGSE) |
title_fullStr | Spectral gene set enrichment (SGSE) |
title_full_unstemmed | Spectral gene set enrichment (SGSE) |
title_short | Spectral gene set enrichment (SGSE) |
title_sort | spectral gene set enrichment (sgse) |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4365810/ https://www.ncbi.nlm.nih.gov/pubmed/25879888 http://dx.doi.org/10.1186/s12859-015-0490-7 |
work_keys_str_mv | AT frosthrobert spectralgenesetenrichmentsgse AT lizhigang spectralgenesetenrichmentsgse AT moorejasonh spectralgenesetenrichmentsgse |
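
The description field above says SGSE combines PC-level p-values "using the weighted Z-method with weights set to the PC variance scaled by Tracy-Widom test p-values". The following is a minimal Python sketch of that combination step only, not the authors' implementation. Several things here are assumptions not stated in the record: the function names `weighted_z_combine` and `sgse_style_weights` are hypothetical, the PC-level p-values and Tracy-Widom p-values are taken as given inputs rather than computed, the scaling is assumed to be `variance * (1 - p_TW)` so that PCs with significant eigenvalues receive more weight, and an all-zero weight vector is assumed to yield a combined p-value of 1.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def weighted_z_combine(p_values, weights):
    """Combine one-sided p-values with the weighted Z-method.

    Z = sum(w_i * z_i) / sqrt(sum(w_i^2)), with z_i = Phi^{-1}(1 - p_i);
    the combined p-value is 1 - Phi(Z).
    """
    if len(p_values) != len(weights) or not p_values:
        raise ValueError("p_values and weights must be non-empty and equal in length")
    eps = 1e-15  # keep p strictly inside (0, 1) so inv_cdf is defined
    z_scores = [
        _STD_NORMAL.inv_cdf(1.0 - min(max(p, eps), 1.0 - eps)) for p in p_values
    ]
    denom = sqrt(sum(w * w for w in weights))
    if denom == 0.0:
        # Assumption: with no informative PC, report no evidence of enrichment.
        return 1.0
    z = sum(w * zi for w, zi in zip(weights, z_scores)) / denom
    return 1.0 - _STD_NORMAL.cdf(z)


def sgse_style_weights(pc_variances, tw_p_values):
    """Assumed weighting: PC variance scaled down by its Tracy-Widom p-value."""
    return [v * (1.0 - p) for v, p in zip(pc_variances, tw_p_values)]


# Hypothetical inputs for one gene set across three PCs.
pc_level_p = [0.001, 0.20, 0.70]   # PC-level enrichment p-values (e.g. from PCGSE)
pc_variance = [5.0, 2.0, 0.5]      # eigenvalues of the sample covariance
tw_p = [0.0001, 0.03, 0.90]        # Tracy-Widom p-values for those eigenvalues

weights = sgse_style_weights(pc_variance, tw_p)
combined_p = weighted_z_combine(pc_level_p, weights)
print(f"combined p-value: {combined_p:.4g}")
```

Under this weighting, a PC whose eigenvalue is not significant by the Tracy-Widom test contributes almost nothing to the combined statistic, which matches the record's statement that the method can "ignore elements of the data most likely to represent noise".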