Cargando…

Moment based gene set tests

BACKGROUND: Permutation-based gene set tests are standard approaches for testing relationships between collections of related genes and an outcome of interest in high throughput expression analyses. Using M random permutations, one can attain p-values as small as 1/(M+1). When many gene sets are tes...

Descripción completa

Detalles Bibliográficos
Autores principales: Larson, Jessica L, Owen, Art B
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2015
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4419444/
https://www.ncbi.nlm.nih.gov/pubmed/25928861
http://dx.doi.org/10.1186/s12859-015-0571-7
_version_ 1782369574632554496
author Larson, Jessica L
Owen, Art B
author_facet Larson, Jessica L
Owen, Art B
author_sort Larson, Jessica L
collection PubMed
description BACKGROUND: Permutation-based gene set tests are standard approaches for testing relationships between collections of related genes and an outcome of interest in high throughput expression analyses. Using M random permutations, one can attain p-values as small as 1/(M+1). When many gene sets are tested, we need smaller p-values, hence larger M, to achieve significance while accounting for the number of simultaneous tests being made. As a result, the number of permutations to be done rises along with the cost per permutation. To reduce this cost, we seek parametric approximations to the permutation distributions for gene set tests. RESULTS: We study two gene set methods based on sums and sums of squared correlations. The statistics we study are among the best performers in the extensive simulation of 261 gene set methods by Ackermann and Strimmer in 2009. Our approach calculates exact relevant moments of these statistics and uses them to fit parametric distributions. The computational cost of our algorithm for the linear case is on the order of doing |G| permutations, where |G| is the number of genes in set G. For the quadratic statistics, the cost is on the order of |G|(2) permutations which can still be orders of magnitude faster than plain permutation sampling. We applied the permutation approximation method to three public Parkinson’s Disease expression datasets and discovered enriched gene sets not previously discussed. We found that the moment-based gene set enrichment p-values closely approximate the permutation method p-values at a tiny fraction of their cost. They also gave nearly identical rankings to the gene sets being compared. CONCLUSIONS: We have developed a moment based approximation to linear and quadratic gene set test statistics’ permutation distribution. This allows approximate testing to be done orders of magnitude faster than one could do by sampling permutations. We have implemented our method as a publicly available Bioconductor package, npGSEA (www.bioconductor.org). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0571-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4419444
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-44194442015-05-06 Moment based gene set tests Larson, Jessica L Owen, Art B BMC Bioinformatics Methodology Article BACKGROUND: Permutation-based gene set tests are standard approaches for testing relationships between collections of related genes and an outcome of interest in high throughput expression analyses. Using M random permutations, one can attain p-values as small as 1/(M+1). When many gene sets are tested, we need smaller p-values, hence larger M, to achieve significance while accounting for the number of simultaneous tests being made. As a result, the number of permutations to be done rises along with the cost per permutation. To reduce this cost, we seek parametric approximations to the permutation distributions for gene set tests. RESULTS: We study two gene set methods based on sums and sums of squared correlations. The statistics we study are among the best performers in the extensive simulation of 261 gene set methods by Ackermann and Strimmer in 2009. Our approach calculates exact relevant moments of these statistics and uses them to fit parametric distributions. The computational cost of our algorithm for the linear case is on the order of doing |G| permutations, where |G| is the number of genes in set G. For the quadratic statistics, the cost is on the order of |G|(2) permutations which can still be orders of magnitude faster than plain permutation sampling. We applied the permutation approximation method to three public Parkinson’s Disease expression datasets and discovered enriched gene sets not previously discussed. We found that the moment-based gene set enrichment p-values closely approximate the permutation method p-values at a tiny fraction of their cost. They also gave nearly identical rankings to the gene sets being compared. CONCLUSIONS: We have developed a moment based approximation to linear and quadratic gene set test statistics’ permutation distribution. This allows approximate testing to be done orders of magnitude faster than one could do by sampling permutations. We have implemented our method as a publicly available Bioconductor package, npGSEA (www.bioconductor.org). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0571-7) contains supplementary material, which is available to authorized users. BioMed Central 2015-04-28 /pmc/articles/PMC4419444/ /pubmed/25928861 http://dx.doi.org/10.1186/s12859-015-0571-7 Text en © Larson and Owen; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Larson, Jessica L
Owen, Art B
Moment based gene set tests
title Moment based gene set tests
title_full Moment based gene set tests
title_fullStr Moment based gene set tests
title_full_unstemmed Moment based gene set tests
title_short Moment based gene set tests
title_sort moment based gene set tests
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4419444/
https://www.ncbi.nlm.nih.gov/pubmed/25928861
http://dx.doi.org/10.1186/s12859-015-0571-7
work_keys_str_mv AT larsonjessical momentbasedgenesettests
AT owenartb momentbasedgenesettests