
Improving the power of gene set enrichment analyses


Bibliographic Details
Main Authors: Roder, Joanna, Linstid, Benjamin, Oliveira, Carlos
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects: Methodology Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6525372/
https://www.ncbi.nlm.nih.gov/pubmed/31101008
http://dx.doi.org/10.1186/s12859-019-2850-1
author Roder, Joanna
Linstid, Benjamin
Oliveira, Carlos
collection PubMed
description BACKGROUND: Set enrichment methods are commonly used to analyze high-dimensional molecular data and gain biological insight into molecular or clinical phenotypes. One important category of analysis methods employs an enrichment score, which is created from ranked univariate correlations between phenotype and each molecular attribute. Estimates of the significance of the associations are determined via a null distribution generated from phenotype permutation. We investigate some statistical properties of this method and demonstrate how alternative assessments of enrichment can be used to increase the statistical power of such analyses to detect associations between phenotype and biological processes and pathways.
RESULTS: For this category of set enrichment analysis, the null distribution is largely independent of the number of samples with available molecular data. Hence, provided the sample cohort is not too small, we show that increased statistical power to identify associations between biological processes and phenotype can be achieved by splitting the cohort into two halves and using the average of the enrichment scores evaluated for each half as an alternative test statistic. Further, we demonstrate that this principle can be extended by averaging over multiple random splits of the cohort into halves. This enables the calculation of an enrichment statistic and associated p value of arbitrary precision, independent of the exact random splits used.
CONCLUSIONS: It is possible to increase the statistical power of gene set enrichment analyses that employ enrichment scores created from running sums of univariate phenotype-attribute correlations and phenotype-permutation-generated null distributions. This increase can be achieved by using alternative test statistics that average enrichment scores calculated for splits of the dataset. Apart from the special case of a close balance between up- and down-regulated genes within a gene set, statistical power can be improved, or at least maintained, by this method down to small sample sizes, where accurate assessment of univariate phenotype-gene correlations becomes unfeasible.
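The split-and-average idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Pearson correlation as the univariate statistic, an unweighted running-sum enrichment score, and a two-sided permutation p value. All function names and parameters (`correlations`, `enrichment_score`, `split_average_es`, `permutation_p_value`, `n_splits`, `n_perm`) are hypothetical, and the gene set is assumed to be neither empty nor the full attribute list.

```python
import numpy as np


def correlations(X, y):
    """Pearson correlation of each attribute (column of X) with phenotype y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant column or phenotype: correlation 0
    return (Xc.T @ yc) / denom


def enrichment_score(r, in_set):
    """Signed maximum deviation of an unweighted running sum over the ranking.

    Assumption: unweighted steps (+1/n_in for set members, -1/n_out otherwise).
    """
    hits = in_set[np.argsort(-r)]  # membership flags, highest correlation first
    n_in = hits.sum()
    n_out = hits.size - n_in
    running = np.cumsum(np.where(hits, 1.0 / n_in, -1.0 / n_out))
    return running[np.argmax(np.abs(running))]


def split_average_es(X, y, in_set, n_splits, rng):
    """Average the enrichment scores of both halves over random half-splits."""
    n = len(y)
    scores = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        a, b = perm[: n // 2], perm[n // 2:]
        es_a = enrichment_score(correlations(X[a], y[a]), in_set)
        es_b = enrichment_score(correlations(X[b], y[b]), in_set)
        scores.append(0.5 * (es_a + es_b))
    return float(np.mean(scores))


def permutation_p_value(X, y, in_set, n_splits=5, n_perm=200, seed=0):
    """Two-sided p value from a phenotype-permutation null distribution."""
    rng = np.random.default_rng(seed)
    observed = split_average_es(X, y, in_set, n_splits, rng)
    null = np.array([
        split_average_es(X, rng.permutation(y), in_set, n_splits, rng)
        for _ in range(n_perm)
    ])
    # Add-one correction keeps the estimated p value strictly positive.
    p = (1 + np.sum(np.abs(null) >= abs(observed))) / (1 + n_perm)
    return observed, float(p)
```

Here `X` is a samples-by-attributes array, `y` the phenotype vector, and `in_set` a boolean mask marking the attributes in the gene set. Increasing `n_splits` reduces the dependence of the statistic on any particular random split, at a proportional cost in computation.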
format Online
Article
Text
id pubmed-6525372
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6525372 2019-05-24 Improving the power of gene set enrichment analyses Roder, Joanna Linstid, Benjamin Oliveira, Carlos BMC Bioinformatics Methodology Article BioMed Central 2019-05-17 /pmc/articles/PMC6525372/ /pubmed/31101008 http://dx.doi.org/10.1186/s12859-019-2850-1 Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Improving the power of gene set enrichment analyses
topic Methodology Article