Ranking metrics in gene set enrichment analysis: do they matter?
BACKGROUND: There exist many methods for describing the complex relation between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An...
Main authors: | Zyla, Joanna; Marczyk, Michal; Weiner, January; Polanska, Joanna |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5427619/ https://www.ncbi.nlm.nih.gov/pubmed/28499413 http://dx.doi.org/10.1186/s12859-017-1674-0 |
_version_ | 1783235668105756672 |
---|---|
author | Zyla, Joanna Marczyk, Michal Weiner, January Polanska, Joanna |
collection | PubMed |
description | BACKGROUND: There exist many methods for describing the complex relation between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An important parameter, which could affect the final result, is the choice of a metric for the ranking of genes. Applying a default ranking metric may lead to poor results. METHODS AND RESULTS: In this work 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using k-means clustering algorithm a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate and computational load was established i.e. absolute value of Moderated Welch Test statistic, Minimum Significant Difference, absolute value of Signal-To-Noise ratio and Baumgartner-Weiss-Schindler test statistic. In case of false positive rate estimation, all selected ranking metrics were robust with respect to sample size. In case of sensitivity, the absolute value of Moderated Welch Test statistic and absolute value of Signal-To-Noise ratio gave stable results, while Baumgartner-Weiss-Schindler and Minimum Significant Difference showed better results for larger sample size. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised and implemented in MATLAB, and is available at https://github.com/ZAEDPolSl/MrGSEA. CONCLUSIONS: Choosing a ranking metric in Gene Set Enrichment Analysis has critical impact on results of pathway enrichment analysis. The absolute value of Moderated Welch Test has the best overall sensitivity and Minimum Significant Difference has the best overall specificity of gene set analysis. When the number of non-normally distributed genes is high, using Baumgartner-Weiss-Schindler test statistic gives better outcomes. Also, it finds more enriched pathways than other tested metrics, which may induce new biological discoveries. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1674-0) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5427619 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5427619 2017-05-15 Ranking metrics in gene set enrichment analysis: do they matter? Zyla, Joanna Marczyk, Michal Weiner, January Polanska, Joanna BMC Bioinformatics Research Article BACKGROUND: There exist many methods for describing the complex relation between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An important parameter, which could affect the final result, is the choice of a metric for the ranking of genes. Applying a default ranking metric may lead to poor results. METHODS AND RESULTS: In this work 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using k-means clustering algorithm a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate and computational load was established i.e. absolute value of Moderated Welch Test statistic, Minimum Significant Difference, absolute value of Signal-To-Noise ratio and Baumgartner-Weiss-Schindler test statistic. In case of false positive rate estimation, all selected ranking metrics were robust with respect to sample size. In case of sensitivity, the absolute value of Moderated Welch Test statistic and absolute value of Signal-To-Noise ratio gave stable results, while Baumgartner-Weiss-Schindler and Minimum Significant Difference showed better results for larger sample size. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised and implemented in MATLAB, and is available at https://github.com/ZAEDPolSl/MrGSEA. CONCLUSIONS: Choosing a ranking metric in Gene Set Enrichment Analysis has critical impact on results of pathway enrichment analysis. The absolute value of Moderated Welch Test has the best overall sensitivity and Minimum Significant Difference has the best overall specificity of gene set analysis. When the number of non-normally distributed genes is high, using Baumgartner-Weiss-Schindler test statistic gives better outcomes. Also, it finds more enriched pathways than other tested metrics, which may induce new biological discoveries. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1674-0) contains supplementary material, which is available to authorized users. BioMed Central 2017-05-12 /pmc/articles/PMC5427619/ /pubmed/28499413 http://dx.doi.org/10.1186/s12859-017-1674-0 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Ranking metrics in gene set enrichment analysis: do they matter? |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5427619/ https://www.ncbi.nlm.nih.gov/pubmed/28499413 http://dx.doi.org/10.1186/s12859-017-1674-0 |
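
The description above names the absolute value of the Signal-To-Noise ratio as one of the four best-performing ranking metrics but does not give its formula. The following is a minimal sketch of how such a ranking could be computed, assuming the commonly used definition |mean_A - mean_B| / (sd_A + sd_B) and omitting any variance adjustments that specific GSEA implementations may apply. The function names, the `"A"`/`"B"` labels, and the toy data are illustrative assumptions; this is not the MrGSEA MATLAB code.

```python
from statistics import mean, stdev


def abs_signal_to_noise(group_a, group_b):
    """Return |mean_a - mean_b| / (sd_a + sd_b) for a single gene.

    Assumed definition; uses sample standard deviations and no sd floor.
    """
    spread = stdev(group_a) + stdev(group_b)
    if spread == 0:
        raise ValueError("both groups have zero standard deviation")
    return abs(mean(group_a) - mean(group_b)) / spread


def rank_genes(expression, labels):
    """Order gene names from the highest to the lowest metric value.

    expression: dict mapping gene name -> list of expression values
    labels:     list of "A"/"B" phenotype labels, one per sample
    """
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == "A"]
        b = [v for v, lab in zip(values, labels) if lab == "B"]
        scores[gene] = abs_signal_to_noise(a, b)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: three genes, three samples per phenotype.
labels = ["A", "A", "A", "B", "B", "B"]
expression = {
    "gene1": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],  # identical group means
    "gene2": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],  # large mean difference
    "gene3": [1.0, 2.0, 3.0, 3.0, 4.0, 5.0],  # moderate mean difference
}
ranking = rank_genes(expression, labels)
print(ranking)
```

Because the absolute value is taken, up- and down-regulated genes are ranked together by the size of the difference rather than by its direction.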