
Ensuring the statistical soundness of competitive gene set approaches: gene filtering and genome-scale coverage are essential

In this article, we focus on the analysis of competitive gene set methods for detecting the statistical significance of pathways from gene expression data. Our main result is to demonstrate that some of the most frequently used gene set methods, GSEA, GSEArot and GAGE, are severely influenced by the...


Bibliographic Details
Main authors: Tripathi, Shailesh; Glazko, Galina V.; Emmert-Streib, Frank
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3627569/
https://www.ncbi.nlm.nih.gov/pubmed/23389952
http://dx.doi.org/10.1093/nar/gkt054
_version_ 1782266319229419520
author Tripathi, Shailesh
Glazko, Galina V.
Emmert-Streib, Frank
collection PubMed
description In this article, we focus on the analysis of competitive gene set methods for detecting the statistical significance of pathways from gene expression data. Our main result is to demonstrate that some of the most frequently used gene set methods, GSEA, GSEArot and GAGE, are severely influenced by the filtering of the data in a way that such an analysis is no longer reconcilable with the principles of statistical inference, rendering the obtained results in the worst case inexpressive. A possible consequence of this is that these methods can increase their power by the addition of unrelated data and noise. Our results are obtained within a bootstrapping framework that allows a rigorous assessment of the robustness of results and enables power estimates. Our results indicate that when using competitive gene set methods, it is imperative to apply a stringent gene filtering criterion. However, even when genes are filtered appropriately, for gene expression data from chips that do not provide a genome-scale coverage of the expression values of all mRNAs, this is not enough for GSEA, GSEArot and GAGE to ensure the statistical soundness of the applied procedure. For this reason, for biomedical and clinical studies, we strongly advise not to use GSEA, GSEArot and GAGE for such data sets.
format Online
Article
Text
id pubmed-3627569
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3627569 2013-04-17 Ensuring the statistical soundness of competitive gene set approaches: gene filtering and genome-scale coverage are essential Tripathi, Shailesh Glazko, Galina V. Emmert-Streib, Frank Nucleic Acids Res Methods Online In this article, we focus on the analysis of competitive gene set methods for detecting the statistical significance of pathways from gene expression data. Our main result is to demonstrate that some of the most frequently used gene set methods, GSEA, GSEArot and GAGE, are severely influenced by the filtering of the data in a way that such an analysis is no longer reconcilable with the principles of statistical inference, rendering the obtained results in the worst case inexpressive. A possible consequence of this is that these methods can increase their power by the addition of unrelated data and noise. Our results are obtained within a bootstrapping framework that allows a rigorous assessment of the robustness of results and enables power estimates. Our results indicate that when using competitive gene set methods, it is imperative to apply a stringent gene filtering criterion. However, even when genes are filtered appropriately, for gene expression data from chips that do not provide a genome-scale coverage of the expression values of all mRNAs, this is not enough for GSEA, GSEArot and GAGE to ensure the statistical soundness of the applied procedure. For this reason, for biomedical and clinical studies, we strongly advise not to use GSEA, GSEArot and GAGE for such data sets. Oxford University Press 2013-04 2013-02-05 /pmc/articles/PMC3627569/ /pubmed/23389952 http://dx.doi.org/10.1093/nar/gkt054 Text en © The Author(s) 2013. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Ensuring the statistical soundness of competitive gene set approaches: gene filtering and genome-scale coverage are essential
topic Methods Online