Should We Abandon the t-Test in the Analysis of Gene Expression Microarray Data: A Comparison of Variance Modeling Strategies
High-throughput post-genomic studies are now routinely conducted, and show promise, in biological and biomedical research. The main statistical approach for selecting genes differentially expressed between two groups is the t-test, which has been criticized in the literature. Numerous alterna...
Main authors: | Jeanmougin, Marine; de Reynies, Aurelien; Marisa, Laetitia; Paccard, Caroline; Nuel, Gregory; Guedj, Mickael |
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science, 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2933223/ https://www.ncbi.nlm.nih.gov/pubmed/20838429 http://dx.doi.org/10.1371/journal.pone.0012336 |
_version_ | 1782186121842655232 |
---|---|
author | Jeanmougin, Marine; de Reynies, Aurelien; Marisa, Laetitia; Paccard, Caroline; Nuel, Gregory; Guedj, Mickael |
author_facet | Jeanmougin, Marine; de Reynies, Aurelien; Marisa, Laetitia; Paccard, Caroline; Nuel, Gregory; Guedj, Mickael |
author_sort | Jeanmougin, Marine |
collection | PubMed |
description | High-throughput post-genomic studies are now routinely conducted, and show promise, in biological and biomedical research. The main statistical approach for selecting genes differentially expressed between two groups is the t-test, which has been criticized in the literature. Numerous alternatives have been developed based on different and innovative variance modeling strategies. However, a critical issue is that selecting a different test usually leads to a different gene list. In this context, and given the current tendency to apply the t-test, identifying the most efficient approach in practice remains crucial. To help answer this question, we compare eight tests representative of variance modeling strategies in gene expression data: Welch's t-test, ANOVA [1], Wilcoxon's test, SAM [2], RVM [3], limma [4], VarMixt [5] and SMVar [6]. Our comparison process relies on four steps (gene list analysis, simulations, spike-in data and re-sampling) to formulate comprehensive and robust conclusions about test performance in terms of statistical power, false-positive rate, execution time and ease of use. Our results raise concerns about the ability of some methods to control the expected number of false positives at a desirable level. In addition, two tests (limma and VarMixt) show significant improvement over the t-test, particularly for small sample sizes. Because limma also offers several practical advantages, we advocate its use for analyzing gene expression data. |
format | Text |
id | pubmed-2933223 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-29332232010-09-13 Should We Abandon the t-Test in the Analysis of Gene Expression Microarray Data: A Comparison of Variance Modeling Strategies Jeanmougin, Marine de Reynies, Aurelien Marisa, Laetitia Paccard, Caroline Nuel, Gregory Guedj, Mickael PLoS One Research Article High-throughput post-genomic studies are now routinely and promisingly investigated in biological and biomedical research. The main statistical approach to select genes differentially expressed between two groups is to apply a t-test, which is subject of criticism in the literature. Numerous alternatives have been developed based on different and innovative variance modeling strategies. However, a critical issue is that selecting a different test usually leads to a different gene list. In this context and given the current tendency to apply the t-test, identifying the most efficient approach in practice remains crucial. To provide elements to answer, we conduct a comparison of eight tests representative of variance modeling strategies in gene expression data: Welch's t-test, ANOVA [1], Wilcoxon's test, SAM [2], RVM [3], limma [4], VarMixt [5] and SMVar [6]. Our comparison process relies on four steps (gene list analysis, simulations, spike-in data and re-sampling) to formulate comprehensive and robust conclusions about test performance, in terms of statistical power, false-positive rate, execution time and ease of use. Our results raise concerns about the ability of some methods to control the expected number of false positives at a desirable level. Besides, two tests (limma and VarMixt) show significant improvement compared to the t-test, in particular to deal with small sample sizes. In addition limma presents several practical advantages, so we advocate its application to analyze gene expression data. Public Library of Science 2010-09-03 /pmc/articles/PMC2933223/ /pubmed/20838429 http://dx.doi.org/10.1371/journal.pone.0012336 Text en Jeanmougin et al. 
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Jeanmougin, Marine de Reynies, Aurelien Marisa, Laetitia Paccard, Caroline Nuel, Gregory Guedj, Mickael Should We Abandon the t-Test in the Analysis of Gene Expression Microarray Data: A Comparison of Variance Modeling Strategies |
title | Should We Abandon the t-Test in the Analysis of Gene Expression Microarray Data: A Comparison of Variance Modeling Strategies |
title_full | Should We Abandon the t-Test in the Analysis of Gene Expression Microarray Data: A Comparison of Variance Modeling Strategies |
title_fullStr | Should We Abandon the t-Test in the Analysis of Gene Expression Microarray Data: A Comparison of Variance Modeling Strategies |
title_full_unstemmed | Should We Abandon the t-Test in the Analysis of Gene Expression Microarray Data: A Comparison of Variance Modeling Strategies |
title_short | Should We Abandon the t-Test in the Analysis of Gene Expression Microarray Data: A Comparison of Variance Modeling Strategies |
title_sort | should we abandon the t-test in the analysis of gene expression microarray data: a comparison of variance modeling strategies |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2933223/ https://www.ncbi.nlm.nih.gov/pubmed/20838429 http://dx.doi.org/10.1371/journal.pone.0012336 |
work_keys_str_mv | AT jeanmouginmarine shouldweabandonthettestintheanalysisofgeneexpressionmicroarraydataacomparisonofvariancemodelingstrategies AT dereyniesaurelien shouldweabandonthettestintheanalysisofgeneexpressionmicroarraydataacomparisonofvariancemodelingstrategies AT marisalaetitia shouldweabandonthettestintheanalysisofgeneexpressionmicroarraydataacomparisonofvariancemodelingstrategies AT paccardcaroline shouldweabandonthettestintheanalysisofgeneexpressionmicroarraydataacomparisonofvariancemodelingstrategies AT nuelgregory shouldweabandonthettestintheanalysisofgeneexpressionmicroarraydataacomparisonofvariancemodelingstrategies AT guedjmickael shouldweabandonthettestintheanalysisofgeneexpressionmicroarraydataacomparisonofvariancemodelingstrategies |