
Should We Abandon the t-Test in the Analysis of Gene Expression Microarray Data: A Comparison of Variance Modeling Strategies

High-throughput post-genomic studies are now routinely and promisingly investigated in biological and biomedical research. The main statistical approach to select genes differentially expressed between two groups is to apply a t-test, which is subject of criticism in the literature. Numerous alterna...


Bibliographic Details
Main authors: Jeanmougin, Marine, de Reynies, Aurelien, Marisa, Laetitia, Paccard, Caroline, Nuel, Gregory, Guedj, Mickael
Format: Text
Language: English
Published: Public Library of Science 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2933223/
https://www.ncbi.nlm.nih.gov/pubmed/20838429
http://dx.doi.org/10.1371/journal.pone.0012336
_version_ 1782186121842655232
author Jeanmougin, Marine
de Reynies, Aurelien
Marisa, Laetitia
Paccard, Caroline
Nuel, Gregory
Guedj, Mickael
author_sort Jeanmougin, Marine
collection PubMed
description High-throughput post-genomic studies are now routinely and promisingly investigated in biological and biomedical research. The main statistical approach to select genes differentially expressed between two groups is to apply a t-test, which is subject of criticism in the literature. Numerous alternatives have been developed based on different and innovative variance modeling strategies. However, a critical issue is that selecting a different test usually leads to a different gene list. In this context and given the current tendency to apply the t-test, identifying the most efficient approach in practice remains crucial. To provide elements to answer, we conduct a comparison of eight tests representative of variance modeling strategies in gene expression data: Welch's t-test, ANOVA [1], Wilcoxon's test, SAM [2], RVM [3], limma [4], VarMixt [5] and SMVar [6]. Our comparison process relies on four steps (gene list analysis, simulations, spike-in data and re-sampling) to formulate comprehensive and robust conclusions about test performance, in terms of statistical power, false-positive rate, execution time and ease of use. Our results raise concerns about the ability of some methods to control the expected number of false positives at a desirable level. Besides, two tests (limma and VarMixt) show significant improvement compared to the t-test, in particular to deal with small sample sizes. In addition limma presents several practical advantages, so we advocate its application to analyze gene expression data.
format Text
id pubmed-2933223
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2933223 2010-09-13 PLoS One Research Article Public Library of Science 2010-09-03 /pmc/articles/PMC2933223/ /pubmed/20838429 http://dx.doi.org/10.1371/journal.pone.0012336 Text en Jeanmougin et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Should We Abandon the t-Test in the Analysis of Gene Expression Microarray Data: A Comparison of Variance Modeling Strategies
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2933223/
https://www.ncbi.nlm.nih.gov/pubmed/20838429
http://dx.doi.org/10.1371/journal.pone.0012336