Validation of differential gene expression algorithms: Application comparing fold-change estimation to hypothesis testing

BACKGROUND: Sustained research on the problem of determining which genes are differentially expressed on the basis of microarray data has yielded a plethora of statistical algorithms, each justified by theory, simulation, or ad hoc validation and yet differing in practical results from equally justi...

Bibliographic Details
Main authors: Yanofsky, Corey M, Bickel, David R
Format: Online Article Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3224549/
https://www.ncbi.nlm.nih.gov/pubmed/20109217
http://dx.doi.org/10.1186/1471-2105-11-63
author Yanofsky, Corey M
Bickel, David R
author_facet Yanofsky, Corey M
Bickel, David R
author_sort Yanofsky, Corey M
collection PubMed
description BACKGROUND: Sustained research on the problem of determining which genes are differentially expressed on the basis of microarray data has yielded a plethora of statistical algorithms, each justified by theory, simulation, or ad hoc validation and yet differing in practical results from equally justified algorithms. Recently, a concordance method that measures agreement among gene lists has been introduced to assess various aspects of differential gene expression detection. This method has the advantage of basing its assessment solely on the results of real data analyses, but as it requires examining gene lists of given sizes, it may be unstable. RESULTS: Two methodologies for assessing predictive error are described: a cross-validation method and a posterior predictive method. As a nonparametric method of estimating prediction error from observed expression levels, cross validation provides an empirical approach to assessing algorithms for detecting differential gene expression that is fully justified for large numbers of biological replicates. Because it leverages the knowledge that only a small portion of genes are differentially expressed, the posterior predictive method is expected to provide more reliable estimates of algorithm performance, allaying concerns about limited biological replication. In practice, the posterior predictive method can assess when its approximations are valid and when they are inaccurate. Under conditions in which its approximations are valid, it corroborates the results of cross validation. Both comparison methodologies are applicable to both single-channel and dual-channel microarrays. For the data sets considered, estimating prediction error by cross validation demonstrates that empirical Bayes methods based on hierarchical models tend to outperform algorithms based on selecting genes by their fold changes or by non-hierarchical model-selection criteria. (The latter two approaches have comparable performance.)
The posterior predictive assessment corroborates these findings. CONCLUSIONS: Algorithms for detecting differential gene expression may be compared by estimating each algorithm's error in predicting expression ratios, whether such ratios are defined across microarray channels or between two independent groups. According to two distinct estimators of prediction error, algorithms using hierarchical models outperform the other algorithms of the study. The fact that fold-change shrinkage performed as well as conventional model selection criteria calls for investigating algorithms that combine the strengths of significance testing and fold-change estimation.
format Online
Article
Text
id pubmed-3224549
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3224549 2011-11-27 Validation of differential gene expression algorithms: Application comparing fold-change estimation to hypothesis testing Yanofsky, Corey M Bickel, David R BMC Bioinformatics Methodology Article BACKGROUND: Sustained research on the problem of determining which genes are differentially expressed on the basis of microarray data has yielded a plethora of statistical algorithms, each justified by theory, simulation, or ad hoc validation and yet differing in practical results from equally justified algorithms. Recently, a concordance method that measures agreement among gene lists has been introduced to assess various aspects of differential gene expression detection. This method has the advantage of basing its assessment solely on the results of real data analyses, but as it requires examining gene lists of given sizes, it may be unstable. RESULTS: Two methodologies for assessing predictive error are described: a cross-validation method and a posterior predictive method. As a nonparametric method of estimating prediction error from observed expression levels, cross validation provides an empirical approach to assessing algorithms for detecting differential gene expression that is fully justified for large numbers of biological replicates. Because it leverages the knowledge that only a small portion of genes are differentially expressed, the posterior predictive method is expected to provide more reliable estimates of algorithm performance, allaying concerns about limited biological replication. In practice, the posterior predictive method can assess when its approximations are valid and when they are inaccurate. Under conditions in which its approximations are valid, it corroborates the results of cross validation. Both comparison methodologies are applicable to both single-channel and dual-channel microarrays.
For the data sets considered, estimating prediction error by cross validation demonstrates that empirical Bayes methods based on hierarchical models tend to outperform algorithms based on selecting genes by their fold changes or by non-hierarchical model-selection criteria. (The latter two approaches have comparable performance.) The posterior predictive assessment corroborates these findings. CONCLUSIONS: Algorithms for detecting differential gene expression may be compared by estimating each algorithm's error in predicting expression ratios, whether such ratios are defined across microarray channels or between two independent groups. According to two distinct estimators of prediction error, algorithms using hierarchical models outperform the other algorithms of the study. The fact that fold-change shrinkage performed as well as conventional model selection criteria calls for investigating algorithms that combine the strengths of significance testing and fold-change estimation. BioMed Central 2010-01-28 /pmc/articles/PMC3224549/ /pubmed/20109217 http://dx.doi.org/10.1186/1471-2105-11-63 Text en Copyright ©2010 Yanofsky and Bickel; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Yanofsky, Corey M
Bickel, David R
Validation of differential gene expression algorithms: Application comparing fold-change estimation to hypothesis testing
title Validation of differential gene expression algorithms: Application comparing fold-change estimation to hypothesis testing
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3224549/
https://www.ncbi.nlm.nih.gov/pubmed/20109217
http://dx.doi.org/10.1186/1471-2105-11-63