Validation of differential gene expression algorithms: Application comparing fold-change estimation to hypothesis testing
Main authors: | Yanofsky, Corey M; Bickel, David R |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2010 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3224549/ https://www.ncbi.nlm.nih.gov/pubmed/20109217 http://dx.doi.org/10.1186/1471-2105-11-63 |
author | Yanofsky, Corey M; Bickel, David R |
---|---|
collection | PubMed |
description | BACKGROUND: Sustained research on the problem of determining which genes are differentially expressed on the basis of microarray data has yielded a plethora of statistical algorithms, each justified by theory, simulation, or ad hoc validation and yet differing in practical results from equally justified algorithms. Recently, a concordance method that measures agreement among gene lists has been introduced to assess various aspects of differential gene expression detection. This method has the advantage of basing its assessment solely on the results of real data analyses, but as it requires examining gene lists of given sizes, it may be unstable. RESULTS: Two methodologies for assessing predictive error are described: a cross-validation method and a posterior predictive method. As a nonparametric method of estimating prediction error from observed expression levels, cross validation provides an empirical approach to assessing algorithms for detecting differential gene expression that is fully justified for large numbers of biological replicates. Because it leverages the knowledge that only a small portion of genes are differentially expressed, the posterior predictive method is expected to provide more reliable estimates of algorithm performance, allaying concerns about limited biological replication. In practice, the posterior predictive method can assess when its approximations are valid and when they are inaccurate. Under conditions in which its approximations are valid, it corroborates the results of cross validation. Both comparison methodologies are applicable to both single-channel and dual-channel microarrays. For the data sets considered, estimating prediction error by cross validation demonstrates that empirical Bayes methods based on hierarchical models tend to outperform algorithms based on selecting genes by their fold changes or by non-hierarchical model-selection criteria. (The latter two approaches have comparable performance.) The posterior predictive assessment corroborates these findings. CONCLUSIONS: Algorithms for detecting differential gene expression may be compared by estimating each algorithm's error in predicting expression ratios, whether such ratios are defined across microarray channels or between two independent groups. According to two distinct estimators of prediction error, algorithms using hierarchical models outperform the other algorithms of the study. The fact that fold-change shrinkage performed as well as conventional model selection criteria calls for investigating algorithms that combine the strengths of significance testing and fold-change estimation. |
format | Online Article Text |
id | pubmed-3224549 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3224549 2011-11-27; BMC Bioinformatics; Methodology Article; BioMed Central 2010-01-28; /pmc/articles/PMC3224549/ /pubmed/20109217 http://dx.doi.org/10.1186/1471-2105-11-63; Text; en; Copyright ©2010 Yanofsky and Bickel; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Validation of differential gene expression algorithms: Application comparing fold-change estimation to hypothesis testing |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3224549/ https://www.ncbi.nlm.nih.gov/pubmed/20109217 http://dx.doi.org/10.1186/1471-2105-11-63 |
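The cross-validation idea summarized in the abstract (comparing algorithms by their error in predicting expression ratios) can be illustrated with a minimal sketch. It assumes dual-channel data already reduced to one log2 expression ratio per gene per array, and it uses mean squared error against a held-out array as the prediction error. The predictors `raw_fold_change`, `top_k_fold_change` and `normal_shrinkage`, the simulated data from `simulate`, and every parameter value are hypothetical illustrations, not the algorithms or data sets analyzed by Yanofsky and Bickel. In particular, `normal_shrinkage` is a simple normal-normal hierarchical model standing in for the empirical Bayes methods the study compares. Ratios formed between two independent groups are not covered here.

```python
import random
import statistics
from typing import Callable, List

Matrix = List[List[float]]  # genes x replicate arrays, each entry a log2 expression ratio
Predictor = Callable[[Matrix], List[float]]  # training matrix -> predicted log ratio per gene


def raw_fold_change(train: Matrix) -> List[float]:
    """Predict each gene's log ratio by its sample mean (no selection, no shrinkage)."""
    return [statistics.fmean(row) for row in train]


def top_k_fold_change(k: int) -> Predictor:
    """Hypothetical selection rule: keep the k largest |mean| log ratios, predict 0 elsewhere."""
    def predict(train: Matrix) -> List[float]:
        means = raw_fold_change(train)
        ranked = sorted(range(len(means)), key=lambda g: abs(means[g]), reverse=True)
        keep = set(ranked[:k])
        return [m if g in keep else 0.0 for g, m in enumerate(means)]
    return predict


def normal_shrinkage(train: Matrix) -> List[float]:
    """Illustrative hierarchical model: mean_g ~ N(mu_g, s2/n), mu_g ~ N(0, tau2).

    s2 is the pooled within-gene variance and tau2 a method-of-moments estimate;
    the prediction is the posterior mean tau2 / (tau2 + s2/n) * mean_g.
    """
    n = len(train[0])
    means = raw_fold_change(train)
    s2 = statistics.fmean(statistics.variance(row) for row in train)
    tau2 = max(0.0, statistics.fmean(m * m for m in means) - s2 / n)
    factor = tau2 / (tau2 + s2 / n)
    return [factor * m for m in means]


def cv_prediction_error(data: Matrix, predictor: Predictor) -> float:
    """Leave-one-array-out estimate of mean squared prediction error."""
    n = len(data[0])
    fold_errors = []
    for j in range(n):
        train = [row[:j] + row[j + 1:] for row in data]  # drop array j
        preds = predictor(train)
        held_out = [row[j] for row in data]
        fold_errors.append(
            statistics.fmean((p - x) ** 2 for p, x in zip(preds, held_out))
        )
    return statistics.fmean(fold_errors)


def simulate(n_genes: int = 1000, n_reps: int = 4, frac_de: float = 0.05,
             effect: float = 2.0, sd: float = 1.0, seed: int = 1) -> Matrix:
    """Synthetic log ratios: a small fraction of genes shifted by +/- effect."""
    rng = random.Random(seed)
    n_de = int(round(frac_de * n_genes))
    data = []
    for g in range(n_genes):
        mu = rng.choice([-effect, effect]) if g < n_de else 0.0
        data.append([rng.gauss(mu, sd) for _ in range(n_reps)])
    return data


if __name__ == "__main__":
    data = simulate()
    for name, algo in [("raw mean", raw_fold_change),
                       ("top-50 fold change", top_k_fold_change(50)),
                       ("normal shrinkage", normal_shrinkage)]:
        print(f"{name:20s} {cv_prediction_error(data, algo):.3f}")
```

Each fold fits the predictor without the held-out array, so the held-out ratios are independent of the fit. This matches the abstract's remark that cross validation is fully justified only for large numbers of biological replicates: with few arrays, each fold trains on very little data. The printed numbers come from synthetic data and say nothing about which method performs best on real microarrays.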