
Empirical Bayes estimation of posterior probabilities of enrichment: A comparative study of five estimators of the local false discovery rate

BACKGROUND: In investigating differentially expressed genes or other selected features, researchers conduct hypothesis tests to determine which biological categories, such as those of the Gene Ontology (GO), are enriched for the selected features. Multiple comparison procedures (MCPs) are commonly u...


Bibliographic Details
Main Authors:	Yang, Zhenyu; Li, Zuojing; Bickel, David R
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3658916/ https://www.ncbi.nlm.nih.gov/pubmed/23497228 http://dx.doi.org/10.1186/1471-2105-14-87

_version_	1782270361822298112
author	Yang, Zhenyu Li, Zuojing Bickel, David R
author_facet	Yang, Zhenyu Li, Zuojing Bickel, David R
author_sort	Yang, Zhenyu
collection	PubMed
description	BACKGROUND: In investigating differentially expressed genes or other selected features, researchers conduct hypothesis tests to determine which biological categories, such as those of the Gene Ontology (GO), are enriched for the selected features. Multiple comparison procedures (MCPs) are commonly used to prevent excessive false positive rates. Traditional MCPs, e.g., the Bonferroni method, go to the opposite extreme: strictly controlling a family-wise error rate, resulting in excessive false negative rates. Researchers generally prefer the more balanced approach of instead controlling the false discovery rate (FDR). However, the q-values that methods of FDR control assign to biological categories tend to be too low to reliably estimate the probability that a biological category is not enriched for the preselected features. Thus, we study an application of the other estimators of that probability, which is called the local FDR (LFDR). RESULTS: We considered five LFDR estimators for detecting enriched GO terms: a binomial-based estimator (BBE), a maximum likelihood estimator (MLE), a normalized MLE (NMLE), a histogram-based estimator assuming a theoretical null hypothesis (HBE), and a histogram-based estimator assuming an empirical null hypothesis (HBE-EN). Since NMLE depends not only on the data but also on the specified value of Π(0), the proportion of non-enriched GO terms, it is only advantageous when either Π(0) is already known with sufficient accuracy or there are data for only 1 GO term. By contrast, the other estimators work without specifying Π(0) but require data for at least 2 GO terms. Our simulation studies yielded the following summaries of the relative performance of each of those four estimators. HBE and HBE-EN produced larger biases for 2, 4, 8, 32, and 100 GO terms than BBE and MLE. BBE has the lowest bias if Π(0) is 1 and if the number of GO terms is between 2 and 32. 
The bias of MLE is no worse than that of BBE for 100 GO terms even when the ideal number of components in its underlying mixture model is unknown, but MLE has high bias when the number of GO terms is small compared to the number of estimated parameters. For unknown values of Π(0), BBE has the lowest bias for a small number of GO terms (2-32 GO terms), and MLE has the lowest bias for a medium number of GO terms (100 GO terms). CONCLUSIONS: For enrichment detection, we recommend estimating the LFDR by MLE given at least a medium number of GO terms, by BBE given a small number of GO terms, and by NMLE given either only 1 GO term or precise knowledge of Π(0).
format	Online Article Text
id	pubmed-3658916
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3658916 2013-05-23 Empirical Bayes estimation of posterior probabilities of enrichment: A comparative study of five estimators of the local false discovery rate Yang, Zhenyu Li, Zuojing Bickel, David R BMC Bioinformatics Methodology Article BioMed Central 2013-03-06 /pmc/articles/PMC3658916/ /pubmed/23497228 http://dx.doi.org/10.1186/1471-2105-14-87 Text en Copyright © 2013 Yang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Empirical Bayes estimation of posterior probabilities of enrichment: A comparative study of five estimators of the local false discovery rate
title_sort	empirical bayes estimation of posterior probabilities of enrichment: a comparative study of five estimators of the local false discovery rate
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3658916/ https://www.ncbi.nlm.nih.gov/pubmed/23497228 http://dx.doi.org/10.1186/1471-2105-14-87
work_keys_str_mv	AT yangzhenyu empiricalbayesestimationofposteriorprobabilitiesofenrichmentacomparativestudyoffiveestimatorsofthelocalfalsediscoveryrate AT lizuojing empiricalbayesestimationofposteriorprobabilitiesofenrichmentacomparativestudyoffiveestimatorsofthelocalfalsediscoveryrate AT bickeldavidr empiricalbayesestimationofposteriorprobabilitiesofenrichmentacomparativestudyoffiveestimatorsofthelocalfalsediscoveryrate
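
The description field above defines the LFDR as the probability that a GO term is not enriched and ends with a recommendation that depends on the number of GO terms and on whether Π(0) is known. The following minimal Python sketch illustrates both ideas. It assumes the standard two-group mixture formulation of the LFDR and is not an implementation of any of the five estimators. The function names `lfdr` and `recommend_estimator` are hypothetical. The abstract reports results only for 2-32 and 100 GO terms, so the sketch returns `None` for counts in between rather than guessing, and it reads "at least a medium number" as 100 or more.

```python
from typing import Optional


def lfdr(pi0: float, f0: float, f1: float) -> float:
    """Two-group local FDR: posterior probability that a term is NOT enriched.

    pi0 is the prior proportion of non-enriched terms (the abstract's Π(0));
    f0 and f1 are the null and alternative densities evaluated at the
    observed test statistic. All three are assumed known here, which is
    exactly what a real estimator would have to infer from data.
    """
    if not 0.0 <= pi0 <= 1.0:
        raise ValueError("pi0 must lie in [0, 1]")
    mixture = pi0 * f0 + (1.0 - pi0) * f1
    if mixture <= 0.0:
        raise ValueError("mixture density must be positive")
    return pi0 * f0 / mixture


def recommend_estimator(n_go_terms: int, pi0_known: bool = False) -> Optional[str]:
    """Encode the abstract's concluding recommendation as a lookup.

    Returns "NMLE", "BBE", "MLE", or None when the abstract gives no
    guidance (33 to 99 GO terms with unknown pi0).
    """
    if n_go_terms < 1:
        raise ValueError("need data for at least one GO term")
    if pi0_known or n_go_terms == 1:
        return "NMLE"   # only 1 GO term, or precise knowledge of pi0
    if n_go_terms <= 32:
        return "BBE"    # small number of GO terms (2-32)
    if n_go_terms >= 100:
        return "MLE"    # at least a medium number of GO terms
    return None         # range not covered by the abstract


if __name__ == "__main__":
    print(recommend_estimator(8))
    print(recommend_estimator(100))
    print(lfdr(0.5, 1.0, 1.0))
```

When Π(0) equals 1, every term is non-enriched a priori, so `lfdr` returns 1 whatever the densities are.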

