Sources of variation in false discovery rate estimation include sample size, correlation, and inherent differences between groups

BACKGROUND: High-throughput technologies enable the testing of tens of thousands of measurements simultaneously. Identification of genes that are differentially expressed or associated with clinical outcomes invokes the multiple testing problem. False Discovery Rate (FDR) control is a statistical m...

Bibliographic Details
Main authors:	Zhang, Jiexin; Coombes, Kevin R
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3426804/ https://www.ncbi.nlm.nih.gov/pubmed/23320794 http://dx.doi.org/10.1186/1471-2105-13-S13-S1

author	Zhang, Jiexin Coombes, Kevin R
collection	PubMed
description	BACKGROUND: High-throughput technologies enable the testing of tens of thousands of measurements simultaneously. Identification of genes that are differentially expressed or associated with clinical outcomes invokes the multiple testing problem. False Discovery Rate (FDR) control is a statistical method used to correct for multiple comparisons for independent or weakly dependent test statistics. Although FDR control is frequently applied to microarray data analysis, gene expression is usually correlated, which might lead to inaccurate estimates. In this paper, we evaluate the accuracy of FDR estimation. METHODS: Using two real data sets, we resampled subgroups of patients and recalculated statistics of interest to illustrate the imprecision of FDR estimation. Next, we generated many simulated data sets with block correlation structures and realistic noise parameters, using the Ultimate Microarray Prediction, Inference, and Reality Engine (UMPIRE) R package. We estimated FDR using a beta-uniform mixture (BUM) model, and examined the variation in FDR estimation. RESULTS: The three major sources of variation in FDR estimation are the sample size, correlations among genes, and the true proportion of differentially expressed genes (DEGs). The sample size and proportion of DEGs affect both magnitude and precision of FDR estimation, while the correlation structure mainly affects the variation of the estimated parameters. CONCLUSIONS: We have decomposed various factors that affect FDR estimation, and illustrated the direction and extent of the impact. We found that the proportion of DEGs has a significant impact on FDR; this factor might have been overlooked in previous studies and deserves more thought when controlling FDR.
format	Online Article Text
id	pubmed-3426804
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3426804 2012-08-24 Sources of variation in false discovery rate estimation include sample size, correlation, and inherent differences between groups Zhang, Jiexin Coombes, Kevin R BMC Bioinformatics Research
BioMed Central 2012-08-24 /pmc/articles/PMC3426804/ /pubmed/23320794 http://dx.doi.org/10.1186/1471-2105-13-S13-S1 Text en Copyright ©2012 Zhang and Coombes; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Sources of variation in false discovery rate estimation include sample size, correlation, and inherent differences between groups
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3426804/ https://www.ncbi.nlm.nih.gov/pubmed/23320794 http://dx.doi.org/10.1186/1471-2105-13-S13-S1
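
The description above mentions estimating FDR with a beta-uniform mixture (BUM) model but gives no formulas. The following is a minimal Python sketch of that idea, assuming the commonly used BUM parameterization f(p) = lam + (1 - lam) * a * p^(a - 1) with 0 < lam < 1 and 0 < a < 1. The function names, the coarse grid-search fit, and the simulated p-values are all hypothetical illustrations. This is not the authors' code, which the record says was built around R packages, and it ignores the block correlation structure studied in the article.

```python
import math
import random


def bum_log_likelihood(pvals, lam, a):
    """Log-likelihood of p-values under f(p) = lam + (1 - lam) * a * p**(a - 1)."""
    return sum(math.log(lam + (1.0 - lam) * a * p ** (a - 1.0)) for p in pvals)


def fit_bum(pvals, steps=25):
    """Coarse grid-search maximum likelihood for (lam, a), both strictly in (0, 1).

    A real analysis would use a proper optimizer; the grid keeps the sketch
    dependency-free.
    """
    grid = [i / steps for i in range(1, steps)]
    best = max(
        (bum_log_likelihood(pvals, lam, a), lam, a) for lam in grid for a in grid
    )
    return best[1], best[2]


def null_fraction_upper_bound(lam, a):
    """Upper bound on the proportion of true nulls: the BUM density at p = 1."""
    return lam + (1.0 - lam) * a


def bum_fdr(tau, lam, a):
    """Estimated FDR when calling every p-value <= tau significant.

    Expected null mass below tau divided by the fitted BUM CDF at tau.
    """
    cdf = lam * tau + (1.0 - lam) * tau ** a
    return null_fraction_upper_bound(lam, a) * tau / cdf


def simulate_pvalues(n, lam, a, seed=0):
    """Draw n independent p-values from a BUM(lam, a) mixture (illustration only)."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n):
        u = 1.0 - rng.random()  # in (0, 1], avoids log(0) and 0 ** negative
        # Uniform component with probability lam, otherwise Beta(a, 1) via inverse CDF.
        pvals.append(u if rng.random() < lam else u ** (1.0 / a))
    return pvals


if __name__ == "__main__":
    pvals = simulate_pvalues(500, lam=0.8, a=0.2, seed=1)
    lam_hat, a_hat = fit_bum(pvals)
    print("lam_hat:", lam_hat, "a_hat:", a_hat)
    for tau in (0.001, 0.01, 0.05):
        print("tau:", tau, "estimated FDR:", bum_fdr(tau, lam_hat, a_hat))
```

Rerunning the sketch with different values of n, lam, or seed gives a rough feel for how the fitted parameters, and therefore the estimated FDR, shift with sample size and with the proportion of non-null tests, which is the kind of variation the description reports.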