Using controls to limit false discovery in the era of big data

BACKGROUND: Procedures for controlling the false discovery rate (FDR) are widely applied as a solution to the multiple comparisons problem of high-dimensional statistics. Current FDR-controlling procedures require accurately calculated p-values and rely on extrapolation into the unknown and unobserv...

Bibliographic Details
Main authors:	Parks, Matthew M., Raphael, Benjamin J., Lawrence, Charles E.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6137876/ https://www.ncbi.nlm.nih.gov/pubmed/30217148 http://dx.doi.org/10.1186/s12859-018-2356-2

_version_	1783355247022833664
author	Parks, Matthew M. Raphael, Benjamin J. Lawrence, Charles E.
author_facet	Parks, Matthew M. Raphael, Benjamin J. Lawrence, Charles E.
author_sort	Parks, Matthew M.
collection	PubMed
description	BACKGROUND: Procedures for controlling the false discovery rate (FDR) are widely applied as a solution to the multiple comparisons problem of high-dimensional statistics. Current FDR-controlling procedures require accurately calculated p-values and rely on extrapolation into the unknown and unobserved tails of the null distribution. Both of these intermediate steps are challenging and can compromise the reliability of the results. RESULTS: We present a general method for controlling the FDR that capitalizes on the large amount of control data often found in big data studies to avoid these frequently problematic intermediate steps. The method utilizes control data to empirically construct the distribution of the test statistic under the null hypothesis and directly compares this distribution to the empirical distribution of the test data. By not relying on p-values, our control data-based empirical FDR procedure more closely follows the foundational principles of the scientific method: that inference is drawn by comparing test data to control data. The method is demonstrated through application to a problem in structural genomics. CONCLUSIONS: The method described here provides a general statistical framework for controlling the FDR that is specifically tailored for the big data setting. By relying on empirically constructed distributions and control data, it forgoes potentially problematic modeling steps and extrapolation into the unknown tails of the null distribution. This procedure is broadly applicable insofar as controlled experiments or internal negative controls are available, as is increasingly common in the big data setting. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2356-2) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-6137876
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6137876 2018-09-15 Using controls to limit false discovery in the era of big data Parks, Matthew M. Raphael, Benjamin J. Lawrence, Charles E. BMC Bioinformatics Methodology Article BioMed Central 2018-09-14 /pmc/articles/PMC6137876/ /pubmed/30217148 http://dx.doi.org/10.1186/s12859-018-2356-2 Text en © The Author(s). 2018 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Methodology Article Parks, Matthew M. Raphael, Benjamin J. Lawrence, Charles E. Using controls to limit false discovery in the era of big data
title	Using controls to limit false discovery in the era of big data
title_full	Using controls to limit false discovery in the era of big data
title_fullStr	Using controls to limit false discovery in the era of big data
title_full_unstemmed	Using controls to limit false discovery in the era of big data
title_short	Using controls to limit false discovery in the era of big data
title_sort	using controls to limit false discovery in the era of big data
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6137876/ https://www.ncbi.nlm.nih.gov/pubmed/30217148 http://dx.doi.org/10.1186/s12859-018-2356-2
work_keys_str_mv	AT parksmatthewm usingcontrolstolimitfalsediscoveryintheeraofbigdata AT raphaelbenjaminj usingcontrolstolimitfalsediscoveryintheeraofbigdata AT lawrencecharlese usingcontrolstolimitfalsediscoveryintheeraofbigdata
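
The description field above summarizes the method as building the null distribution of the test statistic empirically from control data and comparing it directly with the empirical distribution of the test data. The following is only a minimal sketch of that general idea and is not the authors' published procedure. The function names `empirical_fdr` and `smallest_threshold`, the use of NumPy, and the estimator itself (the control-based tail fraction scaled to the number of tests, divided by the number of test statistics at or above the threshold, which conservatively treats every test as potentially null) are assumptions made for illustration.

```python
import numpy as np


def empirical_fdr(test_stats, control_stats, threshold):
    """Estimate the FDR at `threshold` from control data (illustrative estimator).

    Assumption: larger statistics are more extreme, and the control
    statistics are draws from the null distribution.
    """
    test_stats = np.asarray(test_stats, dtype=float)
    control_stats = np.asarray(control_stats, dtype=float)

    n_test_hits = np.count_nonzero(test_stats >= threshold)
    if n_test_hits == 0:
        # No discoveries at this threshold, so no false discoveries either.
        return 0.0

    # Fraction of controls at or above the threshold: an empirical
    # estimate of the null tail probability, with no p-value model.
    null_tail = np.count_nonzero(control_stats >= threshold) / control_stats.size

    # Expected number of false discoveries if every test were null.
    expected_false = null_tail * test_stats.size
    return min(1.0, expected_false / n_test_hits)


def smallest_threshold(test_stats, control_stats, alpha):
    """Return the smallest observed test statistic with estimated FDR <= alpha.

    Returns None when no observed threshold meets the target.
    """
    for t in np.unique(test_stats):  # np.unique returns sorted values
        if empirical_fdr(test_stats, control_stats, t) <= alpha:
            return float(t)
    return None


if __name__ == "__main__":
    # Toy numbers chosen for illustration only; they are not from the article.
    control = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    test = [1, 2, 3, 8, 9, 10, 11, 12, 13, 14]
    print(empirical_fdr(test, control, threshold=8))
    print(smallest_threshold(test, control, alpha=0.2))
```

Because the estimate relies only on counts of control and test statistics beyond a threshold, it becomes uninformative once the threshold exceeds every control value, so in this sketch the resolution of the estimate is limited by the amount of control data available.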
