A practical guide to methods controlling false discoveries in computational biology

BACKGROUND: In high-throughput studies, hundreds to millions of hypotheses are typically tested. Statistical methods that control the false discovery rate (FDR) have emerged as popular and powerful tools for error rate control. While classic FDR methods use only p values as input, more modern FDR me...

Bibliographic Details
Main Authors:	Korthauer, Keegan; Kimes, Patrick K.; Duvallet, Claire; Reyes, Alejandro; Subramanian, Ayshwarya; Teng, Mingxiang; Shukla, Chinmay; Alm, Eric J.; Hicks, Stephanie C.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6547503/ https://www.ncbi.nlm.nih.gov/pubmed/31164141 http://dx.doi.org/10.1186/s13059-019-1716-1

_version_	1783423691838717952
author	Korthauer, Keegan; Kimes, Patrick K.; Duvallet, Claire; Reyes, Alejandro; Subramanian, Ayshwarya; Teng, Mingxiang; Shukla, Chinmay; Alm, Eric J.; Hicks, Stephanie C.
author_facet	Korthauer, Keegan; Kimes, Patrick K.; Duvallet, Claire; Reyes, Alejandro; Subramanian, Ayshwarya; Teng, Mingxiang; Shukla, Chinmay; Alm, Eric J.; Hicks, Stephanie C.
author_sort	Korthauer, Keegan
collection	PubMed
description	BACKGROUND: In high-throughput studies, hundreds to millions of hypotheses are typically tested. Statistical methods that control the false discovery rate (FDR) have emerged as popular and powerful tools for error rate control. While classic FDR methods use only p values as input, more modern FDR methods have been shown to increase power by incorporating complementary information as informative covariates to prioritize, weight, and group hypotheses. However, there is currently no consensus on how the modern methods compare to one another. We investigate the accuracy, applicability, and ease of use of two classic and six modern FDR-controlling methods by performing a systematic benchmark comparison using simulation studies as well as six case studies in computational biology. RESULTS: Methods that incorporate informative covariates are modestly more powerful than classic approaches, and do not underperform classic approaches, even when the covariate is completely uninformative. The majority of methods are successful at controlling the FDR, with the exception of two modern methods under certain settings. Furthermore, we find that the improvement of the modern FDR methods over the classic methods increases with the informativeness of the covariate, total number of hypothesis tests, and proportion of truly non-null hypotheses. CONCLUSIONS: Modern FDR methods that use an informative covariate provide advantages over classic FDR-controlling procedures, with the relative gain dependent on the application and informativeness of available covariates. We present our findings as a practical guide and provide recommendations to aid researchers in their choice of methods to correct for false discoveries. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13059-019-1716-1) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-6547503
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-65475032019-06-06 A practical guide to methods controlling false discoveries in computational biology Korthauer, Keegan Kimes, Patrick K. Duvallet, Claire Reyes, Alejandro Subramanian, Ayshwarya Teng, Mingxiang Shukla, Chinmay Alm, Eric J. Hicks, Stephanie C. Genome Biol Research BACKGROUND: In high-throughput studies, hundreds to millions of hypotheses are typically tested. Statistical methods that control the false discovery rate (FDR) have emerged as popular and powerful tools for error rate control. While classic FDR methods use only p values as input, more modern FDR methods have been shown to increase power by incorporating complementary information as informative covariates to prioritize, weight, and group hypotheses. However, there is currently no consensus on how the modern methods compare to one another. We investigate the accuracy, applicability, and ease of use of two classic and six modern FDR-controlling methods by performing a systematic benchmark comparison using simulation studies as well as six case studies in computational biology. RESULTS: Methods that incorporate informative covariates are modestly more powerful than classic approaches, and do not underperform classic approaches, even when the covariate is completely uninformative. The majority of methods are successful at controlling the FDR, with the exception of two modern methods under certain settings. Furthermore, we find that the improvement of the modern FDR methods over the classic methods increases with the informativeness of the covariate, total number of hypothesis tests, and proportion of truly non-null hypotheses. CONCLUSIONS: Modern FDR methods that use an informative covariate provide advantages over classic FDR-controlling procedures, with the relative gain dependent on the application and informativeness of available covariates. 
We present our findings as a practical guide and provide recommendations to aid researchers in their choice of methods to correct for false discoveries. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13059-019-1716-1) contains supplementary material, which is available to authorized users. BioMed Central 2019-06-04 /pmc/articles/PMC6547503/ /pubmed/31164141 http://dx.doi.org/10.1186/s13059-019-1716-1 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Korthauer, Keegan Kimes, Patrick K. Duvallet, Claire Reyes, Alejandro Subramanian, Ayshwarya Teng, Mingxiang Shukla, Chinmay Alm, Eric J. Hicks, Stephanie C. A practical guide to methods controlling false discoveries in computational biology
title	A practical guide to methods controlling false discoveries in computational biology
title_full	A practical guide to methods controlling false discoveries in computational biology
title_fullStr	A practical guide to methods controlling false discoveries in computational biology
title_full_unstemmed	A practical guide to methods controlling false discoveries in computational biology
title_short	A practical guide to methods controlling false discoveries in computational biology
title_sort	practical guide to methods controlling false discoveries in computational biology
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6547503/ https://www.ncbi.nlm.nih.gov/pubmed/31164141 http://dx.doi.org/10.1186/s13059-019-1716-1

Similar Items