A practical guide to methods controlling false discoveries in computational biology

BACKGROUND: In high-throughput studies, hundreds to millions of hypotheses are typically tested. Statistical methods that control the false discovery rate (FDR) have emerged as popular and powerful tools for error rate control. While classic FDR methods use only p values as input, more modern FDR me...

Bibliographic Details
Main authors: Korthauer, Keegan, Kimes, Patrick K., Duvallet, Claire, Reyes, Alejandro, Subramanian, Ayshwarya, Teng, Mingxiang, Shukla, Chinmay, Alm, Eric J., Hicks, Stephanie C.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6547503/
https://www.ncbi.nlm.nih.gov/pubmed/31164141
http://dx.doi.org/10.1186/s13059-019-1716-1
_version_ 1783423691838717952
author Korthauer, Keegan
Kimes, Patrick K.
Duvallet, Claire
Reyes, Alejandro
Subramanian, Ayshwarya
Teng, Mingxiang
Shukla, Chinmay
Alm, Eric J.
Hicks, Stephanie C.
author_facet Korthauer, Keegan
Kimes, Patrick K.
Duvallet, Claire
Reyes, Alejandro
Subramanian, Ayshwarya
Teng, Mingxiang
Shukla, Chinmay
Alm, Eric J.
Hicks, Stephanie C.
author_sort Korthauer, Keegan
collection PubMed
description BACKGROUND: In high-throughput studies, hundreds to millions of hypotheses are typically tested. Statistical methods that control the false discovery rate (FDR) have emerged as popular and powerful tools for error rate control. While classic FDR methods use only p values as input, more modern FDR methods have been shown to increase power by incorporating complementary information as informative covariates to prioritize, weight, and group hypotheses. However, there is currently no consensus on how the modern methods compare to one another. We investigate the accuracy, applicability, and ease of use of two classic and six modern FDR-controlling methods by performing a systematic benchmark comparison using simulation studies as well as six case studies in computational biology. RESULTS: Methods that incorporate informative covariates are modestly more powerful than classic approaches, and do not underperform classic approaches, even when the covariate is completely uninformative. The majority of methods are successful at controlling the FDR, with the exception of two modern methods under certain settings. Furthermore, we find that the improvement of the modern FDR methods over the classic methods increases with the informativeness of the covariate, total number of hypothesis tests, and proportion of truly non-null hypotheses. CONCLUSIONS: Modern FDR methods that use an informative covariate provide advantages over classic FDR-controlling procedures, with the relative gain dependent on the application and informativeness of available covariates. We present our findings as a practical guide and provide recommendations to aid researchers in their choice of methods to correct for false discoveries. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13059-019-1716-1) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6547503
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-65475032019-06-06 A practical guide to methods controlling false discoveries in computational biology Korthauer, Keegan Kimes, Patrick K. Duvallet, Claire Reyes, Alejandro Subramanian, Ayshwarya Teng, Mingxiang Shukla, Chinmay Alm, Eric J. Hicks, Stephanie C. Genome Biol Research BACKGROUND: In high-throughput studies, hundreds to millions of hypotheses are typically tested. Statistical methods that control the false discovery rate (FDR) have emerged as popular and powerful tools for error rate control. While classic FDR methods use only p values as input, more modern FDR methods have been shown to increase power by incorporating complementary information as informative covariates to prioritize, weight, and group hypotheses. However, there is currently no consensus on how the modern methods compare to one another. We investigate the accuracy, applicability, and ease of use of two classic and six modern FDR-controlling methods by performing a systematic benchmark comparison using simulation studies as well as six case studies in computational biology. RESULTS: Methods that incorporate informative covariates are modestly more powerful than classic approaches, and do not underperform classic approaches, even when the covariate is completely uninformative. The majority of methods are successful at controlling the FDR, with the exception of two modern methods under certain settings. Furthermore, we find that the improvement of the modern FDR methods over the classic methods increases with the informativeness of the covariate, total number of hypothesis tests, and proportion of truly non-null hypotheses. CONCLUSIONS: Modern FDR methods that use an informative covariate provide advantages over classic FDR-controlling procedures, with the relative gain dependent on the application and informativeness of available covariates. 
We present our findings as a practical guide and provide recommendations to aid researchers in their choice of methods to correct for false discoveries. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13059-019-1716-1) contains supplementary material, which is available to authorized users. BioMed Central 2019-06-04 /pmc/articles/PMC6547503/ /pubmed/31164141 http://dx.doi.org/10.1186/s13059-019-1716-1 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Korthauer, Keegan
Kimes, Patrick K.
Duvallet, Claire
Reyes, Alejandro
Subramanian, Ayshwarya
Teng, Mingxiang
Shukla, Chinmay
Alm, Eric J.
Hicks, Stephanie C.
A practical guide to methods controlling false discoveries in computational biology
title A practical guide to methods controlling false discoveries in computational biology
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6547503/
https://www.ncbi.nlm.nih.gov/pubmed/31164141
http://dx.doi.org/10.1186/s13059-019-1716-1