A practical guide to methods controlling false discoveries in computational biology
Main authors: | Korthauer, Keegan; Kimes, Patrick K.; Duvallet, Claire; Reyes, Alejandro; Subramanian, Ayshwarya; Teng, Mingxiang; Shukla, Chinmay; Alm, Eric J.; Hicks, Stephanie C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2019 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6547503/ https://www.ncbi.nlm.nih.gov/pubmed/31164141 http://dx.doi.org/10.1186/s13059-019-1716-1 |
author | Korthauer, Keegan; Kimes, Patrick K.; Duvallet, Claire; Reyes, Alejandro; Subramanian, Ayshwarya; Teng, Mingxiang; Shukla, Chinmay; Alm, Eric J.; Hicks, Stephanie C. |
---|---|
collection | PubMed |
description | BACKGROUND: In high-throughput studies, hundreds to millions of hypotheses are typically tested. Statistical methods that control the false discovery rate (FDR) have emerged as popular and powerful tools for error rate control. While classic FDR methods use only p values as input, more modern FDR methods have been shown to increase power by incorporating complementary information as informative covariates to prioritize, weight, and group hypotheses. However, there is currently no consensus on how the modern methods compare to one another. We investigate the accuracy, applicability, and ease of use of two classic and six modern FDR-controlling methods by performing a systematic benchmark comparison using simulation studies as well as six case studies in computational biology. RESULTS: Methods that incorporate informative covariates are modestly more powerful than classic approaches, and do not underperform classic approaches, even when the covariate is completely uninformative. The majority of methods are successful at controlling the FDR, with the exception of two modern methods under certain settings. Furthermore, we find that the improvement of the modern FDR methods over the classic methods increases with the informativeness of the covariate, total number of hypothesis tests, and proportion of truly non-null hypotheses. CONCLUSIONS: Modern FDR methods that use an informative covariate provide advantages over classic FDR-controlling procedures, with the relative gain dependent on the application and informativeness of available covariates. We present our findings as a practical guide and provide recommendations to aid researchers in their choice of methods to correct for false discoveries. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13059-019-1716-1) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6547503 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | Genome Biol |
published | 2019-06-04 |
license | © The Author(s) 2019. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | A practical guide to methods controlling false discoveries in computational biology |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6547503/ https://www.ncbi.nlm.nih.gov/pubmed/31164141 http://dx.doi.org/10.1186/s13059-019-1716-1 |
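
The abstract contrasts classic FDR methods, which take only p values as input, with modern methods that use an informative covariate to prioritize, weight, or group hypotheses. The record itself contains no code, so the following is only a minimal sketch of that contrast under stated assumptions. `bh_adjust` and `bh_reject` implement the standard Benjamini-Hochberg step-up procedure, a well-known classic method that uses p values alone. `weighted_bh_reject` implements p-value-weighted BH, one simple way a covariate can "weight" hypotheses; it is not presented as any of the specific modern methods benchmarked in the paper. All function names are hypothetical, the weights are assumed to be derived from a covariate beforehand without looking at the p values, and the p values and weights in the usage block are made-up illustrative numbers, not data from the study.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    adj_(i) = min over j >= i of min(1, m * p_(j) / j), where p_(1) <= ... <= p_(m).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by increasing p
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 caps every adjusted value at 1
    # Walk from the largest p-value down so each value is the min over larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, m * pvalues[i] / rank)
        adjusted[i] = running_min
    return adjusted


def bh_reject(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Classic BH: uses only the p-values; rejects where the adjusted p <= alpha."""
    return [a <= alpha for a in bh_adjust(pvalues)]


def weighted_bh_reject(
    pvalues: Sequence[float], weights: Sequence[float], alpha: float = 0.05
) -> List[bool]:
    """Weighted BH (sketch): run BH on p_i / w_i after rescaling weights to mean 1.

    Assumes the weights come from a covariate chosen independently of the p-values.
    """
    m = len(pvalues)
    if len(weights) != m:
        raise ValueError("need one weight per p-value")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    # Rescale so the weights average to 1; larger weight = easier to reject.
    scaled = [w * m / total for w in weights]
    # A zero weight means that hypothesis can never be rejected.
    q = [p / w if w > 0 else float("inf") for p, w in zip(pvalues, scaled)]
    return bh_reject(q, alpha)


if __name__ == "__main__":
    # Illustrative p-values only (not from the paper).
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    # Hypothetical covariate-derived weights favoring the first five tests.
    w = [1.8] * 5 + [0.2] * 5
    print(sum(bh_reject(p)))              # 2 rejections
    print(sum(weighted_bh_reject(p, w)))  # 5 rejections
```

With these made-up inputs, plain BH rejects two hypotheses while the weighted version rejects five, because the weights shift the rejection budget toward the tests the covariate flags as promising. If the covariate were uninformative, the weights would be uniform, and weighted BH would reduce exactly to plain BH.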