Cargando…

Fast and covariate-adaptive method amplifies detection power in large-scale multiple hypothesis testing

Multiple hypothesis testing is an essential component of modern data science. In many settings, in addition to the p-value, additional covariates for each hypothesis are available, e.g., functional annotation of variants in genome-wide association studies. Such information is ignored by popular mult...

Descripción completa

Detalles Bibliográficos
Autores principales: Zhang, Martin J., Xia, Fei, Zou, James
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Nature Publishing Group UK 2019
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6668431/
https://www.ncbi.nlm.nih.gov/pubmed/31366926
http://dx.doi.org/10.1038/s41467-019-11247-0
Descripción
Sumario:Multiple hypothesis testing is an essential component of modern data science. In many settings, in addition to the p-value, additional covariates for each hypothesis are available, e.g., functional annotation of variants in genome-wide association studies. Such information is ignored by popular multiple testing approaches such as the Benjamini-Hochberg procedure (BH). Here we introduce AdaFDR, a fast and flexible method that adaptively learns the optimal p-value threshold from covariates to significantly improve detection power. On eQTL analysis of the GTEx data, AdaFDR discovers 32% more associations than BH at the same false discovery rate. We prove that AdaFDR controls false discovery proportion and show that it makes substantially more discoveries while controlling false discovery rate (FDR) in extensive experiments. AdaFDR is computationally efficient and allows multi-dimensional covariates with both numeric and categorical values, making it broadly useful across many applications.