A selective inference approach for false discovery rate control using multiomics covariates yields insights into disease risk

Bibliographic Details
Main Authors: Yurko, Ronald, G’Sell, Max, Roeder, Kathryn, Devlin, Bernie
Format: Online Article Text
Language: English
Published: National Academy of Sciences 2020
Subjects: Biological Sciences
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7334489/
https://www.ncbi.nlm.nih.gov/pubmed/32522875
http://dx.doi.org/10.1073/pnas.1918862117
author Yurko, Ronald
G’Sell, Max
Roeder, Kathryn
Devlin, Bernie
collection PubMed
description To correct for a large number of hypothesis tests, most researchers rely on simple multiple testing corrections. Yet, new methodologies of selective inference could potentially improve power while retaining statistical guarantees, especially those that enable exploration of test statistics using auxiliary information (covariates) to weight hypothesis tests for association. We explore one such method, adaptive P-value thresholding (AdaPT), in the framework of genome-wide association studies (GWAS) and gene expression/coexpression studies, with particular emphasis on schizophrenia (SCZ). Selected SCZ GWAS association P values play the role of the primary data for AdaPT; single-nucleotide polymorphisms (SNPs) are selected because they are gene expression quantitative trait loci (eQTLs). This natural pairing of SNPs and genes allows us to map the following covariate values to these pairs: GWAS statistics from genetically correlated bipolar disorder, the effect size of SNP genotypes on gene expression, and gene–gene coexpression, captured by subnetwork (module) membership. In all, 24 covariates per SNP/gene pair were included in the AdaPT analysis using flexible gradient boosted trees. We demonstrate a substantial increase in power to detect SCZ associations using gene expression information from the developing human prefrontal cortex. We interpret these results in light of recent theories about the polygenic nature of SCZ. Importantly, our entire process for identifying enrichment and creating features with independent complementary data sources can be implemented in many different high-throughput settings to ultimately improve power.
format Online
Article
Text
id pubmed-7334489
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher National Academy of Sciences
record_format MEDLINE/PubMed
spelling pubmed-7334489 2020-07-15 A selective inference approach for false discovery rate control using multiomics covariates yields insights into disease risk. Yurko, Ronald; G’Sell, Max; Roeder, Kathryn; Devlin, Bernie. Proc Natl Acad Sci U S A. Biological Sciences.
National Academy of Sciences 2020-06-30 2020-06-10 /pmc/articles/PMC7334489/ /pubmed/32522875 http://dx.doi.org/10.1073/pnas.1918862117 Text en Copyright © 2020 the Author(s). Published by PNAS. This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title A selective inference approach for false discovery rate control using multiomics covariates yields insights into disease risk
topic Biological Sciences
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7334489/
https://www.ncbi.nlm.nih.gov/pubmed/32522875
http://dx.doi.org/10.1073/pnas.1918862117