Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics

Genome-wide Association Studies (GWAS) result in millions of summary statistics (“z-scores”) for single nucleotide polymorphism (SNP) associations with phenotypes. These rich datasets afford deep insights into the nature and extent of genetic contributions to complex phenotypes such as psychiatric d...

Bibliographic Details
Main authors: Holland, Dominic; Wang, Yunpeng; Thompson, Wesley K.; Schork, Andrew; Chen, Chi-Hua; Lo, Min-Tzu; Witoelar, Aree; Werge, Thomas; O'Donovan, Michael; Andreassen, Ole A.; Dale, Anders M.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2016
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4754432/
https://www.ncbi.nlm.nih.gov/pubmed/26909100
http://dx.doi.org/10.3389/fgene.2016.00015
author Holland, Dominic
Wang, Yunpeng
Thompson, Wesley K.
Schork, Andrew
Chen, Chi-Hua
Lo, Min-Tzu
Witoelar, Aree
Werge, Thomas
O'Donovan, Michael
Andreassen, Ole A.
Dale, Anders M.
collection PubMed
description Genome-wide Association Studies (GWAS) result in millions of summary statistics (“z-scores”) for single nucleotide polymorphism (SNP) associations with phenotypes. These rich datasets afford deep insights into the nature and extent of genetic contributions to complex phenotypes such as psychiatric disorders, which are understood to have substantial genetic components that arise from very large numbers of SNPs. The complexity of the datasets, however, poses a significant challenge to maximizing their utility. This is reflected in a need for better understanding the landscape of z-scores, as such knowledge would enhance causal SNP and gene discovery, help elucidate mechanistic pathways, and inform future study design. Here we present a parsimonious methodology for modeling effect sizes and replication probabilities, relying only on summary statistics from GWAS substudies, and a scheme allowing for direct empirical validation. We show that modeling z-scores as a mixture of Gaussians is conceptually appropriate, in particular taking into account ubiquitous non-null effects that are likely in the datasets due to weak linkage disequilibrium with causal SNPs. The four-parameter model allows for estimating the degree of polygenicity of the phenotype and predicting the proportion of chip heritability explainable by genome-wide significant SNPs in future studies with larger sample sizes. We apply the model to recent GWAS of schizophrenia (N = 82,315) and putamen volume (N = 12,596), with approximately 9.3 million SNP z-scores in both cases. We show that, over a broad range of z-scores and sample sizes, the model accurately predicts expectation estimates of true effect sizes and replication probabilities in multistage GWAS designs. We assess the degree to which effect sizes are over-estimated when based on linear-regression association coefficients. 
We estimate the polygenicity of schizophrenia to be 0.037 and the putamen to be 0.001, while the respective sample sizes required to approach fully explaining the chip heritability are 10^6 and 10^5. The model can be extended to incorporate prior knowledge such as pleiotropy and SNP annotation. The current findings suggest that the model is applicable to a broad array of complex phenotypes and will enhance understanding of their genetic architectures.
format Online
Article
Text
id pubmed-4754432
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-4754432 2016-02-23 Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics Holland, Dominic Wang, Yunpeng Thompson, Wesley K. Schork, Andrew Chen, Chi-Hua Lo, Min-Tzu Witoelar, Aree Werge, Thomas O'Donovan, Michael Andreassen, Ole A. Dale, Anders M. Front Genet Genetics Frontiers Media S.A. 2016-02-16 /pmc/articles/PMC4754432/ /pubmed/26909100 http://dx.doi.org/10.3389/fgene.2016.00015 Text en
Copyright © 2016 Holland, Wang, Thompson, Schork, Chen, Lo, Witoelar, Werge, O'Donovan, Andreassen and Dale. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics
topic Genetics