Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics
Genome-wide Association Studies (GWAS) result in millions of summary statistics (“z-scores”) for single nucleotide polymorphism (SNP) associations with phenotypes. These rich datasets afford deep insights into the nature and extent of genetic contributions to complex phenotypes such as psychiatric d...
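Main Authors: | Holland, Dominic, Wang, Yunpeng, Thompson, Wesley K., Schork, Andrew, Chen, Chi-Hua, Lo, Min-Tzu, Witoelar, Aree, Werge, Thomas, O'Donovan, Michael, Andreassen, Ole A., Dale, Anders M. |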
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2016 |
Subjects: | Genetics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4754432/ https://www.ncbi.nlm.nih.gov/pubmed/26909100 http://dx.doi.org/10.3389/fgene.2016.00015 |
_version_ | 1782416016696934400 |
---|---|
author | Holland, Dominic Wang, Yunpeng Thompson, Wesley K. Schork, Andrew Chen, Chi-Hua Lo, Min-Tzu Witoelar, Aree Werge, Thomas O'Donovan, Michael Andreassen, Ole A. Dale, Anders M. |
author_facet | Holland, Dominic Wang, Yunpeng Thompson, Wesley K. Schork, Andrew Chen, Chi-Hua Lo, Min-Tzu Witoelar, Aree Werge, Thomas O'Donovan, Michael Andreassen, Ole A. Dale, Anders M. |
author_sort | Holland, Dominic |
collection | PubMed |
description | Genome-wide Association Studies (GWAS) result in millions of summary statistics (“z-scores”) for single nucleotide polymorphism (SNP) associations with phenotypes. These rich datasets afford deep insights into the nature and extent of genetic contributions to complex phenotypes such as psychiatric disorders, which are understood to have substantial genetic components that arise from very large numbers of SNPs. The complexity of the datasets, however, poses a significant challenge to maximizing their utility. This is reflected in a need for better understanding the landscape of z-scores, as such knowledge would enhance causal SNP and gene discovery, help elucidate mechanistic pathways, and inform future study design. Here we present a parsimonious methodology for modeling effect sizes and replication probabilities, relying only on summary statistics from GWAS substudies, and a scheme allowing for direct empirical validation. We show that modeling z-scores as a mixture of Gaussians is conceptually appropriate, in particular taking into account ubiquitous non-null effects that are likely in the datasets due to weak linkage disequilibrium with causal SNPs. The four-parameter model allows for estimating the degree of polygenicity of the phenotype and predicting the proportion of chip heritability explainable by genome-wide significant SNPs in future studies with larger sample sizes. We apply the model to recent GWAS of schizophrenia (N = 82,315) and putamen volume (N = 12,596), with approximately 9.3 million SNP z-scores in both cases. We show that, over a broad range of z-scores and sample sizes, the model accurately predicts expectation estimates of true effect sizes and replication probabilities in multistage GWAS designs. We assess the degree to which effect sizes are over-estimated when based on linear-regression association coefficients. 
We estimate the polygenicity of schizophrenia to be 0.037 and the putamen to be 0.001, while the respective sample sizes required to approach fully explaining the chip heritability are 10^6 and 10^5. The model can be extended to incorporate prior knowledge such as pleiotropy and SNP annotation. The current findings suggest that the model is applicable to a broad array of complex phenotypes and will enhance understanding of their genetic architectures. |
format | Online Article Text |
id | pubmed-4754432 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-4754432 2016-02-23 Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics Holland, Dominic Wang, Yunpeng Thompson, Wesley K. Schork, Andrew Chen, Chi-Hua Lo, Min-Tzu Witoelar, Aree Werge, Thomas O'Donovan, Michael Andreassen, Ole A. Dale, Anders M. Front Genet Genetics Frontiers Media S.A. 2016-02-16 /pmc/articles/PMC4754432/ /pubmed/26909100 http://dx.doi.org/10.3389/fgene.2016.00015 Text en Copyright © 2016 Holland, Wang, Thompson, Schork, Chen, Lo, Witoelar, Werge, O'Donovan, Andreassen and Dale. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Holland, Dominic Wang, Yunpeng Thompson, Wesley K. Schork, Andrew Chen, Chi-Hua Lo, Min-Tzu Witoelar, Aree Werge, Thomas O'Donovan, Michael Andreassen, Ole A. Dale, Anders M. Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics |
title | Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics |
title_full | Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics |
title_fullStr | Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics |
title_full_unstemmed | Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics |
title_short | Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics |
title_sort | estimating effect sizes and expected replication probabilities from gwas summary statistics |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4754432/ https://www.ncbi.nlm.nih.gov/pubmed/26909100 http://dx.doi.org/10.3389/fgene.2016.00015 |
work_keys_str_mv | AT hollanddominic estimatingeffectsizesandexpectedreplicationprobabilitiesfromgwassummarystatistics AT wangyunpeng estimatingeffectsizesandexpectedreplicationprobabilitiesfromgwassummarystatistics AT thompsonwesleyk estimatingeffectsizesandexpectedreplicationprobabilitiesfromgwassummarystatistics AT schorkandrew estimatingeffectsizesandexpectedreplicationprobabilitiesfromgwassummarystatistics AT chenchihua estimatingeffectsizesandexpectedreplicationprobabilitiesfromgwassummarystatistics AT lomintzu estimatingeffectsizesandexpectedreplicationprobabilitiesfromgwassummarystatistics AT witoelararee estimatingeffectsizesandexpectedreplicationprobabilitiesfromgwassummarystatistics AT estimatingeffectsizesandexpectedreplicationprobabilitiesfromgwassummarystatistics AT estimatingeffectsizesandexpectedreplicationprobabilitiesfromgwassummarystatistics AT wergethomas estimatingeffectsizesandexpectedreplicationprobabilitiesfromgwassummarystatistics AT odonovanmichael estimatingeffectsizesandexpectedreplicationprobabilitiesfromgwassummarystatistics AT andreassenolea estimatingeffectsizesandexpectedreplicationprobabilitiesfromgwassummarystatistics AT daleandersm estimatingeffectsizesandexpectedreplicationprobabilitiesfromgwassummarystatistics |