Estimating Genome-Wide Significance for Whole-Genome Sequencing Studies

Although a standard genome-wide significance level has been accepted for the testing of association between common genetic variants and disease, the era of whole-genome sequencing (WGS) requires a new threshold. The allele frequency spectrum of sequence-identified variants is very different from com...


Bibliographic Details
Main Authors: Xu, ChangJiang; Tachmazidou, Ioanna; Walter, Klaudia; Ciampi, Antonio; Zeggini, Eleftheria; Greenwood, Celia M T
Format: Online Article Text
Language: English
Published: John Wiley & Sons, Ltd 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4489336/
https://www.ncbi.nlm.nih.gov/pubmed/24676807
http://dx.doi.org/10.1002/gepi.21797
author Xu, ChangJiang
Tachmazidou, Ioanna
Walter, Klaudia
Ciampi, Antonio
Zeggini, Eleftheria
Greenwood, Celia M T
collection PubMed
description Although a standard genome-wide significance level has been accepted for the testing of association between common genetic variants and disease, the era of whole-genome sequencing (WGS) requires a new threshold. The allele frequency spectrum of sequence-identified variants is very different from common variants, and the identified rare genetic variation is usually jointly analyzed in a series of genomic windows or regions. In nearby or overlapping windows, these test statistics will be correlated, and the degree of correlation is likely to depend on the choice of window size, overlap, and the test statistic. Furthermore, multiple analyses may be performed using different windows or test statistics. Here we propose an empirical approach for estimating genome-wide significance thresholds for data arising from WGS studies, and we demonstrate that the empirical threshold can be efficiently estimated by extrapolating from calculations performed on a small genomic region. Because analysis of WGS may need to be repeated with different choices of test statistics or windows, this prediction approach makes it computationally feasible to estimate genome-wide significance thresholds for different analysis choices. Based on UK10K whole-genome sequence data, we derive genome-wide significance thresholds ranging between 2.5 × 10⁻⁸ and 8 × 10⁻⁸ for our analytic choices in window-based testing, and thresholds of 0.6 × 10⁻⁸–1.5 × 10⁻⁸ for a combined analytic strategy of testing common variants using single-SNP tests together with rare variants analyzed with our sliding-window test strategy.
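The idea of an empirical threshold for correlated window tests can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a simulated null in which z-scores of adjacent windows follow an AR(1) correlation, and it takes the threshold as the alpha-quantile of the smallest p-value per null replicate. The function names, the correlation model, and all parameter values are hypothetical, and the extrapolation from a small region to the whole genome described in the abstract is not implemented here.

```python
import random
from statistics import NormalDist


def empirical_threshold(min_pvalues, alpha=0.05):
    """Return a simple order-statistic estimate of the alpha-quantile
    of the null distribution of the minimum p-value."""
    ordered = sorted(min_pvalues)
    # Position of the order statistic with about alpha * n values at or below it.
    k = max(int(alpha * len(ordered)) - 1, 0)
    return ordered[k]


def simulate_min_pvalues(n_windows, rho, n_replicates, seed=0):
    """Assumed null model: z-scores of adjacent windows have AR(1)
    correlation rho. Records the smallest two-sided p-value per replicate."""
    rng = random.Random(seed)
    cdf = NormalDist().cdf
    scale = (1.0 - rho * rho) ** 0.5  # keeps each z-score at unit variance
    minima = []
    for _ in range(n_replicates):
        z = rng.gauss(0.0, 1.0)
        smallest = 1.0
        for _ in range(n_windows):
            p = 2.0 * (1.0 - cdf(abs(z)))
            smallest = min(smallest, p)
            # Next window shares a fraction rho of the current statistic.
            z = rho * z + scale * rng.gauss(0.0, 1.0)
        minima.append(smallest)
    return minima


if __name__ == "__main__":
    alpha = 0.05
    n_windows = 200  # hypothetical region size, far smaller than a genome
    minima = simulate_min_pvalues(n_windows, rho=0.8, n_replicates=2000)
    print("empirical threshold:", empirical_threshold(minima, alpha))
    print("Bonferroni threshold:", alpha / n_windows)
```

Comparing the two printed values shows how correlation between windows affects the threshold relative to a Bonferroni correction that treats every window as an independent test.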
format Online
Article
Text
id pubmed-4489336
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher John Wiley & Sons, Ltd
record_format MEDLINE/PubMed
spelling pubmed-4489336 2015-07-07 Genet Epidemiol Research Articles
John Wiley & Sons, Ltd 2014-04 2014-02-14 /pmc/articles/PMC4489336/ /pubmed/24676807 http://dx.doi.org/10.1002/gepi.21797 Text en © 2014 The Authors. Genetic Epidemiology published by Wiley Periodicals, Inc. http://creativecommons.org/licenses/by/4.0/ This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Estimating Genome-Wide Significance for Whole-Genome Sequencing Studies
topic Research Articles