Rapid and Accurate Multiple Testing Correction and Power Estimation for Millions of Correlated Markers

With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold s...


Bibliographic Details
Main authors: Han, Buhm; Kang, Hyun Min; Eskin, Eleazar
Format: Text
Language: English
Published: Public Library of Science 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2663787/
https://www.ncbi.nlm.nih.gov/pubmed/19381255
http://dx.doi.org/10.1371/journal.pgen.1000456
author Han, Buhm
Kang, Hyun Min
Eskin, Eleazar
collection PubMed
description With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for these large datasets. Recently, several studies proposed efficient alternative approaches to the permutation test based on the multivariate normal distribution (MVN). However, they cannot accurately correct for multiple testing in genome-wide association studies for two reasons. First, these methods require partitioning of the genome into many disjoint blocks and ignore all correlations between markers from different blocks. Second, the true null distribution of the test statistic often fails to follow the asymptotic distribution at the tails of the distribution. We propose an accurate and efficient method for multiple testing correction in genome-wide association studies—SLIDE. Our method accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than the error rate of the previous MVN-based methods' corrected p-values, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose an efficient and accurate power estimation method SLIP. SLIP and SLIDE are available at http://slide.cs.ucla.edu.
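The MVN idea summarized in the description can be illustrated with a small sketch. This is not SLIDE and not the authors' code: it samples null test statistics from a multivariate normal distribution with a given marker correlation matrix and estimates the multiple-testing-corrected p-value as the probability that the largest absolute statistic exceeds the pointwise threshold. The function names, the toy AR(1)-style correlation matrix, and the sample sizes are assumptions made for illustration; the sliding-window handling and the tail correction described above are not implemented here.

```python
from statistics import NormalDist

import numpy as np


def ar1_correlation(n_markers, rho):
    """Toy correlation matrix with corr[i, j] = rho ** |i - j|.

    Hypothetical stand-in for real linkage disequilibrium between markers.
    """
    idx = np.arange(n_markers)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def mvn_corrected_pvalue(corr, pointwise_p, n_samples=200_000, seed=0):
    """Monte Carlo estimate of P(max_i |Z_i| >= z*) for Z ~ N(0, corr).

    z* is the two-sided standard normal threshold matching pointwise_p.
    This relies on the asymptotic normal null distribution only.
    """
    threshold = NormalDist().inv_cdf(1.0 - pointwise_p / 2.0)
    rng = np.random.default_rng(seed)
    # Correlate independent normal draws through the Cholesky factor of corr.
    chol = np.linalg.cholesky(corr)
    z = rng.standard_normal((n_samples, corr.shape[0])) @ chol.T
    # Fraction of null draws whose strongest marker passes the threshold.
    return float(np.mean(np.abs(z).max(axis=1) >= threshold))


if __name__ == "__main__":
    p = 0.01
    independent = mvn_corrected_pvalue(np.eye(10), p)
    correlated = mvn_corrected_pvalue(ar1_correlation(10, 0.9), p)
    print("independent markers:", independent)
    print("correlated markers: ", correlated)
```

With independent markers the estimate should sit close to the Sidak value 1 - (1 - p)^m, while correlated markers give a smaller corrected p-value because they behave like fewer effective tests. A permutation test would estimate the same quantity by reshuffling phenotype labels, which is what makes it costly at genome scale.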
format Text
id pubmed-2663787
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2663787 2009-04-17 Rapid and Accurate Multiple Testing Correction and Power Estimation for Millions of Correlated Markers Han, Buhm; Kang, Hyun Min; Eskin, Eleazar PLoS Genet Research Article Public Library of Science 2009-04-17 /pmc/articles/PMC2663787/ /pubmed/19381255 http://dx.doi.org/10.1371/journal.pgen.1000456 Text en Han et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Rapid and Accurate Multiple Testing Correction and Power Estimation for Millions of Correlated Markers
topic Research Article