
Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets

Current genome-wide association studies (GWAS) use commercial genotyping microarrays that can assay over a million single nucleotide polymorphisms (SNPs). The number of SNPs is further boosted by advanced statistical genotype-imputation algorithms and large SNP databases for reference human populati...


Bibliographic Details
Main Authors:	Li, Miao-Xin, Yeung, Juilian M. Y., Cherny, Stacey S., Sham, Pak C.
Format:	Online Article Text
Language:	English
Published:	Springer-Verlag 2011
Subjects:	Original Investigation
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3325408/ https://www.ncbi.nlm.nih.gov/pubmed/22143225 http://dx.doi.org/10.1007/s00439-011-1118-2

author	Li, Miao-Xin Yeung, Juilian M. Y. Cherny, Stacey S. Sham, Pak C.
author_sort	Li, Miao-Xin
collection	PubMed
description	Current genome-wide association studies (GWAS) use commercial genotyping microarrays that can assay over a million single nucleotide polymorphisms (SNPs). The number of SNPs is further boosted by advanced statistical genotype-imputation algorithms and large SNP databases for reference human populations. The testing of a huge number of SNPs needs to be taken into account in the interpretation of statistical significance in such genome-wide studies, but this is complicated by the non-independence of SNPs because of linkage disequilibrium (LD). Several previous groups have proposed the use of the effective number of independent markers (M_e) for the adjustment of multiple testing, but current methods of calculation for M_e are limited in accuracy or computational speed. Here, we report a more robust and fast method to calculate M_e. Applying this efficient method [implemented in a free software tool named Genetic type 1 error calculator (GEC)], we systematically examined the M_e, and the corresponding p-value thresholds required to control the genome-wide type 1 error rate at 0.05, for 13 Illumina or Affymetrix genotyping arrays, as well as for HapMap Project and 1000 Genomes Project datasets which are widely used in genotype imputation as reference panels. Our results suggested the use of a p-value threshold of ~10^−7 as the criterion for genome-wide significance for early commercial genotyping arrays, but slightly more stringent p-value thresholds ~5 × 10^−8 for current or merged commercial genotyping arrays, ~10^−8 for all common SNPs in the 1000 Genomes Project dataset and ~5 × 10^−8 for the common SNPs only within genes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s00439-011-1118-2) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-3325408
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	Springer-Verlag
record_format	MEDLINE/PubMed
spelling	pubmed-3325408 2012-04-20 Hum Genet Original Investigation Springer-Verlag 2011-12-06 2012 /pmc/articles/PMC3325408/ /pubmed/22143225 http://dx.doi.org/10.1007/s00439-011-1118-2 Text en © The Author(s) 2011 https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
title	Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets
topic	Original Investigation
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3325408/ https://www.ncbi.nlm.nih.gov/pubmed/22143225 http://dx.doi.org/10.1007/s00439-011-1118-2
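
The abstract relates an effective number of independent tests (M_e) to a p-value threshold that controls the genome-wide type 1 error rate at 0.05. The sketch below illustrates only that arithmetic, assuming a Bonferroni-style correction (threshold = alpha / M_e) and, for comparison, a Sidak-style correction. The record does not say which correction GEC applies or how it estimates M_e, so the function names and the choice of corrections here are assumptions for illustration, not the authors' implementation.

```python
import math


def bonferroni_threshold(m_e, alpha=0.05):
    """Per-test p-value threshold under a Bonferroni-style correction.

    m_e   : effective number of independent tests (assumed already estimated)
    alpha : desired genome-wide type 1 error rate
    """
    if m_e <= 0:
        raise ValueError("m_e must be positive")
    return alpha / m_e


def sidak_threshold(m_e, alpha=0.05):
    """Per-test threshold under a Sidak-style correction.

    Computes 1 - (1 - alpha) ** (1 / m_e) via log1p/expm1 so that
    precision is kept when m_e is very large.
    """
    if m_e <= 0:
        raise ValueError("m_e must be positive")
    return -math.expm1(math.log1p(-alpha) / m_e)


def implied_m_e(threshold, alpha=0.05):
    """Effective number of tests implied by a Bonferroni-style threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return alpha / threshold


if __name__ == "__main__":
    # Approximate thresholds quoted in the abstract.
    for label, thr in [
        ("early arrays", 1e-7),
        ("current or merged arrays", 5e-8),
        ("1000 Genomes, all common SNPs", 1e-8),
    ]:
        print(f"{label}: threshold {thr:.0e} implies M_e of about {implied_m_e(thr):,.0f}")
```

Under the Bonferroni assumption, the approximate thresholds in the abstract correspond to effective test counts on the order of hundreds of thousands to a few million. The Sidak variant gives a slightly less stringent threshold for the same M_e, and the two are close when alpha is small.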


Similar Items