Calibrating the Performance of SNP Arrays for Whole-Genome Association Studies

To facilitate whole-genome association studies (WGAS), several high-density SNP genotyping arrays have been developed. Genetic coverage and statistical power are the primary benchmark metrics in evaluating the performance of SNP arrays. Ideally, such evaluations would be done on a SNP set and a coho...

Bibliographic Details
Main authors: Hao, Ke; Schadt, Eric E.; Storey, John D.
Format: Text
Language: English
Published: Public Library of Science 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2432039/
https://www.ncbi.nlm.nih.gov/pubmed/18584036
http://dx.doi.org/10.1371/journal.pgen.1000109
author Hao, Ke
Schadt, Eric E.
Storey, John D.
author_facet Hao, Ke
Schadt, Eric E.
Storey, John D.
author_sort Hao, Ke
collection PubMed
description To facilitate whole-genome association studies (WGAS), several high-density SNP genotyping arrays have been developed. Genetic coverage and statistical power are the primary benchmark metrics in evaluating the performance of SNP arrays. Ideally, such evaluations would be done on a SNP set and a cohort of individuals that are both independently sampled from the original SNPs and individuals used in developing the arrays. Without utilization of an independent test set, previous estimates of genetic coverage and statistical power may be subject to an overfitting bias. Additionally, the SNP arrays' statistical power in WGAS has not been systematically assessed on real traits. One robust setting for doing so is to evaluate statistical power on thousands of traits measured from a single set of individuals. In this study, 359 newly sampled Americans of European descent were genotyped using both Affymetrix 500K (Affx500K) and Illumina 650Y (Ilmn650K) SNP arrays. From these data, we were able to obtain estimates of genetic coverage, which are robust to overfitting, by constructing an independent test set from among these genotypes and individuals. Furthermore, we collected liver tissue RNA from the participants and profiled these samples on a comprehensive gene expression microarray. The RNA levels were used as a large-scale set of quantitative traits to calibrate the relative statistical power of the commercial arrays. Our genetic coverage estimates are lower than previous reports, providing evidence that previous estimates may be inflated due to overfitting. The Ilmn650K platform showed reasonable power (50% or greater) to detect SNPs associated with quantitative traits when the signal-to-noise ratio (SNR) is greater than or equal to 0.5 and the causal SNP's minor allele frequency (MAF) is greater than or equal to 20% (N = 359). 
In testing each of the more than 40,000 gene expression traits for association to each of the SNPs on the Ilmn650K and Affx500K arrays, we found that the Ilmn650K yielded 15% more discoveries than the Affx500K at the same false discovery rate (FDR) level.
format Text
id pubmed-2432039
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-24320392008-06-27 Calibrating the Performance of SNP Arrays for Whole-Genome Association Studies Hao, Ke Schadt, Eric E. Storey, John D. PLoS Genet Research Article To facilitate whole-genome association studies (WGAS), several high-density SNP genotyping arrays have been developed. Genetic coverage and statistical power are the primary benchmark metrics in evaluating the performance of SNP arrays. Ideally, such evaluations would be done on a SNP set and a cohort of individuals that are both independently sampled from the original SNPs and individuals used in developing the arrays. Without utilization of an independent test set, previous estimates of genetic coverage and statistical power may be subject to an overfitting bias. Additionally, the SNP arrays' statistical power in WGAS has not been systematically assessed on real traits. One robust setting for doing so is to evaluate statistical power on thousands of traits measured from a single set of individuals. In this study, 359 newly sampled Americans of European descent were genotyped using both Affymetrix 500K (Affx500K) and Illumina 650Y (Ilmn650K) SNP arrays. From these data, we were able to obtain estimates of genetic coverage, which are robust to overfitting, by constructing an independent test set from among these genotypes and individuals. Furthermore, we collected liver tissue RNA from the participants and profiled these samples on a comprehensive gene expression microarray. The RNA levels were used as a large-scale set of quantitative traits to calibrate the relative statistical power of the commercial arrays. Our genetic coverage estimates are lower than previous reports, providing evidence that previous estimates may be inflated due to overfitting. 
The Ilmn650K platform showed reasonable power (50% or greater) to detect SNPs associated with quantitative traits when the signal-to-noise ratio (SNR) is greater than or equal to 0.5 and the causal SNP's minor allele frequency (MAF) is greater than or equal to 20% (N = 359). In testing each of the more than 40,000 gene expression traits for association to each of the SNPs on the Ilmn650K and Affx500K arrays, we found that the Ilmn650K yielded 15% more discoveries than the Affx500K at the same false discovery rate (FDR) level. Public Library of Science 2008-06-27 /pmc/articles/PMC2432039/ /pubmed/18584036 http://dx.doi.org/10.1371/journal.pgen.1000109 Text en Hao et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Hao, Ke
Schadt, Eric E.
Storey, John D.
Calibrating the Performance of SNP Arrays for Whole-Genome Association Studies
title Calibrating the Performance of SNP Arrays for Whole-Genome Association Studies
title_full Calibrating the Performance of SNP Arrays for Whole-Genome Association Studies
title_fullStr Calibrating the Performance of SNP Arrays for Whole-Genome Association Studies
title_full_unstemmed Calibrating the Performance of SNP Arrays for Whole-Genome Association Studies
title_short Calibrating the Performance of SNP Arrays for Whole-Genome Association Studies
title_sort calibrating the performance of snp arrays for whole-genome association studies
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2432039/
https://www.ncbi.nlm.nih.gov/pubmed/18584036
http://dx.doi.org/10.1371/journal.pgen.1000109