Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip

Genome-wide association studies are revolutionizing the search for the genes underlying human complex diseases. The main decisions to be made at the design stage of these studies are the choice of the commercial genotyping chip to be used and the numbers of case and control samples to be genotyped....

Bibliographic Details
Main authors: Spencer, Chris C. A., Su, Zhan, Donnelly, Peter, Marchini, Jonathan
Format: Text
Language: English
Published: Public Library of Science 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2688469/
https://www.ncbi.nlm.nih.gov/pubmed/19492015
http://dx.doi.org/10.1371/journal.pgen.1000477
author Spencer, Chris C. A.
Su, Zhan
Donnelly, Peter
Marchini, Jonathan
author_facet Spencer, Chris C. A.
Su, Zhan
Donnelly, Peter
Marchini, Jonathan
author_sort Spencer, Chris C. A.
collection PubMed
description Genome-wide association studies are revolutionizing the search for the genes underlying human complex diseases. The main decisions to be made at the design stage of these studies are the choice of the commercial genotyping chip to be used and the numbers of case and control samples to be genotyped. The most common method of comparing different chips is using a measure of coverage, but this fails to properly account for the effects of sample size, the genetic model of the disease, and linkage disequilibrium between SNPs. In this paper, we argue that the statistical power to detect a causative variant should be the major criterion in study design. Because of the complicated pattern of linkage disequilibrium (LD) in the human genome, power cannot be calculated analytically and must instead be assessed by simulation. We describe in detail a method of simulating case-control samples at a set of linked SNPs that replicates the patterns of LD in human populations, and we used it to assess power for a comprehensive set of available genotyping chips. Our results allow us to compare the performance of the chips to detect variants with different effect sizes and allele frequencies, look at how power changes with sample size in different populations or when using multi-marker tags and genotype imputation approaches, and how performance compares to a hypothetical chip that contains every SNP in HapMap. A main conclusion of this study is that marked differences in genome coverage may not translate into appreciable differences in power and that, when taking budgetary considerations into account, the most powerful design may not always correspond to the chip with the highest coverage. We also show that genotype imputation can be used to boost the power of many chips up to the level obtained from a hypothetical “complete” chip containing all the SNPs in HapMap. 
Our results have been encapsulated into an R software package that allows users to design future association studies and our methods provide a framework with which new chip sets can be evaluated.
format Text
id pubmed-2688469
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2688469 2009-06-02 Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip Spencer, Chris C. A. Su, Zhan Donnelly, Peter Marchini, Jonathan PLoS Genet Research Article Public Library of Science 2009-05-15 /pmc/articles/PMC2688469/ /pubmed/19492015 http://dx.doi.org/10.1371/journal.pgen.1000477 Text en Spencer et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2688469/
https://www.ncbi.nlm.nih.gov/pubmed/19492015
http://dx.doi.org/10.1371/journal.pgen.1000477