Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip
Genome-wide association studies are revolutionizing the search for the genes underlying human complex diseases. The main decisions to be made at the design stage of these studies are the choice of the commercial genotyping chip to be used and the numbers of case and control samples to be genotyped....
Main authors: | Spencer, Chris C. A.; Su, Zhan; Donnelly, Peter; Marchini, Jonathan |
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science 2009 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2688469/ https://www.ncbi.nlm.nih.gov/pubmed/19492015 http://dx.doi.org/10.1371/journal.pgen.1000477 |
_version_ | 1782167704374870016 |
---|---|
author | Spencer, Chris C. A. Su, Zhan Donnelly, Peter Marchini, Jonathan |
author_facet | Spencer, Chris C. A. Su, Zhan Donnelly, Peter Marchini, Jonathan |
author_sort | Spencer, Chris C. A. |
collection | PubMed |
description | Genome-wide association studies are revolutionizing the search for the genes underlying human complex diseases. The main decisions to be made at the design stage of these studies are the choice of the commercial genotyping chip to be used and the numbers of case and control samples to be genotyped. The most common method of comparing different chips is using a measure of coverage, but this fails to properly account for the effects of sample size, the genetic model of the disease, and linkage disequilibrium between SNPs. In this paper, we argue that the statistical power to detect a causative variant should be the major criterion in study design. Because of the complicated pattern of linkage disequilibrium (LD) in the human genome, power cannot be calculated analytically and must instead be assessed by simulation. We describe in detail a method of simulating case-control samples at a set of linked SNPs that replicates the patterns of LD in human populations, and we used it to assess power for a comprehensive set of available genotyping chips. Our results allow us to compare the performance of the chips to detect variants with different effect sizes and allele frequencies, look at how power changes with sample size in different populations or when using multi-marker tags and genotype imputation approaches, and how performance compares to a hypothetical chip that contains every SNP in HapMap. A main conclusion of this study is that marked differences in genome coverage may not translate into appreciable differences in power and that, when taking budgetary considerations into account, the most powerful design may not always correspond to the chip with the highest coverage. We also show that genotype imputation can be used to boost the power of many chips up to the level obtained from a hypothetical “complete” chip containing all the SNPs in HapMap. Our results have been encapsulated into an R software package that allows users to design future association studies and our methods provide a framework with which new chip sets can be evaluated. |
format | Text |
id | pubmed-2688469 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-2688469 2009-06-02 Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip Spencer, Chris C. A. Su, Zhan Donnelly, Peter Marchini, Jonathan PLoS Genet Research Article Genome-wide association studies are revolutionizing the search for the genes underlying human complex diseases. The main decisions to be made at the design stage of these studies are the choice of the commercial genotyping chip to be used and the numbers of case and control samples to be genotyped. The most common method of comparing different chips is using a measure of coverage, but this fails to properly account for the effects of sample size, the genetic model of the disease, and linkage disequilibrium between SNPs. In this paper, we argue that the statistical power to detect a causative variant should be the major criterion in study design. Because of the complicated pattern of linkage disequilibrium (LD) in the human genome, power cannot be calculated analytically and must instead be assessed by simulation. We describe in detail a method of simulating case-control samples at a set of linked SNPs that replicates the patterns of LD in human populations, and we used it to assess power for a comprehensive set of available genotyping chips. Our results allow us to compare the performance of the chips to detect variants with different effect sizes and allele frequencies, look at how power changes with sample size in different populations or when using multi-marker tags and genotype imputation approaches, and how performance compares to a hypothetical chip that contains every SNP in HapMap. A main conclusion of this study is that marked differences in genome coverage may not translate into appreciable differences in power and that, when taking budgetary considerations into account, the most powerful design may not always correspond to the chip with the highest coverage. We also show that genotype imputation can be used to boost the power of many chips up to the level obtained from a hypothetical “complete” chip containing all the SNPs in HapMap. Our results have been encapsulated into an R software package that allows users to design future association studies and our methods provide a framework with which new chip sets can be evaluated. Public Library of Science 2009-05-15 /pmc/articles/PMC2688469/ /pubmed/19492015 http://dx.doi.org/10.1371/journal.pgen.1000477 Text en Spencer et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Spencer, Chris C. A. Su, Zhan Donnelly, Peter Marchini, Jonathan Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip |
title | Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip |
title_full | Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip |
title_fullStr | Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip |
title_full_unstemmed | Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip |
title_short | Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip |
title_sort | designing genome-wide association studies: sample size, power, imputation, and the choice of genotyping chip |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2688469/ https://www.ncbi.nlm.nih.gov/pubmed/19492015 http://dx.doi.org/10.1371/journal.pgen.1000477 |
work_keys_str_mv | AT spencerchrisca designinggenomewideassociationstudiessamplesizepowerimputationandthechoiceofgenotypingchip AT suzhan designinggenomewideassociationstudiessamplesizepowerimputationandthechoiceofgenotypingchip AT donnellypeter designinggenomewideassociationstudiessamplesizepowerimputationandthechoiceofgenotypingchip AT marchinijonathan designinggenomewideassociationstudiessamplesizepowerimputationandthechoiceofgenotypingchip |
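
The description in this record argues that power should be assessed by simulation rather than analytically. As a rough illustration of that idea only, the Python sketch below estimates power by Monte Carlo for a single, directly typed causal SNP. It is not the authors' method or their R package: it ignores linkage disequilibrium, genotyping chips, and imputation entirely. It assumes a multiplicative risk model, a rare disease (so the control allele frequency is taken to equal the population frequency), and an allelic chi-square test. All function names, parameters, and the default significance threshold `alpha` are hypothetical choices made for this example.

```python
import math
import random


def case_allele_freq(p, rr):
    """Risk-allele frequency in cases, assuming a multiplicative model
    with per-allele relative risk rr and population frequency p."""
    return p * rr / (p * rr + (1.0 - p))


def allelic_chi2_pvalue(a_case, n_case, a_ctrl, n_ctrl):
    """P-value of the 1-df chi-square test (no continuity correction)
    on a 2x2 table of risk/non-risk allele counts in cases and controls."""
    b_case = n_case - a_case
    b_ctrl = n_ctrl - a_ctrl
    total = n_case + n_ctrl
    denom = n_case * n_ctrl * (a_case + a_ctrl) * (b_case + b_ctrl)
    if denom == 0:
        return 1.0  # degenerate table: no evidence of association
    chi2 = total * (a_case * b_ctrl - b_case * a_ctrl) ** 2 / denom
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def draw_allele_count(rng, n_alleles, freq):
    """Draw a binomial count of risk alleles by summing Bernoulli trials."""
    return sum(1 for _ in range(n_alleles) if rng.random() < freq)


def estimate_power(p, rr, n_cases, n_controls, alpha=5e-7, reps=200, seed=0):
    """Fraction of simulated studies whose p-value falls below alpha.
    alpha is an assumed threshold, not a value taken from the record."""
    rng = random.Random(seed)
    p_case = case_allele_freq(p, rr)
    hits = 0
    for _ in range(reps):
        a_case = draw_allele_count(rng, 2 * n_cases, p_case)
        a_ctrl = draw_allele_count(rng, 2 * n_controls, p)
        pval = allelic_chi2_pvalue(a_case, 2 * n_cases, a_ctrl, 2 * n_controls)
        if pval < alpha:
            hits += 1
    return hits / reps
```

For example, `estimate_power(0.3, 1.3, 2000, 2000)` would give a simulated power estimate for a risk-allele frequency of 0.3 and a per-allele relative risk of 1.3 with 2,000 cases and 2,000 controls. The estimate varies with `seed` and `reps`, and a realistic design comparison would also need to model LD between the causal variant and the SNPs actually on the chip.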