Estimates of array and pool-construction variance for planning efficient DNA-pooling genome wide association studies

Bibliographic Details
Main authors: Earp, Madalene A, Rahmani, Maziar, Chew, Kevin, Brooks-Wilson, Angela
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3247851/
https://www.ncbi.nlm.nih.gov/pubmed/22122996
http://dx.doi.org/10.1186/1755-8794-4-81
author Earp, Madalene A
Rahmani, Maziar
Chew, Kevin
Brooks-Wilson, Angela
collection PubMed
description BACKGROUND: Until recently, genome-wide association studies (GWAS) have been restricted to research groups with the budget necessary to genotype hundreds, if not thousands, of samples. Replacing individual genotyping with genotyping of DNA pools in Phase I of a GWAS has proven successful, and has dramatically altered the financial feasibility of this approach. When conducting a pool-based GWAS, how well SNP allele frequency is estimated from a DNA pool will influence a study's power to detect associations. Here we address how to control the variance in allele frequency estimation when DNAs are pooled, and how to plan and conduct the most efficient, well-powered pool-based GWAS. METHODS: By examining the variation in allele frequency estimation on SNP arrays between and within DNA pools, we determine how array variance [var(e_array)] and pool-construction variance [var(e_construction)] contribute to the total variance of allele frequency estimation. This information is useful in deciding whether replicate arrays or replicate pools are more useful in reducing variance. Our analysis is based on 27 DNA pools ranging in size from 74 to 446 individual samples, genotyped on a collective total of 128 Illumina BeadArrays: 24 1M-Single, 32 1M-Duo, and 72 660-Quad. RESULTS: For all three Illumina SNP array types our estimates of var(e_array) were similar, between 3 × 10^-4 and 4 × 10^-4 for normalized data. Var(e_construction) accounted for between 20% and 40% of pooling variance across 27 pools in normalized data. CONCLUSIONS: We conclude that, relative to var(e_array), var(e_construction) is of less importance in reducing the variance in allele frequency estimation from DNA pools; however, our data suggest that on average it may be more important than previously thought. We have prepared a simple online tool, PoolingPlanner (available at http://www.kchew.ca/PoolingPlanner/), which calculates the effective sample size (ESS) of a DNA pool given a range of replicate array values.
ESS can be used in a power calculator to perform pool-adjusted calculations. This allows one to quickly calculate the loss of power associated with a pooling experiment and to make an informed decision on whether a pool-based GWAS is worth pursuing.
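The relationship between the two variance components and ESS described in the abstract can be illustrated with a minimal sketch. It is not the PoolingPlanner implementation, whose formulas are not given in this record. It assumes a simple additive error model in which the pooling error variance of one pool measured on m replicate arrays is var(e_construction) + var(e_array)/m, and it defines ESS as the number of individually genotyped samples whose binomial sampling variance p(1-p)/(2N) would equal the sampling variance plus the pooling variance. The function names, the pool size, the allele frequency, and the value used for var(e_construction) are hypothetical; only the var(e_array) value is taken from the range reported above.

```python
def pooling_variance(var_array, var_construction, n_arrays=1, n_pools=1):
    """Pooling error variance under an assumed additive model.

    Each replicate pool contributes its own construction error; array
    error is averaged over the replicate arrays run on that pool.
    """
    return (var_construction + var_array / n_arrays) / n_pools


def effective_sample_size(n_individuals, allele_freq, var_pool):
    """ESS: number of individually genotyped samples giving the same
    allele-frequency variance as the pool (assumed definition).

    Solves p(1-p)/(2*ESS) = p(1-p)/(2*N) + var_pool for ESS.
    """
    pq = allele_freq * (1.0 - allele_freq)
    return n_individuals / (1.0 + 2.0 * n_individuals * var_pool / pq)


if __name__ == "__main__":
    VAR_ARRAY = 3.5e-4         # within the 3-4 x 10^-4 range in the abstract
    VAR_CONSTRUCTION = 1.5e-4  # hypothetical value, for illustration only
    N, P = 200, 0.3            # hypothetical pool size and allele frequency

    for m in (1, 2, 4, 8):
        v = pooling_variance(VAR_ARRAY, VAR_CONSTRUCTION, n_arrays=m)
        ess = effective_sample_size(N, P, v)
        print(f"{m} array(s): pooling variance {v:.2e}, ESS {ess:.1f} of {N}")
```

Under this model, adding replicate arrays shrinks only the array term, so ESS approaches a ceiling set by var(e_construction); replicate pools are needed to go beyond it.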
format Online
Article
Text
id pubmed-3247851
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3247851 2011-12-30 Estimates of array and pool-construction variance for planning efficient DNA-pooling genome wide association studies Earp, Madalene A Rahmani, Maziar Chew, Kevin Brooks-Wilson, Angela BMC Med Genomics Research Article BioMed Central 2011-11-28 /pmc/articles/PMC3247851/ /pubmed/22122996 http://dx.doi.org/10.1186/1755-8794-4-81 Text en Copyright ©2011 Earp et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Estimates of array and pool-construction variance for planning efficient DNA-pooling genome wide association studies
topic Research Article