Assessing batch effects of genotype calling algorithm BRLMM for the Affymetrix GeneChip Human Mapping 500 K array set using 270 HapMap samples

BACKGROUND: Genome-wide association studies (GWAS) aim to identify genetic variants (usually single nucleotide polymorphisms [SNPs]) across the entire human genome that are associated with phenotypic traits such as disease status and drug response. Highly accurate and reproducible genotype calling a...

Bibliographic Details
Main authors: Hong, Huixiao, Su, Zhenqiang, Ge, Weigong, Shi, Leming, Perkins, Roger, Fang, Hong, Xu, Joshua, Chen, James J, Han, Tao, Kaput, Jim, Fuscoe, James C, Tong, Weida
Format: Text
Language: English
Published: BioMed Central 2008
Subjects: Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2537568/
https://www.ncbi.nlm.nih.gov/pubmed/18793462
http://dx.doi.org/10.1186/1471-2105-9-S9-S17
author Hong, Huixiao
Su, Zhenqiang
Ge, Weigong
Shi, Leming
Perkins, Roger
Fang, Hong
Xu, Joshua
Chen, James J
Han, Tao
Kaput, Jim
Fuscoe, James C
Tong, Weida
collection PubMed
description BACKGROUND: Genome-wide association studies (GWAS) aim to identify genetic variants (usually single nucleotide polymorphisms [SNPs]) across the entire human genome that are associated with phenotypic traits such as disease status and drug response. Highly accurate and reproducible genotype calling is paramount since errors introduced by calling algorithms can lead to inflation of false associations between genotype and phenotype. Most genotype calling algorithms currently used for GWAS are based on multiple arrays. Because hundreds of gigabytes (GB) of raw data are generated from a GWAS, the samples are typically partitioned into batches containing subsets of the entire dataset for genotype calling. High call rates and accuracies have been achieved. However, the effects of batch size (i.e., number of chips analyzed together) and of batch composition (i.e., the choice of chips in a batch) on call rate and accuracy as well as the propagation of the effects into significantly associated SNPs identified have not been investigated. In this paper, we analyzed both the batch size and batch composition for effects on the genotype calling algorithm BRLMM using raw data of 270 HapMap samples analyzed with the Affymetrix Human Mapping 500 K array set.
RESULTS: Using data from 270 HapMap samples interrogated with the Affymetrix Human Mapping 500 K array set, three different batch sizes and three different batch compositions were used for genotyping using the BRLMM algorithm. Comparative analysis of the calling results and the corresponding lists of significant SNPs identified through association analysis revealed that both batch size and composition affected genotype calling results and significantly associated SNPs. Batch size and batch composition effects were more severe on samples and SNPs with lower call rates than ones with higher call rates, and on heterozygous genotype calls compared to homozygous genotype calls.
CONCLUSION: Batch size and composition affect the genotype calling results in GWAS using BRLMM. The larger the differences in batch sizes, the larger the effect. The more homogenous the samples in the batches, the more consistent the genotype calls. The inconsistency propagates to the lists of significantly associated SNPs identified in downstream association analysis. Thus, uniform and large batch sizes should be used to make genotype calls for GWAS. In addition, samples of high homogeneity should be placed into the same batch.
format Text
id pubmed-2537568
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2537568 2008-09-17 BMC Bioinformatics Proceedings BioMed Central 2008-08-12 Copyright © 2008 Hong et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Assessing batch effects of genotype calling algorithm BRLMM for the Affymetrix GeneChip Human Mapping 500 K array set using 270 HapMap samples
topic Proceedings