
Discovery of novel variants in genotyping arrays improves genotype retention and reduces ascertainment bias

Bibliographic Details
Main Authors: Didion, John P, Yang, Hyuna, Sheppard, Keith, Fu, Chen-Ping, McMillan, Leonard, de Villena, Fernando Pardo-Manuel, Churchill, Gary A
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3305361/
https://www.ncbi.nlm.nih.gov/pubmed/22260749
http://dx.doi.org/10.1186/1471-2164-13-34
author Didion, John P
Yang, Hyuna
Sheppard, Keith
Fu, Chen-Ping
McMillan, Leonard
de Villena, Fernando Pardo-Manuel
Churchill, Gary A
collection PubMed
description BACKGROUND: High-density genotyping arrays that measure hybridization of genomic DNA fragments to allele-specific oligonucleotide probes are widely used to genotype single nucleotide polymorphisms (SNPs) in genetic studies, including human genome-wide association studies. Hybridization intensities are converted to genotype calls by clustering algorithms that assign each sample to a genotype class at each SNP. Data for SNP probes that do not conform to the expected pattern of clustering are often discarded, contributing to ascertainment bias and resulting in lost information - as much as 50% in a recent genome-wide association study in dogs.
RESULTS: We identified atypical patterns of hybridization intensities that were highly reproducible and demonstrated that these patterns represent genetic variants that were not accounted for in the design of the array platform. We characterized variable intensity oligonucleotide (VINO) probes that display such patterns and are found in all hybridization-based genotyping platforms, including those developed for human, dog, cattle, and mouse. When recognized and properly interpreted, VINOs recovered a substantial fraction of discarded probes and counteracted SNP ascertainment bias. We developed software (MouseDivGeno) that identifies VINOs and improves the accuracy of genotype calling. MouseDivGeno produced highly concordant genotype calls when compared with other methods but it uniquely identified more than 786,000 VINOs in 351 mouse samples. We used whole-genome sequence from 14 mouse strains to confirm the presence of novel variants explaining 28,000 VINOs in those strains. We also identified VINOs in human HapMap 3 samples, many of which were specific to an African population. Incorporating VINOs in phylogenetic analyses substantially improved the accuracy of a Mus species tree and local haplotype assignment in laboratory mouse strains.
CONCLUSION: The problems of ascertainment bias and missing information due to genotyping errors are widely recognized as limiting factors in genetic studies. We have conducted the first formal analysis of the effect of novel variants on genotyping arrays, and we have shown that these variants account for a large portion of miscalled and uncalled genotypes. Genetic studies will benefit from substantial improvements in the accuracy of their results by incorporating VINOs in their analyses.
format Online
Article
Text
id pubmed-3305361
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3305361 2012-03-16 BMC Genomics Methodology Article BioMed Central 2012-01-19 /pmc/articles/PMC3305361/ /pubmed/22260749 http://dx.doi.org/10.1186/1471-2164-13-34 Text en Copyright ©2012 Didion et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Discovery of novel variants in genotyping arrays improves genotype retention and reduces ascertainment bias
topic Methodology Article