
Strategies and utility of imputed SNP genotypes for genomic analysis in dairy cattle

BACKGROUND: We investigated strategies and factors affecting accuracy of imputing genotypes from lower-density SNP panels (Illumina 3K, 7K, Affymetrix 15K and 25K, and evenly spaced subsets) up to one medium (Illumina 50K) and one high-density (Illumina 800K) SNP panel. We also evaluated the utility...


Bibliographic Details
Main authors: Khatkar, Mehar S; Moser, Gerhard; Hayes, Ben J; Raadsma, Herman W
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531262/
https://www.ncbi.nlm.nih.gov/pubmed/23043356
http://dx.doi.org/10.1186/1471-2164-13-538
author Khatkar, Mehar S
Moser, Gerhard
Hayes, Ben J
Raadsma, Herman W
collection PubMed
description BACKGROUND: We investigated strategies and factors affecting the accuracy of imputing genotypes from lower-density SNP panels (Illumina 3K, 7K, Affymetrix 15K and 25K, and evenly spaced subsets) up to one medium-density (Illumina 50K) and one high-density (Illumina 800K) SNP panel. We also evaluated the effect of imputed genotypes on the accuracy of genomic selection, using Australian Holstein-Friesian cattle data from 2727 and 845 animals genotyped with the 50K and 800K SNP chips, respectively. Animals were divided into reference and test sets (genotyped with higher- and lower-density SNP panels, respectively) to evaluate the accuracy of imputation. For the accuracy of genomic selection, direct genetic values (DGV) were compared by dividing the data into training and validation sets under a range of imputation scenarios.
RESULTS: Of the three imputation methods compared, IMPUTE2 outperformed Beagle and fastPhase in almost all scenarios. Higher SNP densities in the test animals, larger reference sets and higher relatedness between test and reference animals increased the accuracy of imputation. 50K-specific genotypes were imputed with moderate allelic error rates from 15K (2.85%) and 25K (2.75%) genotypes. Using IMPUTE2, SNP genotypes up to 800K were imputed with a low allelic error rate (0.79% genome-wide) from 50K genotypes, and with moderate error rates from 3K (4.78%) and 7K (2.00%) genotypes. The error rate of imputing up to 800K from 3K or 7K was further reduced when an additional middle tier of 50K genotypes was incorporated in a 3-tiered framework. Accuracies of DGV for five production traits using imputed 50K genotypes were close to those obtained with the actual 50K genotypes and higher than those obtained with 3K or 7K genotypes. The loss in accuracy of DGV was small when most of the training animals also had imputed (50K) genotypes. Additional gains in DGV accuracy were small when SNP density increased from 50K to imputed 800K.
CONCLUSION: Population-based genotype imputation can be used to predict and combine genotypes from different low-, medium- and high-density SNP chips with a high level of accuracy. Imputing genotypes from low-density SNP panels to at least 50K SNP density increases the accuracy of genomic selection.
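The abstract reports imputation quality as an allelic error rate but does not define it in this record. As a rough illustration only, the sketch below assumes genotypes are coded as allele counts (0, 1 or 2) and takes the error rate to be the number of wrongly imputed alleles divided by the total number of alleles compared. The function name `allelic_error_rate` and the use of `None` for missing calls are hypothetical choices, not taken from the article.

```python
def allelic_error_rate(true_genotypes, imputed_genotypes):
    """Fraction of alleles imputed incorrectly.

    Assumption (not from the article): genotypes are allele counts 0/1/2,
    so |true - imputed| is the number of wrong alleles at that genotype.
    Pairs where either value is None (missing call) are skipped.
    """
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype lists must have the same length")

    wrong_alleles = 0
    compared_alleles = 0
    for true_g, imputed_g in zip(true_genotypes, imputed_genotypes):
        if true_g is None or imputed_g is None:
            continue
        wrong_alleles += abs(true_g - imputed_g)
        compared_alleles += 2  # each diploid genotype carries two alleles

    if compared_alleles == 0:
        raise ValueError("no genotypes available for comparison")
    return wrong_alleles / compared_alleles


# Four masked genotypes: two correct, one off by one allele, one off by two.
rate = allelic_error_rate([0, 1, 2, 2], [0, 1, 1, 0])
print(rate)  # 0.375
```

Under this definition a heterozygote imputed as a homozygote counts as one wrong allele, while opposite homozygotes count as two, which is why the rate is lower than a per-genotype error rate on the same data.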
format Online
Article
Text
id pubmed-3531262
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3531262 2013-01-10 Strategies and utility of imputed SNP genotypes for genomic analysis in dairy cattle. Khatkar, Mehar S; Moser, Gerhard; Hayes, Ben J; Raadsma, Herman W. BMC Genomics. Research Article. BioMed Central 2012-10-08 /pmc/articles/PMC3531262/ /pubmed/23043356 http://dx.doi.org/10.1186/1471-2164-13-538 Text en Copyright ©2012 Khatkar et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Strategies and utility of imputed SNP genotypes for genomic analysis in dairy cattle
topic Research Article