
Accuracy of imputation to whole-genome sequence data in Holstein Friesian cattle

BACKGROUND: The use of whole-genome sequence data can lead to higher accuracy in genome-wide association studies and genomic predictions. However, to benefit from whole-genome sequence data, a large dataset of sequenced individuals is needed. Imputation from SNP panels, such as the Illumina BovineSN...


Bibliographic Details
Main authors: van Binsbergen, Rianne; Bink, Marco CAM; Calus, Mario PL; van Eeuwijk, Fred A; Hayes, Ben J; Hulsegge, Ina; Veerkamp, Roel F
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4226983/
https://www.ncbi.nlm.nih.gov/pubmed/25022768
http://dx.doi.org/10.1186/1297-9686-46-41
author van Binsbergen, Rianne
Bink, Marco CAM
Calus, Mario PL
van Eeuwijk, Fred A
Hayes, Ben J
Hulsegge, Ina
Veerkamp, Roel F
author_sort van Binsbergen, Rianne
collection PubMed
description BACKGROUND: The use of whole-genome sequence data can lead to higher accuracy in genome-wide association studies and genomic predictions. However, to benefit from whole-genome sequence data, a large dataset of sequenced individuals is needed. Imputation from SNP panels, such as the Illumina BovineSNP50 BeadChip and Illumina BovineHD BeadChip, to whole-genome sequence data is an attractive and less expensive approach to obtain whole-genome sequence genotypes for a large number of individuals than sequencing all individuals. Our objective was to investigate accuracy of imputation from lower density SNP panels to whole-genome sequence data in a typical dataset for cattle. METHODS: Whole-genome sequence data of chromosome 1 (1 737 471 SNPs) for 114 Holstein Friesian bulls were used. Beagle software was used for imputation from the BovineSNP50 (3132 SNPs) and BovineHD (40 492 SNPs) beadchips. Accuracy was calculated as the correlation between observed and imputed genotypes and assessed by five-fold cross-validation. Three scenarios S40, S60 and S80 with respectively 40%, 60%, and 80% of the individuals as reference individuals were investigated. RESULTS: Mean accuracies of imputation per SNP from the BovineHD panel to sequence data and from the BovineSNP50 panel to sequence data for scenarios S40 and S80 ranged from 0.77 to 0.83 and from 0.37 to 0.46, respectively. Stepwise imputation from the BovineSNP50 to BovineHD panel and then to sequence data for scenario S40 improved accuracy per SNP to 0.65 but it varied considerably between SNPs. CONCLUSIONS: Accuracy of imputation to whole-genome sequence data was generally high for imputation from the BovineHD beadchip, but was low from the BovineSNP50 beadchip. Stepwise imputation from the BovineSNP50 to the BovineHD beadchip and then to sequence data substantially improved accuracy of imputation. SNPs with a low minor allele frequency were more difficult to impute correctly and the reliability of imputation varied more. Linkage disequilibrium between an imputed SNP and the SNP on the lower density panel, minor allele frequency of the imputed SNP and size of the reference group affected imputation reliability.
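The accuracy measure named in the abstract (the correlation between observed and imputed genotypes, reported per SNP) can be illustrated with a short sketch. This is not the authors' code. The 0/1/2 allele-count coding, the function name `per_snp_accuracy`, and the toy genotype matrices are assumptions made for illustration, and the five-fold cross-validation and the Beagle imputation step are not reproduced here.

```python
import numpy as np


def per_snp_accuracy(observed, imputed):
    """Pearson correlation per SNP between observed and imputed genotypes.

    Both inputs are (individuals x SNPs) arrays, assumed here to hold
    allele counts coded 0/1/2. SNPs with no variation in either array
    have an undefined correlation and are returned as NaN.
    """
    obs = np.asarray(observed, dtype=float)
    imp = np.asarray(imputed, dtype=float)

    # Centre each SNP (column) on its mean across individuals.
    obs_c = obs - obs.mean(axis=0)
    imp_c = imp - imp.mean(axis=0)

    cov = (obs_c * imp_c).sum(axis=0)
    denom = np.sqrt((obs_c ** 2).sum(axis=0) * (imp_c ** 2).sum(axis=0))

    # Leave NaN wherever the denominator is zero (monomorphic SNP).
    acc = np.full(obs.shape[1], np.nan)
    np.divide(cov, denom, out=acc, where=denom > 0)
    return acc


# Toy data: 4 validation individuals, 3 SNPs (hypothetical values).
observed = np.array([[0, 1, 2],
                     [1, 1, 0],
                     [2, 1, 1],
                     [0, 1, 2]])
imputed = np.array([[0, 1, 2],
                    [1, 1, 0],
                    [2, 1, 2],
                    [0, 1, 2]])

accuracy = per_snp_accuracy(observed, imputed)
mean_accuracy = np.nanmean(accuracy)  # mean over SNPs with a defined value
```

In this toy example the first SNP is imputed perfectly, the second is monomorphic and so has no defined correlation, and the third has one genotype imputed wrongly. Averaging with `np.nanmean` skips the undefined SNP; how such SNPs were handled in the study is not stated in the abstract.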
format Online
Article
Text
id pubmed-4226983
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4226983 2014-11-12 Accuracy of imputation to whole-genome sequence data in Holstein Friesian cattle van Binsbergen, Rianne Bink, Marco CAM Calus, Mario PL van Eeuwijk, Fred A Hayes, Ben J Hulsegge, Ina Veerkamp, Roel F Genet Sel Evol Research BACKGROUND: The use of whole-genome sequence data can lead to higher accuracy in genome-wide association studies and genomic predictions. However, to benefit from whole-genome sequence data, a large dataset of sequenced individuals is needed. Imputation from SNP panels, such as the Illumina BovineSNP50 BeadChip and Illumina BovineHD BeadChip, to whole-genome sequence data is an attractive and less expensive approach to obtain whole-genome sequence genotypes for a large number of individuals than sequencing all individuals. Our objective was to investigate accuracy of imputation from lower density SNP panels to whole-genome sequence data in a typical dataset for cattle. METHODS: Whole-genome sequence data of chromosome 1 (1 737 471 SNPs) for 114 Holstein Friesian bulls were used. Beagle software was used for imputation from the BovineSNP50 (3132 SNPs) and BovineHD (40 492 SNPs) beadchips. Accuracy was calculated as the correlation between observed and imputed genotypes and assessed by five-fold cross-validation. Three scenarios S40, S60 and S80 with respectively 40%, 60%, and 80% of the individuals as reference individuals were investigated. RESULTS: Mean accuracies of imputation per SNP from the BovineHD panel to sequence data and from the BovineSNP50 panel to sequence data for scenarios S40 and S80 ranged from 0.77 to 0.83 and from 0.37 to 0.46, respectively. Stepwise imputation from the BovineSNP50 to BovineHD panel and then to sequence data for scenario S40 improved accuracy per SNP to 0.65 but it varied considerably between SNPs. CONCLUSIONS: Accuracy of imputation to whole-genome sequence data was generally high for imputation from the BovineHD beadchip, but was low from the BovineSNP50 beadchip. Stepwise imputation from the BovineSNP50 to the BovineHD beadchip and then to sequence data substantially improved accuracy of imputation. SNPs with a low minor allele frequency were more difficult to impute correctly and the reliability of imputation varied more. Linkage disequilibrium between an imputed SNP and the SNP on the lower density panel, minor allele frequency of the imputed SNP and size of the reference group affected imputation reliability. BioMed Central 2014-07-15 /pmc/articles/PMC4226983/ /pubmed/25022768 http://dx.doi.org/10.1186/1297-9686-46-41 Text en Copyright © 2014 van Binsbergen et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
title Accuracy of imputation to whole-genome sequence data in Holstein Friesian cattle
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4226983/
https://www.ncbi.nlm.nih.gov/pubmed/25022768
http://dx.doi.org/10.1186/1297-9686-46-41