Accuracy of imputation to whole-genome sequence in sheep

Bibliographic Details
Main Authors: Bolormaa, Sunduimijid, Chamberlain, Amanda J., Khansefid, Majid, Stothard, Paul, Swan, Andrew A., Mason, Brett, Prowse-Wilkins, Claire P., Duijvesteijn, Naomi, Moghaddar, Nasir, van der Werf, Julius H., Daetwyler, Hans D., MacLeod, Iona M.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6337865/
https://www.ncbi.nlm.nih.gov/pubmed/30654735
http://dx.doi.org/10.1186/s12711-018-0443-5
collection PubMed
description BACKGROUND: The use of whole-genome sequence (WGS) data for genomic prediction and association studies is highly desirable because the causal mutations should be present in the data. The sequencing of 935 sheep from a range of breeds provides the opportunity to impute sheep genotyped with single nucleotide polymorphism (SNP) arrays to WGS. This study evaluated the accuracy of imputation from SNP genotypes to WGS using this reference population of 935 sequenced sheep.
RESULTS: The accuracy of imputation from the Ovine Infinium® HD BeadChip (~500k SNPs) to WGS was assessed for three target breeds: Merino, Poll Dorset and F1 Border Leicester × Merino. Imputation accuracy was highest for the Poll Dorset breed, even though the sequenced reference population contained more Merino than Poll Dorset individuals. In addition, empirical imputation accuracies were higher (by up to 1.7%) when using larger multi-breed reference populations than when using a smaller single-breed reference population. The mean accuracy of imputation across target breeds was 0.94 with either Minimac3 or FImpute. Empirical imputation accuracy varied considerably across the genome; six chromosomes carried regions of 1 Mb or longer with a mean imputation accuracy below 0.7. Imputation accuracy in five variant annotation classes ranged from 0.87 (missense) to 0.94 (intronic variants), with lower accuracy corresponding to higher proportions of rare alleles. The imputation quality statistic reported by Minimac3 (R²) had a clear positive relationship with empirical imputation accuracy. Therefore, discarding imputed variants with an R² below 0.4 increased the mean empirical accuracy across target breeds to 0.97. Although the accuracy of genomic prediction was less affected by filtering on R² in a multi-breed population of sheep with imputed WGS, the genomic heritability clearly tended to be lower when using variants with an R² ≤ 0.4.
CONCLUSIONS: The mean imputation accuracy was high for all target breeds and was increased by combining smaller breed sets into a multi-breed reference. We found that the Minimac3 imputation quality statistic (R²) was a useful indicator of empirical imputation accuracy, enabling removal of very poorly imputed variants before downstream analyses.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12711-018-0443-5) contains supplementary material, which is available to authorized users.
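The abstract reports that discarding variants with a Minimac3 R² below 0.4 raised the mean empirical accuracy from 0.94 to 0.97. The filter-then-evaluate step can be sketched in Python as below, under assumptions that are not taken from the record: empirical accuracy is treated as the per-variant Pearson correlation between true genotypes and imputed dosages, and R² is approximated MaCH-style as the observed dosage variance divided by 2p(1-p). Minimac3 computes and reports its own R², and the paper's exact accuracy definition may differ; all function names are hypothetical.

```python
from math import isnan, sqrt
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple


def dosage_r2(dosages: Sequence[float]) -> float:
    """Approximate MaCH-style imputation quality for one variant (hypothetical helper).

    Observed variance of allele dosages (0..2) divided by the variance expected
    under Hardy-Weinberg equilibrium, 2p(1-p), with p estimated from the dosages.
    Minimac3 reports its own R2; this only approximates the idea.
    """
    p = fmean(dosages) / 2.0
    expected = 2.0 * p * (1.0 - p)
    if expected == 0.0:
        return 0.0  # monomorphic in the imputed data: treat as uninformative
    return pvariance(dosages) / expected


def empirical_accuracy(truth: Sequence[float], dosages: Sequence[float]) -> float:
    """Pearson correlation between true genotypes and imputed dosages (assumed metric)."""
    mt, md = fmean(truth), fmean(dosages)
    cov = sum((t - mt) * (d - md) for t, d in zip(truth, dosages))
    st = sqrt(sum((t - mt) ** 2 for t in truth))
    sd = sqrt(sum((d - md) ** 2 for d in dosages))
    if st == 0.0 or sd == 0.0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov / (st * sd)


def mean_accuracy_after_filter(
    truth_by_variant: Sequence[Sequence[float]],
    dosages_by_variant: Sequence[Sequence[float]],
    threshold: float = 0.4,
) -> Tuple[float, int]:
    """Discard variants with estimated R2 below `threshold`, then average accuracy.

    Each inner sequence holds one variant's values across the same animals.
    Returns (mean accuracy, number of retained variants with a defined accuracy).
    """
    kept: List[float] = []
    for truth, dosages in zip(truth_by_variant, dosages_by_variant):
        if dosage_r2(dosages) < threshold:
            continue  # discard variants with an estimated R2 below the threshold
        acc = empirical_accuracy(truth, dosages)
        if not isnan(acc):
            kept.append(acc)
    return (fmean(kept) if kept else float("nan")), len(kept)
```

Calling `mean_accuracy_after_filter` once with `threshold=0.0` (no variant is removed, since the estimated R² is never negative) and once with the default 0.4 mirrors the before/after comparison described in the abstract.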
id pubmed-6337865
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Genet Sel Evol (Research Article); BioMed Central, 2019-01-17
rights © The Author(s) 2019. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.