
Exploring the optimal strategy of imputation from SNP array to whole-genome sequencing data in farm animals

Genotype imputation from BeadChip to whole-genome sequencing (WGS) data is a cost-effective method of obtaining genotypes of WGS variants. Beagle, one of the most popular imputation software programs, has been widely used for genotype inference in humans and non-human species. A few studies have sys...

Bibliographic Details
Main authors: Jiang, Yifan; Song, Hailiang; Gao, Hongding; Zhang, Qin; Ding, Xiangdong
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9459117/
https://www.ncbi.nlm.nih.gov/pubmed/36092888
http://dx.doi.org/10.3389/fgene.2022.963654
author Jiang, Yifan
Song, Hailiang
Gao, Hongding
Zhang, Qin
Ding, Xiangdong
collection PubMed
description Genotype imputation from BeadChip to whole-genome sequencing (WGS) data is a cost-effective method of obtaining genotypes of WGS variants. Beagle, one of the most popular imputation software programs, has been widely used for genotype inference in humans and non-human species. Few studies have systematically and comprehensively compared the performance of Beagle versions and parameter settings in farm animals. Here, we investigated the imputation performance of three representative versions of Beagle (Beagle 4.1, Beagle 5.0, and Beagle 5.4) and of the effective population size (Ne) parameter setting in three species (cattle, pig, and chicken). Six scenarios were investigated to explore the impact of certain key factors on imputation performance. The results showed that the default Ne (1,000,000) is not suitable for livestock and poultry when the reference panel is small or the target panel is genotyped on a low-density array, with accuracy dropping by 2.47%–10.45%. Beagle 5 significantly reduced computation time (by 4.66- to 13.24-fold) without a loss of accuracy. In addition, using a large combined reference panel or a high-density chip provides greater imputation accuracy, especially for variants with a low minor allele frequency (MAF). Finally, the measures of imputation accuracy were highly significantly correlated for variants with an MAF of 0.05 or greater.
format Online
Article
Text
id pubmed-9459117
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9459117 2022-09-10 Exploring the optimal strategy of imputation from SNP array to whole-genome sequencing data in farm animals Jiang, Yifan Song, Hailiang Gao, Hongding Zhang, Qin Ding, Xiangdong Front Genet Genetics Genotype imputation from BeadChip to whole-genome sequencing (WGS) data is a cost-effective method of obtaining genotypes of WGS variants. Beagle, one of the most popular imputation software programs, has been widely used for genotype inference in humans and non-human species. Few studies have systematically and comprehensively compared the performance of Beagle versions and parameter settings in farm animals. Here, we investigated the imputation performance of three representative versions of Beagle (Beagle 4.1, Beagle 5.0, and Beagle 5.4) and of the effective population size (Ne) parameter setting in three species (cattle, pig, and chicken). Six scenarios were investigated to explore the impact of certain key factors on imputation performance. The results showed that the default Ne (1,000,000) is not suitable for livestock and poultry when the reference panel is small or the target panel is genotyped on a low-density array, with accuracy dropping by 2.47%–10.45%. Beagle 5 significantly reduced computation time (by 4.66- to 13.24-fold) without a loss of accuracy. In addition, using a large combined reference panel or a high-density chip provides greater imputation accuracy, especially for variants with a low minor allele frequency (MAF). Finally, the measures of imputation accuracy were highly significantly correlated for variants with an MAF of 0.05 or greater. Frontiers Media S.A. 2022-08-26 /pmc/articles/PMC9459117/ /pubmed/36092888 http://dx.doi.org/10.3389/fgene.2022.963654 Text en Copyright © 2022 Jiang, Song, Gao, Zhang and Ding. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Genetics