Comparison of Genotype Imputation for SNP Array and Low-Coverage Whole-Genome Sequencing Data

Bibliographic Details
Main Authors: Deng, Tianyu; Zhang, Pengfei; Garrick, Dorian; Gao, Huijiang; Wang, Lixian; Zhao, Fuping
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2022
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8762119/
https://www.ncbi.nlm.nih.gov/pubmed/35046990
http://dx.doi.org/10.3389/fgene.2021.704118

Abstract: Genotype imputation is the process of inferring unobserved genotypes in a sample of individuals. It is a key step before a genome-wide association study (GWAS) or genomic prediction, and imputation accuracy directly influences the results of these subsequent analyses. In this simulation-based study, we investigate the accuracy of genotype imputation in relation to several factors characterizing SNP chip or low-coverage whole-genome sequencing (LCWGS) data. The factors included the size of the imputation reference population, the proportion of target markers/SNP density, the genetic relationship (distance) between the target population and the reference population, and the imputation method. Genotypes were simulated using coalescence theory, accounting for the demographic history of pigs. A population of simulated founders diverged to produce four separate but related populations of descendants. Genomic data for 20,000 individuals were simulated for a 10-Mb chromosome fragment. Our results showed that the proportion of target markers, or SNP density, was the most critical factor affecting imputation accuracy in all imputation scenarios. Beagle5.1 produced more accurate imputed genotypes than Minimac4 in most cases, most notably when imputing from LCWGS data. LCWGS data yielded more accurate genotype imputation than SNP chip data. Our findings provide a relatively comprehensive insight into the accuracy of genotype imputation in a realistic population of domestic animals.
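
The abstract compares imputation accuracy across target-marker proportions but does not state how accuracy was measured. As a hedged illustration only, the Python sketch below masks a chosen proportion of markers and scores imputed genotypes with two measures widely used in imputation studies: the genotype concordance rate and the squared Pearson correlation (r²) between true genotypes and imputed dosages. The function names (`split_markers`, `concordance_rate`, `squared_correlation`) are hypothetical, the 0/1/2 genotype coding is an assumption, and the code does not reproduce the authors' pipeline, Beagle5.1, or Minimac4.

```python
"""Sketch of two common imputation-accuracy measures (hypothetical helpers).

Assumptions: genotypes are coded 0/1/2 (count of one allele) and imputed
dosages are real numbers in [0, 2]. The metrics used in the study itself
are not stated in the abstract.
"""
import random
from statistics import mean


def split_markers(n_markers, proportion, seed=0):
    """Randomly choose which markers are observed on the target panel.

    `proportion` is the fraction of markers kept as target markers; the
    remaining markers are masked and would have to be imputed.
    Returns (target, to_impute) as sorted lists of marker indices.
    """
    if not 0.0 < proportion < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    k = round(proportion * n_markers)
    target = sorted(random.Random(seed).sample(range(n_markers), k))
    target_set = set(target)
    to_impute = [m for m in range(n_markers) if m not in target_set]
    return target, to_impute


def concordance_rate(true_geno, imputed_geno):
    """Fraction of genotype calls that exactly match the true genotypes."""
    if len(true_geno) != len(imputed_geno) or not true_geno:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_geno, imputed_geno))
    return matches / len(true_geno)


def squared_correlation(true_geno, imputed_dosage):
    """Squared Pearson correlation between true genotypes and dosages."""
    if len(true_geno) != len(imputed_dosage) or len(true_geno) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mt, mi = mean(true_geno), mean(imputed_dosage)
    cov = sum((t - mt) * (i - mi) for t, i in zip(true_geno, imputed_dosage))
    var_t = sum((t - mt) ** 2 for t in true_geno)
    var_i = sum((i - mi) ** 2 for i in imputed_dosage)
    if var_t == 0 or var_i == 0:
        return float("nan")  # undefined when either input has no variance
    return cov ** 2 / (var_t * var_i)


if __name__ == "__main__":
    # Keep 30% of 10 markers as target markers; the other 7 are "imputed".
    target, to_impute = split_markers(n_markers=10, proportion=0.3, seed=1)
    print(len(target), len(to_impute))  # 3 7

    # Toy true genotypes, hard calls, and dosages for masked markers.
    true_geno = [0, 1, 2, 1, 0, 2]
    called = [0, 1, 2, 2, 0, 2]
    dosage = [0.1, 0.9, 1.8, 1.4, 0.2, 1.9]
    print(concordance_rate(true_geno, called))
    print(squared_correlation(true_geno, dosage))
```

Under these assumptions, accuracy would be scored only at the `to_impute` markers, so lowering `proportion` leaves more genotypes that must be inferred rather than observed.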

Journal: Front Genet (subject: Genetics). Published online: 2022-01-03.
Copyright © 2022 Deng, Zhang, Garrick, Gao, Wang and Zhao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.