
Imputation from SNP chip to sequence: a case study in a Chinese indigenous chicken population

BACKGROUND: Genome-wide association studies and genomic predictions are thought to be optimized by using whole-genome sequence (WGS) data. However, sequencing thousands of individuals of interest is expensive. Imputation from SNP panels to WGS data is an attractive and less expensive approach to obt...


Bibliographic Details
Main Authors: Ye, Shaopan; Yuan, Xiaolong; Lin, Xiran; Gao, Ning; Luo, Yuanyu; Chen, Zanmou; Li, Jiaqi; Zhang, Xiquan; Zhang, Zhe
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects: Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5861640/
https://www.ncbi.nlm.nih.gov/pubmed/29581880
http://dx.doi.org/10.1186/s40104-018-0241-5
_version_ 1783308127624495104
author Ye, Shaopan
Yuan, Xiaolong
Lin, Xiran
Gao, Ning
Luo, Yuanyu
Chen, Zanmou
Li, Jiaqi
Zhang, Xiquan
Zhang, Zhe
author_sort Ye, Shaopan
collection PubMed
description BACKGROUND: Genome-wide association studies and genomic predictions are thought to be optimized by using whole-genome sequence (WGS) data. However, sequencing thousands of individuals of interest is expensive. Imputation from SNP panels to WGS data is an attractive and less expensive way to obtain WGS data. The aims of this study were to investigate the accuracy of imputation and to provide insight into the design and execution of genotype imputation.
RESULTS: We genotyped 450 chickens with a 600 K SNP array and sequenced 24 key individuals by whole-genome re-sequencing. Accuracy of imputation from putative 60 K and 600 K array data to WGS data was 0.620 and 0.812 for Beagle, and 0.810 and 0.914 for FImpute, respectively. Increasing the sequencing cost from 24X to 144X raised imputation accuracy from 0.525 to 0.698 for Beagle and from 0.654 to 0.823 for FImpute. With fixed sequence depth (12X), increasing the number of sequenced animals from 1 to 24 improved accuracy from 0.421 to 0.897 for FImpute and from 0.396 to 0.777 for Beagle. Using optimally selected key individuals as the reference population for re-sequencing gave higher imputation accuracy than using randomly selected individuals. With fixed reference population size (24), imputation accuracy increased from 0.654 to 0.875 for FImpute and from 0.512 to 0.762 for Beagle as the sequencing depth increased from 1X to 12X. At a given total cost of genotyping, accuracy increased with the size of the reference population for FImpute, but this pattern did not hold for Beagle, which showed the highest accuracy at sixfold coverage for the scenarios used in this study.
CONCLUSIONS: We comprehensively investigated the impacts of several key factors on genotype imputation. Generally, increasing sequencing cost gave higher imputation accuracy. With a fixed sequencing cost, however, optimal imputation enhances the performance of WGP and GWAS. An optimal imputation strategy should comprehensively take into account the size of the reference population, the imputation algorithm, marker density, the population structure of the target population, and the method used to select key individuals. This work sheds additional light on how to design and execute genotype imputation for livestock populations.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40104-018-0241-5) contains supplementary material, which is available to authorized users.
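The fixed-cost comparison described in the abstract can be illustrated with a small sketch. It assumes, which the abstract does not state explicitly, that total sequencing cost (e.g. 24X or 144X) is the number of sequenced animals multiplied by the per-animal depth. The function name `designs_for_budget` and its parameters are hypothetical and are not taken from the article; the defaults of 24 animals and 12X only mirror the largest values mentioned in the abstract.

```python
from typing import List, Tuple


def designs_for_budget(
    total_cost_x: int,
    max_animals: int = 24,
    max_depth_x: int = 12,
) -> List[Tuple[int, int]]:
    """List (n_animals, depth_x) pairs whose product equals the budget.

    Assumption: cost = animals * depth. Only whole-number depths are tried.
    """
    designs = []
    for depth in range(1, max_depth_x + 1):
        # Skip depths that do not divide the budget evenly.
        if total_cost_x % depth != 0:
            continue
        n_animals = total_cost_x // depth
        if 1 <= n_animals <= max_animals:
            designs.append((n_animals, depth))
    return designs


if __name__ == "__main__":
    for budget in (24, 144):
        print(f"total cost {budget}X:")
        for n_animals, depth in designs_for_budget(budget):
            print(f"  {n_animals} animals at {depth}X")
```

Under these assumptions, a 144X budget can be spent as 24 animals at 6X or 12 animals at 12X, among other combinations, which is the kind of trade-off the abstract reports differently for FImpute and Beagle.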
format Online
Article
Text
id pubmed-5861640
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5861640 2018-03-26 Imputation from SNP chip to sequence: a case study in a Chinese indigenous chicken population Ye, Shaopan Yuan, Xiaolong Lin, Xiran Gao, Ning Luo, Yuanyu Chen, Zanmou Li, Jiaqi Zhang, Xiquan Zhang, Zhe J Anim Sci Biotechnol Research BioMed Central 2018-03-21 /pmc/articles/PMC5861640/ /pubmed/29581880 http://dx.doi.org/10.1186/s40104-018-0241-5 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Imputation from SNP chip to sequence: a case study in a Chinese indigenous chicken population
topic Research