Imputation from SNP chip to sequence: a case study in a Chinese indigenous chicken population
Main authors: | Ye, Shaopan; Yuan, Xiaolong; Lin, Xiran; Gao, Ning; Luo, Yuanyu; Chen, Zanmou; Li, Jiaqi; Zhang, Xiquan; Zhang, Zhe |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5861640/ https://www.ncbi.nlm.nih.gov/pubmed/29581880 http://dx.doi.org/10.1186/s40104-018-0241-5 |
author | Ye, Shaopan; Yuan, Xiaolong; Lin, Xiran; Gao, Ning; Luo, Yuanyu; Chen, Zanmou; Li, Jiaqi; Zhang, Xiquan; Zhang, Zhe |
collection | PubMed |
description | BACKGROUND: Genome-wide association studies and genomic predictions are expected to benefit from whole-genome sequence (WGS) data. However, sequencing thousands of individuals of interest is expensive. Imputation from SNP panels to WGS data is an attractive and less expensive way to obtain WGS data. The aims of this study were to investigate the accuracy of imputation and to provide insight into the design and execution of genotype imputation. RESULTS: We genotyped 450 chickens with a 600 K SNP array and sequenced 24 key individuals by whole-genome re-sequencing. Accuracy of imputation from putative 60 K and 600 K array data to WGS data was 0.620 and 0.812 for Beagle, and 0.810 and 0.914 for FImpute, respectively. Increasing the sequencing cost from 24X to 144X raised imputation accuracy from 0.525 to 0.698 for Beagle and from 0.654 to 0.823 for FImpute. With sequencing depth fixed at 12X, increasing the number of sequenced animals from 1 to 24 improved accuracy from 0.421 to 0.897 for FImpute and from 0.396 to 0.777 for Beagle. Using optimally selected key individuals as the re-sequenced reference population gave higher imputation accuracy than using randomly selected individuals. With reference population size fixed at 24, imputation accuracy increased from 0.654 to 0.875 for FImpute and from 0.512 to 0.762 for Beagle as sequencing depth increased from 1X to 12X. For a given total genotyping cost, accuracy increased with the size of the reference population for FImpute, but this pattern did not hold for Beagle, which showed the highest accuracy at sixfold coverage in the scenarios used in this study. CONCLUSIONS: We comprehensively investigated the impacts of several key factors on genotype imputation. In general, increasing sequencing cost gave higher imputation accuracy. With a fixed sequencing cost, however, an optimal imputation strategy can enhance the performance of WGP and GWAS. Such a strategy should jointly consider the size of the reference population, the imputation algorithm, marker density, the population structure of the target population, and the method used to select key individuals. This work sheds additional light on how to design and execute genotype imputation for livestock populations. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40104-018-0241-5) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5861640 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | J Anim Sci Biotechnol |
onlineDate | 2018-03-21 |
rights | © The Author(s). 2018. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Imputation from SNP chip to sequence: a case study in a Chinese indigenous chicken population |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5861640/ https://www.ncbi.nlm.nih.gov/pubmed/29581880 http://dx.doi.org/10.1186/s40104-018-0241-5 |
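The abstract compares designs at a fixed total sequencing cost, trading the number of re-sequenced animals against per-animal depth. The sketch below enumerates such designs. It assumes that cost is proportional to animals × depth (total coverage in X) and that depth is a whole number of X. Both are simplifying assumptions, and the function name is hypothetical. The 144X budget and 24-animal cap in the example are borrowed from the figures quoted in the abstract.

```python
def designs_for_budget(total_coverage, max_animals):
    """List (animals, depth) pairs whose total coverage equals the budget.

    Assumes (hypothetically) that sequencing cost scales with
    animals * per-animal depth and that depth is an integer number of X.
    """
    if total_coverage <= 0 or max_animals <= 0:
        raise ValueError("budget and animal count must be positive")
    return [
        (n, total_coverage // n)          # depth each animal receives
        for n in range(1, max_animals + 1)
        if total_coverage % n == 0        # keep only whole-X depths
    ]


if __name__ == "__main__":
    for animals, depth in designs_for_budget(144, 24):
        print(f"{animals:>2} animals x {depth:>3}X")
```

For a 144X budget with at most 24 animals, this yields 11 designs, ranging from 1 animal at 144X to 24 animals at 6X. Which design imputes best is an empirical question. The abstract reports that the answer differed between FImpute and Beagle.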