
Strategies for Obtaining and Pruning Imputed Whole-Genome Sequence Data for Genomic Prediction

Genomic prediction with imputed whole-genome sequencing (WGS) data is an attractive way to improve predictive ability at low cost. However, this method has not yet achieved high accuracy in livestock. In this study, we imputed 435 individuals from 600K single nucleotide polymorphism (S...


Bibliographic Details
Main authors: Ye, Shaopan, Gao, Ning, Zheng, Rongrong, Chen, Zitao, Teng, Jinyan, Yuan, Xiaolong, Zhang, Hao, Chen, Zanmou, Zhang, Xiquan, Li, Jiaqi, Zhang, Zhe
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6650575/
https://www.ncbi.nlm.nih.gov/pubmed/31379929
http://dx.doi.org/10.3389/fgene.2019.00673
author Ye, Shaopan
Gao, Ning
Zheng, Rongrong
Chen, Zitao
Teng, Jinyan
Yuan, Xiaolong
Zhang, Hao
Chen, Zanmou
Zhang, Xiquan
Li, Jiaqi
Zhang, Zhe
collection PubMed
description Genomic prediction with imputed whole-genome sequencing (WGS) data is an attractive way to improve predictive ability at low cost. However, this method has not yet achieved high accuracy in livestock. In this study, we imputed 435 individuals from 600K single nucleotide polymorphism (SNP) chip data to WGS data using different reference panels. We also investigated the prediction accuracy of genomic best linear unbiased prediction (GBLUP) using imputed WGS data from different reference panels, linkage disequilibrium (LD)-based marker pruning, and variants pre-selected from genome-wide association study (GWAS) results. The imputation accuracies from 600K to WGS data were 0.873 ± 0.038, 0.906 ± 0.036, and 0.979 ± 0.010 for the internal, external, and combined reference panels, respectively. For most traits in chickens, the prediction accuracy of imputed WGS data obtained from the internal reference panel was greater than or equal to that of the combined reference panel; the external reference panel had the lowest prediction accuracy. Compared with 600K chip data, GBLUP with imputed WGS data gave only a small increase (1–3%) in prediction accuracy. Using only variants selected from imputed WGS data based on GWAS results gave almost no increase for most traits and even increased the bias of the regression coefficient. The effect of the degree of LD among selected and remaining variants on prediction accuracy differed among traits. For average daily gain (ADG), residual feed intake (RFI), intestine length (IL), and body weight at 91 days (BW91), the accuracy of GBLUP increased as the degree of LD of selected variants decreased, but the opposite relationship held for the remaining variants. For breast muscle weight (BMW) and average daily feed intake (ADFI), however, the accuracy of GBLUP increased as the degree of LD of selected variants increased, and the degree of LD of remaining variants had only a small effect on prediction accuracy. Overall, an optimal imputation strategy for obtaining WGS data for genomic prediction should take into account the relationship between the selected individuals and the individuals of the target population, to avoid heterogeneity of imputation. LD-based marker pruning can be used to improve the accuracy of genomic prediction using imputed WGS data.
format Online
Article
Text
id pubmed-6650575
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6650575 2019-08-02 Strategies for Obtaining and Pruning Imputed Whole-Genome Sequence Data for Genomic Prediction Ye, Shaopan Gao, Ning Zheng, Rongrong Chen, Zitao Teng, Jinyan Yuan, Xiaolong Zhang, Hao Chen, Zanmou Zhang, Xiquan Li, Jiaqi Zhang, Zhe Front Genet Genetics Frontiers Media S.A. 2019-07-17 /pmc/articles/PMC6650575/ /pubmed/31379929 http://dx.doi.org/10.3389/fgene.2019.00673 Text en Copyright © 2019 Ye, Gao, Zheng, Chen, Teng, Yuan, Zhang, Chen, Zhang, Li and Zhang http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Strategies for Obtaining and Pruning Imputed Whole-Genome Sequence Data for Genomic Prediction
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6650575/
https://www.ncbi.nlm.nih.gov/pubmed/31379929
http://dx.doi.org/10.3389/fgene.2019.00673