Genomic prediction based on selective linkage disequilibrium pruning of low-coverage whole-genome sequence variants in a pure Duroc population

BACKGROUND: Although the accumulation of whole-genome sequencing (WGS) data has accelerated the identification of mutations underlying complex traits, its impact on the accuracy of genomic predictions is limited. Reliable genotyping data and pre-selected beneficial loci can be used to improve predic...

Bibliographic Details
Main authors: Zhu, Di; Zhao, Yiqiang; Zhang, Ran; Wu, Hanyu; Cai, Gengyuan; Wu, Zhenfang; Wang, Yuzhe; Hu, Xiaoxiang
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10583454/
https://www.ncbi.nlm.nih.gov/pubmed/37853325
http://dx.doi.org/10.1186/s12711-023-00843-w
_version_ 1785122556504702976
author Zhu, Di
Zhao, Yiqiang
Zhang, Ran
Wu, Hanyu
Cai, Gengyuan
Wu, Zhenfang
Wang, Yuzhe
Hu, Xiaoxiang
author_sort Zhu, Di
collection PubMed
description BACKGROUND: Although the accumulation of whole-genome sequencing (WGS) data has accelerated the identification of mutations underlying complex traits, its impact on the accuracy of genomic predictions is limited. Reliable genotyping data and pre-selected beneficial loci can be used to improve prediction accuracy. Previously, we reported a low-coverage sequencing genotyping method that yielded 11.3 million highly accurate single-nucleotide polymorphisms (SNPs) in pigs. Here, we introduce a method termed selective linkage disequilibrium pruning (SLDP), which refines the set of SNPs that show a large gain during prediction of complex traits using whole-genome SNP data. RESULTS: We used the SLDP method to identify and select markers among millions of SNPs based on genome-wide association study (GWAS) prior information. We evaluated the performance of SLDP with respect to three real traits and six simulated traits with varying genetic architectures using two representative models (genomic best linear unbiased prediction and BayesR) on samples from 3579 Duroc boars. SLDP was determined by testing 180 combinations of two core parameters (GWAS P-value thresholds and linkage disequilibrium r²). The parameters for each trait were optimized in the training population by five-fold cross-validation and then tested in the validation population. Similar to previous GWAS prior-based methods, the performance of SLDP was mainly affected by the genetic architecture of the traits analyzed. Specifically, SLDP performed better for traits controlled by major quantitative trait loci (QTL) or a small number of quantitative trait nucleotides (QTN). Compared with two commercial SNP chips, genotyping-by-sequencing data, and an unselected whole-genome SNP panel, the SLDP strategy led to significant improvements in prediction accuracy, which ranged from 0.84 to 3.22% for real traits controlled by major or moderate QTL and from 1.23 to 11.47% for simulated traits controlled by a small number of QTN. CONCLUSIONS: The SLDP marker selection method can be incorporated into mainstream prediction models to yield accuracy improvements for traits with a relatively simple genetic architecture; however, it has no significant advantage for traits not controlled by major QTL. The main factors that affect its performance are the genetic architecture of traits and the reliability of GWAS prior information. Our findings can facilitate the application of WGS-based genomic selection. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12711-023-00843-w.
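The abstract names the two core SLDP parameters (a GWAS P-value threshold and a linkage disequilibrium r² threshold) but does not spell out the selection rule. The Python sketch below is only one illustrative reading, not the authors' implementation. It assumes that SNPs at or below the P-value threshold are always retained, and that the remaining SNPs are pruned greedily whenever their r² with an already retained SNP reaches the r² threshold. The function names, the toy genotype matrix, and the grid values are all hypothetical.

```python
import itertools
import numpy as np


def r2(a, b):
    """Squared Pearson correlation between two genotype dosage vectors."""
    if a.std() == 0 or b.std() == 0:
        return 0.0  # monomorphic SNP: treat as uncorrelated
    return float(np.corrcoef(a, b)[0, 1] ** 2)


def selective_ld_prune(genotypes, p_values, p_threshold, r2_threshold):
    """Return sorted column indices of retained SNPs.

    Assumed rule (hypothetical, not taken from the paper):
      * SNPs with GWAS P-value <= p_threshold are always kept;
      * other SNPs are visited in order of increasing P-value and kept
        only if their r2 with every kept SNP is below r2_threshold.
    """
    order = np.argsort(p_values, kind="stable")
    kept = []
    for j in order:
        if p_values[j] <= p_threshold:
            kept.append(int(j))
            continue
        in_ld = any(
            r2(genotypes[:, j], genotypes[:, k]) >= r2_threshold for k in kept
        )
        if not in_ld:
            kept.append(int(j))
    return sorted(kept)


# Toy data: 6 animals x 4 SNPs, genotypes coded as 0/1/2 dosages.
G = np.array([
    [0, 0, 2, 0],
    [1, 1, 1, 0],
    [2, 2, 0, 1],
    [0, 0, 2, 1],
    [1, 1, 1, 2],
    [2, 2, 0, 2],
])
p = np.array([1e-8, 1e-6, 0.5, 0.3])

# Placeholder grid; the 180 combinations used in the study are not listed here.
for p_thr, r2_thr in itertools.product([1e-7, 1e-5], [0.2, 0.5]):
    kept = selective_ld_prune(G, p, p_thr, r2_thr)
    print(f"P<={p_thr:g}, r2<{r2_thr}: kept SNPs {kept}")
```

In a workflow like the one described, each parameter combination would yield a marker set whose prediction accuracy is scored by five-fold cross-validation in the training population, and the best-scoring combination would then be applied once to the validation population.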
format Online
Article
Text
id pubmed-10583454
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10583454 2023-10-19 Genet Sel Evol Research Article BioMed Central 2023-10-18 /pmc/articles/PMC10583454/ /pubmed/37853325 http://dx.doi.org/10.1186/s12711-023-00843-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Genomic prediction based on selective linkage disequilibrium pruning of low-coverage whole-genome sequence variants in a pure Duroc population
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10583454/
https://www.ncbi.nlm.nih.gov/pubmed/37853325
http://dx.doi.org/10.1186/s12711-023-00843-w