Impact of Marker Pruning Strategies Based on Different Measurements of Marker Distance on Genomic Prediction in Dairy Cattle

Bibliographic Details
Main authors: Ren, Duanyang, Teng, Jinyan, Diao, Shuqi, Lin, Qing, Li, Jiaqi, Zhang, Zhe
Format: Online Article Text
Language: English
Published: MDPI 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8300388/
https://www.ncbi.nlm.nih.gov/pubmed/34359120
http://dx.doi.org/10.3390/ani11071992
collection PubMed
description SIMPLE SUMMARY: The usefulness of genomic prediction (GP) has been widely demonstrated in breeding studies of livestock, plant and aquatic populations. It is well known that ‘marker density’ is a critical factor affecting the accuracy of GP; however, how to properly measure ‘marker density’ in GP has yet to be determined. With population-level whole-genome sequence data or high-density single nucleotide polymorphism (SNP) data now available, this question can be addressed more convincingly. In this study, we investigated the impact of four ‘marker density’ measures, reflecting the genetic or physical distances between SNPs, on the accuracy of GP in a German Holstein dairy cattle population. Our results showed that the degree of variation in the physical distance between adjacent SNPs had a significant effect on GP accuracy, whereas the genetic distance between SNPs was unrelated to GP accuracy. Therefore, for studies based on high-density SNP data, the default strategy of pruning SNPs by genetic distance is detrimental to heritability estimation and genomic prediction. These results extend the community’s knowledge of ‘marker density’ and provide useful suggestions for the application of, and research on, genomic prediction. ABSTRACT: With the availability of high-density single-nucleotide polymorphism (SNP) data and the development of genotype imputation methods, genomic prediction (GP) based on high-density panels has become possible in livestock breeding. It is generally considered that the accuracy of genomic estimated breeding values (GEBVs) increases with marker density, yet studies have shown that GEBV accuracy does not increase, or even decreases, when high-density panels are used.
Therefore, in addition to the SNP number, other measures of ‘marker density’ appear to affect GEBV accuracy, and exploring the relationship between GEBV accuracy and these measures using high-density SNP or whole-genome sequence data is important for the field of GP. In this study, using the high-density SNP data of a German Holstein dairy cattle population, we constructed SNP panels with fixed SNP numbers (e.g., 1 k) by pruning SNPs according to the physical distance (PhyD), genetic distance (GenD) or random distance (RanD) between them, giving three different panels at each SNP number level. These panels were used to build GP models for fat percentage, milk yield and somatic cell score. The mean and variance of the physical distance between SNPs, and the mean and variance of the genetic distance between SNPs, in each panel were used as marker density-related measures, and their influence on GEBV accuracy was investigated. At the same SNP number level, the mean physical distance was essentially the same for all panels, whereas the variance of the physical distance and the mean and variance of the genetic distance differed. We therefore investigated only the effects of the variance of the physical distance and of the mean and variance of the genetic distance on GEBV accuracy. At a given SNP number level, GEBV accuracy was negatively correlated with the variance of the physical distance, but not with the mean or variance of the genetic distance. Panels constructed by PhyD had a smaller variance of physical distance than those constructed by GenD or RanD. Low- and moderate-density panels (< 50 k) constructed by RanD or GenD have a large variance of physical distance, which is not conducive to genomic prediction.
The GEBV accuracy of low- and moderate-density panels constructed by PhyD was 3.8–34.8% higher than that of the corresponding panels constructed by RanD and GenD. For all three traits, panels with 20–30 k SNPs constructed by PhyD achieved the same or slightly higher GEBV accuracy than high-density SNP panels. In summary, the smaller the variation in physical distance between adjacent SNPs, the higher the GEBV accuracy. Low- and moderate-density panels constructed by physical distance benefit genomic prediction, whereas pruning high-density SNP data by genetic distance is detrimental to it. These results provide suggestions for the development of SNP panels and for research on genomic prediction based on whole-genome sequence data.
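The contrast between pruning by physical distance and pruning by genetic distance can be illustrated with a minimal sketch on simulated data. This is not the authors' pipeline: the toy 100 Mb chromosome, the simulated SNP positions, the hypothetical two-rate genetic map `toy_cm` (0.5 and 2 cM/Mb), the nearest-to-grid selection rule in `evenly_spaced_panel`, and the use of a plain random subset as a stand-in for RanD are all assumptions made for illustration. The sketch shows why, at a fixed SNP number, the mean physical gap between adjacent SNPs is pinned down by the chromosome span while the variance of that gap depends on the pruning strategy.

```python
import random
import statistics
from bisect import bisect_left


def nearest_index(coords, target):
    """Index of the value in ascending `coords` closest to `target`."""
    i = bisect_left(coords, target)
    if i == 0:
        return 0
    if i == len(coords):
        return len(coords) - 1
    # Pick whichever neighbour lies closer to the target.
    return i if coords[i] - target < target - coords[i - 1] else i - 1


def evenly_spaced_panel(coords, n):
    """Indices of up to n SNPs nearest to n equally spaced targets.

    Assumed selection rule for illustration only. `coords` must be
    ascending: base pairs for a PhyD-style panel, centimorgans for a
    GenD-style panel. Targets hitting the same SNP are merged, so sparse
    data can yield fewer than n SNPs.
    """
    lo, hi = coords[0], coords[-1]
    step = (hi - lo) / (n - 1)
    return sorted({nearest_index(coords, lo + k * step) for k in range(n)})


def adjacent_gap_stats(positions_bp):
    """Mean and population variance of bp gaps between adjacent panel SNPs."""
    gaps = [b - a for a, b in zip(positions_bp, positions_bp[1:])]
    return statistics.mean(gaps), statistics.pvariance(gaps)


def toy_cm(bp):
    """Hypothetical genetic map: 0.5 cM/Mb up to 50 Mb, 2 cM/Mb beyond."""
    mb = bp / 1e6
    return 0.5 * mb if mb <= 50 else 25 + 2 * (mb - 50)


rng = random.Random(2021)
# Simulated 100 Mb chromosome carrying 20,000 "high-density" SNPs.
hd_bp = sorted(rng.sample(range(100_000_000), 20_000))
hd_cm = [toy_cm(p) for p in hd_bp]
n = 200

panels = {
    "PhyD": [hd_bp[i] for i in evenly_spaced_panel(hd_bp, n)],
    "GenD": [hd_bp[i] for i in evenly_spaced_panel(hd_cm, n)],
    "RanD": sorted(rng.sample(hd_bp, n)),  # plain random subset (stand-in)
}

for name, pos in panels.items():
    mean_gap, var_gap = adjacent_gap_stats(pos)
    print(f"{name}: {len(pos)} SNPs, mean gap {mean_gap / 1e6:.3f} Mb, "
          f"gap variance {var_gap / 1e12:.4f} Mb^2")
```

Under these assumptions the PhyD and GenD panels both keep the first and last SNP, so their mean gap is simply the span divided by the number of gaps. The GenD panel's gaps stretch in the low-recombination half and shrink in the high-recombination half, which inflates their variance relative to the PhyD panel.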
id pubmed-8300388
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source Animals (Basel), MDPI, published 2021-07-02 (record pubmed-8300388, 2021-07-24)
license © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).