Genomic Selection in Chinese Holsteins Using Regularized Regression Models for Feature Selection of Whole Genome Sequencing Data
SIMPLE SUMMARY: Genomic selection (GS) is increasingly widely used in animal breeding, owing to its high efficiency in the genetic improvement of economic traits. In China, GS has been implemented for genetic evaluation of young bulls in dairy cattle breeding programs since 2012. GS is commonly base...
Main authors: | Li, Shanshan; Yu, Jian; Kang, Huimin; Liu, Jianfeng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9495168/ https://www.ncbi.nlm.nih.gov/pubmed/36139283 http://dx.doi.org/10.3390/ani12182419 |
_version_ | 1784793958295011328 |
---|---|
author | Li, Shanshan Yu, Jian Kang, Huimin Liu, Jianfeng |
author_facet | Li, Shanshan Yu, Jian Kang, Huimin Liu, Jianfeng |
author_sort | Li, Shanshan |
collection | PubMed |
description | SIMPLE SUMMARY: Genomic selection (GS) is increasingly used in animal breeding, owing to its high efficiency in the genetic improvement of economic traits. In China, GS has been implemented for genetic evaluation of young bulls in dairy cattle breeding programs since 2012. GS is commonly based on single nucleotide polymorphism (SNP) chips. The cost of whole genome sequencing (WGS) has decreased tremendously in recent years, enabling more studies of WGS-based GS. In this study, based on the imputed WGS data of approximately 8000 Chinese Holsteins, we investigated the performance of GS of milk production traits using regularized regression for feature selection. The results showed that WGS-based GS using regularized regression models and the commonly used linear mixed models achieved comparable prediction accuracies. For milk and protein yields, GS using a combination of SNPs selected with a regularized regression model and 50K SNP chip data achieved the best prediction performance, and GS using SNPs selected with a linear mixed model combined with 50K SNP chip data performed best for fat yield. The proposed method of GS based on WGS data, i.e., feature selection using regularized regression models, provides a valuable novel strategy for genomic selection. ABSTRACT: Genomic selection (GS) is an efficient method to genetically improve economic traits. Feature selection is an important method for GS based on whole-genome sequencing (WGS) data. We investigated the prediction performance of GS of milk production traits using imputed WGS data on 7957 Chinese Holsteins. We used two regularized regression models, least absolute shrinkage and selection operator (LASSO) and elastic net (EN), for feature selection. For comparison, we performed genome-wide association studies based on a linear mixed model (LMM), and the N single nucleotide polymorphisms (SNPs) with the lowest p-values were selected (LMM(LASSO) and LMM(EN)), where N was the number of non-zero effect SNPs selected by LASSO or EN. GS was conducted using a genomic best linear unbiased prediction (GBLUP) model and several sets of SNPs: (1) selected WGS SNPs; (2) 50K SNP chip data; (3) WGS data; and (4) a combined set of selected WGS SNPs and 50K SNP chip data. The results showed that the prediction accuracies of GS with features selected using LASSO or EN were comparable to those using features selected with LMM(LASSO) or LMM(EN). For milk and protein yields, GS using a combination of SNPs selected with LASSO and 50K SNP chip data achieved the best prediction performance, and GS using SNPs selected with LMM(LASSO) combined with 50K SNP chip data performed best for fat yield. The proposed method, feature selection using regularized regression models, provides a valuable novel strategy for WGS-based GS. |
format | Online Article Text |
id | pubmed-9495168 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9495168 2022-09-23 Genomic Selection in Chinese Holsteins Using Regularized Regression Models for Feature Selection of Whole Genome Sequencing Data Li, Shanshan Yu, Jian Kang, Huimin Liu, Jianfeng Animals (Basel) Article SIMPLE SUMMARY: Genomic selection (GS) is increasingly used in animal breeding, owing to its high efficiency in the genetic improvement of economic traits. In China, GS has been implemented for genetic evaluation of young bulls in dairy cattle breeding programs since 2012. GS is commonly based on single nucleotide polymorphism (SNP) chips. The cost of whole genome sequencing (WGS) has decreased tremendously in recent years, enabling more studies of WGS-based GS. In this study, based on the imputed WGS data of approximately 8000 Chinese Holsteins, we investigated the performance of GS of milk production traits using regularized regression for feature selection. The results showed that WGS-based GS using regularized regression models and the commonly used linear mixed models achieved comparable prediction accuracies. For milk and protein yields, GS using a combination of SNPs selected with a regularized regression model and 50K SNP chip data achieved the best prediction performance, and GS using SNPs selected with a linear mixed model combined with 50K SNP chip data performed best for fat yield. The proposed method of GS based on WGS data, i.e., feature selection using regularized regression models, provides a valuable novel strategy for genomic selection. ABSTRACT: Genomic selection (GS) is an efficient method to genetically improve economic traits. Feature selection is an important method for GS based on whole-genome sequencing (WGS) data. We investigated the prediction performance of GS of milk production traits using imputed WGS data on 7957 Chinese Holsteins. We used two regularized regression models, least absolute shrinkage and selection operator (LASSO) and elastic net (EN), for feature selection. For comparison, we performed genome-wide association studies based on a linear mixed model (LMM), and the N single nucleotide polymorphisms (SNPs) with the lowest p-values were selected (LMM(LASSO) and LMM(EN)), where N was the number of non-zero effect SNPs selected by LASSO or EN. GS was conducted using a genomic best linear unbiased prediction (GBLUP) model and several sets of SNPs: (1) selected WGS SNPs; (2) 50K SNP chip data; (3) WGS data; and (4) a combined set of selected WGS SNPs and 50K SNP chip data. The results showed that the prediction accuracies of GS with features selected using LASSO or EN were comparable to those using features selected with LMM(LASSO) or LMM(EN). For milk and protein yields, GS using a combination of SNPs selected with LASSO and 50K SNP chip data achieved the best prediction performance, and GS using SNPs selected with LMM(LASSO) combined with 50K SNP chip data performed best for fat yield. The proposed method, feature selection using regularized regression models, provides a valuable novel strategy for WGS-based GS. MDPI 2022-09-14 /pmc/articles/PMC9495168/ /pubmed/36139283 http://dx.doi.org/10.3390/ani12182419 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Li, Shanshan Yu, Jian Kang, Huimin Liu, Jianfeng Genomic Selection in Chinese Holsteins Using Regularized Regression Models for Feature Selection of Whole Genome Sequencing Data |
title | Genomic Selection in Chinese Holsteins Using Regularized Regression Models for Feature Selection of Whole Genome Sequencing Data |
title_full | Genomic Selection in Chinese Holsteins Using Regularized Regression Models for Feature Selection of Whole Genome Sequencing Data |
title_fullStr | Genomic Selection in Chinese Holsteins Using Regularized Regression Models for Feature Selection of Whole Genome Sequencing Data |
title_full_unstemmed | Genomic Selection in Chinese Holsteins Using Regularized Regression Models for Feature Selection of Whole Genome Sequencing Data |
title_short | Genomic Selection in Chinese Holsteins Using Regularized Regression Models for Feature Selection of Whole Genome Sequencing Data |
title_sort | genomic selection in chinese holsteins using regularized regression models for feature selection of whole genome sequencing data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9495168/ https://www.ncbi.nlm.nih.gov/pubmed/36139283 http://dx.doi.org/10.3390/ani12182419 |
work_keys_str_mv | AT lishanshan genomicselectioninchineseholsteinsusingregularizedregressionmodelsforfeatureselectionofwholegenomesequencingdata AT yujian genomicselectioninchineseholsteinsusingregularizedregressionmodelsforfeatureselectionofwholegenomesequencingdata AT kanghuimin genomicselectioninchineseholsteinsusingregularizedregressionmodelsforfeatureselectionofwholegenomesequencingdata AT liujianfeng genomicselectioninchineseholsteinsusingregularizedregressionmodelsforfeatureselectionofwholegenomesequencingdata |
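
The selection step described in the abstract (keep the SNPs that LASSO gives non-zero effects, then pick the same number N of SNPs by association ranking for comparison) can be illustrated with a small sketch. This is not the authors' pipeline: the genotypes and phenotypes are simulated, the penalty `lam`, the sample sizes, and the causal SNP indices are arbitrary assumptions, and a plain marginal correlation stands in for the LMM-based GWAS p-values. The downstream GBLUP step is not shown.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100, tol=1e-8):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.
    X (list of rows) and y are assumed to be centred, so no intercept is fitted."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic SNP carries no information
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta

def abs_correlation(X, y):
    """Absolute marginal correlation of each centred SNP column with y
    (a simple stand-in for ranking SNPs by GWAS p-value)."""
    n, p = len(X), len(X[0])
    syy = sum(v * v for v in y)
    scores = []
    for j in range(p):
        sxx = sum(X[i][j] ** 2 for i in range(n))
        sxy = sum(X[i][j] * y[i] for i in range(n))
        scores.append(abs(sxy) / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0)
    return scores

# Simulated toy data (hypothetical sizes; the study used 7957 animals and WGS SNPs).
random.seed(1)
n, p = 200, 50
causal = [3, 17, 42]  # assumed causal SNP indices for the simulation
geno = [[random.randint(0, 1) + random.randint(0, 1) for _ in range(p)] for _ in range(n)]
means = [sum(row[j] for row in geno) / n for j in range(p)]
X = [[row[j] - means[j] for j in range(p)] for row in geno]
y_raw = [sum(X[i][j] for j in causal) + random.gauss(0, 0.5) for i in range(n)]
y_mean = sum(y_raw) / n
y = [v - y_mean for v in y_raw]

# Step 1: LASSO keeps the SNPs with non-zero estimated effects.
beta = lasso_cd(X, y, lam=0.2)
selected = [j for j, b in enumerate(beta) if b != 0.0]

# Step 2: comparison set of the same size N, ranked by marginal association.
N = len(selected)
scores = abs_correlation(X, y)
top_n = sorted(range(p), key=lambda j: -scores[j])[:N]

print("LASSO-selected SNPs:", selected)
print("Top-N by marginal association:", sorted(top_n))
```

Matching N between the two sets, as the abstract describes, keeps the comparison about which SNPs are chosen rather than how many.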