Genomic Selection in Chinese Holsteins Using Regularized Regression Models for Feature Selection of Whole Genome Sequencing Data

SIMPLE SUMMARY: Genomic selection (GS) is increasingly used in animal breeding owing to its high efficiency in the genetic improvement of economic traits. In China, GS has been implemented for genetic evaluation of young bulls in dairy cattle breeding programs since 2012. GS is commonly base...

Bibliographic Details
Main authors: Li, Shanshan; Yu, Jian; Kang, Huimin; Liu, Jianfeng
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9495168/
https://www.ncbi.nlm.nih.gov/pubmed/36139283
http://dx.doi.org/10.3390/ani12182419
author Li, Shanshan
Yu, Jian
Kang, Huimin
Liu, Jianfeng
collection PubMed
description SIMPLE SUMMARY: Genomic selection (GS) is increasingly used in animal breeding owing to its high efficiency in the genetic improvement of economic traits. In China, GS has been implemented for genetic evaluation of young bulls in dairy cattle breeding programs since 2012. GS is commonly based on single nucleotide polymorphism (SNP) chips. The cost of whole genome sequencing (WGS) has decreased tremendously in recent years, enabling more studies of WGS-based GS. In this study, based on the imputed WGS data of approximately 8000 Chinese Holsteins, we investigated the performance of GS for milk production traits using regularized regression for feature selection. The results showed that WGS-based GS using regularized regression models and the commonly used linear mixed models achieved comparable prediction accuracies. For milk and protein yields, GS using a combination of SNPs selected with a regularized regression model and 50K SNP chip data achieved the best prediction performance, and GS using SNPs selected with a linear mixed model combined with 50K SNP chip data performed best for fat yield. The proposed method of GS based on WGS data, i.e., feature selection using regularized regression models, provides a valuable novel strategy for genomic selection.
ABSTRACT: Genomic selection (GS) is an efficient method for the genetic improvement of economic traits. Feature selection is an important method for GS based on whole-genome sequencing (WGS) data. We investigated the prediction performance of GS for milk production traits using imputed WGS data on 7957 Chinese Holsteins. We used two regularized regression models, least absolute shrinkage and selection operator (LASSO) and elastic net (EN), for feature selection. For comparison, we performed genome-wide association studies based on a linear mixed model (LMM) and selected the N single nucleotide polymorphisms (SNPs) with the lowest p-values (denoted LMM(LASSO) and LMM(EN)), where N was the number of SNPs with non-zero effects selected by LASSO or EN, respectively. GS was conducted using a genomic best linear unbiased prediction (GBLUP) model and several sets of SNPs: (1) selected WGS SNPs; (2) 50K SNP chip data; (3) WGS data; and (4) a combined set of selected WGS SNPs and 50K SNP chip data. The results showed that the prediction accuracies of GS with features selected using LASSO or EN were comparable to those using features selected with LMM(LASSO) or LMM(EN). For milk and protein yields, GS using a combination of SNPs selected with LASSO and 50K SNP chip data achieved the best prediction performance, and GS using SNPs selected with LMM(LASSO) combined with 50K SNP chip data performed best for fat yield. The proposed method, feature selection using regularized regression models, provides a valuable novel strategy for WGS-based GS.
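The workflow described in the abstract (select SNPs with LASSO, pick the same number N of top-ranked SNPs from an association scan, then compare SNP sets under GBLUP) can be illustrated with a minimal sketch on simulated data. This is not the authors' pipeline; everything in it is an assumption made for illustration: the simulated genotypes and phenotype, the hand-written coordinate-descent LASSO, the penalty value, the single-marker correlation ranking used as a simple stand-in for the LMM-based GWAS, the hypothetical "chip" (every tenth SNP), and the fixed variance ratio `lam = 1.0` in GBLUP. Feature selection uses training animals only, which is a choice of this sketch.

```python
import numpy as np

def standardize(M):
    """Center and scale each SNP column (monomorphic columns are only centered)."""
    sd = M.std(axis=0)
    sd[sd == 0] = 1.0
    return (M - M.mean(axis=0)) / sd

def lasso_cd(X, y, alpha, n_iter=100):
    """LASSO by cyclic coordinate descent.
    Minimizes (1/2n)||y - X b||^2 + alpha * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.copy()                      # residual y - X @ beta
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # covariance of SNP j with the partial residual (SNP j left out)
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def top_n_marginal(X, y, n_keep):
    """Indices of the n_keep SNPs with the largest |correlation| with y.
    For single-SNP regression this matches ordering by lowest p-value.
    Stand-in for an LMM-based GWAS, which would also model relatedness."""
    r = X.T @ (y - y.mean()) / (len(y) * y.std())
    return np.argsort(-np.abs(r))[:n_keep]

def vanraden_g(M):
    """Genomic relationship matrix (VanRaden method 1) from 0/1/2 genotypes."""
    freq = M.mean(axis=0) / 2.0
    Z = M - 2.0 * freq
    return Z @ Z.T / (2.0 * np.sum(freq * (1.0 - freq)))

def gblup_predict(G, y, train, valid, lam):
    """Predict validation animals: mean + G_vt (G_tt + lam*I)^-1 (y_t - mean),
    where lam is the assumed residual-to-genetic variance ratio."""
    mu = y[train].mean()
    G_tt = G[np.ix_(train, train)]
    G_vt = G[np.ix_(valid, train)]
    coef = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + G_vt @ coef

# Simulated data (assumption): 300 animals, 500 SNPs, 10 causal SNPs.
rng = np.random.default_rng(0)
n, p, n_qtl = 300, 500, 10
M = rng.binomial(2, rng.uniform(0.1, 0.9, p), size=(n, p)).astype(float)
qtl = rng.choice(p, n_qtl, replace=False)
g = M[:, qtl] @ rng.normal(0.0, 1.0, n_qtl)
y = g + rng.normal(0.0, g.std(), n)       # heritability of roughly 0.5

train, valid = np.arange(0, 240), np.arange(240, n)
X_train = standardize(M[train])
y_train = y[train] - y[train].mean()

# Penalty set below the value at which all LASSO coefficients are zero.
alpha = 0.3 * np.max(np.abs(X_train.T @ y_train)) / len(train)
lasso_snps = np.flatnonzero(lasso_cd(X_train, y_train, alpha))
N = len(lasso_snps)
marginal_snps = top_n_marginal(X_train, y_train, N)   # same N as LASSO

chip = np.arange(0, p, 10)                # hypothetical low-density chip
snp_sets = {
    "LASSO": lasso_snps,
    "marginal top-N": marginal_snps,
    "chip": chip,
    "all SNPs": np.arange(p),
    "LASSO + chip": np.union1d(lasso_snps, chip),
}

acc = {}
for name, snps in snp_sets.items():
    G = vanraden_g(M[:, snps])
    pred = gblup_predict(G, y, train, valid, lam=1.0)
    # prediction accuracy: correlation of predicted and observed phenotypes
    acc[name] = np.corrcoef(pred, y[valid])[0, 1]
    print(f"{name:15s} n_snps={len(snps):4d} accuracy={acc[name]:.3f}")
```

Matching the size of the association-based set to the number of non-zero LASSO effects, as the abstract describes, means the two selection methods are compared on equal numbers of SNPs. In practice the penalty would be tuned (for example by cross-validation) and an elastic net would add a ridge term to the same update.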
format Online
Article
Text
id pubmed-9495168
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9495168 2022-09-23 Animals (Basel) Article MDPI 2022-09-14 /pmc/articles/PMC9495168/ /pubmed/36139283 http://dx.doi.org/10.3390/ani12182419 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Genomic Selection in Chinese Holsteins Using Regularized Regression Models for Feature Selection of Whole Genome Sequencing Data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9495168/
https://www.ncbi.nlm.nih.gov/pubmed/36139283
http://dx.doi.org/10.3390/ani12182419