Genomic selection using regularized linear regression models: ridge regression, lasso, elastic net and their extensions

Bibliographic details
Main authors: Ogutu, Joseph O, Schulz-Streeck, Torben, Piepho, Hans-Peter
Format: Online article, text
Language: English
Published: BioMed Central 2012
Subjects: Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3363152/
https://www.ncbi.nlm.nih.gov/pubmed/22640436
http://dx.doi.org/10.1186/1753-6561-6-S2-S10
Collection: PubMed (National Center for Biotechnology Information), record pubmed-3363152
Description:
BACKGROUND: Genomic selection (GS) is emerging as an efficient and cost-effective method for estimating breeding values using molecular markers distributed over the entire genome. In essence, it involves estimating the simultaneous effects of all genes or chromosomal segments and combining the estimates to predict the total genomic breeding value (GEBV). Accurate prediction of GEBVs is a central and recurring challenge in plant and animal breeding. The existence of a bewildering array of approaches for predicting breeding values using markers underscores the importance of identifying approaches able to efficiently and accurately predict breeding values. Here, we comparatively evaluate the predictive performance of six regularized linear regression methods (ridge regression, ridge regression BLUP, lasso, adaptive lasso, elastic net and adaptive elastic net) for predicting GEBV using dense SNP markers.
METHODS: We predicted GEBVs for a quantitative trait using a dataset on 3000 progenies of 20 sires and 200 dams and an accompanying genome consisting of five chromosomes with 9990 biallelic SNP-marker loci simulated for the QTL-MAS 2011 workshop. We applied all six methods, each of which uses penalty-based (regularization) shrinkage to handle datasets with far more predictors than observations. The lasso, elastic net and their adaptive extensions further possess the desirable property that they simultaneously select relevant predictive markers and optimally estimate their effects. The regression models were trained with a subset of 2000 phenotyped and genotyped individuals and used to predict GEBVs for the remaining 1000 progenies without phenotypes. Predictive accuracy was assessed with the root mean squared error and with the Pearson correlation between predicted GEBVs and (1) the true genomic value (TGV), (2) the true breeding value (TBV) and (3) the simulated phenotypic values, the last based on fivefold cross-validation (CV).
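The six methods mentioned above differ mainly in the penalty added to the least-squares loss. A minimal sketch, assuming the parameterization popularized by the R package glmnet (not necessarily the authors' implementation), minimizes (1/2n)*||y - b0 - Xb||^2 + lam * sum_j w_j*(alpha*|b_j| + (1 - alpha)/2*b_j^2) by cyclic coordinate descent: alpha = 0 gives ridge regression, alpha = 1 the lasso, 0 < alpha < 1 the elastic net, and data-driven weights w_j = 1/|b~_j|^gamma from an initial fit give the adaptive lasso and adaptive elastic net. The L1 soft-threshold is what sets some marker effects exactly to zero, i.e. the marker-selection property. The function names, the toy genotype data, the choice lam = 0.2*lam_max and the ridge-based adaptive weights are illustrative assumptions; in practice lam (and alpha) would be tuned by cross-validation, and RR-BLUP is usually presented as the same ridge estimator with the shrinkage fixed by the ratio of residual to marker-effect variance instead.

```python
import math
import random


def soft_threshold(z, t):
    """Lasso shrinkage operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_cd(X, y, lam, alpha, weights=None, max_sweeps=1000, tol=1e-8):
    """Cyclic coordinate descent for the penalized least-squares objective
        (1/2n) * ||y - b0 - X b||^2 + lam * sum_j w_j * (alpha*|b_j| + (1 - alpha)/2 * b_j^2)
    alpha = 0: ridge; alpha = 1: lasso; 0 < alpha < 1: elastic net.
    Non-uniform penalty weights w_j give the adaptive variants.
    Returns (intercept, list of marker effects)."""
    n, p = len(X), len(X[0])
    w = weights if weights is not None else [1.0] * p
    # Center y and every marker column so the intercept drops out of the updates.
    y_mean = sum(y) / n
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residuals for b = 0
    col_ss = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_ss[j] == 0.0 or math.isinf(w[j]):
                continue  # monomorphic marker or excluded by an infinite adaptive weight
            old = beta[j]
            # (1/n) x_j' (partial residual with marker j's own contribution added back)
            z = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * old
            new = soft_threshold(z, lam * w[j] * alpha) / (col_ss[j] + lam * w[j] * (1 - alpha))
            if new != old:
                for i in range(n):
                    resid[i] -= Xc[i][j] * (new - old)
                max_change = max(max_change, abs(new - old))
                beta[j] = new
        if max_change < tol:
            break
    intercept = y_mean - sum(m * b for m, b in zip(x_mean, beta))
    return intercept, beta


def adaptive_weights(initial_beta, gamma=1.0):
    """Adaptive penalty weights w_j = 1 / |b_j|^gamma from an initial fit."""
    return [math.inf if b == 0.0 else 1.0 / abs(b) ** gamma for b in initial_beta]


def predict_gebv(intercept, beta, X_new):
    """GEBV = intercept + sum over markers of genotype code * estimated effect."""
    return [intercept + sum(b * x for b, x in zip(beta, row)) for row in X_new]


# Toy data (NOT the QTL-MAS 2011 data): more markers than individuals.
rng = random.Random(1)
n_train, n_test, p = 30, 10, 50
true_effects = [0.0] * p
true_effects[3], true_effects[17], true_effects[40] = 1.0, -0.8, 0.6


def simulate(n):
    X = [[rng.choice([0, 1, 2]) for _ in range(p)] for _ in range(n)]
    y = [sum(b * x for b, x in zip(true_effects, row)) + rng.gauss(0.0, 0.5) for row in X]
    return X, y


X_train, y_train = simulate(n_train)
X_test, _ = simulate(n_test)  # "unphenotyped" individuals: only genotypes are used

# Smallest lam at which the (unweighted) lasso sets every effect to zero; use a fraction of it.
y_bar = sum(y_train) / n_train
lam_max = max(abs(sum(row[j] * (yi - y_bar) for row, yi in zip(X_train, y_train))) / n_train
              for j in range(p))
lam = 0.2 * lam_max

_, ridge = elastic_net_cd(X_train, y_train, lam, alpha=0.0)
b0, lasso = elastic_net_cd(X_train, y_train, lam, alpha=1.0)
_, enet = elastic_net_cd(X_train, y_train, lam, alpha=0.5)
_, ada_lasso = elastic_net_cd(X_train, y_train, lam, alpha=1.0, weights=adaptive_weights(ridge))
gebv = predict_gebv(b0, lasso, X_test)

for name, b in [("ridge", ridge), ("lasso", lasso), ("elastic net", enet), ("adaptive lasso", ada_lasso)]:
    print(f"{name}: {sum(v != 0.0 for v in b)} of {p} marker effects non-zero")
```

Running it prints how many of the 50 toy marker effects each fit keeps: ridge shrinks every effect but keeps all of them, whereas the lasso sets some exactly to zero. The GEBVs of the unphenotyped individuals are then just the intercept plus the genotype-weighted sum of the estimated effects.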
RESULTS: The elastic net, lasso, adaptive lasso and adaptive elastic net all had similar accuracies but outperformed ridge regression and ridge regression BLUP (RR-BLUP) in terms of both the Pearson correlation between predicted GEBVs and the true genomic value and the root mean squared error. RR-BLUP also performed somewhat better than ridge regression. The same pattern held for the Pearson correlation between predicted GEBVs and the true breeding values (TBV) and for the root mean squared error calculated with respect to TBV, except that accuracy was lower for all models, especially the adaptive elastic net. The fivefold-CV correlation between predicted GEBVs and simulated phenotypic values showed a similar pattern, except that the adaptive elastic net was less accurate than both ridge regression methods.
CONCLUSIONS: All six models had relatively high prediction accuracies for the simulated data set. Accuracy was higher for the lasso-type methods than for ridge regression and RR-BLUP.
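The accuracy measures used in this comparison (root mean squared error, Pearson correlation and fivefold cross-validation) can be sketched in a few lines. In a simulation such as QTL-MAS 2011, predictions can be scored directly against the known TGV or TBV; with phenotypes only, accuracy has to be estimated by cross-validation. This sketch assumes the per-fold correlations are averaged (one common convention; the abstract does not say how folds were combined), and `fit_single_marker` / `predict_single_marker` are hypothetical placeholders for any of the penalized regression models.

```python
import math
import random


def rmse(pred, target):
    """Root mean squared error between predictions and a reference (e.g. TGV or TBV)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_correlation(X, y, fit, predict, k=5, seed=0):
    """k-fold CV: fit on k-1 folds, correlate predictions with the held-out
    phenotypes, and average the k per-fold correlations (assumed convention)."""
    rs = []
    for test_idx in kfold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        pred = predict(model, [X[i] for i in test_idx])
        rs.append(pearson(pred, [y[i] for i in test_idx]))
    return sum(rs) / len(rs)


def fit_single_marker(X, y):
    """Hypothetical placeholder model: least squares on the first column only."""
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    slope = sum((x - mx) * (t - my) for x, t in zip(xs, y)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


def predict_single_marker(model, X):
    intercept, slope = model
    return [intercept + slope * row[0] for row in X]


# Toy data: one SNP coded 0/1/2 with an additive effect plus noise.
rng = random.Random(42)
X = [[rng.choice([0, 1, 2])] for _ in range(100)]
y = [2.0 * row[0] + rng.gauss(0.0, 0.3) for row in X]
r_cv = cv_correlation(X, y, fit_single_marker, predict_single_marker, k=5)
print(f"mean fivefold CV correlation: {r_cv:.3f}")
```

Because `cv_correlation` only takes a fit/predict pair, a penalized regression fit can replace the toy single-marker model without touching the fold logic.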
Journal: BMC Proceedings (BMC Proc), BioMed Central, published 2012-05-21
License: Copyright ©2012 Ogutu et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.