
Prediction performance of linear models and gradient boosting machine on complex phenotypes in outbred mice

Bibliographic Details
Main authors: Perez, Bruno C; Bink, Marco C A M; Svenson, Karen L; Churchill, Gary A; Calus, Mario P L
Format: Online Article, Text
Language: English
Published: Oxford University Press 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8982369/
https://www.ncbi.nlm.nih.gov/pubmed/35166767
http://dx.doi.org/10.1093/g3journal/jkac039
Abstract: We compared the performance of linear methods (GBLUP, BayesB, and elastic net) with that of a nonparametric tree-based ensemble method (gradient boosting machine) for genomic prediction of complex traits in mice. The dataset contained genotypes for 50,112 SNP markers and phenotypes for 835 animals from 6 generations. Traits analyzed were bone mineral density, body weight at 10, 15, and 20 weeks, fat percentage, circulating cholesterol, glucose, insulin, triglycerides, and urine creatinine. The youngest generation was used as the validation subset, and predictions were based on all older generations. Model performance was evaluated by comparing predictions for animals in the validation subset against their adjusted phenotypes. Linear models outperformed gradient boosting machine for 7 of the 10 traits. For bone mineral density, cholesterol, and glucose, the gradient boosting machine model showed higher prediction accuracy and lower relative root mean squared error than the linear models. Interestingly, for these 3 traits, there is evidence that a meaningful portion of the phenotypic variance is explained by epistatic effects. For some traits, using a subset of top markers selected by a gradient boosting machine model improved prediction accuracy when these markers were fitted in both linear and gradient boosting machine models. Our results indicate that gradient boosting machine is more strongly affected than the linear models by data size and by reduced connectedness between the reference and validation sets. Although the linear models outperformed gradient boosting machine for the polygenic traits, our results suggest that gradient boosting machine is a competitive method for predicting complex traits with assumed epistatic effects.
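The evaluation scheme summarised in the abstract (youngest generation held out for validation, older generations used as reference, predictions compared against adjusted phenotypes) can be sketched as below. This is a minimal illustration, not the authors' code: the function names are hypothetical, the prediction models themselves (GBLUP, BayesB, elastic net, gradient boosting machine) are not implemented, and two definitions are assumptions: prediction accuracy is taken as the Pearson correlation between predicted and adjusted phenotypes, and relative RMSE as the RMSE divided by the mean of the adjusted phenotypes. The paper's exact definitions may differ.

```python
from __future__ import annotations

import math
from typing import Sequence


def split_by_generation(generations: Sequence[int]) -> tuple[list[int], list[int]]:
    """Hold out the youngest generation for validation; all older ones form the reference set."""
    youngest = max(generations)
    reference = [i for i, g in enumerate(generations) if g < youngest]
    validation = [i for i, g in enumerate(generations) if g == youngest]
    return reference, validation


def prediction_accuracy(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson correlation between predictions and adjusted phenotypes (assumed definition)."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_o)


def relative_rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """RMSE scaled by the mean of the observed values (assumed normalisation)."""
    n = len(observed)
    if n != len(predicted) or n == 0:
        raise ValueError("need two non-empty, equal-length sequences")
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
    mean_obs = sum(observed) / n
    if mean_obs == 0:
        raise ValueError("relative RMSE is undefined when the observed mean is zero")
    return rmse / abs(mean_obs)


# Toy usage with made-up generation labels (not data from the study).
ref_idx, val_idx = split_by_generation([1, 1, 2, 2, 3, 3])
print(ref_idx, val_idx)  # → [0, 1, 2, 3] [4, 5]
```

In practice, each model would be trained on the reference indices, and its predictions for the validation indices would be passed to both metric functions together with the adjusted phenotypes. Reporting both metrics matters because a correlation-based accuracy ignores bias and scale, while RMSE captures them.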
Journal: G3 (Bethesda), Investigation section
Publication date: 2022-02-15
License: © The Author(s) 2022. Published by Oxford University Press on behalf of Genetics Society of America. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.