Stacked ensembles on basis of parentage information can predict hybrid performance with an accuracy comparable to marker-based GBLUP

Bibliographic details
Main authors: Heilmann, Philipp Georg; Frisch, Matthias; Abbadi, Amine; Kox, Tobias; Herzog, Eva
Format: Online article (text)
Language: English
Published: Frontiers Media S.A., 2023
Subjects: Plant Science
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10401275/
https://www.ncbi.nlm.nih.gov/pubmed/37546247
http://dx.doi.org/10.3389/fpls.2023.1178902
Collection: PubMed
Abstract: Testcross factorials in newly established hybrid breeding programs are often highly unbalanced, incomplete, and characterized by predominance of special combining ability (SCA) over general combining ability (GCA). This results in a low efficiency of GCA-based selection. Machine learning algorithms might improve prediction of hybrid performance in such testcross factorials, as they have been successfully applied to find complex underlying patterns in sparse data. Our objective was to compare the prediction accuracy of machine learning algorithms to that of GCA-based prediction and genomic best linear unbiased prediction (GBLUP) in six unbalanced incomplete factorials from hybrid breeding programs of rapeseed, wheat, and corn. We investigated a range of machine learning algorithms with three different types of predictor variables: (a) information on parentage of hybrids, (b) in addition hybrid performance of crosses of the parental lines with other crossing partners, and (c) genotypic marker data. In two highly incomplete and unbalanced factorials from rapeseed, in which the SCA variance contributed considerably to the genetic variance, stacked ensembles of gradient boosting machines based on parentage information outperformed GCA prediction. The stacked ensembles increased prediction accuracy from 0.39 to 0.45, and from 0.48 to 0.54 compared to GCA prediction. The prediction accuracy reached by stacked ensembles without marker data reached values comparable to those of GBLUP that requires marker data. We conclude that hybrid prediction with stacked ensembles of gradient boosting machines based on parentage information is a promising approach that is worth further investigations with other data sets in which SCA variance is high.
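The abstract uses GCA-based prediction as the baseline. As a minimal sketch of what that baseline does in an unbalanced, incomplete factorial, the Python below assumes a purely additive fixed-effects model y_ij = mu + g_i + h_j, fitted by backfitting (alternating least squares) on the observed crosses only. This record does not describe the authors' own GCA or GBLUP models, so this is not their method; the function names (`fit_gca`, `predict_gca`, `pearson`), the toy data, and the use of Pearson correlation as the accuracy measure are all assumptions for illustration. The model ignores SCA (a hybrid's deviation from mu + g_i + h_j), which is why GCA-based prediction loses efficiency when SCA variance is high.

```python
from math import sqrt
from statistics import mean


def fit_gca(observations, n_iter=200):
    """Fit y_ij = mu + g_i + h_j by backfitting (alternating least squares).

    Hypothetical helper. observations maps (parent_a, parent_b) -> observed
    hybrid performance. Only observed cells are used, so the factorial may be
    unbalanced and incomplete, but it must be connected for the predictions
    of untested hybrids to be estimable.
    """
    mu = mean(observations.values())
    g = {a: 0.0 for a, _ in observations}  # GCA effects, pool A
    h = {b: 0.0 for _, b in observations}  # GCA effects, pool B
    for _ in range(n_iter):
        # GCA of each pool-A parent: mean residual over its observed crosses.
        for a in g:
            g[a] = mean(y - mu - h[b] for (pa, b), y in observations.items() if pa == a)
        # GCA of each pool-B parent, given the current pool-A effects.
        for b in h:
            h[b] = mean(y - mu - g[a] for (a, pb), y in observations.items() if pb == b)
    # Centre both sets of effects on zero and absorb the shift into mu;
    # fitted values mu + g_i + h_j are unchanged by this step.
    cg, ch = mean(g.values()), mean(h.values())
    g = {a: v - cg for a, v in g.items()}
    h = {b: v - ch for b, v in h.items()}
    return mu + cg + ch, g, h


def predict_gca(model, parent_a, parent_b):
    """GCA-based prediction of a (possibly untested) hybrid: mu + g_i + h_j."""
    mu, g, h = model
    return mu + g[parent_a] + h[parent_b]


def pearson(xs, ys):
    """Pearson correlation, assumed here as the prediction-accuracy measure."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Toy, noise-free factorial (hypothetical data): hybrid A2 x B2 is untested.
observed = {
    ("A1", "B1"): 13.0, ("A1", "B2"): 9.0,
    ("A2", "B1"): 11.0,
    ("A3", "B1"): 12.0, ("A3", "B2"): 8.0,
}
model = fit_gca(observed)
print(round(predict_gca(model, "A2", "B2"), 6))  # prints 7.0
```

Because the toy data are exactly additive, the untested hybrid A2 × B2 is recovered exactly. With real data, accuracy would instead be assessed by correlating predicted and observed values of hybrids withheld from fitting, and any SCA in those hybrids would remain unexplained by this model.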
Record ID: pubmed-10401275
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Front Plant Sci (Plant Science); published online 2023-07-21 by Frontiers Media S.A.
Copyright © 2023 Heilmann, Frisch, Abbadi, Kox and Herzog. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.