
Multi-trait genome prediction of new environments with partial least squares

Bibliographic Details
Main authors: Montesinos-López, Osval A.; Montesinos-López, Abelardo; Bernal Sandoval, David Alejandro; Mosqueda-Gonzalez, Brandon Alejandro; Valenzo-Jiménez, Marco Alberto; Crossa, José
Format: Online article, text
Language: English
Published: Frontiers Media S.A., 2022
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9483856/
https://www.ncbi.nlm.nih.gov/pubmed/36134027
http://dx.doi.org/10.3389/fgene.2022.966775

Abstract

The genomic selection (GS) methodology proposed over 20 years ago by Meuwissen et al. (Genetics, 2001) has revolutionized plant breeding. GS is a predictive methodology that trains statistical machine learning algorithms on the phenotypic and genotypic data of a reference population and then makes predictions for genotyped candidate lines; in this way it saves significant resources in the selection of candidate individuals. However, its practical implementation remains challenging when the plant breeder wants to predict future seasons or new locations and/or environments, a task known as the “leave one environment out” problem. Because the distributions of the training and testing sets do not match in this setting, most statistical machine learning methods struggle to reach moderate or reasonable prediction accuracies. The main objective of this study was therefore to explore multi-trait partial least squares (MT-PLS) regression for this specific task, benchmarking its performance against the Bayesian multi-trait genomic best linear unbiased predictor (MT-GBLUP). The benchmarking was performed on five real data sets. In all data sets, MT-PLS outperformed the popular MT-GBLUP method across traits by 349.8% under predictor E + G, by 484.4% under predictor E + G + GE, and by 15.9% under predictor G + GE, where E denotes environments, G genotypes, and GE the genotype by environment interaction. These results provide empirical evidence of the power of MT-PLS for predicting future seasons or new environments. Finally, comparing univariate single-trait (UT) with multi-trait (MT) models showed that MT-GBLUP was more accurate than UT-GBLUP, whereas MT-PLS gave no increase in accuracy over UT-PLS.
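As a rough illustration of the “leave one environment out” setting described in the abstract, the following Python sketch (assuming NumPy is available) fits a single multi-trait PLS (PLS2) regression on all environments but one and predicts the held-out environment. It is not the authors' implementation: the simulated data, the marker-only (G) input matrix, the SVD-based computation of the PLS weights, the fixed number of components, and the helper names `pls2_fit`, `pls2_predict` and `leave_one_environment_out` are all hypothetical choices made for this example. Accuracy is measured here, as an assumption, by the per-trait Pearson correlation between observed and predicted values within each held-out environment.

```python
import numpy as np


def pls2_fit(X, Y, n_components):
    """Fit a multi-trait (PLS2) partial least squares regression.

    For each component the X weights are the dominant left singular vector
    of Xk^T Yk (the fixed point of the NIPALS inner loop); X and Y are then
    deflated by the extracted score vector.
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xk, Yk = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        U, _, _ = np.linalg.svd(Xk.T @ Yk, full_matrices=False)
        w = U[:, 0]               # X weights (unit length)
        t = Xk @ w                # X scores for this component
        tt = t @ t
        p = Xk.T @ t / tt         # X loadings
        q = Yk.T @ t / tt         # Y loadings, one entry per trait
        Xk = Xk - np.outer(t, p)  # deflate X
        Yk = Yk - np.outer(t, q)  # deflate Y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    # Coefficients for centered X: B = W (P^T W)^{-1} Q^T
    B = W @ np.linalg.solve(P.T @ W, Q.T)
    return x_mean, y_mean, B


def pls2_predict(model, X):
    """Predict all traits at once from a fitted PLS2 model."""
    x_mean, y_mean, B = model
    return (X - x_mean) @ B + y_mean


def leave_one_environment_out(X, Y, env, n_components=2):
    """Hold out each environment in turn, train on the others, and return
    per-trait Pearson correlations between observed and predicted values."""
    results = {}
    for e in np.unique(env):
        test = env == e
        model = pls2_fit(X[~test], Y[~test], n_components)
        Y_hat = pls2_predict(model, X[test])
        results[int(e)] = [
            float(np.corrcoef(Y[test][:, j], Y_hat[:, j])[0, 1])
            for j in range(Y.shape[1])
        ]
    return results


# Hypothetical simulated data: every line is evaluated in every environment.
rng = np.random.default_rng(0)
n_lines, n_markers, n_envs, n_traits = 60, 200, 3, 2
M = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)  # 0/1/2 codes
g = M @ rng.normal(0.0, 0.1, size=(n_markers, n_traits))  # genetic values
env = np.repeat(np.arange(n_envs), n_lines)                # environment labels
X = np.tile(M, (n_envs, 1))                                # G-only inputs
env_effect = rng.normal(0.0, 1.0, size=(n_envs, n_traits))
noise = rng.normal(0.0, 0.5, size=(n_envs * n_lines, n_traits))
Y = np.tile(g, (n_envs, 1)) + env_effect[env] + noise

results = leave_one_environment_out(X, Y, env, n_components=2)
for e, r in results.items():
    print(f"held-out environment {e}: r = {np.round(r, 3)}")
```

Because one PLS2 model is fitted to all traits jointly, its latent components are chosen to capture covariance between the markers and the whole trait matrix; calling `pls2_fit` separately on each column of `Y` would give a univariate (UT-PLS style) counterpart for comparison. In practice the number of components would normally be tuned, for example by cross-validation within the training environments, rather than fixed at 2 as in this sketch.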

Journal: Frontiers in Genetics (Front Genet). Published by Frontiers Media S.A., 2022-09-05.

Copyright © 2022 Montesinos-López, Montesinos-López, Bernal Sandoval, Mosqueda-Gonzalez, Valenzo-Jiménez and Crossa. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.