
A Comparison of Three Machine Learning Methods for Multivariate Genomic Prediction Using the Sparse Kernels Method (SKM) Library

Genomic selection (GS) changed the way plant breeders select genotypes. GS takes advantage of phenotypic and genotypic information to train a statistical machine learning model, which is used to predict phenotypic (or breeding) values of new lines for which only genotypic information is available...


Bibliographic Details
Main Authors: Montesinos-López, Osval A., Montesinos-López, Abelardo, Cano-Paez, Bernabe, Hernández-Suárez, Carlos Moisés, Santana-Mancilla, Pedro C., Crossa, José
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9407886/
https://www.ncbi.nlm.nih.gov/pubmed/36011405
http://dx.doi.org/10.3390/genes13081494
_version_ 1784774472615591936
author Montesinos-López, Osval A.
Montesinos-López, Abelardo
Cano-Paez, Bernabe
Hernández-Suárez, Carlos Moisés
Santana-Mancilla, Pedro C.
Crossa, José
collection PubMed
description Genomic selection (GS) changed the way plant breeders select genotypes. GS takes advantage of phenotypic and genotypic information to train a statistical machine learning model, which is used to predict phenotypic (or breeding) values of new lines for which only genotypic information is available. Therefore, many statistical machine learning methods have been proposed for this task. Multi-trait (MT) genomic prediction models take advantage of correlated traits to improve prediction accuracy. Therefore, some multivariate statistical machine learning methods are popular for GS. In this paper, we compare the prediction performance of three MT methods: the MT genomic best linear unbiased predictor (GBLUP), the MT partial least squares (PLS) and the multi-trait random forest (RF) methods. Benchmarking was performed with six real datasets. We found that the three investigated methods produce similar results, but under predictors with genotype (G) and environment (E), that is, E + G, the MT GBLUP achieved superior performance, whereas under predictors E + G + genotype × environment (GE) and G + GE, random forest achieved the best results. We also found that the best predictions were achieved under the predictors E + G and E + G + GE. Here, we also provide the R code for the implementation of these three statistical machine learning methods in the sparse kernel method (SKM) library, which offers not only options for single-trait prediction with various statistical machine learning methods but also some options for MT predictions that can help to better capture complex patterns in datasets that are common in genomic selection.
format Online
Article
Text
id pubmed-9407886
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9407886 2022-08-26 A Comparison of Three Machine Learning Methods for Multivariate Genomic Prediction Using the Sparse Kernels Method (SKM) Library Montesinos-López, Osval A. Montesinos-López, Abelardo Cano-Paez, Bernabe Hernández-Suárez, Carlos Moisés Santana-Mancilla, Pedro C. Crossa, José Genes (Basel) Article
MDPI 2022-08-21 /pmc/articles/PMC9407886/ /pubmed/36011405 http://dx.doi.org/10.3390/genes13081494 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title A Comparison of Three Machine Learning Methods for Multivariate Genomic Prediction Using the Sparse Kernels Method (SKM) Library
topic Article