
A comparison of classical and machine learning-based phenotype prediction methods on simulated data and three plant species

Genomic selection is an integral tool for breeders to accurately select plants directly from genotype data leading to faster and more resource-efficient breeding programs. Several prediction methods have been established in the last few years. These range from classical linear mixed models to comple...


Bibliographic Details
Main Authors:	John, Maura; Haselbeck, Florian; Dass, Rupashree; Malisi, Christoph; Ricca, Patrizia; Dreischer, Christian; Schultheiss, Sebastian J.; Grimm, Dominik G.
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2022
Subjects:	Plant Science
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9673477/ https://www.ncbi.nlm.nih.gov/pubmed/36407627 http://dx.doi.org/10.3389/fpls.2022.932512

author	John, Maura; Haselbeck, Florian; Dass, Rupashree; Malisi, Christoph; Ricca, Patrizia; Dreischer, Christian; Schultheiss, Sebastian J.; Grimm, Dominik G.
collection	PubMed
description	Genomic selection is an integral tool for breeders to accurately select plants directly from genotype data, leading to faster and more resource-efficient breeding programs. Several prediction methods have been established in the last few years. These range from classical linear mixed models to complex non-linear machine learning approaches, such as Support Vector Regression, and modern deep learning-based architectures. Many of these methods have been extensively evaluated on different crop species with varying outcomes. In this work, our aim is to systematically compare 12 different phenotype prediction models, ranging from basic genomic selection methods to more advanced deep learning-based techniques. More importantly, we assess the performance of these models on simulated phenotype data as well as on real-world data from Arabidopsis thaliana and two breeding datasets from soy and corn. The synthetic phenotypic data allow us to analyze all prediction models and especially the selected markers under controlled and predefined settings. We show that Bayes B and linear regression models with sparsity constraints perform best under different simulation settings with respect to explained variance. Further, we can confirm results from other studies that there is no superiority of more complex neural network-based architectures for phenotype prediction compared to well-established methods. However, on real-world data, for which several prediction models yield comparable results with slight advantages for Elastic Net, this picture is less clear, suggesting that there is a lot of room for future research.
format	Online Article Text
id	pubmed-9673477
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-9673477 2022-11-19 Front Plant Sci Plant Science Frontiers Media S.A. 2022-11-04 /pmc/articles/PMC9673477/ /pubmed/36407627 http://dx.doi.org/10.3389/fpls.2022.932512 Text en Copyright © 2022 John, Haselbeck, Dass, Malisi, Ricca, Dreischer, Schultheiss and Grimm https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	A comparison of classical and machine learning-based phenotype prediction methods on simulated data and three plant species
