Comparing Genomic Prediction Models by Means of Cross Validation
In the two decades of continuous development of genomic selection, a great variety of models have been proposed to make predictions from the information available in dense marker panels. Besides deciding which particular model to use, practitioners also need to make many minor choices for those parameters in the model which are not typically estimated from the data (so-called “hyper-parameters”)…
Main Authors: | Schrauf, Matías F.; de los Campos, Gustavo; Munilla, Sebastián |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2021 |
Subjects: | Plant Science |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8639521/ https://www.ncbi.nlm.nih.gov/pubmed/34868117 http://dx.doi.org/10.3389/fpls.2021.734512 |
_version_ | 1784609166284816384 |
---|---|
author | Schrauf, Matías F.; de los Campos, Gustavo; Munilla, Sebastián |
author_facet | Schrauf, Matías F.; de los Campos, Gustavo; Munilla, Sebastián |
author_sort | Schrauf, Matías F. |
collection | PubMed |
description | In the two decades of continuous development of genomic selection, a great variety of models have been proposed to make predictions from the information available in dense marker panels. Besides deciding which particular model to use, practitioners also need to make many minor choices for those parameters in the model which are not typically estimated from the data (so-called “hyper-parameters”). When the focus is placed on predictions, most of these decisions are made with the aim of optimizing predictive accuracy. Here we discuss, and illustrate using publicly available crop datasets, the use of cross-validation to make many such decisions. In particular, we emphasize the importance of paired comparisons to achieve high power in the comparison between candidate models, as well as the need to define notions of relevance in the difference between their performances. Regarding the latter, we borrow the idea of equivalence margins from clinical research and introduce new statistical tests. We conclude that most hyper-parameters can be learnt from the data by either maximizing the restricted likelihood (REML) or by using weakly-informative priors, with good predictive results. In particular, the default options in a popular software package are generally competitive with the optimal values. With regard to the performance assessments themselves, we conclude that paired k-fold cross-validation is a generally applicable and statistically powerful methodology to assess differences in model accuracies. Coupled with the definition of equivalence margins based on expected genetic gain, it becomes a useful tool for breeders. |
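The paired k-fold cross-validation the abstract describes can be sketched as follows. This is a hypothetical illustration, not the authors' code: the ridge-regression models, the synthetic marker data, and the equivalence margin of 0.02 are all assumptions chosen for the sketch. The key idea is that both candidate models are evaluated on the same folds, so per-fold accuracy differences form paired observations, which sharply reduces fold-to-fold noise in the comparison.

```python
import numpy as np

def kfold_indices(n, k, rng):
    """Shuffle sample indices and split them into k disjoint test folds."""
    return np.array_split(rng.permutation(n), k)

def ridge_predict(Xtr, ytr, Xte, lam):
    """Closed-form ridge regression (a GBLUP-like shrinkage of marker effects)."""
    p = Xtr.shape[1]
    beta = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(p), Xtr.T @ ytr)
    return Xte @ beta

def paired_kfold_comparison(X, y, lam_a, lam_b, k=5, seed=0):
    """Accuracy (correlation of predictions with phenotypes) of two models
    on the SAME folds; returns mean and standard error of the paired
    per-fold differences."""
    rng = np.random.default_rng(seed)
    diffs = []
    for test in kfold_indices(len(y), k, rng):
        train = np.setdiff1d(np.arange(len(y)), test)
        acc_a = np.corrcoef(y[test], ridge_predict(X[train], y[train], X[test], lam_a))[0, 1]
        acc_b = np.corrcoef(y[test], ridge_predict(X[train], y[train], X[test], lam_b))[0, 1]
        diffs.append(acc_a - acc_b)
    diffs = np.array(diffs)
    return diffs.mean(), diffs.std(ddof=1) / np.sqrt(k)

# Synthetic 0/1/2 genotypes with an additive trait (assumed toy data).
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(200, 50)).astype(float)
y = X @ rng.normal(0, 0.3, 50) + rng.normal(0, 1.0, 200)

mean_diff, se = paired_kfold_comparison(X, y, lam_a=1.0, lam_b=100.0)
margin = 0.02  # equivalence margin: smallest difference deemed relevant (assumed)
print(f"mean accuracy difference: {mean_diff:+.3f} (SE {se:.3f})")
print("within margin" if abs(mean_diff) < margin else "relevant difference")
```

In an equivalence-testing framing, one would go further and check whether a confidence interval for the mean difference falls entirely inside (-margin, +margin); the margin itself would be set from domain considerations, such as the expected genetic gain mentioned in the abstract.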
format | Online Article Text |
id | pubmed-8639521 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-86395212021-12-04 Comparing Genomic Prediction Models by Means of Cross Validation Schrauf, Matías F. de los Campos, Gustavo Munilla, Sebastián Front Plant Sci Plant Science In the two decades of continuous development of genomic selection, a great variety of models have been proposed to make predictions from the information available in dense marker panels. Besides deciding which particular model to use, practitioners also need to make many minor choices for those parameters in the model which are not typically estimated by the data (so called “hyper-parameters”). When the focus is placed on predictions, most of these decisions are made in a direction sought to optimize predictive accuracy. Here we discuss and illustrate using publicly available crop datasets the use of cross validation to make many such decisions. In particular, we emphasize the importance of paired comparisons to achieve high power in the comparison between candidate models, as well as the need to define notions of relevance in the difference between their performances. Regarding the latter, we borrow the idea of equivalence margins from clinical research and introduce new statistical tests. We conclude that most hyper-parameters can be learnt from the data by either minimizing REML or by using weakly-informative priors, with good predictive results. In particular, the default options in a popular software are generally competitive with the optimal values. With regard to the performance assessments themselves, we conclude that the paired k-fold cross validation is a generally applicable and statistically powerful methodology to assess differences in model accuracies. Coupled with the definition of equivalence margins based on expected genetic gain, it becomes a useful tool for breeders. Frontiers Media S.A. 2021-11-19 /pmc/articles/PMC8639521/ /pubmed/34868117 http://dx.doi.org/10.3389/fpls.2021.734512 Text en Copyright © 2021 Schrauf, de los Campos and Munilla. 
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Plant Science Schrauf, Matías F. de los Campos, Gustavo Munilla, Sebastián Comparing Genomic Prediction Models by Means of Cross Validation |
title | Comparing Genomic Prediction Models by Means of Cross Validation |
title_full | Comparing Genomic Prediction Models by Means of Cross Validation |
title_fullStr | Comparing Genomic Prediction Models by Means of Cross Validation |
title_full_unstemmed | Comparing Genomic Prediction Models by Means of Cross Validation |
title_short | Comparing Genomic Prediction Models by Means of Cross Validation |
title_sort | comparing genomic prediction models by means of cross validation |
topic | Plant Science |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8639521/ https://www.ncbi.nlm.nih.gov/pubmed/34868117 http://dx.doi.org/10.3389/fpls.2021.734512 |
work_keys_str_mv | AT schraufmatiasf comparinggenomicpredictionmodelsbymeansofcrossvalidation AT deloscamposgustavo comparinggenomicpredictionmodelsbymeansofcrossvalidation AT munillasebastian comparinggenomicpredictionmodelsbymeansofcrossvalidation |