
Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods

Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting which may result in a loss of prediction accuracy relative to single-trait genomic p...


Bibliographic Details
Main Authors:	Runcie, Daniel; Cheng, Hao
Format:	Online Article Text
Language:	English
Published:	Genetics Society of America 2019
Subjects:	Genomic Prediction
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6829121/ https://www.ncbi.nlm.nih.gov/pubmed/31511297 http://dx.doi.org/10.1534/g3.119.400598

_version_	1783465481030598656
author	Runcie, Daniel; Cheng, Hao
author_facet	Runcie, Daniel; Cheng, Hao
author_sort	Runcie, Daniel
collection	PubMed
description	Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting which may result in a loss of prediction accuracy relative to single-trait genomic prediction. Cross-validation is considered the gold standard method for selecting and tuning models for genomic prediction in both plant and animal breeding. When used appropriately, cross-validation gives an accurate estimate of the prediction accuracy of a genomic prediction model, and can effectively choose among disparate models based on their expected performance in real data. However, we show that a naive cross-validation strategy applied to the multi-trait prediction problem can be severely biased and lead to sub-optimal choices between single and multi-trait models when secondary traits are used to aid in the prediction of focal traits and these secondary traits are measured on the individuals to be tested. We use simulations to demonstrate the extent of the problem and propose three partial solutions: 1) a parametric solution from selection index theory, 2) a semi-parametric method for correcting the cross-validation estimates of prediction accuracy, and 3) a fully non-parametric method which we call CV2*: validating model predictions against focal trait measurements from genetically related individuals. The current excitement over high-throughput phenotyping suggests that more comprehensive phenotype measurements will be useful for accelerating breeding programs. Using an appropriate cross-validation strategy should more reliably determine if and when combining information across multiple traits is useful.
format	Online Article Text
id	pubmed-6829121
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Genetics Society of America
record_format	MEDLINE/PubMed
spelling	pubmed-6829121 2019-11-06 Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods Runcie, Daniel Cheng, Hao G3 (Bethesda) Genomic Prediction Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting which may result in a loss of prediction accuracy relative to single-trait genomic prediction. Cross-validation is considered the gold standard method for selecting and tuning models for genomic prediction in both plant and animal breeding. When used appropriately, cross-validation gives an accurate estimate of the prediction accuracy of a genomic prediction model, and can effectively choose among disparate models based on their expected performance in real data. However, we show that a naive cross-validation strategy applied to the multi-trait prediction problem can be severely biased and lead to sub-optimal choices between single and multi-trait models when secondary traits are used to aid in the prediction of focal traits and these secondary traits are measured on the individuals to be tested. We use simulations to demonstrate the extent of the problem and propose three partial solutions: 1) a parametric solution from selection index theory, 2) a semi-parametric method for correcting the cross-validation estimates of prediction accuracy, and 3) a fully non-parametric method which we call CV2*: validating model predictions against focal trait measurements from genetically related individuals. The current excitement over high-throughput phenotyping suggests that more comprehensive phenotype measurements will be useful for accelerating breeding programs. Using an appropriate cross-validation strategy should more reliably determine if and when combining information across multiple traits is useful.
Genetics Society of America 2019-09-11 /pmc/articles/PMC6829121/ /pubmed/31511297 http://dx.doi.org/10.1534/g3.119.400598 Text en Copyright © 2019 Runcie, Cheng http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Genomic Prediction Runcie, Daniel Cheng, Hao Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods
title	Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods
title_full	Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods
title_fullStr	Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods
title_full_unstemmed	Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods
title_short	Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods
title_sort	pitfalls and remedies for cross validation with multi-trait genomic prediction methods
topic	Genomic Prediction
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6829121/ https://www.ncbi.nlm.nih.gov/pubmed/31511297 http://dx.doi.org/10.1534/g3.119.400598
work_keys_str_mv	AT runciedaniel pitfallsandremediesforcrossvalidationwithmultitraitgenomicpredictionmethods AT chenghao pitfallsandremediesforcrossvalidationwithmultitraitgenomicpredictionmethods
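
The bias summarized in the description field can be illustrated with a toy simulation. The sketch below is not the authors' simulation design; every name and parameter in it (`simulate`, `resid_corr`, the sample size, the simple regression predictor) is an assumption made for illustration. It simulates a focal and a secondary trait whose genetic values are uncorrelated but whose non-genetic residuals are correlated, predicts the focal trait of test individuals from their own secondary-trait measurements, and compares the naive cross-validation estimate (correlation with observed focal phenotypes) against the correlation with the true genetic values.

```python
import numpy as np


def simulate(n=4000, resid_corr=0.8, seed=0):
    """Toy illustration (hypothetical setup, not the paper's simulation).

    Column 0 is the focal trait, column 1 the secondary trait.
    Genetic values are independent between traits, so the secondary trait
    carries no information about the focal trait's genetic value.
    """
    rng = np.random.default_rng(seed)

    # Independent genetic values: genetic correlation is zero by construction.
    u = rng.standard_normal((n, 2))

    # Correlated non-genetic residuals shared by the two traits.
    cov = np.array([[1.0, resid_corr], [resid_corr, 1.0]])
    e = rng.multivariate_normal(np.zeros(2), cov, size=n)

    y = u + e  # observed phenotypes

    train = np.arange(n // 2)
    test = np.arange(n // 2, n)

    # Fit the regression slope of focal on secondary phenotype in training data.
    slope = np.cov(y[train, 0], y[train, 1])[0, 1] / np.var(y[train, 1], ddof=1)

    # Predict the focal trait of test individuals from their OWN secondary trait.
    pred = slope * y[test, 1]

    # Naive estimate: validate against the observed focal phenotype.
    naive = np.corrcoef(pred, y[test, 0])[0, 1]
    # What matters for selection: agreement with the true genetic value.
    true = np.corrcoef(pred, u[test, 0])[0, 1]
    return naive, true


if __name__ == "__main__":
    naive_acc, true_acc = simulate()
    print(f"naive CV estimate: {naive_acc:.2f}")
    print(f"accuracy against true genetic values: {true_acc:.2f}")
```

Under these assumptions the naive estimate is clearly positive even though the prediction is uninformative about genetic merit, because the predictor and the validation phenotype share the same non-genetic deviations. Validating against focal-trait measurements from genetically related individuals, as the CV2* proposal in the description does, avoids sharing those deviations.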
