
Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods


Bibliographic Details
Main authors: Runcie, Daniel; Cheng, Hao
Format: Online Article Text
Language: English
Published: Genetics Society of America, 2019
Subjects: Genomic Prediction
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6829121/
https://www.ncbi.nlm.nih.gov/pubmed/31511297
http://dx.doi.org/10.1534/g3.119.400598
collection PubMed
description Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting, which may result in a loss of prediction accuracy relative to single-trait genomic prediction. Cross-validation is considered the gold standard method for selecting and tuning models for genomic prediction in both plant and animal breeding. When used appropriately, cross-validation gives an accurate estimate of the prediction accuracy of a genomic prediction model and can effectively choose among disparate models based on their expected performance in real data. However, we show that a naive cross-validation strategy applied to the multi-trait prediction problem can be severely biased and lead to sub-optimal choices between single- and multi-trait models when secondary traits are used to aid in the prediction of focal traits and these secondary traits are measured on the individuals to be tested. We use simulations to demonstrate the extent of the problem and propose three partial solutions: 1) a parametric solution from selection index theory, 2) a semi-parametric method for correcting the cross-validation estimates of prediction accuracy, and 3) a fully non-parametric method which we call CV2*: validating model predictions against focal trait measurements from genetically related individuals. The current excitement over high-throughput phenotyping suggests that more comprehensive phenotype measurements will be useful for accelerating breeding programs. Using an appropriate cross-validation strategy should more reliably determine if and when combining information across multiple traits is useful.
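The bias described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' code: it assumes two traits with the same heritability `h2`, a genetic correlation `rho_g` and a residual correlation `rho_e`, and it uses the secondary phenotype `y2` of each test individual as a deliberately crude stand-in for a multi-trait prediction. It compares the true accuracy (correlation with the focal genetic value), a naive estimate (correlation with the test individual's own focal phenotype, divided by the square root of `h2`, a common convention assumed here), a CV2*-like estimate that validates against the focal phenotype of a hypothetical clone, and the expectation from selection index theory for this single-predictor case. The function names and the clone shortcut are illustrative assumptions; CV2* itself uses genetically related, not identical, individuals.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pair(rho, rng):
    """Draw two standard normal variables with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    return z1, z2


def simulate(n=20000, h2=0.5, rho_g=0.0, rho_e=0.8, seed=1):
    """Hypothetical two-trait simulation; both traits have phenotypic variance 1."""
    rng = random.Random(seed)
    sg, se = math.sqrt(h2), math.sqrt(1.0 - h2)
    pred, g1, y1, y1_clone = [], [], [], []
    for _ in range(n):
        a1, a2 = correlated_pair(rho_g, rng)  # standardized genetic values
        r1, r2 = correlated_pair(rho_e, rng)  # residuals of the same test individual
        g = sg * a1                           # focal-trait genetic value
        y2 = sg * a2 + se * r2                # secondary trait, measured on the test individual
        pred.append(y2)                       # crude stand-in for a multi-trait prediction
        g1.append(g)
        y1.append(g + se * r1)                # focal phenotype of the test individual
        # Focal phenotype of a hypothetical clone: same g, independent residual.
        y1_clone.append(g + se * rng.gauss(0.0, 1.0))
    return {
        "true": pearson(pred, g1),                      # cor(prediction, genetic value)
        "naive": pearson(pred, y1) / sg,                # cor(prediction, own phenotype) / sqrt(h2)
        "cv2_star_like": pearson(pred, y1_clone) / sg,  # cor(prediction, clone phenotype) / sqrt(h2)
        "index_theory": sg * rho_g,                     # expected cor(y2, g1) under this model
    }


if __name__ == "__main__":
    print(simulate())
```

Under these assumptions the naive estimate tends to (h2·rho_g + (1 - h2)·rho_e)/√h2, about 0.57 for the default settings, even though the secondary trait carries no genetic information about the focal trait (rho_g = 0), so the true accuracy and the selection-index expectation are both zero. The clone's residual is independent of `y2`, which removes the shared-residual term from the CV2*-like estimate.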
id pubmed-6829121
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal G3 (Bethesda), Genetics Society of America, 2019-09-11 (record pubmed-6829121, 2019-11-06)
rights Copyright © 2019 Runcie, Cheng. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Genomic Prediction