Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods
Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting, which may result in a loss of prediction accuracy relative to single-trait genomic p...
Main authors: | Runcie, Daniel; Cheng, Hao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Genetics Society of America 2019 |
Subjects: | Genomic Prediction |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6829121/ https://www.ncbi.nlm.nih.gov/pubmed/31511297 http://dx.doi.org/10.1534/g3.119.400598 |
_version_ | 1783465481030598656 |
---|---|
author | Runcie, Daniel Cheng, Hao |
author_facet | Runcie, Daniel Cheng, Hao |
author_sort | Runcie, Daniel |
collection | PubMed |
description | Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting, which may result in a loss of prediction accuracy relative to single-trait genomic prediction. Cross-validation is considered the gold standard method for selecting and tuning models for genomic prediction in both plant and animal breeding. When used appropriately, cross-validation gives an accurate estimate of the prediction accuracy of a genomic prediction model, and can effectively choose among disparate models based on their expected performance in real data. However, we show that a naive cross-validation strategy applied to the multi-trait prediction problem can be severely biased and lead to sub-optimal choices between single- and multi-trait models when secondary traits are used to aid in the prediction of focal traits and these secondary traits are measured on the individuals to be tested. We use simulations to demonstrate the extent of the problem and propose three partial solutions: 1) a parametric solution from selection index theory, 2) a semi-parametric method for correcting the cross-validation estimates of prediction accuracy, and 3) a fully non-parametric method, which we call CV2*, that validates model predictions against focal trait measurements from genetically related individuals. The current excitement over high-throughput phenotyping suggests that more comprehensive phenotype measurements will be useful for accelerating breeding programs. Using an appropriate cross-validation strategy should more reliably determine if and when combining information across multiple traits is useful. |
format | Online Article Text |
id | pubmed-6829121 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Genetics Society of America |
record_format | MEDLINE/PubMed |
spelling | pubmed-6829121 2019-11-06 Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods Runcie, Daniel Cheng, Hao G3 (Bethesda) Genomic Prediction Genetics Society of America 2019-09-11 /pmc/articles/PMC6829121/ /pubmed/31511297 http://dx.doi.org/10.1534/g3.119.400598 Text en Copyright © 2019 Runcie, Cheng http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Genomic Prediction Runcie, Daniel Cheng, Hao Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods |
title | Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods |
title_sort | pitfalls and remedies for cross validation with multi-trait genomic prediction methods |
topic | Genomic Prediction |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6829121/ https://www.ncbi.nlm.nih.gov/pubmed/31511297 http://dx.doi.org/10.1534/g3.119.400598 |
work_keys_str_mv | AT runciedaniel pitfallsandremediesforcrossvalidationwithmultitraitgenomicpredictionmethods AT chenghao pitfallsandremediesforcrossvalidationwithmultitraitgenomicpredictionmethods |
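
The bias described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' simulation design: the sample size, heritability, correlations, the simple regression predictor, and the use of clones as the "genetically related individuals" are all assumptions chosen for illustration. It sets the genetic correlation between the focal and secondary trait to zero and the non-genetic (residual) correlation to 0.8, so the secondary trait carries no information about focal-trait genetic values, and then compares three validation targets for the same predictions.

```python
import numpy as np


def simulate(n=20000, h2=0.5, rg=0.0, re=0.8, seed=1):
    """Simulate a focal trait (column 0) and a secondary trait (column 1).

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    G = h2 * np.array([[1.0, rg], [rg, 1.0]])        # genetic covariance
    R = (1 - h2) * np.array([[1.0, re], [re, 1.0]])  # residual covariance
    g = rng.multivariate_normal(np.zeros(2), G, size=n)
    e = rng.multivariate_normal(np.zeros(2), R, size=n)
    e_rel = rng.multivariate_normal(np.zeros(2), R, size=n)
    y = g + e          # phenotypes of the individuals themselves
    y_rel = g + e_rel  # phenotypes of a clone: same genetics, independent residuals
    return g, y, y_rel


def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])


g, y, y_rel = simulate()
train = np.arange(len(y)) < len(y) // 2
test = ~train

# Predict the focal trait of test individuals from their own secondary trait,
# using a regression slope estimated in the training set.
slope = np.cov(y[train, 0], y[train, 1])[0, 1] / np.var(y[train, 1], ddof=1)
pred = slope * y[test, 1]

naive = corr(pred, y[test, 0])         # validate against the individual's own phenotype
true_acc = corr(pred, g[test, 0])      # validate against the (normally unknown) genetic value
relative = corr(pred, y_rel[test, 0])  # validate against a relative's phenotype

print(f"naive: {naive:.3f}  true: {true_acc:.3f}  relative-based: {relative:.3f}")
```

Under these assumed parameters the naive estimate is expected to be close to 0.4, while the correlation with the true genetic value and the correlation with the relative's phenotype are both expected to be close to zero. The naive figure reflects shared non-genetic effects within an individual, which is why validating against a related individual's focal-trait measurement gives a more honest picture in this toy setting.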