Improving Genomic Prediction Using High-Dimensional Secondary Phenotypes

In the past decades, genomic prediction has had a large impact on plant breeding. Given the current advances of high-throughput phenotyping and sequencing technologies, it is increasingly common to observe a large number of traits, in addition to the target trait of interest. This raises the importa...

Bibliographic Details
Main Authors: Arouisse, Bader, Theeuwen, Tom P. J. M., van Eeuwijk, Fred A., Kruijer, Willem
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8181460/
https://www.ncbi.nlm.nih.gov/pubmed/34108993
http://dx.doi.org/10.3389/fgene.2021.667358
author Arouisse, Bader
Theeuwen, Tom P. J. M.
van Eeuwijk, Fred A.
Kruijer, Willem
collection PubMed
description In the past decades, genomic prediction has had a large impact on plant breeding. Given the current advances of high-throughput phenotyping and sequencing technologies, it is increasingly common to observe a large number of traits, in addition to the target trait of interest. This raises the important question whether these additional or “secondary” traits can be used to improve genomic prediction for the target trait. With only a small number of secondary traits, this is known to be the case, given sufficiently high heritabilities and genetic correlations. Here we focus on the more challenging situation with a large number of secondary traits, which is increasingly common since the arrival of high-throughput phenotyping. In this case, secondary traits are usually incorporated through additional relatedness matrices. This approach is however infeasible when secondary traits are not measured on the test set, and cannot distinguish between genetic and non-genetic correlations. An alternative direction is to extend the classical selection indices using penalized regression. So far, penalized selection indices have not been applied in a genomic prediction setting, and require plot-level data in order to reliably estimate genetic correlations. Here we aim to overcome these limitations, using two novel approaches. Our first approach relies on a dimension reduction of the secondary traits, using either penalized regression or random forests (LS-BLUP/RF-BLUP). We then compute the bivariate GBLUP with the dimension reduction as secondary trait. For simulated data (with available plot-level data), we also use bivariate GBLUP with the penalized selection index as secondary trait (SI-BLUP). In our second approach (GM-BLUP), we follow existing multi-kernel methods but replace secondary traits by their genomic predictions, with the advantage that genomic prediction is also possible when secondary traits are only measured on the training set. For most of our simulated data, SI-BLUP was most accurate, often closely followed by RF-BLUP or LS-BLUP. In real datasets, involving metabolites in Arabidopsis and transcriptomics in maize, no method could substantially improve over univariate prediction when secondary traits were only available on the training set. LS-BLUP and RF-BLUP were most accurate when secondary traits were available also for the test set.
format Online
Article
Text
id pubmed-8181460
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8181460 2021-06-08 Improving Genomic Prediction Using High-Dimensional Secondary Phenotypes Arouisse, Bader Theeuwen, Tom P. J. M. van Eeuwijk, Fred A. Kruijer, Willem Front Genet Genetics Frontiers Media S.A. 2021-05-24 /pmc/articles/PMC8181460/ /pubmed/34108993 http://dx.doi.org/10.3389/fgene.2021.667358 Text en Copyright © 2021 Arouisse, Theeuwen, van Eeuwijk and Kruijer. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Improving Genomic Prediction Using High-Dimensional Secondary Phenotypes
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8181460/
https://www.ncbi.nlm.nih.gov/pubmed/34108993
http://dx.doi.org/10.3389/fgene.2021.667358