
Training Set Construction for Genomic Prediction in Auto-Tetraploids: An Example in Potato

Training set construction is an important prerequisite to Genomic Prediction (GP), and while it has been studied in diploids, polyploids have not received the same attention. Polyploidy is a common feature in many crop plants, for example banana and blueberry, and also potato, which is the thi...


Bibliographic Details
Main authors: Wilson, Stefan, Malosetti, Marcos, Maliepaard, Chris, Mulder, Han A., Visser, Richard G. F., van Eeuwijk, Fred
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Plant Science
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8651708/
https://www.ncbi.nlm.nih.gov/pubmed/34899794
http://dx.doi.org/10.3389/fpls.2021.771075
author Wilson, Stefan
Malosetti, Marcos
Maliepaard, Chris
Mulder, Han A.
Visser, Richard G. F.
van Eeuwijk, Fred
author_sort Wilson, Stefan
collection PubMed
description Training set construction is an important prerequisite to Genomic Prediction (GP), and while it has been studied in diploids, polyploids have not received the same attention. Polyploidy is a common feature in many crop plants, for example banana and blueberry, and also potato, which is the third most important crop in the world in terms of food consumption, after rice and wheat. The aim of this study was to investigate the impact of different training set construction methods using a publicly available diversity panel of tetraploid potatoes. Four methods of training set construction were compared: simple random sampling, stratified random sampling, genetic distance sampling, and sampling based on the coefficient of determination (CDmean). For stratified random sampling, population structure analyses were carried out to define sub-populations, but since sub-populations accounted for only 16.6% of genetic variation, there were negligible differences between stratified and simple random sampling. For genetic distance sampling, four genetic distance measures were compared, and though they performed similarly, Euclidean distance was the most consistent. In the majority of cases CDmean was the best sampling method: compared with simple random sampling, it gave improvements of 4–14% in cross-validation scenarios and 2–8% in scenarios with an independent test set, while genetic distance sampling gave improvements of 5.5–10.5% and 0.4–4.5%, respectively. No interaction was found between sampling method and the statistical model for the traits analyzed.
format Online
Article
Text
id pubmed-8651708
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8651708 2021-12-09 Training Set Construction for Genomic Prediction in Auto-Tetraploids: An Example in Potato Wilson, Stefan Malosetti, Marcos Maliepaard, Chris Mulder, Han A. Visser, Richard G. F. van Eeuwijk, Fred Front Plant Sci Plant Science Frontiers Media S.A.
2021-11-24 /pmc/articles/PMC8651708/ /pubmed/34899794 http://dx.doi.org/10.3389/fpls.2021.771075 Text en Copyright © 2021 Wilson, Malosetti, Maliepaard, Mulder, Visser and van Eeuwijk. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Training Set Construction for Genomic Prediction in Auto-Tetraploids: An Example in Potato
topic Plant Science