Training set optimization under population structure in genomic selection

KEY MESSAGE: Population structure must be evaluated before optimization of the training set population. Maximizing the phenotypic variance captured by the training set is important for optimal performance. ABSTRACT: The optimization of the training set (TRS) in genomic selection has received much in...

Bibliographic Details
Main authors: Isidro, Julio; Jannink, Jean-Luc; Akdemir, Deniz; Poland, Jesse; Heslot, Nicolas; Sorrells, Mark E.
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4282691/
https://www.ncbi.nlm.nih.gov/pubmed/25367380
http://dx.doi.org/10.1007/s00122-014-2418-4
author Isidro, Julio
Jannink, Jean-Luc
Akdemir, Deniz
Poland, Jesse
Heslot, Nicolas
Sorrells, Mark E.
collection PubMed
description KEY MESSAGE: Population structure must be evaluated before optimization of the training set population. Maximizing the phenotypic variance captured by the training set is important for optimal performance. ABSTRACT: The optimization of the training set (TRS) in genomic selection has received much interest in both animal and plant breeding, because it is critical to the accuracy of the prediction models. In this study, five TRS sampling algorithms were evaluated for prediction accuracy in the presence of different levels of population structure: stratified sampling, mean of the coefficient of determination (CDmean), mean of the prediction error variance (PEVmean), stratified CDmean (StratCDmean) and random sampling. In the presence of population structure, a sampling method that captures the most phenotypic variation in the TRS is desirable. The wheat dataset showed mild population structure, and the CDmean and stratified CDmean methods showed the highest accuracies for all traits except test weight and heading date. The rice dataset had strong population structure, and the approach based on stratified sampling showed the highest accuracies for all traits. In general, CDmean minimized the relationship between genotypes in the TRS while maximizing the relationship between the TRS and the test set. This makes it suitable as an optimization criterion for long-term selection. Our results indicated that the best selection criterion for optimizing the TRS seems to depend on the interaction of trait architecture and population structure. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s00122-014-2418-4) contains supplementary material, which is available to authorized users.
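The abstract names stratified sampling as one of the compared TRS selection methods but does not describe its mechanics. The following is only a minimal sketch of the general idea, not the authors' implementation: it assumes each individual already carries a subpopulation label, and it assumes the training set is allocated to subpopulations in proportion to their size. The function name `stratified_training_set`, the `groups` mapping and the `seed` parameter are hypothetical and introduced here for illustration.

```python
import random
from collections import defaultdict


def stratified_training_set(groups, n_train, seed=0):
    """Sample a training set with proportional allocation per subpopulation.

    groups  : dict mapping individual id -> subpopulation label (hypothetical input)
    n_train : total number of individuals to place in the training set
    seed    : seed for the random generator, so the draw is reproducible
    """
    total = len(groups)
    if not 0 < n_train <= total:
        raise ValueError("n_train must be between 1 and the population size")

    # Collect the members of each subpopulation.
    by_group = defaultdict(list)
    for individual, label in groups.items():
        by_group[label].append(individual)

    # Assumption: proportional allocation, rounded with the largest-remainder rule
    # so that the per-group sizes always sum to n_train.
    quotas = {g: n_train * len(members) / total for g, members in by_group.items()}
    alloc = {g: int(q) for g, q in quotas.items()}
    leftover = n_train - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda g: quotas[g] - alloc[g], reverse=True)
    for g in by_remainder[:leftover]:
        alloc[g] += 1

    # Draw without replacement inside each subpopulation.
    rng = random.Random(seed)
    chosen = []
    for g in sorted(by_group):
        chosen.extend(rng.sample(by_group[g], alloc[g]))
    return chosen
```

Called with a label mapping and a target size, the function returns a list of individual ids; the remaining individuals would form the test set. Other allocation rules (for example equal numbers per subpopulation) are equally plausible readings of "stratified sampling" and would need only the quota step changed.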
format Online
Article
Text
id pubmed-4282691
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Springer Berlin Heidelberg
record_format MEDLINE/PubMed
spelling pubmed-4282691 2015-01-08 Training set optimization under population structure in genomic selection Isidro, Julio Jannink, Jean-Luc Akdemir, Deniz Poland, Jesse Heslot, Nicolas Sorrells, Mark E. Theor Appl Genet Original Paper
Springer Berlin Heidelberg 2014-11-01 2015 /pmc/articles/PMC4282691/ /pubmed/25367380 http://dx.doi.org/10.1007/s00122-014-2418-4 Text en © The Author(s) 2014 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
title Training set optimization under population structure in genomic selection
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4282691/
https://www.ncbi.nlm.nih.gov/pubmed/25367380
http://dx.doi.org/10.1007/s00122-014-2418-4