Training set optimization under population structure in genomic selection

KEY MESSAGE: Population structure must be evaluated before optimization of the training set population. Maximizing the phenotypic variance captured by the training set is important for optimal performance. ABSTRACT: The optimization of the training set (TRS) in genomic selection has received much in...

Bibliographic details
Main authors:	Isidro, Julio; Jannink, Jean-Luc; Akdemir, Deniz; Poland, Jesse; Heslot, Nicolas; Sorrells, Mark E.
Format:	Online Article Text
Language:	English
Published:	Springer Berlin Heidelberg 2014
Subjects:	Original Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4282691/ https://www.ncbi.nlm.nih.gov/pubmed/25367380 http://dx.doi.org/10.1007/s00122-014-2418-4

author	Isidro, Julio; Jannink, Jean-Luc; Akdemir, Deniz; Poland, Jesse; Heslot, Nicolas; Sorrells, Mark E.
collection	PubMed
description	KEY MESSAGE: Population structure must be evaluated before optimization of the training set population. Maximizing the phenotypic variance captured by the training set is important for optimal performance. ABSTRACT: The optimization of the training set (TRS) in genomic selection has received much interest in both animal and plant breeding, because it is critical to the accuracy of the prediction models. In this study, five different TRS sampling algorithms (stratified sampling, mean of the coefficient of determination (CDmean), mean of the prediction error variance (PEVmean), stratified CDmean (StratCDmean) and random sampling) were evaluated for prediction accuracy in the presence of different levels of population structure. In the presence of population structure, a sampling method that captures as much phenotypic variation as possible in the TRS is desirable. The wheat dataset showed mild population structure, and the CDmean and stratified CDmean methods showed the highest accuracies for all traits except test weight and heading date. The rice dataset had strong population structure, and the approach based on stratified sampling showed the highest accuracies for all traits. In general, CDmean minimized the relationship between genotypes in the TRS while maximizing the relationship between the TRS and the test set. This makes it suitable as an optimization criterion for long-term selection. Our results indicate that the best criterion for optimizing the TRS seems to depend on the interaction of trait architecture and population structure. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s00122-014-2418-4) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4282691
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Springer Berlin Heidelberg
record_format	MEDLINE/PubMed
spelling	pubmed-4282691 2015-01-08 Training set optimization under population structure in genomic selection Isidro, Julio; Jannink, Jean-Luc; Akdemir, Deniz; Poland, Jesse; Heslot, Nicolas; Sorrells, Mark E. Theor Appl Genet Original Paper
Springer Berlin Heidelberg 2014-11-01 2015 /pmc/articles/PMC4282691/ /pubmed/25367380 http://dx.doi.org/10.1007/s00122-014-2418-4 Text en © The Author(s) 2014 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is distributed under the terms of the Creative Commons Attribution License, which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
title	Training set optimization under population structure in genomic selection
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4282691/ https://www.ncbi.nlm.nih.gov/pubmed/25367380 http://dx.doi.org/10.1007/s00122-014-2418-4
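
The description reports that stratified sampling gave the highest accuracies in the strongly structured rice dataset. Below is a minimal sketch of proportional stratified sampling of a training set, assuming each candidate genotype already carries a subpopulation label (for example from clustering of marker data). The function name `stratified_training_set`, the proportional allocation, and the largest-remainder rounding rule are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict


def stratified_training_set(labels, n_train, seed=None):
    """Return sorted indices of n_train candidates drawn by stratified sampling.

    Hypothetical helper: labels[i] is the (assumed, precomputed) subpopulation
    of candidate i. Each subpopulation receives a share of n_train proportional
    to its size, using largest-remainder rounding so the shares sum to n_train.
    """
    if not 0 <= n_train <= len(labels):
        raise ValueError("n_train must lie between 0 and the number of candidates")
    rng = random.Random(seed)

    # Group candidate indices by subpopulation (stratum).
    strata = defaultdict(list)
    for idx, label in enumerate(labels):
        strata[label].append(idx)

    # Fractional proportional quota per stratum, then round down.
    total = len(labels)
    quota = {g: n_train * len(m) / total for g, m in strata.items()}
    alloc = {g: int(q) for g, q in quota.items()}

    # Give the slots lost to rounding to strata with the largest remainders.
    leftover = n_train - sum(alloc.values())
    ranked = sorted(strata, key=lambda g: quota[g] - alloc[g], reverse=True)
    for g in ranked[:leftover]:
        alloc[g] += 1

    # Simple random sampling without replacement inside each stratum.
    chosen = []
    for g, members in strata.items():
        chosen.extend(rng.sample(members, alloc[g]))
    return sorted(chosen)
```

For example, with 60, 30 and 10 candidates in three subpopulations and `n_train = 20`, the draws are split 12/6/2, so the training set keeps the subpopulation frequencies of the candidate pool. Proportional allocation is only one option; equal allocation per subpopulation is a common alternative when small groups must be represented.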