Training set optimization under population structure in genomic selection
KEY MESSAGE: Population structure must be evaluated before optimization of the training set population. Maximizing the phenotypic variance captured by the training set is important for optimal performance. ABSTRACT: The optimization of the training set (TRS) in genomic selection has received much in...
Main authors: | Isidro, Julio; Jannink, Jean-Luc; Akdemir, Deniz; Poland, Jesse; Heslot, Nicolas; Sorrells, Mark E. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Berlin Heidelberg, 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4282691/ https://www.ncbi.nlm.nih.gov/pubmed/25367380 http://dx.doi.org/10.1007/s00122-014-2418-4 |
_version_ | 1782351161450299392 |
---|---|
author | Isidro, Julio Jannink, Jean-Luc Akdemir, Deniz Poland, Jesse Heslot, Nicolas Sorrells, Mark E. |
author_facet | Isidro, Julio Jannink, Jean-Luc Akdemir, Deniz Poland, Jesse Heslot, Nicolas Sorrells, Mark E. |
author_sort | Isidro, Julio |
collection | PubMed |
description | KEY MESSAGE: Population structure must be evaluated before optimization of the training set population. Maximizing the phenotypic variance captured by the training set is important for optimal performance. ABSTRACT: The optimization of the training set (TRS) in genomic selection has received much interest in both animal and plant breeding, because it is critical to the accuracy of the prediction models. In this study, five different TRS sampling algorithms, stratified sampling, mean of the coefficient of determination (CDmean), mean of predictor error variance (PEVmean), stratified CDmean (StratCDmean) and random sampling, were evaluated for prediction accuracy in the presence of different levels of population structure. In the presence of population structure, the most phenotypic variation captured by a sampling method in the TRS is desirable. The wheat dataset showed mild population structure, and CDmean and stratified CDmean methods showed the highest accuracies for all the traits except for test weight and heading date. The rice dataset had strong population structure and the approach based on stratified sampling showed the highest accuracies for all traits. In general, CDmean minimized the relationship between genotypes in the TRS, maximizing the relationship between TRS and the test set. This makes it suitable as an optimization criterion for long-term selection. Our results indicated that the best selection criterion used to optimize the TRS seems to depend on the interaction of trait architecture and population structure. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s00122-014-2418-4) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4282691 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Springer Berlin Heidelberg |
record_format | MEDLINE/PubMed |
spelling | pubmed-42826912015-01-08 Training set optimization under population structure in genomic selection Isidro, Julio Jannink, Jean-Luc Akdemir, Deniz Poland, Jesse Heslot, Nicolas Sorrells, Mark E. Theor Appl Genet Original Paper KEY MESSAGE: Population structure must be evaluated before optimization of the training set population. Maximizing the phenotypic variance captured by the training set is important for optimal performance. ABSTRACT: The optimization of the training set (TRS) in genomic selection has received much interest in both animal and plant breeding, because it is critical to the accuracy of the prediction models. In this study, five different TRS sampling algorithms, stratified sampling, mean of the coefficient of determination (CDmean), mean of predictor error variance (PEVmean), stratified CDmean (StratCDmean) and random sampling, were evaluated for prediction accuracy in the presence of different levels of population structure. In the presence of population structure, the most phenotypic variation captured by a sampling method in the TRS is desirable. The wheat dataset showed mild population structure, and CDmean and stratified CDmean methods showed the highest accuracies for all the traits except for test weight and heading date. The rice dataset had strong population structure and the approach based on stratified sampling showed the highest accuracies for all traits. In general, CDmean minimized the relationship between genotypes in the TRS, maximizing the relationship between TRS and the test set. This makes it suitable as an optimization criterion for long-term selection. Our results indicated that the best selection criterion used to optimize the TRS seems to depend on the interaction of trait architecture and population structure. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s00122-014-2418-4) contains supplementary material, which is available to authorized users. 
Springer Berlin Heidelberg 2014-11-01 2015 /pmc/articles/PMC4282691/ /pubmed/25367380 http://dx.doi.org/10.1007/s00122-014-2418-4 Text en © The Author(s) 2014 https://creativecommons.org/licenses/by/4.0/ Open AccessThis article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited. |
spellingShingle | Original Paper Isidro, Julio Jannink, Jean-Luc Akdemir, Deniz Poland, Jesse Heslot, Nicolas Sorrells, Mark E. Training set optimization under population structure in genomic selection |
title | Training set optimization under population structure in genomic selection |
title_full | Training set optimization under population structure in genomic selection |
title_fullStr | Training set optimization under population structure in genomic selection |
title_full_unstemmed | Training set optimization under population structure in genomic selection |
title_short | Training set optimization under population structure in genomic selection |
title_sort | training set optimization under population structure in genomic selection |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4282691/ https://www.ncbi.nlm.nih.gov/pubmed/25367380 http://dx.doi.org/10.1007/s00122-014-2418-4 |