A comparison of methods for training population optimization in genomic selection
KEY MESSAGE: Maximizing CDmean and Avg_GRM_self were the best criteria for training set optimization. A training set size of 50–55% (targeted) or 65–85% (untargeted) is needed to obtain 95% of the accuracy. ABSTRACT: With the advent of genomic selection (GS) as a widespread breeding tool, mechanism...
Main authors: | Fernández-González, Javier; Akdemir, Deniz; Isidro y Sánchez, Julio |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Berlin Heidelberg 2023 |
Subjects: | Original Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9998580/ https://www.ncbi.nlm.nih.gov/pubmed/36892603 http://dx.doi.org/10.1007/s00122-023-04265-6 |
_version_ | 1784903496670117888 |
---|---|
author | Fernández-González, Javier Akdemir, Deniz Isidro y Sánchez, Julio |
author_facet | Fernández-González, Javier Akdemir, Deniz Isidro y Sánchez, Julio |
author_sort | Fernández-González, Javier |
collection | PubMed |
description | KEY MESSAGE: Maximizing CDmean and Avg_GRM_self were the best criteria for training set optimization. A training set size of 50–55% (targeted) or 65–85% (untargeted) is needed to obtain 95% of the accuracy. ABSTRACT: With the advent of genomic selection (GS) as a widespread breeding tool, mechanisms to efficiently design an optimal training set for GS models have become more relevant, since they allow accuracy to be maximized while phenotyping costs are minimized. The literature describes many training set optimization methods, but a comprehensive comparison among them is lacking. This work aimed to provide an extensive benchmark of optimization methods and optimal training set sizes by testing a wide range of them in seven datasets covering six species, different genetic architectures, population structures, and heritabilities, and with several GS models, in order to provide guidelines for their application in breeding programs. Our results showed that targeted optimization (which uses information from the test set) performed better than untargeted optimization (which does not use test set data), especially when heritability was low. The mean coefficient of determination was the best targeted method, although it was computationally intensive. Minimizing the average relationship within the training set was the best strategy for untargeted optimization. Regarding the optimal training set size, maximum accuracy was obtained when the training set was the entire candidate set. Nevertheless, 50–55% of the candidate set was enough to reach 95–100% of the maximum accuracy in the targeted scenario, while 65–85% was needed for untargeted optimization. Our results also suggested that a diverse training set makes GS robust against population structure, while including clustering information was less effective. The choice of the GS model did not have a significant influence on the prediction accuracies. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00122-023-04265-6. |
format | Online Article Text |
id | pubmed-9998580 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Springer Berlin Heidelberg |
record_format | MEDLINE/PubMed |
spelling | pubmed-9998580 2023-03-11 A comparison of methods for training population optimization in genomic selection Fernández-González, Javier Akdemir, Deniz Isidro y Sánchez, Julio Theor Appl Genet Original Article Springer Berlin Heidelberg 2023-03-09 2023 /pmc/articles/PMC9998580/ /pubmed/36892603 http://dx.doi.org/10.1007/s00122-023-04265-6 Text en © The Author(s) 2023. Open Access: this article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. |
spellingShingle | Original Article Fernández-González, Javier Akdemir, Deniz Isidro y Sánchez, Julio A comparison of methods for training population optimization in genomic selection |
title | A comparison of methods for training population optimization in genomic selection |
title_sort | comparison of methods for training population optimization in genomic selection |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9998580/ https://www.ncbi.nlm.nih.gov/pubmed/36892603 http://dx.doi.org/10.1007/s00122-023-04265-6 |
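
The abstract in this record reports that minimizing the average relationship within the training set (Avg_GRM_self) was the best untargeted strategy. The following is a minimal sketch of that idea, not the authors' implementation. The record does not say how the genomic relationship matrix (GRM) was built or which optimizer was used, so several things here are assumptions: the VanRaden-style GRM, the choice to average only off-diagonal relationships, the simple random-swap search, and all function names (`vanraden_grm`, `avg_relationship`, `select_training_set`).

```python
import numpy as np


def vanraden_grm(markers):
    """Genomic relationship matrix from an (n, m) marker array coded 0/1/2.

    Assumption: a VanRaden-style GRM; the record does not state which
    relationship matrix the authors used.
    """
    markers = np.asarray(markers, dtype=float)
    p = markers.mean(axis=0) / 2.0            # allele frequency per marker
    z = markers - 2.0 * p                     # centre each marker column
    denom = 2.0 * np.sum(p * (1.0 - p))       # scaling constant
    return z @ z.T / denom


def avg_relationship(grm, idx):
    """Mean off-diagonal relationship among the individuals in idx.

    Assumption: self-relationships (the diagonal) are excluded.
    """
    idx = np.asarray(idx)
    k = idx.size
    sub = grm[np.ix_(idx, idx)]
    return (sub.sum() - np.trace(sub)) / (k * (k - 1))


def select_training_set(grm, size, n_iter=2000, seed=0):
    """Random-swap search for a training set with low average relationship.

    Assumption: a plain hill climb stands in for whatever optimizer the
    paper used; it only accepts swaps that lower the criterion.
    """
    rng = np.random.default_rng(seed)
    n = grm.shape[0]
    current = rng.choice(n, size=size, replace=False)
    best = avg_relationship(grm, current)
    for _ in range(n_iter):
        out_pos = rng.integers(size)          # position to swap out
        new_ind = rng.integers(n)             # candidate to swap in
        if new_ind in current:
            continue                          # already selected, skip
        trial = current.copy()
        trial[out_pos] = new_ind
        score = avg_relationship(grm, trial)
        if score < best:                      # keep only improvements
            current, best = trial, score
    return np.sort(current), best
```

Calling `select_training_set(vanraden_grm(markers), size)` returns the selected candidate indices and the criterion value reached. Because no test-set information enters the criterion, this corresponds to the untargeted setting described in the abstract; a targeted criterion such as CDmean would also need the test-set genotypes.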