
A comparison of methods for training population optimization in genomic selection

KEY MESSAGE: Maximizing CDmean and Avg_GRM_self were the best criteria for training set optimization. A training set size of 50–55% (targeted) or 65–85% (untargeted) is needed to obtain 95% of the accuracy. ABSTRACT: With the advent of genomic selection (GS) as a widespread breeding tool, mechanism...
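As a rough illustration of the untargeted criterion named above, the sketch below treats Avg_GRM_self as the mean genomic relationship among the selected training individuals and looks for a subset that makes it small. It is not the authors' implementation. The VanRaden-style relationship matrix, the exclusion of the diagonal, the random-exchange search, and all function names are assumptions made for this example.

```python
import numpy as np


def vanraden_grm(markers):
    """Genomic relationship matrix from an (individuals x markers) matrix coded 0/1/2.

    Assumes a VanRaden-style estimator and at least one polymorphic marker.
    """
    p = markers.mean(axis=0) / 2.0           # allele frequency per marker
    centered = markers - 2.0 * p             # centre each marker column
    denom = 2.0 * np.sum(p * (1.0 - p))      # scaling constant
    return centered @ centered.T / denom


def avg_relationship(grm, idx):
    """Mean off-diagonal relationship among the individuals in idx (assumed criterion)."""
    sub = grm[np.ix_(idx, idx)]
    n = len(idx)
    return (sub.sum() - np.trace(sub)) / (n * (n - 1))


def select_training_set(grm, size, n_iter=2000, seed=0):
    """Random-exchange search; requires 2 <= size < number of individuals.

    Each iteration swaps one selected individual for one unselected individual
    and keeps the swap only if the average relationship decreases.
    """
    rng = np.random.default_rng(seed)
    n = grm.shape[0]
    selected = list(rng.choice(n, size=size, replace=False))
    best = avg_relationship(grm, selected)
    for _ in range(n_iter):
        pool = np.setdiff1d(np.arange(n), selected)   # individuals not yet selected
        candidate = selected.copy()
        candidate[rng.integers(size)] = rng.choice(pool)
        score = avg_relationship(grm, candidate)
        if score < best:
            selected, best = candidate, score
    return sorted(int(i) for i in selected), best
```

With a candidate set of 100 genotyped individuals, `select_training_set(grm, 65)` would return a 65-individual subset, the lower end of the untargeted range quoted in the key message. A targeted criterion such as CDmean would also need the test-set genotypes, which this sketch does not use.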


Bibliographic Details
Main Authors:	Fernández-González, Javier, Akdemir, Deniz, Isidro y Sánchez, Julio
Format:	Online Article Text
Language:	English
Published:	Springer Berlin Heidelberg 2023
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9998580/ https://www.ncbi.nlm.nih.gov/pubmed/36892603 http://dx.doi.org/10.1007/s00122-023-04265-6

author	Fernández-González, Javier Akdemir, Deniz Isidro y Sánchez, Julio
collection	PubMed
description	KEY MESSAGE: Maximizing CDmean and Avg_GRM_self were the best criteria for training set optimization. A training set size of 50–55% (targeted) or 65–85% (untargeted) is needed to obtain 95% of the accuracy. ABSTRACT: With the advent of genomic selection (GS) as a widespread breeding tool, mechanisms to efficiently design an optimal training set for GS models have become more relevant, since they maximize accuracy while minimizing phenotyping costs. The literature describes many training set optimization methods, but a comprehensive comparison among them is lacking. This work aimed to provide an extensive benchmark of optimization methods and optimal training set sizes by testing a wide range of them across seven datasets and six species, with different genetic architectures, population structures, and heritabilities, and with several GS models, in order to provide guidelines for their application in breeding programs. Our results showed that targeted optimization (which uses information from the test set) performed better than untargeted optimization (which does not use test set data), especially when heritability was low. Maximizing the mean coefficient of determination (CDmean) was the best targeted method, although it was computationally intensive. Minimizing the average relationship within the training set (Avg_GRM_self) was the best strategy for untargeted optimization. Regarding the optimal training set size, maximum accuracy was obtained when the training set was the entire candidate set. Nevertheless, 50–55% of the candidate set was enough to reach 95–100% of the maximum accuracy in the targeted scenario, while 65–85% was needed for untargeted optimization. Our results also suggested that a diverse training set makes GS robust against population structure, while including clustering information was less effective. The choice of GS model did not have a significant influence on the prediction accuracies.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00122-023-04265-6.
format	Online Article Text
id	pubmed-9998580
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Springer Berlin Heidelberg
record_format	MEDLINE/PubMed
spelling	pubmed-9998580 2023-03-11 A comparison of methods for training population optimization in genomic selection Fernández-González, Javier Akdemir, Deniz Isidro y Sánchez, Julio Theor Appl Genet Original Article Springer Berlin Heidelberg 2023-03-09 2023 /pmc/articles/PMC9998580/ /pubmed/36892603 http://dx.doi.org/10.1007/s00122-023-04265-6 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title	A comparison of methods for training population optimization in genomic selection
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9998580/ https://www.ncbi.nlm.nih.gov/pubmed/36892603 http://dx.doi.org/10.1007/s00122-023-04265-6
