Two simple methods to improve the accuracy of the genomic selection methodology

BACKGROUND: Genomic selection (GS) is revolutionizing plant and animal breeding. However, its practical implementation is still challenging, since it is affected by many factors that, when not under control, make the methodology ineffective. Also, because it is formulated as a...

Bibliographic Details
Main Authors: Montesinos-López, Osval A.; Kismiantini; Montesinos-López, Abelardo
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10131336/
https://www.ncbi.nlm.nih.gov/pubmed/37101112
http://dx.doi.org/10.1186/s12864-023-09294-5
author Montesinos-López, Osval A.
Kismiantini
Montesinos-López, Abelardo
collection PubMed
description BACKGROUND: Genomic selection (GS) is revolutionizing plant and animal breeding. However, its practical implementation is still challenging, since it is affected by many factors that, when not under control, make the methodology ineffective. Also, because it is formulated as a regression problem, it generally has low sensitivity for selecting the best candidate individuals, since a top percentage is selected according to a ranking of predicted breeding values. RESULTS: For this reason, in this paper we propose two methods to improve the prediction accuracy of this methodology. One method consists of reformulating the GS methodology (currently formulated as a regression problem) as a binary classification problem. The other consists only of a postprocessing step that adjusts the threshold used to classify the lines predicted on their original (continuous) scale, so as to guarantee similar sensitivity and specificity. The postprocessing method is applied to the predictions obtained from the conventional regression model. Both methods assume that a threshold has been defined in advance to divide the training data into top lines and non-top lines; this threshold can be set as a quantile (for example 80%, 90%, etc.) or as the average (or maximum) performance of the checks. The reformulation method requires labeling as one those lines in the training set that are equal to or larger than the specified threshold, and as zero otherwise. We then train a binary classification model with the conventional inputs, but using the binary response variable in place of the continuous response variable. The binary classifier should be trained so that sensitivity and specificity are similar, which guarantees a reasonable probability of correctly classifying the top lines.
CONCLUSIONS: We evaluated the proposed models on seven data sets and found that the two proposed methods outperformed the conventional regression model by a large margin (by 402.9% in terms of sensitivity, by 110.04% in terms of F1 score and by 70.96% in terms of Kappa coefficient, with the postprocessing method). However, of the two proposed methods, the postprocessing method was better than the reformulation as a binary classification model. The simple postprocessing method for improving the accuracy of conventional genomic regression models avoids the need to reformulate them as binary classification models, with similar or better performance, and significantly improves the selection of the top candidate lines. In general, both proposed methods are simple and can easily be adopted in practical breeding programs, with the guarantee that they will significantly improve the selection of the top candidate lines. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-023-09294-5.
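The threshold-based labeling and the postprocessing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, and the rule of picking the cutoff that minimizes the gap between sensitivity and specificity are all assumptions made for illustration, since the abstract does not spell out the exact adjustment procedure. The `label_top` step is also the labeling the reformulation method would apply to build its binary response; no classifier is trained here.

```python
# Illustrative sketch only; names, data and the cutoff rule are assumptions.

def quantile_threshold(values, q):
    """Empirical q-quantile (0 <= q <= 1) using linear interpolation."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def label_top(values, threshold):
    """Label a line 1 if its value is >= threshold, else 0."""
    return [1 if v >= threshold else 0 for v in values]

def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def sensitivity_specificity(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

def f1_and_kappa(y_true, y_pred):
    """F1 score and Cohen's Kappa from the confusion counts."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    n = tp + tn + fp + fn
    f1 = 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0
    p_obs = (tp + tn) / n
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp != 1 else 0.0
    return f1, kappa

def balanced_cutoff(y_true, y_hat):
    """Assumed rule: cutoff on predictions minimizing |sensitivity - specificity|."""
    best_gap, best_cut = None, None
    for cut in sorted(set(y_hat)):
        sens, spec = sensitivity_specificity(y_true, label_top(y_hat, cut))
        gap = abs(sens - spec)
        if best_gap is None or gap < best_gap:
            best_gap, best_cut = gap, cut
    return best_cut

# Toy training data (hypothetical): observed values and regression
# predictions that are shrunken toward the mean.
observed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
predicted = [4.6, 4.9, 5.0, 5.2, 5.4, 5.3, 5.8, 5.7, 6.2, 6.4]

threshold = quantile_threshold(observed, 0.8)   # top lines defined by the 80% quantile
y_true = label_top(observed, threshold)         # binary response for the training set

naive = label_top(predicted, threshold)         # original threshold applied to predictions
cutoff = balanced_cutoff(y_true, predicted)     # postprocessing: adjusted threshold
adjusted = label_top(predicted, cutoff)

print("naive   :", sensitivity_specificity(y_true, naive), f1_and_kappa(y_true, naive))
print("adjusted:", sensitivity_specificity(y_true, adjusted), f1_and_kappa(y_true, adjusted))
```

In this toy case the shrunken predictions never reach the original threshold, so no top line is selected until the cutoff is moved onto the scale of the predictions. In practice the cutoff would be tuned on training or validation data and then applied to the predictions for new candidate lines.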
format Online
Article
Text
id pubmed-10131336
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10131336 2023-04-27 Two simple methods to improve the accuracy of the genomic selection methodology Montesinos-López, Osval A. Kismiantini Montesinos-López, Abelardo BMC Genomics Research BioMed Central 2023-04-26 /pmc/articles/PMC10131336/ /pubmed/37101112 http://dx.doi.org/10.1186/s12864-023-09294-5 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Two simple methods to improve the accuracy of the genomic selection methodology
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10131336/
https://www.ncbi.nlm.nih.gov/pubmed/37101112
http://dx.doi.org/10.1186/s12864-023-09294-5