The use of a genomic relationship matrix for breed assignment of cattle breeds: comparison and combination with a machine learning method

Bibliographic details
Main authors: Wilmot, Hélène; Niehoff, Tobias; Soyeurt, Hélène; Gengler, Nicolas; Calus, Mario P L
Format: Online article, text
Language: English
Published: Oxford University Press, 2023
Subjects: Animal Genetics and Genomics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10276639/
https://www.ncbi.nlm.nih.gov/pubmed/37220912
http://dx.doi.org/10.1093/jas/skad172

Abstract: To develop a breed assignment model, three main steps are generally followed: 1) the selection of breed-informative single nucleotide polymorphisms (SNPs); 2) the training of a model, based on a reference population, that classifies animals to their breed of origin; and 3) the validation of the developed model on external animals, i.e., animals that were not used in the previous steps. However, there is no consensus in the literature about which methodology to follow for the first step, nor about the number of SNPs to select. This can raise many questions when developing the model and lead to the use of sophisticated methodologies for selecting SNPs (e.g., iterative algorithms, partitions of SNPs, or combinations of several methods). It may therefore be of interest to skip the first step by using all available SNPs. For this purpose, we propose the use of a genomic relationship matrix (GRM), with or without a machine learning method, for breed assignment. We compared it with a previously developed model based on selected informative SNPs. Four methodologies were investigated: 1) PLS_NSC: selection of SNPs by partial least squares discriminant analysis (PLS-DA) and breed assignment by classification with the nearest shrunken centroids (NSC) method; 2) mean_GRM: breed assignment based on the highest mean relatedness of an animal to the reference population of each breed; 3) SD_GRM: breed assignment based on the highest standard deviation (SD) of the relatedness of an animal to the reference population of each breed; and 4) GRM_SVM: the means and SDs of the relatedness defined in mean_GRM and SD_GRM, combined with a linear support vector machine (SVM), a machine learning method used for classification.
Regarding mean global accuracies, mean_GRM and GRM_SVM did not differ significantly (Bonferroni-corrected P > 0.0083) from the model based on a reduced SNP panel (PLS_NSC). Moreover, mean_GRM and GRM_SVM were more efficient than PLS_NSC because they were faster to compute. Therefore, it is possible to bypass SNP selection and develop an efficient breed assignment model using a GRM. For routine use, we recommend GRM_SVM over mean_GRM because it gave a slightly higher global accuracy, which can help maintain endangered breeds. The script to run the different methodologies is available at https://github.com/hwilmot675/Breed_assignment.
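
The mean_GRM, SD_GRM and GRM_SVM rules described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (their script is in the repository cited in the abstract). Assumptions: the GRM follows VanRaden's first method, G = ZZ' / (2 Σ p_i(1 - p_i)), where Z holds the 0/1/2 genotypes centred by 2p_i and the allele frequencies p_i are estimated from the reference animals; the SD is the population SD; all function names and the toy genotypes are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical helpers: a sketch, not the authors' scripts.
# Genotypes are coded 0/1/2 (copies of the counted allele) per SNP.


def allele_freqs(genotypes):
    """Frequency of the counted allele at each SNP (assumed: from the reference animals)."""
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    return [sum(g[i] for g in genotypes) / (2 * n_animals) for i in range(n_snps)]


def centre(genotype, p):
    """Centre a genotype vector: z_i = m_i - 2 * p_i (VanRaden method 1, assumed)."""
    return [m - 2 * pi for m, pi in zip(genotype, p)]


def breed_relatedness_stats(test_genotype, reference, p):
    """Mean and SD of a test animal's GRM relatedness to each breed's reference animals."""
    # Scaling term of the GRM: 2 * sum p_i (1 - p_i).
    scale = 2 * sum(pi * (1 - pi) for pi in p)
    z_test = centre(test_genotype, p)
    stats = {}
    for breed, animals in reference.items():
        # One GRM element per reference animal: z_test . z_ref / scale.
        rel = [sum(a * b for a, b in zip(z_test, centre(g, p))) / scale for g in animals]
        stats[breed] = (mean(rel), pstdev(rel))  # population SD (assumption)
    return stats


def assign_mean_grm(stats):
    """mean_GRM rule: breed with the highest mean relatedness."""
    return max(stats, key=lambda breed: stats[breed][0])


def assign_sd_grm(stats):
    """SD_GRM rule: breed with the highest SD of relatedness."""
    return max(stats, key=lambda breed: stats[breed][1])


def grm_svm_features(stats, breeds):
    """Input vector for a linear SVM in GRM_SVM: per-breed means, then per-breed SDs."""
    return [stats[b][0] for b in breeds] + [stats[b][1] for b in breeds]


# Toy data (invented for illustration): two breeds, four SNPs.
reference = {
    "A": [[2, 2, 0, 0], [2, 1, 0, 0], [2, 2, 0, 1]],
    "B": [[0, 0, 2, 2], [0, 1, 2, 2], [1, 0, 2, 2]],
}
p = allele_freqs([g for animals in reference.values() for g in animals])
stats = breed_relatedness_stats([2, 2, 0, 0], reference, p)
print(assign_mean_grm(stats), assign_sd_grm(stats))  # → A A
print(grm_svm_features(stats, ["A", "B"]))
```

For GRM_SVM, one such feature vector per animal would be the input to a linear SVM classifier (for example scikit-learn's `LinearSVC`). The sketch stops before that training step, and how the authors built the training vectors for the reference animals is not described in this record.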

Journal: J Anim Sci (section: Animal Genetics and Genomics)
Published online: 2023-05-23
Copyright: © The Author(s) 2023. Published by Oxford University Press on behalf of the American Society of Animal Science.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Source record: pubmed-10276639, dated 2023-06-18 (collection: PubMed; institution: National Center for Biotechnology Information; record format: MEDLINE/PubMed)