Evaluating the use of statistical and machine learning methods for estimating breed composition of purebred and crossbred animals in thirteen cattle breeds using genomic information

Introduction: The ability to accurately predict breed composition using genomic information has many potential uses including increasing the accuracy of genetic evaluations, optimising mating plans and as a parameter for genotype quality control. The objective of the present study was to use a datab...

Bibliographic Details
Main Authors: Ryan, C. A., Berry, D. P., O’Brien, A., Pabiou, T., Purfield, D. C.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10237237/
https://www.ncbi.nlm.nih.gov/pubmed/37274789
http://dx.doi.org/10.3389/fgene.2023.1120312
_version_ 1785053116994945024
author Ryan, C. A.
Berry, D. P.
O’Brien, A.
Pabiou, T.
Purfield, D. C.
collection PubMed
description Introduction: The ability to accurately predict breed composition using genomic information has many potential uses, including increasing the accuracy of genetic evaluations, optimising mating plans and serving as a parameter for genotype quality control. The objective of the present study was to use a database of genotyped purebred and crossbred cattle to compare breed composition predictions from freely available software, Admixture, with those from a single nucleotide polymorphism Best Linear Unbiased Prediction (SNP-BLUP) approach; a supplementary objective was to determine the accuracy and general robustness of low-density genotype panels for predicting breed composition.
Methods: All animals had genotype information on 49,213 autosomal single nucleotide polymorphisms (SNPs). Thirteen breeds were included in the analysis and 500 purebred animals per breed were used to establish the breed training populations. Accuracy of breed composition prediction was determined using a separate validation population of 3,146 verified purebred and 4,330 two- and three-way crossbred cattle.
Results: When all 49,213 autosomal SNPs were used for breed prediction, the absolute mean difference between Admixture and SNP-BLUP breed predictions was minimal (0.04). For crossbreds, the average absolute difference in breed prediction estimates generated using SNP-BLUP and Admixture was 0.068, with a root mean square error of 0.08. Breed predictions from low-density SNP panels were generated using both SNP-BLUP and Admixture and compared to breed prediction estimates using all 49,213 SNPs (representing the gold standard). Estimating the breed composition of crossbreds required more SNPs than predicting the breed status of purebreds: SNP-BLUP required ≥3,000 SNPs to predict crossbred breed composition, but only 2,000 SNPs to predict purebred breed status. Across all panels with <2,000 SNPs, the absolute mean (standard deviation) difference from the gold standard prediction was 0.091 (0.054) for Admixture and 0.315 (0.316) for SNP-BLUP when predicting the breed composition of all animals.
Discussion: Nevertheless, a negligible absolute mean (standard deviation) difference of 0.009 (0.123) in breed prediction existed between SNP-BLUP and Admixture once ≥3,000 SNPs were considered, indicating that the prediction of breed composition could be readily integrated into SNP-BLUP pipelines used for genomic evaluations, thereby avoiding the need for stand-alone software.
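The abstract compares two sets of breed composition predictions using an absolute mean difference and a root mean square error. The following is a minimal sketch of how such statistics could be computed; it is not the authors' code. The data layout (one list of breed proportions per animal), the function name `compare_breed_predictions`, the averaging over every animal-by-breed pair, and the example values are all assumptions made for illustration.

```python
from math import sqrt


def compare_breed_predictions(pred_a, pred_b):
    """Return (mean absolute difference, RMSE) between two prediction sets.

    pred_a and pred_b are lists with one entry per animal; each entry is a
    list of breed proportions in the same breed order (hypothetical layout).
    Differences are pooled over every animal-by-breed pair, which is an
    assumption; the abstract does not state how the averaging was done.
    """
    if len(pred_a) != len(pred_b):
        raise ValueError("both prediction sets must cover the same animals")

    diffs = []
    for row_a, row_b in zip(pred_a, pred_b):
        if len(row_a) != len(row_b):
            raise ValueError("both prediction sets must cover the same breeds")
        diffs.extend(a - b for a, b in zip(row_a, row_b))

    n = len(diffs)
    mean_abs_diff = sum(abs(d) for d in diffs) / n
    rmse = sqrt(sum(d * d for d in diffs) / n)
    return mean_abs_diff, rmse


# Made-up proportions for two animals and three breeds (not study data).
admixture_pred = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
snp_blup_pred = [[0.9, 0.1, 0.0], [0.5, 0.4, 0.1]]

mean_abs_diff, rmse = compare_breed_predictions(admixture_pred, snp_blup_pred)
print(f"mean absolute difference: {mean_abs_diff:.3f}, RMSE: {rmse:.3f}")
```

Because squaring weights large discrepancies more heavily, the RMSE is never smaller than the mean absolute difference, which is consistent with the 0.068 and 0.08 reported for crossbreds.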
format Online
Article
Text
id pubmed-10237237
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10237237 2023-06-03 Evaluating the use of statistical and machine learning methods for estimating breed composition of purebred and crossbred animals in thirteen cattle breeds using genomic information Ryan, C. A. Berry, D. P. O’Brien, A. Pabiou, T. Purfield, D. C. Front Genet Genetics Frontiers Media S.A. 2023-05-15 /pmc/articles/PMC10237237/ /pubmed/37274789 http://dx.doi.org/10.3389/fgene.2023.1120312 Text en Copyright © 2023 Ryan, Berry, O’Brien, Pabiou and Purfield. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Evaluating the use of statistical and machine learning methods for estimating breed composition of purebred and crossbred animals in thirteen cattle breeds using genomic information
topic Genetics