Methodology for the identification of relevant loci for milk traits in dairy cattle, using machine learning algorithms

Machine learning methods were considered efficient in identifying single nucleotide polymorphisms (SNP) underlying a trait of interest. This study aimed to construct predictive models using machine learning algorithms, to identify loci that best explain the variance in milk traits of dairy cattle. F...

Bibliographic Details
Main authors: Raschia, María Agustina, Ríos, Pablo Javier, Maizon, Daniel Omar, Demitrio, Daniel, Poli, Mario Andrés
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9144035/
https://www.ncbi.nlm.nih.gov/pubmed/35637693
http://dx.doi.org/10.1016/j.mex.2022.101733
_version_ 1784715951016509440
author Raschia, María Agustina
Ríos, Pablo Javier
Maizon, Daniel Omar
Demitrio, Daniel
Poli, Mario Andrés
author_facet Raschia, María Agustina
Ríos, Pablo Javier
Maizon, Daniel Omar
Demitrio, Daniel
Poli, Mario Andrés
author_sort Raschia, María Agustina
collection PubMed
description Machine learning methods have been considered efficient in identifying single nucleotide polymorphisms (SNPs) underlying a trait of interest. This study aimed to construct predictive models using machine learning algorithms to identify loci that best explain the variance in milk traits of dairy cattle. Further objectives involved validating the results by comparison with previously reported relevant regions and retrieving the pathways overrepresented among the genes flanking relevant SNPs. Regression models using the XGBoost (XGB), LightGBM (LGB), and Random Forest (RF) algorithms were trained using estimated breeding values for milk production (EBV(M)), milk fat content (EBV(F)), and milk protein content (EBV(P)) as phenotypes and genotypes at 40417 SNPs as predictor variables. To evaluate their efficiency, metrics for actual vs. predicted values were determined in validation folds (XGB and LGB) and on out-of-bag data (RF). Fewer than 4500 relevant SNPs were retrieved for each trait. Among the genes flanking them, signaling and transmembrane transporter activities were overrepresented. The trained models predicted breeding values for animals not included in the dataset and were efficient in identifying a subset of SNPs explaining phenotypic variation. The results obtained using the XGB and LGB algorithms agreed with previous results. Therefore, the proposed method could be applied in future association studies on milk traits.
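The evaluation step in the description (metrics for actual vs. predicted values in validation folds) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record does not name the metrics used, so RMSE and Pearson correlation are assumptions, and the helper names (`kfold_indices`, `evaluate_folds`, `genotype_mean_model`), the toy single-SNP data, and the stand-in predictor are all hypothetical. In practice the `predict_fn` callback would wrap an XGBoost or LightGBM regressor trained on the full SNP matrix.

```python
import math
import random


def kfold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k nearly equal validation folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    sq = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sq / len(actual))


def pearson_r(actual, predicted):
    """Pearson correlation; returns NaN if either vector is constant."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    va = sum((a - ma) ** 2 for a in actual)
    vp = sum((p - mp) ** 2 for p in predicted)
    denom = math.sqrt(va * vp)
    return cov / denom if denom > 0 else float("nan")


def evaluate_folds(actual, predict_fn, k=5, seed=0):
    """Return one (rmse, pearson_r) pair per validation fold.

    predict_fn(train_idx, valid_idx) must return predictions for valid_idx
    using only the animals in train_idx.
    """
    results = []
    for fold in kfold_indices(len(actual), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(actual)) if i not in held_out]
        preds = predict_fn(train, fold)
        obs = [actual[i] for i in fold]
        results.append((rmse(obs, preds), pearson_r(obs, preds)))
    return results


# Toy data (assumption): one SNP coded 0/1/2 with an additive effect on the EBV.
rng = random.Random(1)
genotypes = [rng.choice([0, 1, 2]) for _ in range(60)]
ebv = [2.0 * g + rng.gauss(0, 0.5) for g in genotypes]


def genotype_mean_model(train_idx, valid_idx):
    """Stand-in predictor: mean training EBV of each genotype class."""
    sums, counts = {}, {}
    for i in train_idx:
        g = genotypes[i]
        sums[g] = sums.get(g, 0.0) + ebv[i]
        counts[g] = counts.get(g, 0) + 1
    overall = sum(ebv[i] for i in train_idx) / len(train_idx)
    means = {g: sums[g] / counts[g] for g in sums}
    return [means.get(genotypes[i], overall) for i in valid_idx]


results = evaluate_folds(ebv, genotype_mean_model, k=5)
for fold_no, (err, r) in enumerate(results, start=1):
    print(f"fold {fold_no}: RMSE={err:.3f}  r={r:.3f}")
```

Keeping the model behind a callback separates the fold bookkeeping from the learner, so the same metrics could be computed on out-of-bag predictions for a Random Forest by passing those predictions straight to `rmse` and `pearson_r`.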
format Online
Article
Text
id pubmed-9144035
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9144035 2022-05-29 Methodology for the identification of relevant loci for milk traits in dairy cattle, using machine learning algorithms Raschia, María Agustina Ríos, Pablo Javier Maizon, Daniel Omar Demitrio, Daniel Poli, Mario Andrés MethodsX Method Article Elsevier 2022-05-16 /pmc/articles/PMC9144035/ /pubmed/35637693 http://dx.doi.org/10.1016/j.mex.2022.101733 Text en © 2022 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Method Article
Raschia, María Agustina
Ríos, Pablo Javier
Maizon, Daniel Omar
Demitrio, Daniel
Poli, Mario Andrés
Methodology for the identification of relevant loci for milk traits in dairy cattle, using machine learning algorithms
title Methodology for the identification of relevant loci for milk traits in dairy cattle, using machine learning algorithms
topic Method Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9144035/
https://www.ncbi.nlm.nih.gov/pubmed/35637693
http://dx.doi.org/10.1016/j.mex.2022.101733