
Genome-wide prediction of discrete traits using Bayesian regressions and machine learning

BACKGROUND: Genomic selection has gained much attention and the main goal is to increase the predictive accuracy and the genetic gain in livestock using dense marker information. Most methods dealing with the large p (number of covariates) small n (number of observations) problem have dealt only wit...


Bibliographic Details
Main Authors:	González-Recio, Oscar, Forni, Selma
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3400433/ https://www.ncbi.nlm.nih.gov/pubmed/21329522 http://dx.doi.org/10.1186/1297-9686-43-7

author	González-Recio, Oscar Forni, Selma
collection	PubMed
description	BACKGROUND: Genomic selection has gained much attention and the main goal is to increase the predictive accuracy and the genetic gain in livestock using dense marker information. Most methods dealing with the large p (number of covariates) small n (number of observations) problem have dealt only with continuous traits, but there are many important traits in livestock that are recorded in a discrete fashion (e.g. pregnancy outcome, disease resistance). It is necessary to evaluate alternatives to analyze discrete traits in a genome-wide prediction context. METHODS: This study presents two threshold versions of Bayesian regressions (Bayes A and Bayesian LASSO) and two machine learning algorithms (boosting and random forest) to analyze discrete traits in a genome-wide prediction context. These methods were evaluated using simulated and field data to predict yet-to-be observed records. Performances were compared based on the models' predictive ability. RESULTS: The simulation showed that machine learning had some advantages over Bayesian regressions when a small number of QTL regulated the trait under pure additivity. However, differences were small and disappeared with a large number of QTL. Bayesian threshold LASSO and boosting achieved the highest accuracies, whereas Random Forest presented the highest classification performance. Random Forest was the most consistent method in detecting resistant and susceptible animals; its phi correlation was up to 81% greater than that of the Bayesian regressions. Random Forest outperformed other methods in correctly classifying resistant and susceptible animals in the two pure swine lines evaluated. Boosting and Bayes A were more accurate with crossbred data. CONCLUSIONS: The results of this study suggest that the best method for genome-wide prediction may depend on the genetic basis of the population analyzed. All methods were less accurate at correctly classifying intermediate animals than extreme animals. Among the different alternatives proposed to analyze discrete traits, machine learning showed some advantages over Bayesian regressions. Boosting with a pseudo-Huber loss function showed high accuracy, whereas Random Forest produced more consistent results and an interesting predictive ability. Nonetheless, the best method may be case-dependent, and an initial evaluation of different methods is recommended to deal with a particular problem.
format	Online Article Text
id	pubmed-3400433
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3400433 2012-07-24 Genome-wide prediction of discrete traits using Bayesian regressions and machine learning González-Recio, Oscar Forni, Selma Genet Sel Evol Research BioMed Central 2011-02-17 /pmc/articles/PMC3400433/ /pubmed/21329522 http://dx.doi.org/10.1186/1297-9686-43-7 Text en Copyright ©2011 González-Recio and Forni; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Genome-wide prediction of discrete traits using Bayesian regressions and machine learning
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3400433/ https://www.ncbi.nlm.nih.gov/pubmed/21329522 http://dx.doi.org/10.1186/1297-9686-43-7
