Multi-population genomic prediction using a multi-task Bayesian learning model

BACKGROUND: Genomic prediction in multiple populations can be viewed as a multi-task learning problem where tasks are to derive prediction equations for each population and multi-task learning property can be improved by sharing information across populations. The goal of this study was to develop a...

Bibliographic Details
Main authors:	Chen, Liuhong; Li, Changxi; Miller, Stephen; Schenkel, Flavio
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4024655/ https://www.ncbi.nlm.nih.gov/pubmed/24884927 http://dx.doi.org/10.1186/1471-2156-15-53

_version_	1782316675590258688
author	Chen, Liuhong; Li, Changxi; Miller, Stephen; Schenkel, Flavio
author_facet	Chen, Liuhong; Li, Changxi; Miller, Stephen; Schenkel, Flavio
author_sort	Chen, Liuhong
collection	PubMed
description	BACKGROUND: Genomic prediction in multiple populations can be viewed as a multi-task learning problem where tasks are to derive prediction equations for each population and multi-task learning property can be improved by sharing information across populations. The goal of this study was to develop a multi-task Bayesian learning model for multi-population genomic prediction with a strategy to effectively share information across populations. Simulation studies and real data from Holstein and Ayrshire dairy breeds with phenotypes on five milk production traits were used to evaluate the proposed multi-task Bayesian learning model and compare it with a single-task model and a simple data pooling method. RESULTS: A multi-task Bayesian learning model was proposed for multi-population genomic prediction. Information was shared across populations through a common set of latent indicator variables while SNP effects were allowed to vary in different populations. Both simulation studies and real data analysis showed the effectiveness of the multi-task model in improving genomic prediction accuracy for the smaller Ayrshire breed. Simulation studies suggested that the multi-task model was most effective when the number of QTL was small (n = 20), with an increase of accuracy by up to 0.09 when QTL effects were lowly correlated between two populations (ρ = 0.2), and up to 0.16 when QTL effects were highly correlated (ρ = 0.8). When QTL genotypes were included for training and validation, the improvements were 0.16 and 0.22, respectively, for scenarios of the low and high correlation of QTL effects between two populations. When the number of QTL was large (n = 200), improvement was small with a maximum of 0.02 when QTL genotypes were not included for genomic prediction. Reduction in accuracy was observed for the simple pooling method when the number of QTL was small and correlation of QTL effects between the two populations was low. For the real data, the multi-task model achieved an increase of accuracy between 0 and 0.07 in the Ayrshire validation set when 28,206 SNPs were used, while the simple data pooling method resulted in a reduction of accuracy for all traits except for protein percentage. When 246,668 SNPs were used, the accuracy achieved from the multi-task model increased by 0 to 0.03, while using the pooling method resulted in a reduction of accuracy by 0.01 to 0.09. In the Holstein population, the three methods had similar performance. CONCLUSIONS: Results in this study suggest that the proposed multi-task Bayesian learning model for multi-population genomic prediction is effective and has the potential to improve the accuracy of genomic prediction.
format	Online Article Text
id	pubmed-4024655
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4024655 2014-05-30 Multi-population genomic prediction using a multi-task Bayesian learning model Chen, Liuhong Li, Changxi Miller, Stephen Schenkel, Flavio BMC Genet Methodology Article BioMed Central 2014-05-03 /pmc/articles/PMC4024655/ /pubmed/24884927 http://dx.doi.org/10.1186/1471-2156-15-53 Text en Copyright © 2014 Chen et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
title	Multi-population genomic prediction using a multi-task Bayesian learning model
topic	Methodology Article
