
Using Bayesian Multilevel Whole Genome Regression Models for Partial Pooling of Training Sets in Genomic Prediction

Training set size is an important determinant of genomic prediction accuracy. Plant breeding programs are characterized by a high degree of structuring, particularly into populations. This hampers the establishment of large training sets for each population. Pooling populations increases training se...


Bibliographic Details
Main Authors:	Technow, Frank; Totir, L. Radu
Format:	Online Article Text
Language:	English
Published:	Genetics Society of America 2015
Subjects:	Genomic Selection
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4528317/ https://www.ncbi.nlm.nih.gov/pubmed/26024866 http://dx.doi.org/10.1534/g3.115.019299

_version_	1782384667973910528
author	Technow, Frank; Totir, L. Radu
author_sort	Technow, Frank
collection	PubMed
description	Training set size is an important determinant of genomic prediction accuracy. Plant breeding programs are characterized by a high degree of structuring, particularly into populations. This hampers the establishment of large training sets for each population. Pooling populations increases training set size but ignores unique genetic characteristics of each. A possible solution is partial pooling with multilevel models, which allows estimating population-specific marker effects while still leveraging information across populations. We developed a Bayesian multilevel whole-genome regression model and compared its performance with that of the popular BayesA model applied to each population separately (no pooling) and to the joined data set (complete pooling). As an example, we analyzed a wide array of traits from the nested association mapping maize population. There we show that for small population sizes (e.g., <50), partial pooling increased prediction accuracy over no or complete pooling for populations represented in the training set. No pooling was superior, however, when populations were large. In another example data set of interconnected biparental maize populations, either partial or complete pooling was superior, depending on the trait. A simulation showed that no pooling is superior when differences in genetic effects among populations are large and partial pooling when they are intermediate. With small differences, partial and complete pooling achieved equally high accuracy. For prediction of new populations, partial and complete pooling had very similar accuracy in all cases. We conclude that partial pooling with multilevel models can maximize the potential of pooling by making optimal use of information in pooled training sets.
format	Online Article Text
id	pubmed-4528317
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Genetics Society of America
record_format	MEDLINE/PubMed
spelling	pubmed-4528317 2015-08-10 Using Bayesian Multilevel Whole Genome Regression Models for Partial Pooling of Training Sets in Genomic Prediction Technow, Frank; Totir, L. Radu G3 (Bethesda) Genomic Selection Training set size is an important determinant of genomic prediction accuracy. Plant breeding programs are characterized by a high degree of structuring, particularly into populations. This hampers the establishment of large training sets for each population. Pooling populations increases training set size but ignores unique genetic characteristics of each. A possible solution is partial pooling with multilevel models, which allows estimating population-specific marker effects while still leveraging information across populations. We developed a Bayesian multilevel whole-genome regression model and compared its performance with that of the popular BayesA model applied to each population separately (no pooling) and to the joined data set (complete pooling). As an example, we analyzed a wide array of traits from the nested association mapping maize population. There we show that for small population sizes (e.g., <50), partial pooling increased prediction accuracy over no or complete pooling for populations represented in the training set. No pooling was superior, however, when populations were large. In another example data set of interconnected biparental maize populations, either partial or complete pooling was superior, depending on the trait. A simulation showed that no pooling is superior when differences in genetic effects among populations are large and partial pooling when they are intermediate. With small differences, partial and complete pooling achieved equally high accuracy. For prediction of new populations, partial and complete pooling had very similar accuracy in all cases. We conclude that partial pooling with multilevel models can maximize the potential of pooling by making optimal use of information in pooled training sets.
Genetics Society of America 2015-05-29 /pmc/articles/PMC4528317/ /pubmed/26024866 http://dx.doi.org/10.1534/g3.115.019299 Text en Copyright © 2015 Technow and Totir http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Using Bayesian Multilevel Whole Genome Regression Models for Partial Pooling of Training Sets in Genomic Prediction
topic	Genomic Selection
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4528317/ https://www.ncbi.nlm.nih.gov/pubmed/26024866 http://dx.doi.org/10.1534/g3.115.019299
