
A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses

BACKGROUND: To obtain predictions that are not biased by selection, the conditional mean of the breeding values must be computed given the data that were used for selection. When single nucleotide polymorphism (SNP) effects have a normal distribution, it can be argued that single-step best linear un...


Bibliographic Details
Main Authors: Fernando, Rohan L; Dekkers, Jack CM; Garrick, Dorian J
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4262255/
https://www.ncbi.nlm.nih.gov/pubmed/25253441
http://dx.doi.org/10.1186/1297-9686-46-50
author Fernando, Rohan L
Dekkers, Jack CM
Garrick, Dorian J
collection PubMed
description BACKGROUND: To obtain predictions that are not biased by selection, the conditional mean of the breeding values must be computed given the data that were used for selection. When single nucleotide polymorphism (SNP) effects have a normal distribution, it can be argued that single-step best linear unbiased prediction (SS-BLUP) yields a conditional mean of the breeding values. Obtaining SS-BLUP, however, requires computing the inverse of the dense matrix G of genomic relationships, which will become infeasible as the number of genotyped animals increases. Also, computing G requires the frequencies of SNP alleles in the founders, which are not available in most situations. Furthermore, SS-BLUP is expected to perform poorly relative to variable selection models such as BayesB and BayesC as marker densities increase.
METHODS: A strategy is presented for Bayesian regression models (SSBR) that combines all available data from genotyped and non-genotyped animals, as in SS-BLUP, but accommodates a wider class of models. Our strategy uses imputed marker covariates for animals that are not genotyped, together with an appropriate residual genetic effect to accommodate deviations between true and imputed genotypes. Under normality, one formulation of SSBR yields results identical to SS-BLUP, but does not require computing G or its inverse and provides richer inferences. At present, Bayesian regression analyses are used with a few thousand genotyped individuals. However, when SSBR is applied to all animals in a breeding program, there will be a 100- to 200-fold increase in the number of animals and an associated 100- to 200-fold increase in computing time. Parallel computing strategies can be used to reduce computing time. In one such strategy, a 58-fold speedup was achieved using 120 cores.
DISCUSSION: In SSBR and SS-BLUP, phenotype, genotype and pedigree information are combined in a single step. Unlike SS-BLUP, SSBR is not limited to normally distributed marker effects; it can be used when marker effects have a t distribution, as in BayesA, or mixture distributions, as in BayesB or BayesCπ. Furthermore, it has the advantage that matrix inversion is not required. We have investigated parallel computing to speed up SSBR analyses so they can be used for routine applications.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1297-9686-46-50) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4262255
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4262255 2014-12-11 A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses Fernando, Rohan L; Dekkers, Jack CM; Garrick, Dorian J Genet Sel Evol Research
BioMed Central 2014-09-22 /pmc/articles/PMC4262255/ /pubmed/25253441 http://dx.doi.org/10.1186/1297-9686-46-50 Text en
© Fernando et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses
topic Research