A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses
BACKGROUND: To obtain predictions that are not biased by selection, the conditional mean of the breeding values must be computed given the data that were used for selection. When single nucleotide polymorphism (SNP) effects have a normal distribution, it can be argued that single-step best linear un...
Main authors: | Fernando, Rohan L; Dekkers, Jack CM; Garrick, Dorian J |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4262255/ https://www.ncbi.nlm.nih.gov/pubmed/25253441 http://dx.doi.org/10.1186/1297-9686-46-50 |
_version_ | 1782348405410889728 |
---|---|
author | Fernando, Rohan L; Dekkers, Jack CM; Garrick, Dorian J |
author_facet | Fernando, Rohan L; Dekkers, Jack CM; Garrick, Dorian J |
author_sort | Fernando, Rohan L |
collection | PubMed |
description | BACKGROUND: To obtain predictions that are not biased by selection, the conditional mean of the breeding values must be computed given the data that were used for selection. When single nucleotide polymorphism (SNP) effects have a normal distribution, it can be argued that single-step best linear unbiased prediction (SS-BLUP) yields a conditional mean of the breeding values. Obtaining SS-BLUP, however, requires computing the inverse of the dense matrix G of genomic relationships, which will become infeasible as the number of genotyped animals increases. Also, computing G requires the frequencies of SNP alleles in the founders, which are not available in most situations. Furthermore, SS-BLUP is expected to perform poorly relative to variable selection models such as BayesB and BayesC as marker densities increase. METHODS: A strategy is presented for Bayesian regression models (SSBR) that combines all available data from genotyped and non-genotyped animals, as in SS-BLUP, but accommodates a wider class of models. Our strategy uses imputed marker covariates for animals that are not genotyped, together with an appropriate residual genetic effect to accommodate deviations between true and imputed genotypes. Under normality, one formulation of SSBR yields results identical to SS-BLUP, but does not require computing G or its inverse and provides richer inferences. At present, Bayesian regression analyses are used with a few thousand genotyped individuals. However, when SSBR is applied to all animals in a breeding program, there will be a 100- to 200-fold increase in the number of animals and an associated 100- to 200-fold increase in computing time. Parallel computing strategies can be used to reduce computing time. In one such strategy, a 58-fold speedup was achieved using 120 cores. DISCUSSION: In SSBR and SS-BLUP, phenotype, genotype and pedigree information are combined in a single step. Unlike SS-BLUP, SSBR is not limited to normally distributed marker effects; it can be used when marker effects have a t distribution, as in BayesA, or mixture distributions, as in BayesB or BayesCπ. Furthermore, it has the advantage that matrix inversion is not required. We have investigated parallel computing to speed up SSBR analyses so they can be used for routine applications. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1297-9686-46-50) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4262255 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4262255 2014-12-11 A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses Fernando, Rohan L Dekkers, Jack CM Garrick, Dorian J Genet Sel Evol Research BACKGROUND: To obtain predictions that are not biased by selection, the conditional mean of the breeding values must be computed given the data that were used for selection. When single nucleotide polymorphism (SNP) effects have a normal distribution, it can be argued that single-step best linear unbiased prediction (SS-BLUP) yields a conditional mean of the breeding values. Obtaining SS-BLUP, however, requires computing the inverse of the dense matrix G of genomic relationships, which will become infeasible as the number of genotyped animals increases. Also, computing G requires the frequencies of SNP alleles in the founders, which are not available in most situations. Furthermore, SS-BLUP is expected to perform poorly relative to variable selection models such as BayesB and BayesC as marker densities increase. METHODS: A strategy is presented for Bayesian regression models (SSBR) that combines all available data from genotyped and non-genotyped animals, as in SS-BLUP, but accommodates a wider class of models. Our strategy uses imputed marker covariates for animals that are not genotyped, together with an appropriate residual genetic effect to accommodate deviations between true and imputed genotypes. Under normality, one formulation of SSBR yields results identical to SS-BLUP, but does not require computing G or its inverse and provides richer inferences. At present, Bayesian regression analyses are used with a few thousand genotyped individuals. However, when SSBR is applied to all animals in a breeding program, there will be a 100- to 200-fold increase in the number of animals and an associated 100- to 200-fold increase in computing time. Parallel computing strategies can be used to reduce computing time. In one such strategy, a 58-fold speedup was achieved using 120 cores. DISCUSSION: In SSBR and SS-BLUP, phenotype, genotype and pedigree information are combined in a single step. Unlike SS-BLUP, SSBR is not limited to normally distributed marker effects; it can be used when marker effects have a t distribution, as in BayesA, or mixture distributions, as in BayesB or BayesCπ. Furthermore, it has the advantage that matrix inversion is not required. We have investigated parallel computing to speed up SSBR analyses so they can be used for routine applications. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1297-9686-46-50) contains supplementary material, which is available to authorized users. BioMed Central 2014-09-22 /pmc/articles/PMC4262255/ /pubmed/25253441 http://dx.doi.org/10.1186/1297-9686-46-50 Text en © Fernando et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Fernando, Rohan L Dekkers, Jack CM Garrick, Dorian J A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses |
title | A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses |
title_sort | class of bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4262255/ https://www.ncbi.nlm.nih.gov/pubmed/25253441 http://dx.doi.org/10.1186/1297-9686-46-50 |
work_keys_str_mv | AT fernandorohanl aclassofbayesianmethodstocombinelargenumbersofgenotypedandnongenotypedanimalsforwholegenomeanalyses AT dekkersjackcm aclassofbayesianmethodstocombinelargenumbersofgenotypedandnongenotypedanimalsforwholegenomeanalyses AT garrickdorianj aclassofbayesianmethodstocombinelargenumbersofgenotypedandnongenotypedanimalsforwholegenomeanalyses AT fernandorohanl classofbayesianmethodstocombinelargenumbersofgenotypedandnongenotypedanimalsforwholegenomeanalyses AT dekkersjackcm classofbayesianmethodstocombinelargenumbersofgenotypedandnongenotypedanimalsforwholegenomeanalyses AT garrickdorianj classofbayesianmethodstocombinelargenumbersofgenotypedandnongenotypedanimalsforwholegenomeanalyses |
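
The computing figures quoted in the description (a 58-fold speedup on 120 cores, against a 100- to 200-fold increase in computing time) can be put in perspective with a little arithmetic. The sketch below is illustrative only and is not taken from the article: the function names are hypothetical, the serial-fraction estimate assumes Amdahl's law applies (the record does not say so), and the net-slowdown figures assume the same 58-fold speedup would carry over to the larger analysis.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


def amdahl_serial_fraction(speedup: float, cores: int) -> float:
    """Serial fraction f implied by Amdahl's law, S = 1 / (f + (1 - f) / p).

    Solving for f gives f = (p / S - 1) / (p - 1).
    Assumption: Amdahl's law is a reasonable model for this workload.
    """
    return (cores / speedup - 1.0) / (cores - 1.0)


# Figures quoted in the abstract.
SPEEDUP = 58.0
CORES = 120
TIME_INCREASE_RANGE = (100.0, 200.0)  # fold increase in computing time

efficiency = parallel_efficiency(SPEEDUP, CORES)
serial_fraction = amdahl_serial_fraction(SPEEDUP, CORES)

# Net slowdown relative to today's analyses, assuming (hypothetically)
# that the 58-fold speedup also holds for the larger data set.
net_slowdown = tuple(t / SPEEDUP for t in TIME_INCREASE_RANGE)

if __name__ == "__main__":
    print(f"parallel efficiency: {efficiency:.3f}")
    print(f"implied serial fraction (Amdahl): {serial_fraction:.4f}")
    print(f"net slowdown range: {net_slowdown[0]:.2f}x to {net_slowdown[1]:.2f}x")
```

Under these assumptions the parallel run uses a little under half of the ideal capacity of the 120 cores, and the residual slowdown after parallelisation is a small single-digit factor rather than two orders of magnitude.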