
Estimated allele substitution effects underlying genomic evaluation models depend on the scaling of allele counts

BACKGROUND: Genomic evaluation is used to predict direct genomic values (DGV) for selection candidates in breeding programs, but also to estimate allele substitution effects (ASE) of single nucleotide polymorphisms (SNPs). Scaling of allele counts influences the estimated ASE, because scaling of all...


Bibliographic Details
Main authors: Bouwman, Aniek C., Hayes, Ben J., Calus, Mario P. L.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5662034/
https://www.ncbi.nlm.nih.gov/pubmed/29084514
http://dx.doi.org/10.1186/s12711-017-0355-9
author Bouwman, Aniek C.
Hayes, Ben J.
Calus, Mario P. L.
collection PubMed
description BACKGROUND: Genomic evaluation is used to predict direct genomic values (DGV) for selection candidates in breeding programs, but also to estimate allele substitution effects (ASE) of single nucleotide polymorphisms (SNPs). Scaling of allele counts influences the estimated ASE, because scaling of allele counts results in less shrinkage towards the mean for low minor allele frequency (MAF) variants. Scaling may become relevant for estimating ASE as more low MAF variants will be used in genomic evaluations. We show the impact of scaling on estimates of ASE using real data and a theoretical framework, and in terms of power, model fit and predictive performance. RESULTS: In a dairy cattle dataset with 630 K SNP genotypes, the correlation between DGV for stature from a random regression model using centered allele counts (RRc) and centered and scaled allele counts (RRcs) was 0.9988, whereas the overall correlation between ASE using RRc and RRcs was 0.27. The main difference in ASE between both methods was found for SNPs with a MAF lower than 0.01. Both the ratio (ASE from RRcs/ASE from RRc) and the regression coefficient (regression of ASE from RRcs on ASE from RRc) were much higher than 1 for low MAF SNPs. Derived equations showed that scenarios with a high heritability, a large number of individuals and a small number of variants have lower ratios between ASE from RRc and RRcs. We also investigated the optimal scaling parameter [from −1 (RRcs) to 0 (RRc) in steps of 0.1] in the bovine stature dataset. We found that the log-likelihood was maximized with a scaling parameter of −0.8, while the mean squared error of prediction was minimized with a scaling parameter of −1, i.e., RRcs. CONCLUSIONS: Large differences in estimated ASE were observed for low MAF SNPs when allele counts were scaled or not scaled because there is less shrinkage towards the mean for scaled allele counts. We derived a theoretical framework that shows that the difference in ASE due to shrinkage is heavily influenced by the power of the data. Increasing the power results in smaller differences in ASE whether allele counts are scaled or not.
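The two codings of allele counts contrasted in the abstract can be illustrated with a minimal sketch. It assumes the common convention in which centered counts are multiplied by [2p(1 − p)]^(s/2), so that a scaling parameter s = 0 gives centered counts (RRc) and s = −1 gives centered and scaled counts (RRcs). This record does not state the exact parameterization used by the authors, and the function names `allele_frequency` and `scale_allele_counts` are hypothetical.

```python
def allele_frequency(counts):
    """Frequency p of the counted allele, from 0/1/2 allele counts."""
    return sum(counts) / (2 * len(counts))


def scale_allele_counts(counts, scaling=0.0):
    """Center allele counts at 2p and multiply by [2p(1-p)] ** (scaling / 2).

    scaling = 0  -> centered only (RRc-like coding)
    scaling = -1 -> centered and divided by sqrt(2p(1-p)) (RRcs-like coding)
    The exponent convention is an assumption, not taken from the record.
    """
    p = allele_frequency(counts)
    het = 2 * p * (1 - p)  # expected heterozygosity under Hardy-Weinberg
    if het == 0:
        raise ValueError("monomorphic SNP: scaling factor is undefined")
    factor = het ** (scaling / 2)
    return [(x - 2 * p) * factor for x in counts]


# A rare variant (one heterozygote among 100 animals) and a common one.
rare = [0] * 99 + [1]
common = [0, 1, 2, 1] * 25

for name, snp in (("rare", rare), ("common", common)):
    centered = scale_allele_counts(snp, scaling=0.0)
    scaled = scale_allele_counts(snp, scaling=-1.0)
    # Compare the covariate value of a heterozygous animal under both codings.
    idx = snp.index(1)
    print(name, centered[idx], scaled[idx])
```

Under this assumed convention, scaling multiplies the covariate of a rare variant by a much larger factor than that of a common variant, so the same shrinkage on the scaled effect corresponds to less shrinkage of the per-allele effect, in line with the abstract's explanation for low MAF SNPs.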
format Online
Article
Text
id pubmed-5662034
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5662034 2017-11-01 Estimated allele substitution effects underlying genomic evaluation models depend on the scaling of allele counts Bouwman, Aniek C. Hayes, Ben J. Calus, Mario P. L. Genet Sel Evol Research Article BioMed Central 2017-10-30 /pmc/articles/PMC5662034/ /pubmed/29084514 http://dx.doi.org/10.1186/s12711-017-0355-9 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Estimated allele substitution effects underlying genomic evaluation models depend on the scaling of allele counts
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5662034/
https://www.ncbi.nlm.nih.gov/pubmed/29084514
http://dx.doi.org/10.1186/s12711-017-0355-9