
Impact of rare and low-frequency sequence variants on reliability of genomic prediction in dairy cattle

BACKGROUND: Availability of whole-genome sequence data for a large number of cattle and efficient imputation methodologies open a new opportunity to include rare and low-frequency variants (RLFV) in genomic prediction in dairy cattle. The objective of this study was to examine the impact of includin...


Bibliographic Details
Main Authors: Zhang, Qianqian, Sahana, Goutam, Su, Guosheng, Guldbrandtsen, Bernt, Lund, Mogens Sandø, Calus, Mario P. L.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6247626/
https://www.ncbi.nlm.nih.gov/pubmed/30458700
http://dx.doi.org/10.1186/s12711-018-0432-8
_version_ 1783372518385516544
author Zhang, Qianqian
Sahana, Goutam
Su, Guosheng
Guldbrandtsen, Bernt
Lund, Mogens Sandø
Calus, Mario P. L.
author_sort Zhang, Qianqian
collection PubMed
description BACKGROUND: Availability of whole-genome sequence data for a large number of cattle and efficient imputation methodologies open a new opportunity to include rare and low-frequency variants (RLFV) in genomic prediction in dairy cattle. The objective of this study was to examine the impact of including RLFV that are within genes and selected from whole-genome sequence variants, on the reliability of genomic prediction for fertility, health and longevity in dairy cattle. RESULTS: All genic RLFV with a minor allele frequency lower than 0.05 were extracted from imputed sequence data and subsets were created using different strategies. These subsets were subsequently combined with Illumina 50 k single nucleotide polymorphism (SNP) data and used for genomic prediction. Reliability of prediction obtained by using 50 k SNP data alone was used as reference value and absolute changes in reliabilities are referred to as changes in percentage points. Adding a component that included either all the genic or a subset of selected RLFV into the model in addition to the 50 k component changed the reliability of predictions by −2.2 to 1.1%, i.e. hardly any change in reliability of prediction was found, regardless of how the RLFV were selected. In addition to these empirical analyses, a simulation study was performed to evaluate the potential impact of adding RLFV in the model on the reliability of prediction. Three sets of causal RLFV (containing 21,468, 1348 and 235 RLFV) that were randomly selected from different numbers of genes were generated and accounted for 10% additional genetic variance of the estimated variance explained by the 50 k SNPs. When genic RLFV based on mapping results were included in the prediction model, reliabilities improved by up to 4.0% and when the causal RLFV were included they improved by up to 6.8%.
CONCLUSIONS: Using selected RLFV from whole-genome sequence data had only a small impact on the empirical reliability of genomic prediction in dairy cattle. Our simulations revealed that for sequence data to bring a benefit, the key is to identify causal RLFV. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12711-018-0432-8) contains supplementary material, which is available to authorized users.
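The selection rule and the reporting convention stated in the abstract above (minor allele frequency lower than 0.05; changes in reliability expressed in percentage points) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The 0/1/2 genotype coding, the function names, the variant identifiers, the exclusion of monomorphic variants and the reliability values in the usage example are all assumptions made for illustration.

```python
from typing import Dict, List

MAF_THRESHOLD = 0.05  # cut-off stated in the abstract


def minor_allele_frequency(genotypes: List[int]) -> float:
    """Return the minor allele frequency of one variant.

    Assumes genotypes are coded as allele counts (0, 1 or 2) per animal,
    which is a common convention but is not stated in the record.
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    allele_freq = sum(genotypes) / (2 * len(genotypes))
    return min(allele_freq, 1.0 - allele_freq)


def select_rlfv(variants: Dict[str, List[int]]) -> List[str]:
    """Keep variants with 0 < MAF < MAF_THRESHOLD.

    Dropping monomorphic variants (MAF == 0) is an assumption of this sketch.
    """
    selected = []
    for variant_id, genotypes in variants.items():
        maf = minor_allele_frequency(genotypes)
        if 0.0 < maf < MAF_THRESHOLD:
            selected.append(variant_id)
    return selected


def change_in_percentage_points(rel_reference: float, rel_new: float) -> float:
    """Absolute change between two reliabilities (each on a 0-1 scale)."""
    return (rel_new - rel_reference) * 100.0


if __name__ == "__main__":
    # Hypothetical genotypes for 50 animals at four variants.
    example = {
        "rare": [1] + [0] * 49,               # allele frequency 0.01
        "common": [1] * 25 + [0] * 25,        # allele frequency 0.25
        "fixed": [0] * 50,                    # monomorphic, excluded
        "rare_major_coded": [1] + [2] * 49,   # counted allele is the major one
    }
    print(select_rlfv(example))
    # Hypothetical reliabilities: 50 k alone versus 50 k plus an RLFV component.
    print(change_in_percentage_points(0.300, 0.311))
```

Taking the minimum of the allele frequency and its complement makes the filter independent of which allele happens to be counted, so a variant coded on its major allele is still recognised as rare.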
format Online
Article
Text
id pubmed-6247626
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6247626 2018-11-26 Impact of rare and low-frequency sequence variants on reliability of genomic prediction in dairy cattle Zhang, Qianqian Sahana, Goutam Su, Guosheng Guldbrandtsen, Bernt Lund, Mogens Sandø Calus, Mario P. L. Genet Sel Evol Research Article BACKGROUND: Availability of whole-genome sequence data for a large number of cattle and efficient imputation methodologies open a new opportunity to include rare and low-frequency variants (RLFV) in genomic prediction in dairy cattle. The objective of this study was to examine the impact of including RLFV that are within genes and selected from whole-genome sequence variants, on the reliability of genomic prediction for fertility, health and longevity in dairy cattle. RESULTS: All genic RLFV with a minor allele frequency lower than 0.05 were extracted from imputed sequence data and subsets were created using different strategies. These subsets were subsequently combined with Illumina 50 k single nucleotide polymorphism (SNP) data and used for genomic prediction. Reliability of prediction obtained by using 50 k SNP data alone was used as reference value and absolute changes in reliabilities are referred to as changes in percentage points. Adding a component that included either all the genic or a subset of selected RLFV into the model in addition to the 50 k component changed the reliability of predictions by −2.2 to 1.1%, i.e. hardly any change in reliability of prediction was found, regardless of how the RLFV were selected. In addition to these empirical analyses, a simulation study was performed to evaluate the potential impact of adding RLFV in the model on the reliability of prediction. Three sets of causal RLFV (containing 21,468, 1348 and 235 RLFV) that were randomly selected from different numbers of genes were generated and accounted for 10% additional genetic variance of the estimated variance explained by the 50 k SNPs. When genic RLFV based on mapping results were included in the prediction model, reliabilities improved by up to 4.0% and when the causal RLFV were included they improved by up to 6.8%. CONCLUSIONS: Using selected RLFV from whole-genome sequence data had only a small impact on the empirical reliability of genomic prediction in dairy cattle. Our simulations revealed that for sequence data to bring a benefit, the key is to identify causal RLFV. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12711-018-0432-8) contains supplementary material, which is available to authorized users. BioMed Central 2018-11-20 /pmc/articles/PMC6247626/ /pubmed/30458700 http://dx.doi.org/10.1186/s12711-018-0432-8 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Impact of rare and low-frequency sequence variants on reliability of genomic prediction in dairy cattle
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6247626/
https://www.ncbi.nlm.nih.gov/pubmed/30458700
http://dx.doi.org/10.1186/s12711-018-0432-8