Impact of rare and low-frequency sequence variants on reliability of genomic prediction in dairy cattle
BACKGROUND: Availability of whole-genome sequence data for a large number of cattle and efficient imputation methodologies open a new opportunity to include rare and low-frequency variants (RLFV) in genomic prediction in dairy cattle. The objective of this study was to examine the impact of includin...
Main authors: | Zhang, Qianqian; Sahana, Goutam; Su, Guosheng; Guldbrandtsen, Bernt; Lund, Mogens Sandø; Calus, Mario P. L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6247626/ https://www.ncbi.nlm.nih.gov/pubmed/30458700 http://dx.doi.org/10.1186/s12711-018-0432-8 |
_version_ | 1783372518385516544 |
---|---|
author | Zhang, Qianqian Sahana, Goutam Su, Guosheng Guldbrandtsen, Bernt Lund, Mogens Sandø Calus, Mario P. L. |
author_facet | Zhang, Qianqian Sahana, Goutam Su, Guosheng Guldbrandtsen, Bernt Lund, Mogens Sandø Calus, Mario P. L. |
author_sort | Zhang, Qianqian |
collection | PubMed |
description | BACKGROUND: Availability of whole-genome sequence data for a large number of cattle and efficient imputation methodologies open a new opportunity to include rare and low-frequency variants (RLFV) in genomic prediction in dairy cattle. The objective of this study was to examine the impact of including RLFV that are within genes and selected from whole-genome sequence variants on the reliability of genomic prediction for fertility, health and longevity in dairy cattle. RESULTS: All genic RLFV with a minor allele frequency lower than 0.05 were extracted from imputed sequence data and subsets were created using different strategies. These subsets were subsequently combined with Illumina 50 k single nucleotide polymorphism (SNP) data and used for genomic prediction. Reliability of prediction obtained by using 50 k SNP data alone was used as reference value and absolute changes in reliabilities are referred to as changes in percentage points. Adding a component that included either all the genic or a subset of selected RLFV into the model in addition to the 50 k component changed the reliability of predictions by −2.2 to 1.1%, i.e. hardly any change in reliability of prediction was found, regardless of how the RLFV were selected. In addition to these empirical analyses, a simulation study was performed to evaluate the potential impact of adding RLFV in the model on the reliability of prediction. Three sets of causal RLFV (containing 21,468, 1348 and 235 RLFV) that were randomly selected from different numbers of genes were generated and accounted for 10% additional genetic variance of the estimated variance explained by the 50 k SNPs. When genic RLFV based on mapping results were included in the prediction model, reliabilities improved by up to 4.0% and when the causal RLFV were included they improved by up to 6.8%. CONCLUSIONS: Using selected RLFV from whole-genome sequence data had only a small impact on the empirical reliability of genomic prediction in dairy cattle. Our simulations revealed that for sequence data to bring a benefit, the key is to identify causal RLFV. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12711-018-0432-8) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6247626 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6247626 2018-11-26 Impact of rare and low-frequency sequence variants on reliability of genomic prediction in dairy cattle Zhang, Qianqian Sahana, Goutam Su, Guosheng Guldbrandtsen, Bernt Lund, Mogens Sandø Calus, Mario P. L. Genet Sel Evol Research Article BACKGROUND: Availability of whole-genome sequence data for a large number of cattle and efficient imputation methodologies open a new opportunity to include rare and low-frequency variants (RLFV) in genomic prediction in dairy cattle. The objective of this study was to examine the impact of including RLFV that are within genes and selected from whole-genome sequence variants on the reliability of genomic prediction for fertility, health and longevity in dairy cattle. RESULTS: All genic RLFV with a minor allele frequency lower than 0.05 were extracted from imputed sequence data and subsets were created using different strategies. These subsets were subsequently combined with Illumina 50 k single nucleotide polymorphism (SNP) data and used for genomic prediction. Reliability of prediction obtained by using 50 k SNP data alone was used as reference value and absolute changes in reliabilities are referred to as changes in percentage points. Adding a component that included either all the genic or a subset of selected RLFV into the model in addition to the 50 k component changed the reliability of predictions by −2.2 to 1.1%, i.e. hardly any change in reliability of prediction was found, regardless of how the RLFV were selected. In addition to these empirical analyses, a simulation study was performed to evaluate the potential impact of adding RLFV in the model on the reliability of prediction. Three sets of causal RLFV (containing 21,468, 1348 and 235 RLFV) that were randomly selected from different numbers of genes were generated and accounted for 10% additional genetic variance of the estimated variance explained by the 50 k SNPs. When genic RLFV based on mapping results were included in the prediction model, reliabilities improved by up to 4.0% and when the causal RLFV were included they improved by up to 6.8%. CONCLUSIONS: Using selected RLFV from whole-genome sequence data had only a small impact on the empirical reliability of genomic prediction in dairy cattle. Our simulations revealed that for sequence data to bring a benefit, the key is to identify causal RLFV. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12711-018-0432-8) contains supplementary material, which is available to authorized users. BioMed Central 2018-11-20 /pmc/articles/PMC6247626/ /pubmed/30458700 http://dx.doi.org/10.1186/s12711-018-0432-8 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Zhang, Qianqian Sahana, Goutam Su, Guosheng Guldbrandtsen, Bernt Lund, Mogens Sandø Calus, Mario P. L. Impact of rare and low-frequency sequence variants on reliability of genomic prediction in dairy cattle |
title | Impact of rare and low-frequency sequence variants on reliability of genomic prediction in dairy cattle |
title_full | Impact of rare and low-frequency sequence variants on reliability of genomic prediction in dairy cattle |
title_fullStr | Impact of rare and low-frequency sequence variants on reliability of genomic prediction in dairy cattle |
title_full_unstemmed | Impact of rare and low-frequency sequence variants on reliability of genomic prediction in dairy cattle |
title_short | Impact of rare and low-frequency sequence variants on reliability of genomic prediction in dairy cattle |
title_sort | impact of rare and low-frequency sequence variants on reliability of genomic prediction in dairy cattle |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6247626/ https://www.ncbi.nlm.nih.gov/pubmed/30458700 http://dx.doi.org/10.1186/s12711-018-0432-8 |
work_keys_str_mv | AT zhangqianqian impactofrareandlowfrequencysequencevariantsonreliabilityofgenomicpredictionindairycattle AT sahanagoutam impactofrareandlowfrequencysequencevariantsonreliabilityofgenomicpredictionindairycattle AT suguosheng impactofrareandlowfrequencysequencevariantsonreliabilityofgenomicpredictionindairycattle AT guldbrandtsenbernt impactofrareandlowfrequencysequencevariantsonreliabilityofgenomicpredictionindairycattle AT lundmogenssandø impactofrareandlowfrequencysequencevariantsonreliabilityofgenomicpredictionindairycattle AT calusmariopl impactofrareandlowfrequencysequencevariantsonreliabilityofgenomicpredictionindairycattle |