Selecting sequence variants to improve genomic predictions for dairy cattle

BACKGROUND: Millions of genetic variants have been identified by population-scale sequencing projects, but subsets of these variants are needed for routine genomic predictions or genotyping arrays. Methods for selecting sequence variants were compared using simulated sequence genotypes and real July...

Bibliographic Details
Main Authors:	VanRaden, Paul M., Tooker, Melvin E., O’Connell, Jeffrey R., Cole, John B., Bickhart, Derek M.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5339980/ https://www.ncbi.nlm.nih.gov/pubmed/28270096 http://dx.doi.org/10.1186/s12711-017-0307-4

_version_	1782512759321133056
author	VanRaden, Paul M.; Tooker, Melvin E.; O’Connell, Jeffrey R.; Cole, John B.; Bickhart, Derek M.
author_sort	VanRaden, Paul M.
collection	PubMed
description	BACKGROUND: Millions of genetic variants have been identified by population-scale sequencing projects, but subsets of these variants are needed for routine genomic predictions or genotyping arrays. Methods for selecting sequence variants were compared using simulated sequence genotypes and real July 2015 data from the 1000 Bull Genomes Project. METHODS: Candidate sequence variants for 444 Holstein animals were combined with high-density (HD) imputed genotypes for 26,970 progeny-tested Holstein bulls. Test 1 included single nucleotide polymorphisms (SNPs) for 481,904 candidate sequence variants. Test 2 also included 249,966 insertions-deletions (InDels). After merging sequence variants with 312,614 HD SNPs and editing steps, Tests 1 and 2 included 762,588 and 1,003,453 variants, respectively. Imputation quality from findhap software was assessed with 404 of the sequenced animals in the reference population and 40 randomly chosen animals for validation. Their sequence genotypes were reduced to the subset of genotypes that were in common with HD genotypes and then imputed back to sequence. Predictions were tested for 33 traits using 2015 data of 3983 US validation bulls with daughters that were first phenotyped after August 2011. RESULTS: The average percentage of correctly imputed variants across all chromosomes was 97.2 for Test 1 and 97.0 for Test 2. Total time required to prepare, edit, impute, and estimate the effects of sequence variants for 27,235 bulls was about 1 week using less than 33 threads. Many sequence variants had larger estimated effects than nearby HD SNPs, but prediction reliability improved only by 0.6 percentage points in Test 1 when sequence SNPs were added to HD SNPs and by 0.4 percentage points in Test 2 when sequence SNPs and InDels were included. However, selecting the 16,648 candidate SNPs with the largest estimated effects and adding them to the 60,671 SNPs used in routine evaluations improved reliabilities by 2.7 percentage points. CONCLUSIONS: Reliabilities for genomic predictions improved when selected sequence variants were added; gains were similar for simulated and real data for the same population, and larger than previous gains obtained by adding HD SNPs. With many genotyped animals, many data sources, and millions of variants, computing strategies must efficiently balance costs of imputation, selection, and prediction to obtain subsets of markers that provide the highest accuracy. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12711-017-0307-4) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5339980
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5339980 2017-03-10 Selecting sequence variants to improve genomic predictions for dairy cattle VanRaden, Paul M. Tooker, Melvin E. O’Connell, Jeffrey R. Cole, John B. Bickhart, Derek M. Genet Sel Evol Research Article BioMed Central 2017-03-07 /pmc/articles/PMC5339980/ /pubmed/28270096 http://dx.doi.org/10.1186/s12711-017-0307-4 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Selecting sequence variants to improve genomic predictions for dairy cattle
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5339980/ https://www.ncbi.nlm.nih.gov/pubmed/28270096 http://dx.doi.org/10.1186/s12711-017-0307-4
