
Sequence- vs. chip-assisted genomic selection: accurate biological information is advised

BACKGROUND: The development of next-generation sequencing technologies (NGS) has made the use of whole-genome sequence data for routine genetic evaluations possible, which has triggered a considerable interest in animal and plant breeding fields. Here, we investigated whether complete or partial seq...


Bibliographic Details
Main Authors:	Pérez-Enciso, Miguel; Rincón, Juan C; Legarra, Andrés
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4424891/ https://www.ncbi.nlm.nih.gov/pubmed/25956961 http://dx.doi.org/10.1186/s12711-015-0117-5

_version_	1782370399099551744
author	Pérez-Enciso, Miguel; Rincón, Juan C; Legarra, Andrés
author_facet	Pérez-Enciso, Miguel; Rincón, Juan C; Legarra, Andrés
author_sort	Pérez-Enciso, Miguel
collection	PubMed
description	BACKGROUND: The development of next-generation sequencing technologies (NGS) has made the use of whole-genome sequence data for routine genetic evaluations possible, which has triggered a considerable interest in animal and plant breeding fields. Here, we investigated whether complete or partial sequence data can improve upon existing SNP (single nucleotide polymorphism) array-based selection strategies by simulation using a mixed coalescence - gene-dropping approach. RESULTS: We simulated 20 or 100 causal mutations (quantitative trait nucleotides, QTN) within 65 predefined ‘gene’ regions, each 10 kb long, within a genome composed of ten 3-Mb chromosomes. We compared prediction accuracy by cross-validation using a medium-density chip (7.5 k SNPs), a high-density (HD, 17 k) and sequence data (335 k). Genetic evaluation was based on a GBLUP method. The simulations showed: (1) a law of diminishing returns with increasing number of SNPs; (2) a modest effect of SNP ascertainment bias in arrays; (3) a small advantage of using whole-genome sequence data vs. HD arrays i.e. ~4%; (4) a minor effect of NGS errors except when imputation error rates are high (≥20%); and (5) if QTN were known, prediction accuracy approached 1. Since this is obviously unrealistic, we explored milder assumptions. We showed that, if all SNPs within causal genes were included in the prediction model, accuracy could also dramatically increase by ~40%. However, this criterion was highly sensitive to either misspecification (including wrong genes) or to the use of an incomplete gene list; in these cases, accuracy fell rapidly towards that reached when all SNPs from sequence data were blindly included in the model. CONCLUSIONS: Our study shows that, unless an accurate prior estimate on the functionality of SNPs can be included in the predictor, there is a law of diminishing returns with increasing SNP density. As a result, use of whole-genome sequence data may not result in a highly increased selection response over high-density genotyping.
format	Online Article Text
id	pubmed-4424891
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4424891 2015-05-09 Sequence- vs. chip-assisted genomic selection: accurate biological information is advised Pérez-Enciso, Miguel Rincón, Juan C Legarra, Andrés Genet Sel Evol Research BACKGROUND: The development of next-generation sequencing technologies (NGS) has made the use of whole-genome sequence data for routine genetic evaluations possible, which has triggered a considerable interest in animal and plant breeding fields. Here, we investigated whether complete or partial sequence data can improve upon existing SNP (single nucleotide polymorphism) array-based selection strategies by simulation using a mixed coalescence - gene-dropping approach. RESULTS: We simulated 20 or 100 causal mutations (quantitative trait nucleotides, QTN) within 65 predefined ‘gene’ regions, each 10 kb long, within a genome composed of ten 3-Mb chromosomes. We compared prediction accuracy by cross-validation using a medium-density chip (7.5 k SNPs), a high-density (HD, 17 k) and sequence data (335 k). Genetic evaluation was based on a GBLUP method. The simulations showed: (1) a law of diminishing returns with increasing number of SNPs; (2) a modest effect of SNP ascertainment bias in arrays; (3) a small advantage of using whole-genome sequence data vs. HD arrays i.e. ~4%; (4) a minor effect of NGS errors except when imputation error rates are high (≥20%); and (5) if QTN were known, prediction accuracy approached 1. Since this is obviously unrealistic, we explored milder assumptions. We showed that, if all SNPs within causal genes were included in the prediction model, accuracy could also dramatically increase by ~40%. However, this criterion was highly sensitive to either misspecification (including wrong genes) or to the use of an incomplete gene list; in these cases, accuracy fell rapidly towards that reached when all SNPs from sequence data were blindly included in the model. CONCLUSIONS: Our study shows that, unless an accurate prior estimate on the functionality of SNPs can be included in the predictor, there is a law of diminishing returns with increasing SNP density. As a result, use of whole-genome sequence data may not result in a highly increased selection response over high-density genotyping. BioMed Central 2015-05-09 /pmc/articles/PMC4424891/ /pubmed/25956961 http://dx.doi.org/10.1186/s12711-015-0117-5 Text en © Perez-Enciso et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Pérez-Enciso, Miguel Rincón, Juan C Legarra, Andrés Sequence- vs. chip-assisted genomic selection: accurate biological information is advised
title	Sequence- vs. chip-assisted genomic selection: accurate biological information is advised
title_full	Sequence- vs. chip-assisted genomic selection: accurate biological information is advised
title_fullStr	Sequence- vs. chip-assisted genomic selection: accurate biological information is advised
title_full_unstemmed	Sequence- vs. chip-assisted genomic selection: accurate biological information is advised
title_short	Sequence- vs. chip-assisted genomic selection: accurate biological information is advised
title_sort	sequence- vs. chip-assisted genomic selection: accurate biological information is advised
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4424891/ https://www.ncbi.nlm.nih.gov/pubmed/25956961 http://dx.doi.org/10.1186/s12711-015-0117-5
work_keys_str_mv	AT perezencisomiguel sequencevschipassistedgenomicselectionaccuratebiologicalinformationisadvised AT rinconjuanc sequencevschipassistedgenomicselectionaccuratebiologicalinformationisadvised AT legarraandres sequencevschipassistedgenomicselectionaccuratebiologicalinformationisadvised
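
The abstract in this record says that genetic evaluation was based on a GBLUP method and that prediction accuracy depends strongly on which SNPs enter the model. The following is a minimal sketch of that idea, not the authors' pipeline. It assumes independent SNPs drawn from a binomial distribution (no coalescent simulation, no gene dropping, no linkage disequilibrium), a VanRaden-style genomic relationship matrix, a known heritability of 0.5, and a sample-mean estimate of the intercept. All sizes and the names `vanraden_grm`, `gblup_predict` and `accuracy` are illustrative choices that do not come from the record.

```python
import numpy as np

def vanraden_grm(genotypes):
    """Genomic relationship matrix from an (individuals x SNPs) 0/1/2 matrix."""
    p = genotypes.mean(axis=0) / 2.0           # observed allele frequencies
    keep = (p > 0.0) & (p < 1.0)               # drop monomorphic SNPs
    z = genotypes[:, keep] - 2.0 * p[keep]     # centre each SNP column
    return z @ z.T / (2.0 * np.sum(p[keep] * (1.0 - p[keep])))

def gblup_predict(G, y, train, test, h2):
    """Predict genetic values of `test` individuals from `train` phenotypes."""
    lam = (1.0 - h2) / h2                      # residual-to-genetic variance ratio
    mu = y[train].mean()                       # simplification: intercept = sample mean
    lhs = G[np.ix_(train, train)] + lam * np.eye(len(train))
    alpha = np.linalg.solve(lhs, y[train] - mu)
    return G[np.ix_(test, train)] @ alpha

# Toy data (assumed sizes, far smaller than the 7.5 k / 17 k / 335 k in the abstract).
rng = np.random.default_rng(0)
n, m, n_qtn, h2 = 400, 2000, 20, 0.5
freqs = rng.uniform(0.05, 0.5, size=m)
X = rng.binomial(2, freqs, size=(n, m)).astype(float)

qtn = rng.choice(m, size=n_qtn, replace=False)  # causal SNPs
g = X[:, qtn] @ rng.normal(size=n_qtn)          # true genetic values
g = (g - g.mean()) / g.std()                    # genetic variance scaled to 1
y = g + rng.normal(size=n)                      # residual variance 1, so h2 = 0.5

train, test = np.arange(300), np.arange(300, n)

def accuracy(snp_idx):
    """Correlation between predicted and true genetic values in the test set."""
    G = vanraden_grm(X[:, snp_idx])
    pred = gblup_predict(G, y, train, test, h2)
    return np.corrcoef(pred, g[test])[0, 1]

acc_chip = accuracy(rng.choice(m, size=500, replace=False))  # random SNP subset
acc_all = accuracy(np.arange(m))                             # every SNP, used blindly
acc_qtn = accuracy(qtn)                                      # causal SNPs only

print(f"chip subset: {acc_chip:.2f}  all SNPs: {acc_all:.2f}  QTN only: {acc_qtn:.2f}")
```

Because the toy SNPs are independent, the gap between the all-SNP and QTN-only models is exaggerated relative to a genome with realistic linkage disequilibrium; the sketch only illustrates the direction of the effect described in the abstract.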


Similar Items