Accuracy of Genome-Enabled Prediction in a Dairy Cattle Population using Different Cross-Validation Layouts

The impact of extent of genetic relatedness on accuracy of genome-enabled predictions was assessed using a dairy cattle population and alternative cross-validation (CV) strategies were compared. The CV layouts consisted of training and testing sets obtained from either random allocation of individua...

Bibliographic Details
Main Authors:	Pérez-Cabal, M. Angeles; Vazquez, Ana I.; Gianola, Daniel; Rosa, Guilherme J. M.; Weigel, Kent A.
Format:	Online Article Text
Language:	English
Published:	Frontiers Research Foundation 2012
Subjects:	Genetics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3288819/ https://www.ncbi.nlm.nih.gov/pubmed/22403583 http://dx.doi.org/10.3389/fgene.2012.00027

author	Pérez-Cabal, M. Angeles; Vazquez, Ana I.; Gianola, Daniel; Rosa, Guilherme J. M.; Weigel, Kent A.
collection	PubMed
description	The impact of extent of genetic relatedness on accuracy of genome-enabled predictions was assessed using a dairy cattle population and alternative cross-validation (CV) strategies were compared. The CV layouts consisted of training and testing sets obtained from either random allocation of individuals (RAN) or from a kernel-based clustering of individuals using the additive relationship matrix, to obtain two subsets that were as unrelated as possible (UNREL), as well as a layout based on stratification by generation (GEN). The UNREL layout decreased the average genetic relationships between training and testing animals but produced similar accuracies to the RAN design, which were about 15% higher than in the GEN setting. Results indicate that the CV structure can have an important effect on the accuracy of whole-genome predictions. However, the connection between average genetic relationships across training and testing sets and the estimated predictive ability is not straightforward, and may depend also on the kind of relatedness that exists between the two subsets and on the heritability of the trait. For high heritability traits, close relatives such as parents and full-sibs make the greatest contributions to accuracy, which can be compensated by half-sibs or grandsires in the case of lack of close relatives. However, for the low heritability traits the inclusion of close relatives is crucial and including more relatives of various types in the training set tends to lead to greater accuracy. In practice, CV designs should resemble the intended use of the predictive models, e.g., within or between family predictions, or within or across generation predictions, such that estimation of predictive ability is consistent with the actual application to be considered.
format	Online Article Text
id	pubmed-3288819
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	Frontiers Research Foundation
record_format	MEDLINE/PubMed
spelling	pubmed-3288819 2012-03-08 Front Genet Frontiers Research Foundation 2012-02-28 /pmc/articles/PMC3288819/ /pubmed/22403583 http://dx.doi.org/10.3389/fgene.2012.00027 Text en Copyright © 2012 Pérez-Cabal, Vazquez, Gianola, Rosa and Weigel. http://www.frontiersin.org/licenseagreement This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.
title	Accuracy of Genome-Enabled Prediction in a Dairy Cattle Population using Different Cross-Validation Layouts
topic	Genetics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3288819/ https://www.ncbi.nlm.nih.gov/pubmed/22403583 http://dx.doi.org/10.3389/fgene.2012.00027
