Accuracy of Genome-Enabled Prediction in a Dairy Cattle Population using Different Cross-Validation Layouts

Bibliographic Details
Main authors: Pérez-Cabal, M. Angeles; Vazquez, Ana I.; Gianola, Daniel; Rosa, Guilherme J. M.; Weigel, Kent A.
Format: Online Article Text
Language: English
Published: Frontiers Research Foundation, 2012
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3288819/
https://www.ncbi.nlm.nih.gov/pubmed/22403583
http://dx.doi.org/10.3389/fgene.2012.00027
Description: The impact of the extent of genetic relatedness on the accuracy of genome-enabled predictions was assessed in a dairy cattle population, and alternative cross-validation (CV) strategies were compared. The CV layouts consisted of training and testing sets obtained either by random allocation of individuals (RAN), by kernel-based clustering of individuals using the additive relationship matrix to obtain two subsets that were as unrelated as possible (UNREL), or by stratification by generation (GEN). The UNREL layout decreased the average genetic relationship between training and testing animals but produced accuracies similar to those of the RAN design; both were about 15% higher than in the GEN setting. Results indicate that the CV structure can have an important effect on the accuracy of whole-genome predictions. However, the connection between the average genetic relationship across training and testing sets and the estimated predictive ability is not straightforward, and may also depend on the kind of relatedness that exists between the two subsets and on the heritability of the trait. For high-heritability traits, close relatives such as parents and full-sibs make the greatest contributions to accuracy, but when close relatives are lacking, half-sibs or grandsires can compensate. For low-heritability traits, however, the inclusion of close relatives is crucial, and including more relatives of various types in the training set tends to lead to greater accuracy. In practice, CV designs should resemble the intended use of the predictive models (e.g., within- or between-family predictions, or within- or across-generation predictions), so that the estimated predictive ability is consistent with the actual application.
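A minimal sketch, assuming NumPy and a small hypothetical pedigree, of how the three layouts could be built and compared by the average additive relationship between training and testing animals. The function names, the toy pedigree, and the use of spectral bisection of the relationship graph as a stand-in for the UNREL kernel-based clustering are all illustrative assumptions, not the authors' procedure; the additive relationship matrix is computed with the standard tabular method.

```python
import numpy as np


def additive_relationship(sires, dams):
    """Additive relationship matrix A by the tabular method.

    Animals are indexed 0..n-1, parents must precede their offspring,
    and an unknown parent is coded as -1.
    """
    n = len(sires)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sires[i], dams[i]
        # Diagonal: 1 + F_i, where F_i is half the relationship between the parents
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
        for j in range(i):
            # Off-diagonal: half the sum of j's relationships with i's known parents
            a_js = A[j, s] if s >= 0 else 0.0
            a_jd = A[j, d] if d >= 0 else 0.0
            A[i, j] = A[j, i] = 0.5 * (a_js + a_jd)
    return A


def mean_cross_relationship(A, train, test):
    """Average additive relationship between training and testing animals."""
    return float(A[np.ix_(train, test)].mean())


def random_split(n, n_test, rng):
    """RAN-style layout: random allocation of n_test animals to the testing set."""
    idx = rng.permutation(n)
    return np.sort(idx[n_test:]), np.sort(idx[:n_test])


def generation_split(generation):
    """GEN-style layout: older generations train, the youngest one is tested."""
    generation = np.asarray(generation)
    youngest = generation.max()
    return np.flatnonzero(generation < youngest), np.flatnonzero(generation == youngest)


def unrelated_split(A):
    """Hypothetical UNREL stand-in: spectral bisection of the pedigree graph.

    Off-diagonal entries of A act as edge weights; ordering animals by the
    Fiedler vector of the graph Laplacian and cutting in the middle yields two
    equal-sized groups with few strong relationships between them.
    """
    W = A.copy()
    np.fill_diagonal(W, 0.0)
    laplacian = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    order = np.argsort(vecs[:, 1])       # sort animals by the Fiedler vector
    half = len(order) // 2
    return np.sort(order[:half]), np.sort(order[half:])


if __name__ == "__main__":
    # Hypothetical toy pedigree: 4 founders, 4 offspring, 4 grand-offspring
    sires = [-1, -1, -1, -1, 0, 0, 0, 2, 4, 5, 6, 7]
    dams = [-1, -1, -1, -1, 1, 1, 3, 3, -1, -1, -1, -1]
    generation = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
    A = additive_relationship(sires, dams)
    layouts = {
        "RAN": random_split(len(sires), 4, np.random.default_rng(1)),
        "UNREL": unrelated_split(A),
        "GEN": generation_split(generation),
    }
    for name, (train, test) in layouts.items():
        print(name, round(mean_cross_relationship(A, train, test), 4))
```

In this sketch the average cross relationship only summarizes how related the two sets are; as the description notes, it does not by itself predict accuracy, which also depends on the type of relatives shared and on heritability.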
Journal: Front Genet (Genetics)
Publication date: 2012-02-28
Rights: Copyright © 2012 Pérez-Cabal, Vazquez, Gianola, Rosa and Weigel. This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License (http://www.frontiersin.org/licenseagreement), which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.