Accuracy of Genome-Enabled Prediction in a Dairy Cattle Population using Different Cross-Validation Layouts
The impact of extent of genetic relatedness on accuracy of genome-enabled predictions was assessed using a dairy cattle population and alternative cross-validation (CV) strategies were compared. The CV layouts consisted of training and testing sets obtained from either random allocation of individua...
Main authors: | Pérez-Cabal, M. Angeles; Vazquez, Ana I.; Gianola, Daniel; Rosa, Guilherme J. M.; Weigel, Kent A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Research Foundation, 2012 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3288819/ https://www.ncbi.nlm.nih.gov/pubmed/22403583 http://dx.doi.org/10.3389/fgene.2012.00027 |
_version_ | 1782224828220047360 |
---|---|
author | Pérez-Cabal, M. Angeles Vazquez, Ana I. Gianola, Daniel Rosa, Guilherme J. M. Weigel, Kent A. |
collection | PubMed |
description | The impact of extent of genetic relatedness on accuracy of genome-enabled predictions was assessed using a dairy cattle population and alternative cross-validation (CV) strategies were compared. The CV layouts consisted of training and testing sets obtained from either random allocation of individuals (RAN) or from a kernel-based clustering of individuals using the additive relationship matrix, to obtain two subsets that were as unrelated as possible (UNREL), as well as a layout based on stratification by generation (GEN). The UNREL layout decreased the average genetic relationships between training and testing animals but produced similar accuracies to the RAN design, which were about 15% higher than in the GEN setting. Results indicate that the CV structure can have an important effect on the accuracy of whole-genome predictions. However, the connection between average genetic relationships across training and testing sets and the estimated predictive ability is not straightforward, and may depend also on the kind of relatedness that exists between the two subsets and on the heritability of the trait. For high heritability traits, close relatives such as parents and full-sibs make the greatest contributions to accuracy, which can be compensated by half-sibs or grandsires in the case of lack of close relatives. However, for the low heritability traits the inclusion of close relatives is crucial and including more relatives of various types in the training set tends to lead to greater accuracy. In practice, CV designs should resemble the intended use of the predictive models, e.g., within or between family predictions, or within or across generation predictions, such that estimation of predictive ability is consistent with the actual application to be considered. |
format | Online Article Text |
id | pubmed-3288819 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Frontiers Research Foundation |
record_format | MEDLINE/PubMed |
spelling | pubmed-3288819 2012-03-08 Front Genet Genetics Frontiers Research Foundation 2012-02-28 /pmc/articles/PMC3288819/ /pubmed/22403583 http://dx.doi.org/10.3389/fgene.2012.00027 Text en Copyright © 2012 Pérez-Cabal, Vazquez, Gianola, Rosa and Weigel. http://www.frontiersin.org/licenseagreement This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited. |
title | Accuracy of Genome-Enabled Prediction in a Dairy Cattle Population using Different Cross-Validation Layouts |
topic | Genetics |
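
The three cross-validation layouts named in the description (RAN, UNREL, GEN) can be illustrated with a small sketch. This is not the authors' procedure: the function names, the toy relationship matrix, and the generation labels are hypothetical, the GEN rule (test on the youngest generation) is an assumption, and the kernel-based clustering used for UNREL is replaced here by a plain two-way spectral cut of the relationship matrix.

```python
import numpy as np


def random_split(n, test_fraction, rng):
    """RAN-style layout: allocate individuals to training/testing at random."""
    order = rng.permutation(n)
    n_test = int(round(n * test_fraction))
    return np.sort(order[n_test:]), np.sort(order[:n_test])


def generation_split(generation):
    """GEN-style layout (assumed rule): test on the youngest generation."""
    generation = np.asarray(generation)
    youngest = generation.max()
    return np.flatnonzero(generation < youngest), np.flatnonzero(generation == youngest)


def unrelated_split(A):
    """UNREL-style stand-in: two-way spectral cut of the relationship matrix."""
    W = np.array(A, dtype=float)            # copy, so the caller's matrix is untouched
    np.fill_diagonal(W, 0.0)                # ignore self-relationships
    laplacian = np.diag(W.sum(axis=1)) - W
    _, vectors = np.linalg.eigh(laplacian)  # eigenvalues come back in ascending order
    fiedler = vectors[:, 1]                 # eigenvector of the second-smallest eigenvalue
    group_a = np.flatnonzero(fiedler >= 0)
    group_b = np.flatnonzero(fiedler < 0)
    # The sign of an eigenvector is arbitrary, so let the larger group train.
    return (group_a, group_b) if len(group_a) >= len(group_b) else (group_b, group_a)


def mean_cross_relationship(A, train, test):
    """Average relationship between training and testing individuals."""
    return float(np.asarray(A)[np.ix_(train, test)].mean())


# Toy data (hypothetical): two families of three animals,
# related 0.5 within a family and 0.05 across families.
A = np.full((6, 6), 0.05)
A[:3, :3] = 0.5
A[3:, 3:] = 0.5
np.fill_diagonal(A, 1.0)
generation = [0, 0, 1, 0, 1, 1]

rng = np.random.default_rng(0)
layouts = {
    "RAN": random_split(6, 0.5, rng),
    "UNREL": unrelated_split(A),
    "GEN": generation_split(generation),
}
for name, (train, test) in layouts.items():
    print(name, train, test, round(mean_cross_relationship(A, train, test), 3))
```

Printing the mean training-testing relationship for each layout makes the description's point concrete: the layouts differ in how related the two subsets are, which is one of the factors the record says can affect estimated predictive ability.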