Semi-parametric estimates of population accuracy and bias of predictions of breeding values and future phenotypes using the LR method

BACKGROUND: Cross-validation tools are used increasingly to validate and compare genetic evaluation methods but analytical properties of cross-validation methods are rarely described. There is also a lack of cross-validation tools for complex problems such as prediction of indirect effects (e.g. mat...

Bibliographic Details
Main Authors:	Legarra, Andres; Reverter, Antonio
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6219059/ https://www.ncbi.nlm.nih.gov/pubmed/30400768 http://dx.doi.org/10.1186/s12711-018-0426-6

collection	PubMed
description	BACKGROUND: Cross-validation tools are used increasingly to validate and compare genetic evaluation methods but analytical properties of cross-validation methods are rarely described. There is also a lack of cross-validation tools for complex problems such as prediction of indirect effects (e.g. maternal effects) or for breeding schemes with small progeny group sizes.
RESULTS: We derive the expected value of several quadratic forms by comparing genetic evaluations including “partial” and “whole” data. We propose statistics that compare genetic evaluations including “partial” and “whole” data based on differences in means, covariance, and correlation, and term the use of these statistics “method LR” (from linear regression). Contrary to common belief, the regression of true on estimated breeding values is (on expectation) lower than 1 for small or related validation sets, due to family structures. For validation sets that are sufficiently large, we show that these statistics yield estimators of bias, slope or dispersion, and population accuracy for estimated breeding values. Similar results hold for prediction of future phenotypes although we show that estimates of bias, slope or dispersion using prediction of future phenotypes are sensitive to incorrect heritabilities or precorrection for fixed effects. We present an example for a set of 2111 Brahman beef cattle for which, in repeated partitioning of the data into training and validation sets, there is very good agreement of statistics of method LR with prediction of future phenotypes.
CONCLUSIONS: Analytical properties of cross-validation measures are presented. We present a new method named LR for cross-validation that is automatic, easy to use, and which yields the quantities of interest. The method compares predictions based on partial and whole data, which results in estimates of accuracy and biases. Prediction of observed records may yield biased results due to precorrection or use of incorrect heritabilities.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12711-018-0426-6) contains supplementary material, which is available to authorized users.
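The abstract describes statistics that compare evaluations from "partial" and "whole" data through differences in means, covariance, and correlation. The following is a minimal Python sketch of how such comparisons might be computed for the same set of validation animals. The function name `lr_statistics`, its arguments, the sign convention for the bias, and the choice to regress whole on partial are assumptions made for illustration; the record does not give the exact estimators, so this should not be read as the authors' implementation.

```python
import numpy as np


def lr_statistics(ebv_partial, ebv_whole):
    """Compare estimated breeding values from partial and whole data.

    Hypothetical helper: both inputs hold one value per validation animal,
    in the same animal order.
    """
    p = np.asarray(ebv_partial, dtype=float)
    w = np.asarray(ebv_whole, dtype=float)
    if p.ndim != 1 or p.shape != w.shape or p.size < 2:
        raise ValueError("inputs must be 1-D, equal length, with at least 2 animals")

    # Difference in means (assumed convention: partial minus whole).
    bias = p.mean() - w.mean()

    # Slope of the regression of whole on partial: cov(w, p) / var(p).
    cov_wp = np.cov(w, p, ddof=1)[0, 1]
    slope = cov_wp / p.var(ddof=1)

    # Pearson correlation between the two sets of predictions.
    correlation = np.corrcoef(w, p)[0, 1]

    return {"bias": bias, "slope": slope, "correlation": correlation}


if __name__ == "__main__":
    # Simulated values only, not data from the article.
    rng = np.random.default_rng(0)
    whole = rng.normal(size=200)
    partial = 0.8 * whole + rng.normal(scale=0.3, size=200)
    print(lr_statistics(partial, whole))
```

Under these assumptions, a bias near 0 and a slope near 1 would suggest that the partial-data predictions are neither shifted nor over- or under-dispersed relative to the whole-data predictions.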
id	pubmed-6219059
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Genet Sel Evol
date	2018-11-06
rights	© The Author(s) 2018. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.


Similar Items