On the Use of the Pearson Correlation Coefficient for Model Evaluation in Genome-Wide Prediction

Bibliographic Details
Main author:	Waldmann, Patrik
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2019
Subjects:	Genetics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6781837/ https://www.ncbi.nlm.nih.gov/pubmed/31632436 http://dx.doi.org/10.3389/fgene.2019.00899

Collection:	PubMed
Description:	The large number of markers in genome-wide prediction demands the use of methods with regularization and model comparison based on some hold-out test prediction error measure. In quantitative genetics, it is common practice to calculate the Pearson correlation coefficient (r²) as a standardized measure of the predictive accuracy of a model. Based on arguments from the bias–variance trade-off theory in statistical learning, we show that shrinkage of the regression coefficients (i.e., QTL effects) reduces the prediction mean squared error (MSE) by introducing model bias compared with the ordinary least squares method. We also show that the LASSO and the adaptive LASSO (ALASSO) can reduce the model bias and prediction MSE by adding model variance. In an application of ridge regression, the LASSO and ALASSO to a simulated example based on results for 9,723 SNPs and 3,226 individuals, the best model selected was with the LASSO when r² was used as a measure. However, when model selection was based on test MSE and coefficient of determination R², the ALASSO proved to be the best method. Hence, use of r² may lead to selection of the wrong model and therefore also nonoptimal ranking of phenotype predictions and genomic breeding values. Instead, we propose use of the test MSE for model selection and R² as a standardized measure of the accuracy.
Record ID:	pubmed-6781837
Institution:	National Center for Biotechnology Information
Record format:	MEDLINE/PubMed
Source:	Front Genet (Frontiers Media S.A.), 2019-09-26
Rights:	Copyright © 2019 Waldmann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), http://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
