On the Upper Bounds of the Real-Valued Predictions

Bibliographic Details
Main authors:	Benevenuta, Silvia; Fariselli, Piero
Format:	Online Article Text
Language:	English
Published:	SAGE Publications 2019
Subjects:	Commentary
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6710671/ https://www.ncbi.nlm.nih.gov/pubmed/31488948 http://dx.doi.org/10.1177/1177932219871263

collection	PubMed
description	Predictions are fundamental in science, as they allow us to test and falsify theories. Predictions are ubiquitous in bioinformatics and are also useful when no first principles are available. Predictions can be divided into classification (when a label is associated with a given input) and regression (when a real value is assigned). Different scores are used to assess the performance of regression predictors; the most widely adopted include the mean square error, the Pearson correlation (ρ), and the coefficient of determination (or R²). The common conception regarding the last two indices is that their theoretical upper bound is 1; however, their upper bounds depend both on the experimental uncertainty and on the distribution of the target variables. A narrow distribution of the target variable may induce a low upper bound. Knowledge of the theoretical upper bounds also has two practical applications: (1) comparing different predictors tested on different data sets may lead to wrong rankings, and (2) performances higher than the theoretical upper bounds indicate overtraining and improper usage of the learning data sets. Here, we derive the upper bound for the coefficient of determination, showing that it is lower than that of the square of the Pearson correlation. We provide analytical equations for both indices that can be used to evaluate the upper bound of the predictions when the experimental uncertainty and the target distribution are available. Our considerations are general and applicable to all regression predictors.
id	pubmed-6710671
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Bioinform Biol Insights
publication date	2019-08-23
rights	© The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
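The abstract's central claim, that experimental noise and the spread of the targets cap both ρ and R² below 1 and that the R² ceiling lies below ρ², can be checked numerically. The sketch below is only an illustration under an assumed noise model, which may differ from the authors' own derivation: true values are Gaussian with variance v, every experimental value carries independent Gaussian noise of variance σ², and the best attainable predictor is assumed to behave like an independent replicate measurement (the true value plus its own noise of the same size). Under that assumption the ceilings are ρ_max = v/(v + σ²) and R²_max = (v − σ²)/(v + σ²), which is always below ρ_max² when σ² > 0. The helper names (pearson, r_squared, analytic_bounds, simulate_bounds) and all parameter values are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def r_squared(pred, target):
    """Coefficient of determination 1 - SS_res/SS_tot, with no refitting of pred."""
    n = len(target)
    mt = sum(target) / n
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, target))
    ss_tot = sum((t - mt) ** 2 for t in target)
    return 1.0 - ss_res / ss_tot


def analytic_bounds(var_true, var_noise):
    """Closed-form ceilings under the ASSUMED replicate-noise model."""
    rho = var_true / (var_true + var_noise)
    r2 = (var_true - var_noise) / (var_true + var_noise)
    return rho, r2


def simulate_bounds(sd_true, sd_noise, n=100_000, seed=0):
    """Monte Carlo estimate of rho and R^2 for the assumed best predictor."""
    rng = random.Random(seed)
    truth = [rng.gauss(0.0, sd_true) for _ in range(n)]
    # Experimental target: true value plus measurement noise.
    target = [x + rng.gauss(0.0, sd_noise) for x in truth]
    # Assumption: the best predictor acts like an independent replicate
    # measurement carrying the same noise level as the experiment.
    pred = [x + rng.gauss(0.0, sd_noise) for x in truth]
    return pearson(pred, target), r_squared(pred, target)


if __name__ == "__main__":
    sd_noise = 1.0  # hypothetical experimental uncertainty
    for sd_true in (3.0, 1.5):  # wide vs narrow target distribution
        rho_a, r2_a = analytic_bounds(sd_true ** 2, sd_noise ** 2)
        rho_s, r2_s = simulate_bounds(sd_true, sd_noise)
        print(f"sd_true={sd_true}: rho_max={rho_a:.3f} (sim {rho_s:.3f}), "
              f"rho_max^2={rho_a ** 2:.3f}, R2_max={r2_a:.3f} (sim {r2_s:.3f})")
```

With σ = 1, a wide target distribution (standard deviation 3) gives ρ_max = 0.9, ρ_max² = 0.81 and R²_max = 0.8, while a narrow one (standard deviation 1.5) lowers these to about 0.692, 0.479 and 0.385, matching the abstract's point that a narrow target distribution induces a low upper bound. The simulation uses only the standard library so it runs without NumPy; with NumPy available, numpy.corrcoef could replace the hand-written correlation.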
