DPRESS: Localizing estimates of predictive uncertainty

BACKGROUND: The need to have a quantitative estimate of the uncertainty of prediction for QSAR models is steadily increasing, in part because such predictions are being widely distributed as tabulated values disconnected from the models used to generate them. Classical statistical theory assumes tha...

Bibliographic Details
Main author:	Clark, Robert D
Format:	Online Article Text
Language:	English
Published:	Springer 2009
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3225832/ https://www.ncbi.nlm.nih.gov/pubmed/20298517 http://dx.doi.org/10.1186/1758-2946-1-11

_version_	1782217533346021376
author	Clark, Robert D
author_facet	Clark, Robert D
author_sort	Clark, Robert D
collection	PubMed
description	BACKGROUND: The need to have a quantitative estimate of the uncertainty of prediction for QSAR models is steadily increasing, in part because such predictions are being widely distributed as tabulated values disconnected from the models used to generate them. Classical statistical theory assumes that the error in the population being modeled is independent and identically distributed (IID), but this is often not actually the case. Such inhomogeneous error (heteroskedasticity) can be addressed by providing an individualized estimate of predictive uncertainty for each particular new object u: the standard error of prediction s(u) can be estimated as the non-cross-validated error s(t*) for the closest object t* in the training set adjusted for its separation d from u in the descriptor space relative to the size of the training set. [Image: see text] The predictive uncertainty factor γ(t*) is obtained by distributing the internal predictive error sum of squares across objects in the training set based on the distances between them, hence the acronym: Distributed PRedictive Error Sum of Squares (DPRESS). Note that s(t*) and γ(t*) are characteristic of each training set compound contributing to the model of interest. RESULTS: The method was applied to partial least-squares models built using 2D (molecular hologram) or 3D (molecular field) descriptors applied to mid-sized training sets (N = 75) drawn from a large (N = 304), well-characterized pool of cyclooxygenase inhibitors. The observed variation in predictive error for the external 229-compound test sets was compared with the uncertainty estimates from DPRESS. Good qualitative and quantitative agreement was seen between the distributions of predictive error observed and those predicted using DPRESS. Inclusion of the distance-dependent term was essential to getting good agreement between the estimated uncertainties and the observed distributions of predictive error. The uncertainty estimates derived by DPRESS were conservative even when the training set was biased, but not excessively so. CONCLUSION: DPRESS is a straightforward and powerful way to reliably estimate individual predictive uncertainties for compounds outside the training set based on their distance to the training set and the internal predictive uncertainty associated with their nearest neighbor in that set. It represents a sample-based, a posteriori approach to defining applicability domains in terms of localized uncertainty.
format	Online Article Text
id	pubmed-3225832
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	Springer
record_format	MEDLINE/PubMed
spelling	pubmed-3225832 2011-11-30 DPRESS: Localizing estimates of predictive uncertainty Clark, Robert D J Cheminform Research Article Springer 2009-07-14 /pmc/articles/PMC3225832/ /pubmed/20298517 http://dx.doi.org/10.1186/1758-2946-1-11 Text en Copyright © 2009 Clark; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Clark, Robert D DPRESS: Localizing estimates of predictive uncertainty
title	DPRESS: Localizing estimates of predictive uncertainty
title_full	DPRESS: Localizing estimates of predictive uncertainty
title_fullStr	DPRESS: Localizing estimates of predictive uncertainty
title_full_unstemmed	DPRESS: Localizing estimates of predictive uncertainty
title_short	DPRESS: Localizing estimates of predictive uncertainty
title_sort	dpress: localizing estimates of predictive uncertainty
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3225832/ https://www.ncbi.nlm.nih.gov/pubmed/20298517 http://dx.doi.org/10.1186/1758-2946-1-11
work_keys_str_mv	AT clarkrobertd dpresslocalizingestimatesofpredictiveuncertainty
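
The nearest-neighbor step described in the abstract can be sketched roughly as follows. This is only an illustration, not the article's method: the actual DPRESS formula appears in this record only as "[Image: see text]", so the distance adjustment is passed in as a function, and the default `placeholder_adjust` is a made-up form chosen purely so the example runs. The Euclidean metric and all function and parameter names are likewise assumptions.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance in descriptor space (the metric is an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def placeholder_adjust(s_t: float, gamma_t: float, d: float, n_train: int) -> float:
    """HYPOTHETICAL adjustment, NOT the formula from the article.

    It inflates s(t*) as the distance d grows, scaled by gamma(t*) and the
    training set size, only to make the sketch executable.
    """
    return s_t * math.sqrt(1.0 + gamma_t * d * d / n_train)


def localized_uncertainty(
    u: Vector,
    train_x: Sequence[Vector],
    train_s: Sequence[float],
    train_gamma: Sequence[float],
    adjust: Callable[[float, float, float, int], float] = placeholder_adjust,
) -> Tuple[int, float, float]:
    """Return (index of t*, distance d, estimated s(u)) for a new object u."""
    if not train_x:
        raise ValueError("training set must not be empty")
    # Find the closest training object t* and its separation d from u.
    distances = [euclidean(u, t) for t in train_x]
    nearest = min(range(len(train_x)), key=distances.__getitem__)
    d = distances[nearest]
    # Adjust the error of t* for the separation and the training set size.
    s_u = adjust(train_s[nearest], train_gamma[nearest], d, len(train_x))
    return nearest, d, s_u


# Toy data: two training objects with per-object s(t) and gamma(t) values.
train_x = [(0.0, 0.0), (3.0, 4.0)]
train_s = [0.5, 0.8]
train_gamma = [2.0, 1.0]
index, d, s_u = localized_uncertainty((3.0, 1.0), train_x, train_s, train_gamma)
print(index, d, s_u)
```

Passing the adjustment in as a callable keeps the neighbor lookup separate from the uncertainty formula, so the published expression can be substituted once it has been taken from the full text.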

Similar Items