
Using beta binomials to estimate classification uncertainty for ensemble models

BACKGROUND: Quantitative structure-activity (QSAR) models have enormous potential for reducing drug discovery and development costs as well as the need for animal testing. Great strides have been made in estimating their overall reliability, but to fully realize that potential, researchers and regul...


Bibliographic Details
Main Authors:	Clark, Robert D; Liang, Wenkel; Lee, Adam C; Lawless, Michael S; Fraczkiewicz, Robert; Waldman, Marvin
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4076254/ https://www.ncbi.nlm.nih.gov/pubmed/24987464 http://dx.doi.org/10.1186/1758-2946-6-34

collection	PubMed
description	BACKGROUND: Quantitative structure-activity (QSAR) models have enormous potential for reducing drug discovery and development costs as well as the need for animal testing. Great strides have been made in estimating their overall reliability, but to fully realize that potential, researchers and regulators need to know how confident they can be in individual predictions.
RESULTS: Submodels in an ensemble model that have been trained on different subsets of a shared training pool represent multiple samples of the model space, and the degree of agreement among them contains information on the reliability of ensemble predictions. For artificial neural network ensembles (ANNEs) using two different methods for determining ensemble classification (one using vote tallies and the other averaging individual network outputs), we have found that the distribution of predictions across positive vote tallies can be reasonably well modeled as a beta binomial distribution, as can the distribution of errors. Together, these two distributions can be used to estimate the probability that a given predictive classification will be in error. Large data sets comprising logP, Ames mutagenicity, and CYP2D6 inhibition data are used to illustrate and validate the method. The distributions of predictions and errors for the training pool accurately predicted the distribution of predictions and errors for large external validation sets, even when the numbers of positive and negative examples in the training pool were not balanced. Moreover, the likelihood of a given compound being prospectively misclassified as a function of the degree of consensus between networks in the ensemble could in most cases be estimated accurately from the fitted beta binomial distributions for the training pool.
CONCLUSIONS: Confidence in an individual predictive classification by an ensemble model can be accurately assessed by examining the distributions of predictions and errors as a function of the degree of agreement among the constituent submodels. Further, ensemble uncertainty estimation can often be improved by adjusting the voting or classification threshold based on the parameters of the error distribution. Finally, the profiles for models whose predictive uncertainty estimates are not reliable provide clues to that effect without the need for comparison to an external test set.
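The abstract says that the two fitted beta binomial distributions (one for all predictions, one for errors) can be combined to estimate the probability that a prediction is wrong at a given vote tally. The record does not give the formula, so the following is only a minimal sketch of one plausible reading, assuming Bayes' rule is used: P(error | k) = P(error) · P(k | error) / P(k). The function names, the ensemble size of 33 networks, and every parameter value are hypothetical and are not taken from the article.

```python
from math import exp, lgamma


def log_beta(x, y):
    """Natural log of the beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def beta_binomial_pmf(k, n, a, b):
    """Probability of k positive votes out of n under a beta binomial(a, b)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def error_probability(k, n, pred_params, err_params, error_rate):
    """Estimate P(error | k positive votes) via Bayes' rule (an assumption).

    pred_params: (a, b) fitted to the tally distribution of all predictions.
    err_params:  (a, b) fitted to the tally distribution of the errors.
    error_rate:  overall fraction of predictions that were wrong.
    """
    p_tally = beta_binomial_pmf(k, n, *pred_params)
    p_tally_given_error = beta_binomial_pmf(k, n, *err_params)
    if p_tally == 0.0:
        return float("nan")
    # Independently fitted curves can push the ratio above 1, so cap it.
    return min(1.0, error_rate * p_tally_given_error / p_tally)


# Hypothetical ensemble: 33 networks, predictions piled up at the extremes
# (U-shaped), errors concentrated where the networks disagree.
N_NETWORKS = 33
PRED_PARAMS = (0.3, 0.3)   # illustrative values only
ERR_PARAMS = (2.0, 2.0)    # illustrative values only
ERROR_RATE = 0.10          # illustrative value only

for votes in (0, 8, 16, 25, 33):
    p = error_probability(votes, N_NETWORKS, PRED_PARAMS, ERR_PARAMS, ERROR_RATE)
    print(f"{votes:2d} positive votes -> estimated P(error) = {p:.3f}")
```

With parameters of this shape, the estimated error probability is lowest when the networks are nearly unanimous and highest near an even split, which is the qualitative behavior the abstract describes.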
id	pubmed-4076254
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-4076254 2014-07-01 J Cheminform Research Article BioMed Central 2014-06-22 /pmc/articles/PMC4076254/ /pubmed/24987464 http://dx.doi.org/10.1186/1758-2946-6-34 Text en Copyright © 2014 Clark et al.; licensee Chemistry Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Using beta binomials to estimate classification uncertainty for ensemble models
