An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits
In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value of information criterion. This criterion measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space and yield high rewards. Low amounts of policy information favor the exploitation of existing knowledge. Information, in this criterion, is quantified by a parameter that can be varied during search. We demonstrate that a simulated-annealing-like update of this parameter, with a sufficiently fast cooling schedule, leads to a regret that is logarithmic with respect to the number of arm pulls.
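The abstract describes an exploration parameter that is annealed during search, with a sufficiently fast cooling schedule yielding logarithmic regret. As a rough illustrative sketch only — not the authors' value-of-information algorithm, whose criterion involves a policy-information constraint — a soft-max (Boltzmann) bandit whose inverse temperature grows logarithmically with the pull count shows the same explore-then-exploit behavior. The function name, the schedule form, and the constants below are assumptions for illustration.

```python
import math
import random

def annealed_softmax_bandit(arm_means, num_pulls, beta0=1.0, seed=0):
    """Illustrative soft-max bandit with an annealed exploration parameter.

    The inverse temperature beta grows logarithmically with the pull count t
    (a stand-in for a 'sufficiently fast cooling schedule'), so early pulls
    explore broadly and later pulls concentrate on the empirically best arm.
    This is a sketch, not the paper's value-of-information update.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k          # pulls per arm
    estimates = [0.0] * k     # running mean reward per arm
    total_reward = 0.0

    for t in range(1, num_pulls + 1):
        beta = beta0 * math.log(t + 1)  # assumed logarithmic cooling schedule
        # Boltzmann action probabilities over the current reward estimates.
        weights = [math.exp(beta * q) for q in estimates]
        z = sum(weights)
        # Sample an arm proportionally to its weight.
        r, arm = rng.random() * z, 0
        for arm, w in enumerate(weights):
            r -= w
            if r <= 0:
                break
        # Bernoulli reward draw for the chosen arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, total_reward
```

With a growing beta, pull counts shift toward the arm with the highest estimated mean, mirroring the shift from exploration-dominant search to exploitation of existing knowledge described in the abstract.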
Main Authors: | Sledge, Isaac J.; Príncipe, José C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2018 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7512671/ https://www.ncbi.nlm.nih.gov/pubmed/33265246 http://dx.doi.org/10.3390/e20030155 |
_version_ | 1783586211888103424 |
author | Sledge, Isaac J. Príncipe, José C. |
author_facet | Sledge, Isaac J. Príncipe, José C. |
author_sort | Sledge, Isaac J. |
collection | PubMed |
description | In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value of information criterion. This criterion measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space and yield high rewards. Low amounts of policy information favor the exploitation of existing knowledge. Information, in this criterion, is quantified by a parameter that can be varied during search. We demonstrate that a simulated-annealing-like update of this parameter, with a sufficiently fast cooling schedule, leads to a regret that is logarithmic with respect to the number of arm pulls. |
format | Online Article Text |
id | pubmed-7512671 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7512671 2020-11-09 An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits Sledge, Isaac J. Príncipe, José C. Entropy (Basel) Article In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value of information criterion. This criterion measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space and yield high rewards. Low amounts of policy information favor the exploitation of existing knowledge. Information, in this criterion, is quantified by a parameter that can be varied during search. We demonstrate that a simulated-annealing-like update of this parameter, with a sufficiently fast cooling schedule, leads to a regret that is logarithmic with respect to the number of arm pulls. MDPI 2018-02-28 /pmc/articles/PMC7512671/ /pubmed/33265246 http://dx.doi.org/10.3390/e20030155 Text en © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Sledge, Isaac J. Príncipe, José C. An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits |
title | An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits |
title_full | An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits |
title_fullStr | An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits |
title_full_unstemmed | An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits |
title_short | An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits |
title_sort | analysis of the value of information when exploring stochastic, discrete multi-armed bandits |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7512671/ https://www.ncbi.nlm.nih.gov/pubmed/33265246 http://dx.doi.org/10.3390/e20030155 |
work_keys_str_mv | AT sledgeisaacj ananalysisofthevalueofinformationwhenexploringstochasticdiscretemultiarmedbandits AT principejosec ananalysisofthevalueofinformationwhenexploringstochasticdiscretemultiarmedbandits AT sledgeisaacj analysisofthevalueofinformationwhenexploringstochasticdiscretemultiarmedbandits AT principejosec analysisofthevalueofinformationwhenexploringstochasticdiscretemultiarmedbandits |