
An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits

In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value of information criterion. This criterion measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space and yield high rewards. Low amounts of policy information favor the exploitation of existing knowledge. Information, in this criterion, is quantified by a parameter that can be varied during search. We demonstrate that a simulated-annealing-like update of this parameter, with a sufficiently fast cooling schedule, leads to a regret that is logarithmic with respect to the number of arm pulls.


Bibliographic Details
Main Authors: Sledge, Isaac J., Príncipe, José C.
Format: Online Article Text
Language: English
Published: MDPI 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7512671/
https://www.ncbi.nlm.nih.gov/pubmed/33265246
http://dx.doi.org/10.3390/e20030155
_version_ 1783586211888103424
author Sledge, Isaac J.
Príncipe, José C.
author_facet Sledge, Isaac J.
Príncipe, José C.
author_sort Sledge, Isaac J.
collection PubMed
description In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value of information criterion. This criterion measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space and yield high rewards. Low amounts of policy information favor the exploitation of existing knowledge. Information, in this criterion, is quantified by a parameter that can be varied during search. We demonstrate that a simulated-annealing-like update of this parameter, with a sufficiently fast cooling schedule, leads to a regret that is logarithmic with respect to the number of arm pulls.
format Online
Article
Text
id pubmed-7512671
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7512671 2020-11-09 An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits Sledge, Isaac J. Príncipe, José C. Entropy (Basel) Article In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value of information criterion. This criterion measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space and yield high rewards. Low amounts of policy information favor the exploitation of existing knowledge. Information, in this criterion, is quantified by a parameter that can be varied during search. We demonstrate that a simulated-annealing-like update of this parameter, with a sufficiently fast cooling schedule, leads to a regret that is logarithmic with respect to the number of arm pulls. MDPI 2018-02-28 /pmc/articles/PMC7512671/ /pubmed/33265246 http://dx.doi.org/10.3390/e20030155 Text en © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Sledge, Isaac J.
Príncipe, José C.
An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits
title An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits
title_full An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits
title_fullStr An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits
title_full_unstemmed An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits
title_short An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits
title_sort analysis of the value of information when exploring stochastic, discrete multi-armed bandits
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7512671/
https://www.ncbi.nlm.nih.gov/pubmed/33265246
http://dx.doi.org/10.3390/e20030155
work_keys_str_mv AT sledgeisaacj ananalysisofthevalueofinformationwhenexploringstochasticdiscretemultiarmedbandits
AT principejosec ananalysisofthevalueofinformationwhenexploringstochasticdiscretemultiarmedbandits
AT sledgeisaacj analysisofthevalueofinformationwhenexploringstochasticdiscretemultiarmedbandits
AT principejosec analysisofthevalueofinformationwhenexploringstochasticdiscretemultiarmedbandits