An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits
In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value of information criterion. This criterion measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space and yield high rewards. Low amounts of policy information favor the exploitation of existing knowledge. Information, in this criterion, is quantified by a parameter that can be varied during search. We demonstrate that a simulated-annealing-like update of this parameter, with a sufficiently fast cooling schedule, leads to a regret that is logarithmic with respect to the number of arm pulls.
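The abstract describes an exploration parameter that is annealed during search, with a sufficiently fast cooling schedule yielding logarithmic regret. As a rough illustrative sketch only — not the authors' value-of-information algorithm, whose criterion involves a policy-information constraint — a soft-max (Boltzmann) bandit whose inverse temperature grows logarithmically with the pull count shows the same explore-then-exploit behavior. The function name, the schedule form, and the constants below are assumptions for illustration.

```python
import math
import random

def annealed_softmax_bandit(arm_means, num_pulls, beta0=1.0, seed=0):
    """Illustrative soft-max bandit with an annealed exploration parameter.

    The inverse temperature beta grows logarithmically with the pull count t
    (a stand-in for a 'sufficiently fast cooling schedule'), so early pulls
    explore broadly and later pulls concentrate on the empirically best arm.
    This is a sketch, not the paper's value-of-information update.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k          # pulls per arm
    estimates = [0.0] * k     # running mean reward per arm
    total_reward = 0.0

    for t in range(1, num_pulls + 1):
        beta = beta0 * math.log(t + 1)  # assumed logarithmic cooling schedule
        # Boltzmann action probabilities over the current reward estimates.
        weights = [math.exp(beta * q) for q in estimates]
        z = sum(weights)
        # Sample an arm proportionally to its weight.
        r, arm = rng.random() * z, 0
        for arm, w in enumerate(weights):
            r -= w
            if r <= 0:
                break
        # Bernoulli reward draw for the chosen arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, total_reward
```

With a growing beta, pull counts shift toward the arm with the highest estimated mean, mirroring the shift from exploration-dominant search to exploitation of existing knowledge described in the abstract.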
Main Authors: | Sledge, Isaac J.; Príncipe, José C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2018 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7512671/ https://www.ncbi.nlm.nih.gov/pubmed/33265246 http://dx.doi.org/10.3390/e20030155 |
_version_ | 1783586211888103424 |
author | Sledge, Isaac J. Príncipe, José C. |
author_facet | Sledge, Isaac J. Príncipe, José C. |
author_sort | Sledge, Isaac J. |
collection | PubMed |
description | In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value of information criterion. This criterion measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space and yield high rewards. Low amounts of policy information favor the exploitation of existing knowledge. Information, in this criterion, is quantified by a parameter that can be varied during search. We demonstrate that a simulated-annealing-like update of this parameter, with a sufficiently fast cooling schedule, leads to a regret that is logarithmic with respect to the number of arm pulls. |
format | Online Article Text |
id | pubmed-7512671 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7512671 2020-11-09 An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits Sledge, Isaac J. Príncipe, José C. Entropy (Basel) Article In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value of information criterion. This criterion measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space and yield high rewards. Low amounts of policy information favor the exploitation of existing knowledge. Information, in this criterion, is quantified by a parameter that can be varied during search. We demonstrate that a simulated-annealing-like update of this parameter, with a sufficiently fast cooling schedule, leads to a regret that is logarithmic with respect to the number of arm pulls. MDPI 2018-02-28 /pmc/articles/PMC7512671/ /pubmed/33265246 http://dx.doi.org/10.3390/e20030155 Text en © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Sledge, Isaac J. Príncipe, José C. An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits |
title | An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits |
title_full | An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits |
title_fullStr | An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits |
title_full_unstemmed | An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits |
title_short | An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits |
title_sort | analysis of the value of information when exploring stochastic, discrete multi-armed bandits |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7512671/ https://www.ncbi.nlm.nih.gov/pubmed/33265246 http://dx.doi.org/10.3390/e20030155 |
work_keys_str_mv | AT sledgeisaacj ananalysisofthevalueofinformationwhenexploringstochasticdiscretemultiarmedbandits AT principejosec ananalysisofthevalueofinformationwhenexploringstochasticdiscretemultiarmedbandits AT sledgeisaacj analysisofthevalueofinformationwhenexploringstochasticdiscretemultiarmedbandits AT principejosec analysisofthevalueofinformationwhenexploringstochasticdiscretemultiarmedbandits |