Cargando…

Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm

The Multi-Armed Bandit (MAB) problem has been extensively studied in order to address real-world challenges related to sequential decision making. In this setting, an agent selects the best action to be performed at time-step t, based on the past rewards received by the environment. This formulation...

Descripción completa

Detalles Bibliográficos
Autores principales:	Cavenaghi, Emanuele, Sottocornola, Gabriele, Stella, Fabio, Zanker, Markus
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	MDPI 2021
Materias:	Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8004723/ https://www.ncbi.nlm.nih.gov/pubmed/33807028 http://dx.doi.org/10.3390/e23030380

_version_	1783671968361349120
author	Cavenaghi, Emanuele Sottocornola, Gabriele Stella, Fabio Zanker, Markus
author_facet	Cavenaghi, Emanuele Sottocornola, Gabriele Stella, Fabio Zanker, Markus
author_sort	Cavenaghi, Emanuele
collection	PubMed
description	The Multi-Armed Bandit (MAB) problem has been extensively studied in order to address real-world challenges related to sequential decision making. In this setting, an agent selects the best action to be performed at time-step t, based on the past rewards received by the environment. This formulation implicitly assumes that the expected payoff for each action is kept stationary by the environment through time. Nevertheless, in many real-world applications this assumption does not hold and the agent has to face a non-stationary environment, that is, with a changing reward distribution. Thus, we present a new MAB algorithm, named f-Discounted-Sliding-Window Thompson Sampling (f-dsw TS), for non-stationary environments, that is, when the data streaming is affected by concept drift. The f-dsw TS algorithm is based on Thompson Sampling (TS) and exploits a discount factor on the reward history and an arm-related sliding window to contrast concept drift in non-stationary environments. We investigate how to combine these two sources of information, namely the discount factor and the sliding window, by means of an aggregation function [Formula: see text]. In particular, we proposed a pessimistic ([Formula: see text]), an optimistic ([Formula: see text]), as well as an averaged ([Formula: see text]) version of the f-dsw TS algorithm. A rich set of numerical experiments is performed to evaluate the f-dsw TS algorithm compared to both stationary and non-stationary state-of-the-art TS baselines. We exploited synthetic environments (both randomly-generated and controlled) to test the MAB algorithms under different types of drift, that is, sudden/abrupt, incremental, gradual and increasing/decreasing drift. Furthermore, we adapt four real-world active learning tasks to our framework—a prediction task on crimes in the city of Baltimore, a classification task on insects species, a recommendation task on local web-news, and a time-series analysis on microbial organisms in the tropical air ecosystem. The f-dsw TS approach emerges as the best performing MAB algorithm. At least one of the versions of f-dsw TS performs better than the baselines in synthetic environments, proving the robustness of f-dsw TS under different concept drift types. Moreover, the pessimistic version ([Formula: see text]) results as the most effective in all real-world tasks.
format	Online Article Text
id	pubmed-8004723
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-80047232021-03-29 Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm Cavenaghi, Emanuele Sottocornola, Gabriele Stella, Fabio Zanker, Markus Entropy (Basel) Article The Multi-Armed Bandit (MAB) problem has been extensively studied in order to address real-world challenges related to sequential decision making. In this setting, an agent selects the best action to be performed at time-step t, based on the past rewards received by the environment. This formulation implicitly assumes that the expected payoff for each action is kept stationary by the environment through time. Nevertheless, in many real-world applications this assumption does not hold and the agent has to face a non-stationary environment, that is, with a changing reward distribution. Thus, we present a new MAB algorithm, named f-Discounted-Sliding-Window Thompson Sampling (f-dsw TS), for non-stationary environments, that is, when the data streaming is affected by concept drift. The f-dsw TS algorithm is based on Thompson Sampling (TS) and exploits a discount factor on the reward history and an arm-related sliding window to contrast concept drift in non-stationary environments. We investigate how to combine these two sources of information, namely the discount factor and the sliding window, by means of an aggregation function [Formula: see text]. In particular, we proposed a pessimistic ([Formula: see text]), an optimistic ([Formula: see text]), as well as an averaged ([Formula: see text]) version of the f-dsw TS algorithm. A rich set of numerical experiments is performed to evaluate the f-dsw TS algorithm compared to both stationary and non-stationary state-of-the-art TS baselines. We exploited synthetic environments (both randomly-generated and controlled) to test the MAB algorithms under different types of drift, that is, sudden/abrupt, incremental, gradual and increasing/decreasing drift. Furthermore, we adapt four real-world active learning tasks to our framework—a prediction task on crimes in the city of Baltimore, a classification task on insects species, a recommendation task on local web-news, and a time-series analysis on microbial organisms in the tropical air ecosystem. The f-dsw TS approach emerges as the best performing MAB algorithm. At least one of the versions of f-dsw TS performs better than the baselines in synthetic environments, proving the robustness of f-dsw TS under different concept drift types. Moreover, the pessimistic version ([Formula: see text]) results as the most effective in all real-world tasks. MDPI 2021-03-23 /pmc/articles/PMC8004723/ /pubmed/33807028 http://dx.doi.org/10.3390/e23030380 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) ).
spellingShingle	Article Cavenaghi, Emanuele Sottocornola, Gabriele Stella, Fabio Zanker, Markus Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm
title	Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm
title_full	Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm
title_fullStr	Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm
title_full_unstemmed	Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm
title_short	Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm
title_sort	non stationary multi-armed bandit: empirical evaluation of a new concept drift-aware algorithm
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8004723/ https://www.ncbi.nlm.nih.gov/pubmed/33807028 http://dx.doi.org/10.3390/e23030380
work_keys_str_mv	AT cavenaghiemanuele nonstationarymultiarmedbanditempiricalevaluationofanewconceptdriftawarealgorithm AT sottocornolagabriele nonstationarymultiarmedbanditempiricalevaluationofanewconceptdriftawarealgorithm AT stellafabio nonstationarymultiarmedbanditempiricalevaluationofanewconceptdriftawarealgorithm AT zankermarkus nonstationarymultiarmedbanditempiricalevaluationofanewconceptdriftawarealgorithm

Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm

Ejemplares similares