Cargando…

Adversarial Thresholding Semi-Bandits

The classical multi-armed bandit is one of the most common examples of sequential decision-making, either by trading-off between exploiting and exploring arms to maximise some payoff or purely exploring arms until the optimal arm is identified. In particular, a bandit player wanting to only pull arm...

Descripción completa

Detalles Bibliográficos
Autor principal: Bower, Craig Steven
Lenguaje:eng
Publicado: 2021
Materias:
Acceso en línea:http://cds.cern.ch/record/2790271
_version_ 1780972237771243520
author Bower, Craig Steven
author_facet Bower, Craig Steven
author_sort Bower, Craig Steven
collection CERN
description The classical multi-armed bandit is one of the most common examples of sequential decision-making, either by trading-off between exploiting and exploring arms to maximise some payoff or purely exploring arms until the optimal arm is identified. In particular, a bandit player wanting to only pull arms with stochastic feedback exceeding a given threshold, has been studied extensively in a pure exploration context. However, numerous applications fail to be expressed, where a player wishes to balance the need to observe regions of an uncertain environment that are currently interesting (exploit) and checking if neglected regions have become interesting since last observed (explore). We introduce the adversarial thresholding semi-bandit problem: a non-stochastic bandit model, where a player wants to only pull (potentially several) arms with feedback meeting some threshold condition. Our main objective is to design algorithms that meet the requirements of the adversarial thresholding semi-bandit problem theoretically, empirically and algorithmically, for a given application. In other words, we want to develop a machine that learns to select options according to some threshold condition and adapts quickly if the feedback from selecting an option unexpectedly changes. This work has many real-world applications and is motivated by online detector control monitoring in high energy physics experiments, on the Large Hadron Collider. We begin by describing the adversarial thresholding semi-bandit problem (ATSBP) in terms of a multi-armed bandit with multiple plays and extending the stochastic thresholding bandit problem to the adversarial setting. The adversarial thresholding exponentially-weighted exploration and exploitation with multiple plays algorithm (T- Exp3.M) and an algorithm combining label efficient prediction (LET-Exp3.M), are introduced that satisfy theoretical and computational Research specifications, but either perform poorly or fail completely under certain threshold conditions. To meet empirical performance requirements, we propose the dynamic label efficient adversarial thresholding exponentially-weighted exploration and exploitation with multiple plays algorithm (dLET-Exp3.M). Whilst computational requirements match those for T- Exp3.M, theoretical upper bounds on performance are proven to be worse. We also introduce an ATSBP algorithm (AliceBandit) that decomposes the action of pulling an arm into selection and observation decisions. Computational complexity and empirical performance under two different threshold conditions are significantly improved, com- pared with exponentially-weighted adversarial thresholding semi-bandits. Theoretical upper bounds on performance are also significantly improved, for certain environments. In the latter part of this thesis, we address the challenge of efficiently monitoring multiple condition parameters in high-energy experimental physics. Due to the extreme conditions experienced in heavy-ion particle colliders, the power supply to any device exceeding safe operating parameters is automatically shut down or tripped to preserve integrity and functionality of the device. Prior to recent upgrades, a device or channel trip would halt data-taking for the entire experiment. Post-trip recovery requires a costly procedure both in terms of expertise and data-taking time. After the completion of the current upgrading phase (scheduled for 2021), the detector will collect data continuously. In this new regime, a channel trip will result in only the affected components of the experiment being shut down. However, since the new upgraded experiment will enable data-taking to increase by a factor of 100, each trip will have a significant impact on the experiments ability to provide physicists with reliable data to analyse. We demonstrate that adversarial thresholding semi-bandits efficiently identify device channels either exceeding a fixed threshold or deviating by more than a prescribed range prior to a trip, extending the state-of-the-art in high-energy physics detector control.
id cern-2790271
institution Organización Europea para la Investigación Nuclear
language eng
publishDate 2021
record_format invenio
spelling cern-27902712021-11-11T23:01:58Zhttp://cds.cern.ch/record/2790271engBower, Craig StevenAdversarial Thresholding Semi-BanditsComputing and ComputersDetectors and Experimental TechniquesThe classical multi-armed bandit is one of the most common examples of sequential decision-making, either by trading-off between exploiting and exploring arms to maximise some payoff or purely exploring arms until the optimal arm is identified. In particular, a bandit player wanting to only pull arms with stochastic feedback exceeding a given threshold, has been studied extensively in a pure exploration context. However, numerous applications fail to be expressed, where a player wishes to balance the need to observe regions of an uncertain environment that are currently interesting (exploit) and checking if neglected regions have become interesting since last observed (explore). We introduce the adversarial thresholding semi-bandit problem: a non-stochastic bandit model, where a player wants to only pull (potentially several) arms with feedback meeting some threshold condition. Our main objective is to design algorithms that meet the requirements of the adversarial thresholding semi-bandit problem theoretically, empirically and algorithmically, for a given application. In other words, we want to develop a machine that learns to select options according to some threshold condition and adapts quickly if the feedback from selecting an option unexpectedly changes. This work has many real-world applications and is motivated by online detector control monitoring in high energy physics experiments, on the Large Hadron Collider. We begin by describing the adversarial thresholding semi-bandit problem (ATSBP) in terms of a multi-armed bandit with multiple plays and extending the stochastic thresholding bandit problem to the adversarial setting. The adversarial thresholding exponentially-weighted exploration and exploitation with multiple plays algorithm (T- Exp3.M) and an algorithm combining label efficient prediction (LET-Exp3.M), are introduced that satisfy theoretical and computational Research specifications, but either perform poorly or fail completely under certain threshold conditions. To meet empirical performance requirements, we propose the dynamic label efficient adversarial thresholding exponentially-weighted exploration and exploitation with multiple plays algorithm (dLET-Exp3.M). Whilst computational requirements match those for T- Exp3.M, theoretical upper bounds on performance are proven to be worse. We also introduce an ATSBP algorithm (AliceBandit) that decomposes the action of pulling an arm into selection and observation decisions. Computational complexity and empirical performance under two different threshold conditions are significantly improved, com- pared with exponentially-weighted adversarial thresholding semi-bandits. Theoretical upper bounds on performance are also significantly improved, for certain environments. In the latter part of this thesis, we address the challenge of efficiently monitoring multiple condition parameters in high-energy experimental physics. Due to the extreme conditions experienced in heavy-ion particle colliders, the power supply to any device exceeding safe operating parameters is automatically shut down or tripped to preserve integrity and functionality of the device. Prior to recent upgrades, a device or channel trip would halt data-taking for the entire experiment. Post-trip recovery requires a costly procedure both in terms of expertise and data-taking time. After the completion of the current upgrading phase (scheduled for 2021), the detector will collect data continuously. In this new regime, a channel trip will result in only the affected components of the experiment being shut down. However, since the new upgraded experiment will enable data-taking to increase by a factor of 100, each trip will have a significant impact on the experiments ability to provide physicists with reliable data to analyse. We demonstrate that adversarial thresholding semi-bandits efficiently identify device channels either exceeding a fixed threshold or deviating by more than a prescribed range prior to a trip, extending the state-of-the-art in high-energy physics detector control.CERN-THESIS-2020-373oai:cds.cern.ch:27902712021-11-11T09:21:16Z
spellingShingle Computing and Computers
Detectors and Experimental Techniques
Bower, Craig Steven
Adversarial Thresholding Semi-Bandits
title Adversarial Thresholding Semi-Bandits
title_full Adversarial Thresholding Semi-Bandits
title_fullStr Adversarial Thresholding Semi-Bandits
title_full_unstemmed Adversarial Thresholding Semi-Bandits
title_short Adversarial Thresholding Semi-Bandits
title_sort adversarial thresholding semi-bandits
topic Computing and Computers
Detectors and Experimental Techniques
url http://cds.cern.ch/record/2790271
work_keys_str_mv AT bowercraigsteven adversarialthresholdingsemibandits