Reactive Reinforcement Learning in Asynchronous Environments

The relationship between a reinforcement learning (RL) agent and an asynchronous environment is often ignored. Frequently used models of the interaction between an agent and its environment, such as Markov Decision Processes (MDP) or Semi-Markov Decision Processes (SMDP), do not capture the fact that, in an asynchronous environment, the state of the environment may change during computation performed by the agent. In an asynchronous environment, minimizing reaction time—the time it takes for an agent to react to an observation—also minimizes the time in which the state of the environment may change following observation. In many environments, the reaction time of an agent directly impacts task performance by permitting the environment to transition into either an undesirable terminal state or a state where performing the chosen action is inappropriate. We propose a class of reactive reinforcement learning algorithms that address this problem of asynchronous environments by immediately acting after observing new state information. We compare a reactive SARSA learning algorithm with the conventional SARSA learning algorithm on two asynchronous robotic tasks (emergency stopping and impact prevention), and show that the reactive RL algorithm reduces the reaction time of the agent by approximately the duration of the algorithm's learning update. This new class of reactive algorithms may facilitate safer control and faster decision making without any change to standard learning guarantees.
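The contrast drawn in the abstract is a reordering of the agent's per-step loop: conventional SARSA completes its learning update before emitting the next action, while reactive SARSA emits the action as soon as the new observation arrives and runs the same update afterwards. A minimal Python sketch of that reordering follows; the ToyEnv class, its reset/act/sense interface, and all names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class ToyEnv:
    """Illustrative stand-in for an asynchronous environment: a 5-state
    chain where action 1 advances and action 0 stays; reward 1.0 in the
    final state. A real robot keeps changing state while the agent
    computes; this toy only mimics the interface."""
    n_states, n_actions = 5, 2

    def reset(self):
        self.s = 0
        self._pending = 0
        return self.s

    def act(self, a):
        # On a robot this would start executing the action immediately.
        self._pending = a

    def sense(self):
        # Apply the pending action, then report (reward, new state).
        if self._pending == 1:
            self.s = min(self.s + 1, self.n_states - 1)
        r = 1.0 if self.s == self.n_states - 1 else 0.0
        return r, self.s

def epsilon_greedy(q, s, eps=0.1):
    if np.random.rand() < eps:
        return np.random.randint(q.shape[1])
    return int(np.argmax(q[s]))

def conventional_sarsa(env, q, alpha=0.1, gamma=0.9, steps=100):
    s = env.reset()
    a = epsilon_greedy(q, s)
    env.act(a)
    for _ in range(steps):
        r, s_next = env.sense()
        a_next = epsilon_greedy(q, s_next)
        # The update runs BEFORE the action is emitted, so its computation
        # time is added to the agent's reaction time.
        q[s, a] += alpha * (r + gamma * q[s_next, a_next] - q[s, a])
        env.act(a_next)
        s, a = s_next, a_next

def reactive_sarsa(env, q, alpha=0.1, gamma=0.9, steps=100):
    s = env.reset()
    a = epsilon_greedy(q, s)
    env.act(a)
    for _ in range(steps):
        r, s_next = env.sense()
        a_next = epsilon_greedy(q, s_next)
        env.act(a_next)  # act as soon as the new state is observed
        # The identical update now runs AFTER acting, while the action is
        # already under way in the asynchronous environment.
        q[s, a] += alpha * (r + gamma * q[s_next, a_next] - q[s, a])
        s, a = s_next, a_next

q = np.zeros((ToyEnv.n_states, ToyEnv.n_actions))
reactive_sarsa(ToyEnv(), q)
```

Both loops perform the identical SARSA update; the only difference is when env.act is called relative to it, which corresponds to the roughly one-update-length reduction in reaction time the abstract reports.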


Bibliographic Details
Main Authors: Travnik, Jaden B., Mathewson, Kory W., Sutton, Richard S., Pilarski, Patrick M.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2018
Subjects: Robotics and AI
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7805616/
https://www.ncbi.nlm.nih.gov/pubmed/33500958
http://dx.doi.org/10.3389/frobt.2018.00079
Published in: Front Robot AI (Robotics and AI), Frontiers Media S.A., 2018-06-26.

Copyright © 2018 Travnik, Mathewson, Sutton and Pilarski. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY, http://creativecommons.org/licenses/by/4.0/). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.