Quantifying Reinforcement-Learning Agent’s Autonomy, Reliance on Memory and Internalisation of the Environment

Bibliographic Details
Main authors: Ingel, Anti; Makkeh, Abdullah; Corcoll, Oriol; Vicente, Raul
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8947692/
https://www.ncbi.nlm.nih.gov/pubmed/35327912
http://dx.doi.org/10.3390/e24030401
_version_ 1784674499927474176
author Ingel, Anti
Makkeh, Abdullah
Corcoll, Oriol
Vicente, Raul
author_facet Ingel, Anti
Makkeh, Abdullah
Corcoll, Oriol
Vicente, Raul
author_sort Ingel, Anti
collection PubMed
description Intuitively, the level of autonomy of an agent is related to the degree to which the agent’s goals and behaviour are decoupled from the immediate control by the environment. Here, we capitalise on a recent information-theoretic formulation of autonomy and introduce an algorithm for calculating autonomy in a limiting process of time step approaching infinity. We tackle the question of how the autonomy level of an agent changes during training. In particular, in this work, we use the partial information decomposition (PID) framework to monitor the levels of autonomy and environment internalisation of reinforcement-learning (RL) agents. We performed experiments on two environments: a grid world, in which the agent has to collect food, and a repeating-pattern environment, in which the agent has to learn to imitate a sequence of actions by memorising the sequence. PID also allows us to answer how much the agent relies on its internal memory (versus how much it relies on the observations) when transitioning to its next internal state. The experiments show that specific terms of PID strongly correlate with the obtained reward and with the agent’s behaviour against perturbations in the observations.
format Online
Article
Text
id pubmed-8947692
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-89476922022-03-25 Quantifying Reinforcement-Learning Agent’s Autonomy, Reliance on Memory and Internalisation of the Environment Ingel, Anti Makkeh, Abdullah Corcoll, Oriol Vicente, Raul Entropy (Basel) Article Intuitively, the level of autonomy of an agent is related to the degree to which the agent’s goals and behaviour are decoupled from the immediate control by the environment. Here, we capitalise on a recent information-theoretic formulation of autonomy and introduce an algorithm for calculating autonomy in a limiting process of time step approaching infinity. We tackle the question of how the autonomy level of an agent changes during training. In particular, in this work, we use the partial information decomposition (PID) framework to monitor the levels of autonomy and environment internalisation of reinforcement-learning (RL) agents. We performed experiments on two environments: a grid world, in which the agent has to collect food, and a repeating-pattern environment, in which the agent has to learn to imitate a sequence of actions by memorising the sequence. PID also allows us to answer how much the agent relies on its internal memory (versus how much it relies on the observations) when transitioning to its next internal state. The experiments show that specific terms of PID strongly correlate with the obtained reward and with the agent’s behaviour against perturbations in the observations. MDPI 2022-03-13 /pmc/articles/PMC8947692/ /pubmed/35327912 http://dx.doi.org/10.3390/e24030401 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Ingel, Anti
Makkeh, Abdullah
Corcoll, Oriol
Vicente, Raul
Quantifying Reinforcement-Learning Agent’s Autonomy, Reliance on Memory and Internalisation of the Environment
title Quantifying Reinforcement-Learning Agent’s Autonomy, Reliance on Memory and Internalisation of the Environment
title_full Quantifying Reinforcement-Learning Agent’s Autonomy, Reliance on Memory and Internalisation of the Environment
title_fullStr Quantifying Reinforcement-Learning Agent’s Autonomy, Reliance on Memory and Internalisation of the Environment
title_full_unstemmed Quantifying Reinforcement-Learning Agent’s Autonomy, Reliance on Memory and Internalisation of the Environment
title_short Quantifying Reinforcement-Learning Agent’s Autonomy, Reliance on Memory and Internalisation of the Environment
title_sort quantifying reinforcement-learning agent’s autonomy, reliance on memory and internalisation of the environment
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8947692/
https://www.ncbi.nlm.nih.gov/pubmed/35327912
http://dx.doi.org/10.3390/e24030401
work_keys_str_mv AT ingelanti quantifyingreinforcementlearningagentsautonomyrelianceonmemoryandinternalisationoftheenvironment
AT makkehabdullah quantifyingreinforcementlearningagentsautonomyrelianceonmemoryandinternalisationoftheenvironment
AT corcolloriol quantifyingreinforcementlearningagentsautonomyrelianceonmemoryandinternalisationoftheenvironment
AT vicenteraul quantifyingreinforcementlearningagentsautonomyrelianceonmemoryandinternalisationoftheenvironment