Quantifying Reinforcement-Learning Agent’s Autonomy, Reliance on Memory and Internalisation of the Environment
Intuitively, the level of autonomy of an agent is related to the degree to which the agent’s goals and behaviour are decoupled from the immediate control by the environment. Here, we capitalise on a recent information-theoretic formulation of autonomy and introduce an algorithm for calculating auton...
Main authors: | Ingel, Anti; Makkeh, Abdullah; Corcoll, Oriol; Vicente, Raul |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8947692/ https://www.ncbi.nlm.nih.gov/pubmed/35327912 http://dx.doi.org/10.3390/e24030401 |
_version_ | 1784674499927474176 |
---|---|
author | Ingel, Anti; Makkeh, Abdullah; Corcoll, Oriol; Vicente, Raul |
author_facet | Ingel, Anti; Makkeh, Abdullah; Corcoll, Oriol; Vicente, Raul |
author_sort | Ingel, Anti |
collection | PubMed |
description | Intuitively, the level of autonomy of an agent is related to the degree to which the agent’s goals and behaviour are decoupled from the immediate control by the environment. Here, we capitalise on a recent information-theoretic formulation of autonomy and introduce an algorithm for calculating autonomy in a limiting process of time step approaching infinity. We tackle the question of how the autonomy level of an agent changes during training. In particular, in this work, we use the partial information decomposition (PID) framework to monitor the levels of autonomy and environment internalisation of reinforcement-learning (RL) agents. We performed experiments on two environments: a grid world, in which the agent has to collect food, and a repeating-pattern environment, in which the agent has to learn to imitate a sequence of actions by memorising the sequence. PID also allows us to answer how much the agent relies on its internal memory (versus how much it relies on the observations) when transitioning to its next internal state. The experiments show that specific terms of PID strongly correlate with the obtained reward and with the agent’s behaviour against perturbations in the observations. |
format | Online Article Text |
id | pubmed-8947692 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8947692 2022-03-25 Quantifying Reinforcement-Learning Agent’s Autonomy, Reliance on Memory and Internalisation of the Environment Ingel, Anti Makkeh, Abdullah Corcoll, Oriol Vicente, Raul Entropy (Basel) Article Intuitively, the level of autonomy of an agent is related to the degree to which the agent’s goals and behaviour are decoupled from the immediate control by the environment. Here, we capitalise on a recent information-theoretic formulation of autonomy and introduce an algorithm for calculating autonomy in a limiting process of time step approaching infinity. We tackle the question of how the autonomy level of an agent changes during training. In particular, in this work, we use the partial information decomposition (PID) framework to monitor the levels of autonomy and environment internalisation of reinforcement-learning (RL) agents. We performed experiments on two environments: a grid world, in which the agent has to collect food, and a repeating-pattern environment, in which the agent has to learn to imitate a sequence of actions by memorising the sequence. PID also allows us to answer how much the agent relies on its internal memory (versus how much it relies on the observations) when transitioning to its next internal state. The experiments show that specific terms of PID strongly correlate with the obtained reward and with the agent’s behaviour against perturbations in the observations. MDPI 2022-03-13 /pmc/articles/PMC8947692/ /pubmed/35327912 http://dx.doi.org/10.3390/e24030401 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Ingel, Anti Makkeh, Abdullah Corcoll, Oriol Vicente, Raul Quantifying Reinforcement-Learning Agent’s Autonomy, Reliance on Memory and Internalisation of the Environment |
title | Quantifying Reinforcement-Learning Agent’s Autonomy, Reliance on Memory and Internalisation of the Environment |
title_full | Quantifying Reinforcement-Learning Agent’s Autonomy, Reliance on Memory and Internalisation of the Environment |
title_fullStr | Quantifying Reinforcement-Learning Agent’s Autonomy, Reliance on Memory and Internalisation of the Environment |
title_full_unstemmed | Quantifying Reinforcement-Learning Agent’s Autonomy, Reliance on Memory and Internalisation of the Environment |
title_short | Quantifying Reinforcement-Learning Agent’s Autonomy, Reliance on Memory and Internalisation of the Environment |
title_sort | quantifying reinforcement-learning agent’s autonomy, reliance on memory and internalisation of the environment |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8947692/ https://www.ncbi.nlm.nih.gov/pubmed/35327912 http://dx.doi.org/10.3390/e24030401 |
work_keys_str_mv | AT ingelanti quantifyingreinforcementlearningagentsautonomyrelianceonmemoryandinternalisationoftheenvironment AT makkehabdullah quantifyingreinforcementlearningagentsautonomyrelianceonmemoryandinternalisationoftheenvironment AT corcolloriol quantifyingreinforcementlearningagentsautonomyrelianceonmemoryandinternalisationoftheenvironment AT vicenteraul quantifyingreinforcementlearningagentsautonomyrelianceonmemoryandinternalisationoftheenvironment |