Temporal-Difference Reinforcement Learning with Distributed Representations

Temporal-difference (TD) algorithms have been proposed as models of reinforcement learning (RL). We examine two issues of distributed representation in these TD algorithms: distributed representations of belief and distributed discounting factors. Distributed representation of belief allows the beli...

Bibliographic Details
Main authors: Kurth-Nelson, Zeb; Redish, A. David
Format: Text
Language: English
Published: Public Library of Science 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2760757/
https://www.ncbi.nlm.nih.gov/pubmed/19841749
http://dx.doi.org/10.1371/journal.pone.0007362
author Kurth-Nelson, Zeb
Redish, A. David
collection PubMed
description Temporal-difference (TD) algorithms have been proposed as models of reinforcement learning (RL). We examine two issues of distributed representation in these TD algorithms: distributed representations of belief and distributed discounting factors. Distributed representation of belief allows the believed state of the world to distribute across sets of equivalent states. Distributed exponential discounting factors produce hyperbolic discounting in the behavior of the agent itself. We examine these issues in the context of a TD RL model in which state-belief is distributed over a set of exponentially-discounting “micro-Agents”, each of which has a separate discounting factor (γ). Each µAgent maintains an independent hypothesis about the state of the world, and a separate value-estimate of taking actions within that hypothesized state. The overall agent thus instantiates a flexible representation of an evolving world-state. As with other TD models, the value-error (δ) signal within the model matches dopamine signals recorded from animals in standard conditioning reward-paradigms. The distributed representation of belief provides an explanation for the decrease in dopamine at the conditioned stimulus seen in overtrained animals, for the differences between trace and delay conditioning, and for transient bursts of dopamine seen at movement initiation. Because each µAgent also includes its own exponential discounting factor, the overall agent shows hyperbolic discounting, consistent with behavioral experiments.
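The claim that distributed exponential discounting yields hyperbolic discounting can be checked numerically. The sketch below is an illustration only, not the authors' implementation: it assumes the discount factors γ are spread uniformly over (0, 1), that each µAgent discounts a reward d steps away by γ^d, and that the overall agent simply averages those values. The function names and the number of µAgents are hypothetical. Under those assumptions the average approaches 1/(1 + d), a hyperbolic curve, because the integral of γ^d over [0, 1] equals 1/(d + 1).

```python
import numpy as np


def averaged_exponential_discount(delay, n_agents=10_000):
    """Mean discounted value of a unit reward across many exponential discounters.

    Assumption (not stated in the record): the discount factors are spaced
    evenly over (0, 1), approximating a uniform distribution of gamma.
    """
    # Midpoints of n_agents equal-width bins covering (0, 1).
    gammas = (np.arange(n_agents) + 0.5) / n_agents
    # Each hypothetical micro-agent values the delayed reward at gamma ** delay.
    return float(np.mean(gammas ** delay))


def hyperbolic_discount(delay):
    """Hyperbolic discount function 1 / (1 + delay)."""
    return 1.0 / (1.0 + delay)


# Compare the population average with the hyperbolic curve at several delays.
for d in [0, 1, 2, 5, 10, 20]:
    avg = averaged_exponential_discount(d)
    hyp = hyperbolic_discount(d)
    print(f"delay={d:2d}  averaged exponential={avg:.5f}  hyperbolic={hyp:.5f}")
```

The two columns should agree closely at every delay. A different assumed distribution of γ would generally give a different aggregate discount curve.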
format Text
id pubmed-2760757
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2760757 2009-10-20 Temporal-Difference Reinforcement Learning with Distributed Representations Kurth-Nelson, Zeb Redish, A. David PLoS One Research Article Public Library of Science 2009-10-20 /pmc/articles/PMC2760757/ /pubmed/19841749 http://dx.doi.org/10.1371/journal.pone.0007362 Text en Kurth-Nelson, Redish.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Temporal-Difference Reinforcement Learning with Distributed Representations
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2760757/
https://www.ncbi.nlm.nih.gov/pubmed/19841749
http://dx.doi.org/10.1371/journal.pone.0007362