Emergence of belief-like representations through reinforcement learning

To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work...

Bibliographic Details
Main Authors:	Hennig, Jay A., Romero Pinto, Sandra A., Yamaguchi, Takahiro, Linderman, Scott W., Uchida, Naoshige, Gershman, Samuel J.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2023
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10513382/ https://www.ncbi.nlm.nih.gov/pubmed/37695776 http://dx.doi.org/10.1371/journal.pcbi.1011067

_version_	1785108557424754688
author	Hennig, Jay A. Romero Pinto, Sandra A. Yamaguchi, Takahiro Linderman, Scott W. Uchida, Naoshige Gershman, Samuel J.
author_facet	Hennig, Jay A. Romero Pinto, Sandra A. Yamaguchi, Takahiro Linderman, Scott W. Uchida, Naoshige Gershman, Samuel J.
author_sort	Hennig, Jay A.
collection	PubMed
description	To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work suggests that animals estimate value in partially observable tasks by first forming “beliefs”—optimal Bayesian estimates of the hidden states in the task. Although this is one way to solve the problem of partial observability, it is not the only way, nor is it the most computationally scalable solution in complex, real-world environments. Here we show that a recurrent neural network (RNN) can learn to estimate value directly from observations, generating reward prediction errors that resemble those observed experimentally, without any explicit objective of estimating beliefs. We integrate statistical, functional, and dynamical systems perspectives on beliefs to show that the RNN’s learned representation encodes belief information, but only when the RNN’s capacity is sufficiently large. These results illustrate how animals can estimate value in tasks without explicitly estimating beliefs, yielding a representation useful for systems with limited capacity.
format	Online Article Text
id	pubmed-10513382
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-10513382 2023-09-22 Emergence of belief-like representations through reinforcement learning Hennig, Jay A. Romero Pinto, Sandra A. Yamaguchi, Takahiro Linderman, Scott W. Uchida, Naoshige Gershman, Samuel J. PLoS Comput Biol Research Article To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work suggests that animals estimate value in partially observable tasks by first forming “beliefs”—optimal Bayesian estimates of the hidden states in the task. Although this is one way to solve the problem of partial observability, it is not the only way, nor is it the most computationally scalable solution in complex, real-world environments. Here we show that a recurrent neural network (RNN) can learn to estimate value directly from observations, generating reward prediction errors that resemble those observed experimentally, without any explicit objective of estimating beliefs. We integrate statistical, functional, and dynamical systems perspectives on beliefs to show that the RNN’s learned representation encodes belief information, but only when the RNN’s capacity is sufficiently large. These results illustrate how animals can estimate value in tasks without explicitly estimating beliefs, yielding a representation useful for systems with limited capacity.
Public Library of Science 2023-09-11 /pmc/articles/PMC10513382/ /pubmed/37695776 http://dx.doi.org/10.1371/journal.pcbi.1011067 Text en © 2023 Hennig et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Hennig, Jay A. Romero Pinto, Sandra A. Yamaguchi, Takahiro Linderman, Scott W. Uchida, Naoshige Gershman, Samuel J. Emergence of belief-like representations through reinforcement learning
title	Emergence of belief-like representations through reinforcement learning
title_full	Emergence of belief-like representations through reinforcement learning
title_fullStr	Emergence of belief-like representations through reinforcement learning
title_full_unstemmed	Emergence of belief-like representations through reinforcement learning
title_short	Emergence of belief-like representations through reinforcement learning
title_sort	emergence of belief-like representations through reinforcement learning
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10513382/ https://www.ncbi.nlm.nih.gov/pubmed/37695776 http://dx.doi.org/10.1371/journal.pcbi.1011067
work_keys_str_mv	AT hennigjaya emergenceofbelieflikerepresentationsthroughreinforcementlearning AT romeropintosandraa emergenceofbelieflikerepresentationsthroughreinforcementlearning AT yamaguchitakahiro emergenceofbelieflikerepresentationsthroughreinforcementlearning AT lindermanscottw emergenceofbelieflikerepresentationsthroughreinforcementlearning AT uchidanaoshige emergenceofbelieflikerepresentationsthroughreinforcementlearning AT gershmansamuelj emergenceofbelieflikerepresentationsthroughreinforcementlearning