
Predictive representations can link model-based reinforcement learning to model-free mechanisms

Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation.
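
The central construct named in the abstract is the successor representation (SR). As a rough illustration only (this sketch is not taken from the article; the chain task, state count, and parameter values below are arbitrary assumptions), the following Python fragment shows the standard SR idea the abstract refers to: a matrix M of discounted expected future state visits is learned with a TD(0)-style update, a reward vector w is learned with a delta rule, and state values are read out as V(s) = M[s] . w, with no decision-time dynamic programming.

import numpy as np

# Minimal SR sketch (illustrative assumptions, not code from the article).
n_states = 5          # small deterministic chain 0 -> 1 -> 2 -> 3 -> 4
gamma = 0.95          # discount factor
alpha = 0.1           # learning rate

M = np.eye(n_states)      # SR initialised to "I will visit where I am now"
w = np.zeros(n_states)    # per-state reward estimates

def one_hot(s):
    v = np.zeros(n_states)
    v[s] = 1.0
    return v

def sr_td_update(s, r, s_next):
    """TD(0)-style updates for the SR and reward weights after one transition."""
    # The successor features of s should predict an immediate visit to s
    # plus the discounted successor features of the next state.
    M[s] += alpha * (one_hot(s) + gamma * M[s_next] - M[s])
    # Simple delta rule on the reward observed at the next state.
    w[s_next] += alpha * (r - w[s_next])

def value(s):
    """State value under the current policy: V(s) = M[s] . w"""
    return M[s] @ w

# Learn along the chain, with reward only at the terminal state.
for _ in range(200):
    for s in range(n_states - 1):
        s_next = s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        sr_td_update(s, r, s_next)

print([round(float(value(s)), 2) for s in range(n_states)])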


Bibliographic Details
Main Authors: Russek, Evan M., Momennejad, Ida, Botvinick, Matthew M., Gershman, Samuel J., Daw, Nathaniel D.
Journal: PLoS Comput Biol
Format: Online Article Text
Language: English
Published: Public Library of Science, 2017-09-25
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5628940/
https://www.ncbi.nlm.nih.gov/pubmed/28945743
http://dx.doi.org/10.1371/journal.pcbi.1005768
Collection: PubMed
Record ID: pubmed-5628940
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
License: © 2017 Russek et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.