Predictive representations can link model-based reinforcement learning to model-free mechanisms
Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown...
Main Authors: | Russek, Evan M.; Momennejad, Ida; Botvinick, Matthew M.; Gershman, Samuel J.; Daw, Nathaniel D. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2017 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5628940/ https://www.ncbi.nlm.nih.gov/pubmed/28945743 http://dx.doi.org/10.1371/journal.pcbi.1005768 |
_version_ | 1783268969775366144 |
---|---|
author | Russek, Evan M. Momennejad, Ida Botvinick, Matthew M. Gershman, Samuel J. Daw, Nathaniel D. |
author_facet | Russek, Evan M. Momennejad, Ida Botvinick, Matthew M. Gershman, Samuel J. Daw, Nathaniel D. |
author_sort | Russek, Evan M. |
collection | PubMed |
description | Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation. |
format | Online Article Text |
id | pubmed-5628940 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-56289402017-10-20 Predictive representations can link model-based reinforcement learning to model-free mechanisms Russek, Evan M. Momennejad, Ida Botvinick, Matthew M. Gershman, Samuel J. Daw, Nathaniel D. PLoS Comput Biol Research Article Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation. 
Public Library of Science 2017-09-25 /pmc/articles/PMC5628940/ /pubmed/28945743 http://dx.doi.org/10.1371/journal.pcbi.1005768 Text en © 2017 Russek et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Russek, Evan M. Momennejad, Ida Botvinick, Matthew M. Gershman, Samuel J. Daw, Nathaniel D. Predictive representations can link model-based reinforcement learning to model-free mechanisms |
title | Predictive representations can link model-based reinforcement learning to model-free mechanisms |
title_full | Predictive representations can link model-based reinforcement learning to model-free mechanisms |
title_fullStr | Predictive representations can link model-based reinforcement learning to model-free mechanisms |
title_full_unstemmed | Predictive representations can link model-based reinforcement learning to model-free mechanisms |
title_short | Predictive representations can link model-based reinforcement learning to model-free mechanisms |
title_sort | predictive representations can link model-based reinforcement learning to model-free mechanisms |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5628940/ https://www.ncbi.nlm.nih.gov/pubmed/28945743 http://dx.doi.org/10.1371/journal.pcbi.1005768 |
work_keys_str_mv | AT russekevanm predictiverepresentationscanlinkmodelbasedreinforcementlearningtomodelfreemechanisms AT momennejadida predictiverepresentationscanlinkmodelbasedreinforcementlearningtomodelfreemechanisms AT botvinickmatthewm predictiverepresentationscanlinkmodelbasedreinforcementlearningtomodelfreemechanisms AT gershmansamuelj predictiverepresentationscanlinkmodelbasedreinforcementlearningtomodelfreemechanisms AT dawnathanield predictiverepresentationscanlinkmodelbasedreinforcementlearningtomodelfreemechanisms |
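The abstract above describes evaluating actions by combining the successor representation (SR) with TD learning: a TD-style update learns a matrix of expected discounted future state occupancies, and values are then read out by weighting that matrix with one-step rewards. A minimal sketch of this idea, under assumed parameters and a toy deterministic chain environment that are illustrative only and not taken from the paper:

```python
import numpy as np

# Hypothetical sketch: learn the successor representation M with a TD-style
# update in a small deterministic chain s0 -> s1 -> ... -> s4 (terminal),
# then read out state values as V = M @ w, where w holds one-step rewards.
# All parameter choices here are illustrative assumptions.

n_states = 5
gamma = 0.9           # discount factor
alpha = 0.1           # learning rate

M = np.zeros((n_states, n_states))   # successor representation
w = np.zeros(n_states)               # one-step reward weights
w[-1] = 1.0                          # reward only at the final state

for _ in range(2000):                # repeated traversals of the chain
    s = 0
    while s < n_states - 1:
        s_next = s + 1
        # TD update on the SR: target is one-hot(s) + gamma * M[s_next]
        onehot = np.eye(n_states)[s]
        M[s] += alpha * (onehot + gamma * M[s_next] - M[s])
        s = s_next
    # terminal state predicts only its own occupancy
    M[-1] += alpha * (np.eye(n_states)[-1] - M[-1])

# Value readout: expected discounted future reward from each state.
V = M @ w
```

This decomposition is what gives the SR its intermediate character: the TD machinery of model-free learning does the heavy lifting, yet changing the reward weights `w` immediately revalues all states without relearning `M`, capturing some behaviors usually attributed to model-based planning.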