‘Proactive’ use of cue-context congruence for building reinforcement learning’s reward function
BACKGROUND: Reinforcement learning is a fundamental form of learning that may be formalized using the Bellman equation. Accordingly, an agent determines the state value as the sum of the immediate reward and the discounted value of future states. Thus the value of a state is determined by agent-related...
Main Authors: | Zsuga, Judit; Biro, Klara; Tajti, Gabor; Szilasi, Magdolna Emma; Papp, Csaba; Juhasz, Bela; Gesztelyi, Rudolf |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5086043/ https://www.ncbi.nlm.nih.gov/pubmed/27793098 http://dx.doi.org/10.1186/s12868-016-0302-7 |
_version_ | 1782463671356620800 |
---|---|
author | Zsuga, Judit; Biro, Klara; Tajti, Gabor; Szilasi, Magdolna Emma; Papp, Csaba; Juhasz, Bela; Gesztelyi, Rudolf |
author_facet | Zsuga, Judit; Biro, Klara; Tajti, Gabor; Szilasi, Magdolna Emma; Papp, Csaba; Juhasz, Bela; Gesztelyi, Rudolf |
author_sort | Zsuga, Judit |
collection | PubMed |
description | BACKGROUND: Reinforcement learning is a fundamental form of learning that may be formalized using the Bellman equation. Accordingly, an agent determines the state value as the sum of the immediate reward and the discounted value of future states. Thus the value of a state is determined by agent-related attributes (action set, policy, discount factor) and by the agent’s knowledge of the environment, embodied in the reward function and in hidden environmental factors given by the transition probability. The central objective of reinforcement learning is to estimate these two functions, which lie outside the agent’s control, either with or without a model. RESULTS: In the present paper, using the proactive model of reinforcement learning, we offer insight into how the brain creates simplified representations of the environment, and how these representations are organized to support the identification of relevant stimuli and actions. Furthermore, we identify neurobiological correlates of our model by suggesting that the reward and policy functions, attributes of the Bellman equation, are built by the orbitofrontal cortex (OFC) and the anterior cingulate cortex (ACC), respectively. CONCLUSIONS: Based on this, we propose that the OFC assesses cue-context congruence to activate the most relevant context frame. Furthermore, given the bidirectional neuroanatomical link between the OFC and model-free structures, we suggest that model-based input is incorporated into the reward prediction error (RPE) signal, and conversely, that the RPE signal may be used to update the reward-related information of context frames and the policy underlying action selection in the OFC and ACC, respectively. Furthermore, clinical implications for cognitive behavioral interventions are discussed. |
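The abstract's account of the Bellman equation (a state's value is the immediate reward plus the discounted value of future states) and of the RPE teaching signal can be made concrete with a short sketch. This is a minimal illustration, not code from the article; the 3-state chain MDP, its rewards, and the discount factor below are hypothetical choices.

```python
# Illustrative sketch: value iteration on a toy deterministic 3-state chain
# MDP, plus the temporal-difference reward prediction error (RPE).

GAMMA = 0.9  # discount factor (an agent-related attribute)

rewards = [0.0, 0.0, 1.0]   # immediate reward received on entering each state
transitions = {0: 1, 1: 2}  # deterministic next-state map; state 2 is terminal

def value_iteration(sweeps=50):
    """Repeated Bellman backup: V(s) = R(s') + gamma * V(s')."""
    v = [0.0, 0.0, 0.0]
    for _ in range(sweeps):
        for s, s_next in transitions.items():
            v[s] = rewards[s_next] + GAMMA * v[s_next]
    return v

def rpe(v, s, s_next, r):
    """Reward prediction error: delta = r + gamma * V(s') - V(s)."""
    return r + GAMMA * v[s_next] - v[s]
```

Once the values have converged, the RPE for an experienced transition is zero; a nonzero delta is exactly the teaching signal that the abstract attributes to model-free structures and proposes is enriched by model-based OFC input.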
format | Online Article Text |
id | pubmed-5086043 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5086043 2016-10-31 ‘Proactive’ use of cue-context congruence for building reinforcement learning’s reward function Zsuga, Judit; Biro, Klara; Tajti, Gabor; Szilasi, Magdolna Emma; Papp, Csaba; Juhasz, Bela; Gesztelyi, Rudolf BMC Neurosci Research Article BioMed Central 2016-10-28 /pmc/articles/PMC5086043/ /pubmed/27793098 http://dx.doi.org/10.1186/s12868-016-0302-7 Text en © The Author(s) 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Zsuga, Judit; Biro, Klara; Tajti, Gabor; Szilasi, Magdolna Emma; Papp, Csaba; Juhasz, Bela; Gesztelyi, Rudolf ‘Proactive’ use of cue-context congruence for building reinforcement learning’s reward function |
title | ‘Proactive’ use of cue-context congruence for building reinforcement learning’s reward function |
title_full | ‘Proactive’ use of cue-context congruence for building reinforcement learning’s reward function |
title_fullStr | ‘Proactive’ use of cue-context congruence for building reinforcement learning’s reward function |
title_full_unstemmed | ‘Proactive’ use of cue-context congruence for building reinforcement learning’s reward function |
title_short | ‘Proactive’ use of cue-context congruence for building reinforcement learning’s reward function |
title_sort | ‘proactive’ use of cue-context congruence for building reinforcement learning’s reward function |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5086043/ https://www.ncbi.nlm.nih.gov/pubmed/27793098 http://dx.doi.org/10.1186/s12868-016-0302-7 |
work_keys_str_mv | AT zsugajudit proactiveuseofcuecontextcongruenceforbuildingreinforcementlearningsrewardfunction AT biroklara proactiveuseofcuecontextcongruenceforbuildingreinforcementlearningsrewardfunction AT tajtigabor proactiveuseofcuecontextcongruenceforbuildingreinforcementlearningsrewardfunction AT szilasimagdolnaemma proactiveuseofcuecontextcongruenceforbuildingreinforcementlearningsrewardfunction AT pappcsaba proactiveuseofcuecontextcongruenceforbuildingreinforcementlearningsrewardfunction AT juhaszbela proactiveuseofcuecontextcongruenceforbuildingreinforcementlearningsrewardfunction AT gesztelyirudolf proactiveuseofcuecontextcongruenceforbuildingreinforcementlearningsrewardfunction |