Dopaminergic Balance between Reward Maximization and Policy Complexity
Previous reinforcement-learning models of the basal ganglia network have highlighted the role of dopamine in encoding the mismatch between prediction and reality. Far less attention has been paid to the computational goals and algorithms of the main-axis (actor). Here, we construct a top-down model...
Main Authors: | Parush, Naama; Tishby, Naftali; Bergman, Hagai |
---|---|
Format: | Text |
Language: | English |
Published: | Frontiers Research Foundation, 2011 |
Subjects: | Neuroscience |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3093748/ https://www.ncbi.nlm.nih.gov/pubmed/21603228 http://dx.doi.org/10.3389/fnsys.2011.00022 |
Field | Value
---|---|
_version_ | 1782203491729539072
author | Parush, Naama; Tishby, Naftali; Bergman, Hagai
author_facet | Parush, Naama; Tishby, Naftali; Bergman, Hagai
author_sort | Parush, Naama |
collection | PubMed |
description | Previous reinforcement-learning models of the basal ganglia network have highlighted the role of dopamine in encoding the mismatch between prediction and reality. Far less attention has been paid to the computational goals and algorithms of the main-axis (actor). Here, we construct a top-down model of the basal ganglia with emphasis on the role of dopamine both as a reinforcement learning signal and as a pseudo-temperature signal controlling the general level of basal ganglia excitability and the motor vigilance of the acting agent. We argue that the basal ganglia endow the thalamic-cortical networks with the optimal dynamic tradeoff between two constraints: minimizing the policy complexity (cost) and maximizing the expected future reward (gain). We show that this multi-dimensional optimization process results in an experience-modulated version of the softmax behavioral policy. Thus, as in classical softmax behavioral policies, action probabilities are set according to the actions' estimated values and the pseudo-temperature, but they additionally vary with the frequency of previous choices of these actions. We conclude that the computational goal of the basal ganglia is not to maximize cumulative (positive and negative) reward. Rather, the basal ganglia aim at optimizing independent gain and cost functions. Unlike previously suggested single-variable maximization processes, this multi-dimensional optimization process leads naturally to a softmax-like behavioral policy. We suggest that, beyond its role in modulating the efficacy of the cortico-striatal synapses, dopamine directly affects striatal excitability and thus provides a pseudo-temperature signal that modulates the tradeoff between gain and cost. The resulting experience- and dopamine-modulated softmax policy can then serve as a theoretical framework to account for the broad range of behaviors and clinical states governed by the basal ganglia and dopamine systems. (A worked numerical sketch of this gain/cost tradeoff follows the record fields below.)
format | Text |
id | pubmed-3093748 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Frontiers Research Foundation |
record_format | MEDLINE/PubMed |
spelling | pubmed-3093748 2011-05-20 Dopaminergic Balance between Reward Maximization and Policy Complexity Parush, Naama; Tishby, Naftali; Bergman, Hagai Front Syst Neurosci Neuroscience Frontiers Research Foundation 2011-05-09 /pmc/articles/PMC3093748/ /pubmed/21603228 http://dx.doi.org/10.3389/fnsys.2011.00022 Text en Copyright © 2011 Parush, Tishby and Bergman. http://www.frontiersin.org/licenseagreement This is an open-access article subject to a non-exclusive license between the authors and Frontiers Media SA, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and other Frontiers conditions are complied with. (Abstract as given in the description field above.)
spellingShingle | Neuroscience; Parush, Naama; Tishby, Naftali; Bergman, Hagai; Dopaminergic Balance between Reward Maximization and Policy Complexity
title | Dopaminergic Balance between Reward Maximization and Policy Complexity |
title_full | Dopaminergic Balance between Reward Maximization and Policy Complexity |
title_fullStr | Dopaminergic Balance between Reward Maximization and Policy Complexity |
title_full_unstemmed | Dopaminergic Balance between Reward Maximization and Policy Complexity |
title_short | Dopaminergic Balance between Reward Maximization and Policy Complexity |
title_sort | dopaminergic balance between reward maximization and policy complexity |
topic | Neuroscience |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3093748/ https://www.ncbi.nlm.nih.gov/pubmed/21603228 http://dx.doi.org/10.3389/fnsys.2011.00022 |
work_keys_str_mv | AT parushnaama dopaminergicbalancebetweenrewardmaximizationandpolicycomplexity AT tishbynaftali dopaminergicbalancebetweenrewardmaximizationandpolicycomplexity AT bergmanhagai dopaminergicbalancebetweenrewardmaximizationandpolicycomplexity |
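The abstract describes how trading expected future reward (gain) against policy complexity (cost) yields an experience-modulated softmax policy. The sketch below is not the authors' code; it assumes the standard information-theoretic formulation in which the policy maximizes E[Q] - (1/beta) * I(S;A), whose fixed point is pi(a|s) proportional to p(a) * exp(beta * Q(s,a)), with p(a) the marginal (experience) distribution over actions and beta an inverse pseudo-temperature standing in for the dopamine level. All function and variable names are illustrative.

```python
# Minimal sketch (not the authors' implementation) of an experience-modulated
# softmax policy arising from a reward-vs-policy-complexity tradeoff, assuming
# the objective max E[Q] - (1/beta) * I(S;A). The self-consistent solution is
#   pi(a|s) ∝ p(a) * exp(beta * Q(s,a)),   p(a) = sum_s p(s) * pi(a|s),
# computed here by a Blahut-Arimoto-style iteration.
import numpy as np

def experience_modulated_softmax(Q, p_states, beta, n_iter=200, tol=1e-10):
    """Iterate the policy/marginal fixed-point equations.

    Q        : (n_states, n_actions) array of estimated action values.
    p_states : (n_states,) probability distribution over states.
    beta     : inverse pseudo-temperature; larger beta weights reward (gain)
               more heavily relative to policy complexity (cost).
    Returns the policy pi(a|s) and the action marginal p(a).
    """
    n_states, n_actions = Q.shape
    p_a = np.full(n_actions, 1.0 / n_actions)  # initial action marginal
    for _ in range(n_iter):
        # Softmax tilted toward frequently chosen actions via the marginal p(a).
        logits = np.log(p_a)[None, :] + beta * Q
        pi = np.exp(logits - logits.max(axis=1, keepdims=True))
        pi /= pi.sum(axis=1, keepdims=True)
        # Update the marginal from the state distribution and current policy.
        new_p_a = p_states @ pi
        if np.max(np.abs(new_p_a - p_a)) < tol:
            p_a = new_p_a
            break
        p_a = new_p_a
    return pi, p_a

# Toy usage: two states, three actions. A low beta (low "dopamine") yields a
# flatter, cheaper policy; a high beta yields a sharper, reward-seeking one.
Q = np.array([[1.0, 0.2, 0.0],
              [0.1, 0.9, 0.0]])
p_states = np.array([0.5, 0.5])
for beta in (0.5, 5.0):
    pi, p_a = experience_modulated_softmax(Q, p_states, beta)
    print(f"beta={beta}: pi=\n{pi.round(3)}\nmarginal p(a)={p_a.round(3)}")
```

In this reading, a small beta flattens the policy toward the cheap, habitual marginal p(a), while a large beta sharpens it toward reward maximization, which is one way to picture the dopamine-controlled pseudo-temperature the abstract proposes.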