
Dopaminergic Balance between Reward Maximization and Policy Complexity

Previous reinforcement-learning models of the basal ganglia network have highlighted the role of dopamine in encoding the mismatch between prediction and reality. Far less attention has been paid to the computational goals and algorithms of the main-axis (actor). Here, we construct a top-down model...


Bibliographic Details
Main Authors: Parush, Naama; Tishby, Naftali; Bergman, Hagai
Format: Text
Language: English
Published: Frontiers Research Foundation 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3093748/
https://www.ncbi.nlm.nih.gov/pubmed/21603228
http://dx.doi.org/10.3389/fnsys.2011.00022
_version_ 1782203491729539072
author Parush, Naama
Tishby, Naftali
Bergman, Hagai
author_facet Parush, Naama
Tishby, Naftali
Bergman, Hagai
author_sort Parush, Naama
collection PubMed
description Previous reinforcement-learning models of the basal ganglia network have highlighted the role of dopamine in encoding the mismatch between prediction and reality. Far less attention has been paid to the computational goals and algorithms of the main-axis (actor). Here, we construct a top-down model of the basal ganglia with emphasis on the role of dopamine as both a reinforcement learning signal and a pseudo-temperature signal controlling the general level of basal ganglia excitability and motor vigilance of the acting agent. We argue that the basal ganglia endow the thalamic-cortical networks with the optimal dynamic tradeoff between two constraints: minimizing the policy complexity (cost) and maximizing the expected future reward (gain). We show that this multi-dimensional optimization process results in an experience-modulated version of the softmax behavioral policy. Thus, as in classical softmax behavioral policies, the probabilities of actions are set according to their estimated values and the pseudo-temperature, but in addition they also vary according to the frequency of previous choices of these actions. We conclude that the computational goal of the basal ganglia is not to maximize cumulative (positive and negative) reward. Rather, the basal ganglia aim at optimization of independent gain and cost functions. Unlike previously suggested single-variable maximization processes, this multi-dimensional optimization process leads naturally to a softmax-like behavioral policy. We suggest that beyond its role in the modulation of the efficacy of the cortico-striatal synapses, dopamine directly affects striatal excitability and thus provides a pseudo-temperature signal that modulates the tradeoff between gain and cost. The resulting experience- and dopamine-modulated softmax policy can then serve as a theoretical framework to account for the broad range of behaviors and clinical states governed by the basal ganglia and dopamine systems.
format Text
id pubmed-3093748
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Frontiers Research Foundation
record_format MEDLINE/PubMed
spelling pubmed-3093748 2011-05-20 Dopaminergic Balance between Reward Maximization and Policy Complexity Parush, Naama Tishby, Naftali Bergman, Hagai Front Syst Neurosci Neuroscience Previous reinforcement-learning models of the basal ganglia network have highlighted the role of dopamine in encoding the mismatch between prediction and reality. Far less attention has been paid to the computational goals and algorithms of the main-axis (actor). Here, we construct a top-down model of the basal ganglia with emphasis on the role of dopamine as both a reinforcement learning signal and a pseudo-temperature signal controlling the general level of basal ganglia excitability and motor vigilance of the acting agent. We argue that the basal ganglia endow the thalamic-cortical networks with the optimal dynamic tradeoff between two constraints: minimizing the policy complexity (cost) and maximizing the expected future reward (gain). We show that this multi-dimensional optimization process results in an experience-modulated version of the softmax behavioral policy. Thus, as in classical softmax behavioral policies, the probabilities of actions are set according to their estimated values and the pseudo-temperature, but in addition they also vary according to the frequency of previous choices of these actions. We conclude that the computational goal of the basal ganglia is not to maximize cumulative (positive and negative) reward. Rather, the basal ganglia aim at optimization of independent gain and cost functions. Unlike previously suggested single-variable maximization processes, this multi-dimensional optimization process leads naturally to a softmax-like behavioral policy. We suggest that beyond its role in the modulation of the efficacy of the cortico-striatal synapses, dopamine directly affects striatal excitability and thus provides a pseudo-temperature signal that modulates the tradeoff between gain and cost. The resulting experience- and dopamine-modulated softmax policy can then serve as a theoretical framework to account for the broad range of behaviors and clinical states governed by the basal ganglia and dopamine systems. Frontiers Research Foundation 2011-05-09 /pmc/articles/PMC3093748/ /pubmed/21603228 http://dx.doi.org/10.3389/fnsys.2011.00022 Text en Copyright © 2011 Parush, Tishby and Bergman. http://www.frontiersin.org/licenseagreement This is an open-access article subject to a non-exclusive license between the authors and Frontiers Media SA, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and other Frontiers conditions are complied with.
spellingShingle Neuroscience
Parush, Naama
Tishby, Naftali
Bergman, Hagai
Dopaminergic Balance between Reward Maximization and Policy Complexity
title Dopaminergic Balance between Reward Maximization and Policy Complexity
title_full Dopaminergic Balance between Reward Maximization and Policy Complexity
title_fullStr Dopaminergic Balance between Reward Maximization and Policy Complexity
title_full_unstemmed Dopaminergic Balance between Reward Maximization and Policy Complexity
title_short Dopaminergic Balance between Reward Maximization and Policy Complexity
title_sort dopaminergic balance between reward maximization and policy complexity
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3093748/
https://www.ncbi.nlm.nih.gov/pubmed/21603228
http://dx.doi.org/10.3389/fnsys.2011.00022
work_keys_str_mv AT parushnaama dopaminergicbalancebetweenrewardmaximizationandpolicycomplexity
AT tishbynaftali dopaminergicbalancebetweenrewardmaximizationandpolicycomplexity
AT bergmanhagai dopaminergicbalancebetweenrewardmaximizationandpolicycomplexity
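
Illustrative note: the abstract in this record describes a softmax policy in which action probabilities depend on estimated values, a pseudo-temperature, and the frequency of previous choices. The sketch below shows one way such a policy could look, assuming a multiplicative form in which each action's past choice frequency weights the usual exponential term. That form, the function name experience_modulated_softmax, and the parameter names values, prior, and temperature are assumptions made for illustration; the record itself does not state the exact equation.

```python
import math


def experience_modulated_softmax(values, prior, temperature):
    """Return action probabilities proportional to
    prior[a] * exp(values[a] / temperature).

    values:      estimated value of each action (hypothetical input)
    prior:       relative frequency of previous choices of each action
                 (assumed here to act as a multiplicative weight)
    temperature: pseudo-temperature, must be positive
    """
    if len(values) != len(prior):
        raise ValueError("values and prior must have the same length")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Subtract the maximum value before exponentiating for numerical stability;
    # this does not change the normalized probabilities.
    top = max(values)
    weights = [p * math.exp((v - top) / temperature)
               for v, p in zip(values, prior)]

    total = sum(weights)
    if total <= 0:
        raise ValueError("prior must give positive weight to some action")
    return [w / total for w in weights]


if __name__ == "__main__":
    # Equal values: the policy simply reproduces past choice frequencies.
    print(experience_modulated_softmax([1.0, 1.0], [0.8, 0.2], 1.0))
    # Uniform prior: the policy reduces to a classical softmax.
    print(experience_modulated_softmax([2.0, 1.0], [0.5, 0.5], 1.0))
```

Under this assumed form, a uniform prior recovers the classical softmax, a very high pseudo-temperature makes the policy follow past choice frequencies regardless of value, and a very low pseudo-temperature makes it concentrate on the highest-valued action.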