
Dopaminergic Balance between Reward Maximization and Policy Complexity

Previous reinforcement-learning models of the basal ganglia network have highlighted the role of dopamine in encoding the mismatch between prediction and reality. Far less attention has been paid to the computational goals and algorithms of the main-axis (actor). Here, we construct a top-down model...


Bibliographic Details
Main authors:	Parush, Naama, Tishby, Naftali, Bergman, Hagai
Format:	Text
Language:	English
Published:	Frontiers Research Foundation 2011
Subjects:	Neuroscience
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3093748/ https://www.ncbi.nlm.nih.gov/pubmed/21603228 http://dx.doi.org/10.3389/fnsys.2011.00022

_version_	1782203491729539072
author	Parush, Naama Tishby, Naftali Bergman, Hagai
author_facet	Parush, Naama Tishby, Naftali Bergman, Hagai
author_sort	Parush, Naama
collection	PubMed
description	Previous reinforcement-learning models of the basal ganglia network have highlighted the role of dopamine in encoding the mismatch between prediction and reality. Far less attention has been paid to the computational goals and algorithms of the main-axis (actor). Here, we construct a top-down model of the basal ganglia with emphasis on the role of dopamine as both a reinforcement learning signal and as a pseudo-temperature signal controlling the general level of basal ganglia excitability and motor vigilance of the acting agent. We argue that the basal ganglia endow the thalamic-cortical networks with the optimal dynamic tradeoff between two constraints: minimizing the policy complexity (cost) and maximizing the expected future reward (gain). We show that this multi-dimensional optimization process results in an experience-modulated version of the softmax behavioral policy. Thus, as in classical softmax behavioral policies, actions are selected with probabilities that depend on their estimated values and the pseudo-temperature, but these probabilities also vary according to the frequency of previous choices of these actions. We conclude that the computational goal of the basal ganglia is not to maximize cumulative (positive and negative) reward. Rather, the basal ganglia aim at optimization of independent gain and cost functions. Unlike previously suggested single-variable maximization processes, this multi-dimensional optimization process leads naturally to a softmax-like behavioral policy. We suggest that beyond its role in the modulation of the efficacy of the cortico-striatal synapses, dopamine directly affects striatal excitability and thus provides a pseudo-temperature signal that modulates the tradeoff between gain and cost. The resulting experience- and dopamine-modulated softmax policy can then serve as a theoretical framework to account for the broad range of behaviors and clinical states governed by the basal ganglia and dopamine systems.
format	Text
id	pubmed-3093748
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	Frontiers Research Foundation
record_format	MEDLINE/PubMed
spelling	pubmed-3093748 2011-05-20 Dopaminergic Balance between Reward Maximization and Policy Complexity Parush, Naama Tishby, Naftali Bergman, Hagai Front Syst Neurosci Neuroscience Previous reinforcement-learning models of the basal ganglia network have highlighted the role of dopamine in encoding the mismatch between prediction and reality. Far less attention has been paid to the computational goals and algorithms of the main-axis (actor). Here, we construct a top-down model of the basal ganglia with emphasis on the role of dopamine as both a reinforcement learning signal and as a pseudo-temperature signal controlling the general level of basal ganglia excitability and motor vigilance of the acting agent. We argue that the basal ganglia endow the thalamic-cortical networks with the optimal dynamic tradeoff between two constraints: minimizing the policy complexity (cost) and maximizing the expected future reward (gain). We show that this multi-dimensional optimization process results in an experience-modulated version of the softmax behavioral policy. Thus, as in classical softmax behavioral policies, actions are selected with probabilities that depend on their estimated values and the pseudo-temperature, but these probabilities also vary according to the frequency of previous choices of these actions. We conclude that the computational goal of the basal ganglia is not to maximize cumulative (positive and negative) reward. Rather, the basal ganglia aim at optimization of independent gain and cost functions. Unlike previously suggested single-variable maximization processes, this multi-dimensional optimization process leads naturally to a softmax-like behavioral policy. We suggest that beyond its role in the modulation of the efficacy of the cortico-striatal synapses, dopamine directly affects striatal excitability and thus provides a pseudo-temperature signal that modulates the tradeoff between gain and cost. The resulting experience- and dopamine-modulated softmax policy can then serve as a theoretical framework to account for the broad range of behaviors and clinical states governed by the basal ganglia and dopamine systems. Frontiers Research Foundation 2011-05-09 /pmc/articles/PMC3093748/ /pubmed/21603228 http://dx.doi.org/10.3389/fnsys.2011.00022 Text en Copyright © 2011 Parush, Tishby and Bergman. http://www.frontiersin.org/licenseagreement This is an open-access article subject to a non-exclusive license between the authors and Frontiers Media SA, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and other Frontiers conditions are complied with.
spellingShingle	Neuroscience Parush, Naama Tishby, Naftali Bergman, Hagai Dopaminergic Balance between Reward Maximization and Policy Complexity
title	Dopaminergic Balance between Reward Maximization and Policy Complexity
title_full	Dopaminergic Balance between Reward Maximization and Policy Complexity
title_fullStr	Dopaminergic Balance between Reward Maximization and Policy Complexity
title_full_unstemmed	Dopaminergic Balance between Reward Maximization and Policy Complexity
title_short	Dopaminergic Balance between Reward Maximization and Policy Complexity
title_sort	dopaminergic balance between reward maximization and policy complexity
topic	Neuroscience
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3093748/ https://www.ncbi.nlm.nih.gov/pubmed/21603228 http://dx.doi.org/10.3389/fnsys.2011.00022
work_keys_str_mv	AT parushnaama dopaminergicbalancebetweenrewardmaximizationandpolicycomplexity AT tishbynaftali dopaminergicbalancebetweenrewardmaximizationandpolicycomplexity AT bergmanhagai dopaminergicbalancebetweenrewardmaximizationandpolicycomplexity
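
The abstract describes a softmax policy in which action probabilities depend on estimated values, a pseudo-temperature, and the frequency of previous choices, but this record does not state the formula. The sketch below is a minimal illustration under an assumed form, with each probability proportional to prior(a) * exp(beta * Q(a)), where prior(a) is the empirical frequency of past choices of action a and beta is an inverse pseudo-temperature. The function names, the parameter names (`q_values`, `choice_counts`, `beta`), and the exact functional form are assumptions made for illustration, not taken from the record.

```python
import math


def experience_modulated_softmax(q_values, choice_counts, beta):
    """Action probabilities proportional to prior(a) * exp(beta * Q(a)).

    Assumed form (not given in the record):
      q_values      -- estimated value of each action
      choice_counts -- how often each action was chosen before (all > 0)
      beta          -- inverse pseudo-temperature (beta >= 0)
    """
    total = sum(choice_counts)
    priors = [c / total for c in choice_counts]
    # Subtract the maximum value before exponentiating for numerical stability;
    # this does not change the normalized probabilities.
    q_max = max(q_values)
    weights = [p * math.exp(beta * (q - q_max)) for p, q in zip(priors, q_values)]
    norm = sum(weights)
    return [w / norm for w in weights]


def classical_softmax(q_values, beta):
    """Classical softmax: the same rule with a uniform prior over actions."""
    return experience_modulated_softmax(q_values, [1] * len(q_values), beta)
```

Under this assumed form, setting `beta` to 0 makes the policy ignore values and simply repeat past choice frequencies, while a uniform history of choices reduces it to the classical softmax.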

