
Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation


Bibliographic Details
Main Authors: Kato, Ayaka; Morita, Kenji
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5063413/
https://www.ncbi.nlm.nih.gov/pubmed/27736881
http://dx.doi.org/10.1371/journal.pcbi.1005145
author Kato, Ayaka
Morita, Kenji
author_facet Kato, Ayaka
Morita, Kenji
author_sort Kato, Ayaka
collection PubMed
description It has been suggested that dopamine (DA) represents the reward-prediction-error (RPE) defined in reinforcement learning, and therefore that DA responds to unpredicted but not predicted reward. However, recent studies have found DA responses sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can be sustained if there is decay/forgetting of learned values, which can be implemented as decay of the synaptic strengths storing learned values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of value-decay in self-paced approach behavior, modeled as a series of ‘Go’ or ‘No-Go’ selections towards a goal. Through simulations, we found that, counterintuitively, value-decay can enhance motivation; specifically, it can facilitate fast goal-reaching. Mathematical analyses revealed that the potential underlying mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of ‘Go’ values towards a goal, and (2) value-contrasts between ‘Go’ and ‘No-Go’ are generated because, while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account of DA's roles in value-learning and motivation. Our results also suggest that when biological systems for value-learning are active even though learning has apparently converged, the systems might be in a state of dynamic equilibrium, where learning and forgetting are balanced.
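
The model outlined in the description (a series of ‘Go’ or ‘No-Go’ selections towards a goal, with learned values that decay) can be illustrated with a minimal sketch. This is not the authors' code. The Q-learning-style update, the softmax choice rule, the number of states, and every parameter value (`alpha`, `beta`, `gamma`, `decay`, `reward`) are assumptions chosen for illustration, and the function name `run_trials` is hypothetical.

```python
import math
import random


def run_trials(decay, n_trials=200, n_states=7, alpha=0.5, beta=5.0,
               gamma=1.0, reward=1.0, seed=0):
    """Simulate self-paced approach as Go/No-Go choices with value decay.

    All parameter values are illustrative assumptions, not taken from the record.
    Returns (steps taken in each trial, final table of action values).
    """
    rng = random.Random(seed)
    # q[s][0] is the value of No-Go (stay) and q[s][1] the value of Go (advance).
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    steps_per_trial = []
    for _ in range(n_trials):
        s, steps = 0, 0
        while s != goal:
            # Softmax over two actions reduces to a logistic function
            # of the Go minus No-Go value contrast.
            p_go = 1.0 / (1.0 + math.exp(-beta * (q[s][1] - q[s][0])))
            a = 1 if rng.random() < p_go else 0
            s_next = s + a
            at_goal = s_next == goal
            r = reward if at_goal else 0.0
            v_next = 0.0 if at_goal else max(q[s_next])
            # Reward prediction error (RPE) for the chosen action only.
            rpe = r + gamma * v_next - q[s][a]
            q[s][a] += alpha * rpe
            # Forgetting: every stored value decays a little at each time step,
            # so unchosen values shrink while chosen values keep being updated.
            for row in q:
                row[0] *= 1.0 - decay
                row[1] *= 1.0 - decay
            s = s_next
            steps += 1
        steps_per_trial.append(steps)
    return steps_per_trial, q


if __name__ == "__main__":
    for d in (0.0, 0.01):
        steps, _ = run_trials(decay=d)
        print(f"decay={d}: mean steps per trial = {sum(steps) / len(steps):.2f}")
```

Running the script prints the mean number of time steps needed to reach the goal with and without decay, which gives a rough way to explore the claim about goal-reaching speed. Whether the sketch reproduces the reported effect depends on the assumed parameters.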
format Online
Article
Text
id pubmed-5063413
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5063413 2016-11-04 Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation Kato, Ayaka Morita, Kenji PLoS Comput Biol Research Article Public Library of Science 2016-10-13 /pmc/articles/PMC5063413/ /pubmed/27736881 http://dx.doi.org/10.1371/journal.pcbi.1005145 Text en © 2016 Kato, Morita http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Kato, Ayaka
Morita, Kenji
Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation
title Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5063413/
https://www.ncbi.nlm.nih.gov/pubmed/27736881
http://dx.doi.org/10.1371/journal.pcbi.1005145