
The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces

The ability to maximize reward and avoid punishment is essential for animal survival. Reinforcement learning (RL) refers to the algorithms used by biological or artificial systems to learn how to maximize reward or avoid negative outcomes based on past experiences. While RL is also important in mach...


Bibliographic Details
Main authors: Huertas, Marco A.; Schwettmann, Sarah E.; Shouval, Harel Z.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5156839/
https://www.ncbi.nlm.nih.gov/pubmed/28018206
http://dx.doi.org/10.3389/fnsyn.2016.00037
_version_ 1782481334713712640
author Huertas, Marco A.
Schwettmann, Sarah E.
Shouval, Harel Z.
collection PubMed
description The ability to maximize reward and avoid punishment is essential for animal survival. Reinforcement learning (RL) refers to the algorithms used by biological or artificial systems to learn how to maximize reward or avoid negative outcomes based on past experiences. While RL is also important in machine learning, the types of mechanistic constraints encountered by biological machinery might be different from those for artificial systems. Two major problems encountered by RL are how to relate a stimulus with a reinforcing signal that is delayed in time (temporal credit assignment), and how to stop learning once the target behaviors are attained (stopping rule). To address the first problem, synaptic eligibility traces were introduced, bridging the temporal gap between a stimulus and its reward. Although these were mere theoretical constructs, recent experiments have provided evidence of their existence. These experiments also reveal that the presence of specific neuromodulators converts the traces into changes in synaptic efficacy. A mechanistic implementation of the stopping rule usually assumes the inhibition of the reward nucleus; however, recent experimental results have shown that learning terminates at the appropriate network state even in setups where the reward nucleus cannot be inhibited. In an effort to describe a learning rule that solves the temporal credit assignment problem and implements a biologically plausible stopping rule, we proposed a model based on two separate synaptic eligibility traces, one for long-term potentiation (LTP) and one for long-term depression (LTD), each obeying different dynamics and having different effective magnitudes. The model has been shown to successfully generate stable learning in recurrent networks. Although the model assumes the presence of a single neuromodulator, evidence indicates that there are different neuromodulators for expressing the different traces. What could be the role of different neuromodulators for expressing the LTP and LTD traces? Here we expand on our previous model to include several neuromodulators, illustrate through various examples how these contribute to learning reward-timing within a wide set of training paradigms, and propose further roles that multiple neuromodulators can play in encoding additional information about the rewarding signal.
format Online
Article
Text
id pubmed-5156839
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-5156839 2016-12-23 The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces Huertas, Marco A. Schwettmann, Sarah E. Shouval, Harel Z. Front Synaptic Neurosci Neuroscience Frontiers Media S.A. 2016-12-15 /pmc/articles/PMC5156839/ /pubmed/28018206 http://dx.doi.org/10.3389/fnsyn.2016.00037 Text en Copyright © 2016 Huertas, Schwettmann and Shouval. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces
topic Neuroscience