The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces
The ability to maximize reward and avoid punishment is essential for animal survival. Reinforcement learning (RL) refers to the algorithms used by biological or artificial systems to learn how to maximize reward or avoid negative outcomes based on past experiences. While RL is also important in mach...
Main authors: | Huertas, Marco A.; Schwettmann, Sarah E.; Shouval, Harel Z. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2016 |
Subjects: | Neuroscience |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5156839/ https://www.ncbi.nlm.nih.gov/pubmed/28018206 http://dx.doi.org/10.3389/fnsyn.2016.00037 |
_version_ | 1782481334713712640 |
---|---|
author | Huertas, Marco A. Schwettmann, Sarah E. Shouval, Harel Z. |
author_facet | Huertas, Marco A. Schwettmann, Sarah E. Shouval, Harel Z. |
author_sort | Huertas, Marco A. |
collection | PubMed |
description | The ability to maximize reward and avoid punishment is essential for animal survival. Reinforcement learning (RL) refers to the algorithms used by biological or artificial systems to learn how to maximize reward or avoid negative outcomes based on past experiences. While RL is also important in machine learning, the types of mechanistic constraints encountered by biological machinery might differ from those for artificial systems. Two major problems encountered by RL are how to relate a stimulus with a reinforcing signal that is delayed in time (temporal credit assignment), and how to stop learning once the target behaviors are attained (stopping rule). To address the first problem, synaptic eligibility traces were introduced, bridging the temporal gap between a stimulus and its reward. Although these were mere theoretical constructs, recent experiments have provided evidence of their existence. These experiments also reveal that the presence of specific neuromodulators converts the traces into changes in synaptic efficacy. A mechanistic implementation of the stopping rule usually assumes the inhibition of the reward nucleus; however, recent experimental results have shown that learning terminates at the appropriate network state even in setups where the reward nucleus cannot be inhibited. In an effort to describe a learning rule that solves the temporal credit assignment problem and implements a biologically plausible stopping rule, we proposed a model based on two separate synaptic eligibility traces, one for long-term potentiation (LTP) and one for long-term depression (LTD), each obeying different dynamics and having different effective magnitudes. The model has been shown to successfully generate stable learning in recurrent networks. Although the model assumes the presence of a single neuromodulator, evidence indicates that there are different neuromodulators for expressing the different traces. What could be the role of different neuromodulators for expressing the LTP and LTD traces? Here we expand on our previous model to include several neuromodulators, illustrate through various examples how these contribute to learning reward-timing within a wide set of training paradigms, and propose further roles that multiple neuromodulators can play in encoding additional information about the rewarding signal. |
format | Online Article Text |
id | pubmed-5156839 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-5156839 2016-12-23 The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces Huertas, Marco A. Schwettmann, Sarah E. Shouval, Harel Z. Front Synaptic Neurosci Neuroscience The ability to maximize reward and avoid punishment is essential for animal survival. Reinforcement learning (RL) refers to the algorithms used by biological or artificial systems to learn how to maximize reward or avoid negative outcomes based on past experiences. While RL is also important in machine learning, the types of mechanistic constraints encountered by biological machinery might differ from those for artificial systems. Two major problems encountered by RL are how to relate a stimulus with a reinforcing signal that is delayed in time (temporal credit assignment), and how to stop learning once the target behaviors are attained (stopping rule). To address the first problem, synaptic eligibility traces were introduced, bridging the temporal gap between a stimulus and its reward. Although these were mere theoretical constructs, recent experiments have provided evidence of their existence. These experiments also reveal that the presence of specific neuromodulators converts the traces into changes in synaptic efficacy. A mechanistic implementation of the stopping rule usually assumes the inhibition of the reward nucleus; however, recent experimental results have shown that learning terminates at the appropriate network state even in setups where the reward nucleus cannot be inhibited. In an effort to describe a learning rule that solves the temporal credit assignment problem and implements a biologically plausible stopping rule, we proposed a model based on two separate synaptic eligibility traces, one for long-term potentiation (LTP) and one for long-term depression (LTD), each obeying different dynamics and having different effective magnitudes. The model has been shown to successfully generate stable learning in recurrent networks. Although the model assumes the presence of a single neuromodulator, evidence indicates that there are different neuromodulators for expressing the different traces. What could be the role of different neuromodulators for expressing the LTP and LTD traces? Here we expand on our previous model to include several neuromodulators, illustrate through various examples how these contribute to learning reward-timing within a wide set of training paradigms, and propose further roles that multiple neuromodulators can play in encoding additional information about the rewarding signal. Frontiers Media S.A. 2016-12-15 /pmc/articles/PMC5156839/ /pubmed/28018206 http://dx.doi.org/10.3389/fnsyn.2016.00037 Text en Copyright © 2016 Huertas, Schwettmann and Shouval. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Neuroscience Huertas, Marco A. Schwettmann, Sarah E. Shouval, Harel Z. The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces |
title | The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces |
title_full | The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces |
title_fullStr | The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces |
title_full_unstemmed | The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces |
title_short | The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces |
title_sort | role of multiple neuromodulators in reinforcement learning that is based on competition between eligibility traces |
topic | Neuroscience |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5156839/ https://www.ncbi.nlm.nih.gov/pubmed/28018206 http://dx.doi.org/10.3389/fnsyn.2016.00037 |
work_keys_str_mv | AT huertasmarcoa theroleofmultipleneuromodulatorsinreinforcementlearningthatisbasedoncompetitionbetweeneligibilitytraces AT schwettmannsarahe theroleofmultipleneuromodulatorsinreinforcementlearningthatisbasedoncompetitionbetweeneligibilitytraces AT shouvalharelz theroleofmultipleneuromodulatorsinreinforcementlearningthatisbasedoncompetitionbetweeneligibilitytraces AT huertasmarcoa roleofmultipleneuromodulatorsinreinforcementlearningthatisbasedoncompetitionbetweeneligibilitytraces AT schwettmannsarahe roleofmultipleneuromodulatorsinreinforcementlearningthatisbasedoncompetitionbetweeneligibilitytraces AT shouvalharelz roleofmultipleneuromodulatorsinreinforcementlearningthatisbasedoncompetitionbetweeneligibilitytraces |
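
The description field above outlines a learning rule in which an LTP eligibility trace and an LTD eligibility trace, with different dynamics and magnitudes, compete, and neuromodulators convert them into synaptic change. The following is a toy sketch of that idea only, not the authors' model: the exponential trace shape, every parameter value, and the names `trace`, `weight_update`, `balance_time`, `nm_ltp`, and `nm_ltd` are assumptions made for illustration, since this record gives no equations.

```python
import math


def trace(t, amplitude, tau):
    """Assumed trace shape: exponential decay, t time units after the stimulus."""
    return amplitude * math.exp(-t / tau)


def weight_update(t_reward, eta=0.1,
                  a_ltp=1.0, tau_ltp=2.0,   # hypothetical LTP trace: large, fast
                  a_ltd=0.6, tau_ltd=5.0,   # hypothetical LTD trace: small, slow
                  nm_ltp=1.0, nm_ltd=1.0):  # hypothetical neuromodulator gains
    """Net synaptic change when reward arrives at t_reward.

    Each trace is read out by its own neuromodulator gain, and the two
    contributions compete by subtraction.
    """
    ltp = nm_ltp * trace(t_reward, a_ltp, tau_ltp)
    ltd = nm_ltd * trace(t_reward, a_ltd, tau_ltd)
    return eta * (ltp - ltd)


def balance_time(a_ltp=1.0, tau_ltp=2.0, a_ltd=0.6, tau_ltd=5.0):
    """Time at which the two assumed traces are equal (unit gains).

    Solves a_ltp * exp(-t / tau_ltp) == a_ltd * exp(-t / tau_ltd) for t.
    """
    return math.log(a_ltp / a_ltd) / (1.0 / tau_ltp - 1.0 / tau_ltd)


if __name__ == "__main__":
    t_star = balance_time()
    for t in (0.5, t_star, 4.0):
        print(f"reward at t={t:.3f}: delta_w={weight_update(t):+.5f}")
```

In this sketch a reward arriving before the balance time gives net potentiation, a later reward gives net depression, and a reward at the balance time gives no net change, which is one simple way to picture a stopping rule that does not require inhibiting the reward signal. Changing `nm_ltp` or `nm_ltd` independently shifts that balance point, a rough stand-in for separate neuromodulators acting on each trace.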