
The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces

The ability to maximize reward and avoid punishment is essential for animal survival. Reinforcement learning (RL) refers to the algorithms used by biological or artificial systems to learn how to maximize reward or avoid negative outcomes based on past experiences. While RL is also important in mach...


Bibliographic Details
Main authors:	Huertas, Marco A., Schwettmann, Sarah E., Shouval, Harel Z.
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2016
Subjects:	Neuroscience
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5156839/ https://www.ncbi.nlm.nih.gov/pubmed/28018206 http://dx.doi.org/10.3389/fnsyn.2016.00037

author	Huertas, Marco A. Schwettmann, Sarah E. Shouval, Harel Z.
collection	PubMed
description	The ability to maximize reward and avoid punishment is essential for animal survival. Reinforcement learning (RL) refers to the algorithms used by biological or artificial systems to learn how to maximize reward or avoid negative outcomes based on past experiences. While RL is also important in machine learning, the types of mechanistic constraints encountered by biological machinery might be different from those for artificial systems. Two major problems encountered by RL are how to relate a stimulus with a reinforcing signal that is delayed in time (temporal credit assignment), and how to stop learning once the target behaviors are attained (stopping rule). To address the first problem, synaptic eligibility traces were introduced, bridging the temporal gap between a stimulus and its reward. Although these were mere theoretical constructs, recent experiments have provided evidence of their existence. These experiments also reveal that the presence of specific neuromodulators converts the traces into changes in synaptic efficacy. A mechanistic implementation of the stopping rule usually assumes the inhibition of the reward nucleus; however, recent experimental results have shown that learning terminates at the appropriate network state even in setups where the reward nucleus cannot be inhibited. In an effort to describe a learning rule that solves the temporal credit assignment problem and implements a biologically plausible stopping rule, we proposed a model based on two separate synaptic eligibility traces, one for long-term potentiation (LTP) and one for long-term depression (LTD), each obeying different dynamics and having different effective magnitudes. The model has been shown to successfully generate stable learning in recurrent networks. Although the model assumes the presence of a single neuromodulator, evidence indicates that there are different neuromodulators for expressing the different traces. What could be the role of different neuromodulators for expressing the LTP and LTD traces? Here we expand on our previous model to include several neuromodulators, illustrate through various examples how these contribute to learning reward-timing within a wide set of training paradigms, and propose further roles that multiple neuromodulators can play in encoding additional information about the rewarding signal.
format	Online Article Text
id	pubmed-5156839
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-5156839 2016-12-23 The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces Huertas, Marco A. Schwettmann, Sarah E. Shouval, Harel Z. Front Synaptic Neurosci Neuroscience The ability to maximize reward and avoid punishment is essential for animal survival. Reinforcement learning (RL) refers to the algorithms used by biological or artificial systems to learn how to maximize reward or avoid negative outcomes based on past experiences. While RL is also important in machine learning, the types of mechanistic constraints encountered by biological machinery might be different from those for artificial systems. Two major problems encountered by RL are how to relate a stimulus with a reinforcing signal that is delayed in time (temporal credit assignment), and how to stop learning once the target behaviors are attained (stopping rule). To address the first problem, synaptic eligibility traces were introduced, bridging the temporal gap between a stimulus and its reward. Although these were mere theoretical constructs, recent experiments have provided evidence of their existence. These experiments also reveal that the presence of specific neuromodulators converts the traces into changes in synaptic efficacy. A mechanistic implementation of the stopping rule usually assumes the inhibition of the reward nucleus; however, recent experimental results have shown that learning terminates at the appropriate network state even in setups where the reward nucleus cannot be inhibited. In an effort to describe a learning rule that solves the temporal credit assignment problem and implements a biologically plausible stopping rule, we proposed a model based on two separate synaptic eligibility traces, one for long-term potentiation (LTP) and one for long-term depression (LTD), each obeying different dynamics and having different effective magnitudes. The model has been shown to successfully generate stable learning in recurrent networks. Although the model assumes the presence of a single neuromodulator, evidence indicates that there are different neuromodulators for expressing the different traces. What could be the role of different neuromodulators for expressing the LTP and LTD traces? Here we expand on our previous model to include several neuromodulators, illustrate through various examples how these contribute to learning reward-timing within a wide set of training paradigms, and propose further roles that multiple neuromodulators can play in encoding additional information about the rewarding signal. Frontiers Media S.A. 2016-12-15 /pmc/articles/PMC5156839/ /pubmed/28018206 http://dx.doi.org/10.3389/fnsyn.2016.00037 Text en Copyright © 2016 Huertas, Schwettmann and Shouval. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces
topic	Neuroscience
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5156839/ https://www.ncbi.nlm.nih.gov/pubmed/28018206 http://dx.doi.org/10.3389/fnsyn.2016.00037
