
Spike-Based Reinforcement Learning in Continuous State and Action Space: When Policy Gradient Methods Fail

Bibliographic Details
Main Authors: Vasilaki, Eleni; Frémaux, Nicolas; Urbanczik, Robert; Senn, Walter; Gerstner, Wulfram
Format: Text
Language: English
Published: Public Library of Science, 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2778872/
https://www.ncbi.nlm.nih.gov/pubmed/19997492
http://dx.doi.org/10.1371/journal.pcbi.1000586
author Vasilaki, Eleni
Frémaux, Nicolas
Urbanczik, Robert
Senn, Walter
Gerstner, Wulfram
collection PubMed
description Changes of synaptic connections between neurons are thought to be the physiological basis of learning. These changes can be gated by neuromodulators that encode the presence of reward. We study a family of reward-modulated synaptic learning rules for spiking neurons on a learning task in continuous space inspired by the Morris water maze. The synaptic update rule modifies the release probability of synaptic transmission and depends on the timing of presynaptic spike arrival, postsynaptic action potentials, and the membrane potential of the postsynaptic neuron. The family of learning rules includes an optimal rule derived from policy gradient methods as well as reward-modulated Hebbian learning. The synaptic update rule is implemented in a population of spiking neurons using a network architecture that combines feedforward input with lateral connections. Actions are represented by a population of hypothetical action cells with strong Mexican-hat connectivity and are read out at theta frequency. We show that in this architecture, a standard policy gradient rule fails to solve the Morris water maze task, whereas a variant with a Hebbian bias can learn the task within 20 trials, consistent with experiments. This result does not depend on implementation details such as the size of the neuronal populations. Our theoretical approach shows how learning new behaviors can be linked to reward-modulated plasticity at the level of single synapses, and it makes predictions about the voltage and spike-timing dependence of synaptic plasticity and the influence of neuromodulators such as dopamine. It is an important step towards connecting formal theories of reinforcement learning with neuronal and synaptic properties.
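The description above names the two ingredients of the paper's successful rule: a policy-gradient-style term gated by a reward signal, plus a Hebbian bias, with actions decoded from a population of action cells once per theta cycle. As a rough illustration only, the following Python sketch shows a reward-modulated eligibility-trace update of this general form. The function names, the constants (tau_e, eta, alpha), and the population-vector readout are assumptions made for the sketch, not the equations used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pre, n_post = 50, 10
w = rng.uniform(0.0, 0.5, size=(n_post, n_pre))  # synaptic release probabilities
elig = np.zeros_like(w)                          # per-synapse eligibility traces

tau_e = 0.5   # eligibility-trace time constant in seconds (assumed)
eta = 0.01    # learning rate (assumed)
alpha = 0.2   # weight of the Hebbian bias term (assumed)
dt = 0.001    # simulation time step in seconds

def step(pre_spikes, post_spikes, rho):
    """Accumulate eligibility from one time step of pre/post activity.

    pre_spikes  : (n_pre,)  0/1 presynaptic spike indicators
    post_spikes : (n_post,) 0/1 postsynaptic spike indicators
    rho         : (n_post,) instantaneous firing probabilities predicted
                  from the postsynaptic membrane potentials
    """
    global elig
    pg = np.outer(post_spikes - rho, pre_spikes)   # policy-gradient-style term
    hebb = np.outer(post_spikes, pre_spikes)       # Hebbian bias term
    elig += -dt / tau_e * elig + pg + alpha * hebb

def deliver_reward(reward):
    """Reward gates the actual synaptic change (e.g., a dopamine pulse)."""
    global w
    w += eta * reward * elig
    np.clip(w, 0.0, 1.0, out=w)  # release probabilities stay in [0, 1]

def read_out_action(rates, preferred_angles):
    """Population-vector readout of the action cells, one action per theta
    cycle: each cell votes for its preferred direction, weighted by rate."""
    x = float(np.sum(rates * np.cos(preferred_angles)))
    y = float(np.sum(rates * np.sin(preferred_angles)))
    return np.arctan2(y, x)
```

In this scheme the reward signal acts multiplicatively on the accumulated eligibility, so synapses only change when the neuromodulatory signal arrives, which mirrors the dopamine-gated plasticity the abstract describes; with alpha = 0 the update reduces to a plain policy-gradient rule of the kind the paper reports as failing on this task.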
format Text
id pubmed-2778872
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2778872 2009-12-08
PLoS Comput Biol, Research Article (title, author list, and abstract as above)
Public Library of Science 2009-12-04
/pmc/articles/PMC2778872/ /pubmed/19997492 http://dx.doi.org/10.1371/journal.pcbi.1000586
Text en
© Vasilaki et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Spike-Based Reinforcement Learning in Continuous State and Action Space: When Policy Gradient Methods Fail
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2778872/
https://www.ncbi.nlm.nih.gov/pubmed/19997492
http://dx.doi.org/10.1371/journal.pcbi.1000586