An Imperfect Dopaminergic Error Signal Can Drive Temporal-Difference Learning
An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence that supports the hypothesis that the mammalian brain performs temporal-difference...
Main Authors: | Potjans, Wiebke; Diesmann, Markus; Morrison, Abigail |
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science, 2011 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3093351/ https://www.ncbi.nlm.nih.gov/pubmed/21589888 http://dx.doi.org/10.1371/journal.pcbi.1001133 |
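For readers unfamiliar with the algorithm named in the record's description, the sketch below shows a classical discrete-time TD(0) actor-critic of the kind the paper uses as its benchmark. The toy task (a short linear track with a single rewarded end state), the softmax policy, and all parameter values are illustrative assumptions of this sketch, not details taken from the paper or its spiking network model.

```python
# Minimal tabular TD(0) actor-critic sketch. The toy task, softmax policy and
# parameter values are assumptions made for illustration; they are not taken
# from the paper or its spiking network model.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 10, 2              # short linear track, actions: left/right
gamma, alpha_v, alpha_p = 0.9, 0.1, 0.1  # discount and learning rates (assumed)

V = np.zeros(n_states)                   # critic: state-value estimates
H = np.zeros((n_states, n_actions))      # actor: action preferences

def policy(s):
    """Softmax over the actor's preferences for state s."""
    p = np.exp(H[s] - H[s].max())
    p /= p.sum()
    return rng.choice(n_actions, p=p)

def step(s, a):
    """Toy environment: move left (a=0) or right (a=1); reward at the right end."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    done = s_next == n_states - 1
    return s_next, reward, done

for episode in range(200):
    s, done = 0, False
    while not done:
        a = policy(s)
        s_next, reward, done = step(s, a)
        # TD error: the quantity whose phasic-dopamine analogue the paper studies
        delta = reward + (0.0 if done else gamma * V[s_next]) - V[s]
        V[s] += alpha_v * delta          # critic update
        H[s, a] += alpha_p * delta       # actor update
        s = s_next
```

After training, `V` rises toward the rewarded end of the track and the actor comes to prefer moving right; the paper's contribution, as the abstract below explains, is to reproduce this kind of learning with spiking neurons and a dopamine-like error signal rather than with an explicit `delta` variable.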
Field | Value |
---|---|
_version_ | 1782203462045401088 |
author | Potjans, Wiebke; Diesmann, Markus; Morrison, Abigail |
author_facet | Potjans, Wiebke; Diesmann, Markus; Morrison, Abigail |
author_sort | Potjans, Wiebke |
collection | PubMed |
description | An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence that supports the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of the midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavior adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine, pre- and post-synaptic activity. An analytical mapping from the parameters of our proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints of the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired with respect to the traditional algorithm on a task with both positive and negative rewards and breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal enables TD learning when learning is driven by positive rewards but not when driven by negative rewards. |
format | Text |
id | pubmed-3093351 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-3093351 2011-05-17 An Imperfect Dopaminergic Error Signal Can Drive Temporal-Difference Learning Potjans, Wiebke; Diesmann, Markus; Morrison, Abigail PLoS Comput Biol Research Article An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence that supports the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of the midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavior adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine, pre- and post-synaptic activity. An analytical mapping from the parameters of our proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints of the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired with respect to the traditional algorithm on a task with both positive and negative rewards and breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal enables TD learning when learning is driven by positive rewards but not when driven by negative rewards. Public Library of Science 2011-05-12 /pmc/articles/PMC3093351/ /pubmed/21589888 http://dx.doi.org/10.1371/journal.pcbi.1001133 Text en Potjans et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Potjans, Wiebke; Diesmann, Markus; Morrison, Abigail An Imperfect Dopaminergic Error Signal Can Drive Temporal-Difference Learning |
title | An Imperfect Dopaminergic Error Signal Can Drive Temporal-Difference Learning |
title_full | An Imperfect Dopaminergic Error Signal Can Drive Temporal-Difference Learning |
title_fullStr | An Imperfect Dopaminergic Error Signal Can Drive Temporal-Difference Learning |
title_full_unstemmed | An Imperfect Dopaminergic Error Signal Can Drive Temporal-Difference Learning |
title_short | An Imperfect Dopaminergic Error Signal Can Drive Temporal-Difference Learning |
title_sort | imperfect dopaminergic error signal can drive temporal-difference learning |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3093351/ https://www.ncbi.nlm.nih.gov/pubmed/21589888 http://dx.doi.org/10.1371/journal.pcbi.1001133 |
work_keys_str_mv | AT potjanswiebke animperfectdopaminergicerrorsignalcandrivetemporaldifferencelearning AT diesmannmarkus animperfectdopaminergicerrorsignalcandrivetemporaldifferencelearning AT morrisonabigail animperfectdopaminergicerrorsignalcandrivetemporaldifferencelearning AT potjanswiebke imperfectdopaminergicerrorsignalcandrivetemporaldifferencelearning AT diesmannmarkus imperfectdopaminergicerrorsignalcandrivetemporaldifferencelearning AT morrisonabigail imperfectdopaminergicerrorsignalcandrivetemporaldifferencelearning |
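The abstract's key negative result concerns the asymmetry of the dopaminergic signal: because the baseline firing rate is low, the signal can rise far above baseline but can only dip slightly below it, so negative TD errors are compressed. The toy calculation below illustrates the consequence by treating that constraint as a hard floor on the TD error; the floor value, learning rate, and number of updates are assumptions of this sketch, not the paper's plasticity dynamics.

```python
# Toy illustration of an asymmetric, dopamine-like TD error: positive errors
# pass through unchanged, negative errors are clipped at a small floor.
# The floor, learning rate and iteration count are assumptions for illustration.

ALPHA = 0.1      # learning rate (assumed)
FLOOR = -0.1     # assumed lower bound on the error signal
N_UPDATES = 50

def learn_value(reward, clip):
    """Estimate the value of a state that leads directly to `reward`."""
    v = 0.0
    for _ in range(N_UPDATES):
        delta = reward - v               # TD error for a terminal transition
        if clip:
            delta = max(delta, FLOOR)    # asymmetric, dopamine-like error
        v += ALPHA * delta
    return v

for reward in (+1.0, -1.0):
    print(f"reward {reward:+.0f}: "
          f"unclipped estimate {learn_value(reward, clip=False):+.2f}, "
          f"clipped estimate {learn_value(reward, clip=True):+.2f}")
```

With these assumed numbers, the positive reward is learned equally fast with or without clipping, whereas the estimate of the negative reward reaches only about half of its true magnitude in the same number of updates. This mirrors the abstract's finding that the network learns well when driven by positive rewards but is impaired, or fails entirely, when learning must be driven by negative rewards.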