Learning to Express Reward Prediction Error-like Dopaminergic Activity Requires Plastic Representations of Time

The dominant theoretical framework to account for reinforcement learning in the brain is temporal difference (TD) reinforcement learning. The TD framework predicts that some neuronal elements should represent the reward prediction error (RPE), meaning that they signal the difference between the expected future rewards and the actual rewards. The prominence of the TD theory arises from the observation that the firing properties of dopaminergic neurons in the ventral tegmental area appear similar to those of RPE model-neurons in TD learning. Previous implementations of TD learning assume a fixed temporal basis for each stimulus that might eventually predict a reward. Here we show that such a fixed temporal basis is implausible and that certain predictions of TD learning are inconsistent with experiments. We propose instead an alternative theoretical framework, termed FLEX (Flexibly Learned Errors in Expected Reward). In FLEX, feature-specific representations of time are learned, allowing neural representations of stimuli to adjust their timing and relation to rewards in an online manner. In FLEX, dopamine acts as an instructive signal that helps build temporal models of the environment. FLEX is a general theoretical framework with many possible biophysical implementations. To show that FLEX is a feasible approach, we present a specific, biophysically plausible model that implements the principles of FLEX. We show that this implementation can account for various reinforcement learning paradigms, and that its results and predictions are consistent with a preponderance of both existing and reanalyzed experimental data.
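
For context, the RPE in standard TD learning is the quantity δ_t = r_t + γV(s_{t+1}) − V(s_t), where V is the learned value function, r_t the reward at time t, and γ the discount factor. The sketch below is a minimal tabular TD(0) illustration of the trace-conditioning setting the abstract describes, using a fixed temporal basis (one state per post-cue time step) of the kind the paper argues against. It is not the authors' FLEX model, and all names and parameter values are illustrative.

```python
import numpy as np

# Tabular TD(0) on a fixed temporal basis: one state per time step
# between cue (t = 0) and reward (delivered on the final transition).
# Illustrative sketch only -- not the FLEX model from the paper.

T = 10        # number of post-cue time steps; reward after the last one
gamma = 0.98  # temporal discount factor
alpha = 0.1   # learning rate
V = np.zeros(T + 1)  # V[T] is the terminal state, fixed at 0

for trial in range(500):
    for t in range(T):
        r = 1.0 if t == T - 1 else 0.0       # reward on the final transition
        delta = r + gamma * V[t + 1] - V[t]  # reward prediction error (RPE)
        V[t] += alpha * delta                # TD(0) value update

# After learning, V[t] ~= gamma**(T - 1 - t): value has propagated back
# from the reward toward the cue, and the RPE at reward time is near zero.
print(np.round(V[:T], 3))
```

Over trials, the value estimate propagates backward from the reward time toward the cue, and the RPE at the now fully predicted reward time decays toward zero — the signature dopaminergic pattern that both TD and FLEX aim to reproduce, and the behavior against which the paper's critique of fixed temporal bases is framed.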

Bibliographic Details

Main Authors: Cone, Ian; Clopath, Claudia; Shouval, Harel Z.
Format: Online Article Text
Language: English
Published: American Journal Experts, 2023
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10543312/
https://www.ncbi.nlm.nih.gov/pubmed/37790466
http://dx.doi.org/10.21203/rs.3.rs-3289985/v1

Record Details

Collection: PubMed (National Center for Biotechnology Information)
Record ID: pubmed-10543312
Record Format: MEDLINE/PubMed
Published Online: American Journal Experts, 2023-09-19
License: Creative Commons Attribution 4.0 International (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.