
Dopamine reward prediction errors reflect hidden state inference across time

Midbrain dopamine neurons signal reward prediction error (RPE), or actual minus expected reward. The temporal difference (TD) learning model has been a cornerstone in understanding how dopamine RPEs could drive associative learning. Classically, TD learning imparts value to features that serially track elapsed time relative to observable stimuli. In the real world, however, sensory stimuli provide ambiguous information about the hidden state of the environment, leading to the proposal that TD learning might instead compute a value signal based on an inferred distribution of hidden states (a ‘belief state’). In this work, we asked whether dopaminergic signaling supports a TD learning framework that operates over hidden states. We found that dopamine signaling exhibited a striking difference between two tasks that differed only with respect to whether reward was delivered deterministically. Our results favor an associative learning rule that combines cached values with hidden state inference.


Bibliographic Details
Main Authors: Starkweather, Clara Kwon, Babayan, Benedicte M., Uchida, Naoshige, Gershman, Samuel J.
Format: Online Article Text
Language: English
Published: 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374025/
https://www.ncbi.nlm.nih.gov/pubmed/28263301
http://dx.doi.org/10.1038/nn.4520
author Starkweather, Clara Kwon
Babayan, Benedicte M.
Uchida, Naoshige
Gershman, Samuel J.
author_facet Starkweather, Clara Kwon
Babayan, Benedicte M.
Uchida, Naoshige
Gershman, Samuel J.
author_sort Starkweather, Clara Kwon
collection PubMed
description Midbrain dopamine neurons signal reward prediction error (RPE), or actual minus expected reward. The temporal difference (TD) learning model has been a cornerstone in understanding how dopamine RPEs could drive associative learning. Classically, TD learning imparts value to features that serially track elapsed time relative to observable stimuli. In the real world, however, sensory stimuli provide ambiguous information about the hidden state of the environment, leading to the proposal that TD learning might instead compute a value signal based on an inferred distribution of hidden states (a ‘belief state’). In this work, we asked whether dopaminergic signaling supports a TD learning framework that operates over hidden states. We found that dopamine signaling exhibited a striking difference between two tasks that differed only with respect to whether reward was delivered deterministically. Our results favor an associative learning rule that combines cached values with hidden state inference.
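The belief-state TD model described in the abstract can be illustrated with a minimal sketch. This is a hypothetical toy setup, not the paper's exact model: the value function is assumed linear in the belief vector, V(b) = w·b, and the TD error δ = r + γV(b′) − V(b) plays the role of the dopamine RPE. The function name `td_update`, the learning rate, and the three-state belief vectors are illustrative assumptions.

```python
import numpy as np

def td_update(w, b, b_next, r, gamma=0.98, alpha=0.1):
    """One TD(0) update using belief-state features.

    w      : weight vector, one entry per hidden state
    b      : current belief (probability distribution over hidden states)
    b_next : belief after the next observation
    r      : reward received on the transition
    """
    v, v_next = w @ b, w @ b_next
    delta = r + gamma * v_next - v   # reward prediction error (RPE)
    w = w + alpha * delta * b        # credit assigned in proportion to belief
    return w, delta

w = np.zeros(3)                      # initial values: no reward expected
b = np.array([1.0, 0.0, 0.0])        # certain we are in hidden state 0
b_next = np.array([0.0, 0.5, 0.5])   # ambiguous observation: states 1 and 2 equally likely
w, delta = td_update(w, b, b_next, r=1.0)
```

With all values initialized to zero, an unexpected reward of 1.0 yields a full-sized RPE (δ = 1.0), and the update spreads credit across hidden states in proportion to the belief, which is the key difference from classical TD over observable stimuli.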
format Online
Article
Text
id pubmed-5374025
institution National Center for Biotechnology Information
language English
publishDate 2017
record_format MEDLINE/PubMed
spelling pubmed-53740252017-09-06 Dopamine reward prediction errors reflect hidden state inference across time Starkweather, Clara Kwon Babayan, Benedicte M. Uchida, Naoshige Gershman, Samuel J. Nat Neurosci Article Midbrain dopamine neurons signal reward prediction error (RPE), or actual minus expected reward. The temporal difference (TD) learning model has been a cornerstone in understanding how dopamine RPEs could drive associative learning. Classically, TD learning imparts value to features that serially track elapsed time relative to observable stimuli. In the real world, however, sensory stimuli provide ambiguous information about the hidden state of the environment, leading to the proposal that TD learning might instead compute a value signal based on an inferred distribution of hidden states (a ‘belief state’). In this work, we asked whether dopaminergic signaling supports a TD learning framework that operates over hidden states. We found that dopamine signaling exhibited a striking difference between two tasks that differed only with respect to whether reward was delivered deterministically. Our results favor an associative learning rule that combines cached values with hidden state inference. 2017-03-06 2017-04 /pmc/articles/PMC5374025/ /pubmed/28263301 http://dx.doi.org/10.1038/nn.4520 Text en Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use:http://www.nature.com/authors/editorial_policies/license.html#terms
spellingShingle Article
Starkweather, Clara Kwon
Babayan, Benedicte M.
Uchida, Naoshige
Gershman, Samuel J.
Dopamine reward prediction errors reflect hidden state inference across time
title Dopamine reward prediction errors reflect hidden state inference across time
title_full Dopamine reward prediction errors reflect hidden state inference across time
title_fullStr Dopamine reward prediction errors reflect hidden state inference across time
title_full_unstemmed Dopamine reward prediction errors reflect hidden state inference across time
title_short Dopamine reward prediction errors reflect hidden state inference across time
title_sort dopamine reward prediction errors reflect hidden state inference across time
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374025/
https://www.ncbi.nlm.nih.gov/pubmed/28263301
http://dx.doi.org/10.1038/nn.4520
work_keys_str_mv AT starkweatherclarakwon dopaminerewardpredictionerrorsreflecthiddenstateinferenceacrosstime
AT babayanbenedictem dopaminerewardpredictionerrorsreflecthiddenstateinferenceacrosstime
AT uchidanaoshige dopaminerewardpredictionerrorsreflecthiddenstateinferenceacrosstime
AT gershmansamuelj dopaminerewardpredictionerrorsreflecthiddenstateinferenceacrosstime