
Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI

Prediction-error signals consistent with formal models of “reinforcement learning” (RL) have repeatedly been found within dopaminergic nuclei of the midbrain and dopaminoceptive areas of the striatum. However, the precise form of the RL algorithms implemented in the human brain is not yet well deter...


Bibliographic Details
Main authors: Colas, Jaron T., Pauli, Wolfgang M., Larsen, Tobias, Tyszka, J. Michael, O’Doherty, John P.
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5673235/
https://www.ncbi.nlm.nih.gov/pubmed/29049406
http://dx.doi.org/10.1371/journal.pcbi.1005810
_version_ 1783276569635061760
author Colas, Jaron T.
Pauli, Wolfgang M.
Larsen, Tobias
Tyszka, J. Michael
O’Doherty, John P.
collection PubMed
description Prediction-error signals consistent with formal models of “reinforcement learning” (RL) have repeatedly been found within dopaminergic nuclei of the midbrain and dopaminoceptive areas of the striatum. However, the precise form of the RL algorithms implemented in the human brain is not yet well determined. Here, we created a novel paradigm optimized to dissociate the subtypes of reward-prediction errors that function as the key computational signatures of two distinct classes of RL models—namely, “actor/critic” models and action-value-learning models (e.g., the Q-learning model). The state-value-prediction error (SVPE), which is independent of actions, is a hallmark of the actor/critic architecture, whereas the action-value-prediction error (AVPE) is the distinguishing feature of action-value-learning algorithms. To test for the presence of these prediction-error signals in the brain, we scanned human participants with a high-resolution functional magnetic-resonance imaging (fMRI) protocol optimized to enable measurement of neural activity in the dopaminergic midbrain as well as the striatal areas to which it projects. In keeping with the actor/critic model, the SVPE signal was detected in the substantia nigra. The SVPE was also clearly present in both the ventral striatum and the dorsal striatum. However, alongside these purely state-value-based computations we also found evidence for AVPE signals throughout the striatum. These high-resolution fMRI findings suggest that model-free aspects of reward learning in humans can be explained algorithmically with RL in terms of an actor/critic mechanism operating in parallel with a system for more direct action-value learning.
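The distinction the abstract draws between the two error signals can be illustrated with a minimal sketch. It assumes the standard textbook temporal-difference forms of the two errors, which may differ in detail from the models fitted in the article; the function names, the discount factor `gamma`, and all numeric values are hypothetical and chosen only for illustration.

```python
def state_value_pe(reward, v_current, v_next, gamma=1.0):
    """State-value prediction error (SVPE) of an actor/critic critic.

    Assumed textbook form: reward + gamma * V(s') - V(s).
    It does not depend on which action was taken.
    """
    return reward + gamma * v_next - v_current


def action_value_pe(reward, q_current, q_next, action, gamma=1.0):
    """Action-value prediction error (AVPE) in the Q-learning form.

    Assumed textbook form: reward + gamma * max_a' Q(s', a') - Q(s, a).
    It depends on the value of the chosen action.
    """
    return reward + gamma * max(q_next) - q_current[action]


# Hypothetical single trial: two actions, then a terminal state worth 0.
reward = 1.0
v_current = 0.5                            # critic's value of the current state
q_current = {"left": 0.2, "right": 0.8}    # action values in the same state
q_next = [0.0]                             # terminal state, nothing more to gain

# One SVPE for the state, whichever action was chosen.
svpe = state_value_pe(reward, v_current, v_next=0.0)

# A different AVPE for each action, given the same reward.
avpe_left = action_value_pe(reward, q_current, q_next, "left")
avpe_right = action_value_pe(reward, q_current, q_next, "right")
```

With identical outcomes, the SVPE is the same for either choice, whereas the AVPE is larger for the action that was valued less. This is the kind of dissociation that a paradigm separating the two signals has to exploit.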
format Online
Article
Text
id pubmed-5673235
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5673235 2017-11-18 PLoS Comput Biol Research Article Public Library of Science 2017-10-19 /pmc/articles/PMC5673235/ /pubmed/29049406 http://dx.doi.org/10.1371/journal.pcbi.1005810 Text en © 2017 Colas et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI
topic Research Article