Speed/Accuracy Trade-Off between the Habitual and the Goal-Directed Processes

Instrumental responses are hypothesized to be of two kinds: habitual and goal-directed, mediated by the sensorimotor and the associative cortico-basal ganglia circuits, respectively. The existence of the two heterogeneous associative learning mechanisms can be hypothesized to arise from the comparat...

Bibliographic Details
Main Authors: Keramati, Mehdi; Dezfouli, Amir; Piray, Payam
Format: Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3102758/
https://www.ncbi.nlm.nih.gov/pubmed/21637741
http://dx.doi.org/10.1371/journal.pcbi.1002055
_version_ 1782204426893656064
author Keramati, Mehdi
Dezfouli, Amir
Piray, Payam
collection PubMed
description Instrumental responses are hypothesized to be of two kinds: habitual and goal-directed, mediated by the sensorimotor and the associative cortico-basal ganglia circuits, respectively. The existence of the two heterogeneous associative learning mechanisms can be hypothesized to arise from the comparative advantages that they have at different stages of learning. In this paper, we assume that the goal-directed system is behaviourally flexible, but slow in choice selection. The habitual system, in contrast, is fast in responding, but inflexible in adapting its behavioural strategy to new conditions. Based on these assumptions and using the computational theory of reinforcement learning, we propose a normative model for arbitration between the two processes that makes an approximately optimal balance between search-time and accuracy in decision making. Behaviourally, the model can explain experimental evidence on behavioural sensitivity to outcome at the early stages of learning, but insensitivity at the later stages. It also explains that when two choices with equal incentive values are available concurrently, the behaviour remains outcome-sensitive, even after extensive training. Moreover, the model can explain choice reaction time variations during the course of learning, as well as the experimental observation that as the number of choices increases, the reaction time also increases. Neurobiologically, by assuming that phasic and tonic activities of midbrain dopamine neurons carry the reward prediction error and the average reward signals used by the model, respectively, the model predicts that whereas phasic dopamine indirectly affects behaviour through reinforcing stimulus-response associations, tonic dopamine can directly affect behaviour through manipulating the competition between the habitual and the goal-directed systems and thus, affect reaction time.
format Text
id pubmed-3102758
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3102758 2011-06-02 Speed/Accuracy Trade-Off between the Habitual and the Goal-Directed Processes Keramati, Mehdi Dezfouli, Amir Piray, Payam PLoS Comput Biol Research Article Public Library of Science 2011-05-26 /pmc/articles/PMC3102758/ /pubmed/21637741 http://dx.doi.org/10.1371/journal.pcbi.1002055 Text en Keramati et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Speed/Accuracy Trade-Off between the Habitual and the Goal-Directed Processes
topic Research Article
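
The description field of this record summarizes an arbitration rule that balances search time against accuracy. The toy sketch below is one way to picture that trade-off; it is not the authors' implementation. It assumes, purely for illustration, that the habitual system stores a Gaussian belief (mean and standard deviation) about each action's value, that the benefit of deliberating is the expected improvement from learning an action's true value, and that the cost of deliberating is the average reward rate multiplied by the deliberation time. All function and parameter names are hypothetical.

```python
import math


def expected_gain(mean, std, threshold):
    """E[max(x - threshold, 0)] for x ~ Normal(mean, std) (assumed belief shape)."""
    if std <= 0.0:
        return max(mean - threshold, 0.0)
    z = (mean - threshold) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - threshold) * cdf + std * pdf


def value_of_information(means, stds, action):
    """Expected improvement in the choice if the true value of `action` were known.

    Requires at least two actions.
    """
    order = sorted(range(len(means)), key=lambda a: means[a], reverse=True)
    best, runner_up = order[0], order[1]
    if action == best:
        # Knowing the truth helps only if the favourite turns out worse than
        # the runner-up: E[max(runner_up_mean - x, 0)], written via symmetry.
        return expected_gain(-means[best], stds[best], -means[runner_up])
    # For any other action, it helps only if that action beats the favourite.
    return expected_gain(means[action], stds[action], means[best])


def arbitrate(means, stds, average_reward, deliberation_time):
    """Return which system controls the choice under the assumed cost/benefit rule."""
    cost = average_reward * deliberation_time  # reward forgone while deliberating
    worth_it = any(
        value_of_information(means, stds, a) > cost for a in range(len(means))
    )
    return "goal-directed" if worth_it else "habitual"


# Early in learning (wide beliefs) versus after extensive training (narrow beliefs).
early = arbitrate([1.0, 0.8], [1.0, 1.0], average_reward=0.1, deliberation_time=1.0)
late = arbitrate([1.0, 0.8], [0.01, 0.01], average_reward=0.1, deliberation_time=1.0)
```

Under these assumptions, wide beliefs make deliberation worth its time cost, while narrow beliefs hand control to the fast habitual response. Raising `average_reward` raises the cost of deliberating, which loosely mirrors the description's point that the average reward signal can shift the competition between the two systems.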