
Combined model-free and model-sensitive reinforcement learning in non-human primates

Contemporary reinforcement learning (RL) theory suggests that potential choices can be evaluated by strategies that may or may not be sensitive to the computational structure of tasks. A paradigmatic model-free (MF) strategy simply repeats actions that have been rewarded in the past; by contrast, model-sensitive (MS) strategies exploit richer information associated with knowledge of task dynamics. MF and MS strategies should typically be combined, because they have complementary statistical and computational strengths; however, this tradeoff between MF and MS RL has mostly been demonstrated only in humans, often with only modest numbers of trials. We trained rhesus monkeys to perform a two-stage decision task designed to elicit and discriminate the use of MF and MS methods. A descriptive analysis of choice behaviour revealed directly that the structure of the task (of MS importance) and the reward history (of MF and MS importance) significantly influenced both choice and response vigour. A detailed, trial-by-trial computational analysis confirmed that choices were made according to a combination of strategies, with a dominant influence of a particular form of model sensitivity that persisted over weeks of testing. The residuals from this model necessitated development of a new combined RL model that incorporates a particular credit-assignment weighting procedure. Finally, response vigour exhibited a subtly different collection of MF and MS influences. These results shed new light on RL behavioural processes in non-human primates.
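The distinction between the two strategy families can be made concrete with a small simulation. Below is a minimal illustrative sketch in Python of a hybrid learner on a Daw-style two-stage task; it is not the authors' fitted model, and all parameter values (ALPHA, BETA, W, the transition and reward probabilities) are assumptions chosen for illustration. The model-free values reinforce whichever first-stage action was rewarded, while the model-sensitive values propagate learned second-stage values back through the known transition structure; choices follow a weighted mixture of the two.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

N_TRIALS = 1000
ALPHA = 0.2   # learning rate for both value estimates (illustrative)
BETA = 5.0    # softmax inverse temperature (illustrative)
W = 0.5       # weight on the model-sensitive component (illustrative)

# Two first-stage actions lead to two second-stage states via a known
# transition structure: action i reaches state i with probability 0.7
# ("common") and the other state with probability 0.3 ("rare").
P_COMMON = 0.7
REWARD_PROB = np.array([0.8, 0.2])  # chance of reward in each second-stage state

q_mf = np.zeros(2)      # model-free values of the first-stage actions
v_stage2 = np.zeros(2)  # learned values of the second-stage states

def softmax(q, beta):
    """Convert action values into choice probabilities."""
    e = np.exp(beta * (q - q.max()))
    return e / e.sum()

for _ in range(N_TRIALS):
    # Model-sensitive values: propagate the second-stage state values
    # back through the known transition probabilities.
    q_ms = np.array([
        P_COMMON * v_stage2[0] + (1 - P_COMMON) * v_stage2[1],
        P_COMMON * v_stage2[1] + (1 - P_COMMON) * v_stage2[0],
    ])
    # Choices follow a weighted mixture of the two strategies.
    q_net = W * q_ms + (1 - W) * q_mf

    action = rng.choice(2, p=softmax(q_net, BETA))
    state2 = action if rng.random() < P_COMMON else 1 - action
    reward = float(rng.random() < REWARD_PROB[state2])

    # Model-free update: reward directly reinforces the chosen action,
    # regardless of whether the transition was common or rare.
    q_mf[action] += ALPHA * (reward - q_mf[action])
    # Second-stage update feeds the model-sensitive estimate on later trials.
    v_stage2[state2] += ALPHA * (reward - v_stage2[state2])
```

In a hybrid of this kind, a reward obtained after a rare transition pushes the MF and MS components of the first-stage values in opposite directions, which is what allows descriptive and trial-by-trial analyses of choice behaviour to discriminate the two influences.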

Bibliographic Details
Main Authors: Miranda, Bruno, Malalasekera, W. M. Nishantha, Behrens, Timothy E., Dayan, Peter, Kennerley, Steven W.
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7332075/
https://www.ncbi.nlm.nih.gov/pubmed/32569311
http://dx.doi.org/10.1371/journal.pcbi.1007944
author Miranda, Bruno
Malalasekera, W. M. Nishantha
Behrens, Timothy E.
Dayan, Peter
Kennerley, Steven W.
collection PubMed
description Contemporary reinforcement learning (RL) theory suggests that potential choices can be evaluated by strategies that may or may not be sensitive to the computational structure of tasks. A paradigmatic model-free (MF) strategy simply repeats actions that have been rewarded in the past; by contrast, model-sensitive (MS) strategies exploit richer information associated with knowledge of task dynamics. MF and MS strategies should typically be combined, because they have complementary statistical and computational strengths; however, this tradeoff between MF and MS RL has mostly been demonstrated only in humans, often with only modest numbers of trials. We trained rhesus monkeys to perform a two-stage decision task designed to elicit and discriminate the use of MF and MS methods. A descriptive analysis of choice behaviour revealed directly that the structure of the task (of MS importance) and the reward history (of MF and MS importance) significantly influenced both choice and response vigour. A detailed, trial-by-trial computational analysis confirmed that choices were made according to a combination of strategies, with a dominant influence of a particular form of model sensitivity that persisted over weeks of testing. The residuals from this model necessitated development of a new combined RL model that incorporates a particular credit-assignment weighting procedure. Finally, response vigour exhibited a subtly different collection of MF and MS influences. These results shed new light on RL behavioural processes in non-human primates.
format Online
Article
Text
id pubmed-7332075
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7332075 2020-07-15 Combined model-free and model-sensitive reinforcement learning in non-human primates. Miranda, Bruno; Malalasekera, W. M. Nishantha; Behrens, Timothy E.; Dayan, Peter; Kennerley, Steven W. PLoS Comput Biol, Research Article. Public Library of Science 2020-06-22. /pmc/articles/PMC7332075/ /pubmed/32569311 http://dx.doi.org/10.1371/journal.pcbi.1007944 Text en © 2020 Miranda et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Combined model-free and model-sensitive reinforcement learning in non-human primates
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7332075/
https://www.ncbi.nlm.nih.gov/pubmed/32569311
http://dx.doi.org/10.1371/journal.pcbi.1007944