
The actions of others act as a pseudo-reward to drive imitation in the context of social reinforcement learning

While there is no doubt that social signals affect human reinforcement learning, there is still no consensus about how this process is computationally implemented. To address this issue, we compared three psychologically plausible hypotheses about the algorithmic implementation of imitation in reinf...

Bibliographic Details
Main Authors: Najar, Anis; Bonnet, Emmanuelle; Bahrami, Bahador; Palminteri, Stefano
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7723279/
https://www.ncbi.nlm.nih.gov/pubmed/33290387
http://dx.doi.org/10.1371/journal.pbio.3001028
author Najar, Anis
Bonnet, Emmanuelle
Bahrami, Bahador
Palminteri, Stefano
collection PubMed
description While there is no doubt that social signals affect human reinforcement learning, there is still no consensus about how this process is computationally implemented. To address this issue, we compared three psychologically plausible hypotheses about the algorithmic implementation of imitation in reinforcement learning. The first hypothesis, decision biasing (DB), postulates that imitation consists in transiently biasing the learner’s action selection without affecting their value function. According to the second hypothesis, model-based imitation (MB), the learner infers the demonstrator’s value function through inverse reinforcement learning and uses it to bias action selection. Finally, according to the third hypothesis, value shaping (VS), the demonstrator’s actions directly affect the learner’s value function. We tested these three hypotheses in 2 experiments (N = 24 and N = 44) featuring a new variant of a social reinforcement learning task. We show through model comparison and model simulation that VS provides the best explanation of learner’s behavior. Results replicated in a third independent experiment featuring a larger cohort and a different design (N = 302). In our experiments, we also manipulated the quality of the demonstrators’ choices and found that learners were able to adapt their imitation rate, so that only skilled demonstrators were imitated. We proposed and tested an efficient meta-learning process to account for this effect, where imitation is regulated by the agreement between the learner and the demonstrator. In sum, our findings provide new insights and perspectives on the computational mechanisms underlying adaptive imitation in human reinforcement learning.
format Online
Article
Text
id pubmed-7723279
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7723279 2020-12-16 PLoS Biol Research Article
Public Library of Science 2020-12-08 /pmc/articles/PMC7723279/ /pubmed/33290387 http://dx.doi.org/10.1371/journal.pbio.3001028 Text en © 2020 Najar et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title The actions of others act as a pseudo-reward to drive imitation in the context of social reinforcement learning
topic Research Article