
Non-action Learning: Saving Action-Associated Cost Serves as a Covert Reward


Bibliographic Details
Main Authors: Tanimoto, Sai; Kondo, Masashi; Morita, Kenji; Yoshida, Eriko; Matsuzaki, Masanori
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2020
Subjects: Neuroscience
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7498735/
https://www.ncbi.nlm.nih.gov/pubmed/33100979
http://dx.doi.org/10.3389/fnbeh.2020.00141
author Tanimoto, Sai
Kondo, Masashi
Morita, Kenji
Yoshida, Eriko
Matsuzaki, Masanori
collection PubMed
description “To do or not to do” is a fundamental decision that has to be made in daily life. Behaviors related to multiple “to do” choice tasks have long been explained by reinforcement learning, and “to do or not to do” tasks such as the go/no-go task have also been recently discussed within the framework of reinforcement learning. In this learning framework, alternative actions and/or the non-action to take are determined by evaluating explicitly given (overt) reward and punishment. However, we assume that there are real life cases in which an action/non-action is repeated, even though there is no obvious reward or punishment, because implicitly given outcomes such as saving physical energy and regret (we refer to this as “covert reward”) can affect the decision-making. In the current task, mice chose to pull a lever or not according to two tone cues assigned with different water reward probabilities (70% and 30% in condition 1, and 30% and 10% in condition 2). As the mice learned, the probability that they would choose to pull the lever decreased (<0.25) in trials with a 30% reward probability cue (30% cue) in condition 1, and in trials with a 10% cue in condition 2, but increased (>0.8) in trials with a 70% cue in condition 1 and a 30% cue in condition 2, even though a non-pull was followed by neither an overt reward nor avoidance of overt punishment in any trial. This behavioral tendency was not well explained by a combination of commonly used Q-learning models, which take only the action choice with an overt reward outcome into account. Instead, we found that the non-action preference of the mice was best explained by Q-learning models, which regarded the non-action as the other choice, and updated non-action values with a covert reward. We propose that “doing nothing” can be actively chosen as an alternative to “doing something,” and that a covert reward could serve as a reinforcer of “doing nothing.”
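The description above summarizes a reinforcement-learning account in which the non-action ("no-pull") is treated as an explicit second choice whose value is updated with a covert reward. As a rough illustration only (not the authors' fitted model; the parameter values, the softmax choice rule, and the exact covert-reward and action-cost terms are assumptions made for this sketch), a minimal Q-learning agent of that kind could look like this in Python:

```python
# Minimal sketch of the model class described above: Q-learning in which the
# non-action ("no_pull") is a second choice reinforced by a covert reward.
# All parameter values below are illustrative assumptions, not fitted values.
import math
import random

ALPHA = 0.1           # learning rate (assumed)
BETA = 5.0            # softmax inverse temperature (assumed)
COVERT_REWARD = 0.3   # covert reward for "no_pull", e.g., saved action cost (assumed)
ACTION_COST = 0.15    # effort cost subtracted when the lever is pulled (assumed)

# Condition 1 of the task: two tone cues with 70% and 30% water-reward probability.
REWARD_PROB = {"cue70": 0.70, "cue30": 0.30}

# One Q-value per cue and per choice; the non-action has its own value.
Q = {cue: {"pull": 0.0, "no_pull": 0.0} for cue in REWARD_PROB}

def p_pull(cue):
    """Softmax probability of choosing to pull given the current Q-values."""
    q = Q[cue]
    return 1.0 / (1.0 + math.exp(-BETA * (q["pull"] - q["no_pull"])))

def run_trial(cue):
    choice = "pull" if random.random() < p_pull(cue) else "no_pull"
    if choice == "pull":
        # Overt outcome: probabilistic water reward minus the action cost.
        outcome = (1.0 if random.random() < REWARD_PROB[cue] else 0.0) - ACTION_COST
    else:
        # Non-action yields no overt reward, only the covert reward.
        outcome = COVERT_REWARD
    # Standard delta-rule update applied to whichever option was chosen.
    Q[cue][choice] += ALPHA * (outcome - Q[cue][choice])

for _ in range(5000):
    run_trial(random.choice(list(REWARD_PROB)))

for cue in REWARD_PROB:
    print(f"{cue}: P(pull) ~ {p_pull(cue):.2f}")
```

With these illustrative parameters the agent ends up pulling much more often for the 70% cue than for the 30% cue, which is the qualitative direction of the behavior described in the abstract; a variant that never updates the "no_pull" value would not show this non-action preference.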
format Online
Article
Text
id pubmed-7498735
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7498735 2020-10-22 Non-action Learning: Saving Action-Associated Cost Serves as a Covert Reward Front Behav Neurosci Neuroscience Frontiers Media S.A. 2020-09-04 /pmc/articles/PMC7498735/ /pubmed/33100979 http://dx.doi.org/10.3389/fnbeh.2020.00141 Text en Copyright © 2020 Tanimoto, Kondo, Morita, Yoshida and Matsuzaki. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Non-action Learning: Saving Action-Associated Cost Serves as a Covert Reward
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7498735/
https://www.ncbi.nlm.nih.gov/pubmed/33100979
http://dx.doi.org/10.3389/fnbeh.2020.00141