Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules

Bibliographic Details
Main Authors: La Camera, Giancarlo, Richmond, Barry J.
Format: Text
Language: English
Published: Public Library of Science 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2453237/
https://www.ncbi.nlm.nih.gov/pubmed/18688266
http://dx.doi.org/10.1371/journal.pcbi.1000131
_version_ 1782157366579429376
author La Camera, Giancarlo
Richmond, Barry J.
author_facet La Camera, Giancarlo
Richmond, Barry J.
author_sort La Camera, Giancarlo
collection PubMed
description It is often assumed that animals and people adjust their behavior to maximize reward acquisition. In visually cued reinforcement schedules, monkeys make errors in trials that are not immediately rewarded, despite having to repeat error trials. Here we show that error rates are typically smaller in trials equally distant from reward but belonging to longer schedules (referred to as “schedule length effect”). This violates the principles of reward maximization and invariance and cannot be predicted by the standard methods of Reinforcement Learning, such as the method of temporal differences. We develop a heuristic model that accounts for all of the properties of the behavior in the reinforcement schedule task but whose predictions are not different from those of the standard temporal difference model in choice tasks. In the modification of temporal difference learning introduced here, the effect of schedule length emerges spontaneously from the sensitivity to the immediately preceding trial. We also introduce a policy for general Markov Decision Processes, where the decision made at each node is conditioned on the motivation to perform an instrumental action, and show that the application of our model to the reinforcement schedule task and the choice task are special cases of this general theoretical framework. Within this framework, Reinforcement Learning can approach contextual learning with the mixture of empirical findings and principled assumptions that seem to coexist in the best descriptions of animal behavior. As examples, we discuss two phenomena observed in humans that often derive from the violation of the principle of invariance: “framing,” wherein equivalent options are treated differently depending on the context in which they are presented, and the “sunk cost” effect, the greater tendency to continue an endeavor once an investment in money, effort, or time has been made. The schedule length effect might be a manifestation of these phenomena in monkeys.
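The description above contrasts the observed behavior with the standard method of temporal differences. As a point of reference, what follows is a minimal sketch in Python of a plain TD(0) value update applied to a cued reinforcement schedule; the schedule lengths, learning rate, discount factor, and state encoding are illustrative assumptions, and the authors' motivation-sensitive modification is not reproduced here. The sketch shows why the standard baseline cannot produce the schedule length effect: states equally distant from reward converge to the same value regardless of schedule length.

import random

ALPHA, GAMMA = 0.1, 0.9    # learning rate and discount factor (assumed values)
SCHEDULES = [1, 2, 3]      # schedule lengths, i.e., trials remaining until reward
V = {}                     # value table keyed by (schedule_length, trials_to_reward)

def td0_schedule(length):
    # Run one schedule of `length` trials; reward arrives only on the last trial.
    for trials_left in range(length, 0, -1):
        s = (length, trials_left)
        r = 1.0 if trials_left == 1 else 0.0
        v_next = 0.0 if trials_left == 1 else V.get((length, trials_left - 1), 0.0)
        # Standard TD(0) update: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
        V[s] = V.get(s, 0.0) + ALPHA * (r + GAMMA * v_next - V.get(s, 0.0))

for _ in range(2000):
    td0_schedule(random.choice(SCHEDULES))

# States equally distant from reward converge to the same value irrespective of
# schedule length, e.g., V[(2, 1)] approaches V[(3, 1)]; error rates driven by
# such values therefore cannot differ across schedules, unlike the behavior
# reported in the article.
print(sorted(V.items()))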
format Text
id pubmed-2453237
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2453237 2008-08-08 Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules La Camera, Giancarlo Richmond, Barry J. PLoS Comput Biol Research Article It is often assumed that animals and people adjust their behavior to maximize reward acquisition. In visually cued reinforcement schedules, monkeys make errors in trials that are not immediately rewarded, despite having to repeat error trials. Here we show that error rates are typically smaller in trials equally distant from reward but belonging to longer schedules (referred to as “schedule length effect”). This violates the principles of reward maximization and invariance and cannot be predicted by the standard methods of Reinforcement Learning, such as the method of temporal differences. We develop a heuristic model that accounts for all of the properties of the behavior in the reinforcement schedule task but whose predictions are not different from those of the standard temporal difference model in choice tasks. In the modification of temporal difference learning introduced here, the effect of schedule length emerges spontaneously from the sensitivity to the immediately preceding trial. We also introduce a policy for general Markov Decision Processes, where the decision made at each node is conditioned on the motivation to perform an instrumental action, and show that the application of our model to the reinforcement schedule task and the choice task are special cases of this general theoretical framework. Within this framework, Reinforcement Learning can approach contextual learning with the mixture of empirical findings and principled assumptions that seem to coexist in the best descriptions of animal behavior. As examples, we discuss two phenomena observed in humans that often derive from the violation of the principle of invariance: “framing,” wherein equivalent options are treated differently depending on the context in which they are presented, and the “sunk cost” effect, the greater tendency to continue an endeavor once an investment in money, effort, or time has been made. The schedule length effect might be a manifestation of these phenomena in monkeys. Public Library of Science 2008-08-08 /pmc/articles/PMC2453237/ /pubmed/18688266 http://dx.doi.org/10.1371/journal.pcbi.1000131 Text en This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration, which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. https://creativecommons.org/publicdomain/zero/1.0/
spellingShingle Research Article
La Camera, Giancarlo
Richmond, Barry J.
Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules
title Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules
title_full Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules
title_fullStr Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules
title_full_unstemmed Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules
title_short Modeling the Violation of Reward Maximization and Invariance in Reinforcement Schedules
title_sort modeling the violation of reward maximization and invariance in reinforcement schedules
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2453237/
https://www.ncbi.nlm.nih.gov/pubmed/18688266
http://dx.doi.org/10.1371/journal.pcbi.1000131
work_keys_str_mv AT lacameragiancarlo modelingtheviolationofrewardmaximizationandinvarianceinreinforcementschedules
AT richmondbarryj modelingtheviolationofrewardmaximizationandinvarianceinreinforcementschedules