Reinforcement Learning Model With Dynamic State Space Tested on Target Search Tasks for Monkeys: Self-Determination of Previous States Based on Experience Saturation and Decision Uniqueness

Bibliographic Details
Main Authors: Katakura, Tokio, Yoshida, Mikihiro, Hisano, Haruki, Mushiake, Hajime, Sakamoto, Kazuhiro
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Neuroscience
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8855153/
https://www.ncbi.nlm.nih.gov/pubmed/35185502
http://dx.doi.org/10.3389/fncom.2021.784592
author Katakura, Tokio
Yoshida, Mikihiro
Hisano, Haruki
Mushiake, Hajime
Sakamoto, Kazuhiro
collection PubMed
description The real world is essentially an indefinite environment in which the probability space, i.e., what can happen, cannot be specified in advance. Conventional reinforcement learning models that learn under uncertain conditions are given the state space as prior knowledge. Here, we developed a reinforcement learning model with a dynamic state space and tested it on a two-target search task previously used for monkeys. In the task, two out of four neighboring spots were alternately correct, and the valid pair was switched after consecutive correct trials in the exploitation phase. The agent was required to find a new pair during the exploration phase, but it could not obtain the maximum reward by referring only to the single previous trial; it needed to select an action based on the two previous trials. To adapt to this task structure without prior knowledge, the model expanded its state space so that it referred to more than one trial as the previous state, based on two explicit criteria for the appropriateness of state expansion: experience saturation and decision uniqueness of action selection. The model not only performed comparably to the ideal model given prior knowledge of the task structure, but also performed well on a task that was not envisioned when the models were developed. Moreover, it learned how to search rationally without falling into the exploration–exploitation trade-off. For constructing a learning model that can adapt to an indefinite environment, the method of expanding the state space based on experience saturation and decision uniqueness of action selection, as used by our model, is promising.
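The description above states the expansion rule only verbally. As an illustration of the idea, below is a minimal sketch: a Q-learning agent on a toy version of the two-target task that lengthens its state history whenever a state has been sampled to saturation yet still lacks a uniquely best action. This is not the authors' implementation; the thresholds SATURATION_N and UNIQUENESS_GAP, the single global history depth (the full model expands states individually), the simplified task dynamics, and all identifiers are hypothetical assumptions for illustration.

```python
import random
from collections import defaultdict, deque

ACTIONS = (0, 1, 2, 3)       # four neighboring spots
ALPHA, EPSILON = 0.1, 0.1    # learning rate, exploration rate
SATURATION_N = 50            # hypothetical experience-saturation count
UNIQUENESS_GAP = 0.05        # hypothetical margin for a "unique" best action
MAX_DEPTH = 3                # cap on how many past trials a state may span

Q = defaultdict(lambda: dict.fromkeys(ACTIONS, 0.0))  # state -> action values
visits = defaultdict(int)    # how often each state has been experienced
depth = 1                    # how many past trials define "the previous state"

def state_of(history, d):
    """Encode the last d (action, reward) pairs as the current state."""
    return tuple(history)[-d:]

def best_is_unique(s):
    """Decision uniqueness: the top action clearly beats the runner-up."""
    qs = sorted(Q[s].values(), reverse=True)
    return qs[0] - qs[1] >= UNIQUENESS_GAP

def act(s):
    """Epsilon-greedy action selection."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(Q[s], key=Q[s].get)

class TwoTargetTask:
    """Toy stand-in for the monkeys' task: within the valid pair the two
    spots are alternately correct; the pair switches after a streak of
    consecutive correct trials, forcing a new exploration phase."""
    PAIRS = [(0, 1), (1, 2), (2, 3), (3, 0)]

    def __init__(self, switch_after=7):
        self.pair = random.choice(self.PAIRS)
        self.idx = 0         # which member of the pair is correct now
        self.streak = 0
        self.switch_after = switch_after

    def step(self, action):
        correct = action == self.pair[self.idx]
        if correct:
            self.idx = 1 - self.idx  # the correct target alternates
            self.streak += 1
            if self.streak == self.switch_after:
                self.pair = random.choice([p for p in self.PAIRS if p != self.pair])
                self.idx, self.streak = 0, 0
        else:
            self.streak = 0
        return 1.0 if correct else 0.0

task = TwoTargetTask()
history = deque([(0, 0.0)] * MAX_DEPTH, maxlen=MAX_DEPTH)  # dummy seed trials

for trial in range(20000):
    s = state_of(history, depth)
    a = act(s)
    r = task.step(a)
    Q[s][a] += ALPHA * (r - Q[s][a])  # one-step bandit-style update
    visits[s] += 1
    history.append((a, r))
    # Expansion rule: experience saturation WITHOUT decision uniqueness means
    # the current state description is too coarse -> look one trial further back.
    if depth < MAX_DEPTH and visits[s] >= SATURATION_N and not best_is_unique(s):
        depth += 1
```

Under this rule the sketch agent starts by conditioning on a single past trial and, once that representation proves insufficient (saturated experience but no uniquely best action), begins conditioning on two past trials, which is the depth the two-target task demands.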
format Online
Article
Text
id pubmed-8855153
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8855153 2022-02-19 Front Comput Neurosci Neuroscience Frontiers Media S.A. 2022-02-04 /pmc/articles/PMC8855153/ /pubmed/35185502 http://dx.doi.org/10.3389/fncom.2021.784592 Text en Copyright © 2022 Katakura, Yoshida, Hisano, Mushiake and Sakamoto. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Reinforcement Learning Model With Dynamic State Space Tested on Target Search Tasks for Monkeys: Self-Determination of Previous States Based on Experience Saturation and Decision Uniqueness
topic Neuroscience