Meta-reinforcement learning via orbitofrontal cortex
The meta-reinforcement learning (meta-RL) framework, which involves RL over multiple timescales, has been successful in training deep RL models that generalize to new environments. It has been hypothesized that the prefrontal cortex may mediate meta-RL in the brain, but the evidence is scarce. Here...
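Main Authors: | Hattori, Ryoma; Hedrick, Nathan G.; Jain, Anant; Chen, Shuqi; You, Hanjia; Hattori, Mariko; Choi, Jun-Hyeok; Lim, Byung Kook; Yasuda, Ryohei; Komiyama, Takaki |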
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group US 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10689244/ https://www.ncbi.nlm.nih.gov/pubmed/37957318 http://dx.doi.org/10.1038/s41593-023-01485-3 |
_version_ | 1785152330366189568 |
---|---|
author | Hattori, Ryoma Hedrick, Nathan G. Jain, Anant Chen, Shuqi You, Hanjia Hattori, Mariko Choi, Jun-Hyeok Lim, Byung Kook Yasuda, Ryohei Komiyama, Takaki |
author_facet | Hattori, Ryoma Hedrick, Nathan G. Jain, Anant Chen, Shuqi You, Hanjia Hattori, Mariko Choi, Jun-Hyeok Lim, Byung Kook Yasuda, Ryohei Komiyama, Takaki |
author_sort | Hattori, Ryoma |
collection | PubMed |
description | The meta-reinforcement learning (meta-RL) framework, which involves RL over multiple timescales, has been successful in training deep RL models that generalize to new environments. It has been hypothesized that the prefrontal cortex may mediate meta-RL in the brain, but the evidence is scarce. Here we show that the orbitofrontal cortex (OFC) mediates meta-RL. We trained mice and deep RL models on a probabilistic reversal learning task across sessions during which they improved their trial-by-trial RL policy through meta-learning. Ca(2+)/calmodulin-dependent protein kinase II-dependent synaptic plasticity in OFC was necessary for this meta-learning but not for the within-session trial-by-trial RL in experts. After meta-learning, OFC activity robustly encoded value signals, and OFC inactivation impaired the RL behaviors. Longitudinal tracking of OFC activity revealed that meta-learning gradually shapes population value coding to guide the ongoing behavioral policy. Our results indicate that two distinct RL algorithms with distinct neural mechanisms and timescales coexist in OFC to support adaptive decision-making. |
format | Online Article Text |
id | pubmed-10689244 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group US |
record_format | MEDLINE/PubMed |
spelling | pubmed-10689244 2023-12-02 Meta-reinforcement learning via orbitofrontal cortex Hattori, Ryoma Hedrick, Nathan G. Jain, Anant Chen, Shuqi You, Hanjia Hattori, Mariko Choi, Jun-Hyeok Lim, Byung Kook Yasuda, Ryohei Komiyama, Takaki Nat Neurosci Article The meta-reinforcement learning (meta-RL) framework, which involves RL over multiple timescales, has been successful in training deep RL models that generalize to new environments. It has been hypothesized that the prefrontal cortex may mediate meta-RL in the brain, but the evidence is scarce. Here we show that the orbitofrontal cortex (OFC) mediates meta-RL. We trained mice and deep RL models on a probabilistic reversal learning task across sessions during which they improved their trial-by-trial RL policy through meta-learning. Ca(2+)/calmodulin-dependent protein kinase II-dependent synaptic plasticity in OFC was necessary for this meta-learning but not for the within-session trial-by-trial RL in experts. After meta-learning, OFC activity robustly encoded value signals, and OFC inactivation impaired the RL behaviors. Longitudinal tracking of OFC activity revealed that meta-learning gradually shapes population value coding to guide the ongoing behavioral policy. Our results indicate that two distinct RL algorithms with distinct neural mechanisms and timescales coexist in OFC to support adaptive decision-making. Nature Publishing Group US 2023-11-13 2023 /pmc/articles/PMC10689244/ /pubmed/37957318 http://dx.doi.org/10.1038/s41593-023-01485-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Hattori, Ryoma Hedrick, Nathan G. Jain, Anant Chen, Shuqi You, Hanjia Hattori, Mariko Choi, Jun-Hyeok Lim, Byung Kook Yasuda, Ryohei Komiyama, Takaki Meta-reinforcement learning via orbitofrontal cortex |
title | Meta-reinforcement learning via orbitofrontal cortex |
title_full | Meta-reinforcement learning via orbitofrontal cortex |
title_fullStr | Meta-reinforcement learning via orbitofrontal cortex |
title_full_unstemmed | Meta-reinforcement learning via orbitofrontal cortex |
title_short | Meta-reinforcement learning via orbitofrontal cortex |
title_sort | meta-reinforcement learning via orbitofrontal cortex |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10689244/ https://www.ncbi.nlm.nih.gov/pubmed/37957318 http://dx.doi.org/10.1038/s41593-023-01485-3 |
work_keys_str_mv | AT hattoriryoma metareinforcementlearningviaorbitofrontalcortex AT hedricknathang metareinforcementlearningviaorbitofrontalcortex AT jainanant metareinforcementlearningviaorbitofrontalcortex AT chenshuqi metareinforcementlearningviaorbitofrontalcortex AT youhanjia metareinforcementlearningviaorbitofrontalcortex AT hattorimariko metareinforcementlearningviaorbitofrontalcortex AT choijunhyeok metareinforcementlearningviaorbitofrontalcortex AT limbyungkook metareinforcementlearningviaorbitofrontalcortex AT yasudaryohei metareinforcementlearningviaorbitofrontalcortex AT komiyamatakaki metareinforcementlearningviaorbitofrontalcortex |