Enhancing reinforcement learning models by including direct and indirect pathways improves performance on striatal dependent tasks
A major advance in understanding learning behavior stems from experiments showing that reward learning requires dopamine inputs to striatal neurons and arises from synaptic plasticity of cortico-striatal synapses. Numerous reinforcement learning models mimic this dopamine-dependent synaptic plasticity by using the reward prediction error, which resembles dopamine neuron firing, to learn the best action in response to a set of cues. Though these models can explain many facets of behavior, reproducing some types of goal-directed behavior, such as renewal and reversal, requires additional model components. Here we present a reinforcement learning model, TD2Q, which better corresponds to the basal ganglia, with two Q matrices, one representing direct-pathway neurons (G) and another representing indirect-pathway neurons (N). Unlike previous two-Q architectures, a novel and critical aspect of TD2Q is that the G and N matrices are updated using the temporal difference reward prediction error. A best action is selected for N and G using a softmax with a reward-dependent adaptive exploration parameter, and differences between the two are then resolved by a second selection step applied to the two action probabilities. The model is tested on a range of multi-step tasks, including extinction, renewal, and discrimination; learning with switching reward probabilities; and sequence learning. Simulations show that TD2Q produces behaviors similar to those of rodents in choice and sequence learning tasks, and that use of the temporal difference reward prediction error is required to learn multi-step tasks. Blocking the update rule on the N matrix blocks discrimination learning, as observed experimentally. Performance in the sequence learning task is dramatically improved with two matrices. These results suggest that including additional aspects of basal ganglia physiology can improve the performance of reinforcement learning models, better reproduce animal behaviors, and provide insight into the roles of direct- and indirect-pathway striatal neurons.
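The abstract specifies the core TD2Q computations in prose: one shared temporal-difference reward prediction error updates both the direct-pathway (G) and indirect-pathway (N) matrices, and actions are chosen by a two-stage softmax procedure. A minimal Python sketch of that loop follows; the class name TD2QSketch, the parameter values, the opponent update signs, and the disagreement-resolution rule are all assumptions inferred from the abstract, not the authors' published implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

class TD2QSketch:
    """Hypothetical sketch of the TD2Q loop described in the abstract.

    G holds direct-pathway ("Go") action values, N indirect-pathway
    ("NoGo") values; both are updated from one shared TD reward
    prediction error. Parameter names and update signs are assumptions.
    """

    def __init__(self, n_states, n_actions, alpha=0.2, gamma=0.9):
        self.G = np.zeros((n_states, n_actions))
        self.N = np.zeros((n_states, n_actions))
        self.alpha = alpha   # learning rate (assumed value)
        self.gamma = gamma   # temporal discount factor (assumed value)

    def update(self, s, a, r, s_next):
        # Temporal-difference reward prediction error; estimating state
        # value from the G matrix alone is an assumption of this sketch.
        delta = r + self.gamma * self.G[s_next].max() - self.G[s, a]
        # Opponent updates, motivated by D1/D2 dopamine physiology:
        # positive RPE strengthens G and weakens N (signs assumed).
        self.G[s, a] += self.alpha * delta
        self.N[s, a] -= self.alpha * delta
        return delta

    def choose(self, s, beta):
        # First stage: an independent softmax choice from each matrix.
        # beta is the exploration (inverse-temperature) parameter, which
        # the abstract says is adapted according to recent reward.
        p_g = softmax(beta * self.G[s])
        p_n = softmax(-beta * self.N[s])  # low N value -> less suppressed
        a_g = np.random.choice(len(p_g), p=p_g)
        a_n = np.random.choice(len(p_n), p=p_n)
        if a_g == a_n:
            return a_g
        # Second stage: resolve disagreement between the two pathways
        # using the two action probabilities (exact rule is an assumption).
        w = np.array([p_g[a_g], p_n[a_n]])
        return int(np.random.choice([a_g, a_n], p=w / w.sum()))
```

Calling `choose` once per trial and then `update` with the observed reward would reproduce the general shape of the choice tasks described above; the paper additionally adapts the exploration parameter from recent reward history, which this sketch leaves to the caller.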
Main Authors: | Blackwell, Kim T., Doya, Kenji |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2023 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10479916/ https://www.ncbi.nlm.nih.gov/pubmed/37594982 http://dx.doi.org/10.1371/journal.pcbi.1011385 |
author | Blackwell, Kim T. Doya, Kenji |
collection | PubMed |
description | A major advance in understanding learning behavior stems from experiments showing that reward learning requires dopamine inputs to striatal neurons and arises from synaptic plasticity of cortico-striatal synapses. Numerous reinforcement learning models mimic this dopamine-dependent synaptic plasticity by using the reward prediction error, which resembles dopamine neuron firing, to learn the best action in response to a set of cues. Though these models can explain many facets of behavior, reproducing some types of goal-directed behavior, such as renewal and reversal, requires additional model components. Here we present a reinforcement learning model, TD2Q, which better corresponds to the basal ganglia, with two Q matrices, one representing direct-pathway neurons (G) and another representing indirect-pathway neurons (N). Unlike previous two-Q architectures, a novel and critical aspect of TD2Q is that the G and N matrices are updated using the temporal difference reward prediction error. A best action is selected for N and G using a softmax with a reward-dependent adaptive exploration parameter, and differences between the two are then resolved by a second selection step applied to the two action probabilities. The model is tested on a range of multi-step tasks, including extinction, renewal, and discrimination; learning with switching reward probabilities; and sequence learning. Simulations show that TD2Q produces behaviors similar to those of rodents in choice and sequence learning tasks, and that use of the temporal difference reward prediction error is required to learn multi-step tasks. Blocking the update rule on the N matrix blocks discrimination learning, as observed experimentally. Performance in the sequence learning task is dramatically improved with two matrices. These results suggest that including additional aspects of basal ganglia physiology can improve the performance of reinforcement learning models, better reproduce animal behaviors, and provide insight into the roles of direct- and indirect-pathway striatal neurons.
format | Online Article Text |
id | pubmed-10479916 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-10479916-2023-09-06 Public Library of Science 2023-08-18 /pmc/articles/PMC10479916/ /pubmed/37594982 http://dx.doi.org/10.1371/journal.pcbi.1011385 Text en © 2023 Blackwell, Doya. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Enhancing reinforcement learning models by including direct and indirect pathways improves performance on striatal dependent tasks |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10479916/ https://www.ncbi.nlm.nih.gov/pubmed/37594982 http://dx.doi.org/10.1371/journal.pcbi.1011385 |