A neural network model for timing control with reinforcement

How do humans and animals perform trial-and-error learning when the space of possibilities is infinite? In a previous study, we used an interval timing production task and discovered an updating strategy in which the agent adjusted the behavioral and neuronal noise for exploration. In the experiment...

Bibliographic Details
Main Authors:	Wang, Jing; El-Jayyousi, Yousuf; Ozden, Ilker
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2022
Subjects:	Neuroscience
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9579423/ https://www.ncbi.nlm.nih.gov/pubmed/36277612 http://dx.doi.org/10.3389/fncom.2022.918031

_version_	1784812178606391296
author	Wang, Jing; El-Jayyousi, Yousuf; Ozden, Ilker
author_facet	Wang, Jing; El-Jayyousi, Yousuf; Ozden, Ilker
author_sort	Wang, Jing
collection	PubMed
description	How do humans and animals perform trial-and-error learning when the space of possibilities is infinite? In a previous study, we used an interval timing production task and discovered an updating strategy in which the agent adjusted the behavioral and neuronal noise for exploration. In the experiment, human subjects proactively generated a series of timed motor outputs. Positive or negative feedback was provided after each response based on the timing accuracy. We found that the sequential motor timing varied at two temporal scales: long-term correlation around the target interval due to memory drifts and short-term adjustments of timing variability according to feedback. We have previously described these two key features of timing variability with an augmented Gaussian process, termed reward-sensitive Gaussian process (RSGP). In a nutshell, the temporal covariance of the timing variable was updated based on the feedback history to recreate the two behavioral characteristics mentioned above. However, the RSGP was mainly descriptive and lacked a neurobiological basis of how the reward feedback can be used by a neural circuit to adjust motor variability. Here we provide a mechanistic model and simulate the process by borrowing the architecture of recurrent neural networks (RNNs). While recurrent connection provided the long-term serial correlation in motor timing, to facilitate reward-driven short-term variations, we introduced reward-dependent variability in the network connectivity, inspired by the stochastic nature of synaptic transmission in the brain. Our model was able to recursively generate an output sequence incorporating internal variability and external reinforcement in a Bayesian framework. We show that the model can generate the temporal structure of the motor variability as a basis for exploration and exploitation trade-off. 
Unlike other neural network models that search for unique network connectivity for the best match between the model prediction and observation, this model can estimate the uncertainty associated with each outcome and thus did a better job in teasing apart adjustable task-relevant variability from unexplained variability. The proposed artificial neural network model parallels the mechanisms of information processing in neural systems and can extend the framework of brain-inspired reinforcement learning (RL) in continuous state control.
format	Online Article Text
id	pubmed-9579423
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-9579423 2022-10-20 A neural network model for timing control with reinforcement Wang, Jing; El-Jayyousi, Yousuf; Ozden, Ilker Front Comput Neurosci Neuroscience How do humans and animals perform trial-and-error learning when the space of possibilities is infinite? In a previous study, we used an interval timing production task and discovered an updating strategy in which the agent adjusted the behavioral and neuronal noise for exploration. In the experiment, human subjects proactively generated a series of timed motor outputs. Positive or negative feedback was provided after each response based on the timing accuracy. We found that the sequential motor timing varied at two temporal scales: long-term correlation around the target interval due to memory drifts and short-term adjustments of timing variability according to feedback. We have previously described these two key features of timing variability with an augmented Gaussian process, termed reward-sensitive Gaussian process (RSGP). In a nutshell, the temporal covariance of the timing variable was updated based on the feedback history to recreate the two behavioral characteristics mentioned above. However, the RSGP was mainly descriptive and lacked a neurobiological basis of how the reward feedback can be used by a neural circuit to adjust motor variability. Here we provide a mechanistic model and simulate the process by borrowing the architecture of recurrent neural networks (RNNs). While recurrent connection provided the long-term serial correlation in motor timing, to facilitate reward-driven short-term variations, we introduced reward-dependent variability in the network connectivity, inspired by the stochastic nature of synaptic transmission in the brain. Our model was able to recursively generate an output sequence incorporating internal variability and external reinforcement in a Bayesian framework. 
We show that the model can generate the temporal structure of the motor variability as a basis for exploration and exploitation trade-off. Unlike other neural network models that search for unique network connectivity for the best match between the model prediction and observation, this model can estimate the uncertainty associated with each outcome and thus did a better job in teasing apart adjustable task-relevant variability from unexplained variability. The proposed artificial neural network model parallels the mechanisms of information processing in neural systems and can extend the framework of brain-inspired reinforcement learning (RL) in continuous state control. Frontiers Media S.A. 2022-10-05 /pmc/articles/PMC9579423/ /pubmed/36277612 http://dx.doi.org/10.3389/fncom.2022.918031 Text en Copyright © 2022 Wang, El-Jayyousi and Ozden. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle	Neuroscience Wang, Jing El-Jayyousi, Yousuf Ozden, Ilker A neural network model for timing control with reinforcement
title	A neural network model for timing control with reinforcement
title_full	A neural network model for timing control with reinforcement
title_fullStr	A neural network model for timing control with reinforcement
title_full_unstemmed	A neural network model for timing control with reinforcement
title_short	A neural network model for timing control with reinforcement
title_sort	neural network model for timing control with reinforcement
topic	Neuroscience
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9579423/ https://www.ncbi.nlm.nih.gov/pubmed/36277612 http://dx.doi.org/10.3389/fncom.2022.918031
