
Minibatch Recursive Least Squares Q-Learning


Bibliographic Details
Main Authors: Zhang, Chunyuan; Song, Qi; Meng, Zeng
Format: Online Article Text
Language: English
Published: Hindawi 2021
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8519689/
https://www.ncbi.nlm.nih.gov/pubmed/34659393
http://dx.doi.org/10.1155/2021/5370281
author Zhang, Chunyuan
Song, Qi
Meng, Zeng
collection PubMed
description The deep Q-network (DQN) is one of the most successful reinforcement learning algorithms, but it suffers from drawbacks such as slow convergence and instability. In contrast, traditional reinforcement learning algorithms with linear function approximation usually converge faster and more stably, although they easily suffer from the curse of dimensionality. In recent years, many improvements to DQN have been proposed, but they seldom exploit the advantages of the traditional algorithms. In this paper, we propose a novel Q-learning algorithm with linear function approximation, called minibatch recursive least squares Q-learning (MRLS-Q). Unlike the traditional Q-learning algorithm with linear function approximation, MRLS-Q has a learning mechanism and model structure similar to those of a DQN with only one input layer and one linear output layer: it uses experience replay and minibatch training, and it takes the agent's states rather than state-action pairs as inputs. As a result, it can be used alone for low-dimensional problems and can also be seamlessly integrated into DQN as the last layer for high-dimensional problems. In addition, MRLS-Q uses our proposed average RLS optimization technique, so it achieves better convergence performance whether used alone or integrated with DQN. Finally, we demonstrate the effectiveness of MRLS-Q on the CartPole problem and four Atari games and experimentally investigate the influence of its hyperparameters.
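The abstract describes the algorithm's mechanics (a linear Q-function over state features with one output per action, experience replay, and per-minibatch recursive least squares updates) but not its exact equations, and the proposed "average RLS" technique is only named. The following is a minimal Python/NumPy sketch of a minibatch RLS Q-learning update under those assumptions; the class name MiniBatchRLSQ, the per-action inverse-covariance matrices, and the forgetting and p_init hyperparameters are illustrative choices, not the paper's implementation, and the average RLS refinement is omitted.

    import numpy as np

    class MiniBatchRLSQ:
        """Sketch: one linear Q-head per action, Q(s, a) = w_a . phi(s).

        Each head keeps its own inverse-covariance matrix P_a, updated once
        per minibatch via the Sherman-Morrison-Woodbury identity, so a whole
        sub-batch of transitions is absorbed in a single RLS step.
        """

        def __init__(self, n_features, n_actions, gamma=0.99,
                     forgetting=1.0, p_init=1.0):
            self.gamma = gamma
            self.lam = forgetting                       # RLS forgetting factor
            self.W = np.zeros((n_actions, n_features))  # one weight row per action
            self.P = np.stack([np.eye(n_features) * p_init
                               for _ in range(n_actions)])

        def q_values(self, phi):
            # phi: (batch, n_features) state features -> (batch, n_actions)
            return phi @ self.W.T

        def update(self, phi, actions, rewards, phi_next, dones):
            # Bootstrapped Q-learning targets; terminal states contribute r only.
            targets = rewards + self.gamma * (1.0 - dones) * \
                self.q_values(phi_next).max(axis=1)
            for a in np.unique(actions):
                idx = actions == a
                Phi = phi[idx]                          # (b, d) rows for action a
                err = targets[idx] - Phi @ self.W[a]    # TD errors for this head
                PPhiT = self.P[a] @ Phi.T               # (d, b)
                # Woodbury gain: K = P Phi^T (lam I + Phi P Phi^T)^{-1}
                K = np.linalg.solve(self.lam * np.eye(Phi.shape[0])
                                    + Phi @ PPhiT, PPhiT.T).T
                self.W[a] += K @ err
                self.P[a] = (self.P[a] - K @ Phi @ self.P[a]) / self.lam

Used alone, phi(s) would be a hand-crafted feature map over the raw state; used as a DQN head, phi(s) would be the activations of the network trunk, with minibatches drawn from a standard replay buffer.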
format Online
Article
Text
id pubmed-8519689
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-8519689 2021-10-16 Minibatch Recursive Least Squares Q-Learning Zhang, Chunyuan; Song, Qi; Meng, Zeng. Comput Intell Neurosci, Research Article. Hindawi 2021-10-08 /pmc/articles/PMC8519689/ /pubmed/34659393 http://dx.doi.org/10.1155/2021/5370281 Text en Copyright © 2021 Chunyuan Zhang et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Minibatch Recursive Least Squares Q-Learning
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8519689/
https://www.ncbi.nlm.nih.gov/pubmed/34659393
http://dx.doi.org/10.1155/2021/5370281