
Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning

A deep Q network (DQN) (Mnih et al., 2013), a typical deep reinforcement learning method, is an extension of Q-learning. In DQN, a Q function expresses all action values under all states, and it is approximated using a convolutional neural network. Using the approximated Q function, an optimal policy can be derived. In DQN, a target network, which calculates a target value and is updated by the Q function at regular intervals, is introduced to stabilize the learning process. Less frequent updates of the target network result in a more stable learning process. However, because the target value is not propagated unless the target network is updated, DQN usually requires a large number of samples. In this study, we propose Constrained DQN, which uses the difference between the outputs of the Q function and the target network as a constraint on the target value. Constrained DQN updates parameters conservatively when the difference between the outputs of the Q function and the target network is large, and updates them aggressively when this difference is small. In the proposed method, as learning progresses, the number of times the constraint is activated decreases; consequently, the update method gradually approaches conventional Q-learning. We found that Constrained DQN converges with fewer training samples than DQN and that it is robust against changes in the update frequency of the target network and in a certain parameter of the optimizer. Although Constrained DQN alone does not outperform integrated or distributed methods, experimental results show that it can be used as an additional component of those methods.
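
As an illustration only (this sketch is not part of the catalog record and is not the authors' exact formulation), the constrained-target idea summarized above could look roughly like the following PyTorch-style code. The names constrained_dqn_loss, q_online, q_target, and the margin delta, as well as the use of a hard clamp, are assumptions made for the example.

import torch

def constrained_dqn_loss(q_online, q_target, batch, gamma=0.99, delta=1.0):
    """Hypothetical loss illustrating the constrained-target idea from the abstract."""
    s, a, r, s_next, done = batch  # replay-buffer tensors; `a` is int64, `done` is 0/1 float
    with torch.no_grad():
        # Ordinary Q-learning bootstrap computed with the online network.
        y = r + gamma * (1.0 - done) * q_online(s_next).max(dim=1).values
        # Reference value computed with the frozen target network.
        y_ref = r + gamma * (1.0 - done) * q_target(s_next).max(dim=1).values
        # Constraint: keep the target within `delta` of the reference, so updates are
        # conservative while the two networks disagree and unconstrained once they agree.
        y = torch.clamp(y, y_ref - delta, y_ref + delta)
    q_sa = q_online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    return torch.nn.functional.mse_loss(q_sa, y)

Under this reading, the clamp binds early in training, when the online and target networks disagree strongly, and becomes inactive once their outputs are close, so the update gradually reduces to the ordinary Q-learning bootstrap described in the abstract.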


Bibliographic Details
Main Authors: Ohnishi, Shota, Uchibe, Eiji, Yamaguchi, Yotaro, Nakanishi, Kosuke, Yasui, Yuji, Ishii, Shin
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Neuroscience
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6914867/
https://www.ncbi.nlm.nih.gov/pubmed/31920613
http://dx.doi.org/10.3389/fnbot.2019.00103
_version_ 1783479900100886528
author Ohnishi, Shota
Uchibe, Eiji
Yamaguchi, Yotaro
Nakanishi, Kosuke
Yasui, Yuji
Ishii, Shin
author_facet Ohnishi, Shota
Uchibe, Eiji
Yamaguchi, Yotaro
Nakanishi, Kosuke
Yasui, Yuji
Ishii, Shin
author_sort Ohnishi, Shota
collection PubMed
description A deep Q network (DQN) (Mnih et al., 2013), a typical deep reinforcement learning method, is an extension of Q-learning. In DQN, a Q function expresses all action values under all states, and it is approximated using a convolutional neural network. Using the approximated Q function, an optimal policy can be derived. In DQN, a target network, which calculates a target value and is updated by the Q function at regular intervals, is introduced to stabilize the learning process. Less frequent updates of the target network result in a more stable learning process. However, because the target value is not propagated unless the target network is updated, DQN usually requires a large number of samples. In this study, we propose Constrained DQN, which uses the difference between the outputs of the Q function and the target network as a constraint on the target value. Constrained DQN updates parameters conservatively when the difference between the outputs of the Q function and the target network is large, and updates them aggressively when this difference is small. In the proposed method, as learning progresses, the number of times the constraint is activated decreases; consequently, the update method gradually approaches conventional Q-learning. We found that Constrained DQN converges with fewer training samples than DQN and that it is robust against changes in the update frequency of the target network and in a certain parameter of the optimizer. Although Constrained DQN alone does not outperform integrated or distributed methods, experimental results show that it can be used as an additional component of those methods.
format Online
Article
Text
id pubmed-6914867
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6914867 2020-01-09 Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning Ohnishi, Shota Uchibe, Eiji Yamaguchi, Yotaro Nakanishi, Kosuke Yasui, Yuji Ishii, Shin Front Neurorobot Neuroscience A deep Q network (DQN) (Mnih et al., 2013), a typical deep reinforcement learning method, is an extension of Q-learning. In DQN, a Q function expresses all action values under all states, and it is approximated using a convolutional neural network. Using the approximated Q function, an optimal policy can be derived. In DQN, a target network, which calculates a target value and is updated by the Q function at regular intervals, is introduced to stabilize the learning process. Less frequent updates of the target network result in a more stable learning process. However, because the target value is not propagated unless the target network is updated, DQN usually requires a large number of samples. In this study, we propose Constrained DQN, which uses the difference between the outputs of the Q function and the target network as a constraint on the target value. Constrained DQN updates parameters conservatively when the difference between the outputs of the Q function and the target network is large, and updates them aggressively when this difference is small. In the proposed method, as learning progresses, the number of times the constraint is activated decreases; consequently, the update method gradually approaches conventional Q-learning. We found that Constrained DQN converges with fewer training samples than DQN and that it is robust against changes in the update frequency of the target network and in a certain parameter of the optimizer. Although Constrained DQN alone does not outperform integrated or distributed methods, experimental results show that it can be used as an additional component of those methods. Frontiers Media S.A. 2019-12-10 /pmc/articles/PMC6914867/ /pubmed/31920613 http://dx.doi.org/10.3389/fnbot.2019.00103 Text en Copyright © 2019 Ohnishi, Uchibe, Yamaguchi, Nakanishi, Yasui and Ishii. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Neuroscience
Ohnishi, Shota
Uchibe, Eiji
Yamaguchi, Yotaro
Nakanishi, Kosuke
Yasui, Yuji
Ishii, Shin
Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning
title Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning
title_full Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning
title_fullStr Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning
title_full_unstemmed Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning
title_short Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning
title_sort constrained deep q-learning gradually approaching ordinary q-learning
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6914867/
https://www.ncbi.nlm.nih.gov/pubmed/31920613
http://dx.doi.org/10.3389/fnbot.2019.00103
work_keys_str_mv AT ohnishishota constraineddeepqlearninggraduallyapproachingordinaryqlearning
AT uchibeeiji constraineddeepqlearninggraduallyapproachingordinaryqlearning
AT yamaguchiyotaro constraineddeepqlearninggraduallyapproachingordinaryqlearning
AT nakanishikosuke constraineddeepqlearninggraduallyapproachingordinaryqlearning
AT yasuiyuji constraineddeepqlearninggraduallyapproachingordinaryqlearning
AT ishiishin constraineddeepqlearninggraduallyapproachingordinaryqlearning