
Adaptive Discount Factor for Deep Reinforcement Learning in Continuing Tasks with Uncertainty

Bibliographic Details
Main Authors:	Kim, MyeongSeop; Kim, Jung-Su; Choi, Myoung-Su; Park, Jae-Han
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9570626/ https://www.ncbi.nlm.nih.gov/pubmed/36236366 http://dx.doi.org/10.3390/s22197266

Abstract:	Reinforcement learning (RL) trains an agent to maximize a discounted sum of rewards. Because the discount factor has a critical effect on the learning performance of an RL agent, it is important to choose it properly. When uncertainties are involved in training, the learning performance achievable with a constant discount factor can be limited. To obtain consistently acceptable learning performance, this paper proposes an adaptive rule for the discount factor based on the advantage function, and it shows how to use the advantage function in both on-policy and off-policy algorithms. To demonstrate the proposed rule, it is applied to PPO (Proximal Policy Optimization) for Tetris to validate the on-policy case, and to SAC (Soft Actor-Critic) for the motion planning of a robot manipulator to validate the off-policy case. In both cases, the proposed method performs better than or similarly to the best constant discount factors found by exhaustive search. Hence, the proposed adaptive discount factor automatically finds a discount factor that yields comparable training performance, and it can be applied to representative deep reinforcement learning problems.
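The abstract's two central quantities, the discounted return and the advantage function, both depend on the discount factor γ. The sketch below, in plain Python, shows where γ enters each of them under standard textbook definitions. It is not the paper's adaptive rule, whose details the abstract does not give. The advantage estimator used here is Generalized Advantage Estimation (GAE), a common choice in PPO implementations, swapped in as an assumption; the function names `discounted_returns`, `effective_horizon`, and `gae_advantages` are hypothetical.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float,
                       bootstrap: float = 0.0) -> List[float]:
    """Discounted returns G_t = r_t + gamma * G_{t+1}.

    In a continuing task a rollout is cut off rather than terminated, so the
    return after the last reward is approximated by `bootstrap`, e.g. a
    critic's value estimate V(s_T).
    """
    returns = [0.0] * len(rewards)
    running = bootstrap
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def effective_horizon(gamma: float) -> float:
    """Rough number of future steps that matter: 1 / (1 - gamma)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


def gae_advantages(rewards: Sequence[float], values: Sequence[float],
                   last_value: float, gamma: float, lam: float) -> List[float]:
    """Generalized Advantage Estimation for one continuing-task rollout.

    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    A_t     = delta_t + gamma * lam * A_{t+1}
    lam = 0 gives the one-step TD advantage; lam = 1 gives G_t - V(s_t).
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    advantages = [0.0] * len(rewards)
    next_value = last_value  # V(s_T): value of the state after the rollout
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


if __name__ == "__main__":
    # Hypothetical three-step rollout with made-up critic values.
    rewards = [1.0, 1.0, 1.0]
    values = [0.5, 0.25, 0.0]
    for gamma in (0.5, 0.9, 0.99):
        print(gamma, effective_horizon(gamma),
              discounted_returns(rewards, gamma, bootstrap=1.0),
              gae_advantages(rewards, values, 1.0, gamma, lam=0.95))
```

Running the script prints, for each γ, the effective horizon, the bootstrapped returns, and the GAE advantages of the same rollout, which makes visible how much the learning targets shift with γ alone when every other quantity is held fixed.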
Journal:	Sensors (Basel)
Publication date:	2022-09-25
Rights:	© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
