
Network Architecture for Optimizing Deep Deterministic Policy Gradient Algorithms

The traditional Deep Deterministic Policy Gradient (DDPG) algorithm has been widely used in continuous action spaces, but it is still prone to falling into local optima and exhibits large error fluctuations. To address these deficiencies, this paper proposes a dual-actor-dual-critic DD...

Full description

Bibliographic Details
Main Authors: Zhang, Haifei, Xu, Jian, Zhang, Jian, Liu, Quan
Format: Online Article Text
Language: English
Published: Hindawi 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9699738/
https://www.ncbi.nlm.nih.gov/pubmed/36438689
http://dx.doi.org/10.1155/2022/1117781
_version_ 1784839147624595456
author Zhang, Haifei
Xu, Jian
Zhang, Jian
Liu, Quan
author_facet Zhang, Haifei
Xu, Jian
Zhang, Jian
Liu, Quan
author_sort Zhang, Haifei
collection PubMed
description The traditional Deep Deterministic Policy Gradient (DDPG) algorithm has been widely used in continuous action spaces, but it is still prone to falling into local optima and exhibits large error fluctuations. To address these deficiencies, this paper proposes a dual-actor-dual-critic DDPG algorithm (DN-DDPG). First, a second critic network is added to the original actor-critic architecture to assist training, and the smaller Q value of the two critic networks is taken as the estimated value of the action in each update, which reduces the probability of falling into local optima. Then, the idea of a dual-actor network is introduced to alleviate the value underestimation produced by the dual-critic network: of the two actor networks' outputs, the action with the greater estimated value is selected for the update, stabilizing the training process. Finally, the improved method is validated on four continuous-action tasks provided by MuJoCo, and the results show that, compared with the classical algorithm, it reduces the fluctuation range of the error and improves the cumulative return.
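The description above specifies two update rules for DN-DDPG: the critic target uses the smaller Q estimate of the two critic networks, while the action applied in each update is the one valued more highly between the two actors' outputs. What follows is a minimal, hypothetical PyTorch sketch of just these two rules, written from the abstract rather than from the authors' released code; the network sizes, the choice of which critic scores the candidate actions, and the omission of target networks and exploration noise are all illustrative assumptions.

import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    # Small two-layer network; the architecture used in the paper may differ.
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))

state_dim, action_dim, gamma = 8, 2, 0.99   # illustrative sizes only

# Two actor networks and two critic networks (target networks omitted for brevity).
actors  = [nn.Sequential(mlp(state_dim, action_dim), nn.Tanh()) for _ in range(2)]
critics = [mlp(state_dim + action_dim, 1) for _ in range(2)]

def select_action(state):
    # Dual-actor rule: per sample, keep the candidate action that is rated more
    # highly (here scored by the first critic, which is an assumption).
    candidates = [actor(state) for actor in actors]
    values = [critics[0](torch.cat([state, a], dim=-1)) for a in candidates]
    keep_first = (values[0] >= values[1]).float()
    return keep_first * candidates[0] + (1.0 - keep_first) * candidates[1]

def min_q_target(next_state, reward, done):
    # Dual-critic rule: the TD target uses the smaller of the two critics' estimates.
    with torch.no_grad():
        next_action = select_action(next_state)
        qs = [c(torch.cat([next_state, next_action], dim=-1)) for c in critics]
        return reward + gamma * (1.0 - done) * torch.min(qs[0], qs[1])

# Dummy batch of 4 transitions to show the shapes involved.
s, s2 = torch.randn(4, state_dim), torch.randn(4, state_dim)
r, d = torch.randn(4, 1), torch.zeros(4, 1)
print(select_action(s).shape, min_q_target(s2, r, d).shape)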
format Online
Article
Text
id pubmed-9699738
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-9699738 2022-11-26 Network Architecture for Optimizing Deep Deterministic Policy Gradient Algorithms Zhang, Haifei Xu, Jian Zhang, Jian Liu, Quan Comput Intell Neurosci Research Article The traditional Deep Deterministic Policy Gradient (DDPG) algorithm has been widely used in continuous action spaces, but it is still prone to falling into local optima and exhibits large error fluctuations. To address these deficiencies, this paper proposes a dual-actor-dual-critic DDPG algorithm (DN-DDPG). First, a second critic network is added to the original actor-critic architecture to assist training, and the smaller Q value of the two critic networks is taken as the estimated value of the action in each update, which reduces the probability of falling into local optima. Then, the idea of a dual-actor network is introduced to alleviate the value underestimation produced by the dual-critic network: of the two actor networks' outputs, the action with the greater estimated value is selected for the update, stabilizing the training process. Finally, the improved method is validated on four continuous-action tasks provided by MuJoCo, and the results show that, compared with the classical algorithm, it reduces the fluctuation range of the error and improves the cumulative return. Hindawi 2022-11-18 /pmc/articles/PMC9699738/ /pubmed/36438689 http://dx.doi.org/10.1155/2022/1117781 Text en Copyright © 2022 Haifei Zhang et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Zhang, Haifei
Xu, Jian
Zhang, Jian
Liu, Quan
Network Architecture for Optimizing Deep Deterministic Policy Gradient Algorithms
title Network Architecture for Optimizing Deep Deterministic Policy Gradient Algorithms
title_full Network Architecture for Optimizing Deep Deterministic Policy Gradient Algorithms
title_fullStr Network Architecture for Optimizing Deep Deterministic Policy Gradient Algorithms
title_full_unstemmed Network Architecture for Optimizing Deep Deterministic Policy Gradient Algorithms
title_short Network Architecture for Optimizing Deep Deterministic Policy Gradient Algorithms
title_sort network architecture for optimizing deep deterministic policy gradient algorithms
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9699738/
https://www.ncbi.nlm.nih.gov/pubmed/36438689
http://dx.doi.org/10.1155/2022/1117781
work_keys_str_mv AT zhanghaifei networkarchitectureforoptimizingdeepdeterministicpolicygradientalgorithms
AT xujian networkarchitectureforoptimizingdeepdeterministicpolicygradientalgorithms
AT zhangjian networkarchitectureforoptimizingdeepdeterministicpolicygradientalgorithms
AT liuquan networkarchitectureforoptimizingdeepdeterministicpolicygradientalgorithms