Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance
Multiple unmanned aerial vehicle (UAV) collaboration has great potential. To increase the intelligence and environmental adaptability of multi-UAV control, we study the application of deep reinforcement learning algorithms to multi-UAV cooperative control. To address the non-stationary environment that arises in multi-agent reinforcement learning as each learning agent's strategy changes, this paper presents an improved multiagent reinforcement learning algorithm: the multiagent joint proximal policy optimization (MAJPPO) algorithm, which uses centralized learning and decentralized execution. The algorithm applies moving-window averaging to give each agent a centralized state value function, enabling closer collaboration and increasing the cumulative reward obtained by the multiagent system. To evaluate the algorithm, we use MAJPPO to perform multi-UAV formation flight and traversal of multiple-obstacle environments. To reduce control complexity, we model each UAV with a six-degree-of-freedom, 12-state dynamics model equipped with an attitude control loop. The experimental results show that the MAJPPO algorithm has better performance and better environmental adaptability.
Main Authors: Zhao, Weiwei; Chu, Hairong; Miao, Xikui; Guo, Lihong; Shen, Honghai; Zhu, Chenhao; Zhang, Feng; Liang, Dongxin
Format: Online Article Text
Language: English
Published: MDPI, 2020
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7471982/ https://www.ncbi.nlm.nih.gov/pubmed/32823783 http://dx.doi.org/10.3390/s20164546
_version_ | 1783578884278583296 |
author | Zhao, Weiwei Chu, Hairong Miao, Xikui Guo, Lihong Shen, Honghai Zhu, Chenhao Zhang, Feng Liang, Dongxin |
author_facet | Zhao, Weiwei Chu, Hairong Miao, Xikui Guo, Lihong Shen, Honghai Zhu, Chenhao Zhang, Feng Liang, Dongxin |
author_sort | Zhao, Weiwei |
collection | PubMed |
description | Multiple unmanned aerial vehicle (UAV) collaboration has great potential. To increase the intelligence and environmental adaptability of multi-UAV control, we study the application of deep reinforcement learning algorithms to multi-UAV cooperative control. To address the non-stationary environment that arises in multi-agent reinforcement learning as each learning agent's strategy changes, this paper presents an improved multiagent reinforcement learning algorithm: the multiagent joint proximal policy optimization (MAJPPO) algorithm, which uses centralized learning and decentralized execution. The algorithm applies moving-window averaging to give each agent a centralized state value function, enabling closer collaboration and increasing the cumulative reward obtained by the multiagent system. To evaluate the algorithm, we use MAJPPO to perform multi-UAV formation flight and traversal of multiple-obstacle environments. To reduce control complexity, we model each UAV with a six-degree-of-freedom, 12-state dynamics model equipped with an attitude control loop. The experimental results show that the MAJPPO algorithm has better performance and better environmental adaptability. |
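The record only names the techniques involved, so the following is a minimal Python sketch, not the paper's implementation: the standard PPO clipped surrogate objective (which MAJPPO builds on), plus one plausible reading of the "moving window averaging" step, in which each agent's recent local value estimates are smoothed over a sliding window and then averaged across agents to form a shared, centralized baseline. The function `windowed_central_value`, its blending scheme, and the `window` parameter are assumptions for illustration.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).

    ratio     -- pi_new(a|s) / pi_old(a|s)
    advantage -- estimated advantage for the taken action
    eps       -- clipping range (0.2 is the common default)
    """
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)

def windowed_central_value(local_values, window=5):
    """Hypothetical sketch of a moving-window centralized baseline.

    local_values -- array of shape (n_agents, T): each agent's local
                    state-value estimates over T time steps
    Returns a shared value per step: each agent's series is smoothed
    with a length-`window` moving average, then averaged across agents.
    """
    v = np.asarray(local_values, dtype=float)
    kernel = np.ones(window) / window
    # Moving average over the last `window` steps for each agent.
    smoothed = np.array([np.convolve(row, kernel, mode="valid") for row in v])
    # Average across agents to obtain one centralized estimate per step.
    return smoothed.mean(axis=0)
```

For example, `ppo_clip_loss(1.5, 1.0)` clips the ratio to 1.2, capping how far a single update can push the policy; this clipping is what distinguishes PPO from a vanilla policy-gradient step.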
format | Online Article Text |
id | pubmed-7471982 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-74719822020-09-17 Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance Zhao, Weiwei Chu, Hairong Miao, Xikui Guo, Lihong Shen, Honghai Zhu, Chenhao Zhang, Feng Liang, Dongxin Sensors (Basel) Article Multiple unmanned aerial vehicle (UAV) collaboration has great potential. To increase the intelligence and environmental adaptability of multi-UAV control, we study the application of deep reinforcement learning algorithms in the field of multi-UAV cooperative control. Aiming at the problem of a non-stationary environment caused by the change of learning agent strategy in reinforcement learning in a multi-agent environment, the paper presents an improved multiagent reinforcement learning algorithm—the multiagent joint proximal policy optimization (MAJPPO) algorithm with the centralized learning and decentralized execution. This algorithm uses the moving window averaging method to make each agent obtain a centralized state value function, so that the agents can achieve better collaboration. The improved algorithm enhances the collaboration and increases the sum of reward values obtained by the multiagent system. To evaluate the performance of the algorithm, we use the MAJPPO algorithm to complete the task of multi-UAV formation and the crossing of multiple-obstacle environments. To simplify the control complexity of the UAV, we use the six-degree of freedom and 12-state equations of the dynamics model of the UAV with an attitude control loop. The experimental results show that the MAJPPO algorithm has better performance and better environmental adaptability. MDPI 2020-08-13 /pmc/articles/PMC7471982/ /pubmed/32823783 http://dx.doi.org/10.3390/s20164546 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. 
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Zhao, Weiwei Chu, Hairong Miao, Xikui Guo, Lihong Shen, Honghai Zhu, Chenhao Zhang, Feng Liang, Dongxin Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance |
title | Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance |
title_full | Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance |
title_fullStr | Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance |
title_full_unstemmed | Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance |
title_short | Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance |
title_sort | research on the multiagent joint proximal policy optimization algorithm controlling cooperative fixed-wing uav obstacle avoidance |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7471982/ https://www.ncbi.nlm.nih.gov/pubmed/32823783 http://dx.doi.org/10.3390/s20164546 |
work_keys_str_mv | AT zhaoweiwei researchonthemultiagentjointproximalpolicyoptimizationalgorithmcontrollingcooperativefixedwinguavobstacleavoidance AT chuhairong researchonthemultiagentjointproximalpolicyoptimizationalgorithmcontrollingcooperativefixedwinguavobstacleavoidance AT miaoxikui researchonthemultiagentjointproximalpolicyoptimizationalgorithmcontrollingcooperativefixedwinguavobstacleavoidance AT guolihong researchonthemultiagentjointproximalpolicyoptimizationalgorithmcontrollingcooperativefixedwinguavobstacleavoidance AT shenhonghai researchonthemultiagentjointproximalpolicyoptimizationalgorithmcontrollingcooperativefixedwinguavobstacleavoidance AT zhuchenhao researchonthemultiagentjointproximalpolicyoptimizationalgorithmcontrollingcooperativefixedwinguavobstacleavoidance AT zhangfeng researchonthemultiagentjointproximalpolicyoptimizationalgorithmcontrollingcooperativefixedwinguavobstacleavoidance AT liangdongxin researchonthemultiagentjointproximalpolicyoptimizationalgorithmcontrollingcooperativefixedwinguavobstacleavoidance |