
A hierarchical reinforcement learning method for missile evasion and guidance


Bibliographic Details
Main Authors: Yan, Mengda, Yang, Rennong, Zhang, Ying, Yue, Longfei, Hu, Dongyuan
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9640633/
https://www.ncbi.nlm.nih.gov/pubmed/36344598
http://dx.doi.org/10.1038/s41598-022-21756-6
_version_ 1784825899526389760
author Yan, Mengda
Yang, Rennong
Zhang, Ying
Yue, Longfei
Hu, Dongyuan
author_facet Yan, Mengda
Yang, Rennong
Zhang, Ying
Yue, Longfei
Hu, Dongyuan
author_sort Yan, Mengda
collection PubMed
description This paper proposes a missile manoeuvring algorithm based on hierarchical proximal policy optimization (PPO) reinforcement learning, which enables a missile to guide itself to a target while evading an interceptor. Following the idea of task hierarchy, the agent has a two-layer structure, in which low-level agents control basic actions and are supervised by a high-level agent. The low level has two agents, a guidance agent and an evasion agent, which are trained in simple scenarios and embedded in the high-level agent. The high level has a policy selector agent, which activates one of the low-level agents at each decision moment. Each agent has a different reward function, accounting for guidance accuracy, flight time, and energy consumption, as well as a field-of-view constraint. Simulation shows that PPO without a hierarchical structure cannot complete the task, whereas the hierarchical PPO algorithm achieves a 100% success rate on a test dataset. The agent shows good adaptability and strong robustness to second-order autopilot lag and measurement noise. Compared with a traditional guidance law, the reinforcement learning guidance law achieves satisfactory guidance accuracy and significant advantages in average time and average energy consumption.
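
The two-layer structure summarized above (two PPO-trained low-level agents plus a high-level policy selector that activates one of them at each decision moment) can be illustrated with a minimal sketch. The following Python skeleton is an assumption-laden illustration, not the authors' implementation: the class names, observation fields, and hand-written selection threshold are hypothetical stand-ins for the trained networks described in the paper.

# Minimal sketch of the two-layer control loop described in the abstract.
# All names and thresholds here are illustrative assumptions; in the paper
# the low-level policies and the selector are PPO-trained networks, not
# the placeholder rules below.

class GuidancePolicy:
    """Stand-in for the PPO-trained low-level guidance agent."""
    def act(self, obs):
        # Steer toward the target (proportional-navigation-like placeholder).
        return 0.5 * obs["target_bearing"]

class EvasionPolicy:
    """Stand-in for the PPO-trained low-level evasion agent."""
    def act(self, obs):
        # Turn away from the interceptor's line of sight.
        return -1.0 if obs["interceptor_bearing"] > 0 else 1.0

class PolicySelector:
    """Stand-in for the high-level agent: at each decision moment it
    activates exactly one of the embedded low-level agents."""
    def __init__(self, guidance, evasion):
        self.options = [guidance, evasion]
    def select(self, obs):
        # A trained selector would map the full state to an option index;
        # this fixed range threshold is purely for illustration.
        return self.options[0] if obs["interceptor_range"] > 5000.0 else self.options[1]

def episode_step(selector, obs):
    option = selector.select(obs)  # high level: choose a sub-policy
    return option.act(obs)         # low level: produce the basic action

if __name__ == "__main__":
    selector = PolicySelector(GuidancePolicy(), EvasionPolicy())
    obs = {"target_bearing": 0.2, "interceptor_bearing": -0.1,
           "interceptor_range": 3000.0}
    print(episode_step(selector, obs))  # evasion policy is active here

In this sketch the high level never outputs control commands itself; it only decides which low-level agent acts, which is the division of labour the abstract describes.
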
format Online
Article
Text
id pubmed-9640633
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9640633 2022-11-15 A hierarchical reinforcement learning method for missile evasion and guidance Yan, Mengda Yang, Rennong Zhang, Ying Yue, Longfei Hu, Dongyuan Sci Rep Article This paper proposes a missile manoeuvring algorithm based on hierarchical proximal policy optimization (PPO) reinforcement learning, which enables a missile to guide itself to a target while evading an interceptor. Following the idea of task hierarchy, the agent has a two-layer structure, in which low-level agents control basic actions and are supervised by a high-level agent. The low level has two agents, a guidance agent and an evasion agent, which are trained in simple scenarios and embedded in the high-level agent. The high level has a policy selector agent, which activates one of the low-level agents at each decision moment. Each agent has a different reward function, accounting for guidance accuracy, flight time, and energy consumption, as well as a field-of-view constraint. Simulation shows that PPO without a hierarchical structure cannot complete the task, whereas the hierarchical PPO algorithm achieves a 100% success rate on a test dataset. The agent shows good adaptability and strong robustness to second-order autopilot lag and measurement noise. Compared with a traditional guidance law, the reinforcement learning guidance law achieves satisfactory guidance accuracy and significant advantages in average time and average energy consumption. Nature Publishing Group UK 2022-11-07 /pmc/articles/PMC9640633/ /pubmed/36344598 http://dx.doi.org/10.1038/s41598-022-21756-6 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Yan, Mengda
Yang, Rennong
Zhang, Ying
Yue, Longfei
Hu, Dongyuan
A hierarchical reinforcement learning method for missile evasion and guidance
title A hierarchical reinforcement learning method for missile evasion and guidance
title_full A hierarchical reinforcement learning method for missile evasion and guidance
title_fullStr A hierarchical reinforcement learning method for missile evasion and guidance
title_full_unstemmed A hierarchical reinforcement learning method for missile evasion and guidance
title_short A hierarchical reinforcement learning method for missile evasion and guidance
title_sort hierarchical reinforcement learning method for missile evasion and guidance
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9640633/
https://www.ncbi.nlm.nih.gov/pubmed/36344598
http://dx.doi.org/10.1038/s41598-022-21756-6
work_keys_str_mv AT yanmengda ahierarchicalreinforcementlearningmethodformissileevasionandguidance
AT yangrennong ahierarchicalreinforcementlearningmethodformissileevasionandguidance
AT zhangying ahierarchicalreinforcementlearningmethodformissileevasionandguidance
AT yuelongfei ahierarchicalreinforcementlearningmethodformissileevasionandguidance
AT hudongyuan ahierarchicalreinforcementlearningmethodformissileevasionandguidance
AT yanmengda hierarchicalreinforcementlearningmethodformissileevasionandguidance
AT yangrennong hierarchicalreinforcementlearningmethodformissileevasionandguidance
AT zhangying hierarchicalreinforcementlearningmethodformissileevasionandguidance
AT yuelongfei hierarchicalreinforcementlearningmethodformissileevasionandguidance
AT hudongyuan hierarchicalreinforcementlearningmethodformissileevasionandguidance