A hierarchical reinforcement learning method for missile evasion and guidance
This paper proposes an algorithm for missile manoeuvring based on a hierarchical proximal policy optimization (PPO) reinforcement learning algorithm, which enables a missile to guide to a target and evade an interceptor at the same time. Based on the idea of task hierarchy, the agent has a two-layer structure, in which low-level agents control basic actions and are controlled by a high-level agent.
Main Authors: Yan, Mengda; Yang, Rennong; Zhang, Ying; Yue, Longfei; Hu, Dongyuan
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK, 2022
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9640633/ https://www.ncbi.nlm.nih.gov/pubmed/36344598 http://dx.doi.org/10.1038/s41598-022-21756-6
_version_ | 1784825899526389760 |
author | Yan, Mengda; Yang, Rennong; Zhang, Ying; Yue, Longfei; Hu, Dongyuan |
author_facet | Yan, Mengda; Yang, Rennong; Zhang, Ying; Yue, Longfei; Hu, Dongyuan |
author_sort | Yan, Mengda |
collection | PubMed |
description | This paper proposes an algorithm for missile manoeuvring based on a hierarchical proximal policy optimization (PPO) reinforcement learning algorithm, which enables a missile to guide to a target and evade an interceptor at the same time. Based on the idea of task hierarchy, the agent has a two-layer structure, in which low-level agents control basic actions and are controlled by a high-level agent. The low level has two agents, called a guidance agent and an evasion agent, which are trained in simple scenarios and embedded in the high-level agent. The high level has a policy selector agent, which chooses one of the low-level agents to activate at each decision moment. The reward functions differ for each agent, reflecting guidance accuracy, flight time, and energy consumption metrics, as well as a field-of-view constraint. Simulation shows that the PPO algorithm without a hierarchical structure cannot complete the task, while the hierarchical PPO algorithm has a 100% success rate on a test dataset. The agent shows good adaptability and strong robustness to the second-order lag of the autopilot and to measurement noise. Compared with a traditional guidance law, the reinforcement learning guidance law achieves satisfactory guidance accuracy and significant advantages in average time and average energy consumption. |
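The two-layer structure described in the abstract lends itself to a compact illustration. Below is a minimal Python sketch of that architecture: two pre-trained low-level agents (guidance and evasion) and a high-level policy selector that activates one of them at each decision moment. All names are illustrative, and the hand-written selection rule is a stand-in assumption; in the paper each level is a policy trained with PPO, not a fixed heuristic.

```python
# Minimal sketch of the hierarchical agent described in the abstract.
# Assumptions: class/function names, observation layout, and the
# threshold rule in the selector are hypothetical placeholders for
# the PPO-trained policies in the paper.

from dataclasses import dataclass
from typing import Callable, Sequence

Action = Sequence[float]       # e.g. commanded accelerations
Observation = Sequence[float]  # e.g. relative states to target/interceptor


@dataclass
class LowLevelAgent:
    """A trained low-level policy mapping observations to basic actions."""
    name: str
    policy: Callable[[Observation], Action]

    def act(self, obs: Observation) -> Action:
        return self.policy(obs)


class PolicySelector:
    """High-level agent: chooses which low-level agent to activate.

    In the paper this selector is itself trained with PPO; here a
    hypothetical hand-written rule stands in for the learned policy.
    """

    def __init__(self, guidance: LowLevelAgent, evasion: LowLevelAgent):
        self.options = [guidance, evasion]

    def select(self, obs: Observation) -> LowLevelAgent:
        # Placeholder heuristic: evade when the (assumed) interceptor-range
        # component of the observation falls below a threshold.
        interceptor_range = obs[-1]
        return self.options[1] if interceptor_range < 5_000.0 else self.options[0]

    def act(self, obs: Observation) -> Action:
        # One decision moment: pick a low-level agent, then let it act.
        return self.select(obs).act(obs)


if __name__ == "__main__":
    # Stub policies standing in for networks trained in simple scenarios.
    guidance = LowLevelAgent("guidance", lambda obs: [0.1 * obs[0], 0.0])
    evasion = LowLevelAgent("evasion", lambda obs: [0.0, 9.0])

    selector = PolicySelector(guidance, evasion)
    obs = [1.0, 0.0, 0.0, 12_000.0]  # toy observation; last entry = interceptor range
    print(selector.act(obs))         # -> guidance action, since range > 5 km
```

Training the low-level agents first in simple scenarios and then freezing them inside the selector, as the abstract describes, keeps each sub-task's reward function independent (guidance accuracy, flight time, energy, field-of-view constraint) while the high-level agent only learns the switching decision.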
format | Online Article Text |
id | pubmed-9640633 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9640633 2022-11-15. A hierarchical reinforcement learning method for missile evasion and guidance. Yan, Mengda; Yang, Rennong; Zhang, Ying; Yue, Longfei; Hu, Dongyuan. Sci Rep, Article. Nature Publishing Group UK, 2022-11-07. /pmc/articles/PMC9640633/ /pubmed/36344598 http://dx.doi.org/10.1038/s41598-022-21756-6. Text, en. © The Author(s) 2022. Open Access under the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article; Yan, Mengda; Yang, Rennong; Zhang, Ying; Yue, Longfei; Hu, Dongyuan; A hierarchical reinforcement learning method for missile evasion and guidance |
title | A hierarchical reinforcement learning method for missile evasion and guidance |
title_full | A hierarchical reinforcement learning method for missile evasion and guidance |
title_fullStr | A hierarchical reinforcement learning method for missile evasion and guidance |
title_full_unstemmed | A hierarchical reinforcement learning method for missile evasion and guidance |
title_short | A hierarchical reinforcement learning method for missile evasion and guidance |
title_sort | hierarchical reinforcement learning method for missile evasion and guidance |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9640633/ https://www.ncbi.nlm.nih.gov/pubmed/36344598 http://dx.doi.org/10.1038/s41598-022-21756-6 |
work_keys_str_mv | AT yanmengda ahierarchicalreinforcementlearningmethodformissileevasionandguidance AT yangrennong ahierarchicalreinforcementlearningmethodformissileevasionandguidance AT zhangying ahierarchicalreinforcementlearningmethodformissileevasionandguidance AT yuelongfei ahierarchicalreinforcementlearningmethodformissileevasionandguidance AT hudongyuan ahierarchicalreinforcementlearningmethodformissileevasionandguidance AT yanmengda hierarchicalreinforcementlearningmethodformissileevasionandguidance AT yangrennong hierarchicalreinforcementlearningmethodformissileevasionandguidance AT zhangying hierarchicalreinforcementlearningmethodformissileevasionandguidance AT yuelongfei hierarchicalreinforcementlearningmethodformissileevasionandguidance AT hudongyuan hierarchicalreinforcementlearningmethodformissileevasionandguidance |