End-to-End Autonomous Navigation Based on Deep Reinforcement Learning with a Survival Penalty Function

An end-to-end approach to autonomous navigation that is based on deep reinforcement learning (DRL) with a survival penalty function is proposed in this paper. Two actor–critic (AC) frameworks, namely, deep deterministic policy gradient (DDPG) and twin-delayed DDPG (TD3), are employed to enable a nonholonomic wheeled mobile robot (WMR) to perform navigation in dynamic environments containing obstacles and for which no maps are available. A comprehensive reward based on the survival penalty function is introduced; this approach effectively solves the sparse reward problem and enables the WMR to move toward its target. Consecutive episodes are connected to increase the cumulative penalty for scenarios involving obstacles; this method prevents training failure and enables the WMR to plan a collision-free path. Simulations are conducted for four scenarios—movement in an obstacle-free space, in a parking lot, at an intersection without and with a central obstacle, and in a multiple obstacle space—to demonstrate the efficiency and operational safety of our method. For the same navigation environment, compared with the DDPG algorithm, the TD3 algorithm exhibits faster numerical convergence and higher stability in the training phase, as well as a higher task execution success rate in the evaluation phase.
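The survival penalty described in the abstract is a reward-shaping technique: rather than rewarding the robot only at terminal events (goal reached or collision), every step the robot merely survives incurs a small constant penalty, so the otherwise sparse signal becomes dense and the agent is pushed to reach the target quickly along a collision-free path. The Python sketch below illustrates that general idea only; it is not the paper's exact formulation, and every name and coefficient in it (goal_reward, survival_penalty, progress_gain, safety_radius, and so on) is a hypothetical placeholder.

def survival_penalty_reward(
    dist_to_goal: float,
    prev_dist_to_goal: float,
    min_obstacle_dist: float,
    reached_goal: bool,
    collided: bool,
    goal_reward: float = 100.0,
    collision_penalty: float = -100.0,
    survival_penalty: float = -0.5,
    progress_gain: float = 10.0,
    safety_radius: float = 0.5,
) -> float:
    """Dense per-step reward built around a constant survival penalty.

    Charging a small negative constant on every non-terminal step
    densifies the sparse goal/collision signal and discourages idling,
    which is the role the abstract attributes to the survival penalty.
    """
    if reached_goal:
        return goal_reward          # terminal bonus for reaching the target
    if collided:
        return collision_penalty    # terminal penalty for hitting an obstacle
    # Reward the progress toward the goal made during this step.
    progress = progress_gain * (prev_dist_to_goal - dist_to_goal)
    # Penalize skirting obstacles inside the safety radius.
    proximity = -max(0.0, safety_radius - min_obstacle_dist)
    return survival_penalty + progress + proximity

# Example: a step that closes the goal distance from 4.0 m to 3.8 m in
# open space yields approximately -0.5 + 2.0 + 0.0 = 1.5.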

Bibliographic Details
Main Authors: Jeng, Shyr-Long; Chiang, Chienhsun
Format: Online Article (Text)
Language: English
Published: MDPI, 2023-10-23
Journal: Sensors (Basel)
Subjects: Article
License: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. Open access under the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Online Access:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10610759/
https://www.ncbi.nlm.nih.gov/pubmed/37896743
http://dx.doi.org/10.3390/s23208651