Prioritized experience replay in path planning via multi-dimensional transition priority fusion

INTRODUCTION: Deep deterministic policy gradient (DDPG)-based path planning algorithms for intelligent robots struggle to discern the value of experience transitions during training due to their reliance on a random experience replay. This can lead to inappropriate sampling of experience transitions...

Bibliographic Details
Main Authors:	Cheng, Nuo, Wang, Peng, Zhang, Guangyuan, Ni, Cui, Nematov, Erkin
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2023
Subjects:	Neuroscience
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10684733/ https://www.ncbi.nlm.nih.gov/pubmed/38034838 http://dx.doi.org/10.3389/fnbot.2023.1281166

_version_	1785151471510093824
author	Cheng, Nuo Wang, Peng Zhang, Guangyuan Ni, Cui Nematov, Erkin
author_facet	Cheng, Nuo Wang, Peng Zhang, Guangyuan Ni, Cui Nematov, Erkin
author_sort	Cheng, Nuo
collection	PubMed
description	INTRODUCTION: Deep deterministic policy gradient (DDPG)-based path planning algorithms for intelligent robots struggle to discern the value of experience transitions during training because they rely on random experience replay. This can lead to inappropriate sampling of experience transitions and overemphasis on edge experience transitions. As a result, the algorithm converges more slowly, and the success rate of path planning diminishes. METHODS: We comprehensively examine the impacts of immediate reward, temporal-difference error (TD-error), and the Actor network loss function on the training process, and calculate experience transition priorities based on these three factors. Subsequently, using information entropy as a weight, the three calculated priorities are merged to determine the final priority of the experience transition. In addition, we introduce a method for adaptively adjusting the priority of positive experience transitions, so that training focuses on them while the distribution stays balanced. Finally, the sampling probability of each experience transition is derived from its priority. RESULTS: The experimental results showed that the test time of our method is shorter than that of the PER algorithm, and the number of collisions with obstacles is lower. This indicates that the determined experience transition priority accurately gauges the significance of distinct experience transitions for path planning algorithm training. DISCUSSION: This method enhances the utilization rate of transition conversion and the convergence speed of the algorithm and also improves the success rate of path planning.
format	Online Article Text
id	pubmed-10684733
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-106847332023-11-30 Prioritized experience replay in path planning via multi-dimensional transition priority fusion Cheng, Nuo Wang, Peng Zhang, Guangyuan Ni, Cui Nematov, Erkin Front Neurorobot Neuroscience INTRODUCTION: Deep deterministic policy gradient (DDPG)-based path planning algorithms for intelligent robots struggle to discern the value of experience transitions during training because they rely on random experience replay. This can lead to inappropriate sampling of experience transitions and overemphasis on edge experience transitions. As a result, the algorithm converges more slowly, and the success rate of path planning diminishes. METHODS: We comprehensively examine the impacts of immediate reward, temporal-difference error (TD-error), and the Actor network loss function on the training process, and calculate experience transition priorities based on these three factors. Subsequently, using information entropy as a weight, the three calculated priorities are merged to determine the final priority of the experience transition. In addition, we introduce a method for adaptively adjusting the priority of positive experience transitions, so that training focuses on them while the distribution stays balanced. Finally, the sampling probability of each experience transition is derived from its priority. RESULTS: The experimental results showed that the test time of our method is shorter than that of the PER algorithm, and the number of collisions with obstacles is lower. This indicates that the determined experience transition priority accurately gauges the significance of distinct experience transitions for path planning algorithm training. DISCUSSION: This method enhances the utilization rate of transition conversion and the convergence speed of the algorithm and also improves the success rate of path planning. Frontiers Media S.A. 2023-11-15 /pmc/articles/PMC10684733/ /pubmed/38034838 http://dx.doi.org/10.3389/fnbot.2023.1281166 Text en Copyright © 2023 Cheng, Wang, Zhang, Ni and Nematov. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle	Neuroscience Cheng, Nuo Wang, Peng Zhang, Guangyuan Ni, Cui Nematov, Erkin Prioritized experience replay in path planning via multi-dimensional transition priority fusion
title	Prioritized experience replay in path planning via multi-dimensional transition priority fusion
title_full	Prioritized experience replay in path planning via multi-dimensional transition priority fusion
title_fullStr	Prioritized experience replay in path planning via multi-dimensional transition priority fusion
title_full_unstemmed	Prioritized experience replay in path planning via multi-dimensional transition priority fusion
title_short	Prioritized experience replay in path planning via multi-dimensional transition priority fusion
title_sort	prioritized experience replay in path planning via multi-dimensional transition priority fusion
topic	Neuroscience
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10684733/ https://www.ncbi.nlm.nih.gov/pubmed/38034838 http://dx.doi.org/10.3389/fnbot.2023.1281166
work_keys_str_mv	AT chengnuo prioritizedexperiencereplayinpathplanningviamultidimensionaltransitionpriorityfusion AT wangpeng prioritizedexperiencereplayinpathplanningviamultidimensionaltransitionpriorityfusion AT zhangguangyuan prioritizedexperiencereplayinpathplanningviamultidimensionaltransitionpriorityfusion AT nicui prioritizedexperiencereplayinpathplanningviamultidimensionaltransitionpriorityfusion AT nematoverkin prioritizedexperiencereplayinpathplanningviamultidimensionaltransitionpriorityfusion
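
The prioritization scheme summarized in the description field (three per-transition priorities fused with entropy-derived weights, then turned into sampling probabilities) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the formulas, so the entropy weight method, the shift-and-normalize step, the use of absolute values, the exponent `alpha`, and all function names here are hypothetical. The adaptive adjustment of positive transitions is omitted.

```python
import numpy as np

EPS = 1e-8  # assumed small constant; keeps every priority strictly positive


def to_distribution(values):
    """Shift values to be positive and scale them so they sum to 1."""
    v = np.asarray(values, dtype=np.float64)
    v = v - v.min() + EPS
    return v / v.sum()


def entropy_weights(dists):
    """Entropy weight method (an assumed choice): a criterion whose values
    are less uniform across transitions gets a larger weight.
    Expects shape (criteria, n) with n >= 2."""
    n = dists.shape[1]
    entropy = -(dists * np.log(dists)).sum(axis=1) / np.log(n)
    divergence = 1.0 - entropy + EPS
    return divergence / divergence.sum()


def fused_priorities(rewards, td_errors, actor_losses):
    """Fuse reward, |TD-error| and |Actor loss| priorities into one vector."""
    dists = np.stack([
        to_distribution(rewards),
        to_distribution(np.abs(td_errors)),
        to_distribution(np.abs(actor_losses)),
    ])
    return entropy_weights(dists) @ dists  # shape (n,)


def sampling_probabilities(priorities, alpha=0.6):
    """Proportional prioritization: P(i) = p_i**alpha / sum_k p_k**alpha."""
    scaled = np.power(priorities, alpha)
    return scaled / scaled.sum()


# Illustrative values for four stored transitions (made up for this sketch).
rewards = np.array([1.0, -0.5, 0.0, 2.0])
td_errors = np.array([0.1, -0.8, 0.3, 0.05])
actor_losses = np.array([0.2, 0.2, 0.9, 0.4])

p = sampling_probabilities(fused_priorities(rewards, td_errors, actor_losses))
rng = np.random.default_rng(0)
batch = rng.choice(len(p), size=2, p=p)  # indices of the sampled transitions
```

Setting `alpha` to 0 in this sketch recovers uniform sampling, which corresponds to the random experience replay the description contrasts against.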
