Diversity Evolutionary Policy Deep Reinforcement Learning
Reinforcement learning algorithms based on policy gradient may fall into local optima due to vanishing gradients during the update process, which in turn limits the exploration ability of the reinforcement learning agent. To address this problem, this paper combines the cross-entropy...
Main Authors: | Liu, Jian; Feng, Liming |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi, 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8357468/ https://www.ncbi.nlm.nih.gov/pubmed/34394336 http://dx.doi.org/10.1155/2021/5300189 |
_version_ | 1783737133282885632 |
---|---|
author | Liu, Jian; Feng, Liming |
author_facet | Liu, Jian; Feng, Liming |
author_sort | Liu, Jian |
collection | PubMed |
description | Reinforcement learning algorithms based on policy gradient may fall into local optima due to vanishing gradients during the update process, which in turn limits the exploration ability of the reinforcement learning agent. To address this problem, this paper combines the cross-entropy method (CEM) from evolutionary policies, the maximum mean discrepancy (MMD), and the twin delayed deep deterministic policy gradient (TD3) algorithm to propose a diversity evolutionary policy deep reinforcement learning (DEPRL) algorithm. Using the maximum mean discrepancy as a measure of the distance between policies, part of the population maximizes both the cumulative return and the distance to the previous generation of policies during the gradient update. Furthermore, combining the cumulative return and the distance between policies into the population fitness encourages more diversity among offspring policies, which in turn reduces the risk of falling into a local optimum caused by vanishing gradients. Results in the MuJoCo test environments show that DEPRL achieves excellent performance on continuous control tasks; in particular, in the Ant-v2 environment, the final return of DEPRL is nearly 20% higher than that of TD3. |
format | Online Article Text |
id | pubmed-8357468 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-8357468 2021-08-12 Diversity Evolutionary Policy Deep Reinforcement Learning Liu, Jian; Feng, Liming Comput Intell Neurosci Research Article Reinforcement learning algorithms based on policy gradient may fall into local optima due to vanishing gradients during the update process, which in turn limits the exploration ability of the reinforcement learning agent. To address this problem, this paper combines the cross-entropy method (CEM) from evolutionary policies, the maximum mean discrepancy (MMD), and the twin delayed deep deterministic policy gradient (TD3) algorithm to propose a diversity evolutionary policy deep reinforcement learning (DEPRL) algorithm. Using the maximum mean discrepancy as a measure of the distance between policies, part of the population maximizes both the cumulative return and the distance to the previous generation of policies during the gradient update. Furthermore, combining the cumulative return and the distance between policies into the population fitness encourages more diversity among offspring policies, which in turn reduces the risk of falling into a local optimum caused by vanishing gradients. Results in the MuJoCo test environments show that DEPRL achieves excellent performance on continuous control tasks; in particular, in the Ant-v2 environment, the final return of DEPRL is nearly 20% higher than that of TD3. Hindawi 2021-08-03 /pmc/articles/PMC8357468/ /pubmed/34394336 http://dx.doi.org/10.1155/2021/5300189 Text en Copyright © 2021 Jian Liu and Liming Feng. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Liu, Jian Feng, Liming Diversity Evolutionary Policy Deep Reinforcement Learning |
title | Diversity Evolutionary Policy Deep Reinforcement Learning |
title_full | Diversity Evolutionary Policy Deep Reinforcement Learning |
title_fullStr | Diversity Evolutionary Policy Deep Reinforcement Learning |
title_full_unstemmed | Diversity Evolutionary Policy Deep Reinforcement Learning |
title_short | Diversity Evolutionary Policy Deep Reinforcement Learning |
title_sort | diversity evolutionary policy deep reinforcement learning |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8357468/ https://www.ncbi.nlm.nih.gov/pubmed/34394336 http://dx.doi.org/10.1155/2021/5300189 |
work_keys_str_mv | AT liujian diversityevolutionarypolicydeepreinforcementlearning AT fengliming diversityevolutionarypolicydeepreinforcementlearning |
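The description field above credits DEPRL's exploration to two quantities: an MMD distance between a candidate policy and the previous generation of policies, and a population fitness that combines cumulative return with that distance. Below is a minimal sketch of how such quantities could be computed; the RBF kernel, the bandwidth `sigma`, the weight `alpha`, and the stand-in linear-tanh policies are illustrative assumptions and are not taken from the paper.

```python
# Illustrative sketch (not the authors' code): an RBF-kernel MMD^2 estimate
# between the actions of two policies evaluated on the same states, and a
# fitness that adds a diversity bonus to the cumulative return.
import numpy as np

def mmd_rbf(actions_a: np.ndarray, actions_b: np.ndarray, sigma: float = 1.0) -> float:
    """Biased RBF-kernel MMD^2 estimate between two batches of actions.

    actions_a, actions_b: arrays of shape (n, action_dim) obtained by running
    two policies on the same batch of states.
    """
    def rbf(x, y):
        # Pairwise squared Euclidean distances -> Gaussian kernel values.
        d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return float(rbf(actions_a, actions_a).mean()
                 + rbf(actions_b, actions_b).mean()
                 - 2.0 * rbf(actions_a, actions_b).mean())

def fitness(cumulative_return: float, mmd_to_prev_generation: float, alpha: float = 0.5) -> float:
    """Fitness combining return and policy diversity; `alpha` is an assumed weight."""
    return cumulative_return + alpha * mmd_to_prev_generation

# Usage: compare two stand-in policies on the same batch of random states.
rng = np.random.default_rng(0)
states = rng.normal(size=(64, 8))                        # 64 states, 8-dim observations
acts_new = np.tanh(states @ rng.normal(size=(8, 2)))     # actions of a candidate policy
acts_old = np.tanh(states @ rng.normal(size=(8, 2)))     # actions of the previous-generation policy
d = mmd_rbf(acts_new, acts_old)
print(fitness(cumulative_return=1000.0, mmd_to_prev_generation=d))
```

A kernel-based MMD estimate is convenient here because it compares the action distributions of two policies over a shared batch of states rather than only their mean actions, which is one plausible reading of "distance between policies" in the abstract.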