
Generalize Robot Learning From Demonstration to Variant Scenarios With Evolutionary Policy Gradient


Bibliographic Details
Main authors:	Cao, Junjie; Liu, Weiwei; Liu, Yong; Yang, Jian
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A., 2020
Subjects:	Neuroscience
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7188386/ https://www.ncbi.nlm.nih.gov/pubmed/32372940 http://dx.doi.org/10.3389/fnbot.2020.00021

author	Cao, Junjie; Liu, Weiwei; Liu, Yong; Yang, Jian
collection	PubMed
description	There has been substantial growth in research on robot automation, which aims to make robots capable of interacting directly with the world or with humans. Robot learning from human demonstration is central to this goal. However, dependence on demonstrations restricts a robot to a fixed scenario, without the ability to explore variant situations in order to accomplish the same task as in the demonstration. Deep reinforcement learning may be a good way to let robots learn beyond human demonstration and fulfill the task in unknown situations. Exploration is the core of such generalization to different environments, yet exploration in reinforcement learning can be ineffective and suffer from low sample efficiency. In this paper, we present Evolutionary Policy Gradient (EPG), which enables a robot to learn from demonstration and perform goal-oriented exploration efficiently. Through goal-oriented exploration, our method can generalize a robot's learned skill to environments with different parameters. EPG combines parameter perturbation with a policy gradient method in the framework of Evolutionary Algorithms (EAs) and fuses the benefits of both, achieving effective and efficient exploration. With demonstrations guiding the evolutionary process, the robot can accelerate goal-oriented exploration and generalize its capability to variant scenarios. Experiments on robot control tasks in OpenAI Gym with dense and sparse rewards show that EPG provides competitive performance compared with the original policy gradient methods and EAs. In the manipulator task, our robot learns to open the door using vision in environments that differ from those where the demonstrations were provided.
format	Online Article Text
id	pubmed-7188386
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-7188386 2020-05-05 Generalize Robot Learning From Demonstration to Variant Scenarios With Evolutionary Policy Gradient. Cao, Junjie; Liu, Weiwei; Liu, Yong; Yang, Jian. Front Neurorobot. Neuroscience.
Frontiers Media S.A. 2020-04-21 /pmc/articles/PMC7188386/ /pubmed/32372940 http://dx.doi.org/10.3389/fnbot.2020.00021 Text en Copyright © 2020 Cao, Liu, Liu and Yang. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Generalize Robot Learning From Demonstration to Variant Scenarios With Evolutionary Policy Gradient
topic	Neuroscience
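The abstract describes EPG only at a high level: parameter perturbation combined with a policy-gradient update inside an evolutionary loop, with demonstrations guiding the search. The following is a minimal, hypothetical sketch of that general kind of hybrid, not the authors' EPG algorithm. Every name in it is an assumption: the toy quadratic `fitness` stands in for an episode return, `gradient` stands in for a policy-gradient estimate, and `demo_theta` stands in for weights obtained from demonstrations (for example, by behavior cloning).

```python
import random

# Hypothetical target: the toy "return" below is maximized at these weights.
TARGET = [1.0, -2.0, 0.5]


def fitness(theta):
    """Toy stand-in for an episode return (higher is better)."""
    return -sum((t - g) ** 2 for t, g in zip(theta, TARGET))


def gradient(theta):
    """Exact gradient of the toy fitness; stands in for a policy-gradient estimate."""
    return [-2.0 * (t - g) for t, g in zip(theta, TARGET)]


def perturb(theta, sigma, rng):
    """Parameter-space exploration: add Gaussian noise to every weight."""
    return [t + rng.gauss(0.0, sigma) for t in theta]


def hybrid_search(demo_theta, pop_size=10, elites=3, generations=50,
                  sigma=0.1, lr=0.1, seed=0):
    rng = random.Random(seed)
    # Seed the population around demonstration-derived weights (hypothetical,
    # e.g. obtained by behavior cloning) instead of starting from scratch.
    population = [perturb(demo_theta, sigma, rng) for _ in range(pop_size)]
    learner = list(demo_theta)  # individual refined by gradient ascent
    for _ in range(generations):
        # Gradient-based improvement of one individual...
        learner = [t + lr * g for t, g in zip(learner, gradient(learner))]
        # ...which then competes with the perturbed individuals.
        population.append(list(learner))
        # Selection: keep the fittest individuals as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[:elites]
        # Variation: refill the population with perturbed copies of parents.
        children = [perturb(rng.choice(parents), sigma, rng)
                    for _ in range(pop_size - elites)]
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = hybrid_search(demo_theta=[0.0, 0.0, 0.0])
    print(best, fitness(best))
```

Selection keeps the gradient-refined individual only when it actually outscores the perturbed ones, which is one simple way to let gradient steps and random parameter perturbation share a single population.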
