
Efficient Reinforcement Learning from Demonstration via Bayesian Network-Based Knowledge Extraction

Reinforcement learning from demonstration (RLfD) is considered a promising approach to improving reinforcement learning (RL) by leveraging expert demonstrations as additional decision-making guidance. However, most existing RLfD methods regard demonstrations only as low-level knowledge instances tied to a specific task: demonstrations are typically used either to provide additional rewards or to pretrain a neural network-based RL policy in a supervised manner, which usually results in poor generalization and weak robustness. Considering that human knowledge is not only interpretable but also well suited to generalization, we propose to exploit the potential of demonstrations by extracting knowledge from them via Bayesian networks, and we develop a novel RLfD method called Reinforcement Learning from demonstration via Bayesian Network-based Knowledge (RLBNK). RLBNK uses the node influence with Wasserstein distance metric (NIW) algorithm to obtain abstract concepts from demonstrations; a Bayesian network then performs knowledge learning and inference on the resulting abstract data set, yielding a coarse policy with a corresponding confidence estimate. When the coarse policy's confidence is low, an RL-based refinement module further optimizes and fine-tunes the policy, forming a (near-)optimal hybrid policy. Experimental results show that RLBNK improves the learning efficiency of the corresponding baseline RL algorithms under both normal and sparse-reward settings. Furthermore, we demonstrate that RLBNK delivers better generalization and robustness than the baseline methods.

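To make the pipeline above concrete, the following is a minimal Python sketch of the confidence-gated hybrid policy described in the abstract, not the authors' implementation: the NIW abstraction step is assumed to have already produced discrete abstract states, the Bayesian network is reduced to a single count-based conditional table P(action | abstract state), and the RL refinement module and the 0.8 confidence threshold are hypothetical stand-ins.

# Minimal sketch of RLBNK's confidence-gated hybrid policy
# (illustrative assumptions throughout; see the note above).
from collections import Counter, defaultdict
import random

class CoarseBNPolicy:
    """Coarse policy: P(action | abstract state), estimated by counting
    demonstration pairs. Stands in for the Bayesian network's inference."""

    def __init__(self, demos, n_actions):
        # demos: iterable of (abstract_state, action) pairs; the NIW
        # abstraction of raw states is assumed to have happened upstream.
        self.n_actions = n_actions
        self.counts = defaultdict(Counter)
        for s, a in demos:
            self.counts[s][a] += 1

    def query(self, s):
        """Return (best_action, confidence) for abstract state s.
        Laplace smoothing gives unseen states a uniform, low-confidence
        posterior, so they are deferred to the RL module."""
        c = self.counts[s]
        total = sum(c.values()) + self.n_actions
        probs = [(c[a] + 1) / total for a in range(self.n_actions)]
        best = max(range(self.n_actions), key=lambda a: probs[a])
        return best, probs[best]

class HybridPolicy:
    """Act with the coarse policy when it is confident; otherwise defer
    to the RL-based refinement policy (any callable state -> action)."""

    def __init__(self, coarse, rl_policy, threshold=0.8):
        self.coarse = coarse
        self.rl_policy = rl_policy
        self.threshold = threshold  # hypothetical confidence cutoff

    def act(self, s):
        a, conf = self.coarse.query(s)
        return a if conf >= self.threshold else self.rl_policy(s)

if __name__ == "__main__":
    demos = [("near_goal", 1)] * 9 + [("near_goal", 0)]
    coarse = CoarseBNPolicy(demos, n_actions=2)
    rl_stub = lambda s: random.randrange(2)  # placeholder RL module
    policy = HybridPolicy(coarse, rl_stub)
    print(policy.act("near_goal"))  # confident: coarse policy picks 1
    print(policy.act("far_away"))   # unseen state: deferred to RL stub

The point of the gate is that demonstration knowledge dominates only where the Bayesian network has actually seen evidence, while the RL module retains control elsewhere.
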
Bibliographic Details
Main Authors: Zhang, Yichuan; Lan, Yixing; Fang, Qiang; Xu, Xin; Li, Junxiang; Zeng, Yujun
Format: Online Article Text
Language: English
Published: Hindawi, 2021
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8486502/
https://www.ncbi.nlm.nih.gov/pubmed/34603434
http://dx.doi.org/10.1155/2021/7588221
Collection: PubMed
Record ID: pubmed-8486502
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: Comput Intell Neurosci
Publication Date: 2021-09-24
License: Copyright © 2021 Yichuan Zhang et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.