Efficient Reinforcement Learning from Demonstration via Bayesian Network-Based Knowledge Extraction
Reinforcement learning from demonstration (RLfD) is a promising approach to improving reinforcement learning (RL) by leveraging expert demonstrations as additional decision-making guidance. However, most existing RLfD methods treat demonstrations only as low-level knowledge instances tied to a specific task: demonstrations are typically used either to provide additional rewards or to pretrain the neural network-based RL policy in a supervised manner, usually resulting in poor generalization and weak robustness. Considering that human knowledge is both interpretable and well suited to generalization, we propose to exploit the potential of demonstrations by extracting knowledge from them via Bayesian networks, and we develop a novel RLfD method called Reinforcement Learning from demonstration via Bayesian Network-based Knowledge (RLBNK). RLBNK uses the node influence with Wasserstein distance metric (NIW) algorithm to obtain abstract concepts from demonstrations; a Bayesian network then performs knowledge learning and inference on the resulting abstract data set, yielding a coarse policy with a corresponding confidence. When the coarse policy's confidence is low, an RL-based refinement module further optimizes and fine-tunes the policy, forming a (near-)optimal hybrid policy. Experimental results show that RLBNK improves the learning efficiency of the corresponding baseline RL algorithms under both normal and sparse reward settings. Furthermore, we demonstrate that RLBNK delivers better generalization and robustness than the baseline methods.
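To make the pipeline in the abstract concrete, here is a minimal, hypothetical sketch of the two components it describes: ranking state features by a Wasserstein-based influence score (a crude stand-in for the paper's NIW algorithm, whose exact definition is not given here) and a confidence-gated hybrid policy that queries the Bayesian network first and falls back to the RL refinement policy when confidence is low. All names (`rank_features_by_wasserstein`, `HybridPolicy`, `confidence_threshold`) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the RLBNK pipeline described in the abstract;
# not the authors' code. scipy's 1-D Wasserstein distance is used as a
# stand-in for the paper's NIW node-influence score (an assumption).
import numpy as np
from scipy.stats import wasserstein_distance

def rank_features_by_wasserstein(demo_states: np.ndarray,
                                 random_states: np.ndarray) -> np.ndarray:
    """Rank state features (candidate Bayesian-network nodes) by how much
    their marginal distributions differ between expert demonstrations and
    randomly collected states; a larger distance suggests more influence."""
    scores = np.array([
        wasserstein_distance(demo_states[:, j], random_states[:, j])
        for j in range(demo_states.shape[1])
    ])
    return np.argsort(scores)[::-1]  # feature indices, most influential first

class HybridPolicy:
    """Confidence-gated hybrid policy: act on the Bayesian network's coarse
    policy when its posterior confidence is high, otherwise defer to the
    RL refinement policy."""

    def __init__(self, bn_posterior, rl_policy, confidence_threshold=0.8):
        self.bn_posterior = bn_posterior        # state -> P(action | state)
        self.rl_policy = rl_policy              # state -> action
        self.threshold = confidence_threshold   # assumed hyperparameter

    def act(self, state):
        probs = np.asarray(self.bn_posterior(state))
        if probs.max() >= self.threshold:
            return int(probs.argmax())          # trust the coarse BN policy
        return self.rl_policy(state)            # fall back to RL refinement
```

In this reading, the Bayesian network supplies interpretable guidance on states covered by the abstracted demonstrations, while the RL module handles the rest; the threshold trades coverage of the coarse policy against reliance on RL.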
Main Authors: | Zhang, Yichuan; Lan, Yixing; Fang, Qiang; Xu, Xin; Li, Junxiang; Zeng, Yujun |
Format: | Online Article Text |
Language: | English |
Published: | Hindawi, 2021 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8486502/ https://www.ncbi.nlm.nih.gov/pubmed/34603434 http://dx.doi.org/10.1155/2021/7588221 |
_version_ | 1784577752274305024 |
author | Zhang, Yichuan; Lan, Yixing; Fang, Qiang; Xu, Xin; Li, Junxiang; Zeng, Yujun
author_facet | Zhang, Yichuan; Lan, Yixing; Fang, Qiang; Xu, Xin; Li, Junxiang; Zeng, Yujun
author_sort | Zhang, Yichuan |
collection | PubMed |
description | Reinforcement learning from demonstration (RLfD) is a promising approach to improving reinforcement learning (RL) by leveraging expert demonstrations as additional decision-making guidance. However, most existing RLfD methods treat demonstrations only as low-level knowledge instances tied to a specific task: demonstrations are typically used either to provide additional rewards or to pretrain the neural network-based RL policy in a supervised manner, usually resulting in poor generalization and weak robustness. Considering that human knowledge is both interpretable and well suited to generalization, we propose to exploit the potential of demonstrations by extracting knowledge from them via Bayesian networks, and we develop a novel RLfD method called Reinforcement Learning from demonstration via Bayesian Network-based Knowledge (RLBNK). RLBNK uses the node influence with Wasserstein distance metric (NIW) algorithm to obtain abstract concepts from demonstrations; a Bayesian network then performs knowledge learning and inference on the resulting abstract data set, yielding a coarse policy with a corresponding confidence. When the coarse policy's confidence is low, an RL-based refinement module further optimizes and fine-tunes the policy, forming a (near-)optimal hybrid policy. Experimental results show that RLBNK improves the learning efficiency of the corresponding baseline RL algorithms under both normal and sparse reward settings. Furthermore, we demonstrate that RLBNK delivers better generalization and robustness than the baseline methods. |
format | Online Article Text |
id | pubmed-8486502 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-8486502 2021-10-02 Efficient Reinforcement Learning from Demonstration via Bayesian Network-Based Knowledge Extraction Zhang, Yichuan; Lan, Yixing; Fang, Qiang; Xu, Xin; Li, Junxiang; Zeng, Yujun Comput Intell Neurosci Research Article Reinforcement learning from demonstration (RLfD) is a promising approach to improving reinforcement learning (RL) by leveraging expert demonstrations as additional decision-making guidance. However, most existing RLfD methods treat demonstrations only as low-level knowledge instances tied to a specific task: demonstrations are typically used either to provide additional rewards or to pretrain the neural network-based RL policy in a supervised manner, usually resulting in poor generalization and weak robustness. Considering that human knowledge is both interpretable and well suited to generalization, we propose to exploit the potential of demonstrations by extracting knowledge from them via Bayesian networks, and we develop a novel RLfD method called Reinforcement Learning from demonstration via Bayesian Network-based Knowledge (RLBNK). RLBNK uses the node influence with Wasserstein distance metric (NIW) algorithm to obtain abstract concepts from demonstrations; a Bayesian network then performs knowledge learning and inference on the resulting abstract data set, yielding a coarse policy with a corresponding confidence. When the coarse policy's confidence is low, an RL-based refinement module further optimizes and fine-tunes the policy, forming a (near-)optimal hybrid policy. Experimental results show that RLBNK improves the learning efficiency of the corresponding baseline RL algorithms under both normal and sparse reward settings. Furthermore, we demonstrate that RLBNK delivers better generalization and robustness than the baseline methods. Hindawi 2021-09-24 /pmc/articles/PMC8486502/ /pubmed/34603434 http://dx.doi.org/10.1155/2021/7588221 Text en Copyright © 2021 Yichuan Zhang et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Zhang, Yichuan Lan, Yixing Fang, Qiang Xu, Xin Li, Junxiang Zeng, Yujun Efficient Reinforcement Learning from Demonstration via Bayesian Network-Based Knowledge Extraction |
title | Efficient Reinforcement Learning from Demonstration via Bayesian Network-Based Knowledge Extraction |
title_full | Efficient Reinforcement Learning from Demonstration via Bayesian Network-Based Knowledge Extraction |
title_fullStr | Efficient Reinforcement Learning from Demonstration via Bayesian Network-Based Knowledge Extraction |
title_full_unstemmed | Efficient Reinforcement Learning from Demonstration via Bayesian Network-Based Knowledge Extraction |
title_short | Efficient Reinforcement Learning from Demonstration via Bayesian Network-Based Knowledge Extraction |
title_sort | efficient reinforcement learning from demonstration via bayesian network-based knowledge extraction |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8486502/ https://www.ncbi.nlm.nih.gov/pubmed/34603434 http://dx.doi.org/10.1155/2021/7588221 |
work_keys_str_mv | AT zhangyichuan efficientreinforcementlearningfromdemonstrationviabayesiannetworkbasedknowledgeextraction AT lanyixing efficientreinforcementlearningfromdemonstrationviabayesiannetworkbasedknowledgeextraction AT fangqiang efficientreinforcementlearningfromdemonstrationviabayesiannetworkbasedknowledgeextraction AT xuxin efficientreinforcementlearningfromdemonstrationviabayesiannetworkbasedknowledgeextraction AT lijunxiang efficientreinforcementlearningfromdemonstrationviabayesiannetworkbasedknowledgeextraction AT zengyujun efficientreinforcementlearningfromdemonstrationviabayesiannetworkbasedknowledgeextraction |