Structure-Preserving Imitation Learning With Delayed Reward: An Evaluation Within the RoboCup Soccer 2D Simulation Environment
We describe and evaluate a neural network-based architecture aimed to imitate and improve the performance of a fully autonomous soccer team in RoboCup Soccer 2D Simulation environment. The approach utilizes deep Q-network architecture for action determination and a deep neural network for parameter...
Main authors: | Nguyen, Quang Dang; Prokopenko, Mikhail |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2020 |
Subjects: | Robotics and AI |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7805756/ https://www.ncbi.nlm.nih.gov/pubmed/33501289 http://dx.doi.org/10.3389/frobt.2020.00123 |
_version_ | 1783636374039035904 |
---|---|
author | Nguyen, Quang Dang; Prokopenko, Mikhail |
collection | PubMed |
description | We describe and evaluate a neural network-based architecture aimed to imitate and improve the performance of a fully autonomous soccer team in RoboCup Soccer 2D Simulation environment. The approach utilizes deep Q-network architecture for action determination and a deep neural network for parameter learning. The proposed solution is shown to be feasible for replacing a selected behavioral module in a well-established RoboCup base team, Gliders2d, in which behavioral modules have been evolved with human experts in the loop. Furthermore, we introduce an additional performance-correlated signal (a delayed reward signal), enabling a search for local maxima during a training phase. The extension is compared against a known benchmark. Finally, we investigate the extent to which preserving the structure of expert-designed behaviors affects the performance of a neural network-based solution. |
format | Online Article Text |
id | pubmed-7805756 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7805756 2021-01-25 Structure-Preserving Imitation Learning With Delayed Reward: An Evaluation Within the RoboCup Soccer 2D Simulation Environment Nguyen, Quang Dang Prokopenko, Mikhail Front Robot AI Robotics and AI We describe and evaluate a neural network-based architecture aimed to imitate and improve the performance of a fully autonomous soccer team in RoboCup Soccer 2D Simulation environment. The approach utilizes deep Q-network architecture for action determination and a deep neural network for parameter learning. The proposed solution is shown to be feasible for replacing a selected behavioral module in a well-established RoboCup base team, Gliders2d, in which behavioral modules have been evolved with human experts in the loop. Furthermore, we introduce an additional performance-correlated signal (a delayed reward signal), enabling a search for local maxima during a training phase. The extension is compared against a known benchmark. Finally, we investigate the extent to which preserving the structure of expert-designed behaviors affects the performance of a neural network-based solution. Frontiers Media S.A. 2020-09-16 /pmc/articles/PMC7805756/ /pubmed/33501289 http://dx.doi.org/10.3389/frobt.2020.00123 Text en Copyright © 2020 Nguyen and Prokopenko. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Structure-Preserving Imitation Learning With Delayed Reward: An Evaluation Within the RoboCup Soccer 2D Simulation Environment |
topic | Robotics and AI |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7805756/ https://www.ncbi.nlm.nih.gov/pubmed/33501289 http://dx.doi.org/10.3389/frobt.2020.00123 |