Structure-Preserving Imitation Learning With Delayed Reward: An Evaluation Within the RoboCup Soccer 2D Simulation Environment

We describe and evaluate a neural network-based architecture aimed to imitate and improve the performance of a fully autonomous soccer team in RoboCup Soccer 2D Simulation environment. The approach utilizes deep Q-network architecture for action determination and a deep neural network for parameter...

Bibliographic Details
Main authors: Nguyen, Quang Dang, Prokopenko, Mikhail
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7805756/
https://www.ncbi.nlm.nih.gov/pubmed/33501289
http://dx.doi.org/10.3389/frobt.2020.00123
_version_ 1783636374039035904
author Nguyen, Quang Dang
Prokopenko, Mikhail
author_facet Nguyen, Quang Dang
Prokopenko, Mikhail
author_sort Nguyen, Quang Dang
collection PubMed
description We describe and evaluate a neural network-based architecture aimed to imitate and improve the performance of a fully autonomous soccer team in RoboCup Soccer 2D Simulation environment. The approach utilizes deep Q-network architecture for action determination and a deep neural network for parameter learning. The proposed solution is shown to be feasible for replacing a selected behavioral module in a well-established RoboCup base team, Gliders2d, in which behavioral modules have been evolved with human experts in the loop. Furthermore, we introduce an additional performance-correlated signal (a delayed reward signal), enabling a search for local maxima during a training phase. The extension is compared against a known benchmark. Finally, we investigate the extent to which preserving the structure of expert-designed behaviors affects the performance of a neural network-based solution.
format Online
Article
Text
id pubmed-7805756
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-78057562021-01-25 Structure-Preserving Imitation Learning With Delayed Reward: An Evaluation Within the RoboCup Soccer 2D Simulation Environment Nguyen, Quang Dang Prokopenko, Mikhail Front Robot AI Robotics and AI We describe and evaluate a neural network-based architecture aimed to imitate and improve the performance of a fully autonomous soccer team in RoboCup Soccer 2D Simulation environment. The approach utilizes deep Q-network architecture for action determination and a deep neural network for parameter learning. The proposed solution is shown to be feasible for replacing a selected behavioral module in a well-established RoboCup base team, Gliders2d, in which behavioral modules have been evolved with human experts in the loop. Furthermore, we introduce an additional performance-correlated signal (a delayed reward signal), enabling a search for local maxima during a training phase. The extension is compared against a known benchmark. Finally, we investigate the extent to which preserving the structure of expert-designed behaviors affects the performance of a neural network-based solution. Frontiers Media S.A. 2020-09-16 /pmc/articles/PMC7805756/ /pubmed/33501289 http://dx.doi.org/10.3389/frobt.2020.00123 Text en Copyright © 2020 Nguyen and Prokopenko. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Robotics and AI
Nguyen, Quang Dang
Prokopenko, Mikhail
Structure-Preserving Imitation Learning With Delayed Reward: An Evaluation Within the RoboCup Soccer 2D Simulation Environment
title Structure-Preserving Imitation Learning With Delayed Reward: An Evaluation Within the RoboCup Soccer 2D Simulation Environment
title_full Structure-Preserving Imitation Learning With Delayed Reward: An Evaluation Within the RoboCup Soccer 2D Simulation Environment
title_fullStr Structure-Preserving Imitation Learning With Delayed Reward: An Evaluation Within the RoboCup Soccer 2D Simulation Environment
title_full_unstemmed Structure-Preserving Imitation Learning With Delayed Reward: An Evaluation Within the RoboCup Soccer 2D Simulation Environment
title_short Structure-Preserving Imitation Learning With Delayed Reward: An Evaluation Within the RoboCup Soccer 2D Simulation Environment
title_sort structure-preserving imitation learning with delayed reward: an evaluation within the robocup soccer 2d simulation environment
topic Robotics and AI
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7805756/
https://www.ncbi.nlm.nih.gov/pubmed/33501289
http://dx.doi.org/10.3389/frobt.2020.00123
work_keys_str_mv AT nguyenquangdang structurepreservingimitationlearningwithdelayedrewardanevaluationwithintherobocupsoccer2dsimulationenvironment
AT prokopenkomikhail structurepreservingimitationlearningwithdelayedrewardanevaluationwithintherobocupsoccer2dsimulationenvironment