Structure-Preserving Imitation Learning With Delayed Reward: An Evaluation Within the RoboCup Soccer 2D Simulation Environment

We describe and evaluate a neural network-based architecture aimed to imitate and improve the performance of a fully autonomous soccer team in RoboCup Soccer 2D Simulation environment. The approach utilizes deep Q-network architecture for action determination and a deep neural network for parameter...

Bibliographic Details
Main Authors:	Nguyen, Quang Dang; Prokopenko, Mikhail
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2020
Subjects:	Robotics and AI
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7805756/ https://www.ncbi.nlm.nih.gov/pubmed/33501289 http://dx.doi.org/10.3389/frobt.2020.00123

author	Nguyen, Quang Dang Prokopenko, Mikhail
collection	PubMed
description	We describe and evaluate a neural network-based architecture aimed to imitate and improve the performance of a fully autonomous soccer team in RoboCup Soccer 2D Simulation environment. The approach utilizes deep Q-network architecture for action determination and a deep neural network for parameter learning. The proposed solution is shown to be feasible for replacing a selected behavioral module in a well-established RoboCup base team, Gliders2d, in which behavioral modules have been evolved with human experts in the loop. Furthermore, we introduce an additional performance-correlated signal (a delayed reward signal), enabling a search for local maxima during a training phase. The extension is compared against a known benchmark. Finally, we investigate the extent to which preserving the structure of expert-designed behaviors affects the performance of a neural network-based solution.
format	Online Article Text
id	pubmed-7805756
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-7805756 2021-01-25 Structure-Preserving Imitation Learning With Delayed Reward: An Evaluation Within the RoboCup Soccer 2D Simulation Environment Nguyen, Quang Dang Prokopenko, Mikhail Front Robot AI Robotics and AI We describe and evaluate a neural network-based architecture aimed to imitate and improve the performance of a fully autonomous soccer team in RoboCup Soccer 2D Simulation environment. The approach utilizes deep Q-network architecture for action determination and a deep neural network for parameter learning. The proposed solution is shown to be feasible for replacing a selected behavioral module in a well-established RoboCup base team, Gliders2d, in which behavioral modules have been evolved with human experts in the loop. Furthermore, we introduce an additional performance-correlated signal (a delayed reward signal), enabling a search for local maxima during a training phase. The extension is compared against a known benchmark. Finally, we investigate the extent to which preserving the structure of expert-designed behaviors affects the performance of a neural network-based solution. Frontiers Media S.A. 2020-09-16 /pmc/articles/PMC7805756/ /pubmed/33501289 http://dx.doi.org/10.3389/frobt.2020.00123 Text en Copyright © 2020 Nguyen and Prokopenko. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Structure-Preserving Imitation Learning With Delayed Reward: An Evaluation Within the RoboCup Soccer 2D Simulation Environment
topic	Robotics and AI
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7805756/ https://www.ncbi.nlm.nih.gov/pubmed/33501289 http://dx.doi.org/10.3389/frobt.2020.00123

Similar Items