
Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning

Multi-Agent Reinforcement Learning (MARL) algorithms face two main difficulties: the curse of dimensionality, and environment non-stationarity due to the independent learning processes carried out by the agents concurrently. In this paper we formalize and prove the convergence of a Distributed Round-Robin Q-Learning (D-RR-QL) algorithm for cooperative systems. The computational complexity of this algorithm increases linearly with the number of agents. Moreover, it eliminates environment non-stationarity by carrying out round-robin scheduling of the action selection and execution. This learning scheme allows the implementation of Modular State-Action Vetoes (MSAV) in cooperative multi-agent systems, which speeds up learning convergence in over-constrained systems by vetoing state-action pairs which lead to undesired termination states (UTS) in the relevant state-action subspace. Each agent’s local state-action value function learning is an independent process, including the MSAV policies. Coordination of locally optimal policies to obtain the globally optimal joint policy is achieved by a greedy selection procedure using message passing. We show that D-RR-QL improves over state-of-the-art approaches, such as Distributed Q-Learning, Team Q-Learning and Coordinated Reinforcement Learning, in a paradigmatic Linked Multi-Component Robotic System (L-MCRS) control problem: the hose transportation task. L-MCRS are over-constrained systems with many UTS induced by the interaction of the passive linking element and the active mobile robots.


Bibliographic Details
Main Authors: Fernandez-Gauna, Borja, Etxeberria-Agiriano, Ismael, Graña, Manuel
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4497621/
https://www.ncbi.nlm.nih.gov/pubmed/26158587
http://dx.doi.org/10.1371/journal.pone.0127129
_version_ 1782380530223808512
author Fernandez-Gauna, Borja
Etxeberria-Agiriano, Ismael
Graña, Manuel
author_facet Fernandez-Gauna, Borja
Etxeberria-Agiriano, Ismael
Graña, Manuel
author_sort Fernandez-Gauna, Borja
collection PubMed
description Multi-Agent Reinforcement Learning (MARL) algorithms face two main difficulties: the curse of dimensionality, and environment non-stationarity due to the independent learning processes carried out by the agents concurrently. In this paper we formalize and prove the convergence of a Distributed Round-Robin Q-Learning (D-RR-QL) algorithm for cooperative systems. The computational complexity of this algorithm increases linearly with the number of agents. Moreover, it eliminates environment non-stationarity by carrying out round-robin scheduling of the action selection and execution. This learning scheme allows the implementation of Modular State-Action Vetoes (MSAV) in cooperative multi-agent systems, which speeds up learning convergence in over-constrained systems by vetoing state-action pairs which lead to undesired termination states (UTS) in the relevant state-action subspace. Each agent’s local state-action value function learning is an independent process, including the MSAV policies. Coordination of locally optimal policies to obtain the globally optimal joint policy is achieved by a greedy selection procedure using message passing. We show that D-RR-QL improves over state-of-the-art approaches, such as Distributed Q-Learning, Team Q-Learning and Coordinated Reinforcement Learning, in a paradigmatic Linked Multi-Component Robotic System (L-MCRS) control problem: the hose transportation task. L-MCRS are over-constrained systems with many UTS induced by the interaction of the passive linking element and the active mobile robots.
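The two core ideas in the abstract — turn-taking agents that act one at a time (so each agent faces a stationary environment between its own turns) and vetoing state-action pairs that lead to undesired termination states — can be illustrated with a toy sketch. This is not the paper's D-RR-QL algorithm or its hose-transportation model; the environment, class names, and reward values below are hypothetical stand-ins, and the veto set is a much-simplified version of the paper's Modular State-Action Vetoes.

```python
import random
from collections import defaultdict

class RoundRobinQLearners:
    """Each agent keeps its own local Q-table and a local veto set
    (a simplified stand-in for modular state-action vetoes)."""

    def __init__(self, n_agents, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.n_agents = n_agents
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = [defaultdict(float) for _ in range(n_agents)]
        self.vetoed = [set() for _ in range(n_agents)]  # vetoed (state, action) pairs

    def select(self, agent, state):
        # Restrict the choice to non-vetoed actions for this agent.
        allowed = [a for a in self.actions
                   if (state, a) not in self.vetoed[agent]]
        if not allowed:            # everything vetoed: fall back to the full set
            allowed = self.actions
        if random.random() < self.epsilon:
            return random.choice(allowed)  # epsilon-greedy exploration
        return max(allowed, key=lambda a: self.q[agent][(state, a)])

    def update(self, agent, state, action, reward, next_state, undesired):
        if undesired:              # veto transitions into undesired states
            self.vetoed[agent].add((state, action))
        best_next = max(self.q[agent][(next_state, a)] for a in self.actions)
        key = (state, action)
        self.q[agent][key] += self.alpha * (
            reward + self.gamma * best_next - self.q[agent][key])

def train_round_robin(learners, env_step, initial_state, episodes=100, horizon=50):
    """Agents act one at a time in a fixed round-robin order, so only one
    agent changes the environment per step."""
    for _ in range(episodes):
        state = initial_state
        for t in range(horizon):
            agent = t % learners.n_agents   # round-robin turn taking
            action = learners.select(agent, state)
            next_state, reward, undesired, done = env_step(state, agent, action)
            learners.update(agent, state, action, reward, next_state, undesired)
            state = next_state
            if done:
                break
```

Because only one agent acts per time step, each agent's update rule sees the other agents' behavior as part of a fixed environment during its turn, which is the intuition behind the non-stationarity argument; the veto set prunes actions known to end in undesired termination states, shrinking the effective search space.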
format Online
Article
Text
id pubmed-4497621
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4497621 2015-07-14 Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning Fernandez-Gauna, Borja Etxeberria-Agiriano, Ismael Graña, Manuel PLoS One Research Article Multi-Agent Reinforcement Learning (MARL) algorithms face two main difficulties: the curse of dimensionality, and environment non-stationarity due to the independent learning processes carried out by the agents concurrently. In this paper we formalize and prove the convergence of a Distributed Round-Robin Q-Learning (D-RR-QL) algorithm for cooperative systems. The computational complexity of this algorithm increases linearly with the number of agents. Moreover, it eliminates environment non-stationarity by carrying out round-robin scheduling of the action selection and execution. This learning scheme allows the implementation of Modular State-Action Vetoes (MSAV) in cooperative multi-agent systems, which speeds up learning convergence in over-constrained systems by vetoing state-action pairs which lead to undesired termination states (UTS) in the relevant state-action subspace. Each agent’s local state-action value function learning is an independent process, including the MSAV policies. Coordination of locally optimal policies to obtain the globally optimal joint policy is achieved by a greedy selection procedure using message passing. We show that D-RR-QL improves over state-of-the-art approaches, such as Distributed Q-Learning, Team Q-Learning and Coordinated Reinforcement Learning, in a paradigmatic Linked Multi-Component Robotic System (L-MCRS) control problem: the hose transportation task. L-MCRS are over-constrained systems with many UTS induced by the interaction of the passive linking element and the active mobile robots.
Public Library of Science 2015-07-09 /pmc/articles/PMC4497621/ /pubmed/26158587 http://dx.doi.org/10.1371/journal.pone.0127129 Text en © 2015 Fernandez-Gauna et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Fernandez-Gauna, Borja
Etxeberria-Agiriano, Ismael
Graña, Manuel
Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning
title Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning
title_full Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning
title_fullStr Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning
title_full_unstemmed Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning
title_short Learning Multirobot Hose Transportation and Deployment by Distributed Round-Robin Q-Learning
title_sort learning multirobot hose transportation and deployment by distributed round-robin q-learning
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4497621/
https://www.ncbi.nlm.nih.gov/pubmed/26158587
http://dx.doi.org/10.1371/journal.pone.0127129
work_keys_str_mv AT fernandezgaunaborja learningmultirobothosetransportationanddeploymentbydistributedroundrobinqlearning
AT etxeberriaagirianoismael learningmultirobothosetransportationanddeploymentbydistributedroundrobinqlearning
AT granamanuel learningmultirobothosetransportationanddeploymentbydistributedroundrobinqlearning