Cargando…

α-Rank: Multi-Agent Evaluation by Evolution

We introduce α-Rank, a principled evolutionary dynamics methodology, for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov-Conley chains (MCCs). The approach leverages continuous-time and discrete...

Descripción completa

Detalles Bibliográficos
Autores principales: Omidshafiei, Shayegan, Papadimitriou, Christos, Piliouras, Georgios, Tuyls, Karl, Rowland, Mark, Lespiau, Jean-Baptiste, Czarnecki, Wojciech M., Lanctot, Marc, Perolat, Julien, Munos, Remi
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Nature Publishing Group UK 2019
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6617105/
https://www.ncbi.nlm.nih.gov/pubmed/31289288
http://dx.doi.org/10.1038/s41598-019-45619-9
_version_ 1783433614760869888
author Omidshafiei, Shayegan
Papadimitriou, Christos
Piliouras, Georgios
Tuyls, Karl
Rowland, Mark
Lespiau, Jean-Baptiste
Czarnecki, Wojciech M.
Lanctot, Marc
Perolat, Julien
Munos, Remi
author_facet Omidshafiei, Shayegan
Papadimitriou, Christos
Piliouras, Georgios
Tuyls, Karl
Rowland, Mark
Lespiau, Jean-Baptiste
Czarnecki, Wojciech M.
Lanctot, Marc
Perolat, Julien
Munos, Remi
author_sort Omidshafiei, Shayegan
collection PubMed
description We introduce α-Rank, a principled evolutionary dynamics methodology, for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov-Conley chains (MCCs). The approach leverages continuous-time and discrete-time evolutionary dynamical systems applied to empirical games, and scales tractably in the number of agents, in the type of interactions (beyond dyadic), and the type of empirical games (symmetric and asymmetric). Current models are fundamentally limited in one or more of these dimensions, and are not guaranteed to converge to the desired game-theoretic solution concept (typically the Nash equilibrium). α-Rank automatically provides a ranking over the set of agents under evaluation and provides insights into their strengths, weaknesses, and long-term dynamics in terms of basins of attraction and sink components. This is a direct consequence of the correspondence we establish to the dynamical MCC solution concept when the underlying evolutionary model’s ranking-intensity parameter, α, is chosen to be large, which exactly forms the basis of α-Rank. In contrast to the Nash equilibrium, which is a static solution concept based solely on fixed points, MCCs are a dynamical solution concept based on the Markov chain formalism, Conley’s Fundamental Theorem of Dynamical Systems, and the core ingredients of dynamical systems: fixed points, recurrent sets, periodic orbits, and limit cycles. Our α-Rank method runs in polynomial time with respect to the total number of pure strategy profiles, whereas computing a Nash equilibrium for a general-sum game is known to be intractable. We introduce mathematical proofs that not only provide an overarching and unifying perspective of existing continuous- and discrete-time evolutionary evaluation models, but also reveal the formal underpinnings of the α-Rank methodology. We illustrate the method in canonical games and empirically validate it in several domains, including AlphaGo, AlphaZero, MuJoCo Soccer, and Poker.
format Online
Article
Text
id pubmed-6617105
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-66171052019-07-18 α-Rank: Multi-Agent Evaluation by Evolution Omidshafiei, Shayegan Papadimitriou, Christos Piliouras, Georgios Tuyls, Karl Rowland, Mark Lespiau, Jean-Baptiste Czarnecki, Wojciech M. Lanctot, Marc Perolat, Julien Munos, Remi Sci Rep Article We introduce α-Rank, a principled evolutionary dynamics methodology, for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov-Conley chains (MCCs). The approach leverages continuous-time and discrete-time evolutionary dynamical systems applied to empirical games, and scales tractably in the number of agents, in the type of interactions (beyond dyadic), and the type of empirical games (symmetric and asymmetric). Current models are fundamentally limited in one or more of these dimensions, and are not guaranteed to converge to the desired game-theoretic solution concept (typically the Nash equilibrium). α-Rank automatically provides a ranking over the set of agents under evaluation and provides insights into their strengths, weaknesses, and long-term dynamics in terms of basins of attraction and sink components. This is a direct consequence of the correspondence we establish to the dynamical MCC solution concept when the underlying evolutionary model’s ranking-intensity parameter, α, is chosen to be large, which exactly forms the basis of α-Rank. In contrast to the Nash equilibrium, which is a static solution concept based solely on fixed points, MCCs are a dynamical solution concept based on the Markov chain formalism, Conley’s Fundamental Theorem of Dynamical Systems, and the core ingredients of dynamical systems: fixed points, recurrent sets, periodic orbits, and limit cycles. Our α-Rank method runs in polynomial time with respect to the total number of pure strategy profiles, whereas computing a Nash equilibrium for a general-sum game is known to be intractable. We introduce mathematical proofs that not only provide an overarching and unifying perspective of existing continuous- and discrete-time evolutionary evaluation models, but also reveal the formal underpinnings of the α-Rank methodology. We illustrate the method in canonical games and empirically validate it in several domains, including AlphaGo, AlphaZero, MuJoCo Soccer, and Poker. Nature Publishing Group UK 2019-07-09 /pmc/articles/PMC6617105/ /pubmed/31289288 http://dx.doi.org/10.1038/s41598-019-45619-9 Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Omidshafiei, Shayegan
Papadimitriou, Christos
Piliouras, Georgios
Tuyls, Karl
Rowland, Mark
Lespiau, Jean-Baptiste
Czarnecki, Wojciech M.
Lanctot, Marc
Perolat, Julien
Munos, Remi
α-Rank: Multi-Agent Evaluation by Evolution
title α-Rank: Multi-Agent Evaluation by Evolution
title_full α-Rank: Multi-Agent Evaluation by Evolution
title_fullStr α-Rank: Multi-Agent Evaluation by Evolution
title_full_unstemmed α-Rank: Multi-Agent Evaluation by Evolution
title_short α-Rank: Multi-Agent Evaluation by Evolution
title_sort α-rank: multi-agent evaluation by evolution
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6617105/
https://www.ncbi.nlm.nih.gov/pubmed/31289288
http://dx.doi.org/10.1038/s41598-019-45619-9
work_keys_str_mv AT omidshafieishayegan arankmultiagentevaluationbyevolution
AT papadimitriouchristos arankmultiagentevaluationbyevolution
AT piliourasgeorgios arankmultiagentevaluationbyevolution
AT tuylskarl arankmultiagentevaluationbyevolution
AT rowlandmark arankmultiagentevaluationbyevolution
AT lespiaujeanbaptiste arankmultiagentevaluationbyevolution
AT czarneckiwojciechm arankmultiagentevaluationbyevolution
AT lanctotmarc arankmultiagentevaluationbyevolution
AT perolatjulien arankmultiagentevaluationbyevolution
AT munosremi arankmultiagentevaluationbyevolution