Optimal Policy of Multiplayer Poker via Actor-Critic Reinforcement Learning

Poker has long been considered a challenging problem in both artificial intelligence and game theory because it is characterized by imperfect information and uncertainty, features shared by many realistic problems such as auctioning, pricing, cyber security, and operations. However, it remains unclear whether playing an equilibrium policy in multi-player games is wise, and it is infeasible to theoretically validate whether a policy is optimal. Designing an effective optimal-policy learning method therefore has greater practical significance. This paper proposes an optimal policy learning method for multi-player poker games based on Actor-Critic reinforcement learning. First, it builds an Actor network that makes decisions with imperfect information and a Critic network that evaluates policies with perfect information. Second, it proposes novel multi-player poker policy update methods: the asynchronous policy update algorithm (APU) for multi-player multi-policy scenarios and the dual-network asynchronous policy update algorithm (Dual-APU) for multi-player shared-policy scenarios. Finally, it uses the most popular variant, six-player Texas hold ’em poker, to validate the performance of the proposed method. The experiments demonstrate that the policies learned by the proposed methods perform well and gain steadily compared with existing approaches. In sum, policy learning methods for imperfect-information games based on Actor-Critic reinforcement learning perform well on poker and can be transferred to other imperfect-information games. Training with perfect information and testing with imperfect information in this way offers an effective and explainable approach to learning an approximately optimal policy.
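To make the division of labor concrete, the sketch below is a minimal, hypothetical PyTorch rendering of the setup the abstract describes: an Actor that sees only a player's imperfect-information observation (its own cards plus the public state) and a Critic that evaluates using the perfect-information state (including hidden cards), which is available only during training. The feature dimensions, network sizes, and the advantage-style update are illustrative assumptions, not the authors' implementation of APU or Dual-APU.

```python
# Minimal sketch (assumed PyTorch implementation): Actor acts on imperfect
# information, Critic evaluates with perfect information. Dimensions and the
# update rule are illustrative assumptions, not the paper's exact method.
import torch
import torch.nn as nn

OBS_DIM = 60      # assumed size of the imperfect-information observation
STATE_DIM = 200   # assumed size of the perfect-information state
N_ACTIONS = 4     # e.g. fold / call / raise / all-in

class Actor(nn.Module):
    """Maps a player's partial observation to a distribution over actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 128), nn.ReLU(),
            nn.Linear(128, N_ACTIONS),
        )

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

class Critic(nn.Module):
    """Scores the full (perfect-information) game state, training-time only."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, state):
        return self.net(state).squeeze(-1)

actor, critic = Actor(), Critic()
opt_actor = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(obs, state, action, ret):
    """One advantage actor-critic step on a batch of transitions.

    obs    -- imperfect-information observations seen by the acting player
    state  -- matching perfect-information states (training-time only)
    action -- actions taken; ret -- observed returns (e.g. chips won)
    """
    value = critic(state)
    advantage = (ret - value).detach()

    # Critic: regress the perfect-information value toward the return.
    critic_loss = (ret - value).pow(2).mean()
    opt_critic.zero_grad()
    critic_loss.backward()
    opt_critic.step()

    # Actor: reinforce actions in proportion to their advantage.
    log_prob = actor(obs).log_prob(action)
    actor_loss = -(advantage * log_prob).mean()
    opt_actor.zero_grad()
    actor_loss.backward()
    opt_actor.step()
```

At test time only the Actor is used, so play never depends on hidden information; the perfect-information Critic serves purely as a training-time evaluator. In the paper's multi-policy setting, each seat would presumably hold its own Actor, with the seats' policies updated asynchronously rather than jointly.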

Bibliographic Details
Main Authors: Shi, Daming; Guo, Xudong; Liu, Yi; Fan, Wenhui
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9222241/
https://www.ncbi.nlm.nih.gov/pubmed/35741495
http://dx.doi.org/10.3390/e24060774
_version_ 1784732825666191360
author Shi, Daming
Guo, Xudong
Liu, Yi
Fan, Wenhui
author_sort Shi, Daming
collection PubMed
format Online
Article
Text
id pubmed-9222241
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9222241 2022-06-24 Optimal Policy of Multiplayer Poker via Actor-Critic Reinforcement Learning Shi, Daming; Guo, Xudong; Liu, Yi; Fan, Wenhui Entropy (Basel) Article MDPI 2022-05-30 /pmc/articles/PMC9222241/ /pubmed/35741495 http://dx.doi.org/10.3390/e24060774 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Optimal Policy of Multiplayer Poker via Actor-Critic Reinforcement Learning
topic Article