
Sample-efficient multi-agent reinforcement learning with masked reconstruction

Deep reinforcement learning (DRL) is a powerful approach that combines reinforcement learning (RL) and deep learning to address complex decision-making problems in high-dimensional environments. Although DRL has been remarkably successful, its low sample efficiency demands long training times and large amounts of data to learn optimal policies, and these limitations are even more pronounced in multi-agent reinforcement learning (MARL). In this study, we propose M-QMIX, an approach that combines a masked reconstruction task with QMIX. By introducing masked reconstruction as an auxiliary task, we aim to improve sample efficiency, a fundamental limitation of RL in multi-agent systems. We validated the proposed method on the StarCraft II micromanagement benchmark across 11 scenarios (five easy, three hard, and three very hard), deliberately limiting the number of time steps in each scenario to test sample efficiency. The proposed method outperforms QMIX in eight of the 11 scenarios, providing strong evidence that it is more sample-efficient than QMIX and effectively addresses the limitations of DRL in multi-agent systems.

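The central idea in the abstract, masking part of each agent's observation and training the agent encoder to reconstruct the original observation as an auxiliary objective alongside the usual QMIX TD loss, can be sketched in a few lines. What follows is a minimal illustration in PyTorch, not the authors' implementation: the module names, the mask_ratio of 0.3, and the aux_weight coefficient are assumptions made for this sketch, and the actual M-QMIX architecture and loss are specified in the paper.

import torch
import torch.nn as nn

class AgentEncoder(nn.Module):
    """Shared encoder mapping per-agent observations to hidden features."""
    def __init__(self, obs_dim: int, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class ReconstructionHead(nn.Module):
    """Auxiliary head that decodes hidden features back to observation space."""
    def __init__(self, hidden_dim: int, obs_dim: int):
        super().__init__()
        self.decoder = nn.Linear(hidden_dim, obs_dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.decoder(h)

def masked_reconstruction_loss(encoder, head, obs, mask_ratio=0.3):
    # Zero out a random subset of observation features, encode the masked
    # input, and penalize reconstruction error on the masked positions only.
    mask = (torch.rand_like(obs) < mask_ratio).float()
    masked_obs = obs * (1.0 - mask)
    recon = head(encoder(masked_obs))
    return (((recon - obs) ** 2) * mask).sum() / mask.sum().clamp(min=1.0)

if __name__ == "__main__":
    obs_dim, hidden_dim, aux_weight = 48, 64, 0.1
    encoder = AgentEncoder(obs_dim, hidden_dim)
    head = ReconstructionHead(hidden_dim, obs_dim)
    obs = torch.randn(32, obs_dim)    # batch of per-agent observations
    aux_loss = masked_reconstruction_loss(encoder, head, obs)
    td_loss = torch.tensor(0.0)       # placeholder for the QMIX TD loss
    total_loss = td_loss + aux_weight * aux_loss
    total_loss.backward()
    print(f"auxiliary reconstruction loss: {aux_loss.item():.4f}")

The auxiliary term is simply added to the value-learning objective, so the encoder is shaped by both action-value estimation and reconstruction, which is how an auxiliary task of this kind can improve sample efficiency.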

Bibliographic Details
Main Authors: Kim, Jung In; Lee, Young Jae; Heo, Jongkook; Park, Jinhyeok; Kim, Jaehoon; Lim, Sae Rin; Jeong, Jinyong; Kim, Seoung Bum
Format: Online Article Text
Language: English
Published: PLoS One, Public Library of Science, 2023-09-14
Subjects: Research Article
Collection: PubMed (National Center for Biotechnology Information)
Record ID: pubmed-10501567 (MEDLINE/PubMed record format)
Rights: © 2023 Kim et al. Open access under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10501567/
https://www.ncbi.nlm.nih.gov/pubmed/37708154
http://dx.doi.org/10.1371/journal.pone.0291545