Intrinsic fluctuations of reinforcement learning promote cooperation
In this work, we ask for and answer what makes classical temporal-difference reinforcement learning with ε-greedy strategies cooperative. Cooperating in social dilemma situations is vital for animals, humans, and machines. While evolutionary theory revealed a range of mechanisms p...
Main Authors: | Barfuss, Wolfram; Meylahn, Janusz M. |
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9873645/ https://www.ncbi.nlm.nih.gov/pubmed/36693872 http://dx.doi.org/10.1038/s41598-023-27672-7 |
_version_ | 1784877642119380992 |
author | Barfuss, Wolfram Meylahn, Janusz M. |
author_facet | Barfuss, Wolfram Meylahn, Janusz M. |
author_sort | Barfuss, Wolfram |
collection | PubMed |
description | In this work, we ask for and answer what makes classical temporal-difference reinforcement learning with ε-greedy strategies cooperative. Cooperating in social dilemma situations is vital for animals, humans, and machines. While evolutionary theory revealed a range of mechanisms promoting cooperation, the conditions under which agents learn to cooperate are contested. Here, we demonstrate which and how individual elements of the multi-agent learning setting lead to cooperation. We use the iterated Prisoner’s dilemma with one-period memory as a testbed. Each of the two learning agents learns a strategy that conditions the following action choices on both agents’ action choices of the last round. We find that next to a high caring for future rewards, a low exploration rate, and a small learning rate, it is primarily intrinsic stochastic fluctuations of the reinforcement learning process which double the final rate of cooperation to up to 80%. Thus, inherent noise is not a necessary evil of the iterative learning process. It is a critical asset for the learning of cooperation. However, we also point out the trade-off between a high likelihood of cooperative behavior and achieving this in a reasonable amount of time. Our findings are relevant for purposefully designing cooperative algorithms and regulating undesired collusive effects. |
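The setup the abstract describes — two tabular temporal-difference learners with ε-greedy exploration playing the iterated Prisoner's dilemma, where each agent's state is the pair of actions chosen in the previous round — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the payoff matrix, the hyperparameter values (`alpha`, `gamma`, `epsilon`), and the episode length are all illustrative placeholders.

```python
import random

# Illustrative Prisoner's dilemma payoffs satisfying T > R > P > S;
# not necessarily the exact matrix used in the paper.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
ACTIONS = ["C", "D"]

class EpsilonGreedyQ:
    """Tabular TD (Q-)learner with one-period memory: the state is the
    joint action (own, other) chosen in the last round."""
    def __init__(self, alpha=0.05, gamma=0.95, epsilon=0.01):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = {}  # (state, action) -> estimated value

    def act(self, state):
        if random.random() < self.epsilon:  # explore with probability ε
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        # Standard TD update toward reward + γ · max_a Q(next_state, a).
        best_next = max(self.q.get((next_state, a), 0.0) for a in ACTIONS)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (
            reward + self.gamma * best_next - old)

def run(steps=50_000, seed=0):
    """Play one long iterated game; return the overall cooperation rate."""
    random.seed(seed)
    a1, a2 = EpsilonGreedyQ(), EpsilonGreedyQ()
    state = ("C", "C")  # arbitrary initial joint action
    coop = 0
    for _ in range(steps):
        # Each agent conditions on (own last action, other's last action).
        x, y = a1.act(state), a2.act(state[::-1])
        r1, r2 = PAYOFF[(x, y)]
        next_state = (x, y)
        a1.update(state, x, r1, next_state)
        a2.update(state[::-1], y, r2, next_state[::-1])
        state = next_state
        coop += (x == "C") + (y == "C")
    return coop / (2 * steps)
```

Because exploration and the order in which state-action pairs are visited are random, different seeds yield different long-run outcomes — the kind of intrinsic stochastic fluctuation the paper identifies as helping cooperation emerge.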
format | Online Article Text |
id | pubmed-9873645 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-98736452023-01-26 Intrinsic fluctuations of reinforcement learning promote cooperation Barfuss, Wolfram Meylahn, Janusz M. Sci Rep Article In this work, we ask for and answer what makes classical temporal-difference reinforcement learning with ε-greedy strategies cooperative. Cooperating in social dilemma situations is vital for animals, humans, and machines. While evolutionary theory revealed a range of mechanisms promoting cooperation, the conditions under which agents learn to cooperate are contested. Here, we demonstrate which and how individual elements of the multi-agent learning setting lead to cooperation. We use the iterated Prisoner’s dilemma with one-period memory as a testbed. Each of the two learning agents learns a strategy that conditions the following action choices on both agents’ action choices of the last round. We find that next to a high caring for future rewards, a low exploration rate, and a small learning rate, it is primarily intrinsic stochastic fluctuations of the reinforcement learning process which double the final rate of cooperation to up to 80%. Thus, inherent noise is not a necessary evil of the iterative learning process. It is a critical asset for the learning of cooperation. However, we also point out the trade-off between a high likelihood of cooperative behavior and achieving this in a reasonable amount of time. Our findings are relevant for purposefully designing cooperative algorithms and regulating undesired collusive effects.
Nature Publishing Group UK 2023-01-24 /pmc/articles/PMC9873645/ /pubmed/36693872 http://dx.doi.org/10.1038/s41598-023-27672-7 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . |
spellingShingle | Article Barfuss, Wolfram Meylahn, Janusz M. Intrinsic fluctuations of reinforcement learning promote cooperation |
title | Intrinsic fluctuations of reinforcement learning promote cooperation |
title_full | Intrinsic fluctuations of reinforcement learning promote cooperation |
title_fullStr | Intrinsic fluctuations of reinforcement learning promote cooperation |
title_full_unstemmed | Intrinsic fluctuations of reinforcement learning promote cooperation |
title_short | Intrinsic fluctuations of reinforcement learning promote cooperation |
title_sort | intrinsic fluctuations of reinforcement learning promote cooperation |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9873645/ https://www.ncbi.nlm.nih.gov/pubmed/36693872 http://dx.doi.org/10.1038/s41598-023-27672-7 |
work_keys_str_mv | AT barfusswolfram intrinsicfluctuationsofreinforcementlearningpromotecooperation AT meylahnjanuszm intrinsicfluctuationsofreinforcementlearningpromotecooperation |