
Intrinsic fluctuations of reinforcement learning promote cooperation


Bibliographic Details
Main Authors: Barfuss, Wolfram; Meylahn, Janusz M.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9873645/
https://www.ncbi.nlm.nih.gov/pubmed/36693872
http://dx.doi.org/10.1038/s41598-023-27672-7
_version_ 1784877642119380992
author Barfuss, Wolfram
Meylahn, Janusz M.
author_facet Barfuss, Wolfram
Meylahn, Janusz M.
author_sort Barfuss, Wolfram
collection PubMed
description In this work, we ask, and answer, what makes classical temporal-difference reinforcement learning with ε-greedy strategies cooperative. Cooperating in social dilemma situations is vital for animals, humans, and machines. While evolutionary theory has revealed a range of mechanisms promoting cooperation, the conditions under which agents learn to cooperate are contested. Here, we demonstrate which individual elements of the multi-agent learning setting lead to cooperation, and how. We use the iterated Prisoner’s dilemma with one-period memory as a testbed. Each of the two learning agents learns a strategy that conditions its next action choice on both agents’ action choices of the last round. We find that, next to a high valuation of future rewards, a low exploration rate, and a small learning rate, it is primarily the intrinsic stochastic fluctuations of the reinforcement learning process that double the final rate of cooperation to up to 80%. Thus, inherent noise is not a necessary evil of the iterative learning process; it is a critical asset for the learning of cooperation. However, we also point out the trade-off between a high likelihood of cooperative behavior and achieving it in a reasonable amount of time. Our findings are relevant for purposefully designing cooperative algorithms and regulating undesired collusive effects.
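The setup the abstract describes — two tabular temporal-difference learners with ε-greedy exploration playing the iterated Prisoner’s dilemma, each conditioning on the previous round’s joint action — can be sketched as follows. This is a minimal illustration only: the payoff values, hyperparameters, state encoding, and all names are assumptions for the sketch, not taken from the paper.

```python
import random

# Hypothetical Prisoner's dilemma payoffs (not from the paper):
# (my_action, other_action) -> my reward. Actions: 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): 3.0, (0, 1): 0.0, (1, 0): 5.0, (1, 1): 1.0}

class Agent:
    """Tabular ε-greedy Q-learner; the state is last round's joint action."""
    def __init__(self, alpha=0.1, gamma=0.95, eps=0.05, rng=None):
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.rng = rng or random.Random()
        # One Q-value per (state, action): 4 one-period-memory states x 2 actions.
        self.Q = {(s, a): 0.0 for s in range(4) for a in range(2)}

    def act(self, state):
        if self.rng.random() < self.eps:          # explore with probability ε
            return self.rng.randrange(2)
        # exploit: greedy action (ties broken toward action 0, i.e. cooperate)
        return max(range(2), key=lambda a: self.Q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard temporal-difference (Q-learning) update.
        best_next = max(self.Q[(next_state, a)] for a in range(2))
        td_target = reward + self.gamma * best_next
        self.Q[(state, action)] += self.alpha * (td_target - self.Q[(state, action)])

def play(rounds=5000, seed=0):
    """Run one learning trajectory; return the fraction of mutually cooperative rounds."""
    rng = random.Random(seed)
    a1 = Agent(rng=random.Random(rng.random()))
    a2 = Agent(rng=random.Random(rng.random()))
    state = 0  # encode last joint action (x, y) as 2*x + y; start from (C, C)
    coop = 0
    for _ in range(rounds):
        x, y = a1.act(state), a2.act(state)
        r1, r2 = PAYOFF[(x, y)], PAYOFF[(y, x)]
        next_state = 2 * x + y
        a1.update(state, x, r1, next_state)
        a2.update(state, y, r2, next_state)
        state = next_state
        coop += (x == 0 and y == 0)
    return coop / rounds
```

Because the agents’ exploration draws are random, individual runs of `play` differ; averaging the returned cooperation rate over many seeds is how one would probe the role of the intrinsic fluctuations the paper studies.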
format Online
Article
Text
id pubmed-9873645
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9873645 2023-01-26 Intrinsic fluctuations of reinforcement learning promote cooperation Barfuss, Wolfram Meylahn, Janusz M. Sci Rep Article
Nature Publishing Group UK 2023-01-24 /pmc/articles/PMC9873645/ /pubmed/36693872 http://dx.doi.org/10.1038/s41598-023-27672-7 Text en © The Author(s) 2023. Open Access: this article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Barfuss, Wolfram
Meylahn, Janusz M.
Intrinsic fluctuations of reinforcement learning promote cooperation
title Intrinsic fluctuations of reinforcement learning promote cooperation
title_full Intrinsic fluctuations of reinforcement learning promote cooperation
title_fullStr Intrinsic fluctuations of reinforcement learning promote cooperation
title_full_unstemmed Intrinsic fluctuations of reinforcement learning promote cooperation
title_short Intrinsic fluctuations of reinforcement learning promote cooperation
title_sort intrinsic fluctuations of reinforcement learning promote cooperation
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9873645/
https://www.ncbi.nlm.nih.gov/pubmed/36693872
http://dx.doi.org/10.1038/s41598-023-27672-7
work_keys_str_mv AT barfusswolfram intrinsicfluctuationsofreinforcementlearningpromotecooperation
AT meylahnjanuszm intrinsicfluctuationsofreinforcementlearningpromotecooperation