
Intrinsic fluctuations of reinforcement learning promote cooperation


Bibliographic Details
Main Authors: Barfuss, Wolfram; Meylahn, Janusz M.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9873645/
https://www.ncbi.nlm.nih.gov/pubmed/36693872
http://dx.doi.org/10.1038/s41598-023-27672-7
_version_ 1784877642119380992
author Barfuss, Wolfram
Meylahn, Janusz M.
author_facet Barfuss, Wolfram
Meylahn, Janusz M.
author_sort Barfuss, Wolfram
collection PubMed
description In this work, we ask, and answer, what makes classical temporal-difference reinforcement learning with ε-greedy strategies cooperative. Cooperating in social dilemma situations is vital for animals, humans, and machines. While evolutionary theory has revealed a range of mechanisms promoting cooperation, the conditions under which agents learn to cooperate are contested. Here, we demonstrate which individual elements of the multi-agent learning setting lead to cooperation, and how. We use the iterated Prisoner’s dilemma with one-period memory as a testbed. Each of the two learning agents learns a strategy that conditions its next action choice on both agents’ action choices of the last round. We find that, next to a high valuation of future rewards, a low exploration rate, and a small learning rate, it is primarily the intrinsic stochastic fluctuations of the reinforcement learning process that double the final rate of cooperation to up to 80%. Thus, inherent noise is not a necessary evil of the iterative learning process; it is a critical asset for the learning of cooperation. However, we also point out the trade-off between a high likelihood of cooperative behavior and achieving it in a reasonable amount of time. Our findings are relevant for purposefully designing cooperative algorithms and regulating undesired collusive effects.
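The setup the abstract describes — two tabular temporal-difference learners with ε-greedy exploration playing the iterated Prisoner’s dilemma, each conditioning on the previous round’s joint action — can be sketched as follows. This is a minimal illustration only: the payoff values, hyperparameters, state encoding, and all names are assumptions for the sketch, not taken from the paper.

```python
import random

# Hypothetical Prisoner's dilemma payoffs (not from the paper):
# (my_action, other_action) -> my reward. Actions: 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): 3.0, (0, 1): 0.0, (1, 0): 5.0, (1, 1): 1.0}

class Agent:
    """Tabular ε-greedy Q-learner; the state is last round's joint action."""
    def __init__(self, alpha=0.1, gamma=0.95, eps=0.05, rng=None):
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.rng = rng or random.Random()
        # One Q-value per (state, action): 4 one-period-memory states x 2 actions.
        self.Q = {(s, a): 0.0 for s in range(4) for a in range(2)}

    def act(self, state):
        if self.rng.random() < self.eps:          # explore with probability ε
            return self.rng.randrange(2)
        # exploit: greedy action (ties broken toward action 0, i.e. cooperate)
        return max(range(2), key=lambda a: self.Q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard temporal-difference (Q-learning) update.
        best_next = max(self.Q[(next_state, a)] for a in range(2))
        td_target = reward + self.gamma * best_next
        self.Q[(state, action)] += self.alpha * (td_target - self.Q[(state, action)])

def play(rounds=5000, seed=0):
    """Run one learning trajectory; return the fraction of mutually cooperative rounds."""
    rng = random.Random(seed)
    a1 = Agent(rng=random.Random(rng.random()))
    a2 = Agent(rng=random.Random(rng.random()))
    state = 0  # encode last joint action (x, y) as 2*x + y; start from (C, C)
    coop = 0
    for _ in range(rounds):
        x, y = a1.act(state), a2.act(state)
        r1, r2 = PAYOFF[(x, y)], PAYOFF[(y, x)]
        next_state = 2 * x + y
        a1.update(state, x, r1, next_state)
        a2.update(state, y, r2, next_state)
        state = next_state
        coop += (x == 0 and y == 0)
    return coop / rounds
```

Because the agents’ exploration draws are random, individual runs of `play` differ; averaging the returned cooperation rate over many seeds is how one would probe the role of the intrinsic fluctuations the paper studies.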
format Online
Article
Text
id pubmed-9873645
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9873645 2023-01-26 Intrinsic fluctuations of reinforcement learning promote cooperation Barfuss, Wolfram Meylahn, Janusz M. Sci Rep Article
Nature Publishing Group UK 2023-01-24 /pmc/articles/PMC9873645/ /pubmed/36693872 http://dx.doi.org/10.1038/s41598-023-27672-7 Text en © The Author(s) 2023. Open Access: this article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Barfuss, Wolfram
Meylahn, Janusz M.
Intrinsic fluctuations of reinforcement learning promote cooperation
title Intrinsic fluctuations of reinforcement learning promote cooperation
title_full Intrinsic fluctuations of reinforcement learning promote cooperation
title_fullStr Intrinsic fluctuations of reinforcement learning promote cooperation
title_full_unstemmed Intrinsic fluctuations of reinforcement learning promote cooperation
title_short Intrinsic fluctuations of reinforcement learning promote cooperation
title_sort intrinsic fluctuations of reinforcement learning promote cooperation
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9873645/
https://www.ncbi.nlm.nih.gov/pubmed/36693872
http://dx.doi.org/10.1038/s41598-023-27672-7
work_keys_str_mv AT barfusswolfram intrinsicfluctuationsofreinforcementlearningpromotecooperation
AT meylahnjanuszm intrinsicfluctuationsofreinforcementlearningpromotecooperation