
Federated Reinforcement Learning for Training Control Policies on Multiple IoT Devices

Reinforcement learning has recently been studied in various fields and also used to optimally control IoT devices supporting the expansion of Internet connection beyond the usual standard devices. In this paper, we try to allow multiple reinforcement learning agents to learn optimal control policy o...

Bibliographic Details
Main Authors:	Lim, Hyun-Kyo; Kim, Ju-Bong; Heo, Joo-Seong; Han, Youn-Hee
Format:	Online Article Text
Language:	English
Published:	MDPI 2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7085801/ https://www.ncbi.nlm.nih.gov/pubmed/32121671 http://dx.doi.org/10.3390/s20051359

_version_	1783509016296554496
author	Lim, Hyun-Kyo; Kim, Ju-Bong; Heo, Joo-Seong; Han, Youn-Hee
author_sort	Lim, Hyun-Kyo
collection	PubMed
description	Reinforcement learning has recently been studied in various fields and is also used to optimally control IoT devices, which extend Internet connectivity beyond the usual standard devices. In this paper, we allow multiple reinforcement learning agents to each learn an optimal control policy on their own IoT devices of the same type but with slightly different dynamics. For such devices, there is no guarantee that an agent that interacts with only one IoT device and learns its optimal control policy will also control another IoT device well. Therefore, we may need to apply independent reinforcement learning to each IoT device individually, which requires a costly or time-consuming effort. To solve this problem, we propose a new federated reinforcement learning architecture in which each agent, working on its own independent IoT device, shares its learning experience (i.e., the gradient of the loss function) with the other agents and transfers mature policy model parameters to them. The other agents accelerate their learning by using these mature parameters. We incorporate the actor–critic proximal policy optimization (Actor–Critic PPO) algorithm into each agent in the proposed collaborative architecture and propose an efficient procedure for gradient sharing and model transfer. Using multiple rotary inverted pendulum devices interconnected via a network switch, we demonstrate that the proposed federated reinforcement learning scheme can effectively facilitate the learning process for multiple IoT devices and that learning can be faster when more agents are involved.
format	Online Article Text
id	pubmed-7085801
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-7085801 2020-03-25 Sensors (Basel) Article MDPI 2020-03-02 /pmc/articles/PMC7085801/ /pubmed/32121671 http://dx.doi.org/10.3390/s20051359 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	Federated Reinforcement Learning for Training Control Policies on Multiple IoT Devices
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7085801/ https://www.ncbi.nlm.nih.gov/pubmed/32121671 http://dx.doi.org/10.3390/s20051359
