
A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition

The traditional synthesis problem is usually solved by constructing a system that fulfills given specifications. The system is constantly interacting with the environment and is opposed to the environment. The problem can be further regarded as solving a two-player game (the system and its environme...


Bibliographic Details
Main Authors:	Zhao, Wei; Liu, Zhiming
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2022
Subjects:	Data Mining and Machine Learning
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9455281/ https://www.ncbi.nlm.nih.gov/pubmed/36091983 http://dx.doi.org/10.7717/peerj-cs.1094

author	Zhao, Wei; Liu, Zhiming
author_facet	Zhao, Wei; Liu, Zhiming
author_sort	Zhao, Wei
collection	PubMed
description	The traditional synthesis problem is usually solved by constructing a system that fulfills given specifications. The system is constantly interacting with the environment and is opposed to the environment. The problem can be further regarded as solving a two-player game (the system and its environment). Meanwhile, stochastic games are often used to model reactive processes. With the development of the intelligent industry, these theories are extensively used in robot patrolling, intelligent logistics, and intelligent transportation. However, it is still challenging to find a practically feasible synthesis algorithm and generate the optimal system according to the existing research. Thus, it is desirable to design an incentive mechanism to motivate the system to fulfill given specifications. This work studies the learning-based approach for strategy synthesis of reward asynchronous probabilistic games against linear temporal logic (LTL) specifications in a probabilistic environment. An asynchronous reward mechanism is proposed to motivate players to gain maximized rewards by their positions and choose actions. Based on this mechanism, the techniques of the learning theory can be applied to transform the synthesis problem into the problem of computing the expected rewards. Then, it is proven that the reinforcement learning algorithm provides the optimal strategies that maximize the expected cumulative reward of the satisfaction of an LTL specification asymptotically. Finally, our techniques are implemented, and their effectiveness is illustrated by two case studies of robot patrolling and autonomous driving.
format	Online Article Text
id	pubmed-9455281
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-9455281 2022-09-09 A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition Zhao, Wei; Liu, Zhiming PeerJ Comput Sci Data Mining and Machine Learning PeerJ Inc. 2022-09-05 /pmc/articles/PMC9455281/ /pubmed/36091983 http://dx.doi.org/10.7717/peerj-cs.1094 Text en © 2022 Zhao and Liu https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title	A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition
topic	Data Mining and Machine Learning
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9455281/ https://www.ncbi.nlm.nih.gov/pubmed/36091983 http://dx.doi.org/10.7717/peerj-cs.1094

