
Optimistic Value Iteration


Bibliographic Details
Main Authors: Hartmanns, Arnd, Kaminski, Benjamin Lucien
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7363440/
http://dx.doi.org/10.1007/978-3-030-53291-8_26
_version_ 1783559655715241984
author Hartmanns, Arnd
Kaminski, Benjamin Lucien
author_facet Hartmanns, Arnd
Kaminski, Benjamin Lucien
author_sort Hartmanns, Arnd
collection PubMed
description Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides lower bounds on infinite-horizon probabilities and rewards. Two “sound” variations, which also deliver an upper bound, have recently appeared. In this paper, we present a new sound approach that leverages value iteration’s ability to usually deliver good lower bounds: we obtain a lower bound via standard value iteration, use the result to “guess” an upper bound, and prove the latter’s correctness. We present this optimistic value iteration approach for computing reachability probabilities as well as expected rewards. It is easy to implement and performs well, as we show via an extensive experimental evaluation using our implementation within the mcsta model checker of the Modest Toolset.
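
The description above outlines a two-phase idea: run standard value iteration to obtain a lower bound, optimistically guess an upper bound just above it, and then verify that guess. The following sketch illustrates this idea for maximum reachability probabilities only. It is not the authors' mcsta implementation; the data layout (an MDP as a dict mapping each state to a list of actions, each action a successor-to-probability dict), the iteration cap, and the retry policy are illustrative assumptions, and the paper's precise termination and restart criteria are omitted.

    def bellman(mdp, goal, values):
        # One Bellman backup for maximum reachability probabilities.
        # mdp: state -> list of actions; each action: successor state -> probability.
        # Assumes every non-goal state has at least one action.
        return {
            s: 1.0 if s in goal else max(
                sum(p * values[t] for t, p in action.items())
                for action in mdp[s]
            )
            for s in mdp
        }

    def optimistic_value_iteration(mdp, goal, epsilon=1e-6):
        # Phase 1: standard value iteration from below gives a lower bound,
        # stopped by the usual small-difference criterion (unsound on its own).
        lower = {s: 1.0 if s in goal else 0.0 for s in mdp}
        while True:
            nxt = bellman(mdp, goal, lower)
            converged = max(abs(nxt[s] - lower[s]) for s in mdp) < epsilon
            lower = nxt
            if converged:
                break

        # Phase 2: optimistically guess an upper bound slightly above the
        # lower bound and try to verify it: if one Bellman backup does not
        # increase the guess in any state, the guess is inductive and hence
        # a true upper bound on the least fixed point.
        while True:
            upper = {s: min(1.0, lower[s] + epsilon) for s in mdp}
            for _ in range(10000):                   # illustrative iteration cap
                backup = bellman(mdp, goal, upper)
                if all(backup[s] <= upper[s] for s in mdp):
                    return lower, upper              # true values lie in [lower, upper]
                upper = {s: min(backup[s], 1.0) for s in mdp}
                lower = bellman(mdp, goal, lower)    # keep tightening the lower bound
            epsilon /= 2                             # guess not verified: tighten and retry

The soundness argument in this sketch rests on the final check: a vector that a Bellman backup does not increase anywhere is an inductive, and therefore genuine, upper bound on the least fixed point, while the lower vector is iterated from below and so remains a valid lower bound throughout.
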
format Online
Article
Text
id pubmed-7363440
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7363440 2020-07-16 Optimistic Value Iteration Hartmanns, Arnd Kaminski, Benjamin Lucien Computer Aided Verification Article Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides lower bounds on infinite-horizon probabilities and rewards. Two “sound” variations, which also deliver an upper bound, have recently appeared. In this paper, we present a new sound approach that leverages value iteration’s ability to usually deliver good lower bounds: we obtain a lower bound via standard value iteration, use the result to “guess” an upper bound, and prove the latter’s correctness. We present this optimistic value iteration approach for computing reachability probabilities as well as expected rewards. It is easy to implement and performs well, as we show via an extensive experimental evaluation using our implementation within the mcsta model checker of the Modest Toolset. 2020-06-16 /pmc/articles/PMC7363440/ http://dx.doi.org/10.1007/978-3-030-53291-8_26 Text en © The Author(s) 2020 Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
spellingShingle Article
Hartmanns, Arnd
Kaminski, Benjamin Lucien
Optimistic Value Iteration
title Optimistic Value Iteration
title_full Optimistic Value Iteration
title_fullStr Optimistic Value Iteration
title_full_unstemmed Optimistic Value Iteration
title_short Optimistic Value Iteration
title_sort optimistic value iteration
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7363440/
http://dx.doi.org/10.1007/978-3-030-53291-8_26
work_keys_str_mv AT hartmannsarnd optimisticvalueiteration
AT kaminskibenjaminlucien optimisticvalueiteration