
Optimistic Value Iteration


Bibliographic Details
Main Authors: Hartmanns, Arnd, Kaminski, Benjamin Lucien
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7363440/
http://dx.doi.org/10.1007/978-3-030-53291-8_26
_version_ 1783559655715241984
author Hartmanns, Arnd
Kaminski, Benjamin Lucien
author_facet Hartmanns, Arnd
Kaminski, Benjamin Lucien
author_sort Hartmanns, Arnd
collection PubMed
description Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides lower bounds on infinite-horizon probabilities and rewards. Two “sound” variations, which also deliver an upper bound, have recently appeared. In this paper, we present a new sound approach that leverages value iteration’s ability to usually deliver good lower bounds: we obtain a lower bound via standard value iteration, use the result to “guess” an upper bound, and prove the latter’s correctness. We present this optimistic value iteration approach for computing reachability probabilities as well as expected rewards. It is easy to implement and performs well, as we show via an extensive experimental evaluation using our implementation within the mcsta model checker of the Modest Toolset.
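
The description above outlines a two-phase idea: run standard value iteration to obtain a lower bound, optimistically guess an upper bound just above it, and then verify that guess. The following sketch illustrates this idea for maximum reachability probabilities only. It is not the authors' mcsta implementation; the data layout (an MDP as a dict mapping each state to a list of actions, each action a successor-to-probability dict), the iteration cap, and the retry policy are illustrative assumptions, and the paper's precise termination and restart criteria are omitted.

    def bellman(mdp, goal, values):
        # One Bellman backup for maximum reachability probabilities.
        # mdp: state -> list of actions; each action: successor state -> probability.
        # Assumes every non-goal state has at least one action.
        return {
            s: 1.0 if s in goal else max(
                sum(p * values[t] for t, p in action.items())
                for action in mdp[s]
            )
            for s in mdp
        }

    def optimistic_value_iteration(mdp, goal, epsilon=1e-6):
        # Phase 1: standard value iteration from below gives a lower bound,
        # stopped by the usual small-difference criterion (unsound on its own).
        lower = {s: 1.0 if s in goal else 0.0 for s in mdp}
        while True:
            nxt = bellman(mdp, goal, lower)
            converged = max(abs(nxt[s] - lower[s]) for s in mdp) < epsilon
            lower = nxt
            if converged:
                break

        # Phase 2: optimistically guess an upper bound slightly above the
        # lower bound and try to verify it: if one Bellman backup does not
        # increase the guess in any state, the guess is inductive and hence
        # a true upper bound on the least fixed point.
        while True:
            upper = {s: min(1.0, lower[s] + epsilon) for s in mdp}
            for _ in range(10000):                   # illustrative iteration cap
                backup = bellman(mdp, goal, upper)
                if all(backup[s] <= upper[s] for s in mdp):
                    return lower, upper              # true values lie in [lower, upper]
                upper = {s: min(backup[s], 1.0) for s in mdp}
                lower = bellman(mdp, goal, lower)    # keep tightening the lower bound
            epsilon /= 2                             # guess not verified: tighten and retry

The soundness argument in this sketch rests on the final check: a vector that a Bellman backup does not increase anywhere is an inductive, and therefore genuine, upper bound on the least fixed point, while the lower vector is iterated from below and so remains a valid lower bound throughout.
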
format Online
Article
Text
id pubmed-7363440
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7363440 2020-07-16 Optimistic Value Iteration Hartmanns, Arnd Kaminski, Benjamin Lucien Computer Aided Verification Article Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides lower bounds on infinite-horizon probabilities and rewards. Two “sound” variations, which also deliver an upper bound, have recently appeared. In this paper, we present a new sound approach that leverages value iteration’s ability to usually deliver good lower bounds: we obtain a lower bound via standard value iteration, use the result to “guess” an upper bound, and prove the latter’s correctness. We present this optimistic value iteration approach for computing reachability probabilities as well as expected rewards. It is easy to implement and performs well, as we show via an extensive experimental evaluation using our implementation within the mcsta model checker of the Modest Toolset. 2020-06-16 /pmc/articles/PMC7363440/ http://dx.doi.org/10.1007/978-3-030-53291-8_26 Text en © The Author(s) 2020 Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
spellingShingle Article
Hartmanns, Arnd
Kaminski, Benjamin Lucien
Optimistic Value Iteration
title Optimistic Value Iteration
title_full Optimistic Value Iteration
title_fullStr Optimistic Value Iteration
title_full_unstemmed Optimistic Value Iteration
title_short Optimistic Value Iteration
title_sort optimistic value iteration
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7363440/
http://dx.doi.org/10.1007/978-3-030-53291-8_26
work_keys_str_mv AT hartmannsarnd optimisticvalueiteration
AT kaminskibenjaminlucien optimisticvalueiteration