Value Iteration for Long-Run Average Reward in Markov Decision Processes

D 2017

ASHOK, Pranav; Krishnendu CHATTERJEE; Przemyslaw DACA; Jan KŘETÍNSKÝ; Tobias MEGGENDORFER

Basic information

Original title

Value Iteration for Long-Run Average Reward in Markov Decision Processes

Authors

ASHOK, Pranav; Krishnendu CHATTERJEE; Przemyslaw DACA; Jan KŘETÍNSKÝ and Tobias MEGGENDORFER

Edition

Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part I, pp. 201-221, 21 pp., 2017

Publisher

Springer

Other information

Result type

Paper in conference proceedings

Confidentiality

Not subject to state or trade secrecy

Impact factor

Impact factor: 0.402 in 2005

Marked for transfer to RIV

Organizational unit

Faculty of Informatics

ISBN

978-3-319-63386-2

ISSN

DOI

https://doi.org/10.1007/978-3-319-63387-9_10

Changed: 17 March 2025, 15:17, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original

Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Long-run average rewards provide a mathematically elegant formalism for expressing long term performance. Value iteration (VI) is one of the simplest and most efficient algorithmic approaches to MDPs with other properties, such as reachability objectives. Unfortunately, a naive extension of VI does not work for MDPs with long-run average rewards, as there is no known stopping criterion. In this work our contributions are threefold. (1) We refute a conjecture related to stopping criteria for MDPs with long-run average rewards. (2) We present two practical algorithms for MDPs with long-run average rewards based on VI. First, we show that a combination of applying VI locally for each maximal end-component (MEC) and VI for reachability objectives can provide approximation guarantees. Second, extending the above approach with a simulation-guided on-demand variant of VI, we present an anytime algorithm that is able to deal with very large models. (3) Finally, we present experimental results showing that our methods significantly outperform the standard approaches on several benchmarks.
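
The idea of running VI locally inside one end component can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: it assumes the model is a single MEC (every state can reach every other), applies the textbook lazy-step aperiodicity transformation, and stops when the largest and smallest one-step value increases are close. The dictionary encoding, the function name `mean_payoff_bounds`, the parameters `eps`, `tau`, `max_iter`, and the model `TOY` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: state -> action -> (reward, [(probability, successor), ...]).
MDP = Dict[int, Dict[str, Tuple[float, List[Tuple[float, int]]]]]


def mean_payoff_bounds(mdp: MDP, eps: float = 1e-6, tau: float = 0.5,
                       max_iter: int = 100_000) -> Tuple[float, float]:
    """Bound the optimal long-run average reward of a single end component.

    Assumes every state can reach every other state (one MEC), so the
    optimal value is the same in all states.
    """
    v = {s: 0.0 for s in mdp}
    lower, upper = float("-inf"), float("inf")
    for _ in range(max_iter):
        # Lazy Bellman update: with probability 1 - tau the state is kept,
        # which makes the model aperiodic and scales the value by tau.
        new_v = {
            s: max(
                tau * (r + sum(p * v[t] for p, t in succ)) + (1 - tau) * v[s]
                for r, succ in actions.values()
            )
            for s, actions in mdp.items()
        }
        deltas = [new_v[s] - v[s] for s in mdp]
        # The smallest and largest one-step increase bracket tau * value.
        lower, upper = min(deltas) / tau, max(deltas) / tau
        v = new_v
        if upper - lower < eps:
            break
    return lower, upper


# Toy model (made up for illustration): staying in state 1 pays 2 per step.
TOY: MDP = {
    0: {"stay": (1.0, [(1.0, 0)]), "go": (0.0, [(1.0, 1)])},
    1: {"stay": (2.0, [(1.0, 1)]), "back": (0.0, [(1.0, 0)])},
}

lower, upper = mean_payoff_bounds(TOY)
print(lower, upper)
```

In the toy model the best strategy moves to state 1 and stays there, so the optimal average reward is 2 and the returned interval should enclose it. For a model with several MECs this local step would still have to be combined with a reachability computation over the MECs, as the abstract describes.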

Cite

ASHOK, Pranav; Krishnendu CHATTERJEE; Przemyslaw DACA; Jan KŘETÍNSKÝ and Tobias MEGGENDORFER. Value Iteration for Long-Run Average Reward in Markov Decision Processes. In Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part I. Springer, 2017, pp. 201-221. ISBN 978-3-319-63386-2. Available from: https://doi.org/10.1007/978-3-319-63387-9_10.

@inproceedings{2484815,
   author = {Ashok, Pranav and Chatterjee, Krishnendu and Daca, Przemyslaw and Křetínský, Jan and Meggendorfer, Tobias},
   booktitle = {Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part I},
   doi = {10.1007/978-3-319-63387-9_10},
   isbn = {978-3-319-63386-2},
   pages = {201--221},
   publisher = {Springer},
   title = {Value Iteration for Long-Run Average Reward in Markov Decision Processes},
   year = {2017}
}

TY  - CONF
ID  - 2484815
AU  - Ashok, Pranav
AU  - Chatterjee, Krishnendu
AU  - Daca, Przemyslaw
AU  - Křetínský, Jan
AU  - Meggendorfer, Tobias
PY  - 2017
TI  - Value Iteration for Long-Run Average Reward in Markov Decision Processes
PB  - Springer
SN  - 9783319633862
N2  - Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Long-run average rewards provide a mathematically elegant formalism for expressing long term performance. Value iteration (VI) is one of the simplest and most efficient algorithmic approaches to MDPs with other properties, such as reachability objectives. Unfortunately, a naive extension of VI does not work for MDPs with long-run average rewards, as there is no known stopping criterion. In this work our contributions are threefold. (1) We refute a conjecture related to stopping criteria for MDPs with long-run average rewards. (2) We present two practical algorithms for MDPs with long-run average rewards based on VI. First, we show that a combination of applying VI locally for each maximal end-component (MEC) and VI for reachability objectives can provide approximation guarantees. Second, extending the above approach with a simulation-guided on-demand variant of VI, we present an anytime algorithm that is able to deal with very large models. (3) Finally, we present experimental results showing that our methods significantly outperform the standard approaches on several benchmarks.
ER  -

ASHOK, Pranav; Krishnendu CHATTERJEE; Przemyslaw DACA; Jan KŘETÍNSKÝ and Tobias MEGGENDORFER. Value Iteration for Long-Run Average Reward in Markov Decision Processes. In \textit{Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part I}. Springer, 2017, pp.~201-221. ISBN~978-3-319-63386-2. Available from: https://doi.org/10.1007/978-3-319-63387-9\_{}10.
