CHATTERJEE, Krishnendu, Vojtěch FOREJT and Dominik WOJTCZAK. Multi-objective Discounted Reward Verification in Graphs and MDPs. In Kenneth L. McMillan and Aart Middeldorp and Andrei Voronkov. Logic for Programming, Artificial Intelligence, and Reasoning. Berlin, Heidelberg: Springer, 2013, p. 228-242. ISBN 978-3-642-45220-8. Available from: https://dx.doi.org/10.1007/978-3-642-45221-5_17.
Basic information
Original name Multi-objective Discounted Reward Verification in Graphs and MDPs
Authors CHATTERJEE, Krishnendu (356 India), Vojtěch FOREJT (203 Czech Republic, guarantor, belonging to the institution) and Dominik WOJTCZAK (616 Poland).
Edition Berlin, Heidelberg, Logic for Programming, Artificial Intelligence, and Reasoning, p. 228-242, 15 pp. 2013.
Publisher Springer
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Germany
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
Impact factor Impact factor: 0.402 in 2005
RIV identification code RIV/00216224:14330/13:00072859
Organization unit Faculty of Informatics
ISBN 978-3-642-45220-8
ISSN 0302-9743
DOI http://dx.doi.org/10.1007/978-3-642-45221-5_17
Keywords in English multi-objective verification; Markov decision processes; graphs
Tags core_A, firank_A
Changed by: RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 29/4/2014 20:09.
Abstract
We study the problem of achieving a given value in Markov decision processes (MDPs) with several independent discounted reward objectives. We consider a generalised version of discounted reward objectives, in which the amount of discounting depends on the states visited and on the objective. This definition extends the usual definition of discounted reward and allows us to capture systems in which the values of different commodities diminish at different and variable rates. We establish results for two prominent subclasses of the problem, namely state-discount models, where the discount factors depend only on the state of the MDP (and are independent of the objective), and reward-discount models, where they depend only on the objective (but not on the state of the MDP). For state-discount models we use a straightforward reduction to expected total reward and show that the problem of whether a value is achievable can be solved in polynomial time. For reward-discount models we show that memory and randomisation of the strategies are required, but that the problem is nevertheless decidable, and that it suffices to consider strategies which after a certain number of steps behave in a memoryless way. For the general case, we show that when restricted to graphs (i.e. MDPs with no randomisation), pure strategies, and discount factors of the form 1/n where n is an integer, the problem is in PSPACE and finite memory suffices for achieving a given value. We also show that when the discount factors are not of the form 1/n, the memory required by a strategy can be infinite.
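To make the setting concrete, the following is a minimal sketch (not taken from the paper; the MDP, rewards, discounts, and strategy are all illustrative) of evaluating a fixed memoryless randomised strategy against two discounted reward objectives in a small MDP with state-dependent discount factors, i.e. the state-discount model described in the abstract:

```python
# Toy sketch: a two-state MDP with actions 'a' and 'b'.
# All numbers are illustrative assumptions, not from the paper.
P = {  # P[state][action] = list of (next_state, probability)
    0: {'a': [(0, 0.5), (1, 0.5)], 'b': [(1, 1.0)]},
    1: {'a': [(0, 1.0)],           'b': [(1, 1.0)]},
}
rewards = [  # one reward function per objective
    {0: {'a': 1.0, 'b': 0.0}, 1: {'a': 0.0, 'b': 2.0}},
    {0: {'a': 0.0, 'b': 1.0}, 1: {'a': 3.0, 'b': 0.0}},
]
discount = {0: 0.9, 1: 0.5}  # state-dependent discount factor (state-discount model)
sigma = {0: {'a': 0.5, 'b': 0.5}, 1: {'a': 1.0, 'b': 0.0}}  # memoryless randomised strategy

def evaluate(reward, iters=2000):
    """Iterate V(s) = sum_a sigma(s,a) * (r(s,a) + d(s) * E[V(s')]).

    Since every discount factor is below 1, this is a contraction and
    converges to the expected discounted reward of the strategy.
    """
    V = {s: 0.0 for s in P}
    for _ in range(iters):
        V = {s: sum(sigma[s][a] * (reward[s][a]
                     + discount[s] * sum(p * V[t] for t, p in P[s][a]))
                    for a in P[s])
             for s in P}
    return V

# One value vector per objective; the achievability question of the paper
# asks whether some single strategy meets given thresholds on all of them.
values = [evaluate(r) for r in rewards]
print(values)
```

Note that this only evaluates one fixed strategy; the paper's contribution is deciding whether *any* strategy achieves a given value vector, which for the state-discount model reduces to an expected total reward problem solvable in polynomial time.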
Links
LG13010, research and development project. Name: Representation of the Czech Republic in the European Research Consortium for Informatics and Mathematics (Acronym: ERCIM-CZ)
Investor: Ministry of Education, Youth and Sports of the CR
MUNI/33/IP1/2013, MU internal code. Name: Support for promising research teams of the Faculty of Informatics and for outstanding researchers from other institutions working at the Faculty of Informatics (Acronym: PVT-VVPZ)
Investor: Masaryk University