Detailed Information on Publication Record
2020
Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
BRÁZDIL, Tomáš, Krishnendu CHATTERJEE, Petr NOVOTNÝ and Jiří VAHALA
Basic information
Original name
Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
Authors
BRÁZDIL, Tomáš (203 Czech Republic, belonging to the institution), Krishnendu CHATTERJEE (356 India), Petr NOVOTNÝ (203 Czech Republic, guarantor, belonging to the institution) and Jiří VAHALA (203 Czech Republic, belonging to the institution)
Edition
Palo Alto, California, USA, The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, p. 9794-9801, 8 pp. 2020
Publisher
AAAI Press
Other information
Language
English
Type of outcome
Article in proceedings
Field of Study
10200 1.2 Computer and information sciences
Country of publisher
United States of America
Confidentiality degree
is not subject to a state or trade secret
Publication form
electronic version available online
RIV identification code
RIV/00216224:14330/20:00114279
Organization unit
Faculty of Informatics
ISBN
978-1-57735-823-7
UT WoS
000668126802030
Keywords in English
reinforcement learning; Markov decision processes; Monte Carlo tree search; risk aversion
Tags
International impact, Reviewed
Changed: 15/5/2024 01:27, RNDr. Pavel Šmerk, Ph.D.
Abstract
In the original
Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low-probability catastrophic events with highly negative impact on the system. On the other hand, risk-averse policies require the probability of undesirable events to be below a given threshold, but they do not account for optimization of the expected payoff. We consider MDPs with discounted-sum payoff with failure states which represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that ensure the probability to encounter a failure state is below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with a risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with on the order of 10^6 states.
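
To illustrate the "risk-constrained action selection via linear programming" step mentioned in the abstract, the following is a minimal sketch, not the authors' implementation: it assumes per-action estimates of the expected discounted-sum payoff and of the failure probability are already available (e.g., from the search tree and the learned predictor), and solves the resulting small LP with scipy.optimize.linprog. The function name risk_constrained_action_distribution and its inputs are hypothetical.

import numpy as np
from scipy.optimize import linprog

def risk_constrained_action_distribution(values, risks, threshold):
    """Choose a distribution over actions that maximizes expected payoff
    while keeping the expected failure probability below `threshold`.

    values[i]  -- estimated discounted-sum payoff of action i (assumed given)
    risks[i]   -- estimated probability of reaching a failure state via action i
    threshold  -- admissible risk bound at this decision point
    """
    n = len(values)
    # linprog minimizes, so negate the payoffs to maximize the expectation.
    c = -np.asarray(values, dtype=float)
    # Inequality constraint: expected risk of the mixture <= threshold.
    A_ub = np.asarray(risks, dtype=float).reshape(1, n)
    b_ub = np.array([threshold], dtype=float)
    # Equality constraint: action probabilities sum to one.
    A_eq = np.ones((1, n))
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    if not res.success:
        # The risk bound may be infeasible with the current estimates;
        # fall back to the least risky action (a simplifying assumption).
        p = np.zeros(n)
        p[int(np.argmin(risks))] = 1.0
        return p
    return res.x

In the paper's setting such a distribution would be recomputed at each decision point as the risk budget (threshold) is updated along the executed trajectory; the sketch only shows the per-step LP.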
Links
GA18-11193S, research and development project
GA19-15134Y, internal MU code
GJ19-15134Y, research and development project
MUNI/A/1050/2019, internal MU code
MUNI/G/0739/2017, internal MU code