Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes

BRÁZDIL, Tomáš, Krishnendu CHATTERJEE, Petr NOVOTNÝ and Jiří VAHALA. Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes. Online. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020. Palo Alto, California, USA: AAAI Press, 2020, p. 9794-9801. ISBN 978-1-57735-823-7. Available from: https://dx.doi.org/10.1609/aaai.v34i06.6531.


Basic information
Original name	Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
Authors	BRÁZDIL, Tomáš (203 Czech Republic, belonging to the institution), Krishnendu CHATTERJEE (356 India), Petr NOVOTNÝ (203 Czech Republic, guarantor, belonging to the institution) and Jiří VAHALA (203 Czech Republic, belonging to the institution).
Edition	Palo Alto, California, USA, The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, p. 9794-9801, 8 pp. 2020.
Publisher	AAAI Press

Other information
Original language	English
Type of outcome	Proceedings paper
Field of Study	10200 1.2 Computer and information sciences
Country of publisher	United States of America
Confidentiality degree	is not subject to a state or trade secret
Publication form	electronic version available online
RIV identification code	RIV/00216224:14330/20:00114279
Organization unit	Faculty of Informatics
ISBN	978-1-57735-823-7
Doi	http://dx.doi.org/10.1609/aaai.v34i06.6531
UT WoS	000668126802030
Keywords in English	reinforcement learning; Markov decision processes; Monte Carlo tree search; risk aversion
Tags	best4, core_A, firank_1, formela-dec
Tags	International impact, Reviewed
Changed by	RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 15/5/2024 01:27.

Abstract

Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low-probability catastrophic events with highly negative impact on the system. On the other hand, risk-averse policies require the probability of undesirable events to be below a given threshold, but they do not account for optimization of the expected payoff. We consider MDPs with discounted-sum payoff and with failure states that represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that ensure the probability of encountering a failure state is below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with on the order of 10^6 states.
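
Illustrative note (not part of the published abstract): the risk-constrained action selection mentioned above can be pictured with a one-step toy problem. The sketch below is not the authors' algorithm, whose linear program is built over a search tree. It assumes hypothetical per-action estimates `payoffs` and `risks` and a hypothetical function name `select_action_distribution`, and it solves the small linear program "maximize expected payoff subject to expected failure probability at most `threshold`" by enumerating its vertices, which have at most two nonzero entries because there are only two constraints besides non-negativity.

```python
from itertools import combinations


def select_action_distribution(payoffs, risks, threshold):
    """Toy one-step risk-constrained action selection (illustrative only).

    Solves: maximize sum(p[i] * payoffs[i])
            subject to sum(p[i] * risks[i]) <= threshold,
                       sum(p) == 1, p >= 0
    by enumerating the vertices of the feasible region.
    Returns (distribution, expected_payoff), or None if no mix is safe enough.
    """
    n = len(payoffs)
    best = None  # (expected payoff, distribution)

    def consider(dist):
        nonlocal best
        value = sum(p * v for p, v in zip(dist, payoffs))
        if best is None or value > best[0]:
            best = (value, dist)

    # Vertices that put all probability on a single sufficiently safe action.
    for i in range(n):
        if risks[i] <= threshold:
            dist = [0.0] * n
            dist[i] = 1.0
            consider(dist)

    # Vertices that mix a safer and a riskier action so the risk bound is tight.
    for i, j in combinations(range(n), 2):
        lo, hi = (i, j) if risks[i] < risks[j] else (j, i)
        if risks[lo] < threshold < risks[hi]:
            w = (threshold - risks[lo]) / (risks[hi] - risks[lo])
            dist = [0.0] * n
            dist[hi] = w          # probability of the riskier action
            dist[lo] = 1.0 - w    # probability of the safer action
            consider(dist)

    if best is None:
        return None
    return best[1], best[0]
```

With a safe action (payoff 1, risk 0) and a risky one (payoff 5, risk 0.2) under a threshold of 0.1, the sketch randomizes between the two rather than picking either outright, which is why a linear program over action probabilities is a natural fit for this kind of constraint.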

Links
GA18-11193S, research and development project	Name: Algorithms for discrete systems and games with infinitely many states (Algoritmy pro diskrétní systémy a hry s nekonečně mnoha stavy)
GA18-11193S, research and development project	Investor: Czech Science Foundation
GA19-15134Y, internal MU code	Name: Verification and analysis of probabilistic programs (Verifikace a analýza pravděpodobnostních programů)
GA19-15134Y, internal MU code	Investor: Czech Science Foundation
GJ19-15134Y, research and development project	Name: Verification and analysis of probabilistic programs (Verifikace a analýza pravděpodobnostních programů)
MUNI/A/1050/2019, internal MU code	Name: Large-scale computing systems: models, applications and verification IX (Rozsáhlé výpočetní systémy: modely, aplikace a verifikace IX) (Acronym: SV-FI MAV IX)
MUNI/A/1050/2019, internal MU code	Investor: Masaryk University, Category A
MUNI/G/0739/2017, internal MU code	Name: Pushing the limits in automated NMR structure determination using a single 4D NOESY spectrum and machine learning methods
MUNI/G/0739/2017, internal MU code	Investor: Masaryk University, INTERDISCIPLINARY - Interdisciplinary research projects
