Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes

BRÁZDIL, Tomáš, Krishnendu CHATTERJEE, Petr NOVOTNÝ and Jiří VAHALA. Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes. Online. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020. Palo Alto, California, USA: AAAI Press, 2020, pp. 9794-9801. ISBN 978-1-57735-823-7. Available from: https://dx.doi.org/10.1609/aaai.v34i06.6531.

Basic information
Original name	Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
Authors	BRÁZDIL, Tomáš (203 Czech Republic, belonging to the institution), Krishnendu CHATTERJEE (356 India), Petr NOVOTNÝ (203 Czech Republic, guarantor, belonging to the institution) and Jiří VAHALA (203 Czech Republic, belonging to the institution).
Edition	Palo Alto, California, USA, The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, pp. 9794-9801, 8 pp. 2020.
Publisher	AAAI Press

Other information
Original language	English
Type of outcome	Proceedings paper
Field of study	10200 1.2 Computer and information sciences
Country of publisher	United States
Confidentiality	not subject to a state or trade secret
Publication form	electronic version available online
RIV identification code	RIV/00216224:14330/20:00114279
Organization unit	Faculty of Informatics
ISBN	978-1-57735-823-7
DOI	http://dx.doi.org/10.1609/aaai.v34i06.6531
UT WoS	000668126802030
Keywords in English	reinforcement learning; Markov decision processes; Monte Carlo tree search; risk aversion
Tags	best4, core_A, firank_1, formela-dec
Flags	International impact, Reviewed
Changed by	RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 6 April 2023, 14:29.

Abstract

Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low-probability catastrophic events with highly negative impact on the system. On the other hand, risk-averse policies require the probability of undesirable events to be below a given threshold, but they do not account for optimization of the expected payoff. We consider MDPs with discounted-sum payoff and with failure states that represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that ensure the probability of encountering a failure state is below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with a risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with on the order of 10^6 states.
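
To make the idea of risk-constrained action selection concrete, the following is a minimal single-step sketch in Python. It is not the algorithm from the paper: it assumes that each action already comes with an estimated payoff and an estimated failure probability, and it picks a randomized choice over actions that maximizes expected payoff while keeping the expected failure probability at or below a threshold. The function name `select_action_distribution` and the parameters `payoffs`, `risks` and `threshold` are hypothetical. Because this toy linear program has only one constraint besides the probability simplex, its optimum lies at a vertex that mixes at most two actions, so the sketch enumerates those vertices instead of calling an LP solver.

```python
from itertools import combinations


def select_action_distribution(payoffs, risks, threshold):
    """Solve a toy LP: maximize sum(p[i] * payoffs[i]) subject to
    sum(p[i] * risks[i]) <= threshold, with p a probability distribution.

    Returns (distribution, expected_payoff), or None if no distribution
    satisfies the risk constraint. All inputs are assumed estimates.
    """
    n = len(payoffs)
    candidates = []

    # Vertices of the first kind: pure actions that satisfy the constraint.
    for i in range(n):
        if risks[i] <= threshold:
            p = [0.0] * n
            p[i] = 1.0
            candidates.append(p)

    # Vertices of the second kind: mixtures of a safe and a risky action
    # for which the risk constraint holds with equality.
    for i, j in combinations(range(n), 2):
        lo, hi = (i, j) if risks[i] < risks[j] else (j, i)
        if risks[lo] < threshold < risks[hi]:
            w = (threshold - risks[lo]) / (risks[hi] - risks[lo])
            p = [0.0] * n
            p[lo] = 1.0 - w  # weight on the safer action
            p[hi] = w        # weight on the riskier action
            candidates.append(p)

    # A linear objective over a bounded polytope is maximized at a vertex.
    best = None
    for p in candidates:
        value = sum(pi * v for pi, v in zip(p, payoffs))
        if best is None or value > best[1]:
            best = (p, value)
    return best
```

For example, with `payoffs=[1.0, 5.0]`, `risks=[0.0, 0.2]` and `threshold=0.1`, neither pure action is optimal: the safe action wastes the risk budget and the risky one exceeds it, so the sketch returns an even mixture of the two.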

Links
GA18-11193S, research and development project	Name: Algorithms for discrete systems and games with infinitely many states
GA18-11193S, research and development project	Investor: Czech Science Foundation, Algorithms for discrete systems and games with infinitely many states
GA19-15134Y, internal MU code	Name: Verification and analysis of probabilistic programs
GA19-15134Y, internal MU code	Investor: Czech Science Foundation, Verification and analysis of probabilistic programs
GJ19-15134Y, research and development project	Name: Verification and analysis of probabilistic programs
MUNI/A/1050/2019, internal MU code	Name: Large-scale computing systems: models, applications and verification IX (Acronym: SV-FI MAV IX)
MUNI/A/1050/2019, internal MU code	Investor: Masaryk University, Large-scale computing systems: models, applications and verification IX, until 2020_Category A - Specific research - Student research projects
MUNI/G/0739/2017, internal MU code	Name: Pushing the limits in automated NMR structure determination using a single 4D NOESY spectrum and machine learning methods
MUNI/G/0739/2017, internal MU code	Investor: Masaryk University, Pushing the limits in automated NMR structure determination using a single 4D NOESY spectrum and machine learning methods, INTERDISCIPLINARY - Interdisciplinary research projects
