2020
Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
BRÁZDIL, Tomáš; Krishnendu CHATTERJEE; Petr NOVOTNÝ and Jiří VAHALA
Basic information
Original name
Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
Authors
BRÁZDIL, Tomáš; Krishnendu CHATTERJEE; Petr NOVOTNÝ and Jiří VAHALA
Published in
Palo Alto, California, USA, The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, pp. 9794-9801, 8 pp. 2020
Publisher
AAAI Press
Other information
Language
English
Type of result
Proceedings paper
Field of study
10200 1.2 Computer and information sciences
Country of publisher
United States
Confidentiality
not subject to state or trade secret
Form of publication
electronic version "online"
Links
Marked for transfer to RIV
Yes
RIV code
RIV/00216224:14330/20:00114279
Organizational unit
Faculty of Informatics
ISBN
978-1-57735-823-7
UT WoS
EID Scopus
Keywords in English
reinforcement learning; Markov decision processes; Monte Carlo tree search; risk aversion
Tags
Flags
International significance, Reviewed
Changed: 15 May 2024, 01:27, RNDr. Pavel Šmerk, Ph.D.
Abstract
In the original
Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low-probability catastrophic events with highly negative impact on the system. On the other hand, risk-averse policies require the probability of undesirable events to be below a given threshold, but they do not account for optimization of the expected payoff. We consider MDPs with discounted-sum payoff with failure states which represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that ensure the probability of encountering a failure state is below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with a risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with on the order of 10^6 states.
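The LP-based action-selection step mentioned in the abstract can be illustrated with a minimal sketch: given per-action value estimates, per-action risk (failure-probability) estimates, and a risk threshold, choose a distribution over actions that maximizes expected value subject to the expected risk staying below the threshold. The function name, inputs, and the enumeration-based solver below are assumptions for illustration, not the paper's implementation; they exploit the fact that an LP with a single constraint over the probability simplex has an optimum mixing at most two actions.

```python
import itertools

def risk_constrained_selection(values, risks, threshold):
    """Maximize sum_a p_a * values[a] subject to sum_a p_a * risks[a] <= threshold,
    where p is a probability distribution over actions.

    Since the LP has only one linear constraint besides the simplex constraints,
    some optimal solution uses at most two actions: either the best single
    feasible action, or a two-action mixture that makes the risk constraint tight.
    Returns (expected_value, {action_index: probability}), or None if infeasible.
    """
    n = len(values)
    best = None
    # Candidate 1: any single action whose risk already meets the threshold.
    for a in range(n):
        if risks[a] <= threshold + 1e-12:
            if best is None or values[a] > best[0]:
                best = (values[a], {a: 1.0})
    # Candidate 2: mix two actions so the risk constraint holds with equality.
    for i, j in itertools.combinations(range(n), 2):
        if risks[i] == risks[j]:
            continue  # no new mixtures beyond the single-action candidates
        p = (threshold - risks[j]) / (risks[i] - risks[j])  # weight on action i
        if 0.0 <= p <= 1.0:
            val = p * values[i] + (1.0 - p) * values[j]
            if best is None or val > best[0]:
                best = (val, {i: p, j: 1.0 - p})
    return best

# Example: a high-value risky action mixed with a safe low-value one.
# Deterministically playing action 0 violates the 0.2 risk bound, so the
# optimal policy randomizes between the two actions.
result = risk_constrained_selection(values=[10.0, 4.0],
                                    risks=[0.5, 0.0],
                                    threshold=0.2)
```

In this example the mixture puts weight 0.4 on the risky action (making the risk exactly 0.2) and 0.6 on the safe one, which beats committing to the safe action alone.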
Related projects
GA18-11193S, R&D project
GA19-15134Y, internal MU code
GJ19-15134Y, R&D project
MUNI/A/1050/2019, internal MU code
MUNI/G/0739/2017, internal MU code