Detailed Information on Publication Record
2016
Policy Learning for Time-Bounded Reachability in Continuous-Time Markov Decision Processes via Doubly-Stochastic Gradient Ascent
BRÁZDIL, Tomáš, Ezio BARTOCCI, Dimitrios MILIOS, Guido SANGUINETTI and Luca BORTOLUSSI
Basic information
Original name
Policy Learning for Time-Bounded Reachability in Continuous-Time Markov Decision Processes via Doubly-Stochastic Gradient Ascent
Authors
BRÁZDIL, Tomáš (203 Czech Republic, guarantor, belonging to the institution), Ezio BARTOCCI (380 Italy), Dimitrios MILIOS (300 Greece), Guido SANGUINETTI (380 Italy) and Luca BORTOLUSSI (380 Italy)
Edition
Quebec City: Proceedings of QEST 2016, pp. 244-259, 16 pp., 2016
Publisher
Springer
Other information
Language
English
Type of outcome
Article in conference proceedings
Field of Study
10201 Computer sciences, information science, bioinformatics
Country of publisher
Canada
Confidentiality degree
is not subject to a state or trade secret
Publication form
printed version ("print")
Impact factor
0.402 in 2005
RIV identification code
RIV/00216224:14330/16:00088513
Organization unit
Faculty of Informatics
ISBN
978-3-319-43424-7
ISSN
UT WoS
000389063800017
Keywords in English
continuous-time Markov decision processes; reachability; gradient descent
Tags
Changed: 13/5/2020 19:26, RNDr. Pavel Šmerk, Ph.D.
Abstract
In the original
Continuous-time Markov decision processes are an important class of models in a wide range of applications, from cyber-physical systems to synthetic biology. A central problem is how to devise a policy to control the system so as to maximise the probability of satisfying a set of temporal logic specifications. Here we present a novel approach based on statistical model checking and an unbiased estimation of a functional gradient in the space of possible policies. The statistical approach has several advantages over conventional approaches based on uniformisation: it can also be applied when the model is replaced by a black box, and it does not suffer from state-space explosion. The use of a stochastic gradient to guide our search considerably improves the efficiency of learning policies. We demonstrate the method on a proof-of-principle non-linear population model, showing strong performance on a non-trivial task.
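The abstract describes a general recipe: simulate the controlled CTMDP, estimate the probability of reaching a goal within a time bound by statistical model checking, and ascend a Monte-Carlo estimate of the policy gradient obtained from the same samples. As a rough illustration of that recipe only (not the paper's doubly-stochastic functional-gradient method, which works with time-dependent policies and functional gradients), the sketch below applies a plain score-function (REINFORCE-style) gradient estimator to a hypothetical toy CTMDP; all names (RATES, T_BOUND, rollout, estimate) and all numeric values are invented for illustration.

import numpy as np

# Hypothetical toy CTMDP: a chain of states 0..4 with goal state 4.
# RATES[s][a] = (forward rate, backward rate) for action a in state s;
# action 0 is slow but safe, action 1 is fast but can slip backwards.
N_STATES, GOAL, T_BOUND = 5, 4, 3.0
RATES = [[(1.0, 0.0), (3.0, 1.5)] for _ in range(N_STATES)]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def rollout(theta, rng):
    """Simulate one trajectory under the softmax policy given by theta.
    Returns (reward, score): reward is 1.0 iff the goal is reached before
    T_BOUND; score accumulates grad_theta log pi(a|s) along the path."""
    s, t = 0, 0.0
    score = np.zeros_like(theta)
    while s != GOAL:
        probs = softmax(theta[s])
        a = rng.choice(2, p=probs)
        score[s] += np.eye(2)[a] - probs          # softmax score function
        fwd, back = RATES[s][a]
        t += rng.exponential(1.0 / (fwd + back))  # exponential sojourn time
        if t >= T_BOUND:
            return 0.0, score                     # time bound exceeded
        s = s + 1 if rng.random() < fwd / (fwd + back) else max(s - 1, 0)
    return 1.0, score                             # goal reached in time

def estimate(theta, n_samples, rng):
    """Monte-Carlo estimates of the reachability probability and of its
    policy gradient (score-function estimator) from the same samples."""
    p_hat, grad = 0.0, np.zeros_like(theta)
    for _ in range(n_samples):
        r, sc = rollout(theta, rng)
        p_hat += r
        grad += r * sc
    return p_hat / n_samples, grad / n_samples

rng = np.random.default_rng(0)
theta = np.zeros((N_STATES, 2))      # one pair of action logits per state
for step in range(200):
    p_hat, grad = estimate(theta, 500, rng)
    theta += 0.5 * grad              # gradient *ascent* on reach probability
print(f"estimated time-bounded reachability probability: {p_hat:.3f}")

Note that both the reachability estimate and its gradient come from the same batch of sampled trajectories, which is what makes such statistical approaches applicable to black-box models: only the ability to simulate is required, never the transition-rate matrix, so state-space explosion is avoided.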
Links
GA15-17564S, research and development project