M A S A R Y K U N I V E R S I T Y FACULTY OF INFORMATICS Control of Parametrised Boolean Networks P H . D . THESIS PROPOSAL Eva Šmijáková Advisor: prof. RNDr. Luboš Brim, CSc. Brno, Fall 2020 Signature of Thesis Advisor Contents 1 Introduction 1 2 State of the Art 4 2.1 Preliminaries 4 2.1.1 Regulatory Networks 4 2.1.2 Boolean Networks 5 2.1.3 Parametrised Boolean Networks 8 2.2 Controlling Boolean Networks 9 2.2.1 Stable Motifs Detection 15 2.2.2 Strong Basins Identification 16 2.2.3 Trap Spaces 17 2.3 Controlling Parametrised Boolean Networks 17 2.4 Cell Reprogramming 19 3 Aim of the Thesis 21 3.1 Objectives and Expected Results 21 3.1.1 Formal Problem Definition 21 3.1.2 Development of Efficient Methods 22 3.1.3 Making Developed Methods Available 22 3.1.4 Demonstration of Applicability 22 3.2 Progression Schedule 22 4 Achieved Results 23 4.1 Parameter Synthesis for Hybrid Systems 23 4.2 Definiton of Parametrised Boolean Network Control Problem 24 4.3 Solution of Parametrised Control 24 5 Author's Publications 27 Bibliography 28 A Attached publications 33 Chapter 1 Introduction An attractive challenge for many scientific fields is finding ways to influence systems, so they behave as desired. In reality, this can be very difficult as there might be many different ways to interfere with the systems and at the same time, some of the interventions can be more efficient than others. Thanks to formal methods such as control or parameter synthesis we are able to use abstractions of the systems (models) and compute the the solution to the original real-world problem. The focus of this work is on control of biological systems. In biology, the state into which a system evolves is commonly referred to as phenotype [13]. It is a set of observable characteristics of an organism. Therefore, in biology, the goal of control is to drive a system into the desired phenotype. The current focus of systems biology is on the cellular level [12] as it is the basis of biological systems and even this basic level is still considered difficult to model as there might be hundreds of interactions in the play. The task of controlling cells is called cell reprogramming, and it is one of the most significant challenges of the regenerative medicine [14]. Systems biology uses a variety of models ranging from discrete (e.g. Boolean networks [35]), through continuous (e.g. O D E [18]), to hybrid models [19]. A very efficient model to capture numerous interactions at the cellular level is Boolean network (BN) as this model is both simple and expressive. Moreover, BNs have applications in many other areas, including circuit theory and computer science. BNs are composed of two parts: a finite set of Boolean variables representing genes or other biochemical substances and update functions which specify the way variables dynamically change their value based on influences from other variables. BNs generate a finite discrete state-space what makes them convenient for computational analyses. When starting from any initial state, the B N eventually establishes in some single state or in a set of states. These components of B N state space are called attractors. This notion is an abstraction for phenotypes of the modelled biological systems. 1 A computational model is considered controllable if we can assure that from some initial state, it reaches a desired final state in a finite amount of steps. In the context of BNs, we are interested in B N reaching the target attractor. This property is well-studied in the context of BNs without parameters [47, 28, 40, 15]. However, the problem did not receive a lot of attention in the parametrised setting yet. The definition of control is rather broad and allows many variations of the problem in the context of BN. To change the normal behaviour of BN, logically, we need to interfere with the model in some way. We call these interventions perturbations. However, we can consider many types of perturbations. They vary in three aspects. Firstly, we can either fix variable values to some fixed value or change the rules their evolution. Secondly, we can apply the perturbation just as a single transition step, or we can hold the perturbations for more transition steps, even permanent perturbations are considered. Lastly, the perturbation can be applied only once, or we consider applying perturbations and releasing perturbations at any time. Apart from different kinds of perturbations, control problem also varies in terms of what we are trying to achieve. For example, two elementary types of control problems are: (i) achieving a single desired target attractor irrespective of the current state (target control) [22], (ii) achieving control between every pair of attractors (full-control) [2]. The full-control variant of the problem is much more complicated, but can hugely benefit from the intermediate solutions of the target-control. In practice, it is often complicated to precisely infer Boolean functions from biological data. In contrast, there is typically good evidence of the fact that a variable regulates another one. This issue can be addressed by extending the BNs with logical parameters representing the unknown parts. Parametrised Boolean networks (ParBNs) [50, 5] allow to specify only regulators of a variable and thus capture multiple variants of the possible actual behaviour of variables without conducting many more expensive experimental observations. Every parametrisation can be understood as an instance of a standard BN. The number of these instances can be doubly-exponential in the worst case [44]. This makes the analysis of ParBNs much more complicated compared to the BNs. The parametrised setting brings many new challenges to the control problem. First, the nature of B N state transition graph may change dramatically according to the chosen parametrisation [5, 6]. Even the original goal of the standard control - to stabilise in some target attractor - might not be feasible in some parametrisations as the given attractor is not present in them. Secondly, the B N control problem is optimising - we are trying to minimise the number of perturbations done to a network, while we must guarantee that the target attractor is reached in all B N non-deterministic runs. It was shown that for many biological models, controlling values of only a few variables is sufficient to drive the network into any attractor of 2 the B N [47]. In contrary, this is usually not a case for the ParBNs. Due to the parameters, we would often need to fix the value of many variables to guarantee ParBN to reach the given attractor in all parametrisations. Such an approach is not applicable in practice as each perturbation can be very costly in the real environment. Therefore, the optimisation criteria should be adjusted. Finally, it is not defined how to treat the unknown knowledge about the ParBN. For other parametrised formalisms, often a problem of parameter synthesis is solved by setting the parameters, so they fulfil the given specification [37]. For the control problem, the specification is the system reaching the given attractor. Therefore, technically, if we allowed fixing parameters as a type of perturbation, we might achieve control. However, new questions for formalising this problem would arise, e.g., what is the difference between control problem and parameter synthesis? Can be these perturbations combined? Moreover, it is also unclear whether it is possible to conduct function perturbations to functions which are not fully defined. In our current research line, we decided to treat the parameters as some unknown knowledge and consider only perturbations fixing the value of variables. As the target attractor might not be present in some parametrisations, we restrict the domain of parametrisations to contain only relevant parametrisations - ones which contain the target attractor. To cope with minimisation criteria, the output of control problem for ParBN is a complete mapping from the domain of perturbations to the domain of relevant parametrisations. This way, it is possible to manually explore the available perturbations and consider both cost of perturbation and expected probability perturbation working in the real environment - the proportion of parametrisations for which the perturbation ensures ParBN stabilisation in the given attractor. The first main goal of the thesis is to clarify how the control problem for ParBN should be formally defined and formulate this problem. The second aim is to develop efficient methods working fast even in the parametrised setting. Last but not least, the developed methods should be demonstrated to be applicable for the real ParBN models as a helpful tool for cell-reprogramming. 3 Chapter 2 State of the Art This chapter is split into four sections. Section 2.1 introduces both parameterless and parametrised BNs and all notations related to them. Then, Section 2.2 defines the control problem for parameterless BNs and describes methods developed for this problem. After that, Section 2.3 explains how the control problem was extended from BNs to ParBNs. Finally, Section 2.4 describes how controlling BNs can be applied in practice. 2.1 Preliminaries We start this section by introducing a regulatory network which is an essential concept for modelling biological systems. Then we introduce BNs and concepts related to them. Finally, we describe how are these concepts extended in ParBNs. 2.1.1 Regulatory Networks When modelling biological systems, we commonly start by identifying influences between proteins, genes, and other biochemical substances. We call these influences regulations. The regulations are either activating (positive impact) or inhibiting (negative impact). Such regulations are monotonous. Non-monotonous regulations appear in biological models rarely. In this work we consider only observable regulations meaning that it was proven by experimentation that in some setting where a regulating substance (regulator) is present the regulated substance behaves differently compared to the lack of the regulator. A collection of studied biochemical substances and their regulations is called regulatory network (RN) [16]. The regulatory network can be viewed as a directed graph, where vertices are the substances and edges are the observable regulations. A n example of a R N is depicted in Figure 2.1. 4 Figure 2.1: Example of a R N . The network has three variables (CtrA, DnaA, GcrA), seven regulations from which three are activating (green arrows), three are inhibiting (red arrows) and one is not monotonous (black arrow). All regulations are observable. 2.1.2 Boolean Networks After a R N of a biological system is identified, the concentration of substances is captured by variables and the regulations are translated into mathematical functions. There are many types of models, varying in complexity and precision. Selection of a suitable model and encoding regulations into functions is an extensive part of systems biology [23]. In our work, we use Boolean networks to represent relationships of a R N . Definition 2.1 Boolean network A Boolean network is a tuple B = (V, R, F) such that: • V = {A, B,...} is a finite set of Boolean state variables. • RCVxVisa set of regulations. For A £ V, we say that T(A) = {B € V | (B, A) G R} is the context of A, i.e. the subset ofV regulating A. • T = \Fk | A G V} is a family of logical update functions. The signature of each F A is given by the context of A as Fk : {0, l } 7 " ^ —> {0,1}. A state s of a BN B is a valuation of its Boolean variables s : V —>• {0,1}. The set of all possible states is called state space ofB. We denote state space as 11(B) = {0,1}V . Given a state s, s[A i-> b] denotes a copy s where the value of A is set to b G {0,1}. Finally, for a state s and an update function Fk, we use the abbreviated notation Fk(s) to denote F A applied to s restricted to the context of A. The states of B N are Boolean (binary) configuarations of variables. Because of this, we can conduct standard Hamming operations on them. Given two states s,s', their Hamming difference hdif (s,s') is a set of all variables in which the two states differ: hdif (s, a') = {A G V | a(A) = V(A)} 5 . The Hamming distance is then the cardinality of this set: hdis(a,a') = |hdif(a,a')| Dynamics The complete dynamical behaviour of a Boolean network B is captured by a directed state-transition graph (STG) Q = (S,T), where S = 11(B) and T C S x S. The definition of the transition relation T depends on the updating scheme that defines the way variables are updated in the transitions. The first predominant updating scheme is synchronous which updates all values of variables at once. This generates a deterministic update-scheme which is very convenient to analyse. Nonetheless, these dynamics are less realistic compared asynchronous updating scheme [48]. Through this work, we will always consider asynchronous updating scheme unless explicitly stated otherwise. In asynchronous dynamics, at a discrete time step, the system non-deterministically applies some F A G T to a state a. This way we obtain a transition relation —>• which is defined as follows: (a, t) G—>• if and only if a ^ t A 3Fk G T : a[A i-> FA(a)] = < Note that for every (a, t) G—>• the Hamming distance hdis(a, t) = 1 as during one step only one variable can change its value. For (s,t) G—?•, we simply write a —>• t. We use — t o denote a reflexive and transitive closure of —>•, also writing a —>* t when (s,t) G—>•*. Also note that the transition relation is non-deterministic. We denote Async(B) the state-transition graph of B under asynchronous updating scheme. Attractors When studying the long-term behaviour of a BN, we typically consider only fair infinite paths in the state-transition graph. In a fair path, if a transition is enabled infinitely often, it has to be taken infinitely often [1]. Therefore, the system cannot infinitely delay the available transitions, and it is not possible to cycle forever in a non-terminal strongly connected component. The set of states in which a dynamical system stabilises is called an attractor. Definition 2.2 Attractor Let B = (V,R, F) be a BN. An attractor of B is a terminal strongly connected component (TSCC) in Async{B), i.e. a maximal subset A C 11(B) such that for all s,t G A, s —t, and for all s G A and t G 11(B), a —>• t implies t G A. We denote a set of all attractors of B as A(B) or simply as A if B is clear from the context. For a fixed state a, we define Att(s) to be an attractor A G A such that a G A, or an empty set if a does not belong to any attractor. G Figure 2.2: Attractors, weak basins and strong basins in a B N . The B N contains two attractors: single-state Attractor 1 (lightred area) and cyclic two-states Attractor 2 (light-blue area). Both these attractors have strong basins of size three (solid red area for attractor 1, solid blue for attractor 2 resp.). Note that states of attractors are also parts of their basins. Moreover, the strong basins never have any intersections as given by definition. Finally, the redlined and blue-lined areas contain weak basin states of Attractor 1 and Attractor 2. The strong basin is always a subset of a weak basin. The weak basins are overlapping. Basins For an attractor A G A(B) we also define the notion of weak and strong basins. Definition 2.3 Weak and strong basin Let B = (V, R, F) be a BN and A its attractor. A weak basin of A is the set of states from which it is possible to reach A: WB(JB, A) = {s e 11(B) \s-**t for some t G A} A strong basin is a set of states from which it is not possible to reach any other attractor than attractor A: SB(B, A) = WM(B, A)\ [j WM(B, A') A'sA[A'^A] Notice that due to the fairness property, once a strong basin of an attractor A is reached, the system eventually stabilises in the given attractor. For better illustration, Figure 2.2 depicts weak and strong basins of a BN. 7 2.1.3 Parametrised Boolean Networks As it was already mentioned, in practice, it can be challenging to translate regulations into specific Boolean functions T. This problem can be addressed by extending B N using logical parameters. This way, parametrised logical update functions either return a Boolean value (they behave normally) or a logical parameter representing the uncertainty of the consequent behaviour [50, 5]. Definition 2.4 Parametrised Boolean network We define a parametrised Boolean network (ParBN) to be a tuple B = (V, V, R, P, 5)- Here, V and R are the same as in Definition 2.1 and • V = {P, Q,...} is a finite set of Boolean logical parameters; • P C j O , 1}^ is a subset of valid parametrisations; • # = {Fk | A G V} is a family of parametrised logical update functions. The signature of each Fk is given as Fk : {0, l}7 "^) —>• ({0,1}U?). The notion of the state space of a ParBN is identical to that of a BN. By fixing p G P, we obtain Bp = (V,R,$(p)) (a standard BN), Ap (the set of attractors of Bp), and Attp(s) (the attractor of state s in the parametrisation p). Dynamics The asynchronous semantics of a ParBN B is represented using an edgelabelled state-transition graph Async(B), where each transition s —>• t is labelled with a subset of parametrisations P(s, t) C P for which it is enabled. That is, p G P(s,t) if and only if s —>• t in Async(Bp). In Figure 2.3, we show a small example of ParBN dynamics. In general, the size of the set of all possible parametrisations (parameter space) can be even doubly-exponential in the number of Boolean variables. In particular, the number of Boolean functions in a model with n variables is 22 ™ [44]. It is thus critical to restrict the parameter space as much as possible. In many biological models, regulations are usually supplemented with static constraints limiting their outcomes [24, 38]. We already introduced observability, activation and inhibition as specific properties of regulations in Subsection 2.1.1. In a parametrised setting, these properties can be used as constraints to restrict the parametrisation space. We assume that every regulation in a ParBN can be marked with a subset of these three constraints. Then for all p G P of B, Bp must adhere to these constraints, e.g., a regulation marked activating or inhibiting in B must be activating or inhibiting respectively in the Bp. 8 Figure 2.3: A n example of ParBN STG. In the left, there is a complete state space of a complete edge-labelled state-transition graph of the ParBN. Every edge is labelled by a color which determines its belonging to a parametrisation. In the right, there is depicted a distinct STG for each of these parametrisations. Attractors and Basins Notice that the standard notion of an attractor cannot be directly transferred to a ParBN, because a state can belong to an attractor only in certain parametrisations (for different parametrisations, the attractors do not have to overlap). We say that a subset A C 11(13) is an attractor in a parametrisation p G P if A is an attractor of Bp. Furthermore, given a state s, we can define Ap(s) as the subset of P, such that for each p G Ap(s), Attp(s) / 0 where Attp(s) is Att(s) in Bv. Intuitively, set Ap(s) are the parametrisations in which s is a part of some attractor. Similarly to an attractor, in ParBN basins vary between parametrisations. Moreover, in some parametrisations there might be even completely different attractors, and therefore for some attractors, it might not make sense to compute basins in every parametrisations. Illustration of how can basins vary between parametrisations is provided in Figure 2.4. 2.2 Controlling Boolean Networks A computational model is controllable if we can assure that from some initial state, it reaches a desired final state in a finite amount of steps. This property is well-studied in the context of synchronous BNs [26, 22] and is currently a very attractive challenge for asynchronous BNs. In this section, we first explain what kind of interventions are done to a B N in order to control it. Then we explain the formal objectives of the control problem. Last but not least, we introduce methods developed for controlling asynchronous BNs in subsections 2.2.1, 2.2.2, 2.2.3. 9 Figure 2.4: Attractors, weak basins and strong basins in a ParBN. All parametrisations have the same attractors in this example (orange and pink). In left, there are two same ParBNs. However, in the top one, the pink attractor is depicted, and vertices which are its strong basins are marked with the corresponding parametrisation colour. Similarly, the orange attractor and its strong basin are depicted in the left bottom. Finally, in the right, the attractors (marked vertices), their strong basins (solid lines) and weak basins (dashed-lines) are illustrated for each parametrisation separately. Perturbations Perturbations are the means of controlling a BN. In order to ensure that the B N always reaches some attractor which it does not reach in some of its normal runs, logically, we need to interfere with the system in some way. This is exactly what perturbations are - a forced intervention to the dynamics of a system. B N perturbations are mainly differentiated between state and function perturbations [27]. State perturbations force the system to change its state and restrict the state-space of the BN. Therefore, some edges and nodes are discarded from the state space. The second kind of perturbations is function perturbations that adjust the update function(s) and, therefore, modify the S T G edges. Typically, changing the processes in a cell (function perturbation) is more difficult then artificially adding or extracting a biochemical substance (state perturbation) [47]. That is why most of the research is focused on the state perturbations. Temporal properties of perturbations The perturbations can be also differ based on their temporal characteristics. The first classification is based on the length of the applied perturbation. We consider one-step, temporary, and permanent perturbations. Figure 2.5 shows an intuition on differences between these types of perturbations. 10 One-step Temporary Permanent Initial state Infinity I > Control beginning Figure 2.5: Difference between one-step, temporary and permanent control. The black dots represent initial states. Red dots represent states which were reached during the time while B N was perturbed and finally, blue dots represent states to which B N transitioned with its natural behaviour. The run of B N would continue in the same nature as its last transition. One-step perturbations (OSPs) are typically state perturbations and we consider only this type of OSP in our work. The OSP sets values of the controlled variables once, and then the network is left to evolve in its natural dynamics. If the resulting OSP transition is present in the original BN, it is forced and no other transition can be done from the given state. Alternatively, even if the perturbing transition was not present in the original BN, the state change is driven nonetheless. This is always the case if the size of perturbation is more than one (only one variable can change at a timestep in natural B N dynamics). The notion of one-step state perturbation is defined in the following way: Definition 2.5 One-step perturbation of Boolean network Given BN B, a one-step perturbation OSP is a tuple (1, 0) where 1, 0 C V, 0 and 1 are mutually disjoint (possibly empty) subsets of variables of B. These variables are called perturbed variables. The set of all possible onestep perturbations is denoted OSP. An application of perturbation OSP to a state s is denoted 5(s, 1, 0) = s', results in a state s' defined as: (l v el o v e o s(v) otherwise The size of OSP is defined as Size{OSP) = |1| + |0|. 11 Apart from one-step perturbations, we introduced temporary and permanent perturbations. When applying temporary perturbation, there exists a finite amount of computational steps after which the perturbation is released, and B N is left to evolve in its natural dynamics. Temporary state perturbation applied for only a single time step is equivalent to OSP. Finally, the most intrusive perturbations are permanent. Nonetheless, this kind of perturbation is rather unrealistic in practice, especially for the state perturbations, as the given substance would need to be added or extracted forever. Moreover, the permanent perturbation might disrupt the network's original long-term behaviour and causing the target attractor to be no longer present in the BN. When applying state perturbation (SPs) for more than one time-steps, the STG of the B N is adjusted in the following way: 1. To all states with variable values differ form the perturbed variables, we apply the OSP with the same perturbed variables. 2. The transitions to all states which do not comply with the perturbed variables are removed. These interventions ensure that only available transition from unfit states are the minimal changes which drive the network into a valid state. Thanks to the second STG change, the values of perturbed variables remains preserved while the SP is held. Formally, SP of B N are defined as: Definition 2.6 State perturbation of Boolean network Given a BN B, a state perturbation SP is a tuple (1,0) where 1,0 C V, 0 and 1 are mutually disjoint (possibly empty) subsets of variables of B. The set of all possible state perturbations is denoted SP. An application of transition system perturbation SP to Async(B) is defined in the following way: -^SP = {s ->• 5(s, 1,0) I s e SA3v : (s(v) = 1 A v G 0) V (s(v) = 0 A v G 1)} U { s ^ í £ Async(B) | Vw : (v G 1 => (s(v) = 1 A t(v) = 1)) A (v G 0 (s(v) = 0 A t(v) = 0))} The size of SP is defined as Size(SP) = |1| + |0|. As a result, when controlling B N via SP, we make n transitions in the —»SP, obtaining state s'. After that (unless we consider permanent perturbation) we let B N evolve in its original transition system —>. Finally, the target attractor must be reached after a finite amount of transitions. 12 Figure 2.6: Difference between initial, sequential, and sequential attractor-based perturbations. The figure is only illustrative and does not show precise transitions or perturbations done to the BN. The black points are non-attractor states of B N . The colourful points are attractors, the black arrows are transitions conducted by natural B N dynamics, and the red arrows are transitions done under the effect of perturbations. Types of control: a) initial perturbation, b) sequential perturbation, c) sequential attractor-based perturbation. Usually, state perturbations are very expensive. Therefore, we want to keep the amount of performed perturbations as small as possible. Thus, control is an optimising problem and is typically measured by a number of perturbed variables, respectively size of the perturbations. In some cases, we may also choose when the perturbations are applied. In initial or also referred to as immediate control, the perturbations are applied to the given initial state. Alternatively, during sequential control, (possibly different) perturbations can be applied multiple times at different states. This way, we can potentially control the system using fewer perturbations. However, this approach brings new issues, such as dealing with the nondeterminism of the B N (different perturbations may be required for different branches of the non-deterministic network's behaviour). Moreover, we would need to be able to precisely observe the current state of the BN, which can be very problematic to obtain in practice. These issues can be addressed by using attractor-based sequential approach [28]. Differences between these approaches are depicted in Figure 2.6. 13 c) d) Figure 2.7: Difference between control objectives. This figure is only illustrative and does not show precise transitions or perturbations done to the BN. The black points are non-attractor states of BN, the colourful points are attractors, and the red arrows shows what we are trying to achieve by applying perturbations. Types of control: a) source-target control, b) target control, c) full-control, d) all-pairs control. Control Objective Finally, we consider different control objectives, defined in [4] as follows: 1. Source-target control: Given a source state s G 11(23), and a target attractor T G A , find a set of perturbations that when applied, the B N always converges from the state s to the attractor T. 2. Target control: Given a target attractor T G A, find a set of perturbations so that for every source attractor S G A (such that S ^ T) when some subset of perturbations is applied, the B N converges from S to T. 3. Full control: Find a set of perturbations such that for all pairs of distinct attractors S, T G A , applying a subset of this set guarantees that the B N converges from S to T. 4. All-pairs control: Given a subset of source attractors S C A and target attractors T C A, find a set of perturbations such that for every pair S G S and T G T, applying a subset of the perturbations guarantees that the B N converges from S to T. The differences between control objectives are depicted in Figure 2.7. 14 Summary In summary, the control problems for BNs differ in the following aspects: • What do we want to control; goal: What is the initial state of the BN? Where we want to end? Do we want to control only one scenario or multiple scenarios? • What perturbations we apply: We can either perturb states functions of variables. • For how long is perturbation applied: We can perturb B N only during a single computational step, or we can hold it temporarily, or even forever • When we apply the perturbations: Only once from an initial state in contrast with applying control to an arbitrary state and any amount of times. Now we will describe methods developed to solve the control problem for asynchronous BNs. 2.2.1 Stable Motifs Detection When analysing even as simple mathematical model as a BN, we often face state-explosion problem. Although B N state space can be fully enumerated in the binary environment of the current computational technology, it grows exponentially with the number of variables. Some techniques address this issue by analysing only R N along with the update functions at first. The obtained information is then used to dramatically reduce the size of explicit state space, which needs to be explored to solve the B N control problem. One such method is based on the identification of stable motifs [47]. A stable motif is a subset of variables and their values which once reached: they never change. These components are identified using only the R N and update functions. After identification of all stable motifs, we obtain something like a decision scheme for all non-deterministic runs of a B N . With this knowledge, we can navigate the non-deterministic runs of B N and decide what interventions should be applied to reach the target attractor. The strength of this approach resides in its global-level analysis. Once the stable motifs are identified for a BN, we can easily use it for computing control for any attractor of the BN, thus obtaining a solution to a full-control variant of the problem. This also gives us an insight into the robustness of the underlying system. 15 2.2.2 Strong Basins Identification Another branch of methods to solve the B N control is based on an efficient identification of the target attractor strong basin (see Subsection 2.1.2). The strong basin is by definition a maximal set of states from which only the given attractor is reachable. Therefore, after a strong basin of the attractor is identified, we just need to determine the smallest perturbation which drives B N into the strong basin to solve the source-target control problem with onestep perturbations. The situation gets more complicated when considering other types of perturbations (i.e. temporary and permanent), but identifying a strong basin is crucial when using these perturbations. However, here, we will not go into more technical details about these algorithms. The trivial method to identify a strong basin is a fixed-point approach. We iteratively remove nodes from the weak basin which have transitions outside the basin. When there are no nodes which can leave the basin, we obtained the strong basin. This method is, however, not efficient enough for big biological networks, and a more efficient method was developed [33]. The method is based on a state-graph decomposition to blocks. This algorithm's high-level idea is the following: Every basic block is composed of a strongly-connected component of the STG and its parent states. These blocks are considered in partial-projections of BN, meaning only some part of variables are considered. Once basic blocks are identified, they are topologically sorted. Only then the blocks are unfolded, starting from the block containing the attractor until all states of the strong basin are identified. These methods are mostly focused on source-target control, but identification of a strong basin is a good basis for solving target control as we can easily re-use the knowledge of strong basin for different sources [40]. In contrast to the previous stable-motifs method, we need to run strong basin identification for every target attractor separately to obtain the solution to control problems with more advanced objective (full-control or all-pair control). On small BNs, the decomposition-based method was demonstrated to be more effective even when solving full-control [32]. For bigger networks, the stable motifs based method remained faster. The strong basin identification method was utilized for solving sourcetarget control problem with all types of perturbations: initial one-step [33, 32], temporary and permanent [42], and also sequential (attractor-based) one-step [28], temporary and permanent [41]. The research group is currently shifting its focus to the target control and successfully solved this problem with all types of initial perturbations [40]. All these methods were comprised into a software toolkit C A B E A N [39]. 16 2.2.3 Trap Spaces Another, very recent method towards B N control is also based on the analysis of B N STG structural properties [15]. Whereas previously, we always considered a specific attractor which has to be reached, this approach also allows specification of a phenotype given by mapping a subset of variables to some value. Of course, if we map all variables to some value, we obtain one specific state (e.g. steady-state attractor) what is a standard target control problem. Trap spaces are the parts of the BNs which can not be left once reached. Compared to the strong basin, however, if we find some trap spaces in synchronous BNs, the trap spaces are also contained in asynchronous BNs. Therefore we can use the knowledge obtained from analysing relatively simpler dynamics of synchronous BNs and use it for computing control of asynchronous BNs. The perturbations used in this approach are temporary state perturbations. The approach was demonstrated to work well on both phenotype and attractor control on fairly big models (models with 60 variables and 150 attractors). 2.3 Controlling Parametrised Boolean Networks In this section, we first explain why the definition of ParBN control is considered problematic. Then, we provide formal definitions of the problem. Finally, we briefly discuss advances in solving this problem. To the best of our knowledge, the problem of ParBN control was not solved by anyone before our work. Therefore, our first task was to formally define this problem and formalise the perturbations we allow to be done to the network. As previously mentioned, we need to decide whether parametrised functions can be influenced in some way. In the current research, we decided to treat parameters as some unknown knowledge and not influence functions. We decided to go this way, mainly because the literature suggests that engineering proteins to behave differently are not very common and are usually very expensive [47]. We consider two types of perturbations, same as in the B N case: OSP (see Definition 2.5) and SP (see Definition 2.6). Compared to parameterless BNs, for ParBNs it is typically impossible to find a perturbation small in size which would work in all parametrisations. This is because there are two things to consider: Firstly, the target state might not even be part of an attractor in some parametrisations. That is why we consider only parametrisations where t is part of some attractor: Ap{t) (see Subsubsection 2.1.2) for solving the control problems. Secondly, also strong basin might dramatically vary between parametrisations, and there might be no intersection between the parametrisations. 17 That is why we define the solution to ParBN control problem as a complete mapping from all possible perturbations which can be done to a ParBN to all parametrisations, for which the said perturbation ensures, that after applying the perturbation to an initial state s, B N (with the fixed parametrisation) passes t infinitely many times in its all possible runs. Formally: Definition 2.7 One-step source-target control of a ParBN B Given a ParBN B, source state s, target state t and a set of parametrisations in which t is a part of some attractor Ap(t), find the maximal mapping OC : QSP -> Ap(t), such that when OSP G QSP is applied to Bp at the source state s, Bp passes infinitely many times through the given target state t. This holds for every p G OC{OSP). Definition 2.8 Temporary source-target control of a ParBN B Given a ParBN B, source state s, target state t and a set of parametrisations in which t is a part of some attractor Ap(t), find the maximal mapping TC : SP -+Ap(t), such thatjor every SP G SP and p G TC(sp) if SP is applied to Bp, the perturbed Bp after a finite amount of steps surely evolves from s to some transient state x for which if x is an initial state of Bp, Bp passes infinitely many times through the given target state t. This holds for every p G TC(SP). Definition 2.9 Permanent source-target control of a ParBN B Given a ParBN B, source state s, target state t and a set of parametrisations in which t is a part of some attractor Ap(t), find the maximal mapping PC : SP —>• Ap{t), such that when SP G SP is applied to B, the perturbed Bp evolves from s into the t and passes the t infinitely many times (the t is still an attractor after TSP). This holds for every p G PC(TS). Notice, that source-target problems in the context of ParBN are aiming to drive the network into a target state of some attractor instead of an attractor itself. This is because parameterisations might contain attractors which are considered similar as all of them contain the given state. Therefore, in all parameterisations Ap(t) the given state t is entered infinitely often by the controlled BN. If this is not the case and some ParBN parameterisations contain attractors which are out of the interest, the set of parameterisations Ap(t) might be replaced with an arbitrary one. We can observe, that it is always possible to bring the system into the given target state by setting the ParBN variables to values of the given target state. We call this kind of state perturbation trivial. However, we want to minimise the number of controlled variables, and the trivial solution is usually not optimal. In a non-parametrised setting, the size of the control is typically the only considered optimisation criterion. 18 In a ParBN, the situation is further complicated due to the dependence on parameters. To reach some attractor, it is sufficient to reach its strong basin after the application of control. Nonetheless, the strong basin of an attractor can vary according to the parametrisation, and the control thus "works" only for a subset of parametrisations Ap(t). We want this subset to be as big as possible. To that end, we consider the notion of robustness that normalises the number parametrisations for which the given perturbation fulfills the goal of the control. Definition 2.10 Robustness of perturbation Given a ParBN B, a target state t and a control mapping C, the robustness of a perturbation P is defined as the ratio between the number of parametrisations for which the perturbation always achieves control objective and the number of all relevant parametrisations: \Ap(t)\ We solved one-step source-target control for ParBN in [9]. The method is based on a fixed-point strong basin identification approach (see Subsection 2.2.2), but is adapted to the ParBN setting. The implementation is semisymbolic; the parameterisations are represented symbolically while the TSG of ParBN is explicit. The symbolic representation of parametrisations is realised using reduced ordered binary decision diagrams (BDDs) [10]. This compact representation also allows formal representation of R N constraints (activation, inhibition, observability) and efficient probing of ParBN edgelabelled TSG. The method was shown to be more effective than computing source-target for every parametrisation distinctly. 2.4 Cell Reprogramming This section explains cell reprogramming in deeper detail. First, we describe the natural processes which cell reprogramming is aiming to replicate. Then we list concrete applications of cell reprogramming that are currently researched in regenerative medicine. The most usual process of a cell changing into another phenotype is called cell differentiation. A n example of this is stem cells which differentiate into daughter cells according to the organism needs. The process of cell specialisation is also often referred to as a cell-fate. Another natural mechanism is cell dedifferentiation where already differentiated cells revert to an earlier developmental stage. Dedifferentiation is common for more basal life forms like worm or amphibians [21]. Similarly, some factors can allow a differentiated cell to become the other kind of differentiated cell. 19 Such a process is known as transdifferentiation. These possibilities observed in some organisms provided inspiration for the emerging field of regenerative medicine. The organism's need to differentiate cells is communicated using signalling pathways [8]. The signals are transmitted via a cascade of biochemical reactions, mostly protein phosphorylation catalysed by a protein kinase. We can model these signals using BN. If we are able to identify which signals nurture the cell's dedifferentiation of transdifferentiation, we can simulate these signals and artificially force the cell to undergo this process. This procedure is called cell reprogramming. Finding efficient perturbations for controlling BNs is the abstraction of identification of these signals. T-cells are a primary immune mechanism of a human body. These cells must differentiate very quickly to fight the disease successfully. However, the diseases of immunity (e.g. leukaemia) can disrupt T-cells differentiation's signalling pathways and are very difficult to treat. Cell reprogramming can identify the broken signals and thus help to fight this diseases [29, 34, 47]. Another target of interest for cell reprogramming is embryonic stem cells. These cells can differentiate into many other types of cells and are promising vast clinical uses such as treatment of diabetes and heart disease, or contrariwise - degenerating tumours and reverting unwanted immune responses [20, 36, 43, 46, 45]. There are many more potential use-cases for cell reprogramming. To mention a few more: curing obesity by converting fat cells to the brown type [49], recovering malfunctioning cells [30], or even organs [17]. That is why we consider ParBN control problem very up to date and potentially widely applicable. 20 Chapter 3 Aim of the Thesis 3.1 Objectives and Expected Results The main aim of this thesis is to define and solve the control problem for ParBN. Solving this problem should be primarily applicable for the biological task - cell reprogramming. In order to achieve this, we plan to do the following steps: 1. Formally define control problem for ParBN 2. Develop efficient methods for solving the ParBN control problem 3. Bundle the developed methods to a software toolkit 4. Demonstrate applicability for biological systems 3.1.1 Formal Problem Definition The problem of ParBN control has not been formalised prior to our work. We identified several crucial issues which need to be considered when defining the problem. Firstly, attractors of the ParBN vary according to the parametrisations. Therefore, searching for perturbations that drive the network into a target attractor does not make sense for parametrisations, where the given attractor is not present. Secondly, the practical realisation of perturbations requires non-trivial effort; therefore, in normal B N setting the number of perturbations is sought to be minimal. However, there is rarely a solution with a small number of perturbations in the parametrised setting, which would work for a big portion of parametrisations. Thus, optimisation criteria should be adjusted accordingly. Last but not least, the control problem can be considered related to the parameter synthesis problem. Therefore it is important to make clear first how we can treat the parametrisations. Whether we can influence their values somehow or treat parameters as some unknown knowledge only. 21 3.1.2 Development of Efficient Methods As we described in the previous chapter, the control problem for asynchronous B N is well-studied. Therefore, after control for ParBN is formally defined, we could probably use the existing methods for asynchronous B N control to solve the problem for every parametrisation distinctly. Nonetheless, the number of these parametrisations can be doubly-exponential, meaning even efficient method for B N control would need to be run too many times to be handled in a realistic time. That is why a novel, efficient approach needs to be invented for the parametrised setting. 3.1.3 Making Developed Methods Available All developed methods should be bundled into a publicly available software as a part of the BioDivine toolkit [3]. Depending on the methods, they will be accessible in some console application or a web application with intuitive UI, i.e. incorporated into Aeon tool [6]. 3.1.4 Demonstration of Applicability The developed methods should be demonstrated as applicable to real-world problems. In particular, we are interested in the cell-reprogramming task, but finding other use-cases of the developed methods would be definitely gainful. 3.2 Progression Schedule Plan for future study and research activities: • Spring 2021: Further refining of formal definitions of ParBN control. • Spring 2021: Doctoral exam and defence of this thesis proposal. • Spring 2021 - Fall 2021: Develop effective solutions for ParBN control. • Fall 2021 - Fall 2022: Development of the software tool. • Spring 2022: Research biological applications of developed methods. • Fall 2022 - Spring 2023: Work on the thesis text. • Spring 2023: Thesis defence. Relevant theoretical results should be submitted for publication at appropriate conferences and journals, such as Computer Aided Verification, International Symposium on Automated Technology for Verification and Analysis, Symposium on Formal Methods, Journal of Theoretical Computer Science, PLOS ONE, or Transactions on Computational Biology and Bioin- formatics. 22 Chapter 4 Achieved Results The focus of my research is on developing formal methods applicable to systems biology. One of the very attractive challenges of the systems biology is finding ways to influence the systems, so they behave in the desired way. This is typically done by solving either parameter synthesis where appropriate parameters are sought or control where some other kinds of perturbations are done to the system in order to reach the desired state. In this chapter, I first present results from the beginning of my studies when I was researching parametrised hybrid systems (Section 4.1). Later, I shifted my attention to ParBNs as this model is more convenient for capturing numerous interactions of a cell. Moreover, we have seen a lot of potential for applying results from the current focus of Sybila research group (bifurcation of discrete systems [31]) into problem control of ParBNs. These two research lines share the need for efficient identification of components that change with parametrisations of the system. Sections 4.2 and 4.3 summarise my progress done towards the fulfilling aims of my thesis, introduced in the previous chapter. 4.1 Parameter Synthesis for Hybrid Systems At the beginning of my research, I was focused on studying parametrised hybrid systems. I followed up the previous research of the Sybila research group - parameter synthesis for piecewise multi-affine dynamical systems from the specification of hybrid C T L (HCTL) based on the rectangularisation [7]. I extended this method to work for hybrid systems. Rectangularisation is an effective framework which over-approximates multi-affine dynamical system into a discrete transition system. I used this method to obtain discrete abstractions of the continuous parts of the hybrid system. Therefore, my main task was inventing a way to join the multiple discrete systems into one compact discrete system representing a correct over-approximating abstraction of the given hybrid system. I have published 23 and presented this article at the CMSB conference [37]. Apart from deriving a model based on a specification, the parameter synthesis can also be used to study a parametrised model's properties. Thanks to the expressiveness of HCTL, the parameter synthesis can be, for example, used to identify how components of the system change with different parametrisations. These components include attractors (see Definition 2.2) and basins (see Definition 2.3). Therefore, we see H C T L parameter synthesis to have also potential in aiding the ParBN control. 4.2 Deflniton of Parametrised Boolean Network Control Problem The first objective of my thesis is to define the control problem for ParBN formally. I have formulated three variations of this problem: one-step source-target control (Definition 2.7), temporary source-target control (Definition 2.8), and permanent source-target control (Definition 2.9). These definitions can be extended to other types of control objectives (see Section 2.2). Other space for extending the formal definition of this problem is including sequential or attractor-based sequential types of perturbations to the definitions or generalise the notions of perturbations to include various perturbations in one definition. In summary, these are the ways we dealt with the three issues of ParBN control problem definition mentioned in Subsection 3.1.1: 1. Some parametrisations do not contain the target attractor: We consider only a subset of parametrisations Ap{t) where the target state t is part of some attractor. 2. Inconclusive optimisation criteria: In the parametrised setting there is not only a single criterion regarding the perturbation optimality. That is why the ParBN control is not optimising problem as opposed to normal B N control. Instead, the output of ParBN control is a complete mapping from every possible perturbation to a set of parametrisations for which the given perturbation works. 3. Relation to parameter synthesis: We treat parameters only as an unknown knowledge and allowed (state) perturbations do not interfere with the parameters as they fix the value of variables irrespective of the function parameters. 4.3 Solution of Parametrised Control The first type of control I solved was one-step source-target control (see Deflniton 2.7). I chose the representation of explicit state space, while 24 parametrisations are represented symbolically via BDDs. This allows computing the reachability procedures in parallel. For the underlying implementation, I worked with internal libraries of the tool Aeon [6] which provides significant part of the necessary functionality, including a convenient format for specifying ParBNs and parallel reachability procedures. My algorithm is based on the fixed-point approach for strong basin computation in non-parametrised BNs [32]. However, compared to the "ordinary" strong basin computation, I work with the parametrised structure and similarly to control, I compute mapping from states to parametrisations where the given state belongs to the strong basin. Therefore, instead of states, only parametrisations are iterativelly removed from the mapping. Currently, I am developing solution to temporary and permanent controls (see Defintions 2.8 and 2.9) based on algorithms from [41]. The prototype implementations of all my solutions are available as an open-source Rust library at http://github.com/sybila/biodivine-pbn-control. I evaluated performance of parametrised strong basin identification on two real-life B N models: a cell-fate decision model [11] and a myeloid differentiation network [25]. I compared the performance of my approach using a different number of parameters implanted into the models, resulting in different sizes of the relevant parameter space. All measurements were conducted using a machine equipped with A M D Ryzen Threadripper 2990WX 32-Core Processor and 64GB of memory. The results of strong-basin computations are shown in Table 4.1. It is possible to "virtually" compare my approach to the naive parameterscan approach. In [4], a strong basin of (non-parametrised) asynchronous BNs is computed using a block decomposition method with 4 ms needed to finish the computation for the myeloid model. Even if the reported hardware was slower than in our case and we assume that the strong basin computation for one parametrisation would last only 1 ms, the fully parametrised myeloid model contains an attractor which is present in 3.7 x 109 parametrisations. Therefore, the expected time for computing a strong basin for all parametrisations with 32-fold parameterisation is more than a day compared to less than 27 seconds achieved using our parameter-based semi-symbolic approach. As the developed algorithm enables parallelisation, I evaluated the scalability of my method on the fully parametrised myeloid model. The results are shown in Table 4.2. The computation was restricted to the specified amount of CPUs. The final speed-up achieved on our machine, when using 32 CPUs compared to a non-parallel C P U usage was about 10-fold. 25 Table 4.1: Results of strong basins computation. The values are stated as ranges because we computed strong basins of all attractors considered in the given models. The second column shows the number of model's parameters. The third column shows count ranges of parametrisations which contain the given attractor. The fourth column displays ranges of the number of states in the strong basin. The last column contains ranges of times needed to compute strong basins. Model \V\ \Ap(t)\ # SB states Time 1 1 32 - 352 4.4 - 9.19 s Cell-Fate 8 1 - 4 21 - 79 4.63 - 13.42 s 20 7 - 5 6 1632 - 262,144 5.82 - 26.59 s 1 1 64 - 384 8 - 30 ms Myeloid 32 63 - 2,052 64 - 1,472 14 - 214 ms Myeloid 70 5.9 x 104 - 1.8 x 107 256 - 2,048 147 - 1717 s 94 3.4 x 106 - 3.7 x 109 1,024 - 2,048 0.6 - 15.38 s Table 4.2: Scalability of strong basin computation. The strong basins are computed on attractors of myeloid model. # CPUs Attr. 1 Attr. 2 Attr. 3 Attr. 4 Attr. 5 Attr. 6 1 6.13 s 99.31 s 71.32 s 130.17 s 45.84 s 136.65 s 2 3.34 s 54.32 s 38.87 s 71.31 s 24.95 s 74.29 s 4 1.86 s 31.3 s 21.83 s 40.31 s 13.71 s 42.26 s 8 1.11 s 19.34 s 13.31 s 24.68 s 8.4 s 26.49 s 16 0.87 s 14.03 s 9.32 s 17.56 s 5.77 s 18.86 s 32 0.6 s 10.98 s 6.87 s 13.22 s 4.54 s 15.38 s 26 Chapter 5 Author's Publications • Šmijáková, E., Pastva, S., Šafránek, D., Brim, L.: Parallel parameter synthesis for multi-affme hybrid systems from hybrid C T L specifications. In: Computational Methods in Systems Biology, pp. 280-297. Springer (2020) Personal contribution: 75% (Technical implementation of the algorithm, writing the most of the paper text, presenting results at the conference) • Brim, L., Pastva, S., Šafránek, D., Šmijáková, E.: Parallel one-step control of parametrised Boolean networks, (submitted to Mathematics journal) Pre-print available at: https://arxiv.org/abs/2009.00359 Personal contribution: 85% (Conceptualisation, technical implementation of the solution, case-studies, writing the most of the paper text) 27 Bibliography [1] Abadi, M., Lamport, L.: Conjoining specifications. A C M Transactions on Programming Languages and Systems 17(3), 507-534 (1995) [2] Fiedler et al., B.: Dynamics and control at feedback vertex sets, i: Informative and determining nodes in regulatory networks. J. Dyn. Differ. Equ. 25(3), 563-604 (2013) [3] Barnát, J., Brim, L., Černá, L, Dražan, S., Fabriková, J., Láník, J., Šafránek, D., Hongwu, M .: BioDiVinE: A framework for parallel analysis of biological models. In: Proceedings of 2nd International Workshop on Computational M odels for Cell Processes, pp. 31­45. E P T C S (2009) [4] Baudin, A., Paul, S., Su, C , Pang, J.: Controlling large Boolean networks with single­step perturbations. Bioinformatics 35(14), Í558­Í567 (2019) [5] Beneš, N., Brim, L., Pastva, S., Poláček, J., Šafránek, D.: Formal analysis of qualitative long­term behaviour in parametrised Boolean networks. In: Formal M ethods and Software Engineering, pp. 353­369. Springer (2019) [6] Beneš, N . , Brim, L., Pastva, S., Šafránek, D.: A E O N : attractor bifurcation analysis of parametrised Boolean networks. In: Computer Aided Verification. Springer International Publishing (2020) [7] Beneš, N . , Brim, L., Demko, M . , Pastva, S., Šafránek, D.: Pithya: A parallel tool for parameter synthesis of piecewise multi­affme dynamical systems. In: Computer Aided Verification. CAV 2017. pp. 591­598. Springer International Publishing (2017) [8] Bradshaw, R., Dennis, E.: Handbook of Cell Signaling. Handbook of Cell Signaling, Elsevier/Academic Press (2010) [9] Brim, L., Pastva, S., Šafránek, D., Šmijáková, E.: Parallel one­step control of parametrised Boolean networks. arXiv preprint arXiv:2009.00359 (2020) 28 [10] Bryant, R.E.: Graph-based algorithms for Boolean function manipulation. IEEE Trans. Computers 35(8), 677-691 (1986) [11] Calzone, L., Tournier, L., Fourquet, S., Thieffry, D., Zhivotovsky, B., Barillot, E., Zinovyev, A.: Mathematical modelling of cell-fate decision in response to death receptor engagement. PLOS Computational Biology 6(3), 1-15 (2010) [12] Cardelli, L.: Abstract machines of systems biology. In: Transactions on Computational Systems Biology III. pp. 145-168. Springer Berlin Heidelberg (2005) [13] Cheng, K., Katz, S., Lin, A., Xin, X., Ding, Y.: Chapter four - wholeorganism cellular pathology: A systems approach to phenomics. In: Genetics, Genomics and Fish Phenomics, vol. 95, pp. 89-115. Academic Press (2016) [14] Cherry, A.B.C., Daley, G.Q.: Reprogramming cellular identity for regenerative medicine. Cell 148(6), 1110-1122 (2012) [15] Cifuentes Fontanals, L., Tonello, E., Siebert, H.: Control strategy identification via trap spaces in Boolean networks. In: Computational Methods in Systems Biology, pp. 159-175. Springer (2020) [16] Davidson, E., Levin, M.: Gene regulatory networks. Proceedings of the National Academy of Sciences 102(14) (2005) [17] Goligorsky, M.S.: New trends in regenerative medicine: reprogramming and reconditioning. Journal of the American Society of Nephrology 30(11), 2047-2051 (2019) [18] Gratie, D.E., Iancu, B., Petre, I.: O D E Analysis of Biological Systems, pp. 29-62. Springer Berlin Heidelberg (2013) [19] Grossman, R.L., Nerode, A., Ravn, A.P., Rischel, H.: Hybrid systems, vol. 736. Springer (1993) [20] Herberts, C.A., Kwa, M.S., Hermsen, H.P.: Risk factors in the development of stem cell therapy. Journal of Translational Medicine 9(1) (2011) [21] Jopling, C , Boue, S., Belmonte, J.C.I.: Dedifferentiation, transdifferentiation and reprogramming: three routes to regeneration. Nature Reviews Molecular Cell Biology 12(2), 79-89 (2011) [22] Kim, J., Park, S.M., Cho, K.H.: Discovery of a kernel for controlling biomolecular regulatory networks. Nature Scientific Reports 3(2223) (2013) 29 [23] Kitano, H.: Computational systems biology Nature 420(6912), 206-210 (2002) [24] Klarner, H.: Contributions to the Analysis of Qualitative Models of Regulatory Networks. Ph.D. thesis, Free University of Berlin (2015) [25] Krumsiek, J., Marr, C , Schroeder, T., Theis, F.J.: Hierarchical differentiation of myeloid progenitors is encoded in the transcription factor network. PLOS O N E 6(8), 1-10 (2011) [26] Liu, Y.Y., Slotine, J.J., Barabasi, A.L.: Controllability of complex networks. Nature 473(7346), 167-173 (2011) [27] Mandon, H.: Algorithms for Cell Reprogramming Strategies in Boolean Networks. Theses, Universite Paris-Saclay (2019) [28] Mandon, H., Su, C , Haar, S., Pang, J., Pauleve, L.: Sequential reprogramming of Boolean networks made practical. In: Computational Methods in Systems Biology, pp. 3-19. Springer (2019) [29] Miskov-Zivanov, N . , Turner, M.S., Kane, L.P., Morel, P.A., Faeder, J.R.: The duration of T cell stimulation is a critical determinant of cell fate and plasticity. Science Signaling 6(300) (2013) [30] Motter, A.E., Gulbahce, N . , Almaas, E., Barabasi, A.L.: Predicting synthetic rescues in metabolic networks. Molecular Systems Biology 4(1) (2008) [31] Pastva, S.: Discrete Bifurcation Analysis. Thesis proposal, Masarykova univerzita, Fakulta informatiky, Brno (2019) [32] Paul, S., Pang, J., Su, C : On the full control of Boolean networks. In: International Conference on Computational Methods in Systems Biology, pp. 313-317. Springer (2018) [33] Paul, S., Su, C , Pang, J., Mizera, A.: A decomposition-based approach towards the control of Boolean networks. In: Proceedings of the 2018 A C M International Conference on Bioinformatics, Computational Biology, and Health Informatics, pp. 11—20. Association for Computing Machinery (2018) [34] Saadatpour, A., Wang, R.S., Liao, A., Liu, X., Loughran, T.P., Albert, I., Albert, R.: Dynamical and structural analysis of a t cell survival network identifies novel candidate therapeutic targets for large granular lymphocyte leukemia. PLOS Computational Biology 7(11), 1-15 (2011) [35] Schwab, J.D., Kuhlwein, S.D., Ikonomi, N . , Kiihl, M . , Kestler, H.A.: Concepts in Boolean network modeling: What do they all mean? Computational and Structural Biotechnology Journal 18, 571-582 (2020) 30 [36] Singh, A.M., Dalton, S.: The cell cycle and myc intersect with mechanisms that regulate pluripotency and reprogramming. Cell Stem Cell 5(2), 141-149 (2009) [37] Smijáková, E., Pastva, S., Šafránek, D., Brim, L.: Parallel parameter synthesis for multi-affme hybrid systems from hybrid C T L specifications. In: Computational Methods in Systems Biology, pp. 280-297. Springer (2020) [38] Streck, A.: Toolkit for reverse engineering of molecular pathways via parameter identification. Ph.D. thesis, Free University of Berlin (2016) [39] Su, C , Pang, J.: C A B E A N : a software for the control of asynchronous Boolean networks. Bioinformatics (2020) [40] Su, C , Pang, J.: A dynamics-based approach for the target control of Boolean networks. In: Proceedings of the 11th A C M International Conference on Bioinformatics, Computational Biology and Health Informatics, pp. 1-8 (2020) [41] Su, C , Pang, J.: Sequential temporary and permanent control of Boolean networks. In: Computational Methods in Systems Biology, pp. 234-251. Springer (2020) [42] Su, C , Paul, S., Pang, J.: Controlling large Boolean networks with temporary and permanent perturbations. In: International Symposium on Formal Methods, pp. 707-724. Springer (2019) [43] Takahashi, K., Yamanaka, S.: Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126(4), 663-676 (2006) [44] Wang, R.S., Saadatpour, A., Albert, R.: Boolean modeling in systems biology: an overview of methodology and applications. Physical biology 9(5) (2012) [45] Wernig, M . , Meissner, A., Foreman, R., Brambrink, T., K u , M . , Hochedlinger, K., Bernstein, B.E., Jaenisch, R.: In vitro reprogramming of fibroblasts into a pluripotent ES-cell-like state. Nature 448(7151), 318- 324 (2007) [46] Young, R.: Control of the embryonic stem cell state. Cell 144(6), 940- 954 (2011) [47] Zaňudo, J.G.T., Albert, R.: Cell fate reprogrammi ng by control of intracellular network dynamics. PLOS Computati onal Biology 11(4), 1- 24 (2015) 31 [48] Zheng, D., Yang, G., L i , X . , Wang, Z., Liu, F., He, L.: A n efficient algorithm for computing attractors of synchronous and asynchronous Boolean networks. PLOS O N E 8 (2013) [49] Zhu, Y., Yang, R., McLenithan, J., Yu, D., Wang, H., Wang, Y., Singh, D., Olson, J., Sztalryd, O , Zhu, D., et al.: Direct conversion of human myoblasts into brown-like adipocytes by engineered super-active PPAR7. Obesity 23(5), 1014-1021 (2015) [50] Zou, Y . M . : Boolean networks with multiexpressions and parameters. I E E E / A C M Transactions on Computational Biology and Bioinformatics 10, 584-592 (2013) 32 Appendix A Attached publications 33 Check for updates Parallel Parameter Synthesis for Multi-affine Hybrid Systems from Hybrid C T L Specifications Eva Šmijáková, Samuel Pastva, David Šafránek/ ), and Luboš Brim Faculty of Informatics, Masaryk University, Brno, Czech Republic {xsmijakl,xpastva,šafránek,brim}@fi.muni.cz Abstract. We consider the parameter synthesis problem for multi-affine hybrid systems and properties specified using a hybrid extension of C T L ( H C T L ) . The goal is to determine the sets of parameter valuations for which the given hybrid system satisfies the desired H C T L property. As our main contribution, we propose a shared-memory parallel algorithm which efficiently computes such parameter valuation sets. We combine a rectangular discretisation of the continuous dynamics with the discrete transitions of the hybrid system to obtain a single over-approximating semi-symbolic transition system. Such system can be then analysed using a fixed-point parameter synthesis algorithm to obtain all satisfying parametrisations. We evaluate the scalability of the method and demonstrate its applicability in a biological case study. Keywords: Hybrid systems • Parameter synthesis • Rectangular abstraction • Semi-symbolic • Hybrid C T L 1 Introduction In real-world dynamical systems, one encounters a complex interplay of both continuous and discrete dynamics. This type of behaviour appears in cyberphysical systems, biochemical or biophysical systems (systems biology), economic and social interaction models, or in the infectious disease control (epidemic systems). In many cases, the continuous part reflects the natural phenomena and the discrete part arises due to some (not necessarily digital) embedded control mechanism. Such systems are formalised by means of hybrid systems (also hybrid automata (HA)). These typically consist of several modes, each describing the continuous evolution of the system using ordinary differential equations (ODE). The modes are then connected using conditional discrete jumps. Parameters often need to be introduced into the continuous ODE flow or the conditions L. Brim-This work has been supported by the Czech Science Foundation grant No. 18-00178S. © Springer Nature Switzerland A G 2020 A. Abate et al. (Eds.): CMSB 2020, LNBI 12314, pp. 280-297, 2020. https://doi.org/10.1007/978-3-030-60327-4_15 Parameter Synthesis for Hybrid Systems from Hybrid C T L Specifications 281 of the discrete jumps to represent an unknown or uncertain behaviour of the system. We focus on multi-affine hybrid systems, which have a multi-affine vector field in every mode. Such systems have a large set of applications including infectious disease models [37], altitude and velocity control systems [5], models of gene regulation [32], or models of other biological systems with mixed continuous and discrete variables [18,34]. In typical scenarios, hybrid systems have too many mutually dependent variables and parameters to be studied analytically. To examine a proposed hypothesis, one commonly relies on computational methods, as these exhibit better scalability and often do not require expert knowledge. To rigorously represent a hypothesis about some abstract observable sequence of events (or event branching) in the behaviour of a hybrid system, we use temporal logic. We employ an expressive hybrid extension of the computation tree logic (HCTL) [2]. HCTL extends CTL with first-order quantifiers as well as specialised operators (bind j. and at @) for reasoning about properties of states. Given an HCTL formula and a parametrised hybrid system, the goal of the parameter synthesis problem is then to determine the set of parametrisations for which the system satisfies the given HCTL specification. Paper Contributions. We introduce a novel approach to the parameter synthesis of multi-affine hybrid systems that addresses the state space explosion problem using parallelism and symbolic parameter representation. - As a specification language, we utilise the expressive HCTL logic. HCTL enables properties such as un-reachability, general oscillation and stability. - We consider a wide family of multi-affine hybrid automata (MHA) with parameters in the continuous flow as well as mode invariants and jumps. - For such an MHA, we construct a compound parametrised Kripke structure that over-approximates its behaviour. The continuous modes of the automaton are discretised using rectangular abstraction [4]. To efficiently represent this structure, we extend the semi-symbolic approach (i.e. an explicit state space with symbolic parameter sets) proposed in [8]. - We propose a parallel, shared-memory parameter synthesis algorithm based on [10]. Given a parametrised hybrid automaton H and an HCTL property (/?, we compute the set of parametrisations of H for which H \= cp. - We evaluate the scalability of the method and demonstrate its applicability in a case study based on a complex biological system. In general, in this paper we give a significant extension of our existing framework for piece-wise multi-affine continuous-time systems [8,10,11] by making it working with a more general class of systems—multi-affine hybrid automata. First, we adapt the rectangular abstraction to correctly capture MHA. Second, we provide novel algorithms working with this class of hybrid systems. 282 E . Smijakova et al. Related Work. Hybrid systems are rather ubiquitous in systems biology. A comprehensive overview appears in [14,39]. The problem of parameter identification for HA has also been targeted from several different perspectives: The closest work considering multi-affine hybrid automata is implemented in the tool Hydentify [12]. Hydentify considers parameters only in the continuous flow function (we allow parametrised jump guards and invariants as well). The most significant difference is the abstraction—Hydentify employs several abstractions that simultaneously over-approximate and under-approximate the MHA by linear hybrid automata (LHA). In our case, rectangular abstraction is employed to explicitly discretise the vector field. On the one hand, LHA abstraction has the advantage of preserving the timing information, and it also enables the use of efficient symbolic reachability analysis algorithms, such as in [26]. On the other hand, Hydentify has to iteratively decompose the parameter space by repeating the (non-parametrised) reachability task. In our case, a coarser (rectangular) abstraction is not limited to reachability (we use HCTL as the specification language) and the analysis of parameter space can be performed symbolically in a single iteration of the parameter synthesis algorithm, without the need for explicit decomposition. Rectangular abstraction [6] for parameter synthesis of non-linear (piece-wise multi-affine) dynamical systems has been originally used in [4] and implemented in the tool RoVerGeNe (for LTL specifications). The extension to MHA provided by Hydentify has significantly improved scalability and precision of RoVerGeNe, but it restricts specifications to reachability and safety questions. Our work allows more flexibility in the parametrisation of the MHA and extends the set of supported specification formalisms. Counterexample-guided abstraction refinement (CEGAR) [19] is also applicable to parameter synthesis of LHAs [25]. In this case, the counterexamples are paths to the bad states, and the model is refined by restricting the domains of parameters. The main advantage of CEGAR is efficiency. Compared to the standard reachability analysis, it is often faster and requires smaller state space. Breach [22,23] uses simulation-based techniques to analyse hybrid systems. It performs parameter synthesis of general, non-linear HA with respect to properties in signal temporal logic. A similar approach is used in Biocham [17] for LTL [38] and C T L [24]. Simulation-based methods use numerical solvers, and therefore their precision significantly relies on the quality of parameter space sampling (unlike abstraction based approaches, there are no global formal guarantees). Breach minimises this parameter space sampling error using sensitivity analysis. Meanwhile, U-Check [13] combines statistics and machine learning to address the problem of good sampling with statistical guarantees [3,15]. These approaches are limited to time-bounded behaviour, whereas our approach allows time-unbounded analysis. If some parametrisations with the desired behaviour are already known, it is possible to synthesise the whole set of parametrisations for which the system preserves the same behaviour using the inverse method [1]. Tool HYMITATOR implements this approach for LHAs [27]. Parameter Synthesis for Hybrid Systems from Hybrid C T L Specifications 283 Parameter synthesis problem can also be encoded into first-order logic formulae and solved using (^-complete decision procedures [28]. In particular, a formula is used to describe the states reachable within a finite number of steps. Parameter synthesis is then reduced to computing a satisfying valuation of the parameters for such formula [33]. While mostly limited to time-bounded analysis, this framework was successfully used to obtain parameter ranges indicating disorders in cardiac cells models [31,35]. 2 Parameter Synthesis of H y b r i d Automata In this section, we first define the studied class of hybrid automata with parameters, and then show how such automata can be transformed into discrete parametrised Kripke structures (PKS). The construction proceeds in two steps. First, each continuous mode of the hybrid automaton is translated into a partial PKS. Then, a compound PKS is created as a combination of the partial PKSs with jump and reset conditions of the hybrid automaton. We argue that this Kripke structure over-approximates the behaviour of the original automaton. We then define hybrid CTL and its semantics over Kripke structures and finally, parameter synthesis problem for hybrid automata and hybrid CTL. Preliminaries. We use M and N to denote the set of real and natural numbers respectively. V(A) denotes the power set (set of all subsets) of A. In general, we write AB to denote the set of all possible functions B —>• A. When B is a set of variables or parameters of the system, such function is often referred to as valuation (and AB is thus a set of all valuations). To describe the semantics of a discrete system with parameters, we use the notion of parametrised Kripke structure [16]: Definition 1. Let AP be a set of atomic propositions. A parametrised Kripke structure (PKS) over AP is a tuple K = (P, S, /, —L) where: - P is a finite set of parametrisations; - S is a finite set of states; - I Q S is the set of initial states; - L : S —>• V(AP) is a labelling of states with atomic propositions; - —>-C S x P x S is a parametrised transition relation. We write s —>• t instead of (s,p,t) G—K We assume that the PKS is total, i.e. TP for all s G S and p G P , there exists at least one t such that s —> t. By fixing a parametrisation p G P we obtain an ordinary (i.e. non-parametrised) Kripke structure (KS) [20] Kp = (S,I,^P,L). Here, S, I and L are the same as in K and —)>p= {(s,t) | (s,p,t) G—>•}. A path in a non-parametrised KS, Kp, is an infinite sequence of states a : N —> S, such that for all i G N, (u{i — l),u(i)) G — W e use Ui instead of u{i) to denote the i-th element on the path. The set of all paths in a KS starting at s G S is denoted Es = {a \ a is a path in the KS and CTQ = s}. 284 E . Smijakova et al. 2.1 Parametrised Hybrid Automata Hybrid systems combine the behaviour of discrete and continuous systems. The typical example of such a system is a physical system whose behaviour depends on a discrete controller. Hybrid systems can be modelled as hybrid automata [30]. In practice, some parts of a hybrid system might be unknown or customizable. To describe such systems, we consider parametrised hybrid automata which allow both parameters in the continuous flow functions as well as parameters influencing the switching of modes [23,27]. Definition 2. A parametrised hybrid automaton (PEA) 77 is an ordered tuple 77 = (77, Q, X, F, J, D, E, G, R) where: - II is a finite set of real-valued parameters; - Q is a finite set of discrete modes; - X is a finite set of real-valued variables; - F : Q x Wx x IR7 7 —> Wx is a parametrised vector field defining the local continuousflowvia a differential equation x = F ( g , i , 7 r ) ; - J C Q x I R X is a set of initial states of the PEA; - D : Q x IR7 7 —>• V(M. ) is a parametrised domain of a mode, sometimes also called an invariant of a mode; -ECQxQisa set of edges (transitions, jumps) between modes; - G : E x IR7 7 —y V(M. ) gives a parametrised guard condition for each jump; - R: E ->• V(RX x Rx ) specifies for each jump a reset relation which describes possible variable valuations before and after a jump. In this paper, we further restrict PEA to represent a so called piece-wise multi-affine automaton. Specifically, we assume F to be piece-wise multi-affine in both variables and parameters. Furthermore, the invariants, jump conditions and reset conditions of the automaton have to be described using a Boolean combination of inequalities which are affine in both variables and parameters as well. Finally, we assume the automaton to be bounded. That is, the domain of every real-valued parameter p £ 77 is an interval [pmimPmax] and for every variable x £ X, we also have an interval [xmin,xmax\ such that it covers all values satisfying the related invariant conditions. The requirement of multi-affinity and boundedness is necessary in order to enable efficient abstraction techniques for the continuous flow of the automaton and efficient manipulation of invariants and jumps in general. If one can supply such operations for a broader class of PHAs, the restrictions can be lifted. We will further discuss such possible extensions of our method where appropriate. A state of a P H A is a pair (q, x) where q £ Q is the current discrete mode and x £ 1RX is the current valuation of X. The state is valid in a P H A for a parameter valuation TT £ 1R77 if it fulfils the invariant condition, i.e. x £ D(q,ir). There are two types of flows in a PHA. The first type is given by the trajectories of the continuous vector field. This flow is called local and is relevant as long Parameter Synthesis for Hybrid Systems from Hybrid C T L Specifications 285 as the evolved state fulfils the invariant. The second type of flow is the jump flow which corresponds to transitions between the individual discrete modes. Jump j £ E is allowed between (q,x) and (q',x') for a parameter valuation TT only if: - both (q,x) and (q',x') are valid (fulfil the corresponding mode invariants); - the guard condition of j is satisfied, i.e., x £ G(j, 7r); - the reset relation of j is such that x resets to x', i.e., (x,x') £ -R(j)Note that in many practical cases, the reset relation is simply an identity, in which case one naturally considers only jumps where x = x' and the third condition thus becomes unnecessary. It is important to emphasise that the parameters can be present at three different places in a PHA: flow function parameters, predicates defining an invariant of a mode, and predicates defining a guard condition of a jump. By fixing a parameter valuation TT G R 7 7 of a PHA H, we obtain an automaton Hw which has the same semantics as a standard non-parametrised hybrid automaton specified in [30]. 2.2 Rectangular Abstraction of Parametrised Hybrid Automata In order to interpret hybrid CTL formulae over a PHA, we first need to transform the PHA into a PKS. In our case, the PKS over-approximates the behaviour of the original multi-affine PHA. The construction proceeds in two steps. First, a partial PKS of the continuous (local) flow is constructed for each discrete mode of the PHA. Second, the partial PKSs are merged into a single compound PKS that also incorporates the jumps between individual modes. Local Mode Abstraction. In order to construct a PKS describing the continuous behaviour of a single discrete mode, we employ rectangular abstraction [4] for piece-wise multi-affine continuous models. This abstraction was chosen since it can work with parametrised systems and in the past has been successfully used for parameter synthesis in ODE models [8]. The choice of abstraction method greatly influences the class of automata the approach can handle. If a more general abstraction method is available, the restriction to piece-wise multi-affine systems can be lifted. Here, we first give a high-level overview of the abstraction method and then a brief technical summary. For full technical explanation, the reader is referred to [4]. The abstraction divides the continuous state space into a set of n-dimensional rectangles. Transitions are only introduced between adjacent rectangles, i.e. rectangles that share an (n— l)-dimensional facet. A transition is introduced between two rectangles, r\ and r^, if there exists a continuous trajectory which flows from r\ to r2 through their connecting facet (or more precisely, when the absence of such trajectory cannot be decided). A self-loop on a rectangle is introduced whenever it cannot be decided that eventually every trajectory escapes the rectangle. As argued in [4], this over-approximates the behaviour of the original continuous flow. In our case, we further extend the method by removing the discrete states 286 E . Smijakova et al. (rectangles) which do not contain any points satisfying the invariant conditions of the PHA. This does not influence the over-approximation, since these states would not appear in the original PHA neither. When dealing with parameters, instead of a yes-no answer, the abstraction procedure computes for each transition a description of the parameter valuation set for which it is enabled (i.e. the above mentioned conditions are met), thus yielding a parametrised Kripke structure. In our case, these parameter valuation sets can be described using combinations of affine inequalities (due to the chosen abstraction method). In case of a more general class of hybrid automata, a more general representation such as semi-algebraic sets or SMT formulae are necessary. The procedure assumes a PHA H as defined above and a fixed discrete mode q G Q. As a result, we obtain a PKS Kq = (Pq , Sq , Iq , Lq ) which describes the local behaviour of H in mode q. Furthermore, for each continuous variable v G X we assume a sequence of thresholds {0?,... C E ordered such that Q\ < 6% < ••• < 6v nv and #i = vmin and 9% = vmax. These thresholds partition the continuous state space of the automaton into n-dimensional intervals , x • • • x , (here, v\ through vn are the variables of the PHA). These intervals are referred to as rectangles. Each rectangle is uniquely identified via an n-tuple of indices: • ( j i , . . . , j n ) = x ••• x where the range of each jt is {1,..., nVi — 1}. In each rectangle, the flow F of the PHA must be multi-affine. Notice that for such rectangle, we can easily perform basic set operations (G, C, PI, ...) since it is essentially a set of real valued tuples (each specifying a single valuation of variables of H). Using this rectangular partitioning, we can construct Kq as follows: - The parameter valuations of Kq are given by all possible valuations of parameters of H, i.e., Pq :=Rn . - The state space of the PKS is created by taking all n-dimensional rectangles that contain at least one valid state of H (that is, 3p G Pq : D(q,p) n • ( j i , . . . ,jn) 7^ 0) and extending them with q to indicate the mode which the rectangle belongs to: Sq •= {(q,r = • ( j i , . . . , j n ) ) | 3p £ Pq ,3x e r : x e D(q,p)} Note that the validity check for each rectangle can be performed due to the imposed restrictions on the guards of the PHA. - Initial states are given by the rectangles that contain at least one initial state of the original PHA, i.e., Iq := {(q,r) G Sq | 3x G r : (q, x) G J}. - The labelling function assigns a proposition a G AP to a rectangle r if at least one point in r satisfies the proposition in H. Formally, Lq ((q,r)) := {a G AP | (q, r) G Sq A 3x G r : x |= a}. Here, a is usually some inequality (or a Boolean combination of inequalities) defined over variables, e.g., v\ > 3. - Finally, the procedure for constructing the transition relation —^g is described in [4]—since it only requires the knowledge of Sq and the differential equations F assumed in mode g, it remains largely the same as for general ODE models. Parameter Synthesis for Hybrid Systems from Hybrid C T L Specifications 287 The main difference is that here, some rectangles may not be included in the state space since they do not contain any valid states of H. Such rectangles are excluded from the abstraction. Compound Abstraction. Assuming the PKS Kq for each mode q £ Q is available, we can construct the compound PKS (CPKS) that extends the model with jump transitions between individual modes. CPKS C = (P, S, L) is constructed as follows: - P := M.n —all Kq share the same set of parameter valuations; - S := UgeQ ^q —individual states already contain the mode label and thus the sets of states of local PKSs are disjoint; - Proposition labelling function follows individual mode labellings as well. Additionally, we introduce artificial proposition A(q) that allows us to explicitly reference all states of a mode q: L((q,r)) :=L*((q,r))U{A(q)} - The transition relation —> consists of two parts, —^ (local) and —>j (jump), defined as follows: -+3-= U {((qi,s),p,(q2,t))eSxPxS\ e=(qi,q2)eE 3x £ s,x' £ t : (x,x') £ R(e) A x G G(e,p)} As we can see, a jump transition is created for every pair of rectangles where some combination of continuous values satisfies the guard and reset conditions of some original jump in H. Due to the restrictions imposed on jump and reset conditions, we can explicitly compute the set of parametrisations for which this relation holds. The CPKS C created using this procedure over-approximates the behaviour of the original multi-afiine PHA. The over-approximation of the continuous flows follows from the correctness of the rectangular abstraction. The discrete jumps of the automaton are then conservatively re-created in the Kripke structure as described above. That is, whenever a jump is possible from a rectangle (jump guard is satisfied in some subset of the rectangle), a transition in the CPKS is created. Figure 1 illustrates an example of a rectangular abstraction of a hybrid system. On the left, an original hybrid automaton with three modes is depicted. The black areas do not fulfil the modes invariant conditions. The grey areas represent guard and reset relations of individual jumps. The dashed arrows represent the 288 E . Smijakova et al. mapping of a jump. Here, first jump is mapped to a singular point whereas the second jump does not override the coordinates of the variables (V = x). The solid arrows depict the vector fields. On the right side, the resulting rectangular abstraction of the hybrid automaton is shown. The semantics of black and grey rectangles are the same as on the left side. The small arrows represent local (flow) transitions and dashed arrows represent jump transitions. 2.3 Hybrid C T L We use the hybrid extension of the computational tree logic (HCTL) [2] to reason about properties of interest at the level of CPKS. HCTL allows to express timeunbounded properties at a very general level. For example, reachability of a single stable state (or a cycle) can be expressed without addressing a concrete state (or states on a concrete cycle). This is achieved by state variable quantification and state binding operators. Branching operators of CTL are used to fully reflect the non-determinism present in the CPKS. The formulae of HCTL are defined using the following abstract syntax: if ::= true \ q \ ->

S. We use h[x H> s] to denote a valuation which maps the variable x to Parameter Synthesis for Hybrid Systems from Hybrid C T L Specifications 289 state s and is otherwise defined as the valuation h. Formally, h[x H> s](x) = s,h[x i—y s](y) = h(y) for all Second, the definition of the satisfaction of an HCTL formula in a state of the KS is stated in the following way. Definition 3. Let K be a KS and h : V —>• S be a valuation of state variables. The satisfaction relation for states and paths of K w.r.t. HCTL formulae is defined as follows: (K, h, s) (K, h, s) (K, h, s) (K, h, s) (K, h, s) true P -,(p p G L(s) (K, h,s)\£

s'], s) \= ip (K,h,h(x)) |= (p f2 Usually, we are interested in formulae without free variables. In such case we write (M, s) \= ip instead of (M, h, s) \= ip as then the choice of h is not relevant. We also use standard syntactic extensions of HCTL such as universal quantification Wx.ip meaning ->3x.-np and CTL operators E F (existential finally), A G (for all globally), and A X (for all successors). The following three operators make the core of the hybrid extension: 3x (exists), 4- x (bind), and @x (at). The 3x operator has the same meaning of existential quantification as in the first-order logic. The 4- x operator provides a more specialised alternative to exists as it allows to assign the current state to the variable x. Finally, the @x operator points the subformula to the state that has been stored in the variable x. Some examples of properties of CPKSs expressible in HCTL: - Reachability of a mode q: E F A(q) - Unreachability of a mode q: -iEF A(q) - Stable steady state (sink): | s.AX s - Cycle: | s.EX E F s 290 E . Smijakova et al. As it has been discussed in Sect. 2.2, paths of CPKS over-approximate the (uncountable) set of all hybrid trajectories of the abstracted multi-affine PHA. As a consequence, at the level of PHA we can interpret only those properties the validity of which cannot be violated by the abstraction. For example, if a universally quantified formula A F tp is satisfied at state s in the CPKS then it essentially refers to all paths starting in s. These paths over-approximate all PHA trajectories starting at any point of the rectangle represented by s (there is also a universal quantification at the level of points covered by the rectangle). In fact, these paths may include some spurious path—a path that do not correspond to the behaviour of the PHA. In this (completely universally quantified) situation there is no problem—the corresponding property can be considered valid also at the level of the abstracted PHA. However, if a formula E F ip is checked true at some state s of the CPKS then it might be (existentially) true just for some spurious path and therefore it cannot be guaranteed to be satisfied in the PHA. As a consequence, only properties containing universal quantification (or negation of existential quantification) can be correctly interpreted at the level of PHA (this is the case of all the example properties mentioned above). 2.4 Parameter Synthesis Problem The goal of parameter synthesis is to find parameter valuations for which the required properties hold. Given a PHA H, we first transform it to a CPKS C. Then the constraints on C using HCTL are specified, obtaining a HCTL formula (p. Afterwards, we solve a parameter synthesis problem for C and

• V(P) such that F^{s) = {p e P \ (Cp, s) \= V(P) for every property. 292 E . Smijakova et al. 4 Experimental Evaluation Our experimental evaluation consists of two parts. We demonstrate the applicability of our approach to real-world biological systems by analysing a known diauxic shift model [36]. We then also explore scalability of our parallel approach with increasing number of available processors. 4.1 Diauxic Shift Model We consider a hybrid model from [36], representing a diauxic shift in an organism with a carbon catabolite repression. Such an organism prefers a specific carbon source (Ci) and grows rapidly while the source is available. With C\ depleted, the organism switches to another source (C2) and regulates its growth aggressively. The model has two regulated proteins, RP and T^. The presence of RP is regulated based on the carbon source C\\ if the amount of C\ falls below the threshold 7, RP is suppressed. Similarly, T2 is regulated by RP; if RP falls below the threshold a, T2 is suppressed. The model thus consists of four hybrid modes—the combinations of RP and T2 being suppressed or not. We use [on, on], [on, off], [off, on], and [off, off] to denote these hybrid modes. The values indicate the status of RP and T2 respectively (on is active, o f f is suppressed). In its original form, the model is too simplistic for our experiments. The carbon sources in the model are never replenished. Eventually, all carbon is exhausted and all activity dies out. This is sufficient for short term simulations but not for long-term analysis using temporal properties. We thus extend the model with artificial sources of C\ and RP (inc\ and IURP) considered as parameters. Complete model is shown in Fig. 3. We pre-process the model as described in [29], approximating the non-linear dynamics with piece-wise multi-affine func- tions. 4.2 Parameter Synthesis Based on a few simple simulations, we observe that, depending on the values of inc1 and in^p, the model can stabilise in different hybrid modes as well as oscillate between modes indefinitely. Our goal will be to identify the parametrisations where these different types of long-term behaviour occur. We use the formula A G A(mode) to identify parametrisations where the system stabilises in a specific mode (here, A(mode) is a proposition satisfied in all states of the specific mode). Furthermore, we use 4- x : A X x to synthesise parametrisations where the system stabilises in exactly one state. To detect oscillations between two modes, we use the formula A G (AF {A{m\) A A {A{m\) U (.4.(7712) A A {A{m2) U A{m\))))) where m\ and 777-2 are different modes. The formula can be extended by adding additional modes (and until operators) to describe more complex oscillations. Parameter Synthesis for Hybrid Systems from Hybrid C T L Specifications 293 Ci = -kca^Ti — — — + inCl Ci = —kcat2T2Kc2 + C2 M RP = k R P R — —— - kdRPRP + i n R P K R P + JV1 M R = k R R r s I U ~ M R R KR+ M M = kcatlTl - C \ - + kcat2T2KCl + Ci 2 Kc2 + C:2 = 0.3 = 0.2 kR = 0.03 kRP = 0.05 kTl = kT2 = 0.1 KCl = Kc2 = 1.0 KTl = KT2 = 1.0 K R = KRP = 1.0 = kdT2 = 0.1 kdp = 0.001 kdRP = 0.1 a = 7 = 1.0 M M k R p R K a 7 T ^ " k T ^ K ^ M m c ^ G [ °' 2 ] M k T 2 R - T 7 — — kpRM i n R P e [o, 0.4] K T 2 + M KR + M Fig. 3. Differential equations of the extended diauxic shift model. The bold expressions are only present in the corresponding hybrid modes. That is, kppR K r ^ + m is removed in modes where RP is suppressed. The same happens for kr2 RK t M + m when T2 is suppressed. The abstraction procedure considers C\ G [0,50], C2 £ [0,35], M G [0,20], RP e [0,3.5], T i G [0,8], T 2 G [0,2.5], i? G [0,40] (bounds obtained by simulations and sampling in the parameter space) and roughly 8 thresholds per variable, producing ^2 million valid discrete states. The results of the analysis are shown in Fig. 4. Notice that the system cannot stabilise in the mode [on, on] and there is no possibility of oscillation between three different modes. Also observe that not all behaviour stabilising in a specific mode reaches a sink state. These only appear for the [off, off] mode. 4.3 Scalability To evaluate the scalability of our approach, we again use the diauxic shift model together with an HCTL property specifying that the model oscillates between all four discrete modes (green areas in Fig. 4). We chose four variants of the model with varying number of continuous parameters and size of the discrete state space (depending on the number of thresholds selected during abstraction). We conducted all measurements using a machine equipped with AMD Ryzen Threadripper 2990WX 32-Core Processor and 64 GB of memory. The runtime of each experiment is shown in Table 1. 294 E . Smijakova et al. 1off,off | § [ o f f , on] _ [on, o f f ] 1 1.2 1.4 inCl 1.6 1.8 Fig. 4. Parameter synthesis results for the extended diauxic shift model. The white areas depict stable regions ( A G mode) with striped areas indicating the presence of sinks (4- x : A X x). Coloured areas depict different types of oscillation between modes indicated in that area. (Color figure online) Table 1. Scalability results for several variants of the diaux shift model. Parameter count State count 1 cpu 2 cpu 4 cpu 8 cpu 16 cpu 32 cpu 1 «1.9e6 123s 95s 70s 55s 43s 38s ?«4.8e4 5.1s 3.9s 2.7s 1.8s 1.5s 1.3s 2 ?«1.9e6 145s I l l s 83s 66s 50s 43s ?«4.8e4 5.9s 4.3s 3.2s 2.2s 1.6s 1.4s 5 Conclusion In this work, we have presented a novel approach towards parameter synthesis of a significant class of hybrid systems with an expressive logic HCTL. Our approach uses rectangular abstraction to transfer the problem to the domain of state transition systems with a parametrised transition relation. For such an abstract system, we show how to compute the set of parametrisations for which the given HCTL specification holds. We evaluate our approach on a non-trivial real-world biological model. The first advantage of our approach is the possibility of addressing timeunbounded properties of the systems dynamics. The second advantage is the variety of parameters (flow dynamics, jump guards, invariants) our approach can accommodate. Finally, our algorithm relies on the ideas of parallel parameter synthesis for CTL, thus enabling full utilisation of high-performance hardware. Parameter Synthesis for Hybrid Systems from Hybrid C T L Specifications 295 The disadvantage of the method appears to be the employed abstraction. While very fast and useful in many cases, its exact error (due to the overapproximation) is not precisely quantifiable. Additionally, this approach still suffers from state space explosion with respect to the number of system variables. As future work, we aim to refine this proof of concept implementation into a stable tool. M oreover, the tool can benefit from future improvements in Pithya, such as performance and scalability optimisations as well as advanced data structures for manipulating symbolic parametrisation sets, thus allowing even wider range of hybrid automata. References 1. André, É.: I M I T A T O R : a tool for synthesizing constraints on timing bounds of timed automata. In: Leucker, M . , Morgan, C. (eds.) I C T A C 2009. L N C S , vol. 5684, pp. 336­342. Springer, Heidelberg (2009). https://doi.org/10.1007/978­3­ 642­03466­4_22 2. Arellano, G., et al.: "Antelope": a hybrid­logic model checker for branching­time Boolean G R N analysis. B M C Bioinform. 12(1), 490 (2011) 3. Bartocci, E., Bortolussi, L . , Nenzi, L . , Sanguinetti, G.: On the robustness of temporal properties for stochastic models. In: Dang, T., Piazza, C. (eds.) Proceedings Second International Workshop on Hybrid Systems and Biology, H S B 2013, Taormina, Italy, 2nd September 2013. E P T C S , vol. 125, pp. 3­19 (2013) 4. Batt, G., Belta, C , Weiss, R.: M odel checking genetic regulatory networks with parameter uncertainty. In: Bemporad, A . , Bicchi, A . , Buttazzo, G. (eds.) H S C C 2007. L N C S , vol. 4416, pp. 61­75. Springer, Heidelberg (2007). https://doi.org/10. 1007/978­3­540­71493­4_8 5. Belta, C : On controlling aircraft and underwater vehicles. In: Proceedings of the I E E E International Conference on Robotics and Automation, vol. 5, pp. 4905­4910 (2004) 6. Belta, C , Habets, L.: Controlling a class of nonlinear systems on rectangles. I E E E Trans. Autom. Control 51(11), 1749­1759 (2006) 7. Beneš, N . , Brim, L . , Demko, M . , Pastva, S., Šafránek, D.: A model checking approach to discrete bifurcation analysis. In: Fitzgerald, J., Heitmeyer, C , Gnesi, S., Philippou, A . (eds.) F M2016. L N C S , vol. 9995, pp. 85­101. Springer, Cham (2016). https://doi.org/10.1007/978­3­319­48989­6_6 8. Beneš, N . , Brim, L . , Demko, M . , Pastva, S., Šafránek, D.: Parallel SM T­based parameter synthesis with application to piecewise multi­amne systems. In: Artho, C , Legay, A., Peled, D. (eds.) A T V A 2016. L N C S , vol. 9938, pp. 192­208. Springer, Cham (2016). https://doi.org/10.1007/978­3­319­46520­3_13 9. Beneš, N . , Brim, L . , Demko, M . , Pastva, S., Šafránek, D.: Pithya: a parallel tool for parameter synthesis of piecewise multi­amne dynamical systems. In: M ajumdar, R., Kunčak, V . (eds.) C A V 2017. L N C S , vol. 10426, pp. 591­598. Springer, Cham (2017). https://doi.org/10.1007/978­3­319­63387­9_29 10. Beneš, N . , Brim, L . , Pastva, S., Šafránek, D.: Parallel parameter synthesis algorithm for hybrid C T L . Sci. Comput. Program. 185, 102321 (2019) 11. Beneš, N . , Brim, L . , Demko, M . , Pastva, S., Šafránek, D.: Pithya: a parallel tool for parameter synthesis of piecewise multi­amne dynamical systems. Int. J. Struct. Stab. Dyn. (2017) 296 E . Šmijáková et al. 12. Bogomolov, S., Schilling, C , Bartocci, E., Batt, G., Kong, H . , Grosu, R.: Abstraction­based parameter synthesis for multiaffine systems. Hardw. Softw.: Verif. Test. 11, 19­35 (2015) 13. Bortolussi, L . , Milios, D., Sanguinetti, G.: U­Check: model checking and parameter synthesis under uncertainty. In: Campos, J., Haverkort, B . R . (eds.) Q E S T 2015. L N C S , vol. 9259, pp. 89­104. Springer, Cham (2015). https://doi.org/10.1007/978­ 3­319­22264­6.6 14. Bortolussi, L . , Policriti, A.: Hybrid systems and biology. In: Bernardo, M . , Degano, P., Zavattaro, G. (eds.) S F M 2008. L N C S , vol. 5016, pp. 424­448. Springer, Heidelberg (2008). https://doi.org/10.1007/978­3­540­68894­5_12 15. Bortolussi, L . , Sanguinetti, G.: Smoothed model checking for uncertain continuous time M arkov chains. C o R R abs/1402.1450 (2014) 16. Brim, L . , Češka, M . , Demko, M . , Pastva, S., Šafránek, D.: Parameter synthesis by parallel coloured C T L model checking. In: Roux, O., Bourdon, J. (eds.) C M S B 2015. L N C S , vol. 9308, pp. 251­263. Springer, Cham (2015). https://doi.org/10. 1007/978­3­319­23401­4_21 17. Calzone, L . , Fages, F., Soliman, S.: B I O C H A M : an environment for modeling biological systems and formalizing experimental knowledge. Bioinformatics 22(14), 1805­1807 (2006) 18. Chiang, H.K., Fages, F., Jiang, J.R., Soliman, S.: Hybrid simulations of heterogeneous biochemical models in S B M L . A C M Trans. M odel. Comput. Simul. 25(2), 14:1­14:22 (2015) 19. Clarke, E., Grumberg, O., Jha, S., L u , Y . , Veith, H . : Counterexample­guided abstraction refinement. In: Emerson, E . A . , Sistla, A . P . (eds.) C A V 2000. L N C S , vol. 1855, pp. 154­169. Springer, Heidelberg (2000). https://doi.org/10.1007/ 10722167.15 20. Clarke Jr., E . M . , Grumberg, O., Peled, D.A.: M odel Checking. M I T Press, Cambridge (1999) 21. Demko, M . , Beneš, N . , Brim, L . , Pastva, S., Šafránek, D.: High­performance symbolic parameter synthesis of biological models: a case study. In: Bartocci, E., Lio, P., Paoletti, N . (eds.) C M S B 2016. L N C S , vol. 9859, pp. 82­97. Springer, Cham (2016). https://doi.org/10.1007/978­3­319­45177­0_6 22. Donzé, A . : Breach, a toolbox for verification and parameter synthesis of hybrid systems. In: Touili, T., Cook, B . , Jackson, P. (eds.) C A V 2010. L N C S , vol. 6174, pp. 167­170. Springer, Heidelberg (2010). https://doi.org/10.1007/978­3­642­14295­ 6_17 23. Donzé, A . , Krogh, B . , Rajhans, A . : Parameter synthesis for hybrid systems with an application to Simulink models. In: M ajumdar, R., Tabuada, P. (eds.) H S C C 2009. L N C S , vol. 5469, pp. 165­179. Springer, Heidelberg (2009). https://doi.org/ 10.1007/978­3­642­00602­9_12 24. Fages, F., Rizk, A . : From model­checking to temporal logic constraint solving. In: Gent, L P . (ed.) C P 2009. L N C S , vol. 5732, pp. 319­334. Springer, Heidelberg (2009). https://doi.org/10.1007/978­3­642­04244­7_26 25. Frehse, G., Jha, S.K., Krogh, B.H.: A counterexample­guided approach to parameter synthesis for linear hybrid automata. In: Egerstedt, M . , M ishra, B . (eds.) H S C C 2008. L N C S , vol. 4981, pp. 187­200. Springer, Heidelberg (2008). https://doi.org/ 10.1007/978­3­540­78929­1.14 26. Frehse, G., et al.: SpaceEx: scalable verification of hybrid systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) C A V 2011. L N C S , vol. 6806, pp. 379­395. Springer, Heidelberg (2011). https://doi.org/10.1007/978­3­642­22110­l_30 Parameter Synthesis for Hybrid Systems from Hybrid C T L Specifications 297 27. Fribourg, L . , Kiihne, U . : Parametric verification and test coverage for hybrid automata using the inverse method. In: Delzanno, G., Potapov, I. (eds.) R P 2011. L N C S , vol. 6945, pp. 191-204. Springer, Heidelberg (2011). https://doi.org/10. 1007/978-3-642-24288-5.17 28. Gao, S., Kong, S., Clarke, E . M . : dReal: an S M T solver for nonlinear theories over the reals. In: Automated Deduction - CADE-24, pp. 208-214 (2013) 29. Grosu, R., et al.: From cardiac cells to genetic regulatory networks. In: Gopalakrishnan, G., Qadeer, S. (eds.) C A V 2011. L N C S , vol. 6806, pp. 396-411. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-l_31 30. Henzinger, T . A . : The theory of hybrid automata. In: Proceedings 11th Annual I E E E Symposium on Logic in Computer Science, pp. 278-292, July 1996 31. Islam, M . A . , et al.: Bifurcation analysis of cardiac alternans using ^-decidability. In: Bartocci, E., Lio, P., Paoletti, N . (eds.) C M S B 2016. L N C S , vol. 9859, pp. 132-146. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45177-0_9 32. de Jong, H . : Modeling and simulation of genetic regulatory systems: a literature review. J. Comput. Biol. 9(1), 67-103 (2002) 33. Kong, S., Gao, S., Chen, W., Clarke, E.: dReach: ^-reachability analysis for hybrid systems. In: Baier, C , Tinelli, C. (eds.) T A C A S 2015. L N C S , vol. 9035, pp. 200- 205. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46681-0_15 34. Lincoln, P., Tiwari, A . : Symbolic systems biology: hybrid modeling and analysis of biological networks. In: Alur, R., Pappas, G . J . (eds.) H S C C 2004. L N C S , vol. 2993, pp. 660-672. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3- 540-24743-2.44 35. Liu, B . , Kong, S., Gao, S., Zuliani, P., Clarke, E . M . : Parameter synthesis for cardiac cell hybrid models using <5-decisions. In: Mendes, P., Dada, J.O., Smallbone, K . (eds.) C M S B 2014. L N C S , vol. 8859, pp. 99-113. Springer, Cham (2014). https:// doi.org/10.1007/978-3-319-12982-2_8 36. Liu, L . , Bockmayr, A . : Formalizing metabolic-regulatory networks by hybrid automata. bioRxiv (2019) 37. Liu, X . , Stechlinski, P.: Infectious Disease Modeling. Springer, Cham (2020). https://doi.org/10.1007/978-3-319-53208-0 38. Rizk, A . , Batt, G., Fages, F., Soliman, S.: Continuous valuations of temporal logic specifications with applications to parameter optimization and robustness measures. Theor. Comput. Sci. 412(26), 2827-2839 (2011). Foundations of Formal Reconstruction of Biochemical Networks 39. Stephanou, A . , Volpert, V . : Hybrid modelling in biology: a classification review. Math. Model. Nat. Phenom. 11(1), 37-48 (2016)