Is There a Best Büchi Automaton for Explicit Model Checking?

František Blahoudek, Masaryk University, Brno, Czech Republic, xblahoud@fi.muni.cz
Alexandre Duret-Lutz, LRDE, EPITA, Le Kremlin-Bicêtre, France, adl@lrde.epita.fr
Jan Strejček, Masaryk University, Brno, Czech Republic, strejcek@fi.muni.cz
Mojmír Křetínský, Masaryk University, Brno, Czech Republic, kretinsky@fi.muni.cz

ABSTRACT

LTL-to-Büchi-automata (BA) translators are traditionally optimized to produce automata with a small number of states or a small number of non-deterministic states. In this paper, we search for properties of Büchi automata that really influence the performance of explicit model checkers. We do that by manual analysis of several automata and by experiments with common LTL-to-BA translators and realistic verification tasks. As a result of these experiments, we gain a better insight into the characteristics of automata that work well with Spin.

Categories and Subject Descriptors

F.4.1 [Mathematical Logic and Formal Languages]: Mathematical Logic–temporal logic; D.2.4 [Software Engineering]: Software/Program Verification–formal methods, model checking

General Terms

Theory, Algorithms, Verification

Keywords

Linear temporal logic, Büchi automata, explicit model checking

SPIN '14, July 21-23, 2014, San Jose, CA, USA. This is the authors' version of the work. It is posted here for your personal use. The definitive version was published by ACM, http://dx.doi.org/10.1145/2632362.2632377.

1. INTRODUCTION

The automata-theoretic approach to explicit model checking of Linear-time Temporal Logic (LTL) [25] can be broken down into four steps: (1) build the state space, i.e., an automaton S representing all the possible executions of the system to be verified, (2) translate an LTL formula φ representing a desired property of the system into a Büchi Automaton (BA) A¬φ that accepts all words violating φ, (3) build the synchronous product S ⊗ A¬φ of these two systems, and finally (4) check this product for emptiness. If S ⊗ A¬φ

1. GF(P0@CS) → GF(P0@NCS), meaning that if a process P0 spends infinitely many steps in a critical section, then it also spends infinitely many steps in a non-critical section,

2. GF(P0@NCS) → GF(P0@CS), meaning that if a process P0 spends infinitely many steps in a non-critical section, then it also spends infinitely many steps in a critical section,

3. FG¬((P0@CS ∧ P1@CS) ∨ (P0@CS ∧ P2@CS) ∨ (P1@CS ∧ P2@CS)), meaning that after finitely many steps, it never happens that two of the processes P0, P1, and P2 are in a critical section at the same time.

To sum up, we consider 769 + 3·23 = 838 verification tasks. All the benchmarks and measurements presented in this section are available at http://fi.muni.cz/~xstrejc/publications/spin2014.tar.gz.

Software. We use five LTL-to-BA translators presented in Table 1: Spin and LTL2BA are well established and popular translators, MoDeLLa was the first translator focusing on determinism of the produced automata, and LTL3BA and Spot represent contemporary translators. The last two translators are used in several settings: the settings denoted by LTL3BA (det) and Spot (det) aim to produce more deterministic automata, while the setting called Spot (no jump) is explained in Section 4. The same version of Spin (with its default settings and the maximal search depth set to 100 000 000) is used in all our experiments to perform all model checking steps except the LTL-to-BA translation. In particular, the partial-order reduction, which can substantially reduce the explored part of the state space, is enabled.

Hardware. All computations are performed on an HP DL980 G7 server with 8 eight-core 64-bit Intel Xeon X7560 2.26 GHz processors and 448 GiB DDR3 RAM. Each execution of Spin has been restricted by a 30-minute timeout and a memory limit of 20 GiB.

Table 1: Considered LTL-to-BA translators, for reference.

  tool             version  command
  Spin [10, 16]    6.2.5    spin -f
  LTL2BA [12]      1.1      ltl2ba -f
  MoDeLLa [21]     1.5.9    mod2spin -f
  LTL3BA [1]       1.0.2    ltl3ba -S -f
  LTL3BA (det)              ltl3ba -S -M -f
  Spot [7]         1.2.4    ltl2tgba -s
  Spot (det)                ltl2tgba -s -D
  Spot (no jump)            ltl2tgba -s -x degen-lskip=0

Originally, we measured the impact of Büchi automata on Spin by its running time. Unfortunately, our computation server is shared with other users and its variable workload has led to an enormous dispersion of the measured running times: we have observed a running time difference of over 300% on the same input. Hence, instead of running times, we focus on the count of visited transitions, which is a stable statistic produced directly by Spin. The number of visited transitions accumulates the numbers of product transitions explored by the depth-first searches executed during a run of the NDFS algorithm (see Section 4 for a brief description of NDFS). Hence, the number of visited transitions should be proportional to the running time on a dedicated machine.

For each of the 838 considered verification tasks, we translate the negation of the formula with all the mentioned translators and we run Spin on the model with each of the obtained automata. Translation of the negated formula to an automaton is nearly always instantaneous (it takes less than 0.1 s): there is only one formula for which the translator built into Spin needs a couple of seconds to finish. For 823 tasks, Spin successfully finishes the computation within the given limits for at least two automata obtained by different translation tools. For each such verification task, we find the maximal and the minimal numbers of visited transitions and we compute their ratio.
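The ratio computation itself is straightforward; the following Python sketch shows it, assuming (hypothetically) that the per-task, per-translator transition counts have already been collected into a dictionary, with None recorded for runs that hit the time or memory limit.

```python
# Hypothetical input: visited[task][tool] = number of transitions Spin
# explored for that task with that tool's automaton, or None if the run
# did not finish within the limits.
def max_min_ratios(visited):
    ratios = {}
    for task, per_tool in visited.items():
        counts = [n for n in per_tool.values() if n is not None]
        if len(counts) < 2:        # keep only tasks with at least two finished runs
            continue
        ratios[task] = max(counts) / min(counts)
    return ratios
```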
Intuitively, the ratio represents how many times slower Spin can be if we choose the worst of the produced automata instead of the best one. Out of the 823 tasks, the ratio is exactly 1 in only 35 cases. In other words, in more than 95% of the considered verification tasks, the choice of an LTL-to-BA translator has an influence on the running time of Spin.

In fact, the ratios differ significantly between verification tasks where the model satisfies the given formula and those with a counterexample. Out of the 823 tasks, 731 tasks contain counterexamples while 92 tasks do not. The ratios for these two sets are presented by box-plots in Figure 1. One can clearly see that the selection of a Büchi automaton has a bigger impact on the verification tasks with counterexamples (the median ratio is over 5.6) than on the tasks without counterexamples (the median ratio is 1.4). Both sets contain extreme cases where the ratios exceed 10^6.

Spin also provides statistics for stored states, which is the total count of constructed and stored product states and should be proportional to the memory consumed by Spin. If we compute the ratios of maximal and minimal numbers of stored states, we get the ratio 1 in 68 out of the 823 tasks.

Figure 1: Impact of the Büchi automata on model checking. For each verification task, we compute ratios between the maximum and minimum number of transitions (or unique states) visited by Spin using all available Büchi automata. In each column, a box spans between the first and third quartiles, and is split by the median (whose value is given). The whiskers show the range of ratios below the first and above the third quartile that are not further away from the quartiles than 1.5 times the interquartile range. Other values are shown as outliers using circles. (Columns: 731 tasks with counterexample and 92 tasks without counterexample; ratios of visited transitions and of stored states.)

Figure 2: Two BA for GFa and a state space S. ⊤ stands for true, and ab̄ (used later) means a ∧ ¬b. Formally, an edge labelled with a formula p represents all the transitions that are labelled with a subset M of atomic propositions such that M ⊨ p.

In Figure 1 one can see that the situation for stored states is analogous to the ratios of visited transitions, but the ratios of stored states are slightly lower. To sum up, the choice of a Büchi automaton is an important issue that substantially affects both the running time and the memory needed for the explicit model checking process implemented in Spin.

3. STANDARD APPROACH TO OPTIMIZATION: HELPING THE PRODUCT

Most of the work on optimizing the translation of LTL formulae to Büchi automata has focused on building Büchi automata with the smallest possible number of states [e.g. 4, 12, 22, 15, 24]. This is motivated by the observation that the synchronous product of a Büchi automaton A with a state space S can, in the worst case, have as many states as their Cartesian product: |S ⊗ A| ≤ |S| × |A|. Therefore, decreasing |A| lowers the upper bound on |S ⊗ A|. However, it is possible to build contrived examples where a smaller |A| yields a larger product. For instance, removing one state from the automaton A1 of Figure 2 doubles the size of its product with the state space S of the same figure from 3 to 6 states. Of course, if S were a similar cycle of 2 states, the smaller automaton would give a smaller product. Hence, one cannot hope to build an optimal property automaton A without a priori knowledge of the system S.
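To make the bound |S ⊗ A| ≤ |S| × |A| concrete, here is a minimal Python sketch of the synchronous product construction, under assumed encodings (all names are hypothetical): each state of S carries the set of atomic propositions holding in it, each edge of A carries a guard predicate over such sets, and the automaton is assumed to read the label of the current system state (conventions differ between tools).

```python
from collections import deque

# S: state -> (set of atomic propositions holding there, list of successor states)
# A: state -> list of (guard, destination) edges; guard(props) -> bool
def synchronous_product(S, s0, A, q0):
    init = (s0, q0)
    seen = {init}
    edges = []
    todo = deque([init])
    while todo:
        s, q = todo.popleft()
        props, succs = S[s]
        for guard, q2 in A[q]:
            if not guard(props):      # automaton edge not enabled on this labelling
                continue
            for s2 in succs:          # both components advance together
                dst = (s2, q2)
                edges.append(((s, q), dst))
                if dst not in seen:
                    seen.add(dst)
                    todo.append(dst)
    # Only reachable pairs are kept, hence |seen| <= |S| * |A|.
    return seen, edges
```

A product state (s, q) is accepting when q is accepting in A; an emptiness check then looks for a reachable cycle through such a state.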
With the introduction of LBTT [23], a tool that checks the output of different LTL-to-BA translators by doing many cross-comparisons, including some products with random state spaces, tool designers started to evaluate not only the size of the produced automata, but also the size of their products with random state spaces [e.g. 21, 8]. A recent clone of LBTT called ltlcross [6] computes multiple products with random state spaces to lessen the luck factor. Sebastiani and Tonetta [21] used this "product with a random state space" measurement to benchmark their translator MoDeLLa against the other available translators, to support the claim that producing "more deterministic" Büchi automata might be more important than producing small Büchi automata.

Benchmarks based on the size of products may look like Table 2. The table shows that MoDeLLa generates automata that are slightly bigger than those of LTL2BA (its competitor in 2003), but when looking at the products, MoDeLLa causes fewer transitions to be built. If the number of transitions is proportional to the running time of a model checker and the number of states is proportional to its memory consumption, MoDeLLa has effectively traded memory for speed.

MoDeLLa's results do not appear to hold today: more recent translators such as LTL3BA or the translator of Spot can produce automata that are significantly smaller and yield smaller products with random state spaces. These translators also have options to produce more deterministic automata, but the resulting products are not always better. The right part of Table 2 compares the translators by the sizes of the products of the produced automata with a fixed set of random systems. For instance, one can observe that even though Spot (6) produces the lowest accumulated number of product transitions in this benchmark, there are 30 formulae for which the generated products have more transitions than those obtained by LTL3BA (det) (5). Conversely, automata from LTL3BA (det) produce products with more transitions than those of Spot for 76 formulae.

It should be noted that optimizing A to minimize |S ⊗ A| is not equivalent to optimizing A for the model checking procedure, because the product S ⊗ A is constructed on-the-fly by most emptiness check algorithms. An emptiness check may explore only a part of the product, and it may explore some parts several times. Ultimately, any change to A should really be measured only by its effect on the model checker used.

Table 2: Translation of 178 formulae from the literature [9, 22, 11] using different LTL-to-BA translators, with a timeout of 60 seconds. Column n indicates how many translations are successful within the allocated time. The automata columns show accumulated values of standard automata characteristics for all successful translations. Column ndst gives the number of non-deterministic states in the automata. All produced automata are synchronized with the same 100 random systems, and the median number of states and transitions of these products is kept. The products columns represent the medians accumulated over all successful translations. The right-most part of the table counts the number of formulae for which the translator of the row produces an automaton with a higher median number of transitions in the products than the translator of the column.
                           automata                   products       cases with product trans bigger than...
                       n   states ndst edges trans    states    trans     (1) (2) (3) (4) (5) (6) (7) (8)
  (1) Spin           161     1739 1474  9318 46252    260934  8892105       0 102 143 107 150 150 150 146
  (2) LTL2BA         178     1003  802  3360 30159    191668  5556159       5   0 137  49 161 157 156 142
  (3) MoDeLLa        178     1297  647  4311 23874    216938  4193567      15  33   0  41 110 116 114  91
  (4) LTL3BA         178      795  595  2209 21240    151373  4273646       0  23 126   0 149 153 152 140
  (5) LTL3BA (det)   178      830  326  2405 14414    155716  2901474       0   0  10   5   0  76  75  63
  (6) Spot           178      657   94  1615 10304    127792  2326271       1   6  15   5  30   0   1   1
  (7) Spot (det)     178      662   88  1639 10414    128178  2328422       1   7  17   6  33   4   0   0
  (8) Spot (no jump) 178      785  104  1874 12273    152592  2719360      12  28  40  27  70  61  57   0

Such an evaluation was done for instance by Dax et al. [5]: in addition to explaining how to build minimal weak deterministic Büchi automata (WDBA) for a subclass of LTL, they showed that their minimal WDBA are smaller than the non-deterministic BA produced by other translators. They also showed that they improved the running times of Spin on a few verification tasks.¹

¹We omitted their tool from our benchmark because (1) it only supports a subset of LTL, and (2) their optimization is implemented in Spot, so both tools would return the same automata. Besides, this subset of LTL does not include the formulae studied in Sections 4.3 and 4.5.

Figure 3: Automata for a ∧ G(a → X(a ∧ X(a ∧ Xa))). B1 is inherently weak, B2 is weak.

We study how Spin's emptiness check can be helped by changing A in the next section. Improving the size of the product is one way to improve the performance of Spin (as the example of Section 4.5 illustrates), but there are also other aspects. For example, the location of the accepting states has an influence too.

4. ANOTHER VIEW TO OPTIMIZATION: HELPING THE EMPTINESS CHECK

4.1 Emptiness Checks with Nested DFS

To check the emptiness of S ⊗ A¬φ, one should search for a cycle that is reachable from the initial state and that contains at least one accepting state. The emptiness check procedure used in Spin by default is based on two nested depth-first searches [17]: the main DFS, which we shall call blue, explores the product (on-the-fly), and every time it would backtrack from an accepting state s (i.e., all successors of s have been explored by the blue DFS) it starts a second, red DFS from s. If the red DFS reaches any state on the blue DFS search stack, then a reachable accepting cycle is found (since s is reachable from all states on the blue DFS search stack) and the algorithm reports it as a counterexample. Otherwise, the red DFS terminates and the blue DFS continues. The two DFS always ignore states that have been completely explored by an instance of the red DFS, so a state is never visited more than twice. As an extra optimization, if the blue DFS hits its own search stack by following a transition that is either going to or coming from an accepting state [13, 20], then an accepting cycle can be reported without even starting any red DFS. This optimization can only take effect on products that contain an accepting cycle.
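The following Python sketch illustrates the nested-DFS scheme just described (without the stack-hit shortcut), assuming the product is given implicitly by a successor function succ and a predicate accepting. The names and the recursive formulation are simplifications of ours; a real implementation such as Spin's works iteratively and on-the-fly.

```python
def nested_dfs(init, succ, accepting):
    """Return True iff the product contains a reachable accepting cycle."""
    blue_visited = set()
    red_visited = set()      # states completely explored by some red DFS
    blue_stack = set()       # states currently on the blue DFS search stack

    def red_dfs(s):
        # Reaching any state on the blue stack closes an accepting cycle
        # through the seed of this red DFS.
        for t in succ(s):
            if t in blue_stack:
                return True
            if t not in red_visited:
                red_visited.add(t)
                if red_dfs(t):
                    return True
        return False

    def blue_dfs(s):
        blue_visited.add(s)
        blue_stack.add(s)
        for t in succ(s):
            if t not in blue_visited and t not in red_visited:
                if blue_dfs(t):
                    return True
        # Backtracking from s: if s is accepting, look for a cycle through it.
        if accepting(s) and red_dfs(s):
            return True
        blue_stack.discard(s)
        return False

    return blue_dfs(init)
```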
When a counterexample exists in the product, the emptiness check may report it more or less rapidly depending on the order in which it explores the transitions of the product. With any luck, the first transition selected at each step of the DFS will lead to an accepting cycle. Conversely, the first transitions followed might lead to a huge component of the product that just turns out to be a dead-end, from which the emptiness check has to backtrack before finding the counterexample. As the selected transition order in S ⊗ A¬φ depends on the order of the transitions in the property automaton A¬φ, this explains some of the huge differences noticed in Figure 1. Note that previous attempts to explore reordering of the transitions of A to help the emptiness check have been inconclusive [14], so we did not pursue this direction. (Furthermore, the swarming techniques [18] used nowadays make this topic even less attractive: in these approaches, several threads compete to find a counterexample in S ⊗ A¬φ

of Figure 6, where s0 and s2 were the original initial states.

4.4 Translation Differences

Most LTL-to-BA translators follow a multi-step procedure where they first translate a given LTL formula into a generalized Büchi automaton, often with transition-based acceptance (TGBA), such as those of Figure 5. Translators then degeneralize these automata to obtain a BA. Other simplification procedures may be applied to these automata, but it turns out that the last three automata of Figure 4 were all obtained by degeneralizing G1 of Figure 5, and their differences are due to choices made in the degeneralization procedure.

When degeneralizing a TGBA G with m acceptance sets F1, ..., Fm (the ● and ○ in Figure 5), the structure of G is cloned m + 1 times. Let us call each of these clones a level. For each state of level i ≤ m, all transitions that were originally in Fi have their destination redirected to the next level; the destinations of all transitions of level m + 1 are redirected to level 1. Finally, all the states of level m + 1 are made accepting. The initial state can be put on any level. This procedure ensures that words accepted by the degeneralized automaton correspond to words recognized by runs of G that visit all acceptance sets infinitely often. Accepting cycles in products involving these degeneralized automata will always involve at least m + 1 states.

The degeneralization applied to G1, with the initial state on the last level and the acceptance sets ordered as ○, then ●, produces the automaton C6 of Figure 4. Recall that the edge labelled with ⊤ corresponds to the four edges labelled by ab, ab̄, āb, and āb̄ in the original automaton G1.

An optimization introduced by Gastin and Oddoux [12] consists in jumping levels. If a transition of a level i ≤ m belongs to Fi ∩ ... ∩ Fj, its destination can be redirected directly to level j + 1. Similarly, if a transition from level m + 1 is in F1 ∩ ... ∩ Fj, it can be redirected to level j + 1. Implementing this optimization gives automaton C5. Changing the degeneralization order to ●, then ○, and putting the initial state on the first level would give automaton C4. (Both the plain construction and the level-jumping variant are illustrated by the sketch below.) Often (but not in this example), jumping levels is a way to effectively avoid creating useless copies of some states. Another side effect of this optimization is that some accepting cycles may be shorter than m + 1: the change effectively keeps the automaton as close to the accepting level as possible. If we are looking for counterexamples, C5 appears better than C6 because its accepting cycles are shorter on average.

We recall that the initial state of a degeneralized automaton can be put on any level. For example, Giannakopoulou and Lerda [15] noticed that by changing the initial level they could sometimes save some states, so they try both the first and the last level and keep the smallest automaton. In our example, C4 and C5 differ only by the choice of the initial level (and the degeneralization order, but this is negligible as a and b are symmetric in our problem); there is no size difference, and yet it makes a huge difference in the running time of Spin, as discussed in the previous section.

Another translation difference evidently comes from the difference between the generalized automata obtained from the LTL formula. In our case, C4, C5, and C6 were obtained from G1, while C1 and C2 were obtained from G2. (The difference with Spin (C1) is that it does no level jumping from the accepting state.)
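To make the construction concrete, here is a small Python sketch of the degeneralization described above, under assumed data structures (all names hypothetical): a TGBA is a dict mapping each state to a list of (label, destination, acc) edges, where acc is the set of indices (1..m) of the acceptance sets the edge belongs to. Real translators additionally remove unreachable clones and apply further simplifications.

```python
def degeneralize(tgba, init, m, jump_levels=True, init_level=None):
    """Clone the TGBA into m+1 levels; states of level m+1 are accepting."""
    if init_level is None:
        init_level = m + 1            # the initial state can be put on any level

    def next_level(level, acc):
        if not jump_levels:
            if level == m + 1:        # accepting level: always restart at level 1
                return 1
            return level + 1 if level in acc else level
        # Level jumping: starting from level i (or from 1 when leaving the
        # accepting level), skip every consecutive acceptance set the edge
        # already belongs to.
        j = 1 if level == m + 1 else level
        while j <= m and j in acc:
            j += 1
        return j

    ba, accepting = {}, set()
    for state, edges in tgba.items():
        for level in range(1, m + 2):
            ba[(state, level)] = [(label, (dst, next_level(level, acc)))
                                  for (label, dst, acc) in edges]
            if level == m + 1:
                accepting.add((state, level))
    return ba, (init, init_level), accepting
```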
The difference between G1 and G2 is caused by choices made during the translation to favor deterministic states in the case of G1. In our example of Table 3, this improved determinism makes no difference since a and b are never true in the model.

Figure 4: Automata for GFa ∧ GFb generated by different tools and options: (C1) Spin, (C2) LTL2BA & LTL3BA, (C3) MoDeLLa, (C4) LTL3BA (det), (C5) Spot & Spot (det), (C6) Spot (no jump).

Table 3: Statistics about generated automata and Spin's run on model bakery.7.pm and formula GFa ∧ GFb, where neither a nor b ever occurs in the model. The corresponding automata are shown in Fig. 4.

                                    size                    statistics from Spin's execution
      automaton              states ndst edges trans    stored states   visited trans    time
  C1  Spin                      3     2     6    17        27531713         95071k        88s
  C2  LTL2BA & LTL3BA           3     3     8    20        27531713         95071k        99s
  C3  MoDeLLa                   4     0     6    16        27531714         95071k       109s
  C4  LTL3BA (det)              3     0     8    12        27531713         95071k       101s
  C5  Spot & Spot (det)         3     0     8    12        27531714        190143k       211s
  C6  Spot (no jump)            3     0     5    12        27531714        190143k       191s

Figure 5: Two TGBA for GFa ∧ GFb. Accepting runs must visit ● and ○ infinitely often.

4.5 Automata for ¬(GFa → GFb)

We now focus on another concrete case: ¬(GFa → GFb) on mutex protocols. The formula without negation states that if some process visits the critical section infinitely often, then it also leaves it infinitely often; this property holds in model peterson.4.pm, and therefore Spin has to build the whole product to find that it contains no accepting cycle. Table 4 shows a series of experiments verifying the model peterson.4.pm against this formula, using different tools to obtain a Büchi automaton. In this case, each tool produces a different automaton, as shown in the first part of Figure 6. Note again that automata D2 and D4 cannot be distinguished only by determinism and size metrics (see Table 4). They differ only in the target of the outgoing edge of s0, yet we observe a significant difference in Spin's behavior.

We actually use 12 different automata for this formula. The first seven of the table are generated by the considered tools. The others are handwritten, obtained by modifying the previous automata to explore which aspects of the automata make a significant difference in Spin's behavior, as described below. D8 is adapted from D6 by changing the degeneralization level on which we enter the SCC. D9 keeps the strong initial guard of D6 but then uses the accepting SCC of D2. D10 is a mix of D6 and D2 to observe the influence of the guard ab̄ compared to b̄. D11 is a version of D2 in which the SCC is made deterministic as in D6. Finally, D12 fixes D5 by removing a spurious state.

Based on Table 4, we can group these automata into three categories, listed from the best to the worst with respect to Spin's performance. Before we discuss these categories, it is important to notice that in a model where a means the process is in the critical section and b means the process leaves the critical section, we can expect most of the state space to be labelled by āb̄.

D6, D7, D8, D9: Automata with the smallest number of transitions. Note that the no-jump version (D7) and the one with a non-deterministic SCC (D9) both yield a few more states and transitions in the product, but the difference is not significant. The key property of these automata is that they can leave state s0 only by reading ab̄, whereas the other automata are more permissive.
D1, D2, D3, D10, D11: All these automata exhibit more non-determinism on state s0 and will enter the accepting SCC even after reading āb̄. However, when this happens, they do not reach the accepting state before ab̄ is read, so this limits the number of red DFS.

D4, D5, D12: These automata go from s0 to the accepting state s1 each time they read ab̄. This makes the product unnecessarily large, and it also forces many calls to the red DFS every time a product state with property automaton state s1 is backtracked. The non-determinism in the accepting SCC of D4 causes it to visit only slightly more states than the other two automata.

A comparison of automata D6 and D11 and their impact on Spin's performance shows that the hypothesis of Section 4.3 cannot be used alone to select the best automaton.

Table 4: Statistics about generated automata and Spin's run on the empty product between model peterson.4.pm and formula ¬(GFa → GFb). The corresponding automata are shown in Fig. 6.

                              size                    statistics from Spin's execution
      automaton        states ndst edges trans    stored states   visited trans    time
  D1   Spin               3     2     6    12        1577846          7680k       6.04s
  D2   LTL2BA             3     3     6    12        1577440          7684k       5.95s
  D3   MoDeLLa            5     2     8    18        1580893          7670k       6.13s
  D4   LTL3BA             3     3     6    12        2299250         15583k      12.10s
  D5   LTL3BA (det)       4     1     7    14        2297625         15561k      12.00s
  D6   Spot               3     1     6     9         848641          2853k       2.26s
  D7   Spot (no jump)     3     1     5     9         852094          2863k       2.34s
  D8                      3     1     6     9         848641          2853k       2.43s
  D9                      3     3     6    11         852094          2878k       2.43s
  D10                     3     1     7    10        1575844          7658k       7.38s
  D11                     3     1     6    10        1577440          7657k       7.07s
  D12                     3     1     6    10        2297625         15561k      12.30s

Indeed, D6 outperforms D11 even though the distance from the initial state to the accepting state is shorter in D6. Here, the more restrictive label of the transition leaving s0 in D6 plays an important role as well. To sum up, if we suppose that there is no accepting cycle in the product, the automaton should

1. keep accepting states as far as possible from the initial state (compare D11 to D12), and

2. use more restrictive labels (compare D6 to D12),

in order to make the accepting states as hard to reach as possible. Moreover, making use of more restrictive labels can also help to reduce the product. An appropriate metric taking these two factors into account, as well as an LTL-to-BA translation reflecting these hypotheses, are topics for our future research.

5. CONCLUSION

LTL-to-BA translators have several degrees of freedom when producing automata. Some of these choices have effects on the product with the system to be verified and also on the emptiness check of the product. However, these effects are difficult to predict. So far, most authors of LTL-to-BA translation tools have measured the performance of their tools by looking at the size of the output, sometimes also by looking at the size of products with random state spaces. While building a small product generally helps the emptiness check, we have provided evidence that the size of A¬φ and even the size of S ⊗ A¬φ do not always correlate with the performance of the emptiness check of S ⊗ A¬φ. For instance, as Spin uses a Nested DFS, the locations of the accepting states of A¬φ