Verifying Dynamic Pointer-Manipulating Threads Thomas Noll and Stefan Rieger RWTH Aachen University Software Modeling and Verification Group 52056 Aachen, Germany {noll,rieger}@cs.rwth-aachen.de Abstract. We present a novel approach to the verification of concurrent pointer-manipulating programs with dynamic thread creation and memory allocation as well as destructive updates operating on arbitrary (possibly cyclic) singly-linked data structures. Correctness properties of such programs are expressed by combining a simple pointer logic for specifying heap properties with linear-time (LTL) operators for reasoning about system executions. To automatically solve the corresponding model-checking problem, which is undecidable in general, we abstract from non-interrupted sublists in the heap, resulting in a finite-state representation of the data space. We also show that the control flow of a concurrent program with unbounded thread creation can be characterized by a Petri net, making LTL model checking decidable (though not feasible in practice). In a second abstraction step we also derive a finitestate representation of the control flow, which then allows us to employ standard LTL model checking techniques. 1 Introduction Techniques for the verification of elementary properties of pointer programs are highly desirable. Programming with pointers is error-prone with potential pitfalls such as dereferencing null pointers and the emergence of memory leaks. So far, the field of pointer analysis has primarily focused on sequential programs. But pointer programming becomes even more vulnerable in a concurrent setting where threads can be dynamically created, and where data structures such as linked lists are shared between several threads. We present an approach to model checking concurrent programs that operate on singly-linked data structures. It stays within the realm of traditional (linear-time) model checking. This facilitates the usage of standard model checkers for validating temporal properties addressing absence of memory leaks, dereferencing of null pointers, dynamic creation of cells, and simple and position-dependent aliasing. Our approach is illustrated by considering a simple concurrent programming language that besides the usual control structures offers primitives for thread creation, pointer manipulation, cell creation and destruction, and (guarded) atomic regions that allow to implement concurrency control constructs such as test-andset primitives, semaphores, and monitors. J. Cuellar and T. Maibaum (Eds.): FM 2008, LNCS 5014, pp. 84­99, 2008. c Springer-Verlag Berlin Heidelberg 2008 Verifying Dynamic Pointer-Manipulating Threads 85 The operational semantics of our language is defined in a modular way. The control-flow semantics is given by a (finite) Petri net whose places correspond to the control locations of the program. The heap semantics is specified by transformation rules which describe the effect of executing single commands. The combination of both yields a labeled transition system (modeled by a Petri net) which is generally infinite due to the unbounded creation of both control threads and heap cells. Its desirable properties are expressed in a firstorder linear-time temporal logic (LTL) that is enriched with assertions on pointer structures such as reachability and freshness of cells, or pointer aliasing. Since the model-checking problem is generally undecidable in this setting we introduce a first abstraction, which addresses the data space of the program. Our list abstraction exploits a variant of summary nodes [7] to obtain a finite representation of the heap and thus eliminates one potential source of undecidability. In fact, known results then allow us to conclude that the data abstract model-checking problem is decidable even though the underlying transition system is still infinite (see Thm. 5.7). However, its intractability forces us to apply a second abstraction step in which we also derive a finite-state representation of the control flow, which altogether yields a finite transition system. As a result, standard LTL model-checking algorithms can be employed. Both abstractions are obtained in a fully mechanized manner. Moreover they are sound in the sense that they over-approximate the concrete program behavior. 2 Related Work Related work on the topic of analyzing pointer-manipulating programs can be classified into the following (often overlapping) categories, which mainly focus on sequential programming languages: predicate abstraction [1,8,23], shape analysis [2,26,27], regular model checking [3,5], dataflow analysis [21,33,34], Hoare-style approaches [6,18], and separation logic [22,24]. In summary, many of the characterizing features of our approach are already present in earlier papers: the restriction to singly-linked lists without data fields [1,3,11,16,19,20] which still allows to model many practical applications such as device drivers, the introduction of abstract entities which represent a potentially unbounded number of heap cells (called "summary nodes" in [7]), and the observation that, in this setting, the number of sharing points in heap structures is bounded by the number of program variables [1,4,20]. Pointer analysis in connection with concurrency is only considered in rather few places. Most publications concentrate on specific questions such as aliasing or escape analysis [25,28] or the analysis of safety properties [13,30], or particular applications such as concurrent garbage collection are studied [9,10,29]. To our knowledge, the only pointer logics allowing to specify liveness properties of concurrent systems are ETL [31] and NTL [11]. In contrast to these, however, we avoid the use of temporal operators inside quantification. In this way, involved mechanisms to keep track of the identities of individual heap nodes are not required. 86 T. Noll and S. Rieger Thus our approach is unique in that it supports concurrent programs with dynamic thread creation, memory allocation, and destructive updates operating on arbitrary (possibly cyclic) linked lists. Moreover it integrates both abstraction and model checking in a fully automated way and supports a linear-time logic in which both safety and liveness properties can be expressed, allowing to use standard LTL model checkers. 3 A List-Manipulating Programming Language Given sets PV of program variables and P of thread names, a dynamic listmanipulating program (DLMP) has the form (vi, v PV and pj, p P) = var v1, ..., vk; proc main(S0); p1(S1); ...; pl(Sl) Here each Si (0 i l) is of the form si1; ...; siri with sij CMD, where CMD is the set of the following commands: PExp := PExp pointer assignment new(PExp) object creation if BExp goto n conditional jump del(PExp) object destruction goto n unconditional jump spawn(p) spawn instance of thread p atc(BExp) guarded atomic region exit thread termination end atc end of atomic region Pointer expressions (PExp) comprise the special constant nil denoting an undefined pointer value, a program variable, or the (de)referencing of a program variable. Arbitrary dereferencing depths can be emulated using a sequence of atomic assignments. The Boolean expressions (BExp) are standard. PExp ::= nil | v | v | &v BExp ::= PExp = PExp | BExp BExp | BExp Note that we do not allow nesting of atomic regions. In the following we assume for simplicity that as above is globally given (if not mentioned otherwise). Fig. 1 shows a DLM-program that simulates a simple server/worker scenario. The server creates new objects in an infinite loop and inserts them into a list. For each object a new worker thread is spawned deleting one object from the list when it is executed. Without imposing fairness constraints this may lead to an infinite number of both objects and threads. Petri Nets. We use Petri nets to describe the operational semantics of DLMPs. Definition 3.1. A Petri net is a tuple P = (P, T, src, tgt, , m0) where P is a set of places, T a set of transitions, src, tgt : T 2P associate each transition with its source and target places, : T L is a transition labeling function, and m0 : P the initial marking. A state of P is a marking m : P . The set of all markings is denoted by Mark(P). Petri nets are high-level representations of (infinite) transition systems whose transitions are characterized by the token game. If in a marking m a transition t is enabled, i.e. m(p) > 0 for all p src(t), and if its firing yields m , we write m t m . m m means that there exists t T such that m t m . Verifying Dynamic Pointer-Manipulating Threads 87 Definition 3.2 (Run). Let P = (P, T, src, tgt, , m0) be a Petri net. A run of P is a (possibly infinite) sequence of markings = m0m1m2... Mark(P) Mark(P) such that mi mi+1. The set of all those runs is denoted by Runs(P). For = m0m1... Runs(P) let || {} be the length of . We write [k] to denote the suffix starting from the k-th marking, i.e., mkmk+1... Runs(P, T, src, tgt, , mk) which implies [k] = for || k, and we set i := mi. Finally we call a Petri net k-safe if at no time any place holds more than k tokens and bounded if there exists a k for which it is k-safe. Clearly only bounded Petri nets can be represented by finite transition systems. var x, y; proc main( 01 new(x); 02 spawn(server); ) server( 11 spawn(worker); 12 atc(tt); 13 y := x; 14 new(x); 15 x := y; 16 end atc; 17 goto 11; ) worker( 21 atc(x = nil); 22 y := x; 23 x := x; 24 del(y); 25 end atc; ) Fig. 1. Server/Worker Concrete Heap Semantics. Defining the semantics of DLM-programs requires a formal model of the heap. Definition 3.3. A heap configuration is a tuple H = (N, A, , F) with a set of nodes N PV , a set of abstract nodes A N \ PV , a successor function : N Nnil (where Nnil := N {nil}), and a set of flags F Flags := {err, leak, del} {newn | n N} {spawnp | p P}. H denotes the set of all heap configurations; H H the set of all concrete ones (i.e., those with A = ). The nodes represent both the dynamic objects at runtime and the static program variables (which cannot be deleted). Edges, as formalized by the -function, encode the points-to information of a specific program state. The set A of abstract nodes will later be used for our heap abstraction technique (see Sct. 4) and will be empty throughout the current section. Finally the flags give special information about a state, e.g., whether a runtime error or memory leak occurred, a node was created or deleted, or a thread has been spawned. To delete unreachable nodes that do not influence program semantics a garbage collection mapping denoted by is applied. Whenever it removes an unreachable node, it sets the leak flag to indicate a potential memory leak. Definition 3.4. Let H = (N, , , F) H. The semantics of pointer expressions is given by the partial function [[ ]] : PExp Nnil , defined as follows (where denotes the undefined value).1 [[nil ]] := nil [[v ]] := (v) [[v ]] := ([[v ]]) [[&v ]] := v The semantics of Boolean expressions, [[ ]] : BExp , is standard but strict, i.e., it becomes undefined if at least one subexpression is undefined. 1 The definition implies (nil) = and so [[ ]] can indeed yield undefined results. 88 T. Noll and S. Rieger The effect of executing a program statement is captured by a transition relation which associates the source and target heap configuration with the given statement and an indicator from the set {0, 1, }. Here 1 denotes the normal execution of a statement or the selection of the then-branch of an if-command, 0 only occurs in the else-branch of if-statements, and represents the failure of a command (e.g., dereferencing a null-pointer). Definition 3.5. The heap transformation relation, h (H\{Herr})×CMD× {0, 1, } × H, is given as follows. Here Herr := (PV , , {v nil | v PV }, {err}), H = (N, A, , F) H \ {Herr} with A = , and f[x/y] denotes a function update where y is the new value of x. (We only show some example rules.) [[ ]] = H, v := 1 h (N, A, [v/[[ ]]], ) [[ ]] = H, v := h Herr H, new(v) 1 h (N {n}, A, [v/n], {newn}) [[ ]] = nil H, del() 1 h (N \ {[[ ]]}, A, [[[ ]]/, -1 ([[ ]])/nil], {del}) [[b ]] = H, if b goto n [[b ]] h (N, A, , ) [[b ]] = H, if b goto n h Herr [[b ]] = 1 H, atc(b) 1 h ^H [[b ]] = H, atc(b) h Herr Note that the heap flags (except err) are only active in the configuration directly following the corresponding event. As our final goal is to combine the heap and control-flow semantics, we now represent the heap transformation relation by a Petri net. The labels will later be used for synchronizing the two nets. Definition 3.6. The concrete heap semantics is the (infinite, 1-safe) Petri net Ph := (P, T, src, tgt, , m0) with P H, T = {(H, H , c, x) | H, c x h H )}, src(H, H , c, x) = {H}, tgt(H, H , c, x) = {H }, (H, H , c, x) = (c, x), m0(H0) = 1 for a given H0 P (typically the empty heap), and m0(H) = 0 for H = H0. Control-Flow Semantics. In the context of concurrency and dynamic threading it does not suffice to only consider the effects of certain statements on the heap; the control flow of the program is also crucial. It can again be modeled by a Petri net. Definition 3.7. The control-flow semantics of is given by the Petri net Pc := (P, T, src, tgt, , m0) with P = {lock} l i=0 ri j=1{ij}, : T CMD×{0, 1, }, m0(01) = 1, m0(lock) = 1 and m0(p) = 0 for all p / {01, lock}. For 0 i l and 1 j ri let lockij be the singleton set containing lock if sij is not inside an atomic region and the empty set otherwise. The transitions (T , src and tgt) are then given as follows: Verifying Dynamic Pointer-Manipulating Threads 89 sij (t) src(t) tgt(t) if b goto n (sij, 0) {ij} lockij {i(j + 1)} lockij (sij, 1) {ij} lockij {in} lockij (sij, ) {ij} lockij goto n (sij, 1) {ij} lockij {in} lockij atc(b) (sij, 1) {ij, lock} {i(j + 1)} (sij, ) {ij, lock} end atc (sij, 1) {ij} {i(j + 1), lock} spawn(px) (sij, 1) {ij} lockij {i(j + 1), x1} lockij exit (sij, 1) {ij} lockij lock := , new(), del() (sij, 1) {ij} lockij {i(j + 1)} lockij (sij, ) {ij} lockij If one of the target places is not in P we omit the corresponding out-edge (e.g. in case of thread termination or a jump out of range). Example 3.8. The graph in Fig. 2 shows the Petri net modeling the control-flow semantics of the program from Fig. 1. The round nodes represent the places and the rectangles the (labeled) transitions of the net. If there are incoming and outgoing edges to the same place they are drawn as bidirectional arrows. In the initial state there are only tokens in the places 01 and lock. Concrete DLMP-Semantics. Now that we defined the heap as well as the control flow semantics of our programming language we have to combine both. Definition 3.9. Let Pc = (Pc , T c , srcc , tgtc , c , mc 0) be the control flow and Ph = (Ph , T h , srcc , tgtc , h , mh 0 ) the heap semantics of . The concrete semantics of is the Petri net P := Pc Ph := (Pc Ph , T, src, tgt, , m0) where T = {((tc , th ) T c × T h | c (tc ) = h (th )} src(tc , th ) = src(tc ) src(th ) tgt(tc , th ) = tgt(tc ) tgt(th ) (tc , th ) = c (tc ) m0(p) = mc 0(p) if p Pc mh 0 (p) otherwise As one might suspect the concrete semantics cannot be used as-is in verification techniques since DLM-programs are Turing complete2 . 4 Data Abstraction To tackle the verification problem we use heap abstraction techniques to generate a data abstract semantics that over-approximates the behavior of the concrete one, i.e., whose runs cover all concrete ones. In our setting this approach is correct but generally incomplete: although we can conclude from the satisfaction of a 2 DLM-programs can simulate a counter machine (the values of the counters are represented by lists of the corresponding length). 90 T. Noll and S. Rieger 01 ˇ 02 new(x), 1 new(x), spawn(server), 1 11 spawn(worker), 1 12 13 14 15 atc(tt), 1atc(tt), y := x, 1y := x, new(x), 1new(x), 16 x := y, 1x := y, 17 end atc, 1 goto 1, 1 21 22 23 24 atc(x = nil), 1 atc(x = nil), y := x, 1 y := x, x := x, 1 x := x, 25 del(y), 1 del(y), end atc, 1 lock ˇ Fig. 2. Control-flow semantics for the server/worker example property in the abstract state space the validity in the concrete case, the inverse is not possible anymore. Our heap abstraction is parameterized via a global constant M which allows a systematic refinement. For a given M > 0 we set := {0, 1, ..., M, }, where represents all values greater than M. x x Fig. 3. An Abstraction Morphism Chain Abstraction and Canonical Configurations. For heap abstraction we adopt the idea of summary nodes. Summary nodes are not allowed to represent arbitrary structures but only socalled chains which are non-interrupted sublists, i.e., list segments where only the head node is allowed to have more than one predecessor. This abstraction technique is well known [7,11,26]. For further details you may also refer to [15]. Based on the concept of chains one can define so-called abstraction morphisms which are surjective functions of the type h : N1 N2 for Hi = (Ni, Ai, i, Fi) H that retain the graph structure while collapsing chains of length greater than M to abstract nodes. In Fig. 3 an example is depicted. Verifying Dynamic Pointer-Manipulating Threads 91 We write H2 H1 to denote that there is an abstraction morphism that abstracts H1 to H2. In this context we will also write h(H1) = H2 lifting h to heap configurations. If |N1| = |N2|, h is an isomorphism. We then write H1 = H2. Note that a given source configuration can give rise to different abstractions. To obtain a unique canonical representation of a concrete heap configuration we collapse only maximal chains and do not abstract nodes that are closer than three -steps to a program variable. This yields a concrete expression semantics. The set of all such canonical configurations is denoted by H . It can be shown that for every concrete heap configuration a unique canonical configuration exists which is related to the former by a morphism h [15]. We will use it as abstraction function in the following. The lower graph in Fig. 3 is a canonical configuration. Abstract Heap Semantics. Regarding the expression semantics nothing needs to be modified in the data-abstract setting: in a canonical configuration, abstract nodes have a distance greater than two from the variable nodes such that every pointer expression refers to a concrete node. The expression semantics can therefore be chosen identical to the concrete case (Def. 3.4), now interpreted on canonical configurations. Definition 4.1 (Abstract Heap Transformation Relation). The abstract heap transformation relation h (H /= \{{Herr}})×CMD×{0, 1, }×H /= is depicted in Fig. 4 for H = (N, A, , F) H . We focus on assignments since the other rules are analogous to the concrete case. For simplicity we use representatives of the isomorphism classes. In Fig. 4 the semantic rules are visualized by examples. Rules 1 and 2 lead to a potential increase in the distance from variables to abstract nodes: consider an assignment of the form y := nil. If y points into a list whose head is referred to by another variable, we possibly increase the distance from that variable to abstract nodes. The assignment therefore potentially yields a non-canonical configuration making a re-abstraction necessary. In rule 3 there might be the necessity for both concretization and abstraction. The execution of the assignment yields an intermediate configuration which is generally not canonical since the variable v could now be too close to an abstract node. Therefore we have to find a more concrete configuration H whose abstraction yields the intermediate configuration. There might be more than one solution, thus this rule is nondeterministic (indicated by dashed arrows). After the concretization a re-abstraction is used to obtain the canonical form. Due to our canonical representation, H /= is finite and its size depends (linearly) on the number of program variables and the value of the precision constant M3 . This implies the finiteness of the abstract heap semantics but not the boundedness of the data abstract program semantics as defined below. 3 The number of nodes is bounded by (2M + 3) |PV |. 92 T. Noll and S. Rieger (1) /PV H,v:= 1 hh((N,A,[v/[[]]],)), (2) [[v]]=nil[[]]= H,v:= 1 hh((N,A,[(v)/[[]]],)), (3) HHs.t.(N,A,[v/[[w]]],)Handh(H)H[[w]]=nil H,v:=w 1 hh(H), (4) [[]]=[[]]= H,:= hHerr, (1,2)v:=w(analogously:v=nil,v=&w,v:=w,v=w,v=&w) w v assign x GCabstract w v x w v x w v x (3)v:=w w v assign x GC abstract concretize wv x wv x wv xwv x abstract wv xwv x Fig.4.Abstractrulesandexemplaryvisualization(M=2) Verifying Dynamic Pointer-Manipulating Threads 93 Definition 4.2. The abstract heap semantics is the Petri net Ph := (P, T, src, tgt, , m0) with P H /=, T = {(K, K , c, x) | K, c x h K }, (K, K , c, x) = (c, x), src(K, K , c, x) = {K}, tgt(K, K , c, x) = {K } and m0(K0) = 1 for a K0 P (e.g. the empty heap congruence class) and m0(K) = 0 for K = K0. The data abstract program semantics is given by the Petri net P := Pc Ph where is as in Def. 3.9. 5 A Logic for Pointer Programs In the previous sections we have defined our programming language and both its concrete and abstract semantics. In this section we will present a logic which allows us to reason about heap configurations and program behavior. In the following LV denotes a set of logical variables with LV PV = . Pointer Logic. Pointer Logic deals with single configurations, and can be used to express graph properties as well as to inspect the special heap flags. Definition 5.1. We define the set of Pointer Logic formulae (PL-formulae) as follows: NExp ::= nil | v ( PV ) | x ( LV ) | NExp Atomic ::= tt | ff | f ( Flags) | NExp = NExp | NExp NExp PL ::= Atomic | PL | PL PL | x : PL As usual we will use the logical operations , , and as abbreviations. In contrast to pointer expressions in DLM-programs, the logic supports dereferencing operations of arbitrary depth. The predicate expresses the reachability of heap objects, whereas = is true iff both expressions refer to the same object. Definition 5.2. Let : LV N be a valuation function instantiating logical variables with heap nodes and (N, , , F) H a concrete heap configuration. Then we define [[ ]] : NExp Nnil for x LV , v PV and NExp by: [[nil ]] := nil [[v ]] := v [[x ]] := (x) [[ ]] := ([[ ]]) Note the semantic difference in comparison to the programming language. In the logic a variable v is interpreted by itself and not by the node it is referencing. This allows the check for identity of program variables without introducing a reference operator. The semantics of PL with respect to concrete heap configurations is quite standard and therefore omitted. Reasoning about Abstract Computations. When switching to abstract configurations we run into several complications since logical variables can be bound to both concrete and abstract nodes. In the latter case we have to record to which concrete node, represented by the abstract node, a variable is bound. This could lead to undefinedness of PL-formulae. The problem mainly occurs in direct comparisons of the form = . To solve it we choose the global precision constant M in dependence of the formula PL, assuming from now on that M xVariables() {j + 1 | j x occurs in }, and introduce the concept of abstract valuations. 94 T. Noll and S. Rieger Given H H and PL, an abstract valuation is of the form = (, o, ) where : PV N maps logical variables to (abstract) nodes, o : PV denotes the offset of a variable "inside" an abstract node, and : PV PV is a "distance matrix" for the logical variables (referring to the same abstract node). is only defined if both arguments are mapped to the same entity, and o is only different from 1 if the corresponding variable is mapped to an abstract node. The set of all such valuations will be denoted by ValH,. Using this concept one can define a function dH, : NExp × NExp {0, 1, } measuring the "distance" of pointer expressions, where distance here means either 0 if the expressions are mapped onto the same (concrete) node, 1 if the second argument is reachable from the first, or if neither is the case (see [15] for details). Definition 5.3. Let H = (N, A, , F) H and = (, o, ) ValH,. The satisfaction relation |= for PL-formulae on canonical configurations is then given as follows (omitting the trivial cases): H, |= f iff f F, where f Flags H, |= 1 = 2 iff dH,(1, 2) = 0 H, |= 1 2 iff dH,(1, 2) 1 H, |= x : iff n N, off , dist : V () s.t. H, ([x/n], o[x/off ], [x/dist]) |= H |= iff ValH, s.t. H, |= [H]= |= iff H |= Temporal Pointer Logic. Pointer Logic enables us to express properties of single configurations. However it cannot be used to specify (ongoing) computations, i.e., configuration sequences. To this aim we extend it by temporal operators. Definition 5.4. The set of Temporal Pointer Logic formulae (TPL-formulae) is given as follows: TPL ::= PL | TPL | TPL TPL | X TPL | TPL U TPL For TPL we use the the abbreviations F := ttU and G := F. Note that it is not possible to nest PL-quantifiers and temporal operators. To do so it would be necessary to keep track of the object identities between states, which is difficult in the presence of abstract nodes. In addition it would blow up the state space and exclude the use of standard model checking algorithms. Only a few approaches support this idea [11,31]; most other works in the area consider only the shape of the heap. Clearly this restriction results in a loss of expressivity, nonetheless we can specify many interesting properties. Example 5.5. For our server/worker system from Fig. 1 it holds true: 1. GX tt (never deadlock, i.e., there is always a successor state) 2. F err (no pointer errors) 3. GF n : newn (new objects are created infinitely often) 4. GF spawnworker (infinitely often worker processes are spawned) Verifying Dynamic Pointer-Manipulating Threads 95 5. G(n : newn F spawnworker) (for every new object a worker thread is spawned) 6. G(spawnworker F del) (the creation of a worker process does not necessarily result in the deletion of a node, i.e., fairness is not guaranteed) More general correctness properties are: 7. F v = w (v and w will eventually become aliases) 8. G(x : (v x w x)) (v and w always point to disjoint heap parts) 9. G(x : (v x (y : (x y y x)))) (v always points to a non-cyclic list) 10. FG(leak) (only finitely many memory leaks can occur) 11. G(x : (v x (y : (y x v y)))) (v always points to a chain) As mentioned before TPL specifies computation paths. These are given as sequences of heap configurations according to the Petri net representing the program semantics. By construction, for each marking m there is exactly one p H /= H such that m(p) = 1. Definition 5.6. Let P = (P, T, src, tgt, , m0) be the abstract (or concrete) semantics of . For a given run Runs(P ) the satisfaction relation |= for TPL, assuming w.l.o.g. that the maximal PL-subformulae in are closed, is defined as follows (again omitting the trivial cases): |= |= ( PL) iff = p P (H /= H) : 0(p) = 1 p |=PL |= X iff [1] |= |= U iff k || : [k] |= and j < k : [j] |= We write P |= iff |= for all Runs(P ) and |= iff Pc Ph |= . Note that finite traces are included in the semantics of TPL. This implies that the equivalence X X does generally not hold. Model Checking Temporal Pointer Logic. The Turing completeness of DLM-programs implies that the model checking problem for TPL-formulae is undecidable. The following theorem shows that it suffices to employ data abstraction to obtain a positive result. Theorem 5.7. The data-abstract model checking problem is decidable, i.e., we can decide whether Pc Ph |= . Proof. The idea is to evaluate all maximal PL-subformulae on the heap configurations, to label (the transitions of) P by atomic propositions and accordingly eliminate the PL-subformulae in to obtain an LTL-formula (see Algorithm 6.4). The next step is to construct two automata A and B where A is a finite automaton recognizing the finite words, and B a nondeterministic B¨uchi-automaton accepting the infinite words satisfying . Then according to [12] the model checking problem is decidable using a formula of the type defined in [32] to formulate the B¨uchi acceptance condition for B and a reduction to the reachability problem for Petri net markings that is decidable in EXPSPACE [17]. 96 T. Noll and S. Rieger The result is important but more of theoretical interest due to the high complexity of the problem. Thus we have to apply further simplifications to obtain practically feasible results. 6 Control-Flow Abstraction The idea of the control-flow abstraction is similar to the data abstraction. Instead of recording for each Petri net place the exact number of tokens we only do this up to a certain resolution. A global constant C parameterizes the resolution bound. := {0, ..., C, } is used analogously to . What we obtain is an overapproximation Pc of the concrete control-flow semantics Pc . The first step is the modification of the Petri net semantics. Definition 6.1. An abstract Petri net is of the form P = (P, T, src, tgt, , m0) with abstract markings that are functions of the type m : P . Definition 6.2. Let P = (P, T, src, tgt, , m0) be an abstract Petri net, m, m Mark(P) and t T . Then t Mark(P) × T × Mark(P) is given by4 : m t m p srct : m(p) > 0 p P : m (p) = m(p) - 1 if p src(t) \ tgt(t) and m(p) = C or if p src(t) \ tgt(t) and m(p) = m(p) + 1 if p tgt(t) \ src(t) m(p) otherwise The abstract control-flow semantics Pc is defined as the concrete one, but using the abstract transition relation . Definition 6.3. The abstract semantics of is the Petri net P := Pc Ph . If we now want to apply model checking, i.e., verify that a TPL-formula is satisfied by P , we evaluate all maximal PL-subformulae of on the heaps in P , substitute them by atomic propositions, generate the underlying (finite) transition system, label it with atomic propositions according to the evaluation of subformulae, and solve the resulting model checking problem for LTL with finite traces [14]. Algorithm 6.4. Let P = (P, T, src, tgt, , m0) be given and TPL the formula to verify. Let := { PL | is a maximal subformula of } = {1, ..., r} and a1, ..., ar be atomic propositions. 1. Generate a finite transition system T := ({m | m0 m}, m0, , lab) with lab(m) := r i=1 {ai | p P H /= : m(p) = 1 p |= i} 2. Solve T |=? LTL [1/a1, ..., r/ar] (admitting finite traces). 4 Note that t can be nondeterministic for a given transition t. Verifying Dynamic Pointer-Manipulating Threads 97 100 1000 10000 100000 1e+06 1 10 100 1000 #states Value of M (precision of data abstraction) C=1 C=5 C=10 C=20 C=50 Fig. 5. Size of the state space for the server/worker example Limitations and Refinement. Due to the over-approximation of the state space, there may exist abstract computations falsifying the property to verify and not corresponding to concrete ones. These false negatives can be eliminated through abstraction refinement by increasing the parameters M and C. The size of the state space is a linear function wrt. M (and C). This is visualized in Fig. 5 for our server/worker example employing a prototype version of our tool which is currently under development (note the logarithmic scale of both axes). Thanks to the implicit universal quantification over paths in the LTL approach, however, the successful verification of a property in the abstract case implies its correctness in the concrete case, i.e., false positives are excluded. Note that our framework can be easily extended to three truth values, to eliminate false positives. The "don't know" answer would then only be given if the resulting transition system contains both positive and negative traces. In the other cases the answer would be an exact "yes" or "no". This would require the additional checking of a CTL formula in the case that the LTL model checker falsifies the property. If the answer is "don't know" a refinement step by increasing M and/or C is necessary. 7 Conclusions and Future Work We have presented a framework for the verification of concurrent pointer-manipulating programs with dynamic thread creation, unbounded heap size, and destructive updates. Correctness properties are specified using temporal pointer logic (TPL) which is essentially a pointer logic for expressing heap properties enriched with temporal operators. Rather than requiring dedicated algorithms, the TPL model checking problem is reduced to an LTL model checking problem by 98 T. Noll and S. Rieger appropriate abstractions. The trade-off is the restriction to list-like data structures and to static variables as well as the limitation in expressiveness of the logic because object identities are not tracked between configurations. Currently we are implementing the method to analyze more interesting examples. We are planning to support the user in handling abstract computations which violate a given property, either by deriving concrete counterexamples or by suggesting refinements to eliminate false negatives. Finally we are working on an extension to arbitrary data structures. Acknowledgments. We would like to thank Ulrich Schrempp for developing the prototype implementation of our analysis framework, which was used for computing the state spaces in the server/worker example. References 1. Balaban, I., Pnueli, A., Zuck, L.D.: Shape Analysis by Predicate Abstraction. In: Cousot, R. (ed.) VMCAI 2005. LNCS, vol. 3385, pp. 164­180. Springer, Heidelberg (2005) 2. Beyer, D., Henzinger, T.A., Th´eoduloz, G.: Lazy shape analysis. In: Ball, T., Jones, R.B. (eds.) CAV 2006. LNCS, vol. 4144, pp. 532­546. Springer, Heidelberg (2006) 3. Bouajjani, A., Bozga, M., Habermehl, P., Iosif, R., Moro, P., Vojnar, T.: Programs with lists are counter automata. In: Ball, T., Jones, R.B. (eds.) CAV 2006. LNCS, vol. 4144, pp. 517­531. Springer, Heidelberg (2006) 4. Bouajjani, A., Habermehl, P., Moro, P., Vojnar, T.: Verifying programs with dynamic 1-selector-linked list structures in regular model checking. In: Halbwachs, N., Zuck, L.D. (eds.) TACAS 2005. LNCS, vol. 3440, pp. 13­29. Springer, Heidelberg (2005) 5. Bouajjani, A., Habermehl, P., Rogalewicz, A., Vojnar, T.: Abstract regular tree model checking of complex dynamic data structures. In: Yi, K. (ed.) SAS 2006. LNCS, vol. 4134, pp. 52­70. Springer, Heidelberg (2006) 6. Bozga, M., Iosif, R., Lakhnech, Y.: On logics of aliasing. In: Giacobazzi, R. (ed.) SAS 2004. LNCS, vol. 3148, pp. 344­360. Springer, Heidelberg (2004) 7. Chase, D.R., Wegman, M., Zadeck, F.K.: Analysis of pointers and structures. In: PLDI 1990, pp. 296­310. ACM Press, New York (1990) 8. Dams, D., Namjoshi, K.S.: Shape Analysis through Predicate Abstraction and Model Checking. In: Zuck, L.D., Attie, P.C., Cortesi, A., Mukhopadhyay, S. (eds.) VMCAI 2003. LNCS, vol. 2575, pp. 310­323. Springer, Heidelberg (2002) 9. Das, S., Dill, D.L.: Successive approximation of abstract transition relations. In: LICS 2001, pp. 51­58. IEEE Computer Society Press, Los Alamitos (2001) 10. Das, S., Dill, D.L., Park, S.: Experience with Predicate Abstraction. In: Halbwachs, N., Peled, D.A. (eds.) CAV 1999. LNCS, vol. 1633, pp. 160­171. Springer, Heidelberg (1999) 11. Distefano, D., Katoen, J.-P., Rensink, A.: Safety and liveness in concurrent pointer programs. In: de Boer, F.S., Bonsangue, M.M., Graf, S., de Roever, W.-P. (eds.) FMCO 2005. LNCS, vol. 4111, pp. 280­312. Springer, Heidelberg (2006) 12. Esparza, J.: On the decidability of model checking for several -calculi and Petri nets. In: Tison, S. (ed.) CAAP 1994. LNCS, vol. 787, pp. 115­129. Springer, Heidelberg (1994) 13. Gotsman, A., Berdine, J., Cook, B., Sagiv, M.: Thread-modular shape analysis. In: PLDI 2007, pp. 266­277. ACM Press, New York (2007) Verifying Dynamic Pointer-Manipulating Threads 99 14. Havelund, K., Rosu, G.: Testing linear temporal logic formulae on finite execution traces. Technical Report TR 01-08, RIACS (2001) 15. Katoen, J.-P., Noll, T., Rieger, S.: Verifying concurrent list-manipulating programs by LTL model checking. Technical Report 2007-06, RWTH Aachen University, Dept. of Computer Science, Germany (April 2007) 16. Lahiri, S.K., Qadeer, S.: Verifying properties of well-founded linked lists. In: POPL 2006, pp. 115­126. ACM Press, New York (2006) 17. Lambert, J.L.: A structure to decide reachability in Petri nets. Theor. Comput. Sci. 99(1), 79­104 (1992) 18. Lev-Ami, T., Immerman, N., Reps, T.W., Sagiv, S., Srivastava, S., Yorsh, G.: Simulating Reachability Using First-Order Logic with Applications to Verification of Linked Data Structures. In: Nieuwenhuis, R. (ed.) CADE 2005. LNCS (LNAI), vol. 3632, pp. 99­115. Springer, Heidelberg (2005) 19. Manevich, R., Berdine, J., Cook, B., Ramalingam, G., Sagiv, M.: Shape analysis by graph decomposition. In: Grumberg, O., Huth, M. (eds.) TACAS 2007. LNCS, vol. 4424, pp. 3­18. Springer, Heidelberg (2007) 20. Manevich, R., Yahav, E., Ramalingam, G., Sagiv, M.: Predicate abstraction and canonical abstraction for singly-linked lists. In: Cousot, R. (ed.) VMCAI 2005. LNCS, vol. 3385, pp. 181­198. Springer, Heidelberg (2005) 21. Nystrom, E.M., Kim, H.-S., Hwu, W.W.: Bottom-up and top-down contextsensitive summary-based pointer analysis. In: Giacobazzi, R. (ed.) SAS 2004. LNCS, vol. 3148, pp. 165­180. Springer, Heidelberg (2004) 22. O'Hearn, P.W., Yang, H., Reynolds, J.C.: Separation and information hiding. In: POPL 2004, pp. 268­280. ACM Press, New York (2004) 23. Podelski, A., Wies, T.: Boolean Heaps. In: Hankin, C., Siveroni, I. (eds.) SAS 2005. LNCS, vol. 3672, pp. 268­283. Springer, Heidelberg (2005) 24. Reynolds, J.C.: Separation logic: A logic for shared mutable data structures. In: LICS 2002, pp. 55­74. IEEE Computer Society Press, Los Alamitos (2002) 25. Rugina, R., Rinard, M.: Pointer analysis for multithreaded programs. SIGPLAN Not. 34(5), 77­90 (1999) 26. Sagiv, M., Reps, T., Wilhelm, R.: Solving shape-analysis problems in languages with destructive updating. ACM Trans. Program. Lang. Syst. 20(1), 1­50 (1998) 27. Sagiv, M., Reps, T., Wilhelm, R.: Parametric shape analysis via 3-valued logic. ACM Trans. Program. Lang. Syst. 24(3), 217­298 (2002) 28. Salcianu, A., Rinard, M.: Pointer and escape analysis for multithreaded programs. In: PPoPP 2001, pp. 12­23. ACM Press, New York (2001) 29. Vechev, M.T., Yahav, E., Bacon, D.F.: Correctness-preserving derivation of concurrent garbage collection algorithms. In: PLDI 2006, pp. 341­353. ACM Press, New York (2006) 30. Yahav, E.: Verifying safety properties of concurrent Java programs using 3-valued logic. ACM SIGPLAN Notices 36(3), 27­40 (2001) 31. Yahav, E., Reps, T., Sagiv, M., Wilhelm, R.: Verifying Temporal Heap Properties Specified via Evolution Logic. In: Degano, P. (ed.) ESOP 2003 and ETAPS 2003. LNCS, vol. 2618, pp. 204­222. Springer, Heidelberg (2003) 32. Yen, H.-C.: A unified approach for deciding the existence of certain Petri net paths. Inf. Comput. 96(1), 119­137 (1992) 33. Yong, S.H., Horwitz, S.: Pointer-range analysis. In: Giacobazzi, R. (ed.) SAS 2004. LNCS, vol. 3148, pp. 133­148. Springer, Heidelberg (2004) 34. Zhu, J., Calman, S.: Symbolic pointer analysis revisited. In: PLDI 2004, pp. 145­ 157. ACM Press, New York (2004)