Automata theory An algorithmic approach 0 Lecture Notes Javier Esparza July 20, 2012 2 Chapter 9 Automata and Logic A regular expression can be seen as a set of instructions ( a ‘recipe’) for generating the words of a language. For instance, the expression aa(a + b)∗b can be interpreted as “write two a’s, repeatedly write a or b an arbitrary number of times, and then write a b”. We say that regular expressions are an operational description language. Languages can also be described in declarative style, as the set of words that satisfy a property. For instance, “the words over {a, b} containing an even number of a’s and an even number of b’s” is a declarative description. A language may have a simple declarative description and a complicated operational description as a regular expression. For instance, the regular expression (aa + bb + (ab + ba)(aa + bb)∗ (ba + ab))∗ is a natural operational description of the language above, and it is arguably less intuitive than the declarative one. This becomes even more clear if we consider the language of the words over {a, b, c} containing an even number of a’s, of b’s, and of c’s. In this chapter we present a logical formalism for the declarative description of regular languages. We use logical formulas to describe properties of words, and logical operators to construct complex properties out of simpler ones. We then show how to automatically translate a formula describing a property of words into an automaton recognizing the words satisfying the property. As a consequence, we obtain an algorithm to convert declarative into operational descriptions, and vice versa. 9.1 First-Order Logic on Words In declarative style, a language is defined by its membership predicate, i.e., the property that words must satisfy in order to belong to it. Predicate logic is the standard language to express membership predicates. Starting from some natural, “atomic” predicates, more complex ones can be constructed through boolean combinations and quantification. We introduce atomic predicates Qa(x), where a is a letter, and x ranges over the positions of the word. The intended meaning is “the letter at 159 160 CHAPTER 9. AUTOMATA AND LOGIC position x is an a.” For instance, the property “all letters are as” is formalized by the formula ∀x Qa(x). In order to express relations between positions we add to the syntax the predicate x < y, with intended meaning “position x is smaller than (i.e., lies to the left of) position y”. For example, the property “if the letter at a position is an a, then all letters to the right of this position are also as” is formalized by the formula ∀x∀y ((Qa(x) ∧ x < y) → Qa(y)) . Definition 9.1 Let V = {x, y, z, . . .} be an infinite set of variables, and let Σ = {a, b, c, . . .} be a finite alphabet. The set FO(Σ) of first-order formulas over Σ is the set of expressions generated by the grammar: ϕ := Qa(x) | x < y | ¬ϕ | (ϕ ∨ ϕ) | ∃x ϕ . As usual, variables within the scope of an existential quantifier are bounded, and otherwise free. A formula without free variables is a sentence. Sentences of FO(Σ) are interpreted on words over Σ. For instance, ∀x Qa(x) is true for the word aa, but false for word ab. Formulas with free variables cannot be interpreted on words alone: it does not make sense to ask whether Qa(x) holds for the word ab or not. A formula with free variables is interpreted over a pair (w, I), where I assigns to each free variable (and perhaps to others) a position in the word. For instance, Qa(x) is true for the pair (ab, x → 1), because the letter at position 1 of ab is a, but false for (ab, x → 2). Definition 9.2 An interpretation of a formula ϕ of FO(Σ) is a pair (w, I) where w ∈ Σ∗ and I is a mapping that assigns to every free variable x a position I(x) ∈ {1, . . . , |w|} (the mapping may also assign positions to other variables). Notice that if ϕ is a sentence then a pair (w, E), where E is the empty mapping that does not assign any position to any variable, is an interpretation of ϕ. Instead of (w, E) we write simply w. We now formally define when an interpretation satisfies a formula. Given a word w and a number k, let w[k] denote the letter of w at position k. Definition 9.3 The satisfaction relation (w, I) |= ϕ between a formula ϕ of FO(Σ) and an interpretation (w, I) of ϕ is defined by: (w, I) |= Qa(x) iff w[I(x)] = a (w, I) |= x < y iff I(x) < I(y) (w, I) |= ¬ϕ iff (w, I) |= ϕ (w, I) |= ϕ1 ∨ ϕ2 iff (w, I) |= ϕ1 or (w, I) |= ϕ2 (w, I) |= ∃x ϕ iff |w| ≥ 1 and some i ∈ {1, . . . , |w|} satisfies (w, I[i/x]) |= ϕ where w[i] is the letter of w at position i, and I[i/x] is the mapping that assigns i to x and otherwise coincides with I. (Notice that I may not assign any value to x.) If (w, I) |= ϕ we say that (w, I) is a model of ϕ. Two formulas are equivalent if they have the same models. 9.1. FIRST-ORDER LOGIC ON WORDS 161 It follows easily from this definition that if two interpretations (w, I1) and (w, I2) of ϕ differ only in the positions assigned by I1 and I2 to bounded variables, then either both interpretations are models of ϕ, or none of them is. In particular, whether an interpretation (w, I) of a sentence is a model or not depends only on w, not on I. We use some standard abbreviations: ∀x ϕ := ¬∃ x¬ϕ ϕ1 ∧ ϕ2 := ¬ (¬ϕ1 ∨ ¬ϕ2) ϕ1 → ϕ2 := ¬ϕ1 ∨ ϕ2 Notice that according to the definition of the satisfaction relation the empty word satisfies no formulas of the form ∃x ϕ, and all formulas of the form ∀x ϕ. While this causes no problems for our purposes, it is worth noticing that in other contexts it may lead to complications. For instance, the formulas ∃x Qa(x) and ∀y∃x Qa(x) do not hold for exactly the same words, because the empty word satisfies the second, but not the first. Further useful abbreviations are: first(x) := ¬∃y y < x “x is the first position” last(x) := ¬∃y x < y “x is the last position” y = x + 1 := x < y ∧ ¬∃ z(x < z ∧ z < y) “y is the successor position of x” y = x + 2 := ∃ z(z = x + 1 ∧ y = z + 1) y = x + (k + 1) := ∃ z(z = x + k ∧ y = z + 1) Example 9.4 Some examples of properties expressible in the logic: • “The last letter is a b and before it there are only a’s.” ∃x Qb(x) ∧ ∀x (last(x) → Qb(x) ∧ ¬last(x) → Qa(x)) • “Every a is immediately followed by a b.” ∀x (Qa(x) → ∃y (y = x + 1 ∧ Qb(y))) • “Every a is immediately followed by a b, unless it is the last letter.” ∀x (Qa(x) → ∀y (y = x + 1 → Qb(y))) • “Between every a and every later b there is a c.” ∀x∀y (Qa(x) ∧ Qb(y) ∧ x < y → ∃z (x < z ∧ z < y ∧ Qc(z))) 162 CHAPTER 9. AUTOMATA AND LOGIC 9.1.1 Expressive power of FO(Σ) Once we have defined which words satisfy a sentence, we can associate to a sentence the set of words satisfying it. Definition 9.5 The language L(ϕ) of a sentence ϕ ∈ FO(Σ) is the set L(ϕ) = {w ∈ Σ∗ | w |= φ}. We also say that ϕ expresses L(ϕ). A language L ⊆ Σ∗ is FO-definable if L = L(ϕ) for some formula ϕ of FO(Σ). The languages of the properties in the example are FO-definable by definition. To get an idea of the expressive power of FO(Σ), we prove a theorem characterizing the FO-definable languages in the case of a 1-letter alphabet Σ = {a}. In this simple case we only have one predicate Qa(x), which is always true in every interpretation. So every formula is equivalent to a formula without any occurrence of Qa(x). For example, the formula ∃y (Qa(y) ∧ y < x) is equivalent to ∃y y < x. We prove that a language over a one-letter alphabet is FO-definable if and only if it is finite or co-finite, where a language is co-finite if its complement is finite. So, for instance, even a simple language like {an | n is even } is not FO-definable. The plan of the proof is as follows. First, we define the quantifier-free fragment of FO({a}), denoted by QF; then we show that 1-letter languages are QF-definable iff they are finite or co-finite; finally, we prove that 1-letter languages are FO-definable iff they are QF-definable. For the definition of QF we need some more macros whose intended meaning should be easy to guess: x + k < y := ∃z (z = x + k ∧ z < y) x < y + k := ∃z (z = y + k ∧ x < z) k < last := ∀x (last(x) → x > k) In these macros k is a constant, that is, k < last standa for the infinite family of macros 1 < last, 2 < last, 3 < last . . .. Macros like k > x or x + k > y are defined similarly. Definition 9.6 The logic QF (for quantifier-free) is the fragment of FO({a}) with syntax f := x ≈ k | x ≈ y + k | k ≈ last | f1 ∨ f2 | f1 ∧ f2 where ≈ ∈ {<, >} and k ∈ N. Proposition 9.7 A language over a 1-letter alphabet is QF-definable iff it is finite or co-finite. Proof: (⇒): Let f be a sentence of QF. Since QF does not have quantifiers, f does not contain any occurrence of a variable, and so it is a positive (i.e., negation-free) boolean combination of formulas of the form k < last or k > last. We proceed by induction on the structure of f. If f = k < last, then L(ϕ) is co-finite, and if f = k > last, then L(ϕ) is finite. If f = f1 ∨ f2, then by induction hypothesis L(f1) and L(f2) are finite or co-finite; if L(f1) and L( f2) are finite, then so is L(f), and otherwise L(f) is co-finite. The case f = f1 ∧ f2 is similar. 9.2. MONADIC SECOND-ORDER LOGIC ON WORDS 163 (⇐): A finite language {ak1 , . . . , akn } is expressed by the formula (last > k1 − 1 ∧ last < k1 + 1) ∨ . . . ∨ (last > k1 − 1 ∧ last < k1 + 1). To express a co-finite language, it suffices to show that for every formula f of QF expressing a language L, there is another formula f expressing the language L. This is easily proved by induction on the structure of the formula. Theorem 9.8 Every formula ϕ of FO({a}) is equivalent to a formula f of QF. Proof: Sketch. By induction on the structure of ϕ. If ϕ(x, y) = x < y, then ϕ ≡ y < x + 0. If ϕ = ¬ψ, the result follows from the induction hypothesis and the fact that negations can be removed using De Morgan’s rules and equivalences like ¬(x < y + k) ≡ x ≥ y + k. If ϕ = ϕ1 ∨ ϕ2, the result follows directly from the induction hypothesis. Consider now the case ϕ = ∃x ψ. By induction hypothesis, ψ is equivalent to a formula f of QF, and we can assume that f is in disjunctive normal form, say f = D1 ∨ . . . ∨ Dn. Then ϕ ≡ ∃x D1 ∨ ∃x D2 ∨ . . . ∨ ∃x Dn, and so it suffices to find a formula fi of QF equivalent to ∃x Di. The formula fi is a conjunction of formulas containing all conjuncts of Di with no occurrence of x, plus other conjuncts obtained as follows. For every lower bound x < t1 of Di, where t1 = k1 or t1 = x1 + k1, and every upper bound of the form x > t2, where t2 = k1 or t2 = x1 + k1 we add to fi a conjunct equivalent to t2 + 1 < t1. For instance, y + 7 < x and x < z + 3 we add y + 5 < z. It is easy to see that fi ≡ ∃x Di. Corollary 9.9 The language Even = {a2n | n ≥ 0} is not first-order expressible. These results show that first-order logic cannot express all regular languages, not even over a 1-letter alphabet. For this reason we now introduce monadic second-order logic. 9.2 Monadic Second-Order Logic on Words Monadic second-order logic extends first-order logic with variables X, Y, Z, . . . ranging over sets of positions, and with predicates x ∈ X, meaning “position x belongs to the set X. 1 It is allowed to quantify over both kinds of variables. Before giving a formal definition, let us informally see how this extension allows to describe the language Even. The formula states that the last position belongs to the set of even positions. A position belongs to this set iff it is the second position, or the second successor of another position in the set. The following formula states that X is the set of even positions: second(x) := ∃y (first(y) ∧ x = y + 1) Even(X) := ∀x (x ∈ X ↔ (second(x) ∨ ∃y (x = y + 2 ∧ y ∈ X))) For the complete formula, we observe that the word has even length if its last position is even: EvenLength := ∃X (Even(X) ∧ ∀x (last(x) → x ∈ X) ) 1 More generally, second-order logic allows for variables ranging over relations of arbitrary arity. The monadic fragment only allows arity 1, which corresponds to sets. 164 CHAPTER 9. AUTOMATA AND LOGIC We now define the formal syntax and semantics of the logic. Definition 9.10 Let X1 = {x, y, z, . . .} and X2 = {X, Y, Z, . . .} be two infinite sets of first-order and second-order variables. Let Σ = {a, b, c, . . .} be a finite alphabet. The set MSO(Σ) of monadic second-order formulas over Σ is the set of expressions generated by the grammar: ϕ := Qa(x) | x < y | x ∈ X | ¬ϕ | ϕ ∨ ϕ | ∃x ϕ | ∃X ϕ An interpretation of a formula ϕ is a pair (w, I) where w ∈ Σ∗, and I is a mapping that assigns every free first-order variable x a position I(x) ∈ {1, . . . , |w|} and every free second-order variable X a set of positions I(X) ⊆ {1, . . . , |w|}. (The mapping may also assign positions to other variables.) The satisfaction relation (w, I) |= ϕ between a formula ϕ of MSO(Σ) and an interpretation (w, I) of ϕ is defined as for FO(Σ), with the following additions: (w, I) |= x ∈ X iff I(x) ∈ I(X) (w, I) |= ∃X ϕ iff |w| > 0 and some S ⊆ {1, . . . , |w|} satisfies (w, I[S/X]) |= ϕ where I[S/X] is the interpretation that assigns S to X and otherwise coincides with I — whether I is defined for X or not. If (w, I) |= ϕ we say that (w, I) is a model of ϕ. Two formulas are equivalent if they have the same models. The language L(ϕ) of a sentence ϕ ∈ MSO(Σ) is the set L(ϕ) = {w ∈ Σ∗ | w |= φ}. A language L ⊆ Σ∗ is MSO-definable if L = L(ϕ) for some formula ϕ ∈ MSO(Σ). Notice that in this definition the set S may be empty. So, for instance, ay interpretation that assigns the empty set to X is a model of the formula ∃X ∀x ¬(x ∈ X). We use the standard abbreviations ∀x ∈ X ϕ := ∀x (x ∈ X → ϕ) ∃x ∈ X ϕ := ∃x (x ∈ X ∧ ϕ) 9.2.1 Expressive power of MSO(Σ) We show that the languages expressible in monadic second-order logic are exactly the regular languages. We start with an example. Example 9.11 Let Σ = {a, b, c, d}. We construct a formula of MSO(Σ) expressing the regular language c∗(ab)∗d∗. The membership predicate of the language can be informally formulated as follows: There is a block of consecutive positions X such that: before X there are only c’s; after X there are only d’s; in X b’s and a’s alternate; the first letter in X is an a and the last letter is a b. The predicate is a conjunction of predicates. We give formulas for each of them. 9.2. MONADIC SECOND-ORDER LOGIC ON WORDS 165 • “X is a block of consecutive positions.” Cons(X) := ∀x ∈ X ∀y ∈ X (x < y → (∀z (x < z ∧ z < y) → z ∈ X)) • “x lies before/after X.” Before(x, X) := ∀y ∈ X x < y After(x, X) := ∀y ∈ X y < x • “Before X there are only c’s.” Before only c(X) := ∀x Before(x, X) → Qc(x) • “After X there are only d’s.” After only d(X) := ∀x After(x, X) → Qd(x) • “a’s and b’s alternate in X.” Alternate(X) := ∀x ∈ X ( Qa(x) → ∀y ∈ X (y = x + 1 → Qb(y) ) ∧ Qb(x) → ∀y ∈ X (y = x + 1 → Qa(y) ) ) • ”The first letter in X is an a and the last is a b.” First a(X) := ∀x ∈ X ∀y (y < x → ¬y ∈ X) → Qa(x) Last b(X) := ∀x ∈ X ∀y (y > x → ¬y ∈ X) → Qa(x) Putting everything together, we get the formula ∃X( Cons(X) ∧ Before only c(X) ∧ After only d(X) ∧ Alternate(X) ∧ First a(X) ∧ Last b(X) ) Notice that the empty word is a model of the formula. because the empty set of positions satisfies all the conjuncts. Let us now directly prove one direction of the result. Proposition 9.12 If L ⊆ Σ∗ is regular, then L is expressible in MSO(Σ). 166 CHAPTER 9. AUTOMATA AND LOGIC Proof: Let A = (Q, Σ, δ, q0, F) be a DFA with Q = {q0, . . . , qn} and L(A) = L. We construct a formula ϕA such that for every w , w |= ϕA iff w ∈ L(A). If ∈ L(A), then we can extend the formula to ϕA ∨ ϕA, where ϕA is only satisfied by the empty word (e.g. ϕA = ∀x x < x). We start with some notations. Let w = a1 . . . am be a word over Σ, and let Pq = i ∈ {1, . . . , m} | ˆδ(q0, a0 . . . ai) = q . In words, i ∈ Pq iff A is in state q immediately after reading the letter ai. Then A accepts w iff m ∈ q∈F Pq. Assume we were able to construct a formula Visits(X0, . . . Xn) with free variables X0, . . . Xn such that I(Xi) = Pqi holds for every model (w, I) and for every 0 ≤ i ≤ n. In words, Visits(X0, . . . Xn) is only true when Xi takes the value Pqi for every 0 ≤ i ≤ n. Then (w, I) would be a model of ψA := ∃X0 . . . ∃Xn Visits(X0, . . . Xn) ∧ ∃x  last(x) ∧ qi∈F x ∈ Xi   iff w has a last letter, and w ∈ L. So we could take ϕA := ψA if q0 F ψA ∨ ∀x x < x if q0 ∈ F Let us now construct the formula Visits(X0, . . . Xn). The sets Pq are the unique sets satisfying the following properties: (a) 1 ∈ Pδ(q0,a1), i.e., after reading the letter at position 1 the DFA is in state δ(q0, a1); (b) every position i belongs to exactly one Pq, i.e., the Pq’s build a partition of the set positions; and (c) if i ∈ Pq and δ(q, ai+1) = q then i + 1 ∈ Pq , i.e., the Pq’s “respect” the transition function δ. We express these properties through formulas. For every a ∈ Σ, let qia = δ(q0, a). The formula for (a) is: Init(X0, . . . , Xn) = ∃x  first(x) ∧   a∈Σ (Qa(x) ∧ x ∈ Xia )     (in words: if the letter at position 1 is a, then the position belongs to Xia ). Formula for (b): Partition(X0, . . . , Xn) = ∀x   n i=0 x ∈ Xi ∧ n i, j = 0 i j (x ∈ Xi → x Xj)   9.2. MONADIC SECOND-ORDER LOGIC ON WORDS 167 Formula for (c): Respect(X0, . . . , Xn) = ∀x∀y   y = x + 1 → a ∈ Σ i, j ∈ {0, . . . , n} δ(qi, a) = qj (x ∈ Xi ∧ Qa(x) ∧ y ∈ Xj)   Altogether we get Visits(X0, . . . Xn) := Init(X0, . . . , Xn) ∧ Partition(X0, . . . , Xn) ∧ Respect(X0, . . . , Xn) It remains to prove that MSO-definable languages are regular. Given a sentence ϕ ∈ MSO(Σ) show that L(ϕ) is regular by induction on the structure of ϕ. However, since the subformulas of a sentence are not necessarily sentences, the language defined by the subformulas of ϕ is not defined. We correct this. Recall that the interpretations of a formula are pairs (w, I) where I assigns positions to the free first-order variables and sets of positions to the free second-order variables. For example, if Σ = {a, b} and if the free first-order and second-order variables of the formula are x, y and X, Y, respectively, then two possible interpretations are   aab , x → 1 y → 3 X → {2, 3} Y → {1, 2}     ba , x → 2 y → 1 X → ∅ Y → {1}   Given an interpretation (w, I), we can encode each assignment x → k or X → {k1, . . . , kl} as a bitstring of the same length as w: the string for x → k contains exactly a 1 at position k, and 0’s everywhere else; the string for X → {k1, . . . , kl} contains 1’s at positions k1, . . . , kl, and 0’s everywhere else. After fixing an order on the variables, an interpretation (w, I) can then be encoded as a tuple (w, w1, . . . , wn), where n is the number of variables, w ∈ Σ∗, and w1, . . . , wn ∈ {0, 1}∗. Since all of w, w1, . . . , wn have the same length, we can as in the case of transducers look at (w, w1, . . . , wn) as a word over the alphabet Σ × {0, 1}n. For the two interpretations above we get the encodings x y X Y a 1 0 0 1 a 0 0 1 1 b 0 1 1 0 and x y X Y b 0 1 0 1 a 1 0 0 0 corresponding to the words 168 CHAPTER 9. AUTOMATA AND LOGIC   a 1 0 0 1     a 0 0 1 1     b 0 1 1 0   and   b 0 1 0 1     a 1 0 0 0   over Σ × {0, 1}4 Definition 9.13 Let ϕ be a formula with n free variables, and let (w, I) be an interpretation of ϕ. We denote by enc(w, I) the word over the alphabet Σ × {0, 1}n described above. The language of ϕ is L(ϕ) = {enc(w, I) | (w, I) |= ϕ}. Now that we have associated to every formula ϕ a language (whose alphabet depends on the free variables), we prove by induction on the structure of ϕ that L(ϕ) is regular. We do so by exhibiting automata (actually, transducers) accepting L(ϕ). For simplicity we assume Σ = {a, b}, and denote by free(ϕ) the set of free variables of ϕ. • ϕ = Qa(x). Then free(ϕ) = x, and the interpretations of ϕ are encoded as words over Σ×{0, 1}. The language L(ϕ) is given by L(ϕ) =    a1 b1 . . . . . . ak bk k ≥ 0, ai ∈ Σ and bi ∈ {0, 1} for every i ∈ {1, . . . , k}, and bi = 1 for exactly one index i ∈ {1, . . . , k} such that ai = a    and is recognized by a 0 , b 0 a 0 , b 0 a 1 • ϕ = x < y. Then free(ϕ) = {x, y}, and the interpretations of φ are encoded as words over Σ × {0, 1}2. The language L(ϕ) is given by L(ϕ) =      a1 b1 c1   . . . . . .   ak bk ck   k ≥ 0, ai ∈ Σ and bi, ci ∈ {0, 1} for every i ∈ {1, . . . , k}, bi = 1 for exactly one index i ∈ {1, . . . , k}, cj = 1 for exactly one index j ∈ {1, . . . , k}, and i < j    and is recognized by 9.2. MONADIC SECOND-ORDER LOGIC ON WORDS 169   a 0 1     b 0 1     a 0 0     b 0 0     a 0 0   ,   b 0 0     a 0 0     b 0 0     a 1 0     b 1 0   • ϕ = x ∈ X. Then free(ϕ) = {x, X}, and interpretations are encoded as words over Σ × {0, 1}2. The language L(ϕ) is given by L(ϕ) =      a1 b1 c1   . . . . . .   ak bk ck   k ≥ 0, ai ∈ Σ and bi, ci ∈ {0, 1} for every i ∈ {1, . . . , k}, bi = 1 for exactly one index i ∈ {1, . . . , k}, and for every i ∈ {1, . . . , k}, if bi = 1 then ci = 1    and is recognized by   a 0 0   ,   b 0 0   ,   a 0 1   ,   b 0 1     a 0 0   ,   b 0 0   ,   a 0 1   ,   b 0 1     a 1 1   ,   b 1 1   • ϕ = ¬ψ. Then free(ϕ) = free(ψ), and by induction hypothesis there exists an automaton Aψ s.t. L(Aψ) = L(ψ). Observe that L(ϕ) is not in general equal to L(ψ). To see why, consider for example the case ψ = Qa(x) and ϕ = ¬Qa(x). The word a 1 a 1 a 1 belongs neither to L(ψ) nor L(ϕ), because it is not the encoding of any interpretation: the bitstring for x contains more than one 1. What holds is L(ϕ) = L(ψ) ∩ Enc(ψ), where Enc(ψ) is the language of the encodings of all the interpretations of ψ (whether they are models of ψ or not). We construct an automaton Aenc ψ recognizing Enc(ψ), and so we can take Aϕ = Aψ ∩ Aenc ψ . Assume ψ has k first-order variables. Then a word belongs to Enc(ψ) iff each of its projections onto the 2nd, 3rd, ..., (k + 1)-th component is a bitstring containing exactly one 1. As states of Aenc ψ we take all the strings {0, 1}k. The intended meaning of a state, say state 101 for the case k = 3, is “the automaton has already read the 1’s in the bitstrings of the first and third variables, but not yet read the 1 in the second.” The initial and final states are 0k and 1k, respectively. The transitions are defined according to the intended meaning of the states. For instance, the automaton Aenc x