Leonid Libkin Elements of Finite Model Theory With 24 Figures February 7, 2012 Springer Berlin Heidelberg NewYork Hong Kong London Milan Paris Tokyo To Helen, Daniel, and Victoria Preface Finite model theory is an area of mathematical logic that grew out of computer science applications. The main sources of motivational examples for finite model theory are found in database theory, computational complexity, and formal languages, although in recent years connections with other areas, such as formal methods and verification, and artificial intelligence, have been discovered. The birth of finite model theory is often identified with Trakhtenbrot’s result from 1950 stating that validity over finite models is not recursively enumerable; in other words, completeness fails over finite models. The technique of the proof, based on encoding Turing machine computations as finite structures, was reused by Fagin almost a quarter century later to prove his celebrated result that put the equality sign between the class NP and existential second-order logic, thereby providing a machine-independent characterization of an important complexity class. In 1982, Immerman and Vardi showed that over ordered structures, a fixed point extension of first-order logic captures the complexity class Ptime of polynomial time computable properties. Shortly thereafter, logical characterizations of other important complexity classes were obtained. This line of work is often referred to as descriptive complexity. A different line of finite model theory research is associated with the development of relational databases. By the late 1970s, the relational database model had replaced others, and all the basic query languages for it were essentially first-order predicate calculus or its minor extensions. In 1974, Fagin showed that first-order logic cannot express the transitive closure query over finite relations. In 1979, Aho and Ullman rediscovered this result and brought it to the attention of the computer science community. Following this, Chandra and Harel proposed a fixed-point extension of first-order logic on finite relational structures as a query language capable of expressing queries such as the transitive closure. Logics over finite models have become the standard starting point for developing database query languages, and finite model theory techniques are used for proving results about their expressiveness and complexity. VIII Preface Yet another line of work on logics over finite models originated with B¨uchi’s work from the early 1960s: he showed that regular languages are precisely those definable in monadic second-order logic over strings. This line of work is the automata-theoretic counterpart of descriptive complexity: instead of logical characterizations of time/space restrictions of Turing machines, one provides such characterizations for weaker devices, such as automata. More recently, connections between database query languages and automata have been explored too, as the field of databases started moving away from relations to more complex data models. In general, finite model theory studies the behavior of logics on finite structures. The reason this is a separate subject, and not a tiny chapter in classical model theory, is that most standard model-theoretic tools (most notably, compactness) fail over finite models. 
Over the past 25–30 years, many tools have been developed to study logics over finite structures, and these tools helped answer many questions about complexity theory, databases, formal languages, etc. This book is an introduction to finite model theory, geared towards theoretical computer scientists. It grew out of my finite model theory course, taught to computer science graduate students at the University of Toronto. While teaching that course, I realized that there is no single source that covers all the main areas of finite model theory, and yet is suitable for computer science students. There are a number of excellent books on the subject. Finite Model Theory by Ebbinghaus and Flum was the first standard reference and heavily influenced the development of the field, but it is a book written for mathematicians, not computer scientists. There is also a nice set of notes by V¨a¨an¨anen, available on the web. Immerman’s Descriptive Complexity deals extensively with complexity-theoretic aspects of finite model theory, but does not address other applications. Foundations of Databases by Abiteboul, Hull, and Vianu covers many database applications, and Thomas’s chapter “Languages, automata, and logic” in the Handbook of Formal Languages describes connections between logic and formal languages. Given the absence of a single source for all the subjects, I decided to write course notes, which eventually became this book. The reader is assumed to have only the most basic computer science and logic background: some discrete mathematics, theory of computation, complexity, propositional and predicate logic. The book also includes a background chapter, covering logic, computability theory, and computational complexity. In general, the book should be accessible to senior undergraduate students in computer science. A note on exercises: there are three kinds of these. Some are the usual exercises that the reader should be able to do easily after reading each chapter. If I indicate that an exercise comes from a paper, it means that its level could range from moderately to extremely difficult: depending on the exact level, such an “exercise” could be a question on a take-home exam, or even a course Preface IX project, whose main goal is to understand the paper where the result is proven. Such exercises also gave me the opportunity to mention a number of interesting results that otherwise could not have been included in the book. There are also exercises marked with an asterisk: for these, I do not know solutions. It gives me the great pleasure to thank my colleagues and students for their help. I received many comments from Marcelo Arenas, Pablo Barcel´o, Michael Benedikt, Ari Brodsky, Anuj Dawar, Ron Fagin, Arthur Fischer, Lauri Hella, Christoph Koch, Janos Makowsky, Frank Neven, Juha Nurmonen, Ben Rossman, Luc Segoufin, Thomas Schwentick, Jan Van den Bussche, Victor Vianu, and Igor Walukiewicz. Ron Fagin, as well as Yuri Gurevich, Alexander Livchak, Michael Taitslin, and Vladimir Sazonov, were also very helpful with historical comments. I taught two courses based on this book, and students in both classes provided very useful feedback; in addition to those I already thanked, I would like to acknowledge Antonina Kolokolova, Shiva Nejati, Ken Pu, Joseph Rideout, Mehrdad Sabetzadeh, Ramona Truta, and Zheng Zhang. Despite their great effort, mistakes undoubtedly remain in the book; if you find one, please let me know. My email is libkin@cs.toronto.edu. 
Many people in the finite model theory community influenced my view of the field; it is impossible to thank them all, but I want to mention Scott Weinstein, from whom I learned finite model theory, and immediately became fascinated with the subject. Finally, I thank Ingeborg Mayer, Alfred Hofmann, and Frank Holzwarth at Springer-Verlag for editorial assistance, and Denis Th´erien for providing ideal conditions for the final proofreading of the book. This book is dedicated to my wife, Helen, and my son, Daniel. Daniel was born one week after I finished teaching a finite model theory course in Toronto, and after several sleepless nights I decided that perhaps writing a book is the type of activity that goes well with the lack of sleep. By the time I was writing Chap. 6, Daniel had started sleeping through the night, but at that point it was too late to turn back. And without Helen’s help and support I certainly would not have finished this book in only two years. Toronto, Ontario, Canada May 2004 Leonid Libkin Contents 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 A Database Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 An Example from Complexity Theory . . . . . . . . . . . . . . . . . . . . . . 4 1.3 An Example from Formal Language Theory. . . . . . . . . . . . . . . . . 6 1.4 An Overview of the Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.1 Background from Mathematical Logic . . . . . . . . . . . . . . . . . . . . . . 13 2.2 Background from Automata and Computability Theory . . . . . . 17 2.3 Background from Complexity Theory . . . . . . . . . . . . . . . . . . . . . . 19 2.4 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3 Ehrenfeucht-Fra¨ıss´e Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.1 First Inexpressibility Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.2 Definition and Examples of Ehrenfeucht-Fra¨ıss´e Games . . . . . . . 26 3.3 Games and the Expressive Power of FO . . . . . . . . . . . . . . . . . . . . 32 3.4 Rank-k Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 3.5 Proof of the Ehrenfeucht-Fra¨ıss´e Theorem . . . . . . . . . . . . . . . . . . 35 3.6 More Inexpressibility Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 3.7 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 3.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 4 Locality and Winning Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.1 Neighborhoods, Hanf-locality, and Gaifman-locality . . . . . . . . . . 45 4.2 Combinatorics of Neighborhoods . . . . . . . . . . . . . . . . . . . . . . . . . . 49 4.3 Locality of FO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 4.4 Structures of Small Degree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 4.5 Locality of FO Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4.6 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . 62 4.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 XII Contents 5 Ordered Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 5.1 Invariant Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 5.2 The Power of Order-invariant FO . . . . . . . . . . . . . . . . . . . . . . . . . . 69 5.3 Locality of Order-invariant FO . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 5.4 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 5.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 6 Complexity of First-Order Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 6.1 Data, Expression, and Combined Complexity . . . . . . . . . . . . . . . 87 6.2 Circuits and FO Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 6.3 Expressive Power with Arbitrary Predicates. . . . . . . . . . . . . . . . . 93 6.4 Uniformity and AC0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 6.5 Combined Complexity of FO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 6.6 Parametric Complexity and Locality . . . . . . . . . . . . . . . . . . . . . . . 99 6.7 Conjunctive Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 6.8 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 6.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 7 Monadic Second-Order Logic and Automata . . . . . . . . . . . . . . . 113 7.1 Second-Order Logic and Its Fragments . . . . . . . . . . . . . . . . . . . . . 113 7.2 MSO Games and Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 7.3 Existential and Universal MSO on Graphs . . . . . . . . . . . . . . . . . . 119 7.4 MSO on Strings and Regular Languages . . . . . . . . . . . . . . . . . . . . 124 7.5 FO on Strings and Star-Free Languages . . . . . . . . . . . . . . . . . . . . 127 7.6 Tree Automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 7.7 Complexity of MSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 7.8 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 7.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 8 Logics with Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 8.1 Counting and Unary Quantifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 8.2 An Infinitary Counting Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 8.3 Games for L∗ ∞ω(Cnt) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 8.4 Counting and Locality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 8.5 Complexity of Counting Quantifiers . . . . . . . . . . . . . . . . . . . . . . . . 155 8.6 Aggregate Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158 8.7 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 8.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
161 9 Turing Machines and Finite Models . . . . . . . . . . . . . . . . . . . . . . . 165 9.1 Trakhtenbrot’s Theorem and Failure of Completeness . . . . . . . . 165 9.2 Fagin’s Theorem and NP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168 9.3 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174 9.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174 Contents XIII 10 Fixed Point Logics and Complexity Classes . . . . . . . . . . . . . . . . 177 10.1 Fixed Points of Operators on Sets . . . . . . . . . . . . . . . . . . . . . . . . . 178 10.2 Fixed Point Logics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 10.3 Properties of LFP and IFP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184 10.4 LFP, PFP, and Polynomial Time and Space . . . . . . . . . . . . . . . . 192 10.5 Datalog and LFP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195 10.6 Transitive Closure Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199 10.7 A Logic for Ptime? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204 10.8 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206 10.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207 11 Finite Variable Logics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211 11.1 Logics with Finitely Many Variables . . . . . . . . . . . . . . . . . . . . . . . 211 11.2 Pebble Games. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215 11.3 Definability of Types. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220 11.4 Ordering of Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225 11.5 Canonical Structures and the Abiteboul-Vianu Theorem . . . . . . 229 11.6 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232 11.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233 12 Zero-One Laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235 12.1 Asymptotic Probabilities and Zero-One Laws . . . . . . . . . . . . . . . 235 12.2 Extension Axioms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238 12.3 The Random Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241 12.4 Zero-One Law and Second-Order Logic . . . . . . . . . . . . . . . . . . . . . 243 12.5 Almost Everywhere Equivalence of Logics . . . . . . . . . . . . . . . . . . 245 12.6 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246 12.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247 13 Embedded Finite Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249 13.1 Embedded Finite Models: the Setting . . . . . . . . . . . . . . . . . . . . . . 249 13.2 Analyzing Embedded Finite Models. . . . . . . . . . . . . . . . . . . . . . . . 252 13.3 Active-Generic Collapse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256 13.4 Restricted Quantifier Collapse. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
260 13.5 The Random Graph and Collapse to MSO . . . . . . . . . . . . . . . . . . 265 13.6 An Application: Constraint Databases . . . . . . . . . . . . . . . . . . . . . 267 13.7 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270 13.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271 14 Other Applications of Finite Model Theory . . . . . . . . . . . . . . . . 275 14.1 Finite Model Property and Decision Problems. . . . . . . . . . . . . . . 275 14.2 Temporal and Modal Logics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278 14.3 Constraint Satisfaction and Homomorphisms of Finite Models . 285 14.4 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288 XIV Contents References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291 1 Introduction Finite model theory studies the expressive power of logics on finite models. Classical model theory, on the other hand, concentrates on infinite structures: its origins are in mathematics, and most objects of interest in mathematics are infinite, e.g., the sets of natural numbers, real numbers, etc. Typical examples of interest to a model-theorist would be algebraically closed fields (e.g., C, +, · ), real closed fields (e.g., R, +, ·, < ), various models of arithmetic (e.g., N, +, · or N, + ), and other structures such as Boolean algebras or random graphs. The origins of finite model theory are in computer science where most objects of interest are finite. One is interested in the expressiveness of logics over finite graphs, or finite strings, other finite relational structures, and sometimes restrictions of arithmetic structures to an initial segment of natural numbers. The areas of computer science that served as a primary source of examples, as well as the main consumers of techniques from finite model theory, are databases, complexity theory, and formal languages (although finite model theory found applications in other areas such as AI and verification). In this chapter, we give three examples that illustrate the need for studying logics over finite structures. 1.1 A Database Example While early database systems used rather ad hoc data models, from the early 1970s the world switched to the relational model. In that model, a database stores tables, or relations, and is queried by a logic-based declarative language. The most standard such language, relational calculus, has precisely the power of first-order predicate calculus. In real life, it comes equipped with a specialized programming syntax (e.g., the select-from-where statement of SQL). Suppose that we have a company database, and one of its relations is the Reports To relation: it stores pairs (x, y), where x is an employee, and y is 2 1 Introduction his/her immediate manager. Organizational hierarchies tend to be quite complicated and often result in many layers of management, so one may want to skip the immediate manager level and instead look for the manager’s manager. In SQL, this would be done by the following query: select R1.employee, R2.manager from Reports_To R1, Reports_To R2 where R1.manager=R2.employee This is simply a different way of writing the following first-order logic formula: ϕ(x, y) ≡ ∃z Reports To(x, z) ∧ Reports To(z, y) . 
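To make the correspondence concrete, here is a minimal sketch in Python (illustrative only; the Reports_To tuples and employee names below are made up, not taken from the book) of how such a formula is evaluated over a finite relation: the quantifier ∃z simply ranges over the finitely many elements that occur in the database.

```python
# A sketch of evaluating phi(x, y) = exists z (Reports_To(x, z) and Reports_To(z, y))
# over a small, made-up finite instance of the Reports_To relation.

reports_to = {("ann", "bob"), ("bob", "carol"), ("dan", "carol"), ("carol", "eve")}
employees = {e for pair in reports_to for e in pair}   # the active domain

def manager_of_manager(rel, universe):
    """All pairs (x, y) such that some z satisfies rel(x, z) and rel(z, y)."""
    return {(x, y)
            for x in universe
            for y in universe
            if any((x, z) in rel and (z, y) in rel for z in universe)}

print(sorted(manager_of_manager(reports_to, employees)))
# [('ann', 'carol'), ('bob', 'eve'), ('dan', 'eve')]
```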
Continuing, we may ask for someone’s manager’s manager’s manager: ∃z1∃z2 Reports To(x, z1) ∧ Reports To(z1, z2) ∧ Reports To(z2, y) , and so on. But what if we want to find everyone who is higher in the hierarchy than a given employee? Speaking graph-theoretically, if we associate a pair (x, y) in the Reports To relation with a directed edge from x to y in a graph, then we want to find, for a given node, all the nodes reachable from it. This does not seem possible in first-order logic, but how can one prove this? There are other queries naturally related to this reachability property. Suppose that once in a while, the company wants to make sure that its management hierarchy is logically consistent; that is, we cannot have cycles in the Reports To relation. In graph-theoretic terms, it means that Reports To is acyclic. Again, if one thinks about it for a while, it seems that first-order logic does not have enough power to express this query. We now consider a different kind of query. Suppose we have two managers, x and y, and let X be the set of all the employees directly managed by x (i.e., all x′ such that (x′ , x) is in Reports To), and likewise let Y be the set of all the employees directly managed by y. Can we write a query asking whether |X | = |Y |; that is, a query asking whether x and y have the same number of people reporting to them? It turns out that first-order logic is again not sufficiently expressive for this kind of query, but since queries like those described above are so common in practice, SQL adds special features to the language to perform them. That is, SQL can count: it can apply the cardinality function (and more complex functions as well) to entire columns in relations. For example, in SQL one can write a query that finds all pairs of managers x and y who have the same number of people reporting to them: 1.1 A Database Example 3 select R1.manager, R2.manager from Reports_To R1, Reports_To R2 where (select count(Reports_To.employee) from Reports_To where Reports_To.manager = R1.manager) = (select count(Reports_To.employee) from Reports_To where Reports_To.manager = R2.manager) Since this cannot be done in first-order logic, but can be done in SQL (and, in fact, in some rather simple extensions of first-order logic with counting), it is natural to ask whether counting provides enough expressiveness to define queries such as reachability (can node x be reached from node y in a given graph?) and acyclicity. Typical applications of finite model theory in databases have to deal with questions of this sort: what can, and, more importantly, what cannot, be expressed in various query languages. Let us now give intuitive reasons why reachability queries are not expressible in first-order logic. Consider a different example. Suppose that we have an airline database, with a binary relation R (for routes), such that an entry (A, B) in R indicates that there is a flight from A to B. Now suppose we want to find all pairs of cities A, B such that there is a direct flight between them; this is done by the following query: q0(x, y) ≡ R(x, y), which is simply a first-order formula with two free variables. Next, suppose we want to know if one can get from x to y with exactly one change of plane; then we write q1(x, y) ≡ ∃z R(x, z) ∧ R(z, y). Doing “with at most one change” means having a disjunction Q1(x, y) ≡ q1(x, y) ∨ q0(x, y). Clearly, for each fixed k we can write a formula stating that one can get from x to y with exactly k stops: qk(x, y) ≡ ∃z1 . . . 
∃zk (R(x, z1) ∧ R(z1, z2) ∧ . . . ∧ R(zk, y)),

as well as Qk ≡ ⋁j≤k qj, testing whether at most k stops suffice. But what about the reachability query: can we get from x to y? That is, one wants to compute the transitive closure of R. The problem with this is that we do not know in advance what k is supposed to be. So the query that we need to write is ⋁k∈N qk, but this is not a first-order formula! Of course this is not a formal proof that reachability is not expressible in first-order logic (we shall see a proof of this fact in Chap. 3), but at least it gives a hint as to what the limitations of first-order logic are.

The inability of first-order logic to express some important queries motivated a lot of research on extensions of first-order logic that can handle queries such as transitive closure or cardinality comparisons. We shall see a number of extensions of these kinds – fixed point logics, (fragments of) second-order logic, counting logics – that are important for database theory, and we shall study properties of these extensions as well.

1.2 An Example from Complexity Theory

We now turn to a different area, and to more expressive logics. Suppose that we have a graph, this time undirected, given to us as a pair ⟨V, E⟩, where V is the set of vertices, or nodes, and E is the edge relation. Assume that now we can specify graph properties in second-order logic; that is, we can quantify over sets (or relations) of nodes.

Consider the well-known property of Hamiltonicity. A simple circuit in a graph G is a sequence (a1, . . . , an) of distinct nodes such that there are edges (a1, a2), (a2, a3), . . . , (an−1, an), (an, a1). A simple circuit is Hamiltonian if V = {a1, . . . , an}. A graph is Hamiltonian if it has a Hamiltonian circuit. We now consider the following formula:

∃L ∃S ( linear_order(L) ∧ "S is the successor relation of L" ∧ ∀x∃y (L(x, y) ∨ L(y, x)) ∧ ∀x∀y (S(x, y) → E(x, y)) )   (1.1)

The quantifiers ∃L ∃S state the existence of two binary relations, L and S, that satisfy the formula in parentheses. That formula uses some abbreviations. The subformula linear_order(L) in (1.1) states that the relation L is a linear ordering; it can be defined as

∀x ¬L(x, x) ∧ ∀x∀y∀z (L(x, y) ∧ L(y, z) → L(x, z)) ∧ ∀x∀y (¬(x = y) → L(x, y) ∨ L(y, x)).

The subformula "S is the successor relation of L" states that S is the successor relation associated with the linear ordering L; it can be defined as

∀x∀y ( S(x, y) ↔ ( (L(x, y) ∧ ¬∃z (L(x, z) ∧ L(z, y))) ∨ (¬∃z L(x, z) ∧ ¬∃z L(z, y)) ) ).

Note that S is the circular successor relation, as it also includes the pair (x, y) where x is the maximal and y the minimal element with respect to L. Then (1.1) says that L and S are defined on all nodes of the graph, and that S is a subset of E. Hence, S is a Hamiltonian circuit, and thus (1.1) tests whether a graph is Hamiltonian.

It is well known that testing Hamiltonicity is an NP-complete problem. Is this a coincidence, or is there a natural connection between NP and second-order logic? Let us turn our attention to two other well-known NP-complete problems: 3-colorability and clique. To test if a graph is 3-colorable, we have to check that there exist three disjoint sets A, B, C covering the nodes of the graph such that for every edge (a, b) ∈ E, the nodes a and b cannot belong to the same set.
The sentence below does precisely that:

∃A∃B∃C [ ∀x ( (A(x) ∧ ¬B(x) ∧ ¬C(x)) ∨ (¬A(x) ∧ B(x) ∧ ¬C(x)) ∨ (¬A(x) ∧ ¬B(x) ∧ C(x)) ) ∧ ∀x∀y ( E(x, y) → ¬( (A(x) ∧ A(y)) ∨ (B(x) ∧ B(y)) ∨ (C(x) ∧ C(y)) ) ) ]   (1.2)

For clique, typically one has a parameter k, and the problem is to check whether a clique of size k exists. Here, to stay purely within the formalism of second-order logic, we assume that the input is a graph E and a set of nodes (a unary relation) U, and we ask whether E has a clique of size |U|. We do it by testing whether there is a set C (the nodes of the clique) and a binary relation F that is a one-to-one correspondence between C and U. Testing that the restriction of E to C is a clique, and that F is one-to-one, can be done in first-order logic. Thus, the test is done by the following second-order sentence:

∃C∃F [ ∀x∀y (F(x, y) → (C(x) ∧ U(y))) ∧ ∀x (C(x) → ∃!y (F(x, y) ∧ U(y))) ∧ ∀y (U(y) → ∃!x (F(x, y) ∧ C(x))) ∧ ∀x∀y (C(x) ∧ C(y) ∧ ¬(x = y) → E(x, y)) ]   (1.3)

Here ∃!x ϕ(x) means "there exists exactly one x such that ϕ(x)"; this is an abbreviation for ∃x (ϕ(x) ∧ ∀y (ϕ(y) → x = y)).

Notice that (1.1), (1.2), and (1.3) all follow the same pattern: they start with existential second-order quantifiers, followed by a first-order formula. Such formulas form what is called existential second-order logic, abbreviated as ∃SO. The connection to NP can easily be seen: existential second-order quantifiers correspond to the guessing stage of an NP algorithm, and the remaining first-order formula corresponds to the polynomial time verification stage of an NP algorithm.

It turns out that the connection between NP and ∃SO is exact, as was shown by Fagin in his celebrated 1974 theorem, stating that NP = ∃SO. This connection opened up a new area, called descriptive complexity. The goals of descriptive complexity are to describe complexity classes by means of logical formalisms, and then use tools from mathematical logic to analyze those classes. We shall prove Fagin's theorem later, and we shall also see logical characterizations of a number of other familiar complexity classes.

1.3 An Example from Formal Language Theory

Now we turn our attention to strings over a finite alphabet, say Σ = {a, b}. We want to represent a string as a structure, much like a graph. Given a string s = s1s2 . . . sn, we create a structure Ms as follows: the universe is {1, . . . , n} (corresponding to positions in the string), we have one binary relation < whose meaning of course is the usual order on the natural numbers, and two unary relations A and B. Then A(i) is true if si = a, and B(i) is true if si = b. For example, Mabba has universe {1, 2, 3, 4}, with A interpreted as {1, 4} and B as {2, 3}.

Let us look at the following second-order sentence in which quantifiers range over sets of positions in a string:

Φ ≡ ∃X∃Y [ ∀x (X(x) ↔ ¬Y(x)) ∧ ∀x∀y (X(x) ∧ Y(y) → x < y) ∧ ∀x ((X(x) → A(x)) ∧ (Y(x) → B(x))) ]

When is Ms a model of Φ? This happens iff there exist two sets of positions, X and Y, such that X and Y form a partition of the universe (this is what the first conjunct says), all positions in X precede all positions in Y (that is what the second conjunct says), and for each position i in X, the ith symbol of s is a, while for each position j in Y, the jth symbol is b (this is stated in the third conjunct). That is, the string starts with some a's, and then switches to all b's. Using the language of regular expressions, we can say that Ms |= Φ iff s ∈ a∗b∗.
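To see this semantics at work, here is a small brute-force sketch in Python (illustrative only, not part of the book): it decides whether Ms |= Φ by letting the second-order quantifier range over every subset X of the set of positions (with Y taken as the complement of X, as the first conjunct forces), and compares the verdict with a regular-expression test for a∗b∗.

```python
# Brute-force model checking of the MSO sentence Phi over the string structure M_s.
from itertools import chain, combinations
import re

def satisfies_phi(s):
    positions = list(range(1, len(s) + 1))
    A = {i for i in positions if s[i - 1] == "a"}
    B = {i for i in positions if s[i - 1] == "b"}
    # the existential set quantifier ranges over all subsets of the positions
    all_subsets = chain.from_iterable(
        combinations(positions, k) for k in range(len(positions) + 1))
    for X in map(set, all_subsets):
        Y = set(positions) - X                       # first conjunct: Y complements X
        if (all(x < y for x in X for y in Y)         # second conjunct: X precedes Y
                and X <= A and Y <= B):              # third conjunct: a's in X, b's in Y
            return True
    return False

for s in ["", "aab", "abba", "bbb"]:
    assert satisfies_phi(s) == bool(re.fullmatch(r"a*b*", s))
```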
Is quantification over sets really necessary in this example? It turns out that the answer is no: one can express the fact that s is in a∗b∗ by saying that there are no two positions i < j such that the ith symbol is b and the jth symbol is a. This, of course, can be done in first-order logic:

¬∃i∃j ((i < j) ∧ B(i) ∧ A(j)).

A natural question that arises then is the following: are second-order quantifiers of no use if one wants to describe regular languages by logical means? The answer is no, as we shall see later. For now, we can give an example.

First, consider the sentence Φa ≡ ∀i A(i), which is true in Ms iff s ∈ a∗. Next, define a relation i ≺ j saying that j is the successor of i. It can be defined by the formula

(i < j) ∧ ∀k ((k ≤ i) ∨ (k ≥ j)).

Now consider the sentence

Φ1 ≡ ∃X∃Y [ ∀i (X(i) ↔ ¬Y(i)) ∧ ∀i (¬∃j (j < i) → X(i)) ∧ ∀i (¬∃j (j > i) → Y(i)) ∧ ∀i∀j ((i ≺ j) ∧ X(i) → Y(j)) ∧ ∀i∀j ((i ≺ j) ∧ Y(i) → X(j)) ]

This sentence says that the universe {1, . . . , n} can be partitioned into two sets X and Y such that 1 ∈ X, n ∈ Y, and the successor of an element of X is in Y and vice versa; that is, the size of the universe is even. Now what is Φ1 ∧ Φa? It says that the string is of even length, and has only a's in it – hence, Ms |= Φ1 ∧ Φa iff s ∈ (aa)∗. It turns out that one cannot define (aa)∗ using first-order logic alone: one needs second-order quantifiers. Moreover, with second-order quantifiers ranging over sets of positions, one defines precisely the regular languages. We shall deal with both expressibility and inexpressibility results related to logics over strings later in this book.

There are a number of common themes in the examples presented above. In all the cases, we are talking about the expressive power of logics over finite objects: relational databases, graphs, and strings. There is a close connection between logical formalisms and familiar concepts from computer science: first-order logic corresponds to relational calculus, existential second-order logic to the complexity class NP, and second-order logic with quantifiers ranging over sets describes regular languages.

Of equal importance is the fact that in all the examples we want to show some inexpressibility results. In the database example, we want to show that the transitive closure is not expressible in first-order logic. In the complexity example, it would be nice to show that certain problems cannot be expressed in ∃SO – any such result would give us bounds on the class NP, and this would hopefully lead to separation results for complexity classes. In the example from formal languages, we want to show that certain regular languages (e.g., (aa)∗) cannot be expressed in first-order logic.

Inexpressibility results have traditionally been a core theme of finite model theory. The main explanation for that is the source of motivating examples for finite model theory. Most of them come from computer science, where one is dealing not with natural phenomena, but rather with artificial creations. Thus, we often want to know the limitations of these creations. In general, this explains the popularity of impossibility results in computer science. After all, the most famous open problem of computer science, the Ptime vs NP problem, is so fascinating because the expected answer would tell us that a large number of important problems cannot be solved efficiently.
8 1 Introduction Concentrating on inexpressibility results highlights another important feature of finite model theory: since we are often interested in counterexamples, many constructions and techniques of interest apply only to a “small” fraction of structures. In fact, we shall see that some techniques (e.g., locality) degenerate to trivial statements on almost all structures, and yet it is that small fraction of structures on which they behave interestingly that gives us important techniques for analyzing expressiveness of logics, query languages, etc. Towards the end of the book, we shall also see that on most typical structures, some very expressive logics collapse to rather weak ones; however, all interesting separation examples occur outside the class of “typical” structures. 1.4 An Overview of the Book In Chap. 2, we review the background material from mathematical logic, computability theory, and complexity theory. In Chap. 3 we introduce the fundamental tool of Ehrenfeucht-Fra¨ıss´e games, and prove their completeness for expressibility in first-order logic (FO). The game is played by two players, the spoiler and the duplicator, on two structures. The spoiler tries to show that the structures are different, while the duplicator tries to show that they are the same. If the duplicator can succeed for k rounds of such a game, it means that the structures cannot be distinguished by FO sentences whose depth of quantifier nesting does not exceed k. We also define types, which play a very important role in many aspects of finite model theory. In the same chapter, we see some bounds on the expressive power of FO, proved via Ehrenfeucht-Fra¨ıss´e games. Finding winning strategies in Ehrenfeucht-Fra¨ıss´e games becomes quite hard for nontrivial structures. Thus, in Chap. 4, we introduce some sufficient conditions that guarantee a win for the duplicator. These conditions are based on the idea of locality. Intuitively, local formulae cannot see very far from their free variables. We show several different ways of formalizing this intuition, and explain how each of those ways gives us easy proofs of bounds on the expressiveness of FO. In Chap. 5 we continue to study first-order logic, but this time over structures whose universe is ordered. Here we see the phenomenon that is very common for logics over finite structures. We call a property of structures order-invariant if it can be defined with a linear order, but is independent of a particular linear order used. It turns out that there are order-invariant FO-definable properties that are not definable in FO alone. We also show that such order-invariant properties continue to be local. Chap. 6 deals with the complexity of FO. We distinguish two kinds of complexity: data complexity, meaning that a formula is fixed and the structure on which it is evaluated varies, and combined complexity, meaning that both the formula and the structure are part of the input. We show how to evaluate 1.4 An Overview of the Book 9 FO formulae by Boolean circuits, and use this to derive drastically different bounds for the complexity of FO: AC0 for data complexity, and Pspace for combined complexity. We also consider the parametric complexity of FO: in this case, the formula is viewed as a parameter of the input. Finally, we study a subclass of FO queries, called conjunctive queries, which is very important in database theory, and prove complexity bounds for it. In Chap. 
7, we move away from FO, and consider its extension with monadic second-order quantifiers: such quantifiers can range over subsets of the universe. The resulting logic is called monadic second-order logic, or MSO. We also consider two restrictions of MSO: an ∃MSO formula starts with a sequence of existential second-order quantifiers, which is followed by an FO formula, and an ∀MSO formula starts with a sequence of universal secondorder quantifiers, followed by an FO formula. We first study ∃MSO and ∀MSO on graphs, where they are shown to be different. We then move to strings, where MSO collapses to ∃MSO and captures precisely the regular languages. Further restricting our attention to FO over strings, we prove that it captures the star-free languages. We also cover MSO over trees, and tree automata. In Chap. 8 we study a different extension of FO: this time, we add mechanisms for counting, such as counting terms, counting quantifiers, or certain generalized unary quantifiers. We also introduce a logic that has a lot of counting power, and prove that it remains local, much as FO. We apply these results in the database setting, considering a standard feature of many query languages – aggregate functions – and proving bounds on the expressiveness of languages with aggregation. In Chap. 9 we present the technique of coding Turing machines as finite structures, and use it to prove two results: Trakhtenbrot’s theorem, which says that the set of finitely satisfiable sentences is not recursive, and Fagin’s theorem, which says that NP problems are precisely those expressible in existential second-order logic. Chapter 10 deals with extensions of FO for expressing properties that, algorithmically, require recursion. Such extensions have fixed point operators. There are three flavors of them: least, inflationary, and partial fixed point operators. We study properties of resulting fixed point logics, and prove that in the presence of a linear order, they capture complexity classes Ptime (for least and inflationary fixed points) and Pspace (for partial fixed points). We also deal with a well-known database query language that adds fixed points to FO: Datalog. In the same chapter, we consider a closely related logic based on adding the transitive closure operator to FO, and prove that over order structures it captures nondeterministic logarithmic space. Fixed point logics are not very easy to analyze. Nevertheless, they can be embedded into a logic which uses infinitary connectives, but has a restriction that every formula only mentions finitely many variables. This logic, and its fragments, are studied in Chap. 11. We introduce the logic Lω ∞ω, define games for it, and prove that fixed point logics are embeddable into it. We 10 1 Introduction study definability of types for finite variable logics, and use them to provide a purely logical counterpart of the Ptime vs. Pspace question. In Chap. 12 we study the asymptotic behavior of FO and prove that every FO sentence is either true in almost all structures, or false in almost all structures. This phenomenon is known as the zero-one law. We also prove that Lω ∞ω, and hence fixed point logics, have the zero-one law. In the same chapter we define an infinite structure whose theory consists precisely of FO sentences that hold in almost all structures. We also prove that almost everywhere, fixed point logics collapse to FO. In Chap. 
13, we show how finite and infinite model theory mix: we look at finite structures that live inside an infinite one, and study the power of FO over such hybrid structures. We prove that for some underlying infinite structures, like ⟨N, +, ·⟩, every computable property of finite structures embedded into them can be defined, but for others, like ⟨R, +, ·⟩, one can only define properties which are already expressible in FO over the finite structure alone. We also explain connections between such mixed logics and database query languages.

Finally, in Chap. 14, we outline other applications of finite model theory: in decision problems in mathematical logic, in formal verification of properties of finite state systems, and in constraint satisfaction.

1.5 Exercises

Exercise 1.1. Show how to express the following properties of graphs in first-order logic:
• A graph is complete.
• A graph has an isolated vertex.
• A graph has at least two vertices of out-degree 3.
• Every vertex is connected by an edge to a vertex of out-degree 3.

Exercise 1.2. Show how to express the following properties of graphs in existential second-order logic:
• A graph has a kernel, i.e., a set of vertices X such that there is no edge between any two vertices in X, and every vertex outside of X is connected by an edge to a vertex of X.
• A graph on n vertices has an independent set X (i.e., no two nodes in X are connected by an edge) of size at least n/2.
• A graph has an even number of vertices.
• A graph has an even number of edges.
• A graph with m edges has a bipartite subgraph with at least m/2 edges.

Exercise 1.3. (a) Show how to define the following regular languages in monadic second-order logic:
• a∗(b + c)∗aa∗;
• (aaa)∗(bb)+;
• (((a + b)∗cc∗)∗(aa)∗)∗a.
For the first language, provide a first-order definition as well.
(b) Let Φ be a monadic second-order logic sentence over strings. Show how to construct a sentence Ψ such that Ms |= Ψ iff there is a string s′ such that |s| = |s′| and Ms·s′ |= Φ. Here |s| refers to the length of s, and s · s′ is the concatenation of s and s′. Remark: once we prove Büchi's theorem in Chap. 7, you will see that the above statement says that if L is a regular language, then the language ½L = {s | for some s′, |s| = |s′| and s · s′ ∈ L} is regular too (see, e.g., Exercise 3.16 in Hopcroft and Ullman [126]).

2 Preliminaries

The goal of this chapter is to provide the necessary background from mathematical logic, formal languages, and complexity theory.

2.1 Background from Mathematical Logic

We now briefly review some standard definitions from mathematical logic.

Definition 2.1. A vocabulary σ is a collection of constant symbols (denoted c1, . . . , cn, . . . ), relation, or predicate, symbols (P1, . . . , Pn, . . . ), and function symbols (f1, . . . , fn, . . . ). Each relation and function symbol has an associated arity. A σ-structure (also called a model)

A = ⟨A, {ci^A}, {Pi^A}, {fi^A}⟩

consists of a universe A together with an interpretation of
• each constant symbol ci from σ as an element ci^A ∈ A;
• each k-ary relation symbol Pi from σ as a k-ary relation on A, that is, a set Pi^A ⊆ A^k; and
• each k-ary function symbol fi from σ as a function fi^A : A^k → A.

A structure A is called finite if its universe A is a finite set. The universe of a structure is typically denoted by a Roman letter corresponding to the name of the structure; that is, the universe of A is A, the universe of B is B, and so on. We shall also occasionally abuse notation and write a ∈ A, with A a structure, to mean that a belongs to the universe A of A.
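Since the finite structures studied in this book are relational, such a structure is easy to represent directly as data. The following sketch (a Python illustration, not the book's notation; the vocabulary {E, c} and the three-element structure are made up) stores a finite σ-structure as a universe together with interpretations of its constant and relation symbols.

```python
# A sketch of a finite sigma-structure: a finite universe plus interpretations
# of the constant and relation symbols of the vocabulary.
from dataclasses import dataclass

@dataclass
class FiniteStructure:
    universe: set       # the finite universe A
    constants: dict     # constant symbol -> its interpretation, an element of A
    relations: dict     # k-ary relation symbol -> a set of k-tuples over A

# a directed 3-cycle with one distinguished node, over sigma = {E, c}
A = FiniteStructure(
    universe={1, 2, 3},
    constants={"c": 1},
    relations={"E": {(1, 2), (2, 3), (3, 1)}},
)
assert A.constants["c"] in A.universe
assert all(set(t) <= A.universe for t in A.relations["E"])
```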
For example, if σ has constant symbols 0, 1, a binary relation symbol <, and two binary function symbols · and +, then one possible structure for σ is the real field R = ⟨R, 0^R, 1^R, <^R, ·^R, +^R⟩, with all symbols interpreted in the standard way.

[. . .]

Positions of the tape beyond the last symbol of the written word w1 . . . wn hold the blank symbol; that is, for n′ > n, wn′ is the blank symbol. If M is in state q with its head in position j, we denote this configuration by w1w2 . . . wj−1 q wj . . . wn.

We define the relation C ⊢δ C′ as follows. If C = s · q · a · s′, where s, s′ ∈ ∆∗, a ∈ ∆, and q ∉ Qa ∪ Qr, then
• if (q′, b, ℓ) ∈ δ(q, a), then C ⊢δ s0 · q′ · c · b · s′, where s = s0 · c (that is, a is replaced by b, the new state is q′, and the head moves left; if s = ε, then C ⊢δ q′ · b · s′), and
• if (q′, b, r) ∈ δ(q, a), then C ⊢δ s · b · q′ · s′ (that is, a is replaced by b, the new state is q′, and the head moves right).

A configuration s · q · s′ is accepting if q ∈ Qa, and rejecting if q ∈ Qr. Suppose we have a string s ∈ Σ∗. The initial configuration C(s) corresponding to this string is q0 · s; that is, the state is q0, the head points to the first position of s, and the tape contains s followed by blanks. We say that s is accepted by M if there is a sequence of configurations C0, C1, . . . , Cn such that C0 = C(s), Ci ⊢δ Ci+1 for i < n, and Cn is an accepting configuration. The set of all strings accepted by M is denoted by L(M). We call a subset L of Σ∗ recursively enumerable, or r.e. for short, if there is a Turing machine M such that L = L(M).

Notice that in general, there are three possibilities for computations by a Turing machine M on input s: M accepts s, or M eventually enters a rejecting state, or M loops; that is, it never enters a halting state. We call a Turing machine halting if the last outcome is impossible. In other words, on every input, M eventually enters a halting state. We call a subset L of Σ∗ recursive if there is a halting Turing machine M such that L = L(M). Halting Turing machines can be seen as deciders for some sets L: for every string s, M eventually enters either an accepting or a rejecting state, which decides whether s ∈ L. For that reason, one sometimes uses decidable instead of recursive. When we speak of decidable problems, we mean that a suitable encoding of the problem as a subset of Σ∗ for some finite Σ is decidable.

A canonical example of an undecidable problem is the halting problem: given a Turing machine M and an input w, does M halt on w (i.e., eventually enter a halting state)? In general, any nontrivial property of recursively enumerable sets is undecidable. One result we shall use later is that it is undecidable whether a given Turing machine halts on the empty input.

2.3 Background from Complexity Theory

Let L be a language accepted by a halting Turing machine M. Assume that for some function f : N → N, it is the case that the number of transitions M makes before accepting or rejecting a string s is at most f(|s|), where |s| is the length of s. If M is deterministic, then we write L ∈ DTIME(f); if M is nondeterministic, then we write L ∈ NTIME(f). We define the class Ptime of polynomial-time computable problems as

Ptime = ⋃k∈N DTIME(n^k),

and the class NP of problems computable by nondeterministic polynomial-time Turing machines as

NP = ⋃k∈N NTIME(n^k).

The class coNP is defined as the class of languages whose complements are in NP. Notice that Ptime is closed under complementation, but this is not clear in the case of NP. We have Ptime ⊆ NP ∩ coNP, but it is not known whether the containment is proper, and whether NP equals coNP.

Now assume that f(n) ≥ n for all n ∈ N.
Define DSPACE(f) as the class of languages L that are accepted by deterministic halting Turing machines M such that for every string s, the length of the longest configuration of M that occurs during the computation on s is at most f(|s|). In other words, M does not use more than f(|s|) cells of the tape. Similarly, we define the class NSPACE(f) by using nondeterministic machines. We then let

Pspace = ⋃k∈N DSPACE(n^k).

In the case of space complexity, the nondeterministic case collapses to the deterministic one: by Savitch's theorem, Pspace = ⋃k∈N NSPACE(n^k).

To define space complexity for sublinear functions f, we use a model of Turing machines with a work tape. In such a model, a machine M has two tapes, and two heads. The first tape is the input tape: it stores the input, and the machine cannot write on it (but can move the head). The second tape is the work tape, which operates as the normal tape of a Turing machine. We define the class NLog as the class of languages accepted by such nondeterministic machines that use at most O(log |s|) cells of the work tape on input s. Likewise, we define the class DLog as the class of languages accepted by deterministic machines with a work tape, where at most O(log |s|) cells of the work tape are used.

Finally, we define the polynomial hierarchy PH. Let Σ^p_0 = Π^p_0 = Ptime. Define inductively Σ^p_i = NP^(Σ^p_{i−1}) for i ≥ 1. That is, languages in Σ^p_i are those accepted by a nondeterministic Turing machine running in polynomial time such that this machine can make "calls" to another machine that computes a language in Σ^p_{i−1}. Such a call is assumed to have unit cost. We define the class Π^p_i as the class of languages whose complements are in Σ^p_i. Notice that Σ^p_1 = NP and Π^p_1 = coNP. We define the polynomial hierarchy as

PH = ⋃i∈N Σ^p_i = ⋃i∈N Π^p_i.

This will be sufficient for our purposes, but there is another interesting definition of PH in terms of alternating Turing machines.

The relationship between the complexity classes we introduced is as follows:

DLog ⊆ NLog ⊆ Ptime ⊆ NP ∩ coNP ⊆ PH ⊆ Pspace.

None of the containments of any two consecutive classes in this sequence is known to be proper, although it is known that NLog ⊊ Pspace. We shall also refer to two classes based on exponential running time. These are Exptime = ⋃k∈N DTIME(2^(n^k)) and Nexptime = ⋃k∈N NTIME(2^(n^k)). Both of these contain Pspace. Later in the book we shall see a number of other complexity classes, in particular the circuit-based classes AC^0 and TC^0 (which are both contained in DLog).

2.4 Bibliographic Notes

Standard mathematical logic texts are Enderton [66], Ebbinghaus, Flum, and Thomas [61], and van Dalen [241]; infinite model theory is the subject of Chang and Keisler [35], Hodges [125], and Poizat [201]. Good references on complexity theory are Papadimitriou [195], Johnson [139], and Du and Ko [59]. For the basics on automata and computability, see Hopcroft and Ullman [126], Khoussainov and Nerode [145], and Sipser [221].

3 Ehrenfeucht-Fraïssé Games

We start this chapter by giving a few examples of inexpressibility proofs, using the standard model-theoretic machinery (compactness, the Löwenheim–Skolem theorem). We then show that this machinery is not generally applicable in the finite model theory context, and introduce the notion of Ehrenfeucht-Fraïssé games for first-order logic.
We prove the Ehrenfeucht-Fraïssé theorem, characterizing the expressive power of FO via games, and introduce the notion of types, which will be central throughout the book.

3.1 First Inexpressibility Proofs

How can one prove that a certain property is inexpressible in FO? Certainly logicians must have invented tools for proving such results, and we shall now see a few examples. The problem is that these tools are not particularly well suited to the finite context, so in the next section, we introduce a different technique that will be used for FO and other logics over finite models.

In the first example, we deal with connectivity: given a graph G, is it connected? Recall that a graph with an edge relation E is connected if for every two nodes a, b one can find a number n and nodes c1, . . . , cn ∈ V such that (a, c1), (c1, c2), . . . , (cn, b) are all edges in the graph. A standard model-theoretic argument below shows that connectivity is not FO-definable.

Proposition 3.1. Connectivity of arbitrary graphs is not FO-definable.

Proof. Assume that connectivity is definable by a sentence Φ over the vocabulary σ = {E}. Let σ2 expand σ with two constant symbols, c1 and c2. For every n, let Ψn be the sentence

¬∃x1 . . . ∃xn (E(c1, x1) ∧ E(x1, x2) ∧ . . . ∧ E(xn, c2)),

saying that there is no path of length n + 1 from c1 to c2. Let T be the theory

{Ψn | n > 0} ∪ {¬(c1 = c2), ¬E(c1, c2)} ∪ {Φ}.

We claim that T is consistent. By compactness, we have to show that every finite subset T′ ⊆ T is consistent. Indeed, let N be such that n < N for all Ψn ∈ T′. Then a connected graph in which the shortest path from c1 to c2 has length N + 1 is a model of T′. Since T is consistent, it has a model. Let G be a model of T. Then G is connected, but there is no path from c1 to c2 of length n, for any n. This contradiction shows that connectivity is not FO-definable.

Does the proof above tell us that FO, or relational calculus, cannot express the connectivity test over finite graphs? Unfortunately, it does not. While connectivity is not definable in FO over arbitrary graphs, the proof above leaves open the possibility that there is a first-order sentence that correctly tests connectivity only for finite graphs. But to prove the desired result for relational calculus, one has to show inexpressibility of connectivity over finite graphs.

Can one modify the proof above for finite models? An obvious way to do so would be to use compactness over finite graphs (i.e., if every finite subset of T has a finite model, then T has a finite model), assuming this holds. Unfortunately, this turns out not to be the case.

Proposition 3.2. Compactness fails over finite models: there is a theory T such that
1. T has no finite models, and
2. every finite subset of T has a finite model.

Proof. We assume that σ = ∅, and define λn as a sentence stating that the universe has at least n distinct elements:

λn ≡ ∃x1 . . . ∃xn ⋀i≠j ¬(xi = xj).   (3.1)

Now T = {λn | n ≥ 0}. Clearly, T has no finite model, but for each finite subset {λn1, . . . , λnk} of T, a set whose cardinality exceeds all the ni's is a model.

However, sometimes a compactness argument works nicely in the finite context. We now consider a very important property, which will be seen many times in this book. We want to test whether the cardinality of the universe is even. That is, we are interested in the query even defined as even(A) = true iff |A| mod 2 = 0.
Note that this only makes sense over finite models; for infinite A the value of even could be arbitrary.

Proposition 3.3. Assume that σ = ∅. Then even is not FO-definable.

Proof. Suppose even is definable by a sentence Φ. Consider the sentences λn from (3.1) in the proof of Proposition 3.2, and two theories:

T1 = {Φ} ∪ {λk | k > 0},   T2 = {¬Φ} ∪ {λk | k > 0}.

By compactness, both are consistent. These theories only have infinite models, so by the Löwenheim–Skolem theorem, both have countable models, A1 and A2. Since σ = ∅, the structures A1 and A2 are just countable sets, and hence isomorphic. Thus, we have two isomorphic models, A1 and A2, with A1 |= Φ and A2 |= ¬Φ. This contradiction proves the result.

This is nice, but there is a small problem: we assumed that the vocabulary is empty. But what if we have, for example, σ = {<}, and we want to prove that evenness of ordered sets is not definable? In this case we would expand T1 and T2 with the axioms of ordered sets, and we would obtain, by compactness and Löwenheim–Skolem, two countable linear orderings A1 and A2, one a model of Φ, the other a model of ¬Φ. This is a dead end, since two arbitrary countable linear orders need not be isomorphic (in fact, some can be distinguished by first-order sentences: think, for example, of a discrete order like ⟨N, <⟩ and a dense one like ⟨Q, <⟩).

Thus, while traditional tools from model theory may help us prove some results, they are often not sufficient for proving results about finite models. We shall examine, in subsequent chapters, tools designed for proving expressivity bounds in the finite case. As an introduction to these tools, let us revisit the proof of Proposition 3.3. In the proof, we constructed two models, A1 and A2, that agree on all FO sentences (since they are isomorphic), and yet compactness tells us that they disagree on Φ, which was assumed to define even – hence even is not first-order. Can we extend this technique to prove inexpressibility results over finite models? The most straightforward attempt to do so fails due to the following.

Lemma 3.4. For every finite structure A, there is a sentence ΦA such that B |= ΦA iff B ≅ A.

Proof. Assume without loss of generality that A is a graph: σ = {E}. Let A = ⟨{a1, . . . , an}, E⟩. Define ΦA as

∃x1 . . . ∃xn ( ⋀i≠j ¬(xi = xj) ∧ ∀y ⋁i (y = xi) ∧ ⋀(ai,aj)∈E E(xi, xj) ∧ ⋀(ai,aj)∉E ¬E(xi, xj) ).

Then B |= ΦA iff B ≅ A.

In particular, every two finite structures that agree on all FO sentences are isomorphic, and hence agree on any Boolean query (as Boolean queries are closed under isomorphism). The idea that is prevalent in inexpressibility proofs in finite model theory is, nevertheless, very close to the original idea of finding structures A and B that agree on all FO sentences but disagree on a given query. But instead of two structures, A and B, we consider two families of structures, {Ak | k ∈ N} and {Bk | k ∈ N}, and instead of all FO sentences, we consider a certain partition of FO sentences into infinitely many classes.

In general, the methodology is as follows. Suppose we want to prove that a property P is not expressible in a logic L. We then partition the set of all sentences of L into countably many classes, L[0], L[1], . . . , L[k], . . . (we shall see in Sect. 3.3 how to do it), and find two families of structures, {Ak | k ∈ N} and {Bk | k ∈ N}, such that
• Ak |= Φ iff Bk |= Φ for every L[k] sentence Φ; and
• Ak has property P, but Bk does not.
3.2 Definition and Examples of Ehrenfeucht-Fraïssé Games

Ehrenfeucht-Fraïssé games give us a nice tool for describing the expressiveness of logics over finite models. In general, games are applicable to both finite and infinite models (at least for FO), but we have seen that in the infinite case we have a number of more powerful tools. In fact, in some model theory texts Ehrenfeucht-Fraïssé games are only briefly mentioned (or even appear only as exercises), but in the finite case their applicability makes them a central notion.

The idea of the game – for FO and other logics as well – is almost invariably the same. There are two players, called the spoiler and the duplicator (or, less imaginatively, player I and player II). The board of the game consists of two structures, say A and B. The goal of the spoiler is to show that these two structures are different; the goal of the duplicator is to show that they are the same. In the classical Ehrenfeucht-Fraïssé game, the players play a certain number of rounds. Each round consists of the following steps:
1. The spoiler picks a structure (A or B).
2. The spoiler makes a move by picking an element of that structure: either a ∈ A or b ∈ B.
3. The duplicator responds by picking an element in the other structure.

An illustration is given in Fig. 3.1. The spoiler's moves are shown as filled circles, and the duplicator's moves as empty circles. In the first round, the spoiler picks B and selects b_1 ∈ B; the duplicator responds with a_1 ∈ A. In the next round, the spoiler changes structures and picks a_2 ∈ A; the duplicator responds with b_2 ∈ B. In the third round the spoiler plays b_3 ∈ B; the response of the duplicator is a_3 ∈ A.

[Fig. 3.1. Ehrenfeucht-Fraïssé game]

Since there is a game, someone must win it. To define the winning condition we need the crucial definition of a partial isomorphism. Recall that all finite structures have a relational vocabulary (no function symbols).

Definition 3.5 (Partial isomorphism). Let A, B be two σ-structures, where σ is relational, and let a = (a_1, . . . , a_n) and b = (b_1, . . . , b_n) be two tuples in A and B respectively. Then (a, b) defines a partial isomorphism between A and B if the following conditions hold:
• For every i, j ≤ n, a_i = a_j iff b_i = b_j.
• For every constant symbol c from σ and every i ≤ n, a_i = c^A iff b_i = c^B.
• For every k-ary relation symbol P from σ and every sequence (i_1, . . . , i_k) of (not necessarily distinct) numbers from [1, n], (a_{i_1}, . . . , a_{i_k}) ∈ P^A iff (b_{i_1}, . . . , b_{i_k}) ∈ P^B.

In the absence of constant symbols, this definition says that the mapping a_i ↦ b_i, i ≤ n, is an isomorphism between the substructures of A and B generated by {a_1, . . . , a_n} and {b_1, . . . , b_n}, respectively.

After n rounds of an Ehrenfeucht-Fraïssé game, we have moves (a_1, . . . , a_n) and (b_1, . . . , b_n). Let c_1, . . . , c_l be the constant symbols in σ; then c^A denotes (c^A_1, . . . , c^A_l), and likewise for c^B. We say that (a, b) is a winning position for the duplicator if ((a, c^A), (b, c^B)) is a partial isomorphism between A and B. In other words, the map that sends each a_i to b_i and each c^A_j to c^B_j is an isomorphism between the substructures of A and B generated by {a_1, . . . , a_n, c^A_1, . . . , c^A_l} and {b_1, . . . , b_n, c^B_1, . . . , c^B_l}, respectively.

We say that the duplicator has an n-round winning strategy in the Ehrenfeucht-Fraïssé game on A and B if the duplicator can play in a way that guarantees a winning position after n rounds, no matter how the spoiler plays. Otherwise, the spoiler has an n-round winning strategy. If the duplicator has an n-round winning strategy, we write A ≡_n B. Observe that A ≡_n B implies A ≡_k B for every k ≤ n.
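For small structures, the relation ≡_n can be checked directly, if inefficiently, by enumerating all spoiler moves and duplicator responses. The Python sketch below is an illustration only: the encoding of a structure as a universe together with a dict of named relations (each with its arity), and the function names, are ours. It anticipates the linear-order example given next: the duplicator wins the one-round game on orders of sizes 3 and 2, but not the two-round game.

from itertools import product

# A structure is a pair (universe, rels), where rels maps a relation name
# to a pair (arity, set of tuples over the universe). No constant symbols.

def is_partial_iso(A, B, a, b):
    """Does the position (a, b) define a partial isomorphism (Definition 3.5)?"""
    n = len(a)
    if any((a[i] == a[j]) != (b[i] == b[j]) for i in range(n) for j in range(n)):
        return False
    for R in A[1]:
        arity, RA = A[1][R]
        _, RB = B[1][R]
        for idx in product(range(n), repeat=arity):
            if (tuple(a[i] for i in idx) in RA) != (tuple(b[i] for i in idx) in RB):
                return False
    return True

def duplicator_wins(A, B, k, a=(), b=()):
    """Does the duplicator have a k-round winning strategy from position (a, b)?
    A position that is not a partial isomorphism cannot become one later,
    so it is safe to prune as soon as the condition fails."""
    if not is_partial_iso(A, B, a, b):
        return False
    if k == 0:
        return True
    # Spoiler picks an element of A or of B; the duplicator must have a response.
    forth = all(any(duplicator_wins(A, B, k - 1, a + (c,), b + (d,)) for d in B[0])
                for c in A[0])
    back = all(any(duplicator_wins(A, B, k - 1, a + (c,), b + (d,)) for c in A[0])
               for d in B[0])
    return forth and back

# Linear orders with 3 and 2 elements:
L1 = ({1, 2, 3}, {"<": (2, {(x, y) for x in {1, 2, 3} for y in {1, 2, 3} if x < y})})
L2 = ({1, 2}, {"<": (2, {(1, 2)})})
print(duplicator_wins(L1, L2, 1))  # True
print(duplicator_wins(L1, L2, 2))  # False: the spoiler wins the 2-round game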
Before we connect Ehrenfeucht-Fraïssé games and FO-definability, we give some examples of winning strategies.

Games on Sets

In this example the vocabulary σ is empty; that is, a structure is just a set. Let |A|, |B| ≥ n. Then A ≡_n B. The strategy for the duplicator works as follows. Suppose i rounds have been played, and the position is ((a_1, . . . , a_i), (b_1, . . . , b_i)). Assume the spoiler picks an element a_{i+1} ∈ A. If a_{i+1} = a_j for some j ≤ i, then the duplicator responds with b_{i+1} = b_j; otherwise, the duplicator responds with any b_{i+1} ∈ B − {b_1, . . . , b_i} (which exists since |B| ≥ n). The case when the spoiler plays in B is symmetric.

Games on Linear Orders

Our next example is a bit more complicated, as we add a binary relation < to σ, to be interpreted as a linear order. Now suppose L_1, L_2 are two linear orders of size at least n (i.e., structures of the form ⟨{1, . . . , m}, <⟩, m ≥ n). Is it true that L_1 ≡_n L_2? It is very easy to see that the answer is negative even for the case of n = 2. Let L_1 contain three elements (say {1, 2, 3}), and L_2 two elements ({1, 2}). In the first move, the spoiler plays 2 in L_1. The duplicator has to respond with either 1 or 2 in L_2. Suppose the duplicator responds with 1 ∈ L_2; then the spoiler plays 1 ∈ L_1, and the duplicator loses, since he has to respond with an element less than 1 in L_2, and there is no such element. If the duplicator selects 2 ∈ L_2 as his first-round move, the spoiler plays 3 ∈ L_1, and the duplicator loses again. Hence, L_1 ≢_2 L_2. However, a winning strategy for the duplicator can be guaranteed if L_1, L_2 are much larger than the number of rounds.

[Fig. 3.2. Illustration for the proof of Theorem 3.6]

Theorem 3.6. Let k > 0, and let L_1, L_2 be linear orders of length at least 2^k. Then L_1 ≡_k L_2.

We shall give two different proofs of this result, illustrating two different techniques often used in game proofs.

Theorem 3.6, Proof # 1. The idea of the first proof is as follows. We use induction on the number of rounds of the game, and our induction hypothesis is stronger than just the partial isomorphism claim. The reason is that if we simply state that after i rounds we have a partial isomorphism, the induction step will not get off the ground, as there are too few assumptions. Hence, we have to make additional assumptions. But if we try to impose too many conditions, there is no guarantee that the game can proceed in a way that preserves them. The main challenge in proofs of this type is to find the right induction hypothesis: one that is strong enough to imply partial isomorphism, and that has enough conditions to make the inductive proof possible. We now illustrate this general principle by proving Theorem 3.6.
We expand the vocabulary with two new constant symbols min and max, to be interpreted as the minimum and the maximum element of a linear ordering, and we prove a stronger fact that L1 ≡k L2 in the expanded vocabulary. Let L1 have the universe {1, . . ., n} and L2 have the universe {1, . . ., m}. Assume that the lengths of L1 and L2 are at least 2k ; that is, n, m ≥ 2k +1. The distance between two elements x, y of the universe, d(x, y), is simply |x − y|. We claim that the duplicator can play in such a way that the following holds after each round i. Let a = (a−1, a0, a1, . . . , ai) consist of a−1 = minL1 , a0 = maxL1 and the i moves a1, . . . , ai in L1, and likewise let b = (b−1, b0, b1, . . . , bi) consist of b−1 = minL2 , b0 = maxL2 and the i moves in L2. Then, for −1 ≤ j, l ≤ i: 30 3 Ehrenfeucht-Fra¨ıss´e Games 1. if d(aj, al) < 2k−i , then d(bj, bl) = d(aj, al). 2. if d(aj, al) ≥ 2k−i , then d(bj, bl) ≥ 2k−i . 3. aj ≤ al ⇐⇒ bj ≤ bl. (3.2) We prove (3.2) by induction; notice that the third condition ensures partial isomorphism, so we do prove an induction statement that says more than just maintaining partial isomorphism. And now a simple proof: the base case of i = 0 is immediate since d(a−1, a0), d(b−1, b0) ≥ 2k by assumption. For the induction step, suppose the spoiler is making his (i + 1)st move in L1 (the case of L2 is symmetric). If the spoiler plays one of aj, j ≤ i, the response is bj, and all the conditions are trivially preserved. Otherwise, the spoiler’s move falls into an interval, say aj < ai+1 < al, such that no other previously played moves are in the same interval. By condition 3 of (3.2), this means that the interval between bj and bl contains no other elements of b. There are two cases: • d(aj, al) < 2k−i . Then d(bj, bl) = d(aj, al), and the intervals [aj, al] and [bj, bl] are isomorphic. Then we simply find bi+1 so that d(aj, ai+1) = d(bj, bi+1) and d(ai+1, al) = d(bi+1, bl). Clearly, this ensures that all the conditions in (3.2) hold. • d(aj, al) ≥ 2k−i . In this case d(bj, bl) ≥ 2k−i . We have three possibilities: 1. d(aj, ai+1) < 2k−(i+1) . Then d(ai+1, al) ≥ 2k−(i+1) , and we can choose bi+1 so that d(bj, bi+1) = d(aj, ai+1) and d(bi+1, bl) ≥ 2k−(i+1) . This is illustrated in Fig. 3.2 (a), where d stands for d(aj, ai+1). 2. d(ai+1, al) < 2k−(i+1) . This case is similar to the previous one. 3. d(aj, ai+1) ≥ 2k−(i+1) , d(ai+1, al) ≥ 2k−(i+1) . Since d(bj, bl) ≥ 2k−i , by choosing bi+1 to be the middle of the interval [bj, bl] we ensure that d(bj, bi+1) ≥ 2k−(i+1) and d(bi+1, bl) ≥ 2k−(i+1) . This case is illustrated in Fig. 3.2 (b). Thus, in all the cases, (3.2) is preserved. This completes the inductive proof; hence we have shown that the duplicator can win a k-round Ehrenfeucht-Fra¨ıss´e game on L1 and L2. Theorem 3.6, Proof # 2. The second proof relies on the composition method: a way of composing simpler games into more complicated ones. Before we proceed, we make the following observation. Suppose L1 ≡k L2. Then we can assume, without loss of generality, that the duplicator has a winning strategy in which he responds to the minimal element of one ordering by the minimal element of the other ordering (and likewise for the maximal elements). Indeed, suppose the spoiler plays minL1 , the minimal element of L1. If the duplicator responds by b > minL2 and there is at least one round left, then in the next round the spoiler plays minL2 and the duplicator loses. 
If this is the last round of the game, then the duplicator can respond by any element 3.2 Definition and Examples of Ehrenfeucht-Fra¨ıss´e Games 31 that does not exceed those previously played in L2, in particular, minL2 . The proof for other cases is similar. Let L be a linear ordering, and a ∈ L. By L≤a we mean the substructure of L that consists of all the elements b ≤ a, and by L≥a the substructure of L that consists of all the elements b ≥ a. The composition result we shall need says the following. Lemma 3.7. Let L1, L2, a ∈ L1, and b ∈ L2 be such that L≤a 1 ≡k L≤b 2 and L≥a 1 ≡k L≥b 2 . Then (L1, a) ≡k (L2, b). Proof of Lemma 3.7. The strategy for the duplicator is very simple: if the spoiler plays in L≤a 1 , the duplicator uses the winning strategy for L≤a 1 ≡k L≤b 2 , and if the spoiler plays in L≥a 1 , the duplicator uses the winning strategy for L≥a 1 ≡k L≥b 2 (the case when the spoiler plays in L2 is symmetric). By the remark preceding the lemma, the duplicator always responds to a by b and to b by a, which implies that the strategy allows him to win in the k-round game on (L1, a) and (L2, b). And now we prove Theorem 3.6. The proof again is by induction on k, and the base case is easily verified. For the induction step, assume we have two linear orderings, L1 and L2, of length at least 2k . Suppose the spoiler plays a ∈ L1 (the case when the spoiler plays in L2 is symmetric). We will show how to find b ∈ L2 so that (L1, a) ≡k−1 (L2, b). There are three cases: • The length of L≤a 1 is less than 2k−1 . Then let b be an element of L2 such that d(minL1 , a) = d(minL2 , b); in other words, L≤a 1 ∼= L≤b 2 . Since the length of each of L≥a 1 and L≥b 2 is at least 2k−1 , by the induction hypothesis, L≥a 1 ≡k−1 L≥b 2 . Hence, by Lemma 3.7, (L1, a) ≡k−1 (L2, b). • The length of L≥a 1 is less than 2k−1 . This case is symmetric to the previous case. • The lengths of both L≤a 1 and L≥a 1 are at least 2k−1 . Since the length of L2 is at least 2k , we can find b ∈ L2 such that the lengths of both L≤b 2 and L≥b 2 are at least 2k−1 . Then, by the induction hypothesis, L≤a 1 ≡k−1 L≤b 2 and L≥a 1 ≡k−1 L≥b 2 , and by Lemma 3.7, (L1, a) ≡k−1 (L2, b). Thus, for every a ∈ L1, we can find b ∈ L2 such that (L1, a) ≡k−1 (L2, b) (and symmetrically with the roles of L1 and L2 reversed). This proves L1 ≡k L2, and completes the proof of the theorem. 32 3 Ehrenfeucht-Fra¨ıss´e Games 3.3 Games and the Expressive Power of FO And now it is time to see why games are important. For this, we need a crucial definition of quantifier rank. Definition 3.8 (Quantifier rank). The quantifier rank of a formula qr(ϕ) is its depth of quantifier nesting. That is: • If ϕ is atomic, then qr(ϕ) = 0. • qr(ϕ1 ∨ ϕ2) = qr(ϕ1 ∧ ϕ2) = max(qr(ϕ1), qr(ϕ2)). • qr(¬ϕ) = qr(ϕ). • qr(∃xϕ) = qr(∀xϕ) = qr(ϕ) + 1. We use the notation FO[k] for all FO formulae of quantifier rank up to k. In general, quantifier rank of a formula is different from the total of number of quantifiers used. For example, we can define a family of formulae by induction: d0(x, y) ≡ E(x, y), and dk ≡ ∃z dk−1(x, z) ∧ dk−1(z, y). The quantifier rank of dk is k, but the total number of quantifiers used in dk is 2k − 1. For formulae in the prenex form (i.e., all quantifiers are in front, followed by a quantifier-free formula), quantifier rank is the same as the total number of quantifiers. Given a set S of FO sentences (over vocabulary σ), we say that two σstructures A and B agree on S if for every sentence Φ of S, it is the case that A |= Φ ⇔ B |= Φ. 
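Quantifier rank is a purely syntactic notion, computed by the obvious structural recursion. The sketch below is illustrative only: formulae are encoded as nested tuples (an encoding chosen here for convenience), and it checks the claim about d_k, whose quantifier rank is k while its total number of quantifiers is 2^k − 1.

def qr(phi):
    """Quantifier rank of a formula given as a nested tuple:
    ('atom', ...), ('not', f), ('and'|'or', f, g), ('exists'|'forall', var, f)."""
    op = phi[0]
    if op == 'atom':
        return 0
    if op == 'not':
        return qr(phi[1])
    if op in ('and', 'or'):
        return max(qr(phi[1]), qr(phi[2]))
    if op in ('exists', 'forall'):
        return 1 + qr(phi[2])
    raise ValueError(f"unknown connective {op}")

def num_quantifiers(phi):
    """Total number of quantifier occurrences in the formula."""
    op = phi[0]
    if op == 'atom':
        return 0
    if op == 'not':
        return num_quantifiers(phi[1])
    if op in ('and', 'or'):
        return num_quantifiers(phi[1]) + num_quantifiers(phi[2])
    return 1 + num_quantifiers(phi[2])

def d(k, x='x', y='y'):
    """The formula d_k(x, y): d_0 = E(x, y), d_k = ∃z (d_{k-1}(x, z) ∧ d_{k-1}(z, y))."""
    if k == 0:
        return ('atom', 'E', x, y)
    z = f"z{k}"
    return ('exists', z, ('and', d(k - 1, x, z), d(k - 1, z, y)))

for k in range(5):
    print(k, qr(d(k)), num_quantifiers(d(k)))  # qr = k, number of quantifiers = 2^k - 1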
Theorem 3.9 (Ehrenfeucht-Fraïssé). Let A and B be two structures in a relational vocabulary. Then the following are equivalent:
1. A and B agree on FO[k].
2. A ≡_k B.

We will prove this theorem shortly, but first we discuss how it is useful for proving inexpressibility results. Characterizing the expressive power of FO via games gives rise to the following methodology for proving inexpressibility results.

Corollary 3.10. A property P of finite σ-structures is not expressible in FO if for every k ∈ N there exist two finite σ-structures, A_k and B_k, such that:
• A_k ≡_k B_k, and
• A_k has property P, and B_k does not.

Proof. Assume to the contrary that P is definable by a sentence Φ. Let k = qr(Φ), and pick A_k and B_k as above. Then A_k ≡_k B_k, and thus if A_k has property P, then so does B_k, which contradicts the assumptions.

We shall see in the next section that the if of Corollary 3.10 can be replaced by iff; that is, Ehrenfeucht-Fraïssé games are complete for first-order definability.

The methodology above extends from sentences to formulas with free variables.

Corollary 3.11. An m-ary query Q on σ-structures is not expressible in FO iff for every k ∈ N there exist two finite σ-structures, A_k and B_k, and two m-tuples a and b in them such that:
• (A_k, a) ≡_k (B_k, b), and
• a ∈ Q(A_k) and b ∉ Q(B_k).

We next see some simple examples of using games; more examples will be given in Sect. 3.6. An immediate application of the Ehrenfeucht-Fraïssé theorem is that even is not FO-expressible when σ is empty: we take A_k to contain k elements, and B_k to contain k + 1 elements. However, we have already proved this by a simple compactness argument in Sect. 3.1. What we could not prove by that argument is that even is not expressible over finite linear orders. Now we get this for free:

Corollary 3.12. even is not FO-expressible over linear orders.

Proof. Pick A_k to be a linear order of length 2^k, and B_k to be a linear order of length 2^k + 1. By Theorem 3.6, A_k ≡_k B_k. The statement now follows from Corollary 3.10.

3.4 Rank-k Types

We now further analyze FO[k] and introduce the concept of types (more precisely, rank-k types). First, what is FO[0]? It contains Boolean combinations of atomic formulae. If we are interested in sentences of FO[0], these are precisely the quantifier-free sentences; in a relational vocabulary, such sentences are Boolean combinations of formulae of the form c = c′ and R(c_1, . . . , c_k), where c, c′, c_1, . . . , c_k are constant symbols from σ.

Next, assume that ϕ is an FO[k + 1] formula. If ϕ = ϕ_1 ∨ ϕ_2, then both ϕ_1, ϕ_2 are FO[k + 1] formulae, and likewise for ∧; if ϕ = ¬ϕ_1, then ϕ_1 ∈ FO[k + 1]. However, if ϕ = ∃xψ or ϕ = ∀xψ, then ψ is an FO[k] formula. Hence, every formula of FO[k + 1] is equivalent to a Boolean combination of formulae of the form ∃xψ, where ψ ∈ FO[k]. Using this, we show:

Lemma 3.13. If σ is finite, then up to logical equivalence, FO[k] over σ contains only finitely many formulae in m free variables x_1, . . . , x_m.

Proof. The proof is by induction on k. The base case is FO[0]: there are only finitely many atomic formulae, and hence only finitely many Boolean combinations of those, up to logical equivalence. Going from k to k + 1, recall that each formula ϕ(x_1, . . . , x_m) of FO[k + 1] is a Boolean combination of formulae ∃x_{m+1} ψ(x_1, . . . , x_m, x_{m+1}), where ψ ∈ FO[k]. By the hypothesis, the number of FO[k] formulae in m + 1 free variables x_1, . . . , x_{m+1} is finite (up to logical equivalence), and hence the same can be concluded about FO[k + 1] formulae in m free variables.
In model theory, a type (or m-type) of an m-tuple a over a σ-structure A is the set of all FO formulae ϕ in m free variables such that A |= ϕ(a). This notion is too general in our setting, as the type of a over a finite A describes (A, a) up to isomorphism.

Definition 3.14 (Types). Fix a relational vocabulary σ. Let A be a σ-structure, and a an m-tuple over A. Then the rank-k m-type of a over A is defined as

  tp_k(A, a) = {ϕ ∈ FO[k] | A |= ϕ(a)}.

A rank-k m-type is any set of formulae of the form tp_k(A, a), where |a| = m.

When m is clear from the context, we speak of rank-k types. In the special case of m = 0 we deal with tp_k(A), defined as the set of FO[k] sentences that hold in A. Also note that rank-k types are maximally consistent sets of formulae: that is, each rank-k type S is consistent, and for every ϕ(x_1, . . . , x_m) ∈ FO[k], either ϕ ∈ S or ¬ϕ ∈ S.

At this point it seems that rank-k types are inherently infinite objects, but they are not, because of Lemma 3.13. We know that up to logical equivalence, FO[k] is finite for a fixed number m of free variables. Let ϕ_1(x), . . . , ϕ_M(x) enumerate all the nonequivalent formulae in FO[k] with free variables x = (x_1, . . . , x_m). Then a rank-k type is uniquely determined by a subset K of {1, . . . , M} specifying which of the ϕ_i's belong to it. Moreover, testing that x satisfies all the ϕ_i's with i ∈ K and none of the ϕ_j's with j ∉ K can be done by a single formula

  α_K(x) ≡ ⋀_{i∈K} ϕ_i ∧ ⋀_{j∉K} ¬ϕ_j.   (3.3)

Note that α_K(x) is itself an FO[k] formula, since no new quantifiers were introduced. Furthermore, the α_K's are mutually exclusive: for K ≠ K′, if A |= α_K(a), then A |= ¬α_{K′}(a). Every FO[k] formula is equivalent to a disjunction of some of the α_K's: indeed, every FO[k] formula is equivalent to some ϕ_i in the above enumeration, which in turn is equivalent to the disjunction of all α_K's with i ∈ K. Summing up, we have the following.

Theorem 3.15.
a) For a finite relational vocabulary σ, the number of different rank-k m-types is finite.
b) Let T_1, . . . , T_r enumerate all the rank-k m-types. There exist FO[k] formulae α_1(x), . . . , α_r(x) such that:
• for every A and a ∈ A^m, it is the case that A |= α_i(a) iff tp_k(A, a) = T_i, and
• every FO[k] formula ϕ(x) in m free variables is equivalent to a disjunction of some of the α_i's.

Thus, in what follows we normally associate types with their defining formulae α_i of (3.3). It is important to remember that these defining formulae for rank-k types all have the same quantifier rank, k. From the Ehrenfeucht-Fraïssé theorem and Theorem 3.15, we obtain:

Corollary 3.16. The equivalence relation ≡_k is of finite index (that is, it has finitely many equivalence classes).

As promised in the last section, we now show that games are complete for characterizing the expressive power of FO: that is, the if of Corollary 3.10 can be replaced by iff.

Corollary 3.17. A property P is expressible in FO iff there exists a number k such that for every two structures A, B, if A ∈ P and A ≡_k B, then B ∈ P.

Proof. If P is expressible by an FO sentence Φ, let k = qr(Φ). If A ∈ P, then A |= Φ, and hence for any B with A ≡_k B we have B |= Φ; thus B ∈ P. Conversely, if A ∈ P and A ≡_k B imply B ∈ P, then any two structures with the same rank-k type agree on P, and hence P is a union of types, and thus definable by a disjunction of some of the α_i's defined by (3.3).
Thus, a property P is not expressible in FO iff for every k, one can find two structures, Ak ≡k Bk, such that Ak has P and Bk does not. 3.5 Proof of the Ehrenfeucht-Fra¨ıss´e Theorem We shall prove the equivalence of 1 and 2 in the Ehrenfeucht-Fra¨ıss´e theorem, as well as a new important condition, the back-and-forth equivalence. Before stating this condition, we briefly analyze the equivalence relation ≡0. When does the duplicator win the game without even starting? This happens iff (∅, ∅) is a partial isomorphism between two structures A and B. That is, if c is the tuple of constant symbols, then cA i = cA j iff cB i = cB j for every i, j, and for each relation symbol R, the tuple (cA i1 , . . . , cA ik ) is in RA iff the tuple (cB i1 , . . . , cB ik ) is in RB . In other words, (∅, ∅) is a partial isomorphism between A and B iff A and B satisfy the same atomic sentences. 36 3 Ehrenfeucht-Fra¨ıss´e Games We now use this as the basis for the inductive definition of back-and-forth relations on A and B. More precisely, we define a family of relations ≃k on pairs of structures of the same vocabulary as follows: • A ≃0 B iff A ≡0 B; that is, A and B satisfy the same atomic sentences. • A ≃k+1 B iff the following two conditions hold: forth: for every a ∈ A, there exists b ∈ B such that (A, a) ≃k (B, b); back: for every b ∈ B, there exists a ∈ A such that (A, a) ≃k (B, b). We now prove the following extension of Theorem 3.9. Theorem 3.18. Let A and B be two structures in a relational vocabulary σ. Then the following are equivalent: 1. A and B agree on FO[k]. 2. A ≡k B. 3. A ≃k B. Proof. By induction on k. The case of k = 0 is obvious. We first show the equivalence of 2 and 3. Going from k to k + 1, assume A ≃k+1 B; we must show A ≡k+1 B. Assume for the first move the spoiler plays a ∈ A; we find b ∈ B with (A, a) ≃k (B, b), and thus by the hypothesis (A, a) ≡k (B, b). Hence the duplicator can continue to play for k moves, and thus wins the k + 1-move game. The other direction is similar. With games replaced by the back-and-forth relation, we show the equivalence of 1 and 3. Assume A and B agree on all quantifier-rank k+1 sentences; we must show A ≃k+1 B. We prove the forth case; the back case is identical. Pick a ∈ A, and let αi define its rank-k 1-type. Then A |= ∃xαi(x). Since qr(αi) = k, this is a sentence of quantifier-rank k+1; hence B |= ∃xαi(x). Let b be the witness for the existential quantifier; that is, tpk(A, a) = tpk(B, b). Hence for every σ1 sentence Ψ of qr(Ψ) = k, we have (A, a) |= Ψ iff (B, b) |= Ψ, and thus (A, a) and (B, b) agree on quantifier-rank k sentences. By the hypothesis, this implies (A, a) ≃k (B, b). For the implication 3 → 1, we need to prove that A ≃k+1 B implies that A and B agree on FO[k +1]. Every FO[k +1] sentence is a Boolean combination of ∃xϕ(x), where ϕ ∈ FO[k], so it suffices to prove the result for sentences of the form ∃xϕ(x). Assume that A |= ∃xϕ(x), so A |= ϕ(a) for some a ∈ A. By forth, find b ∈ B such that (A, a) ≃k (B, b); hence (A, a) and (B, b) agree on FO[k] by the hypothesis. Hence, B |= ϕ(b), and thus B |= ∃xϕ(x). The converse (that B |= ∃xϕ(x) implies A |= ∃xϕ(x)) is identical, which completes the proof. 3.6 More Inexpressibility Results 37 ⇒ ⇒ Fig. 3.3. Reduction of parity to connectivity 3.6 More Inexpressibility Results So far we have used games to prove that even is not expressible in FO, in both ordered and unordered settings. Next, we show inexpressibility of graph connectivity over finite graphs. In Sect. 
3.1 we used compactness to show that connectivity of arbitrary graphs is inexpressible, leaving open the possibility that it may be FO-definable over finite graphs. We now show that this cannot happen. It turns out that no new game argument is needed, as the proof uses a reduction from even over linear orders. Assume that connectivity of finite graphs is definable by an FO sentence Φ, in the vocabulary that consists of one binary relation symbol E. Next, given a linear ordering, we define a directed graph from it as described below. First, from a linear ordering < we define the successor relation succ(x, y) ≡ (x < y) ∧ ∀z (z ≤ x) ∨ (z ≥ y) . Using this, we define an FO formula γ(x, y) such that γ(x, y) is true iff one of the following holds: • y is the successor of the successor of x: ∃z succ(x, z) ∧ succ(z, y) , or • x is the predecessor of the last element, and y is the first element: ∃z (succ(x, z) ∧ ∀u(u ≤ z)) ∧ ∀u(y ≤ u), or • x is the last element and y is the successor of the first element (the FO formula is similar to the one above). Thus, γ(x, y) defines a new graph on the elements of the linear ordering; the construction is illustrated in Fig. 3.3. Now observe that the graph defined by γ is connected iff the size of the underlying linear ordering is odd. Hence, taking ¬Φ, and substituting γ for every occurrence of the predicate E, we get a sentence that tests even for linear orderings. Since this is impossible, we obtain the following. Corollary 3.19. Connectivity of finite graphs is not FO-definable. 38 3 Ehrenfeucht-Fra¨ıss´e Games . . . G2 k G1 k . . . . . . . . . Fig. 3.4. Graphs G1 k and G2 k So far all the examples of inexpressibility results proved via EhrenfeuchtFra¨ıss´e games were fairly simple. Unfortunately, this is a rather unusual situation; typically game proofs are hard, and often some nontrivial combinatorial arguments are required. We now present an additional example of a game proof, as well as a few more problems that could possibly be handled by games, but are better left until we have seen more powerful techniques. These show how the difficulty of game proofs can rapidly increase as the problems become more complex. Suppose that we want to test if a graph is a tree. By trees we mean directed rooted trees. This seems to be impossible in FO. To prove this, we follow the general methodology: that is, for each k we must find two graphs, G1 k ≡k G2 k, such that one of them is a tree, and the other one is not. We choose these graphs as follows: G1 k is the graph of a successor relation of length 2m, and G2 k has two connected components: one is the graph of a successor relation of length m, and the other one is a cycle of length m. We did not say what m is, and it will be clear from the proof what it should be: at this point we just say that m depends only on k, and is sufficiently large. Clearly G1 k is a tree (of degree 1), and G2 k is not, so we must show G1 k ≡k G2 k. In each of these two graphs there are two special points: the start and the endpoint of the successor relation. Clearly these must be preserved in the game, so we may just assume that the game starts in a position where these points were played. That is, we let a−1, a0 be the start and the endpoint of G1 k, and b−1, b0 be the start and the endpoint of the successor part of G2 k. We let ai’s stand for the points played in G1 k, and bi’s for the points played in G2 k. What do we put in the inductive hypothesis? The approach we take is very similar to the first proof of Theorem 3.6. 
We define the distance between two elements as the length of the shortest path between them. Notice that in the case of G2 k, the distance could be infinity, as the graph has two connected 3.6 More Inexpressibility Results 39 components. We then show that the duplicator can play in a way that ensures the following conditions after each round i: 1. if d(aj, al) ≤ 2k−i , then d(bj, bl) = d(aj, al). 2. if d(aj, al) > 2k−i , then d(bj, bl) > 2k−i . (3.4) These are very similar to conditions (3.2) used in the proof of Theorem 3.6. How do we prove that the duplicator can maintain these conditions? Suppose i rounds have been played, and the spoiler makes his move in round i+1. If the spoiler plays close (at a distance at most 2k−(i+1) ) to a previously played point, we can apply the proof of Theorem 3.6 to show that the duplicator has a response. But what if the spoiler plays at a distance greater than 2k−(i+1) from all the previously played points? In the proof of Theorem 3.6 we were able to place that move into some interval on a linear ordering and use some knowledge of that interval to find the response – but this does not work any more, since our graphs now have a different structure. Nevertheless, there is a way to ensure that the duplicator can maintain the winning conditions: simply by choosing m “very large”, we can always be sure that if fewer than k rounds of the game have been played, there is a point at a distance greater than 2k−(i+1) from all the previously played points in the graph. We leave it to the reader to calculate m for a given k (it is not that much different from the bound we had in Theorem 3.6). Thus, the duplicator can maintain all the conditions (3.4). In the proof of Theorem 3.6, one of the conditions of (3.2) stated that the moves in the game define a partial isomorphism. Here, we do not have this property, but we can still derive that after k rounds, the duplicator achieves a partial isomorphism. Indeed, suppose all k rounds have been played, and we have two elements ai, aj such that there is an edge between ai and aj. This means that d(ai, aj) = 1, and, by (3.4), d(bi, bj) = 1. Therefore, there is an edge between bi and bj. Conversely, let there be an edge between bi and bj. If there is no edge between ai and aj, then d(ai, aj) > 1, and, by (3.4), d(bi, bj) > 1, which contradicts our assumption that there is an edge between them. Thus, we have shown that G1 k ≡k G2 k, which proves the following. Proposition 3.20. It is impossible to test, by an FO sentence, if a finite graph is a tree. This proof is combinatorially slightly more involved than other game proofs we have seen, and yet it uses trees with only unary branching. So it does not tell us whether testing the property of being an n-ary tree, for n > 1, is expressible. Moreover, one can easily imagine that the combinatorics in a game argument even for binary trees will be much harder. And what if we are interested in more complex properties? For example, testing if a graph is: 40 3 Ehrenfeucht-Fra¨ıss´e Games • a balanced binary tree (the branching factor is 2, and all the maximal branches are of the same length); • a binary tree with all the maximal branches of different length; • or even a bit different: assuming that we know that the input is a binary tree, can we check, in FO, if it is balanced? It would thus be nice to have some easily verifiable criteria that guarantee a winning strategy for the duplicator, and that is exactly what we shall do in the next chapter. 
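Before leaving the chapter, the reduction behind Corollary 3.19 can be checked concretely. The sketch below is an illustration only: rather than evaluating the formula γ, it hard-codes the graph that γ defines on the linear order 1 < . . . < n and verifies that this graph is connected exactly when n is odd.

def gamma_graph(n):
    """Edges of the graph defined by γ(x, y) on the order 1 < 2 < ... < n:
    y is the successor of the successor of x; or x is the predecessor of the
    last element and y is the first; or x is the last element and y is the
    successor of the first."""
    edges = {(x, x + 2) for x in range(1, n - 1)}
    if n >= 2:
        edges.add((n - 1, 1))   # predecessor of the last element -> first element
        edges.add((n, 2))       # last element -> successor of the first element
    return edges

def connected(n, edges):
    """Connectivity of the graph on {1, ..., n}, ignoring edge orientation."""
    adj = {v: set() for v in range(1, n + 1)}
    for x, y in edges:
        adj[x].add(y)
        adj[y].add(x)
    seen, stack = {1}, [1]
    while stack:
        v = stack.pop()
        for w in adj[v] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n

for n in range(3, 11):
    assert connected(n, gamma_graph(n)) == (n % 2 == 1)
print("the γ-graph is connected iff the underlying order has odd size")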
3.7 Bibliographic Notes Examples of using compactness for proving some very easy inexpressibility results over finite models are taken from V¨a¨an¨anen [239] and Gaifman and Vardi [89]. Characterization of the expressive power of FO in terms of the back-andforth equivalence is due to Fra¨ıss´e [84]; the game description of the back-andforth equivalence is due to Ehrenfeucht [62]. Theorem 3.6 is a classical application of Ehrenfeucht-Fra¨ıss´e games, and was rediscovered many times, cf. Gurevich [117] and Rosenstein [209]. The composition method, used in the second proof of Theorem 3.6, will be discussed elsewhere in the book (e.g., exercise 3.15 in this chapter, as well as Chap. 7). For a recent survey, see Makowsky [177]. The proof of inexpressibility of connectivity is standard, see, e.g., [60, 133]. Types are a central concept of model theory, see [35, 125, 201]. The proof of the Ehrenfeucht-Fra¨ıss´e theorem given here is slightly different from the proof one finds in most texts (e.g., [60, 125]); an alternative proof using what is called Hintikka formulae is presented in Exercise 3.11. Some of the exercises for this chapter show that several classical theorems in model theory (not only compactness) fail over finite models. For this line of work, see Gurevich [116], Rosen [207], Rosen and Weinstein [208], Feder and Vardi [78]. Sources for exercises: Exercise 3.11: Ebbinghaus and Flum [60] Exercises 3.12 and 3.13: Gurevich [116] Exercise 3.14: Ebbinghaus and Flum [60] Exercise 3.17: Cook and Liu [41] Exercise 3.18: Pezzoli [199] 3.8 Exercises 41 3.8 Exercises Exercise 3.1. Use compactness to show that the following is not FO-expressible over finite structures in the vocabulary of one unary relation symbol U: for a structure A, both |UA | and |A − UA | are even. Exercise 3.2. Prove Lemma 3.4 for an arbitrary vocabulary. Exercise 3.3. Prove Corollary 3.11. Exercise 3.4. Using Ehrenfeucht-Fra¨ıss´e games, show that acyclicity of finite graphs is not FO-definable. Exercise 3.5. Same as in the previous exercise, for the following properties of finite graphs: 1. Planarity. 2. Hamiltonicity. 3. 2-colorability. 4. k-colorability for any k > 2. 5. Existence of a clique of size at least n/2, where n is the number of nodes. Exercise 3.6. We now consider a query closely related to even. Let σ be a vocabulary that includes a unary relation symbol U. We then define a Boolean query parityU as follows: a finite σ-structure A satisfies parityU iff |UA |= 0 (mod 2). Prove that if σ = {<, U}, where < is interpreted as a linear ordering on the universe, then parityU is not FO-definable. Exercise 3.7. Theorem 3.6 tells us that L1 ≡k L2 for two linear orders of length at least 2k . Is the bound 2k tight? If it is not, what is the tight bound? Exercise 3.8. Just as for linear orders, the following can be proved for Gn, the graph of successor relation on {1, . . . , n}. There is a function f : N → N such that Gn ≡k Gm whenever n, m ≥ f(k). Calculate f(k). Exercise 3.9. Consider sets of the form XΦ = {n ∈ N | Ln |= Φ}, where Φ is an FO sentence, and Ln is a linear order with n elements. Describe these sets. Exercise 3.10. Find an upper bound, in terms of k, on the number of rank-k types. Exercise 3.11. The goal of this exercise is to give another proof of the EhrenfeuchtFra¨ıss´e theorem. In this proof, one constructs formulae defining rank-k types explicitly, by specifying inductively a winning condition for the duplicator. Assume that σ is relational. 
For any σ-structure A and a ∈ Am , we define inductively formulae αk A,a(x1, . . . , xm) as follows: • α0 A,a(x) = V χ(x) where the conjunction is taken over all atomic or negated atomic χ such that A |= χ(a). Note that the conjunction is finite. 42 3 Ehrenfeucht-Fra¨ıss´e Games • Assuming αk ’s are defined, we define αk+1 A,a (x) = “ ^ c∈A ∃z αk A,ac(x, z) ” ∧ “ ∀z _ c∈A αk A,ac(x, z) ” . Prove that the following are equivalent: 1. (A, a) ≡k (B, b); 2. (A, a) ≃k (B, b); 3. for every ϕ(x) with qr(ϕ) ≤ k, we have A |= ϕ(a) iff B |= ϕ(b); 4. B |= αk A,a(b). Using this, prove the following statement. Let Q be a query definable in FO by a formula of quantifier rank k. Then Q is definable by the following formula: _ a∈Q(A) αk A,a(x). Note that the disjunction is finite, by Lemma 3.13. Exercise 3.12. Beth’s definability theorem is a classical result in mathematical logic: it says that a property is definable implicitly iff it is definable explicitly. Explicit definability of a k-ary query Q on σ-structures means that there is a formula ϕ(x1, . . . , xk) such that ϕ(A) = Q(A). Implicit definability means that there is a sentence Φ in the language of σ expanded with a single k-ary relation P such that for every σ-structure A, there exists a unique set P ⊆ Ak such that (A, P) |= Φ and P = Q(A). Prove that Beth’s theorem fails over finite models. Hint: P is a unary query that returns the set of even elements in a linear order. Exercise 3.13. Craig’s interpolation is another classical result from mathematical logic. Let σ1 , σ2 be two vocabularies, and σ = σ1 ∩ σ2 . Let Φi be a sentence over σi , i = 1, 2. Assume that Φ1 ⊢ Φ2 . Craig’s theorem says that there exists a sentence Φ over σ such that Φ1 ⊢ Φ and Φ ⊢ Φ2 . Using techniques similar to those in the previous exercise, prove that Craig’s interpolation fails over finite models. Exercise 3.14. This exercise demonstrates another example of a result from mathematical logic that fails over finite models. The Los-Tarski theorem says that a sentence which is preserved under extensions (that is, A ⊆ B and A |= Φ implies B |= Φ) is equivalent to an existential sentence: a sentence built from atomic and negated atomic formulae by using ∨, ∧, and ∃. Prove that the Los-Tarski theorem fails over finite models. Exercise 3.15. Winning strategies for complex structures can be composed from winning strategies for simpler structures. Two commonly used examples of such compositions are the subject of this exercise. Given two structures A, B of the same vocabulary σ, their Cartesian product A× B is defined as a σ-structure whose universe is A×B, each constant c is interpreted as a pair (cA , cB ), and each m-ary relation P is interpreted as {((a1, b1), . . . , (am, bm)) | (a1, . . . , am) ∈ PA , (b1, . . . , bm) ∈ PB }. If the vocabulary contains only relation symbols, the disjoint union A ‘ B for two structures with A ∩ B = ∅ has the universe A ∪ B, and each relation P is interpreted as PA ∪ PB . Assume A1 ≡k A2 and B1 ≡k B2. Show that: 3.8 Exercises 43 • A1 × B1 ≡k A2 × B2; • A1 ‘ B1 ≡k A2 ‘ B2. Exercise 3.16. The n×m grid is a graph whose set of nodes is {(i, j) | i ≤ n, j ≤ m} for some n, m ∈ N, and whose edges go from (i, j) to (i + 1, j) and to (i, j + 1). Use composition of Ehrenfeucht-Fra¨ıss´e games to show that there are no FO sentences testing if n = m (n > m) for the n × m grid. Exercise 3.17. Consider finite structures which are disjoint unions of finite linear orderings. Such structures occur in AI applications under the name of blocks world. 
Use Ehrenfeucht-Fra¨ıss´e games to show that the theory of such structures is decidable, and finitely axiomatizable. Exercise 3.18. Fix a relational vocabulary σ that has at least one unary and one ternary relation. Prove that the following is Pspace-complete. Given k, and two σ-structures A and B, is A ≡k B? What happens if k is fixed? Exercise 3.19.∗ A sentence Φ of vocabulary σ is called positive if no symbol from σ occurs under the scope of an odd number of negations in Φ. We say that a sentence Φ is preserved under surjective homomorphisms if A |= Φ and h(A) = B implies B |= Φ, where h : A → B is a homomorphism such that h(A) = B. Lyndon’s theorem says that if Φ is preserved under surjective homomorphisms (where A, B could be arbitrary structures), then Φ is equivalent to a positive sentence. Does Lyndon’s theorem hold in the finite? That is, if Φ is preserved under surjective homomorphisms over finite structures, is it the case that, over finite structures, Φ is equivalent to a positive sentence? 4 Locality and Winning Games Winning games becomes nontrivial even for fairly simple examples. But often we can avoid complicated combinatorial arguments, by using rather simple sufficient conditions that guarantee a winning strategy for the duplicator. For first-order logic, most such conditions are based on the idea of locality, best illustrated by the example in Fig. 4.1. Suppose we want to show that the transitive closure query is not expressible in FO. We assume, to the contrary, that it is definable by a formula ϕ(x, y), and then use the locality of FO to conclude that such a formula can only see up to some distance r from its free variables, where r is determined by ϕ. Then we take a successor relation A long enough so that the distance from a and b to each other and the endpoints is bigger than 2r – in that case, ϕ cannot see the difference between (a, b) and (b, a), but our assumption implies that A |= ϕ(a, b) ∧ ¬ϕ(b, a) since a precedes b. The goal of this chapter is to formalize this type of reasoning, and use it to provide winning strategies for the duplicator. Such strategies will help us find easy criteria for FO-definability. Throughout the chapter, we assume that the vocabulary σ is purely relational; that is, contains only relation symbols. All the results extend easily to the case of vocabularies that have constant symbols (see Exercise 4.1), but restricting to purely relational vocabularies often makes notations simpler. 4.1 Neighborhoods, Hanf-locality, and Gaifman-locality We start by defining neighborhoods that formalize the concept of “seeing up to distance r from the free variables”. Definition 4.1. Given a σ-structure A, its Gaifman graph, denoted by G(A), is defined as follows. The set of nodes of G(A) is A, the universe of A. There is an edge (a1, a2) in G(A) iff a1 = a2, or there is a relation R in σ such that for some tuple t ∈ RA , both a1, a2 occur in t. 46 4 Locality and Winning Games ... ... ... ... ... ... ... ...a b r r Fig. 4.1. A local formula cannot distinguish (a, b) from (b, a) Note that G(A) is an undirected graph. If A is an undirected graph to start with, then G(A) is simply A together with the diagonal {(a, a) | a ∈ A}. If A is a directed graph, then G(A) simply forgets about the orientation (and adds the diagonal as well). By the distance dA(x, y) we mean the distance in the Gaifman graph: that is, the length of the shortest path from x to y in G(A). If there is no such path, then dA(x, a) = ∞. 
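The Gaifman graph and the distance function are straightforward to compute. A small sketch (illustrative only; structures are encoded, as in the earlier sketches, by a universe plus a dict of named relations given as sets of tuples, and the function names are ours):

from itertools import combinations
from math import inf

def gaifman_graph(universe, relations):
    """Adjacency lists of the Gaifman graph G(A): two distinct elements are
    adjacent iff they occur together in some tuple of some relation."""
    adj = {a: set() for a in universe}
    for tuples in relations.values():
        for t in tuples:
            for a, b in combinations(set(t), 2):
                adj[a].add(b)
                adj[b].add(a)
    return adj

def distance(adj, x, y):
    """d_A(x, y): length of a shortest path in the Gaifman graph, or infinity."""
    if x == y:
        return 0
    frontier, seen, d = {x}, {x}, 0
    while frontier:
        d += 1
        frontier = {w for v in frontier for w in adj[v]} - seen
        if y in frontier:
            return d
        seen |= frontier
    return inf

# A directed path 1 -> 2 -> 3 -> 4 together with an isolated element 5:
G = gaifman_graph({1, 2, 3, 4, 5}, {"E": {(1, 2), (2, 3), (3, 4)}})
print(distance(G, 1, 4))  # 3
print(distance(G, 1, 5))  # inf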
It is easy to verify that the distance satisfies all the usual properties of a metric: dA(x, y) = 0 iff x = y, dA(x, y) = dA(y, x), and dA(x, z) ≤ dA(x, y) + dA(y, z), for all x, y, z. If we are given two tuples, a = (a1, . . . , an) and b = (b1, . . . , bm), and an element c, then dA(a, c) = min 1≤i≤n dA(ai, c), dA(a, b) = min 1≤i≤n, 1≤j≤m dA(ai, bj). Furthermore, ac stands for the n + 1-tuple (a1, . . . , an, c), and ab stands for the n + m-tuple (a1, . . . , an, b1, . . . , bm). Recall that we use the notation σn for σ expanded with n constant symbols. Definition 4.2. Let σ contain only relation symbols, and let A be a σstructure, and a = (a1, . . . , an) ∈ An . The radius r ball around a is the set BA r (a) = {b ∈ A | dA(a, b) ≤ r}. The r-neighborhood of a in A is the σn-structure NA r (a), where: • the universe is BA r (a); • each k-ary relation R is interpreted as RA restricted to BA r (a); that is, RA ∩ (BA r (a))k ; • n additional constants are interpreted as a1, . . . , an. Note that since we define a neighborhood around an n-tuple as a σnstructure, for any isomorphism h between two isomorphic neighborhoods NA r (a1, . . . , an) and NB r (b1, . . . , bn), it must be the case that h(ai) = bi, 1 ≤ i ≤ n. 4.1 Neighborhoods, Hanf-locality, and Gaifman-locality 47 Definition 4.3. Let A, B be σ-structures, where σ only contains relation symbols. Let a ∈ An and b ∈ Bn . We write (A, a) ⇆d (B, b) if there exists a bijection f : A → B such that for every c ∈ A, NA d (ac) ∼= NB d (bf(c)). We shall often deal with the case of n = 0; then A⇆dB means that for some bijection f : A → B, NA d (c) ∼= NB d (f(c)) for all c ∈ A. The ⇆d relation says, in a sense, that locally two structures look the same, with respect to a certain bijection f; that is, f sends each element c into f(c) that has the same neighborhood. The lemma below summarizes some properties of this relation: Lemma 4.4. 1. (A, a)⇆d(B, b) ⇒|A|=|B |. 2. (A, a)⇆d(B, b) ⇒ (A, a)⇆d′ (B, b), for d′ ≤ d. 3. (A, a)⇆d(B, b) ⇒ NA d (a) ∼= NB d (b). Recall that a neighborhood of an n-tuple is a σn-structure. By an isomorphism type of such structures we mean an equivalence class of ∼= on STRUCT[σn]. We shall use the letter τ (with sub- and superscripts) to denote isomorphism types. Instead of saying that a structure belongs to τ, we shall say that it is of the isomorphism type τ. If τ is an isomorphism type of σn-structures, and a ∈ An , we say that a d-realizes τ in A if NA d (a) is of type τ. If d is understood from the context, we say that a realizes τ. The following is now easily proved from the definition of the ⇆d relation. Lemma 4.5. Let A, B ∈ STRUCT[σ]. Then A ⇆d B iff for each isomorphism type τ of σ1-structures, the number of elements of A and B that drealize τ is the same. We now formulate the first locality criterion. Definition 4.6 (Hanf-locality). An m-ary query Q on σ-structures is Hanflocal if there exists a number d ≥ 0 such that for every A, B ∈ STRUCT[σ], a ∈ Am , b ∈ Bm , (A, a) ⇆d (B, b) implies a ∈ Q(A) ⇔ b ∈ Q(B) . The smallest d for which the above condition holds is called the Hanf-locality rank of Q and is denoted by hlr(Q). 48 4 Locality and Winning Games . . . . . . one cycle of length 2m G2 m two cycles of length m G1 m . . . .. . . . Fig. 4.2. Connectivity is not Hanf-local Most commonly Hanf-locality is used for Boolean queries; then the definition says that for some d ≥ 0, for every A, B ∈ STRUCT[σ], the condition A ⇆d B implies that A and B agree on Q. 
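For very small structures, the relation ⇆_d can be tested directly via the counting criterion of Lemma 4.5. The sketch below is illustrative only: it reuses the toy structure encoding from the earlier sketches, tests isomorphism of neighborhoods by brute force (so it is exponential in the neighborhood size), and checks the two graphs of Fig. 4.2 for a small cycle length.

from itertools import combinations, permutations

def gaifman_graph(universe, relations):
    """Adjacency lists of the Gaifman graph (as in the previous sketch)."""
    adj = {a: set() for a in universe}
    for tuples in relations.values():
        for t in tuples:
            for a, b in combinations(set(t), 2):
                adj[a].add(b)
                adj[b].add(a)
    return adj

def ball(adj, a, d):
    """B_d(a): elements at Gaifman distance at most d from a."""
    seen, frontier = {a}, {a}
    for _ in range(d):
        frontier = {w for v in frontier for w in adj[v]} - seen
        seen |= frontier
    return seen

def neighborhood(relations, adj, a, d):
    """N_d(a): the ball around a, the relations restricted to it, and a as a constant."""
    B = ball(adj, a, d)
    return (B, {R: {t for t in ts if set(t) <= B} for R, ts in relations.items()}, a)

def isomorphic(N1, N2):
    """Brute-force isomorphism of two (tiny) neighborhoods, sending center to center."""
    (B1, rel1, c1), (B2, rel2, c2) = N1, N2
    if len(B1) != len(B2):
        return False
    others1, others2 = list(B1 - {c1}), list(B2 - {c2})
    for perm in permutations(others2):
        h = dict(zip(others1, perm))
        h[c1] = c2
        if all({tuple(h[x] for x in t) for t in rel1[R]} == rel2[R] for R in rel1):
            return True
    return False

def hanf_equivalent(A, B, d):
    """A ⇆_d B via Lemma 4.5: match every element of A to an element of B with an
    isomorphic d-neighborhood (greedy matching suffices, since isomorphism is an
    equivalence relation on neighborhoods)."""
    (UA, relA), (UB, relB) = A, B
    if len(UA) != len(UB):
        return False
    adjA, adjB = gaifman_graph(UA, relA), gaifman_graph(UB, relB)
    NB = [neighborhood(relB, adjB, b, d) for b in UB]
    for a in UA:
        na = neighborhood(relA, adjA, a, d)
        for i, nb in enumerate(NB):
            if isomorphic(na, nb):
                del NB[i]
                break
        else:
            return False
    return True

def cycle(nodes):
    return {(nodes[i], nodes[(i + 1) % len(nodes)]) for i in range(len(nodes))}

# Fig. 4.2: one directed cycle of length 2m vs. two disjoint cycles of length m.
m = 8
one_cycle = (set(range(2 * m)), {"E": cycle(list(range(2 * m)))})
two_cycles = (set(range(2 * m)), {"E": cycle(list(range(m))) | cycle(list(range(m, 2 * m)))})
print(hanf_equivalent(one_cycle, two_cycles, 2))  # True: locally, both look like short chains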
Using Hanf-locality for proving that a query Q is not definable in a logic L then amounts to showing: • that every L-definable query is Hanf-local, and • that Q is not Hanf-local. We now give the canonical example of using Hanf-locality. We show, by a very simple argument, that graph connectivity is not Hanf-local; it will then follow that graph connectivity is not expressible in any logic that only defines Hanf-local Boolean queries. Assume to the contrary that the graph connectivity query Q is Hanf-local, and hlr(Q) = d. Let m > 2d + 1, and choose two graphs G1 m and G2 m as shown in Fig. 4.2. Their sets of nodes have the same cardinality. Let f be an arbitrary bijection between the nodes of G1 m and G2 m. Since each cycle is of length > 2d + 1, the d-neighborhood of any node a is the same: it is a chain of length 2d with a in the middle. Hence, G1 m ⇆d G2 m, and they must agree on Q, but G2 m is connected, and G1 m is not. Thus, graph connectivity is not Hanf-local. While Hanf-locality works well for Boolean queries, a different notion is often helpful for m-ary queries, m > 0. Definition 4.7 (Gaifman-locality). An m-ary query Q, m > 0, on σstructures, is called Gaifman-local if there exists a number d ≥ 0 such that for every σ-structure A and every a1, a2 ∈ Am , 4.2 Combinatorics of Neighborhoods 49 NA d (a1) ∼= NA d (a2) implies a1 ∈ Q(A) ⇔ a2 ∈ Q(A) . The minimum d for which the above condition holds is called the locality rank of Q, and is denoted by lr(Q). Note the difference between Hanf- and Gaifman-locality: the former relates two different structures, while the latter is talking about definability in one structure. The methodology for proving inexpressibility of queries using Gaifmanlocality is then as follows: • first we show that all m-ary queries, m > 0, definable in a logic L are Gaifman-local, • then we show that a given query Q is not Gaifman-local. We shall see many examples of logics that define only Gaifman-local queries. At this point, we give a typical example of a query that is not Gaifman-local. The query is transitive closure, and we already saw that it is not Gaifman-local. Recall Fig. 4.1. Assume that the transitive closure query Q is Gaifman-local, and let lr(Q) = r. If a, b are at a distance > 2r +1 from each other and the start and the endpoints, then the r-neighborhoods of (a, b) and (b, a) are isomorphic, since each is a disjoint union of two chains of length 2r. We know that (a, b) belongs to the output of Q; hence by Gaifman-locality, (b, a) is in the output as well, which contradicts the assumption that Q defines transitive closure. These examples demonstrate that locality tools are rather easy to use to obtain inexpressibility results. Our goal now is to show that FO-definable queries are both Hanf-local and Gaifman-local. 4.2 Combinatorics of Neighborhoods The main technical tool for proving locality is combinatorial reasoning about neighborhoods. We start by presenting simple properties of neighborhoods; proofs are left as an exercise for the reader. Lemma 4.8. • Assume that A, B ∈ STRUCT[σ] and h : NA r (a) → NB r (b) is an isomorphism. Let d ≤ r. Then h restricted to BA d (a) is an isomorphism between NA d (a) and NB d (b). • Assume that A, B ∈ STRUCT[σ] and h : NA r (a) → NB r (b) is an isomorphism. Let d + l ≤ r and x be a tuple from BA l (a). Then h(BA d (x)) = BB d (h(x)), and NA d (x) and NB d (h(x)) are isomorphic. • Let A, B ∈ STRUCT[σ] and let a1 ∈ An , b1 ∈ Bn for n ≥ 1, and a2 ∈ Am , b2 ∈ Bm for m ≥ 1. 
Assume that N^A_r(a_1) ≅ N^B_r(b_1), N^A_r(a_2) ≅ N^B_r(b_2), and d_A(a_1, a_2), d_B(b_1, b_2) > 2r + 1. Then N^A_r(a_1a_2) ≅ N^B_r(b_1b_2).

From now on, we shall use the notation a ≈^{A,B}_r b for N^A_r(a) ≅ N^B_r(b), omitting A and B when they are understood. We shall also write d(·, ·) instead of d_A(·, ·) when A is understood. The main technical result of this section is the lemma below.

Lemma 4.9. If A ⇆_d B and a ≈^{A,B}_{3d+1} b, then (A, a) ⇆_d (B, b).

Proof. We need to define a bijection f : A → B such that ac ≈^{A,B}_d bf(c) for every c ∈ A. Since a ≈^{A,B}_{3d+1} b, there is an isomorphism h : N^A_{3d+1}(a) → N^B_{3d+1}(b). Then the restriction of h to B^A_{2d+1}(a) is an isomorphism between N^A_{2d+1}(a) and N^B_{2d+1}(b). Since |A| = |B|, we obtain

  |A − B^A_{2d+1}(a)| = |B − B^B_{2d+1}(b)|.

Now consider an arbitrary isomorphism type τ of a d-neighborhood of a single point. Assume that c ∈ B^A_{2d+1}(a) realizes τ in A. Since h is an isomorphism of (3d + 1)-neighborhoods, B^A_d(c) ⊆ B^A_{3d+1}(a), and thus h(c) ∈ B^B_{2d+1}(b) realizes τ. Similarly, if c ∈ B^B_{2d+1}(b) realizes τ, then so does h^{−1}(c) ∈ B^A_{2d+1}(a). Hence, the number of elements in B^A_{2d+1}(a) and B^B_{2d+1}(b) that realize τ is the same. Since A ⇆_d B, the number of elements of A and of B that realize τ is the same. Therefore,

  |{c ∈ A − B^A_{2d+1}(a) | c d-realizes τ}| = |{c ∈ B − B^B_{2d+1}(b) | c d-realizes τ}|   (4.1)

for every τ. Using (4.1), we can find a bijection g : A − B^A_{2d+1}(a) → B − B^B_{2d+1}(b) such that c ≈_d g(c) for every c ∈ A − B^A_{2d+1}(a). We now define f by

  f(c) = h(c) if c ∈ B^A_{2d+1}(a), and f(c) = g(c) if c ∉ B^A_{2d+1}(a).

It is clear that f is a bijection from A to B. We claim that ac ≈_d bf(c) for every c ∈ A. This is illustrated in Fig. 4.3. If c ∈ B^A_{2d+1}(a), then B^A_d(c) ⊆ B^A_{3d+1}(a), and ac ≈_d bh(c) because h is an isomorphism. If c ∉ B^A_{2d+1}(a), then f(c) = g(c) ∉ B^B_{2d+1}(b), and c ≈_d g(c). Since d(c, a), d(g(c), b) > 2d + 1, by Lemma 4.8, ac ≈_d bg(c).

The following corollary is very useful in establishing locality of logics.

Corollary 4.10. If (A, a) ⇆_{3d+1} (B, b), then there exists a bijection f : A → B such that (A, ac) ⇆_d (B, bf(c)) for every c ∈ A.

[Fig. 4.3. Illustration of the proof of Lemma 4.9]

Proof. By the definition of the ⇆ relation, there exists a bijection f : A → B such that for any c ∈ A, ac ≈^{A,B}_{3d+1} bf(c). Since A ⇆_{3d+1} B, we have A ⇆_d B. By Lemma 4.9, (A, ac) ⇆_d (B, bf(c)).

4.3 Locality of FO

We now show that FO-definable queries are both Hanf-local and Gaifman-local. In fact, it suffices to prove the former, due to the following result.

Theorem 4.11. If Q is a Hanf-local non-Boolean query, then Q is Gaifman-local, and lr(Q) ≤ 3 · hlr(Q) + 1.

Proof. Suppose Q is an m-ary query on STRUCT[σ], m > 0, and hlr(Q) = d. Let A be a σ-structure, and let a_1 ≈^A_{3d+1} a_2. Since A ⇆_d A, by Lemma 4.9, (A, a_1) ⇆_d (A, a_2), and hence a_1 ∈ Q(A) iff a_2 ∈ Q(A), which proves lr(Q) ≤ 3d + 1.

Theorem 4.12. Every FO-definable query Q is Hanf-local. Moreover, if Q is defined by an FO[k] formula (that is, an FO formula whose quantifier rank is at most k), then hlr(Q) ≤ (3^k − 1)/2.

Proof. By induction on the quantifier rank. If k = 0, then (A, a) ⇆_0 (B, b) means that (a, b) defines a partial isomorphism between A and B, and thus a and b satisfy the same atomic formulae. Hence hlr(Q) = 0 if Q is defined by an FO[0] formula. Suppose Q is defined by a formula of quantifier rank k + 1. Such a formula is a Boolean combination of formulae of the form ∃z ϕ(x, z) where qr(ϕ) ≤ k.
Note that it follows immediately from the definition of Hanf-locality that if ψ is a Boolean combination of ψ1, . . . , ψl, and for all i ≤ l, hlr(ψi) ≤ d, then hlr(ψ) ≤ d. Thus, it suffices to prove that the Hanf-locality rank of the query defined by ∃zϕ is at most 3d + 1, where d is the Hanf-locality rank of the query defined by ϕ. To see this, let (A, a) ⇆3d+1 (B, b). By Corollary 4.10, we find a bijection f : A → B such that (A, ac) ⇆d (B, bf(c)) for every c ∈ A. Since hlr(ϕ) = d, we have A |= ϕ(a, c) iff B |= ϕ(b, f(c)). Hence, A |= ∃z ϕ(a, z) ⇒ A |= ϕ(a, c) for some c ∈ A ⇒ B |= ϕ(b, f(c)) ⇒ B |= ∃z ϕ(b, z). The same proof shows B |= ∃z ϕ(b, z) implies A |= ∃z ϕ(a, z). Thus, a and b agree on the query defined by ∃zϕ(x, z), which completes the proof. Combining Theorems 4.11 and 4.12, we obtain: Corollary 4.13. Every FO-definable m-ary query Q, m > 0, is Gaifmanlocal. Moreover, if Q is definable by an FO[k] formula, then lr(Q) ≤ 3k+1 − 1 2 . Since we know that graph connectivity is not Hanf-local and transitive closure is not Gaifman-local, we immediately obtain, without using games, that these queries are not FO-definable. We can give rather easy inexpressibility proofs for many queries. Below, we provide two examples. 4.3 Locality of FO 53 d d d d d−1 d−1 d d+1 Fig. 4.4. Balanced binary trees are not FO-definable Balanced Binary Trees This example was mentioned at the end of Chap. 3. Suppose we are given a graph, and we want to test if it is a balanced binary tree. We now sketch the proof of inexpressibility of this query in FO; details are left as an exercise for the reader. Suppose a test for being a balanced binary tree is definable in FO, say by a sentence Φ of quantifier rank k. Then we know that it is a Hanf-local query, with Hanf-locality rank at most r = (3k − 1)/2. Choose d to be much larger than r, and consider two trees shown in Fig. 4.4. In the first tree, denoted by T1, the subtrees hanging at all four nodes on the second level are balanced binary trees of depth d; in the second tree, denoted by T2, they are balanced binary trees of depths d − 1, d − 1, d, and d + 1. We claim that T1 ⇆r T2 holds. First, notice that the number of nodes and the number of leaves in T1 and T2 is the same. If d is sufficiently large, these trees realize the following isomorphism types of neighborhoods: • isomorphism types of r-neighborhoods of nodes a at a distance m from the root, m ≤ r; • isomorphism types of r-neighborhoods of nodes a at a distance m from a leaf, m ≤ r; • the isomorphism type of the r-neighborhood of a node a at a distance > r from both the root and all the leaves. Since the number of leaves and the number of nodes are the same, it is easy to see that each type of an r-neighborhood has the same number of nodes realizing it in both T1 and T2, and hence T1 ⇆r T2. But this contradicts Hanf-locality of the balanced binary tree test, since T1 is balanced, and T2 is not. Same Generation The query we consider now is same generation: given a graph, two nodes a and b are in the same generation if there is a node c (common ancestor) such 54 4 Locality and Winning Games r ... ... ... a0 a1 ad b0 b1 bd bd+1 b2d+1 Fig. 4.5. Inexpressibility of same generation that the shortest paths from c to a and from c to b have the same length. This query is most commonly computed on trees; in this case a, b are in the same generation if they are at the same distance from the root. We now give a very simple proof that the same-generation query Qsg is not FO-definable. 
Assume to the contrary that it is FO-definable, and lr(Qsg) = d. Consider a tree T with root r and two branches, one with nodes a0, a1, . . . , ad (where ai+1 is the successor of ai) and the other one with nodes b0, b1, . . . , bd, . . . , b2d+1, see Fig. 4.5. It is clear that (ad, bd) ≈T d (ad, bd+1), while ad, bd are in the same generation, and ad, bd+1 are not. In most examples seen so far, locality ranks (for either Hanf- or Gaifmanlocality) were exponential in the quantifier rank. We now show a simple exponential lower bound for the locality rank; precise bounds will be given in Exercise 4.11. Suppose that σ is the vocabulary of undirected graphs: that is, σ = {E} where E is binary. Define the following formulae: • d0(x, y) ≡ E(x, y), • d1(x, y) ≡ ∃z (d0(x, z) ∧ d0(y, z)), . . ., • dk+1(x, y) ≡ ∃z(dk(x, z) ∧ dk(y, z)). For an undirected graph, dk(a, b) holds iff there is a path of length 2k between a and b; that is, if the distance between a and b is at most 2k . Hence, lr(dk) ≥ 2k−1 . However, qr(dk) = k, which shows that locality rank can be exponential in the quantifier rank. 4.4 Structures of Small Degree In this section, we shall see a large class of structures for which very simple criteria for FO-definability can be obtained. These are structures in which all the degrees are bounded by a constant. If we deal with undirected graphs, degrees are the usual degrees of nodes; if we deal with directed graphs, they are in- and out-degrees. In general, we use the following definition. 4.4 Structures of Small Degree 55 Definition 4.14. Let σ be a relational vocabulary, R an m-ary symbol in σ, and A ∈ STRUCT[σ]. For a ∈ A and i ≤ m, define degreeA R,i(a) as the cardinality of the set {(a1, . . . , ai−1, a, ai+1, . . . , am) ∈ RA | a1, . . . , ai−1, ai+1, . . . , am ∈ A}. That is, degreeA R,i(a) is the number of tuples in RA that have a in the ith position. Define deg set(A) to be the set of all the numbers of the form degreeA R,i(a), where a ∈ A, R ∈ σ, and i is at most the arity of R. That is, deg set(A) = {degreeA R,i(a) | a ∈ A, R ∈ σ, i ≤ arity(R)}. Finally, STRUCTl[σ] stands for {A ∈ STRUCT[σ] | deg set(A) ⊆ {0, . . . , l}}. In other words, STRUCTl[σ] consists of σ-structures in which all degrees do not exceed l. We shall also be applying deg set to outputs of queries: by deg set(Q(A)), for an m-ary query Q, we mean the set of all degrees realized in the structure whose only m-ary relation is Q(A); that is, deg set( A, Q(A) ). When we talk of structures of small degree, we mean STRUCTl[σ] for some fixed l ∈ N. There is another way of defining structures of small degree, essentially equivalent to the way we use here. Instead of defining degrees for m-ary relations, one can use only the definition of degrees for nodes of an undirected graph, and define structures of small degrees as structures A where deg set(G(A)) ⊆ {0, . . ., l} for some l ∈ N. Recall that G(A) is the Gaifman graph of A, so in this case we are talking about the usual degrees in a graph. However, this is essentially the same as the definition of STRUCTl[σ]. Lemma 4.15. For every relational vocabulary σ, there exist two functions fσ, gσ : N → N such that 1. deg set(G(A)) ⊆ {0, . . ., fσ(l)} for every A ∈ STRUCTl[σ], and 2. A ∈ STRUCTgσ(l)[σ] for every A with deg set(G(A)) ⊆ {0, . . . , l}. One reason to study structures of small degrees is that many queries behave particularly nicely on them. We capture this notion of nice behavior by the following definition. Definition 4.16. Let σ be relational. 
An m-ary query Q on σ-structures, m > 0, has the bounded number of degrees property (BNDP) if there exists a function fQ : N → N such that for every l ≥ 0 and every A ∈ STRUCTl[σ], |deg set(Q(A))| ≤ fQ(l). 56 4 Locality and Winning Games Notice a certain asymmetry of this definition: our assumption is that all the numbers in deg set(A) are small, but the conclusion is that the cardinality of deg set(Q(A)) is small. We cannot possibly ask for all the numbers in deg set(Q(A)) to be small and still say anything interesting about FO-definable queries: consider, for example, the query defined by ϕ(y, z) ≡ ∃x(x = x). On every structure A with | A |= n > 0, it defines the complete graph on n nodes, where every node has the same degree n. Hence, some degrees in deg set(Q(A)) do depend on A, but the number of different degrees is determined by deg set(A) and the query. It is usually very easy to show that a query does not have the BNDP. Consider, for example, the transitive closure query. Assume that its input is a successor relation Gn on n nodes. Then deg set(Gn) = {0, 1}. The transitive closure of Gn is a linear order Ln on n nodes, and deg set(Ln) = {0, . . ., n−1}, showing that the transitive closure query does not have the BNDP. We next show that the BNDP is closely related to locality concepts. Theorem 4.17. Let Q be a Gaifman-local m-ary query, m > 0. Then Q has the BNDP. Proof. Let Q be Gaifman-local with lr(Q) = d. We assume, without loss of generality, that m ≥ 2, since unary queries clearly have the BNDP. Next, we need the following claim. Let nd(k) be defined inductively by nd(0) = d, nd(k + 1) = 3 · nd(k) + 1. That is, nd(k) = 3k · d + (3k − 1)/2 for k ≥ 0. Claim 4.18. Let a ≈A nd(k) b. Then there is a bijection f : Ak → Ak such that ac ≈A d bf(c) for every c ∈ Ak . The proof of Claim 4.18 is by induction on k. For k = 0 there is nothing to prove. Assume that it holds for k, and prove it for k + 1. Let r = nd(k); then nd(k+1) = 3r+1. Let a ≈A 3r+1 b. Then, by Lemma 4.9, (A, a) ⇆r (A, b). That is, there exists a bijection g : A → A such that for every c ∈ A, ac ≈A r bg(c). By the induction hypothesis, we then know that for each c ∈ A, there exists a bijection gc : Ak → Ak such that for every e ∈ Ak , ace ≈A d bg(c)gc(e). We thus define a bijection f : Ak+1 → Ak+1 as follows: if c = ce, where e ∈ Ak , then f(c) = g(c)gc(e). Clearly, ac ≈A d bf(c). This proves the claim. Now we prove the BNDP. First, note that for every vocabulary σ, there exists a function Gσ : N × N → N such that for every A ∈ STRUCTl[σ], the size of BA d (a) is at most Gσ(l, d). Thus, there exists a function Fσ : N × N → N such that every structure A in STRUCTl[σ] can realize at most Fσ(l, d) isomorphism types of d-neighborhoods of a point. Now consider Q(A), for A ∈ STRUCTl[σ], and note that for any two a, b ∈ A with a ≈A nd(m−1) b, 4.5 Locality of FO Revisited 57 |{c ∈ Am−1 | ac ∈ Q(A)}| = |{c ∈ Am−1 | bc ∈ Q(A)}|, (4.2) by Claim 4.18. In particular, (4.2) implies that the degrees of a and b in Q(A) (in the first position of an m-tuple) are the same. This is because degree Q(A) 1 (c), the degree of an element c, corresponding to the first position of the m-ary relation Q(A), is precisely the cardinality of the set {c ∈ Am−1 | cc ∈ Q(A)}. Thus, the number of different degrees in Q(A) corresponding to the first position in the m-tuple is at most Fσ(l, nd(m − 1)), and hence |deg set(Q(A))| ≤ m · Fσ(l, nd(m − 1)). (4.3) Since the upper bound in (4.3) depends on l, m, d, and σ only, this proves the BNDP. Corollary 4.19. 
Every FO-definable query has the BNDP.

Balanced Binary Trees Revisited

We now revisit the balanced binary tree test, and give a simple proof of its inexpressibility in FO. In fact, we show that this test is inexpressible even if it is restricted to binary trees. That is, there is no FO-definable Boolean query Q_bbt such that, for a binary tree T, the output Q_bbt(T) is true iff T is balanced.

Assume, to the contrary, that such a query is FO-definable. We now construct a binary FO-definable query Q which fails the BNDP – this would contradict Corollary 4.19. The new query Q works as follows. It takes as an input a binary tree T, and for every two nonleaf nodes a, b finds their successors a′, a′′ and b′, b′′. It then constructs a new tree T_{a,b} by removing the edges from a to a′, a′′ and from b to b′, b′′, and instead by adding the edges from a to b′, b′′ and from b to a′, a′′. It then puts (a, b) in the output if Q_bbt(T_{a,b}) is true (see Fig. 4.6). Clearly, Q is FO-definable if Q_bbt is.

[Fig. 4.6. Changing successors of nodes in a balanced binary tree: the successors a′, a′′ of a and b′, b′′ of b are swapped.]

Assume that T itself is a balanced binary tree; that is, a structure in STRUCT_2[σ]. Then for two nonleaf nodes a, b, the pair (a, b) is in Q(T) iff a, b are at the same distance from the root. Hence, for a balanced binary tree T of depth n, the graph Q(T) is a disjoint union of n − 1 cliques of different sizes, and thus |deg set(Q(T))| = n − 1. Hence, Q fails the BNDP, which proves that Q_bbt is not FO-definable.

4.5 Locality of FO Revisited

In this section, we start by analyzing the proof of Hanf-locality of FO, and discover that it establishes a stronger statement than that of Theorem 4.12. We characterize a new notion of expressibility via a stronger version of Ehrenfeucht-Fraïssé games, which will later be used to prove bounds on logics with counting quantifiers. The question that we ask then is: are there more precise and restrictive locality criteria that can be stated for FO? The answer to this is positive, and we shall present two such results: Gaifman's theorem, and the threshold equivalence criterion.

First, we show how to avoid the restriction that no constant symbols occur in σ; that is, we extend the notions of the r-ball and r-neighborhood to the case of arbitrary relational vocabularies σ (vocabularies without function symbols). Let c = (c_1, ..., c_n) list all the constant symbols of σ. Then

B^A_r(a) = {b ∈ A | d^A(b, a) ≤ r or d^A(b, c^A) ≤ r}.

The r-neighborhood of a, with |a| = m, is defined as the structure N^A_r(a) in the vocabulary σ_m (σ extended with m constants), whose universe is B^A_r(a), the interpretations of σ-relations and constants are inherited from A, and the m extra constants are interpreted as a. One can check that all the results proved so far extend to the setting that allows constants (see Exercise 4.1). From now on, we apply all the locality concepts to relational vocabularies.

We can also use the notion of locality to state when A ≡_0 B; that is, when the duplicator wins the Ehrenfeucht-Fraïssé game on A and B without even starting. This happens if and only if (∅, ∅) is a partial isomorphism, or, equivalently, N^A_0(∅) ≅ N^B_0(∅).

We now define a new equivalence relation ≃^bij_k as follows.

• A ≃^bij_0 B if A ≡_0 B;
• A ≃^bij_{k+1} B if there is a bijection f : A → B such that
  forth: for each a ∈ A, we have (A, a) ≃^bij_k (B, f(a));
  back: for each b ∈ B, we have (A, f^{−1}(b)) ≃^bij_k (B, b).
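To make this inductive definition concrete, here is a small brute-force sketch (not from the text; the encoding of structures and the helper names are ours) that decides the relation ≃^bij_k for toy directed graphs, given as a list of nodes together with a set of edge pairs. It tries every bijection at every round, so it is feasible only for very small structures.

from itertools import permutations

def partial_iso(E_A, E_B, a_tup, b_tup):
    # the map a_tup[i] -> b_tup[i] preserves equality and the edge relation
    for i in range(len(a_tup)):
        for j in range(len(a_tup)):
            if (a_tup[i] == a_tup[j]) != (b_tup[i] == b_tup[j]):
                return False
            if ((a_tup[i], a_tup[j]) in E_A) != ((b_tup[i], b_tup[j]) in E_B):
                return False
    return True

def bij_equiv(A, E_A, B, E_B, k, a_tup=(), b_tup=()):
    # (A, a_tup) ~bij_k (B, b_tup), computed straight from the definition
    if not partial_iso(E_A, E_B, a_tup, b_tup):
        return False
    if k == 0:
        return True
    if len(A) != len(B):
        return False                      # no bijection A -> B exists
    for image in permutations(B):         # every candidate bijection f
        f = dict(zip(A, image))
        # only 'forth' is checked; since f is a bijection this suffices
        # (see the remark that follows)
        if all(bij_equiv(A, E_A, B, E_B, k - 1,
                         a_tup + (a,), b_tup + (f[a],)) for a in A):
            return True
    return False

# A single edge versus two isolated nodes: equivalent for k = 1, not for k = 2.
print(bij_equiv([0, 1], {(0, 1)}, [2, 3], set(), 1),
      bij_equiv([0, 1], {(0, 1)}, [2, 3], set(), 2))   # True False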
One can easily see that just one of forth and back suffices: that is, forth and back are equivalent, since f is a bijection.

The notion of the back-and-forth described in Sect. 3.5 was equivalent to the Ehrenfeucht-Fraïssé game. We can also describe the new notion of back-and-forth as a game, called a bijective Ehrenfeucht-Fraïssé game (or just bijective game). Let A and B be two structures in a relational vocabulary. The k-round bijective game is played by the same two players, the spoiler and the duplicator. If |A| ≠ |B|, then the duplicator loses before the game even starts. In the ith round, the duplicator first selects a bijection f_i : A → B. Then the spoiler moves in exactly the same way as in the Ehrenfeucht-Fraïssé game: that is, he plays either a_i ∈ A or b_i ∈ B. The duplicator responds by either f_i(a_i) or f_i^{−1}(b_i). As in the Ehrenfeucht-Fraïssé game, the duplicator wins if, after k rounds, the moves (a, b) form a winning position: that is, (a, c^A) and (b, c^B) are a partial isomorphism between A and B. If the duplicator has a winning strategy in the k-round bijective game on A and B, we write A ≡^bij_k B.

Clearly, it is harder for the duplicator to win the bijective game; that is, A ≡^bij_k B implies A ≡_k B. In the bijective game, the duplicator does not simply come up with responses to all the possible moves by the spoiler, but he has to establish a one-to-one correspondence between the spoiler's moves and his responses. The following is immediate from the definitions.

Lemma 4.20. A ≃^bij_k B iff A ≡^bij_k B.

By Corollary 4.10, (A, u) ⇆_{3d+1} (B, v) implies the existence of a bijection f : A → B such that (A, uc) ⇆_d (B, vf(c)) for all c ∈ A. Since the winning condition in the bijective game is that N^A_0(a) ≅ N^B_0(b), where a and b are the moves of the game on A and B, by induction on k we conclude:

Corollary 4.21. If (A, a) ⇆_{(3^k−1)/2} (B, b), then (A, a) ≡^bij_k (B, b).

Bijective games, as will be seen, characterize the expressive power of a certain logic. Since the bijective game is harder to win for the duplicator than the ordinary Ehrenfeucht-Fraïssé game, such a logic must be more expressive than FO. Hence, the tool of Hanf-locality will be applicable to a certain extension of FO. We shall see how it works when we discuss logics with counting in Chap. 8.

Since the most general locality-based bounds apply to more restricted games than the ordinary Ehrenfeucht-Fraïssé games, and hence to more expressive logics, it is natural to ask whether more specific locality criteria can be stated for FO. We now present two such criteria.

We start with Gaifman's theorem. First, a few observations are needed. If σ is a relational vocabulary, and m is the maximum arity of a relation symbol in it, m ≥ 2, then the Gaifman graph G(A) is definable by a formula of quantifier rank m − 2. (Note that for the case of unary relations, the Gaifman graph is simply {(a, a) | a ∈ A} and hence is definable by the formula x = y.) We show this for the case of a single ternary relation R; a general proof should be obvious. The Gaifman graph is then defined by the formula

(x = y) ∨ ∃z (R(x, y, z) ∨ R(x, z, y) ∨ R(y, x, z) ∨ R(y, z, x) ∨ R(z, x, y) ∨ R(z, y, x)).

Since the Gaifman graph is FO-definable, so is the r-ball of any tuple x. That is, for any fixed r, there is a formula d_{≤r}(y, x) such that A |= d_{≤r}(b, a) iff d^A(b, a) ≤ r. Similarly, there are formulae d_{=r} and d_{>r}.
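For instance, writing ε(x, y) for the formula defining the Gaifman graph, one way to obtain these formulae (not necessarily the most economical one) is by induction on r, recalling that the distance from b to a tuple is the minimum of the distances from b to its components:

\begin{align*}
d_{\le 0}(y,\bar{x}) &\equiv \bigvee_{i}\, y = x_i,\\
d_{\le r+1}(y,\bar{x}) &\equiv d_{\le r}(y,\bar{x}) \ \vee\ \exists z\,\bigl(\varepsilon(y,z) \wedge d_{\le r}(z,\bar{x})\bigr),\\
d_{=r}(y,\bar{x}) &\equiv d_{\le r}(y,\bar{x}) \wedge \neg\, d_{\le r-1}(y,\bar{x}),\\
d_{>r}(y,\bar{x}) &\equiv \neg\, d_{\le r}(y,\bar{x}).
\end{align*}

The quantifier rank of d_{≤r} written this way grows linearly with r (it can be made logarithmic in r by the halving trick behind the formulae d_k of Sect. 4.3), but since r is fixed, any such formula suffices for what follows.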
We can next define local quantification ∃y ∈ Br(x) ϕ ∀y ∈ Br(x) ϕ simply as abbreviations: ∃y ∈ Br(x) ϕ stands for ∃y d≤r (y, x) ∧ ϕ , and ∀y ∈ Br(x) ϕ stands for ∀y d≤r (y, x) → ϕ . For a fixed r, we say that a formula ψ(x) is r-local around x, and write this as ψ(r) (x), if all quantification in ψ is of the form ∃y ∈ Br(x) or ∀y ∈ Br(x). Theorem 4.22 (Gaifman). Let σ be relational. Then every FO formula ϕ(x) over σ is equivalent to a Boolean combination of the following: • local formulae ψ(r) (x) around x; • sentences of the form ∃x1, . . . , xs s i=1 α(r) (xi) ∧ 1≤i2r (xi, xj) . Furthermore, • the transformation from ϕ to such a Boolean combination is effective; • if ϕ itself is a sentence, then only sentences of the above form appear in the Boolean combination; • if qr(ϕ) = k, and n is the length of x, then the bounds on r and s are r ≤ 7k , s ≤ k + n. Notice that Gaifman-locality of FO is an immediate corollary of Gaifman’s theorem (hence the name). However, the proof we presented earlier is much simpler than the proof of Gaifman’s theorem (Exercise 4.9), and the bounds obtained are better. 4.5 Locality of FO Revisited 61 Thus, Gaifman-locality can be strengthened for the case of FO formulae. Then what about Hanf-locality? The answer, as it turns out, is positive, if one’s attention is restricted to structures in which degrees are bounded. We start with the following definition. Definition 4.23 (Threshold equivalence). Given two structures A, B in a relational vocabulary, we write A ⇆thr d,m B if for every isomorphism type τ of a d-neighborhood of a point either • both A and B have the same number of points that d-realize τ, or • both A and B have at least m points that d-realize τ. Thus, if m were allowed to be infinity, A ⇆thr d,∞ B would be the usual definition of A ⇆d B. In the new definition, however, we are only interested in the number of elements that d-realize a type of neighborhood up to a threshold: below the threshold, the numbers must be the same, but above it, they do not have to be. Theorem 4.24. For each k, l > 0, there exist d, m > 0 such that for A, B ∈ STRUCTl[σ], A ⇆thr d,m B implies A ≡k B. Proof. The proof is very similar to the proof of Hanf-locality of FO. We define inductively r0 = 0, ri+1 = 3ri+1, take d = rk−1, and prove that the duplicator can play the Ehrenfeucht-Fra¨ıss´e game on A and B in such a way that after i rounds (or: with k − i rounds remaining), NA rk−i (ai) ∼= NB rk−i (bi), (4.4) where ai, bi are points played in the first i rounds of the game. It only remains to specify m. Recall from the proof of Theorem 4.17 that there is a function Gσ : N × N such that the maximum size of a radius d neighborhood of a point in a structure in STRUCTl[σ] is Gσ(d, l). We take m to be k · Gσ(rk, l). The rest is by induction on i. For the first move, suppose the spoiler plays a ∈ A. By A ⇆thr rk,m B, the duplicator can find b ∈ B with NA rk (a) ∼= NB rk (b). Now assume (4.4) holds after i rounds. That is, NA 3r+1(ai) ∼= NB 3r+1(bi), where r = rk−(i+1). We have to show that (4.4) holds after i + 1 rounds (i.e., with k − (i + 1) rounds remaining). Suppose in round i + 1 the spoiler plays a ∈ A (the case of a move in B is identical). If a ∈ BA 2r+1(ai), the response is by the isomorphism between NA 3r+1(ai) and NB 3r+1(bi), which guarantees (4.4). If a ∈ BA 2r+1(ai), let τ be the isomorphism type of the r-neighborhood of a. 
To ensure (4.4), all we need is to find b ∈ B such that b r-realizes τ in B, and dB(b, bi) > 2r + 1 – then such an element b would be the response of the duplicator. 62 4 Locality and Winning Games Assume that there is no such element b. Since there is an element a ∈ A that r-realizes τ in A, there must be an element b′ ∈ B that r-realizes τ in B. Then all such elements b′ must be in NB 2r+1(bi). Let there be s of them. Notice that the cardinality of NB 2r+1(bi) does not exceed m = k · Gσ(rk, l). This is because the length of bi is at most k, the size of each rk neighborhood is at most Gσ(rk, l), and 2r + 1 ≤ rk. Therefore, s ≤ m, and from A⇆thr d,mB we see that there are exactly s elements a′ ∈ A that r-realize τ in A. But by the isomorphism between NA 3r+1(ai) and NB 3r+1(bi) we know that NA 2r+1(ai) alone contains s such elements, and hence there are at least s + 1 of them in A. This contradiction shows that we can find b that r-realizes τ in B outside of NB 2r+1(bi), which completes the proof of (4.4) and the theorem. The threshold equivalence is a useful tool when in the course of proving inexpressibility of a certain property, one constructs pairs of structures Ak, Bk whose universes have different cardinalities: then Hanf-locality is inapplicable. For example, consider the following query over graphs. Suppose the input graph is a simple cycle with loops on some nodes (i.e., it has edges (a1, a2), (a2, a3), . . . , (an−1, an), (an, a1), with all ais distinct, as well as some edges of the form (ai, ai)). The question is whether the number of loops is even. An attempt to prove that it is not FO-definable using Hanf-locality does not succeed: for any d > 0, and any two structures A, B with A ⇆d B, the numbers of nodes with loops in A and B are equal. However, the threshold equivalence helps us. Assume that the above query Q is expressible by a sentence of quantifier rank k. Then apply Theorem 4.24 to k and 2 (the maximum degree in graphs described above), and find d, m > 0. We now construct a graph Gd,n for any n > 0, as a cycle on which the distance between any two consecutive nodes with loops is 2d+2, and the number of such nodes with loops is n. One can then easily check that Gd,m+1 ⇆thr d,m Gd,m+2 and hence the two must agree on Q. This is certainly impossible, showing that Q is not FO-definable. Note that in this example, Gd,m+1 ⇆r Gd,m+2 for any r > 0, since the cardinalities of Gd,m+1 and Gd,m+2 are different, and hence Hanf-locality is not applicable. 4.6 Bibliographic Notes The first locality result for FO was Hanf’s theorem, formulated in 1965 by Hanf [120] for infinite models. The version for the finite case was presented by Fagin, Stockmeyer, and Vardi in [76]. In fact, [76] proves what we call the threshold equivalence for FO, and what we call Hanf-locality is stated as a corollary. 4.7 Exercises 63 Gaifman’s theorem is from [88]; Gaifman-locality, inspired by it, was introduced by Hella, Libkin, and Nurmonen [123], who also proved Theorem 4.11. The proof of Hanf-locality for FO follows Libkin [167]. The bounded number of degrees property (BNDP) is from Libkin and Wong [169] (where it was called BDP, and proved only for FO-definable queries over graphs). Dong, Libkin and Wong [57] showed that every Gaifman-local query has the BNDP, and a simpler proof was given by Libkin [166]. Bijective games were introduced by Hella [121], and the connection between them and Hanf-locality is due to Nurmonen [188]; the presentation here follows [123]. 
Sources for exercises: Exercise 4.9: Gaifman [88] Exercises 4.10, 4.11, and 4.12: Libkin [166] Exercise 4.13: Dong, Libkin, and Wong [57] Exercise 4.14: Schwentick and Barthelmann [217] Exercise 4.15: Schwentick [215] 4.7 Exercises Exercise 4.1. Verify that all the results in Sects. 4.1–4.4 extend to vocabularies with constant symbols. Exercise 4.2. Prove Lemma 4.4. Exercise 4.3. Prove Lemma 4.5. Exercise 4.4. Prove Lemma 4.8. Exercise 4.5. Prove Lemma 4.15. Exercise 4.6. Use Hanf-locality to give a simple proof that graph acyclicity and testing if a graph is a tree are not FO-definable. Exercise 4.7. Consider colored graphs: that is, structures of vocabulary {E, U1, . . . , Uk} where E is binary and U1, . . . , Uk are unary (i.e., Ui defines the set of nodes of color i). Prove that neither connectivity nor transitive closure are FO-definable over colored graphs. Exercise 4.8. Provide a complete proof that testing if a binary tree is balanced is not FO-definable. Exercise 4.9. Prove Theorem 4.22. Exercise 4.10. In all the proofs in this chapter we obtained bounds on locality ranks of the order O(3k ), where k is the quantifier rank. And yet the exponential lower bound was O(2k ). The goal of this exercise is to reduce the upper bound from O(3k ) to O(2k ), at the expense of a slightly more complicated proof. 64 4 Locality and Winning Games Let x = (x1, . . . , xn), and let I = {I1, . . . , Im} be a partition of {1, . . . , n}. The subtuple of x that consists of the components whose indices are in Ij is denoted by xI j . Let r > 0. Given two structures, A and B, and a ∈ An , b ∈ Bn , we say that a and b are (I, r)-similar if the following hold: • NA r (aI j ) ∼= NB r (bI j ) for all j = 1, . . . , m; • d(aI j , aI l ) > r for all l = j; • d(bI j , bI l ) > r for all l = j. We call a and b r-similar if there exists a partition I such that a and b are (I, r)similar. A formula ϕ has the r-separation property if A |= ϕ(a) ↔ ϕ(b) whenever a and b are r-similar. Your first task is to prove that a formula has the separation property iff it is Gaifman-local. Next, prove the following. If r > 0, A⇆rB, and a, b are 2r-similar, then there exists a bijection f : A → B such that, for every c ∈ A, the tuples ax and bf(c) are r-similar. Use this result to show that lr(ϕ) ≤ 2k for every FO formula ϕ of quantifier rank k. Exercise 4.11. Define functions Hanf rankFO, Gaifman rankFO : N → N as follows: Hanf rankFO(n) = max{hlr(ϕ) | ϕ ∈ FO, qr(ϕ) = n}, Gaifman rankFO(n) = max{lr(ϕ) | ϕ ∈ FO, qr(ϕ) = n}. Assume that the vocabulary is purely relational. Prove that for every n > 1, Hanf rankFO(n) = 2n−1 − 1 and Gaifman rankFO(n) = 2n − 1. Exercise 4.12. Exponential lower bounds for locality rank were achieved on formulae of quantifier rank n with the total number of quantifiers exponential in n. Could it be that locality rank is polynomial in the number of quantifiers? Your goal is to show that the answer is negative. More precisely, show that there exist FO formulae with n quantifiers and locality rank O( √ 2 n ). Exercise 4.13. The BNDP was formulated in a rather asymmetric way: the assumption was that ∀i ∈ deg set(A) (i ≤ l), and the conclusion that |deg set(Q(A))|≤ fQ(l). A natural way to make it more symmetric is to introduce the following property of a query Q: there exists a function f′ Q : N → N such that |deg set(Q(A))| ≤ f′ Q(|deg set(A)|) for ever structure A. Prove that there are FO-definable queries on finite graphs that violate the above property. Exercise 4.14. 
Recall that a formula ϕ(x) is r-local around x if all the quantification is of the form ∃y ∈ Br(x) and ∀y ∈ Br(x). We now say that ϕ(x) is basic r-local around x if it is a Boolean combination of formulae of the form α(xi), where xi is a component of x, and α(xi) is r-local around xi. A formula is local (or basic local) around x if it is r-local (or basic r-local) around x for some r. 4.7 Exercises 65 Prove that every FO formula ϕ(x) that is local around x is logically equivalent to a formula that is basic local around x. Use this result to prove that any FO sentence is logically equivalent to a sentence of the form ∃x1 . . . ∃xn∀y ϕ(x1, . . . , xn, y), where ϕ(x1, . . . , xn, y) is local around (x1, . . . , xn, y). Exercise 4.15. This exercise presents a sufficient condition that guarantees a winning strategy by the duplicator. It shows that if two structures look similar (meaning that the duplicator has a winning strategy), and are extended to bigger structures in a “similar way”, then the duplicator has a winning strategy on the bigger structures as well. Let A, B be two structures of the same vocabulary that contains only relation symbols. Let A0, B0 be their substructures, with universes A0 and B0, respectively, and let A1 and B1 be substructures of A and B whose universes are A − A0 and B − B0. For every a ∈ A, dA(a, A0) is, as usual, min{dA (a, a0) | a0 ∈ A0}, and dB (b, B0) is defined similarly. Let A(r) (B(r)) be the substructure of A (respectively, B) whose universe is {a | dA(a, A0) ≤ r} (respectively, {b | dB (b, B0) ≤ r}). We write A(r) ≡dist k B(r) if A(r) ≡k B(r) and, whenever ai, bi are moves in the ith round, dA(ai, A0) = dB (bi, B0). We also write A1 ∼= dist B1 if there is an isomorphism h : A1 → B1 such that dA(a, A0) = dB (h(a), B0) for every a ∈ A − A0. Now assume that the following two conditions hold: 1. A(2k) ≡dist k B(2k), and 2. A1 ∼=dist B1. Prove that A ≡k B. Exercise 4.16. Let σ consist of one binary relation E, and let Φ be a σ-sentence. Prove that it is decidable whether Φ has a model in STRUCT1[σ]; that is, one can decide if there is a finite graph G in which all in- and out-degrees are 0 and 1 such that G |= Φ. 5 Ordered Structures We know how to prove basic results about FO; so now we start adding things to FO. One way to make FO more expressive is to include additional operations on the universe. For example, in database applications, data items stored in a database are numbers, strings, etc. Both numbers and strings could be ordered; on numbers we have arithmetic operations, on strings we have concatenation, substring tests, and so on. As query languages routinely use those operations, one may want to study them in the context of FO. In this chapter, we describe a general framework of adding new operations on the domain of a finite model. The main concept is that of invariant queries, which do not depend on a particular interpretation of the new operations. We show that such an addition could increase the expressiveness of a logic, even for properties that do not mention those new operations. We then concentrate on one operation of special importance: a linear order on the finite universe. We study FO(<) – that is, FO with an additional linear order < on the universe, and study its expressive power. Adding ordering will be of importance for almost all logics that we study (the only exception is fragments of second-order logic, where linear orderings are definable). 
We shall observe the following general phenomenon: for any logic that cannot define a linear ordering, adding one increases the expressive power, even for invariant queries. 5.1 Invariant Queries We start with an example. Suppose we have a vocabulary σ, and an additional vocabulary σ<,+ = {<, +}, where < is a binary relation symbol, and + is a ternary relation symbol. The intended interpretation is as follows. Given a set A, the relation < is interpreted as a linear ordering on it, say a1 < . . . < an, if A = {a1, . . . , an}. Then + is interpreted as {(ai, aj, ak) | ai, aj, ak ∈ A and i + j = k}. 68 5 Ordered Structures Recall that the query even(A) testing if | A |= 0 (mod 2) is not expressible over σ-structures: we proved this by using Ehrenfeucht-Fra¨ıss´e games. Now assume that we are allowed to use σ<,+ symbols in the query. Then we can write: Φ = ¬∃x (x = x) ∨ ∃x∃y (x + x = y) ∧ ¬∃z (y < z) . That is, either the universe is empty, or y is the largest element of the universe and y = x + x for some x. Then Φ tests if |A|= 0 (mod 2). However, one has to be careful with this statement. We cannot write A |= Φ iff even(A) for a σ-structure A, simply because Φ is not a sentence of vocabulary σ. The structure in which Φ is checked is an expansion of A with an interpretation of predicate symbols in σ<,+. That is, if A<,+ is a structure with universe A in which <, + are interpreted as was shown above, then (A, A<,+) |= Φ iff even(A). Here by (A, A<,+) we mean the structure whose universe is A, the symbols from σ are interpreted as in A, and <, + are interpreted as in A<,+. Before giving a general definition, we make another important observation. If we find any other interpretation for symbols < and +, as long as < is a linear ordering on A and + is the addition corresponding to <, the result of the query defined by Φ will be the same. This is the idea of invariance: no matter how the extra relations are interpreted, the result of the query is the same. We now formalize this concept. Recall that if σ and σ′ are two disjoint vocabularies, A ∈ STRUCT[σ], A′ ∈ STRUCT[σ′ ], and A, A′ have the same universe A, then (A, A′ ) stands for a structure of vocabulary σ ∪ σ′ , in which the universe is A, and the interpretation of σ (respectively, σ′ ) is inherited from A (A′ ). Definition 5.1. Let σ and σ′ be two disjoint vocabularies, and let C be a class of σ′ -structures. Let A ∈ STRUCT[σ]. A formula ϕ(x) in the language of σ∪σ′ is called C-invariant on A if for any two C structures A′ and A′′ on A we have ϕ[(A, A′ )] = ϕ[(A, A′′ )]. A formula ϕ is C-invariant if it is C-invariant on every σ-structure. If ϕ(x) is C-invariant, we associate with it an m-ary query Qϕ, where m =|x|. It is given by a ∈ Qϕ(A) iff (A, A′ ) |= ϕ(a), where A′ is some σ′ -structure in C whose universe is A. By invariance, it does not matter which C-structure A′ is used. We shall write FO + C for a class of all queries on σ ∪ σ′ -structures, and 5.2 The Power of Order-invariant FO 69 (FO + C)inv for the class of queries Qϕ, where ϕ is a C-invariant formula over σ ∪ σ′ . The most important case for us is when C is the class of finite linear orderings. In that case, we write < instead of C and use the notation (FO+<)inv. We refer to queries in this class as order-invariant queries. Notice that (FO+<)inv refers to a class of queries, rather than a logic. In fact, we shall see in Chap. 9 (Exercise 9.3) that it is undecidable whether an FO sentence is <-invariant. 
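For the sentence Φ above, invariance can of course be seen by inspection; purely as an illustration of the definition, the following brute-force sketch (the helper names are ours, and it is not part of the development) evaluates Φ under every admissible interpretation of < and + (a linear order together with the matching addition) on a small universe and confirms that the truth value never depends on the interpretation chosen.

from itertools import permutations

def phi_holds(order):
    # Phi, evaluated when < is the linear order a_1 < ... < a_n given by the
    # list 'order' and + is the matching addition a_i + a_j = a_{i+j}:
    # either the universe is empty, or the largest element a_n equals x + x
    # for some x, i.e. n = i + i for some position i.
    n = len(order)
    if n == 0:
        return True
    position = {a: i + 1 for i, a in enumerate(order)}
    return any(2 * position[x] == n for x in order)

def invariant_and_value(universe):
    # evaluate Phi under every linear order on 'universe'
    answers = {phi_holds(list(p)) for p in permutations(universe)}
    return len(answers) == 1, answers

# On a 4-element universe Phi holds under every ordering; on a 5-element one
# it fails under every ordering -- the answer depends only on |A| mod 2.
print(invariant_and_value({'a', 'b', 'c', 'd'}))      # (True, {True})
print(invariant_and_value({1, 2, 3, 4, 5}))           # (True, {False})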
Coming back to our example of expressing even with < and +, the sentence Φ is a C<,+-invariant sentence, where C<,+ is the class of finite structures A, <, + , with a1 < . . . < an being a linear order on A, and + defined as {(ai, aj, ak) | i + j = k}. The Boolean query QΦ defined by this invariant sentence is precisely even. In some cases, establishing bounds on FO + C and (FO + C)inv is easy. For example, the proof that the bounded number of degrees property (BNDP) holds for FO shows that adding any structure of bounded degree would not violate the BNDP. Thus, we have the following result. Proposition 5.2. Let C ⊆ STRUCTl[σ′ ] for a fixed l ≥ 0. Then (FO + C) queries have the BNDP. In particular, (FO + C) cannot express the transitive closure query. The situation becomes much more interesting when degrees are not bounded; for example, when C is the class of linear orderings. We study it in the next section. 5.2 The Power of Order-invariant FO While queries in (FO+C)inv are independent of any particular structure from C, the mere presence of such a structure can have an impact on the expressive power. In fact, this can be demonstrated for the class of (FO+<)inv queries. The main result we prove here is the following. Theorem 5.3 (Gurevich). There are (FO+<)inv queries that are not FOdefinable. That is, FO (FO+<)inv. In the rest of the section we present the proof of this theorem. The proof is constructive: we explicitly generate the separating query, show that it belongs to (FO+<)inv, and then prove that it is not FO-definable. 70 5 Ordered Structures We consider structures in the vocabulary σ = {⊆} where ⊆ is a binary relation symbol. The intended interpretation of σ-structures of interest to us is finite Boolean algebras: that is, 2X , ⊆ , where X is a finite set. We first show that there is a sentence ΦBA such that A |= ΦBA iff A is of the form 2X , ⊆ for a finite X. For that, we shall need the following abbreviations: • ⊥(x) ≡ ∀z (x ⊆ z) (intended interpretation of x then is the empty set); • ⊤(x) ≡ ∀z (z ⊆ x) (x is the maximal element with respect to ⊆); • x ∪ y = z ≡ (x ⊆ z) ∧ (y ⊆ z) ∧ ∀u (x ⊆ u) ∧ (y ⊆ u) → (z ⊆ u) ; • x ∩ y = z ≡ (z ⊆ x) ∧ (z ⊆ y) ∧ ∀u (u ⊆ x) ∧ (u ⊆ y) → (u ⊆ z) ; • atom(x) ≡ ¬⊥(x) ∧ ∀z z ⊆ x → (z = x ∨ ⊥(z)) (i.e., x is an atom, or a singleton set); • x = ¯y ≡ ∀z (x ∪ y = z → ⊤(z)) ∧ ∀z (x ∩ y = z → ⊥(z)) (x is the complement of y). The sentence ΦBA is now the usual axiomatization for atomic Boolean algebras; that is, it is a conjunction of sentences that assert that ⊆ is a partial ordering, ∪ and ∩ exist, are unique, and satisfy the distributivity law and the absorption law (x ∩ (x ∪ y) = x); that the least and the greatest elements ⊥ and ⊤ are unique; and that complements are unique and satisfy De Morgan’s laws. Clearly, this can be stated as an FO sentence. We now formulate the separating query Qeven atom: Qeven atom(A) = true ⇔ A |= ΦBA and |{a | A |= atom(a)}|= 0 (mod 2). That is, it checks if the number of atoms in the finite Boolean algebra A is even. Lemma 5.4. Qeven atom ∈ (FO+<)inv. Proof. Let < be an ordering on the universe of A. It orders the atoms of the Boolean algebra: a0 < . . . < an−1. To check if the number of atoms is even, we check if there is a set that contains all the atoms in even positions (i.e., a0, a2, a4, . . .) and does not contain an−1. For that, we define the following formulae: • firstatom(x) ≡ atom(x) ∧ ∀y (atom(y) → x ≤ y). • lastatom(x) ≡ atom(x) ∧ ∀y (atom(y) → y ≤ x). 
• nextatom(x, y) ≡ atom(x) ∧ atom(y) ∧ (x < y) ∧ ¬∃z atom(z) ∧ (x < z) ∧ (z < y) . 5.2 The Power of Order-invariant FO 71 That is, firstatom(x) is true of a0, lastatom(x) is true of an−1, and nextatom(x, y) is true of any pair (ai−1, ai), 0 < i ≤ n − 1. Based on these, we express Qeven atom by the sentence below: ∃z   ∀x firstatom(x) → x ⊆ z ∧ ∀x lastatom(x) → ¬(x ⊆ z) ∧ ∀x, y nextatom(x, y) → ((x ⊆ z) ↔ ¬(y ⊆ z))   . That is, the above sentence is true iff the set containing the even atoms a0, a2, . . . does not contain an−1. Note that the set z may be different for each different interpretation of the linear ordering <, but the sentence still tests if the number of atoms is even, which is a property independent of a particular ordering. Lemma 5.5. Qeven atom is not FO-definable (in the vocabulary {⊆}). Proof. We shall use a game argument. Notice that locality does not help us here: in 2X , ⊆ , for any two sets C, D ⊆ X, the distance between them is at most 2, since ∅ ⊆ C, D. The proof illustrates the idea of composing a larger Ehrenfeucht-Fra¨ıss´e game from smaller and simpler games, already seen in Chap. 3. In the proof, we shall be using games on Boolean algebras. We first observe that if 2X , ⊆ ≡k 2Y , ⊆ , then we can assume, without any loss of generality, that the duplicator has a winning strategy in which he responds to the empty set by the empty set, to X by Y , and to Y by X. Indeed, suppose the spoiler plays ∅ in 2X , and the duplicator responds with Y ′ = ∅ in 2Y . If there is one more round left in the game, the spoiler would play the empty set in 2Y , and the duplicator has no response in 2X , contradicting the assumption that he has a winning strategy. Thus, in every round but the last, the duplicator must respond to ∅ by ∅. If the spoiler plays ∅ in 2X in the last round, it is contained in all the other moves played in 2X , and the duplicator can respond by ∅ in 2Y to maintain partial isomorphism. The proof for the other cases is similar. Next, we need the following composition result. Claim 5.6. Let 2X1 , ⊆ ≡k 2Y1 , ⊆ and 2X2 , ⊆ ≡k 2Y2 , ⊆ . Assume that X1 ∩ X2 = Y1 ∩ Y2 = ∅. Then 2X1∪X2 , ⊆, X1, X2 ≡k 2Y1∪Y2 , ⊆, Y1, Y2 . (5.1) Proof of Claim 5.6. Let Ai, Bi, i ≤ k, be the moves by the spoiler and the duplicator in the game (5.1). Let A1 i = Ai ∩ X1, A2 i = Ai ∩ X2, and likewise B1 i = Bi ∩ Y1, B2 i = Bi ∩ Y2. The winning strategy for the duplicator is as follows. Suppose i−1 rounds have been played, and in the ith round the spoiler plays Ai ⊆ X1 ∪ X2 (the case of the spoiler playing in Y1 ∪ Y2 is symmetric). The duplicator considers the position ((A1 1, . . . , A1 i−1), (B1 1 , . . . , B1 i−1)) 72 5 Ordered Structures in the game on 2X1 , ⊆ and 2Y1 , ⊆ , and finds his response B1 i ⊆ Y1 to A1 i . Similarly, he finds B2 i ⊆ Y2 as the response to A2 i in the position ((A2 1, . . . , A2 i−1), (B2 1, . . . , B2 i−1)) in the game on 2X2 , ⊆ and 2Y2 , ⊆ . His response to Ai is then Bi = B1 i ∪ B2 i . Clearly, playing in such a way, the duplicator preserves the ⊆ relation. Furthermore, it follows from the observation made before the claim that this strategy also preserves the constants: that is, if the spoiler plays X1, then the duplicator responds by Y1, etc. Hence, the duplicator has a winning strategy for (5.1). This proves the claim. The lemma now follows from the claim below. Claim 5.7. Let |X |, |Y |≥ 2k . Then 2X , ⊆ ≡k 2Y , ⊆ . Indeed, assume Qeven atom is definable by an FO-sentence of quantifier rank k. 
Take any X of odd cardinality and any Y of even cardinality, greater than 2k . By Claim 5.7, 2X , ⊆ ≡k 2Y , ⊆ , and hence they must agree on Qeven atom which is clearly false. Proof of Claim 5.7. It will be proved by induction on k. The cases of k = 0, 1 are obvious. Going from k to k+1, suppose we have X, Y with |X |, |Y |≥ 2k+1 . Assume, without loss of generality, that the spoiler plays A ⊆ X in 2X , ⊆ . There are three possibilities. 1. | A |< 2k . Pick an arbitrary B ⊆ Y with | B |=| A |. Then both | X − A | and |Y −B | exceed 2k . Thus, by the induction hypothesis, 2X−A , ⊆ ≡k 2Y −B , ⊆ . Furthermore, 2A , ⊆ ∼= 2B , ⊆ , which implies a weaker fact that 2A , ⊆ ≡k 2B , ⊆ . By Claim 5.6, 2X , ⊆, A ≡k 2Y , ⊆, B , meaning that after the duplicator responds to A with B, he can continue playing for k more rounds. This ensures a winning position, for the duplicator, after k + 1 rounds. 2. |X − A|< 2k . Pick an arbitrary B ⊆ Y with |Y − B |=|X − A|. Then the proof follows case 1. 3. | A |≥ 2k and | X − A |≥ 2k . Since | Y |≥ 2k+1 , we can find B ⊆ Y with |B |≥ 2k and |Y − B |≥ 2k . Then, by the induction hypothesis, 2A , ⊆ ≡k 2B , ⊆ , 2X−A , ⊆ ≡k 2Y −B , ⊆ , and we again conclude 2X , ⊆, A ≡k 2Y , ⊆, B , thus proving the winning strategy for the duplicator in k + 1 moves. 5.3 Locality of Order-invariant FO 73 This completes the proof of the claim, and of Theorem 5.3. Gurevich’s theorem is one of many instances of the proper containment L (L+<)inv, which holds for many logics of interest in finite model theory. We shall see similar results for logics with counting, fixed point logics, several infinitary logics, and some restrictions of second-order logic. 5.3 Locality of Order-invariant FO We know how to establish some expressivity bounds on invariant queries: for example, if extra relations are of bounded degree, then invariant queries have the BNDP. There are important classes of auxiliary relations that are of bounded degree. For example, the class Succ of successor relations: that is, graphs of the form {(a0, a1), (a1, a2), . . . , (an−1, an)} where all ai’s are distinct. Then the BNDP applies to FO + Succ, because for any A ∈ Succ, deg set(A) = {0, 1}. Adding order instead of successor destroys the BNDP, because for an ordering L on n elements, deg set(L) = {0, . . ., n − 1}. Moreover, while FO+< is local, locality does not tell us anything interesting. With a linear ordering, the distance between any two distinct elements is 1. Therefore, if a structure A is ordered by <, then N (A,<) 1 (a) = (A, <, a). Hence, every query is trivially Gaifman-local with locality rank 1. Gaifman-locality is a useful concept when applied to “sparse” structures, and structures with a linear order are not such. However, invariant queries do not talk about the order: they simply use it, but they are defined on σstructures for σ that does not need to include an ordering. Hence, if we could establish locality of order-invariant FO-definable queries, it would give us very useful bounds on the expressive power of (FO+<)inv. All the locality proofs we presented earlier would not work in this case, since FO formulae defining invariant queries do use the ordering. Nevertheless, the following is true. Theorem 5.8 (Grohe-Schwentick). Every m-ary query in (FO+ <)inv, m ≥ 1, is Gaifman-local. This theorem gives us easy bounds for FO+<. For example, to show that the transitive closure query is not definable in FO+ <, one notices that it is an invariant query. 
Hence, if it were expressible in FO+ <, it would have been an (FO+<)inv query, and thus Gaifman-local. We know, however, that transitive closure is not Gaifman-local. The proof of the theorem is quite involved, and we shall prove a slightly easier result (that is still sufficient for most inexpressibility proofs). We say that an m-ary query Q, m > 0, is weakly local if there exists a number d ≥ 0 such that for any structure A and any a1, a2 ∈ Am with a1 ≈A d a2 and BA d (a1) ∩ BA d (a2) = ∅ 74 5 Ordered Structures it is the case that a1 ∈ Q(A) iff a2 ∈ Q(A). That is, the only difference between weak locality and the usual Gaifmanlocality is that for the former, the neighborhoods are required to be disjoint. The result that we prove is the following. Proposition 5.9. Every unary query in (FO+<)inv is weakly local. The proof will demonstrate all the main ideas required to prove Theorem 5.8; completing the proof of the theorem is the subject of Exercises 5.8 and 5.9. The statement of Proposition 5.9 is also very powerful, and suffices for many bounds on the expressive power of FO+<. Suppose, for example, that we want to show that the same-generation query over colored trees is not in FO+<. Since same generation is order-invariant, it suffices to show that it is not weakly local, and thus not in (FO+<)inv. We consider colored trees as structures of the vocabulary (E, C), where E is binary and C is unary, and assume, towards a contradiction, that a binary query Qsg (same generation) is definable in FO+< by a formula ϕ(x, y). Let ψ(x) ≡ ∃y C(y) ∧ ϕ(x, y) . Then ψ defines a unary order-invariant query, testing if there is a node y in the set C such that (x, y) is in the output of Qsg. To show that it is not weakly local, assume to the contrary that it is, and construct a tree T as follows. Let d witness the weak locality of the query defined by ψ. Then T has three branches coming from the root, two of length d + 1 and one of length d + 2. Let the leaves be a, b, c, with c being the leaf of the branch of length d + 2. The set C is then {a}. Note that b ≈T d c and their balls of radius d are disjoint, and yet (T, <) |= ψ(b) ∧ ¬ψ(c) for any ordering <. Hence, ψ is not weakly local, and thus Qsg is not definable in FO+<. We now move to the proof of Proposition 5.9. First, we present the main idea of the proof. For that, we define the radius r sphere, r > 0, of a tuple a in a structure A as SA r (a) = BA r (a) − BA r−1(a). That is, SA r (a) is the set of elements at distance exactly r from a. As usual, the superscript A will be omitted when irrelevant or understood. We fix, for the proof, the vocabulary of the structure to be that of graphs; that is, σ = (E), where E is binary. This will simplify notation without any loss of generality. Given a structure A and a ∈ A, its d-ball can be thought of as a sequence of r-spheres, r ≤ d, where E-edges could go between Si(a) and Si+1(a), or between two elements of the same sphere. Let Q be a unary (FO+<)inv query on STRUCT[σ], defined by a formula ϕ(x) of quantifier rank k. Fix a sufficiently large d (exact bounds will be clear 5.3 Locality of Order-invariant FO 75 from the proof), and consider a ≈A d b, with Bd(a) and Bd(b) disjoint. Let h be an isomorphism h : Nd(a) → Nd(b). We now fix a linear ordering ≺a on Bd(a) such that dA(a, x) < dA(a, y) implies x ≺a y. In particular, a is the smallest element with respect to ≺a. We let ≺b be the image of ≺a under h. Let ≺0 be a fixed linear ordering on A − Bd(a, b). 
We now define a preorder ≺ as follows: x ≺ y iff x ≺a y, x, y ∈ Bd(a) or x ≺b y, x, y ∈ Bd(b) or h(x) ≺b y, x ∈ Bd(a), y ∈ Bd(b) or x ≺b h(y), x ∈ Bd(b), y ∈ Bd(a) or x ≺0 y, x, y ∈ Bd(a, b) or x ∈ Bd(a, b), y ∈ Bd(a, b). In other words, ≺ is a preorder that does not distinguish elements x and h(x), but it makes both x and h(x) less than y and h(y) whenever x ≺a y holds. Furthermore, each element of Bd(a, b) is less than each element of the complement, A − Bd(a, b), which in turn is ordered by ≺0. Our goal is to find two linear orderings, ≤a and ≤b on A, such that (A, a, ≤a) ≡k (A, b, ≤b). (5.2) This would imply a ∈ Q(A) iff (A, ≤a) |= ϕ(a) iff (A, ≤b) |= ϕ(b) iff b ∈ Q(A). (5.3) These orderings will be refinements of ≺, and will be defined sphere-by-sphere. For the ≤a ordering, a is the smallest element, and for the ≤b ordering, b is the smallest. On Sd(a)∪Sd(b), the orderings ≤a and ≤b must coincide (otherwise the spoiler will win easily). Note that ≺ is a preorder: the only pairs it does not order are pairs of the form (x, h(x)). To define ordering on them, we select two “sparse” sets of integers J = {j1, . . . , jm} and L = {l1, . . . , lm+1} with 0 < j1 < . . . < jm < d and 0 < l1 < l2 < . . . < lm+1 < d. “Sparse” here means that the difference between two consecutive integers is at least 2k + 1 (other conditions will be explained in the detailed proof). Assume that x ∈ Sr(a), y ∈ Sr(b), and y = h(x), for r ≤ d. Then x ≤a y ⇔ |{j ∈ J | j < r}| is even, y ≤a x ⇔ |{j ∈ J | j < r}| is odd, (5.4) and x ≤b y ⇔ |{l ∈ L | l < r}| is odd, y ≤b x ⇔ |{l ∈ L | l < r}| is even. (5.5) Thus, the parity of the number of ji’s or li’s below r tells us whether the order on pairs (x, h(x)) prefers the element from Bd(a) or Bd(b). Note that a is the 76 5 Ordered Structures least element with respect to ≤a (in particular, a ≤a b), and b is the least element for ≤b, but since the number of switches of preferences differs by one for ≤a and ≤b, on Sd(a, b) both orderings are the same. Of course a switch can be detected by a first-order formula, but we have many of them, and they happen at spheres that are well separated. The key idea of the proof is to use the sparseness of J and L to show that the difference between them cannot be detected by the spoiler in k moves. This will ensure (A, a, ≤a) ≡k (A, b, ≤b). We now present the complete proof; that is, we show how to construct two orderings, ≤a and ≤b, such that (5.2) holds. First, we may assume, without loss of generality, that no sphere Sr(a, b), r ≤ d, is empty. If any Sr(a, b) were empty, A would have been a disjoint union of Bd(a), Bd(b), and A − Bd(a, b), with no E-edges between these sets. Then, using NA d (a) ∼= NA d (b), it is easy to find orderings ≤a and ≤b such that (A, a, ≤a) and (A, b, ≤b) are isomorphic, and hence (A, a, ≤a) ≡k (A, b, ≤b) holds. To define the radius d for a given k (the quantifier rank of a formula defining Q), we need some additional notation. Let σ(r) be the vocabulary (E, <, U−r, U−r+1, . . . , U−1, U0, U1, . . . , Ur−1, Ur), where all the Ui’s are unary. Let t be the number of rank-(k + 1) types of σ(r) structures, where r = 2k (this number, as we know, depends only on k). Let Σ be a finite alphabet of cardinality t. Recall that a string s of length n over Σ is represented as a structure Ms of the vocabulary (<, A1, . . . , At) with the universe {1, . . ., n} ordered by <, and each unary Ai interpreted as the set of positions between 1 and n where the symbol is the ith symbol of Σ. We call a subset X = {x1, . . . 
, xp} of {1, . . ., n} r-sparse if min i=j |xi − xj |> r, xi > r, n − xi > r, for all i ≤ p. Next, we need the following lemma. Lemma 5.10. For every t, k ≥ 0, there exists a number d > 0 such that, given any string s ∈ Σ∗ of length n ≥ d, where |Σ |= t, there exist two subsets J, L ⊆ {1, . . . , n} such that • |L|=|J | +1 > 2k ; • J and L are 2k + 1-sparse; and • (Ms, J) ≡k+2 (Ms, L). The proof is a standard Ehrenfeucht-Fra¨ıss´e game argument, and is left to the reader as an exercise (Exercise 5.6). We now let d be given by Lemma 5.10, for k the quantifier rank of a formula defining Q, and t the number of rank-(k + 1) types of σ(2k)-structures. Fix a ≈A d b, with Bd(a) and Bd(b) disjoint, and let h be an isomorphism Nd(a) → Nd(b). For i, r ≤ d, let Ri r(a) be a σ(r)-structure whose universe is the union 5.3 Locality of Order-invariant FO 77 i+r j=i−r Sj(a) (if j < 0 or j > d, we take the corresponding sphere to be empty), and each Up is interpreted as Si+p(a), and the ordering is ≺a, the fixed linear ordering on Bd(a) such that dA(x, a) < dA(y, a) implies x ≺a y (restricted to the universe of the structure). Structures Ri r(b) are defined similarly, with the ordering being ≺b, the image of ≺a under the isomorphism h. Note that Ri r(b) ∼= Ri r(a). Let Σ be the set of rank-(k +1) types of σ(2k)-structures. Define a string s of length d + 1 which, in position i = 1, . . . , d + 1, has the rank-(k + 1) type of Ri−1 2k (a). Applying Lemma 5.10, we get two 2k + 1-sparse sets J, L such that (Ms, J) ≡k (Ms, L). Let J = {j1, . . . , jm} with j0 = 0 < j1 < . . . < jm < d and L = {l1, . . . , lm+1} with l0 = 0 < l1 < l2 < . . . < lm+1 < d. Using these J and L, define ≤a and ≤b as in (5.4) and (5.5). Let Nd,J(a) and Nd,L(a) be two structures in the vocabulary (E, <, U, c) with the universe Bd(a). In both, the binary predicate E is inherited from A, the ordering < is ≺a, and the constant c is a. The only difference is the unary predicate U: it is interpreted as j∈J Sj(a) in Nd,J (a), and as l∈L Sl(a) in Nd,L(a). Let Aa stand for (A, ≤a, a) and Ab for (A, ≤b, b). The winning strategy for the duplicator on Aa and Ab is based on the following lemma. Lemma 5.11. The duplicator has a winning strategy in the k-round game on Nd,J(a) and Nd,L(a). Moreover, if p1, . . . , pk are the moves on Nd,J (a), and q1, . . . , qk are the moves on Nd,L(a), then the following conditions can be guaranteed by the winning strategy: 1. If pi ∈ Sr(a) and d − r ≤ 2k−i , then qi = pi. 2. If (r1, . . . , rk) and (r′ 1, . . . , r′ k) are such that each pi is in the sphere Sri (a) and qi is in the sphere Sr′ i (a), then ((r1, . . . , rk), (r′ 1, . . . , r′ k)) define a partial isomorphism between (Ms, J) and (Ms, L). The idea of Lemma 5.11 is illustrated in Fig. 5.1. We have two structures, (Ms, J) and (Ms, L), which are linear orders with extra unary predicates, and two additional unary predicates, J and L of different parity, which are shown as short horizontal segments. Using the fact that (Ms, J) ≡k+2 (Ms, L), we prove that Nd,J(a) ≡k Nd,L(a). These are shown in Fig. 5.1 as two big circles, with concentric circles inside representing spheres Sr with r being in J or L, respectively. These spheres form the interpretation for an extra unary predicate in the vocabulary of structures Nd,J(a) and Nd,L(a). Next, we show that Proposition 5.9 follows from Lemma 5.11; after that, we prove Lemma 5.11. 78 5 Ordered Structures a• (Ms, J) (Ms, L) ≡k+2 ≡k Nd,J (a) Nd,L(a) •a Fig. 5.1. 
Games involved in the proof of Proposition 5.9 From Nd,J(a) ≡k Nd,L(a) to Aa ≡k Ab. We now show how Lemma 5.11 implies Proposition 5.9; that is, Aa ≡k Ab. The idea for the winning strategy on Aa and Ab is that it almost mimics the one in Nd,J(a) ≡k Nd,L(a). We shall denote moves in Aa by a1, . . ., and moves in Ab by b1, . . .. Suppose the spoiler plays ai ∈ Aa (the case of a move in Ab is symmetric). If ai ∈ Bd(a, b), then bi = ai, and we also set pi = qi = a. If ai ∈ Bd(a, b), we define pi ∈ Bd(a) to be ai if ai ∈ Bd(a), and h−1 (ai) if ai ∈ Bd(b). The duplicator then determines the response qi to pi, according to the Nd,J (a) ≡k Nd,L(a) winning strategy. The response bi is going to be either qi itself, or h(qi), and we use sets J and L to determine if bi lives in Bd(a) or Bd(b). We define two mappings vJ : Bd(a, b) → {0, 1} and vL : Bd(a, b) → {0, 1} such that for every x ∈ Bd(a), vJ (x) + vJ (h(x)) = vL(x) + vL(h(x)) = 1. For x ∈ Bd(a), find r ≤ d such that x ∈ Sr(a). Then vJ (x) = 0 if |{j ∈ J | j < r}| is even, 1 otherwise, and vJ (h(x)) = 1 − vJ (x). Similarly, for x ∈ Bd(b), we find r such that x ∈ Sr(b) and set vL(x) = 0 if |{l ∈ L | l < r}| is even, 1 otherwise, and define vL(x) = 1 − vL(h(x)) for x ∈ Bd(a). We now look at qi and h(qi); we know that vL(qi) + vL(h(qi)) = 1. We choose bi to be one of qi or h(qi) such that vL(bi) = vJ (ai). 5.3 Locality of Order-invariant FO 79 This describes the strategy; now we prove that it works. Dealing with the constant is easy: if the spoiler plays a in Aa, then the duplicator has to respond with b in Ab and vice versa. We now move to the E-relation. Since the parity of |J | and |L| is different, condition 1 of Lemma 5.11 implies that for any move in Bd(a, b)−Bd−2m (a, b) with m moves to go, the response is the identity. Hence, if E(ai, aj) holds, and one or both of ai, aj are outside of Bd(a, b), then E(bi, bj) holds (and vice versa). Therefore, it suffices to consider the case when E(ai, aj) holds, and ai, aj ∈ Bd(a, b). Assume, without loss of generality, that ai, aj ∈ Bd(a). Then E(pi, pj) holds, and hence E(qi, qj) holds. Given the duplicator’s strategy, to conclude that E(bi, bj) holds, we must show that both bi and bj belong to the same ball – Bd(a) or Bd(b). The elements ai and aj could come either from the same sphere Sr(a), or from two consecutive spheres Sr(a) and Sr+1(a). In the first case, if they come from the same sphere, vJ (ai) = vJ (aj) and thus vL(bi) = vL(bj). Furthermore, since ai and aj are in the same sphere, we conclude that pi and pj are in the same sphere, and hence, by the winning strategy of Lemma 5.11, qi and qj are in the same sphere. This, together with vL(bi) = vL(bj), means that bi and bj are in the same ball. Assume now that ai ∈ Sr(a) and aj ∈ Sr+1(a). From condition 2 of Lemma 5.11, for some r′ ≤ d we have qi ∈ Sr′ (a) and qj ∈ Sr′+1(a). Now there are two cases. In the first case, vJ (ai) = vJ (aj). Then there are two possibilities. If r, r + 1 ∈ J, then r′ , r′ + 1 ∈ L (by condition 2 of Lemma 5.11), and hence vL(bi) = vJ (ai) = vJ (aj) = vL(bj) implies that bi, bj are in the same ball, and E(bi, bj) holds. The other possibility is that r + 1 ∈ J, r ∈ J. Then r′ + 1 ∈ L, r′ ∈ L, and again we conclude E(bi, bj). The second case is when vJ (ai) = vJ (aj). This could only happen if r is in J (and thus r + 1 ∈ J). Then again by condition 2 of Lemma 5.11, r′ ∈ L, r′ + 1 ∈ L. Suppose vJ (ai) = 0. Then vL(bi) = 0, and vL(bj) = vJ (aj) = 1. 
Since bi ∈ Sr′ (a, b) and bj ∈ Sr′+1(a, b), and r′ ∈ L, both bi and bj must belong to the same ball (Bd(a) or Bd(b)), and hence E(bi, bj) holds. Thus, E(ai, aj) implies E(bi, bj); the proof of the converse – that E(bi, bj) implies E(ai, aj) – is identical. Finally, assume that ai ≤a aj. If ai ∈ Sr(a, b), aj ∈ Sr′ (a, b) and r < r′ , then, by condition 2 of Lemma 5.11, bi ∈ Sr0 (a, b), bj ∈ Sr′ 0 (a, b) for some r0 < r′ 0, and hence bi ≤b bj. Thus, it remains to consider the case of ai, aj being in the same sphere; that is, ai, aj ∈ Sr(a, b). If pi = pj, then pi ≺a pj and hence qi ≺a pj, which in turn implies bi ≤b bj. The final possibility is that of pi = pj; then either (1) ai ∈ Sr(a) and aj = h(ai), or (2) aj ∈ Sr(a) and ai = h(aj). We prove case (1); the proof of case (2) is identical. Note that the orderings ≤a and ≤b are defined in such a way that whenever x = h(y), then ≤a orders them according to vJ ; that is, if vJ (x) < vJ (y), then x ≤a y, and if vJ (y) < vJ (x), then y ≤a x. The ordering ≤b behaves likewise 80 5 Ordered Structures with respect to the function vL. Hence, if aj = h(ai) and ai ≤a aj, then vJ (ai) = 0 and vJ (aj) = 1. From Lemma 5.11, qi = qj, and thus bi and bj are related by the isomorphism h. Since vL(bi) = 0 and vL(bj) = 1, we know that bi ≤b bj. This concludes the proof that ai ≤a aj implies bi ≤b bj; the proof of the converse is identical. Thus, we have proved, using Lemma 5.11, that Aa ≡k Ab, which is precisely what is needed to conclude (weak) locality of Q. It thus remains to prove Lemma 5.11. Proof of Lemma 5.11. We shall refer to moves in the game on Nd,J (a) and Nd,L(a) as pi (in Nd,J (a)) and qi (in Nd,L(a)), and to moves in the game on (Ms, J) and (Ms, L), provided by Lemma 5.10, as ei for (Ms, J) and fi for (Ms, L). For two elements x, y in the universe of Ms (which is {1, . . . , d + 1}), the distance between them is |x − y |. The next claim shows that after i rounds, distances up to 2k−i between played elements, and elements of the sets J and L, are preserved. Claim 5.12. Let e1, . . . , ei and f1, . . . , fi be elements played in the first i rounds of the game on (Ms, J) and (Ms, L). Then: • if |ej1 − ej2 |≤ 2k−i , then |fj1 − fj2 |=|ej1 − ej2 |; • if |ej1 − ej2 |> 2k−i , then |fj1 − fj2 |> 2k−i ; • if min x∈J,j≤i |x − ej |≤ 2k−i , then min x∈J,j≤i |x − ej |= min y∈L,j≤i |y − fj |; • if min x∈J,j≤i |x − ej |> 2k−i , then min y∈L,j≤i |y − fj |> 2k−i . Proof of Claim 5.12. Since we know that (Ms, J) ≡k+2 (Ms, L), it suffices to show that for any x, y, p ≤ k, and any r ≤ 2p , there is a formula of quantifier rank p + 1 that tests if | x − y |= r, and there is a formula of quantifier rank p + 2 that tests if the minimum distance from x to an element of the set (interpreted as J and L in the models) is exactly r. We prove the first statement; the second is an easy exercise for the reader. We define α0(x, y) ≡ (x = y); this tests if the distance is zero. To test if the distance is one, we see if x is the successor of y or y is the successor of x: α1(x, y) ≡ x < y ∧ ¬∃z (x < z ∧ z < y) ∨ y < x ∧ ¬∃z (y < z ∧ z < x) . Now, suppose for each r ≤ 2p , we have a formula αr(x, y) in FO[p + 1] testing if the distance is r. We now show how to test distances up to 2p+1 using FO[p + 2] formulae. Suppose 2p < r ≤ 2p+1 . The formula αr is of the form (x < y) ∧ α′ r(x, y) ∨ (y < x) ∧ α′′ r (x, y) . We present α′ r(x, y) below. Let r1, r2 ≤ 2p be such that r1 + r2 = r. Then α′ r(x, y) ≡ ∃z (x < z) ∧ (z < y) ∧ αr1 (x, z) ∧ αr2 (z, y) . 
5.3 Locality of Order-invariant FO 81 Clearly, this increases the quantifier rank by 1. This proves the claim. Given x ∈ Sr(a) and y ∈ Sr′ (a), define δ(x, y) as r−r′ . Given x1, . . . , xm in Bd(a), and u ≥ 0, we define a structure Su[x1, . . . , xm] as follows. Its universe is {x | −u ≤ δ(x, xi) ≤ u, i ≤ m}. It inherits binary relations E and ≺ from Bd(a). Note that the universe of Su[x1, . . . , xm] is a union of spheres. Suppose these are spheres Sr1 (a), . . . , Srw (a), with r1 < . . . < rw. Then the vocabulary of Su[x1, . . . , xm] contains w unary predicates U1, . . . , Uw, interpreted as Sr1 (a), . . . , Srw (a). Furthermore, SJ u[x1, . . . , xm] and SL u [x1, . . . , xm] extend Su[x1, . . . , xm] by means of an extra unary relation U interpreted as the union of spheres Sri (a) with ri ∈ J (ri ∈ L, respectively). We shall be interested in the parameter u of the form 2k−i , i ≤ k, and now define a relation SJ 2k−i [x1, . . . , xm] ∼k−i SL 2k−i [y1, . . . , ym]. The first condition is as follows: If the universe of SJ 2k−i [x1, . . . , xm] is a union of w spheres, Sr1 (a) ∪ . . . ∪ Srw (a), then the universe of SL 2k−i [y1, . . . , ym] is a union of w spheres, Sr′ 1 (a) ∪ . . . ∪ Sr′ w (a), and rj ∈ J iff r′ j ∈ L. (5.6) Define ∆u(r1, . . . , rw) as {j > 1 | rj+1 − rj > u}. The second condition is: ∆2k−i (r1, . . . , rw) = ∆2k−i (r′ 1, . . . , r′ w). (5.7) For 1 ≤ j < j′ ≤ w + 1, define the restriction SJ u[x1, . . . , xm]j′ j to include only the spheres from Srj (a) up to Srj′−1 (a) (and likewise for SL u [y1, . . . , ym]j′ j ). The next condition is: For each consecutive j, j′ ∈ {1, w + 1} − ∆2k−i (r1, . . . , rw), SJ 2k−i [x1, . . . , xm]j′ j ≡i SL 2k−i [y1, . . . , ym]j′ j . (5.8) We now write SJ 2k−i [x1, . . . , xm] ∼k−i SL 2k−i [y1, . . . , ym] if (5.6), (5.7), and (5.8) hold. Our goal is to show that the duplicator can play in such a way that, after i moves, SJ 2k−i [p0, p1, . . . , pi] ∼k−i SL 2k−i [q0, q1, . . . , qi], (5.9) where p0 = q0 = a. The proof is by induction on i. The case of i = 0 (i.e., SJ 2k−i [p0] ∼k SL 2k−i [q0]) is immediate from the sparseness of J and L. We also set e0 = f0 = 1. Now suppose (5.9) holds, and the spoiler plays pi+1 ∈ Nd,J (a), such that pi+1 ∈ Sr(a) (the case of the move qi+1 ∈ Nd,L(a) is symmetric). The duplicator sets ei+1 ∈ {1, . . ., d+1} to be r +1, and finds the response fi+1 to ei+1 82 5 Ordered Structures in the game on (Ms, J) and (Ms, L), from position ((e0, . . . , ei), (f0, . . . , fi)). Let fi+1 = r′ + 1; then the response qi+1 will be found in Sr′ (a). Assume that SJ 2k−i [p0, p1, . . . , pi] is the union of spheres Sr1 (a) ∪ . . . ∪ Srw (a), and SL 2k−i [q0, q1, . . . , qi] is the union of spheres Sr′ 1 (a) ∪ . . . ∪ Sr′ w (a). We distinguish two cases. Case 1. In this case | δ(pi+1, pj) |> 2k−(i+1) for all j ≤ w (i.e., | ei+1 − ej |> 2k−(i+1) ). From Claim 5.12, we conclude | δ(qi+1, qj) |> 2k−(i+1) for all j. Since ei+1 and fi+1 satisfy all the same unary predicates over (Ms, J) and (Ms, L), we see that there is an element qi+1 in Sr′ (a) such that S2k [pi+1] ≡k+1 S2k [qi+1] and hence S2k−(i+1) [pi+1] ≡k−(i+1) S2k−(i+1) [qi+1]. Moreover, by Claim 5.12, r ± l ∈ J iff r′ ± l ∈ L, for every l ≤ 2k−(i+1) , and hence SJ 2k−(i+1) [pi+1] ≡k−(i+1) SL 2k−(i+1) [qi+1]. From here SJ 2k−(i+1) [p0, p1, . . . , pi+1] ∼k−(i+1) SL 2k−(i+1) [q0, q1, . . . , qi+1] follows easily. This implies (5.8), and (5.6), (5.7) follow from the construction. 
The final note to make about this case is that if d − r ≤ 2k−(i+1) , then qi+1 can be chosen to be equal to pi+1, while preserving (5.9). Case 2. In this case | δ(pi+1, pj0 ) |≤ 2k−(i+1) for some j0 ≤ w. Find two consecutive j, j′ ∈ ∆2k−i (r1, . . . , rw) such that pi+1 is in SJ 2k−i [p0, . . . , pi]j′ j . From Claim 5.12, |δ(qi+1, qj)|≤ 2k−(i+1) . We then use (5.8) and find qi+1 in Sr′ (a) so that SJ 2k−i [p0, . . . , pi, pi+1]j′ j ≡k−(i+1) SL 2k−i [q0, . . . , qi, qi+1]j′ j . (5.10) Conditions (5.6) and (5.7) for 2k−(i+1) now follow from Claim 5.12, and condition (5.8) then follows from (5.10), since for every sphere which is a part of one of the structures mentioned in (5.10), there is a unary predicate interpreted as that sphere. Finally, if d + 1 − ei+1 ≤ 2k−(i+1) , then d + 1 − ej0 ≤ 2k−i , and thus pj0 = qj0 and the structures SJ 2k−i [p0, . . . , pi]j′ j and SL 2k−i [q0, . . . , qi]j′ j are actually isomorphic. Hence, responding to pi+1 with qi+1 = pi+1 will preserve the isomorphism of structures of the form SJ 2k−(i+1) [p0, . . . , pi, pi+1]l′ l and SL 2k−(i+1) [q0, . . . , qi, qi+1]l′ l containing the sphere with pi+1 = qi+1. This finally shows that the duplicator plays in such a way that (5.9) is preserved. After k moves, the moves of the game (p, q) form a partial isomorphism. Indeed, if pi1 , pi2 are in different structures SJ 1 [p]j′ j and SJ 1 [p]l′ l , then qi1 , qi2 are in different structures SL 1 [q]j′ j and SL 1 [q]l′ l , and hence there is no 5.5 Exercises 83 E-relation between them. Furthermore, since ei1 < ei2 iff fi1 < fi2 , we see that pi1 ≺ pi2 iff qi1 ≺ qi2 . If pi1 , pi2 are in the same structure SJ 1 [p]j′ j , then qi1 , qi2 are in SL 1 [q]j′ j , and hence by (5.8), the E and ≺ relations between them are preserved. Finally, since ei ∈ J iff fi ∈ L, we have pi ∈ U iff qi ∈ U. This shows that (p, q) is a partial isomorphism between Nd,J(a) and Nd,L(a), and thus finishes the proof of Lemma 5.11 and Proposition 5.9. 5.4 Bibliographic Notes While the concept of invariant queries is extremely important in finite model theory, over arbitrary models it is not interesting, as Exercise 5.1 shows. The separating example of Theorem 5.3 is due to Gurevich, although he never published it (it appeared as an exercise in [3]). Another separating example is given in Exercise 5.2. Locality of invariant FO-definable queries is due to Grohe and Schwentick [113]. Their original proof is the subject of Exercises 5.8 and 5.9; the proof presented here is a slight simplification of that proof. It uses the concept of weak locality, introduced in Libkin and Wong [170]. Sources for exercises: Exercise 5.1: Ebbinghaus and Flum [60] Exercise 5.2: Otto [192] Exercises 5.3 and 5.4: Libkin and Wong [170] Exercises 5.7–5.9: Grohe and Schwentick [113] Exercise 5.11: Rossman [210] 5.5 Exercises Exercise 5.1. Prove that over arbitrary structures, FO = (FO+<)inv. Hint: use the interpolation theorem. Exercise 5.2. The goal of this exercise it to give another separation example for FO (FO+ <)inv. We consider structures in the vocabulary σ = (U1, U2, E, R, S) where U1, U2 are unary and E, R, S are binary. We consider a class C of structures A ∈ STRUCT[σ] that satisfy the following conditions: 1. U1 and U2 partition the universe A. 2. E ⊆ U1 × U1 and S ⊆ U2 × U2. 3. The restriction of A to U2, S is a Boolean algebra (we refer to its set of atoms as X). 4. |X |=|U1 |= 2m; moreover, if U1 = {u1, . . . , u2m} and X = {x1, . . . , x2m}, then R = m[ i=1 {u2i−1, u2i} × {x2i−1, x2i}. 
84 5 Ordered Structures First, prove that the class C is FO-definable. Next, consider the following Boolean query Q on C: Q(A) = true iff U1, E is connected. Prove that Q ∈ (FO+<)inv on C, but that Q is not FO-definable on C. Exercise 5.3. Give an example of a query that is weakly local, but is not Gaifman- local. Exercise 5.4. Prove that weak locality implies the BNDP for binary queries. Does this implication hold for m-ary queries, where m > 2? Exercise 5.5. Using Proposition 5.9, prove that acyclicity and k-colorability are not definable in FO+<. Exercise 5.6. Prove Lemma 5.10. Exercise 5.7. In the proof of weak locality of invariant queries presented in this chapter, we only dealt with nonoverlapping neighborhoods. To deal with the case of overlapping Bd(a) and Bd(b), prove the following. Let d′ = 5d + 1, and let a ≈A d′ b. Then there exists a set X containing {a, b} and an automorphism g on NA d (X) such that g(a) = b. Exercise 5.8. Prove that every unary query in (FO+<)inv is Gaifman-local. The main ingredients have already been presented in this chapter, but for the case of nonoverlapping neighborhoods. To deal with the case of overlapping neighborhoods Nd(a) and Nd(b), define d′ , g, and X as in Exercise 5.7. Now note that each sphere Sr(X) is a union of g-orbits; that is, sets of the form {gi (v) | i ∈ Z}. For each orbit O, we fix a node cO and define a linear ordering ≤0 on O by cO ≤0 g(cO) ≤0 g2 (cO) ≤0 . . .. Let ≤m be the image of ≤0 under gm . The definition of ≤a and ≤b is almost the same as the definition we used in the proof of Proposition 5.9. We start with a fixed order on orbits that respects distance from X. It generates a preorder on Bd(X), which we refine to two different orders in the following way. On S0(X), we let ≤a be ≤0 and ≤b be ≤1= g(≤0). Then, for suitably defined J and L (cf. the proof of Proposition 5.9), we do the following. Let J = {j1, . . . , jm}, j1 < . . . < jm. For all spheres Sr(X), r < j1, the order on each orbit is ≤0, but on Sj1 (X) we use ≤1 instead. We continue to use ≤1 until Sj2−1(X), and on Sj2 (X) we switch to ≤2, and so on. For ≤b, we do the same, except that we use the set L instead. We choose J and L so that | J |=| L | +1, which means that on Sd(X), both ≤a and ≤b coincide. The goal of the exercise is then to turn this sketch (together with the proof of Proposition 5.9) into a proof of locality of unary queries in (FO+<)inv. Exercise 5.9. The goal of this exercise is to complete the proof of Theorem 5.8. Using Exercise 5.8, show that every m-ary query in (FO+ <)inv, for m > 1, is Gaifman-local. Exercise 5.10. Calculate the locality rank of an order-invariant query produced in the proof of Theorem 5.8. You will probably have to use Exercise 3.10. 5.5 Exercises 85 Exercise 5.11. We know that FO (FO+ <)inv. What about (FO + Succ)inv? Clearly FO ⊆ (FO + Succ)inv ⊆ (FO+<)inv, and at least one containment must be proper. Find the exact relationship between these three classes of queries. Exercise 5.12.∗ Consider again the vocabulary σ<,+ and a class C<,+ of σ<,+structures where < is interpreted as a linear ordering, and + as the addition corresponding to <. Prove that every query in (FO + C<,+)inv is local. 6 Complexity of First-Order Logic The goal of this chapter is to study the complexity of queries expressible in FO. We start with the general definition of different ways of measuring the complexity of a logic over finite structures: these are data, expression, and combined complexity. 
We then connect FO with Boolean circuits and establish some bounds on the data complexity. We also consider the issue of uniformity for a circuit model, and study it via logical definability. We then move to the combined complexity of FO, and show that it is much higher than the data complexity. Finally, we investigate an important subclass of FO queries – conjunctive queries – which play a central role in database theory. 6.1 Data, Expression, and Combined Complexity Let us first consider the complexity of the model-checking problem: that is, given a sentence Φ in a logic L and a structure A, does A satisfy Φ? There are two parameters of this question: the sentence Φ, and the structure A. Depending on which of them are considered parameters of the problem, and which are fixed, we get three different definitions of complexity for a logic. Complexity theory defines its main concepts via acceptance of string languages by computational devices such as Turing machines. To talk about complexity of logics on finite structures, we need to encode finite structures and logical formulae as strings. For formulae, we shall assume some natural encoding: for example, enc(ϕ), the encoding of a formula ϕ, could be its syntactic tree (represented as a string). For the notion of data complexity, defined below, the choice of a particular encoding of formulae does not matter. There are several different ways to encode structures. The one we use here is the one most often used, but others are possible, and sometimes provide additional useful information about the running time of query-evaluation al- gorithms. Suppose we have a structure A ∈ STRUCT[σ]. Let A = {a1, . . . , an}. For encoding a structure, we always assume an ordering on the universe. In some 88 6 Complexity of First-Order Logic structures, the order relation is a part of the vocabulary; in others, it is not, and then we arbitrarily choose one. The order in this case will have no effect on the result of queries, but we need it to represent the encoding of a structure on the tape of a Turing machine, to be able to talk about computability and complexity of queries. Thus, we choose an order on the universe, say, a1 < a2 < . . . < an. Each k-ary relation RA will be encoded by an nk -bit string enc(RA ) as follows. Consider an enumeration of all k-tuples over A, in the lexicographic order (i.e., (a1, . . . , a1), (a1, . . . , a1, a2), . . . , (an, . . . , an, an−1), (an, . . . , an)). Let aj be the jth tuple in this enumeration. Then the jth bit of enc(RA ) is 1 if aj ∈ RA , and 0 if aj ∈ RA . We shall assume without any loss of generality that σ contains only relation symbols, since a constant can be encoded as a unary relation containing one element. If σ = {R1, . . . , Rp}, then the basic encoding of a structure is the concatenation of the encodings of relations: enc(RA 1 ) · · · enc(RA p ). In some computational models (e.g., circuits), the length of the input is a parameter of the model and thus |A| can easily be calculated from the basic encoding; in others (e.g., Turing machines), |A| must be known by the device in order to use the encoding of a structure. For that purpose, we define an enc(A) which is simply the concatenation of 0n 1 and all the enc(RA i )’s: enc(A) = 0n 1 · enc(RA 1 ) · · · enc(RA p ). (6.1) The length of this string, denoted by A , is A = (n + 1) + p i=1 narity(Ri) . (6.2) Definition 6.1. Let K be a complexity class, and L a logic. 
We say that • the data complexity of L is K if for every sentence Φ of L, the language {enc(A) | A |= Φ} belongs to K; • the expression complexity of L is K if for every finite structure A, the language {enc(Φ) | A |= Φ} belongs to K; and • the combined complexity of L is K if the language {(enc(A), enc(Φ)) | A |= Φ} belongs to K. 6.2 Circuits and FO Queries 89 • Furthermore, we say that the combined complexity of L is hard for K (or K-hard) if the language {(enc(A), enc(Φ)) | A |= Φ} is a K-hard problem. The data complexity is K-hard if for some Φ, {enc(A) | A |= Φ} is a hard problem for K, and the expression complexity is K-hard if for some A, {enc(Φ) | A |= Φ} is K-hard. • A problem that is both in K and K-hard is complete for K, or K-complete. Thus, we can talk about data/expression/combined complexity being K- complete. Given our standard choice of encoding, we shall sometimes omit the notation enc(·), instead writing {A | A |= Φ} ∈ K, etc. The notion of data complexity is most often used in the database context: the structure A corresponds to a large relational database, and the much smaller sentence Φ is a query that has to be evaluated against A; hence Φ is ignored in this definition. The notions of expression and combined complexity are often used in verification and model-checking, where a complex specification needs to be evaluated on a description of a finite state machine; in this case the specification Φ may actually be more complex than the structure A. We shall also see that for most logics of interest, all the hardness results for the combined complexity will be shown on very simple structures, thereby giving us matching bounds for the combined and expression complexity. Thus, we shall concentrate on the data and combined complexity. We defined the notion of complexity for sentences only. The notion of data complexity has a natural extension to formulae with free variables defining non-Boolean queries. Suppose an m-ary query Q is definable by a formula ϕ(x1, . . . , xm). Then the data complexity of Q is the complexity of the language {(enc(A), enc({a})) | a ∈ Q(A)}. This is the same as the data complexity of the sentence (∃!x S(x)) ∧ (∀x (S(x) → ϕ(x))), where S is a new m-ary relation symbol not in σ (we assume that the logic L is closed under the Boolean connectives and first-order quantification). Recall that the quantifier ∃!x means “there exists a unique x”. Thus, as long as L has the right closure properties, we can only consider data complexity with respect to sentences. 6.2 Circuits and FO Queries In this section we show how to code FO sentences over finite structures by Boolean circuits. This coding will give us bounds for both the data and combined complexity of FO. Definition 6.2. A Boolean circuit with n inputs x1, . . . , xn is a tuple C = (V, E, λ, o), where 90 6 Complexity of First-Order Logic 1. (V, E) is a directed acyclic graph with the set of nodes V (which we call gates) and the set of edges E. 2. λ is a function from V to {x1, . . . , xn} ∪ {∨, ∧, ¬} such that: • λ(v) ∈ {x1, . . . , xn} implies that v has in-degree 0; • λ(v) = ¬ implies that v has in-degree 1. 3. o ∈ V . The in-degree of a node is called its fan-in. The size of C is the number of nodes in V ; the depth of C is the length of the longest path from a node of in-degree 0 to o. A circuit C computes a Boolean function with n inputs x1, . . . , xn as follows. Suppose we are given values of x1, . . . , xn. 
Initially, we compute the values associated with each node of in-degree 0: for a node labeled xi, it is the value of xi; for a node labeled ∨ it is false; and for a node labeled ∧ it is true. Next, we compute the value of each node by induction: if we have a node v with incoming edges from v1, . . . , vl, and we know the values a1, . . . , al associated with v1, . . . , vl, then the value a associated with v is:
• a1 ∨ . . . ∨ al if λ(v) = ∨;
• a1 ∧ . . . ∧ al if λ(v) = ∧;
• ¬a1 if λ(v) = ¬ (in this case we know that l = 1).
The output of the circuit is the value assigned to the node o. An example of a circuit computing the Boolean function (x1 ∧ ¬x2 ∧ x3) ∨ ¬(x3 ∧ ¬x4) is shown in Fig. 6.1; the output node is depicted as a double circle. Note that a circuit with no inputs is possible, and its in-degree zero gates are labeled ∨ or ∧. Such a circuit always outputs a constant (i.e., true or false). We next define families of circuits and the languages in {0, 1}* they accept.
Definition 6.3. A family of circuits is a sequence C = (Cn)n≥0 where each Cn is a circuit with n inputs. It accepts the language L(C) ⊆ {0, 1}* defined as follows. Let s be a string of length n. It can be viewed as a Boolean vector xs such that the ith component of xs is the ith symbol in s. Then s ∈ L(C) iff Cn outputs 1 on xs.
A family of circuits C is said to be of polynomial size if there is a polynomial p : N → N such that the size of each Cn is at most p(n). For a function f : N → N, we say that C is of depth f(n) if the depth of Cn is at most f(n). We say that C is of constant depth if there is d > 0 such that for all n, the depth of Cn is at most d. The class of languages accepted by polynomial-size constant-depth families of circuits is called nonuniform AC0.
[Fig. 6.1. Boolean circuit computing (x1 ∧ ¬x2 ∧ x3) ∨ ¬(x3 ∧ ¬x4)]
For example, the language that consists of strings containing at least two ones is in nonuniform AC0: each circuit Cn, n > 1, has ∧-gates for every pair of inputs xi and xj, and the outputs of those ∧-gates form the input for one ∨-gate. A class of structures C ⊆ STRUCT[σ] is in nonuniform AC0 if so is the language {enc(A) | A ∈ C}. An example of a class of structures that is not FO-definable, but belongs to nonuniform AC0, is the class even of structures of the empty vocabulary: that is, { ⟨A, ∅⟩ | |A| mod 2 = 0 }. The coding of such a structure with |A| = n is simply 0^n 1; hence Ck always returns true for odd k (as it corresponds to structures of even cardinality), and false for even k.
Next, we extend FO as follows. Let P be a collection, finite or infinite, of numerical predicates; that is, subsets of N^k. For example, they may include <, or + considered as a ternary predicate {(i, j, l) | i + j = l}, etc. For P including the linear order, we define FO(P) as an extension of FO with atomic formulae of the form P(x1, . . . , xk), for a k-ary P ∈ P. The semantics is defined as follows. Suppose A is a σ-structure, and its universe A is ordered by < as a0 < . . . < an−1. Then A |= P(ai1, . . . , aik) iff the tuple of numbers (i1, . . . , ik) belongs to P. For example, let P2 ⊂ N consist of the even numbers. Then the query even is expressed as an FO({<, P2}) sentence as follows: ∀x ((∀y (y ≤ x)) → P2(x)).
We are now interested in the class FO(All), where All stands for the family of all numerical predicates; that is, all subsets of N, N^2, N^3, etc. We now show the connection between FO(All) and nonuniform AC0.
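Before turning to that connection, here is a small sketch (not from the book) of circuit evaluation in the sense of Definitions 6.2 and 6.3. The dictionary representation of a circuit and the gate identifiers are our own choices for illustration; the circuit built at the end is the one of Fig. 6.1.

```python
# A minimal sketch (not from the book) of evaluating a Boolean circuit as in Definition 6.2.
# A gate is a pair (label, predecessors); labels are "x1", ..., "xn", "or", "and", "not".

def eval_circuit(gates, output, inputs):
    """gates: dict gate_id -> (label, [predecessor ids]); inputs: dict "xi" -> bool."""
    memo = {}

    def value(g):
        if g in memo:
            return memo[g]
        label, preds = gates[g]
        if label.startswith("x"):              # input gate: the value of that variable
            v = inputs[label]
        elif label == "or":                    # an ∨-gate with no inputs is false, as in the text
            v = any(value(p) for p in preds)
        elif label == "and":                   # an ∧-gate with no inputs is true, as in the text
            v = all(value(p) for p in preds)
        else:                                  # "not": fan-in 1
            v = not value(preds[0])
        memo[g] = v
        return v

    return value(output)

# The circuit of Fig. 6.1, computing (x1 ∧ ¬x2 ∧ x3) ∨ ¬(x3 ∧ ¬x4); gate 10 is the output o.
fig61 = {
    1: ("x1", []), 2: ("x2", []), 3: ("x3", []), 4: ("x4", []),
    5: ("not", [2]), 6: ("not", [4]),
    7: ("and", [1, 5, 3]),        # x1 ∧ ¬x2 ∧ x3
    8: ("and", [3, 6]),           # x3 ∧ ¬x4
    9: ("not", [8]),              # ¬(x3 ∧ ¬x4)
    10: ("or", [7, 9]),
}

assert eval_circuit(fig61, 10, {"x1": True, "x2": False, "x3": True, "x4": False}) is True
assert eval_circuit(fig61, 10, {"x1": False, "x2": True, "x3": True, "x4": False}) is False
```

A family of circuits in the sense of Definition 6.3 would simply be a function producing such a dictionary for each input length n.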
92 6 Complexity of First-Order Logic Theorem 6.4. Let C be a class of structures definable by an FO(All) sentence. Then C is in nonuniform AC0 . That is, FO(All) ⊆ nonuniform AC0 . Furthermore, for every FO(All) sentence Φ, there is a family of circuits of depth O( Φ ) accepting {A | A |= Φ}. Proof. We describe each circuit Ck in the family C accepting {A | A |= Φ}. If k is not of the form A for some structure A, then Ck always returns false. Assume k is given by (6.2); that is, k is the size of the encodings of structures A with an n-element universe. We then convert Φ into a quantifier-free sentence Φ′ over the vocabulary σ, predicate symbols in All, and constants 0, . . . , n − 1 as follows. Inductively, we replace each quantifier ∃xϕ(x, y) or ∀xϕ(x, y) with n−1 c=0 ϕ(c, y) and n−1 c=0 ϕ(c, y), respectively. Notice that the number of connectives ∨, ∧, ¬, , in Φ′ is exactly the same as the number of connectives ∨, ∧, ¬ and quantifiers ∃, ∀ in Φ. We now build the circuit to evaluate Φ′ . Note that Φ′ is a Boolean combination (using connectives ∨, ∧, ¬, , ) of formulae of the form P(i1, . . . , ik), where P is a numerical predicate, and R(i1, . . . , im), where R is an m-ary symbol in σ. The former is replaced by its truth value (which is either a ∨ or a ∧ gate with zero inputs), and the latter corresponds to one bit in enc(A); that is, the input of the circuit. The depth of the resulting circuit is bounded by the number of connectives ∨, ∧, ¬, , in Φ′ , and hence depends only on Φ, and not on k. The size of the circuit Ck is clearly polynomial in k, which completes the proof. Corollary 6.5. The data complexity of FO(All) is nonuniform AC0 . We conclude this section with another bound on the complexity of FO queries. This time we determine the running time of such a query in terms of the sizes of encodings of a query and a structure. Given an FO formula ϕ, its width is the maximum number of free variables in a subformula of ϕ. Proposition 6.6. Let Φ be an FO sentence in vocabulary σ, and let A ∈ STRUCT[σ]. If the width of Φ is k, then checking whether A |= Φ can be done in time O( Φ × A k ). Proof. Assume, without loss of generality, that Φ uses ∧, ¬, and ∃ but not ∨ and ∀. Let ϕ1, . . . , ϕm enumerate all the subformulae of Φ; we know that they 6.3 Expressive Power with Arbitrary Predicates 93 contain at most k free variables. We now inductively construct ϕi(A). If ϕi has ki free variables, then ϕi(A) ⊆ Aki . It will be represented by a Boolean vector of length nki , where n =|A|, in exactly the same way as we code relations in A. If ϕi is an atomic formula R(x1, . . . , xki ), then ϕi(A) is simply the encoding of R in enc(A). If ϕi is ¬ϕj(A), we simply flip all the bits in the representation of ϕj(A). If ϕi is ϕj ∧ ϕl, there are two cases. If the free variables of ϕj and ϕl are the same, then ϕi(A) is obtained as the bit-wise conjunction of ϕj(A) and ϕl(A). Otherwise, ϕi(x, y, z) = ϕj(x, y) ∧ ϕl(x, z), and ϕi(A) is the join of ϕj(A) and ϕl(A), obtained by finding, for all tuples over a ∈ A|x| , tuples b ∈ A|y| and c ∈ A|z| such that the bits corresponding to (a, b) in ϕj(A) and to (a, c) in ϕl(A) are set to 1, and then setting the bit corresponding to (a, b, c) in ϕi(A) to 1. Finally, if ϕi(x) = ∃zϕj(z, x), we simply go over ϕj(A), and if the bit corresponding to (a, a) is set to 1, then we set the bit corresponding to a in ϕi(A) to 1. 
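To make the algorithm concrete before we state its running time, here is a small sketch (not from the book) of this bottom-up evaluation for formulae built from atoms, ¬, ∧, and ∃. For readability it represents each ϕi(A) as a set of tuples rather than a bit vector of length n^ki; the nested-tuple formula encoding, and the assumption that the variables within a single atom are distinct, are our own simplifications.

```python
# A small sketch (not from the book) of the bottom-up evaluation described above.
# Formulae are nested tuples: ("R", vars), ("not", f), ("and", f, g), ("exists", x, f).
from itertools import product

def evaluate(phi, universe, rels):
    """Return (vars, rows): the free variables of phi and its set of satisfying tuples."""
    op = phi[0]
    if op in rels:                       # atomic R(x1, ..., xk); variables in an atom assumed distinct
        return tuple(phi[1]), set(rels[op])
    if op == "not":                      # complement inside universe^k ("flip all the bits")
        vs, rows = evaluate(phi[1], universe, rels)
        return vs, set(product(universe, repeat=len(vs))) - rows
    if op == "and":                      # join on the shared free variables
        vs1, r1 = evaluate(phi[1], universe, rels)
        vs2, r2 = evaluate(phi[2], universe, rels)
        vs = vs1 + tuple(v for v in vs2 if v not in vs1)
        rows = set()
        for t1 in r1:
            a1 = dict(zip(vs1, t1))
            for t2 in r2:
                a2 = dict(zip(vs2, t2))
                if all(a1[v] == a2[v] for v in vs1 if v in a2):
                    merged = {**a1, **a2}
                    rows.add(tuple(merged[v] for v in vs))
        return vs, rows
    if op == "exists":                   # project away the quantified variable
        vs, rows = evaluate(phi[2], universe, rels)
        keep = tuple(v for v in vs if v != phi[1])
        idx = [vs.index(v) for v in keep]
        return keep, {tuple(t[i] for i in idx) for t in rows}
    raise ValueError("unknown connective: %r" % (op,))

# The sentence ∃x ∃y (E(x, y) ∧ E(y, x)) on a small graph; for a sentence the answer is {()} or ∅.
universe = [0, 1, 2]
rels = {"E": {(0, 1), (1, 2), (2, 1)}}
phi = ("exists", "x", ("exists", "y", ("and", ("E", ("x", "y")), ("E", ("y", "x")))))
print(evaluate(phi, universe, rels)[1] != set())   # True: witnessed by the edges (1,2) and (2,1)
```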
The reader can easily check that the above algorithm can be implemented in time O(‖Φ‖ × ‖A‖^k), since none of the formulae ϕi has more than k free variables.
6.3 Expressive Power with Arbitrary Predicates
In the previous section, we introduced a powerful extension of FO – the logic FO(All). Since this logic can use arbitrary predicates on the natural numbers, it can express noncomputable queries: for example, we can test if the size of the universe of A is a number n which codes a pair (k, m) such that the kth Turing machine halts on the mth input (assuming some standard enumeration of Turing machines and their inputs). Nevertheless, we can prove some strong bounds on the expressiveness of FO(All): although we saw that even is FO(All)-expressible, it turns out that the closely related query, parity, is not.
Recall that parityU is a query on structures whose vocabulary σ contains one unary relation symbol U. Then parityU(A) ⇔ |U^A| mod 2 = 0. We shall omit the subscript U if it is understood from the context. To show that parity is not FO(All)-expressible, we consider the Boolean function parity with n arguments (for each n) defined as follows: parity(x1, . . . , xn) = 1 if |{i | xi = 1}| mod 2 = 0, and 0 otherwise. We shall need the following deep result in circuit complexity.
Theorem 6.7 (Furst-Saxe-Sipser, Ajtai). There is no constant-depth polynomial-size family of circuits that computes parity.
Corollary 6.8. parity is not expressible in FO(All).
Proof. Assume, to the contrary, that parity is expressible. By Theorem 6.4, there is a polynomial-size constant-depth circuit family C that computes parity on encodings of structures. Such an encoding of a structure A with |A| = n is 0^n 1 · s, where s is the string of length n whose ith element is 1 iff the ith element of A is in U^A. We now use C to construct a new family of circuits defining parity. The circuit with n inputs x1, . . . , xn works as follows. For each xi, it adds an in-degree 0 gate gi labeled ∨, and for xn it also adds an in-degree 0 gate g′n labeled ∧. Then it puts C2n+1, the circuit with 2n + 1 inputs from C, on the outputs of g1, . . . , gn, g′n followed by x1, . . . , xn, as shown below:
[Diagram: the constant gates g1, . . . , gn (labeled ∨) and g′n (labeled ∧), followed by the inputs x1, . . . , xn, feed into C2n+1.]
Clearly this circuit computes parity(x1, . . . , xn), and by Theorem 6.4 the resulting family of circuits is of polynomial size and bounded depth. This contradicts Theorem 6.7.
As another example of inexpressibility in FO(All), we show the following.
Corollary 6.9. Graph connectivity is not expressible in FO(All).
Proof. We shall follow the idea of the proof of Corollary 3.19; however, that proof used inexpressibility of the query even, which of course is definable in FO(All). We modify the proof to make use of Corollary 6.8 instead. First, we show that for a graph G = (V, E), where E is a successor relation on a set U ⊆ V of nodes, FO(All) cannot test if the cardinality of U is even. Indeed, suppose to the contrary that it can; then this can be done in nonuniform AC0, by a family of circuits C. We now show how to use C to test parity. Suppose an encoding 0^n 1 · s of a unary relation U is given, where U = {i1, . . . , ik} ⊆ {1, . . . , n}. We transform U into a successor relation SU = {(i1, i2), . . . , (ik−1, ik)}. We leave it to the reader to show how to use bounded-depth circuits to transform 0^n 1 · s into 0^n 1 · s′, where s′ of length n^2 codes SU. Then, using the circuit C_{n^2+n+1} from C on 0^n 1 · s′, we can test if U is even.
Finally, using inexpressibility of parity of a successor relation, we show inexpressibility of connectivity in FO(All) using the same proof as in Corollary 3.19. 6.4 Uniformity and AC0 95 6.4 Uniformity and AC0 We have noticed that nonuniform AC0 is not truly a complexity class: in fact, the function that computes the circuit Cn from n need not even be recursive. It is customary to impose some uniformity conditions that postulate how Cn is obtained. While it is possible to formulate these conditions purely in terms of circuits, we prefer to follow the logic connection, and instead put restrictions on the choice of available predicates in FO(All). We now associate a finite n-element universe of a structure with the set {0, . . . , n−1}, and consider an extension of FO over σ-structures by adding two ternary predicates, +++ and ×××, which are graphs of addition and multiplication. That is, +++ = {(i, j, k) | i + j = k} and ××× = {(i, j, k) | i · j = k}. Note that we have to use +++ and ××× as ternary relations rather than binary functions, to ensure that the result of addition or multiplication is always in the universe of the structure. The resulting logic is denoted by FO(+++,×××). Definition 6.10. The class of structures definable in FO(+++,×××) is called uniform AC0 . We shall normally omit the word uniform; hence, by referring to just AC0 , we mean uniform AC0 . Note that many examples of AC0 queries seen so far only use the standard arithmetic on the natural numbers; for example, even is in AC0 . It turns out that AC0 is quite powerful and can define several interesting numerical relations on the domain {0, . . ., n−1}. One of them, which we shall see quite often, is the bit relation: BIT(x, y) is true ⇔ the yth bit of the binary expansion of x is 1. For example, the binary expansion of x = 18 is 10010, and hence BIT(x, y) is true if y is 1 or 4, and BIT(x, y) is false if y is 0, 2, or 3. We now start building the family of functions definable in FO(+++,×××). Whenever we say that a k-ary function is definable, we actually mean that the graph of this function, a k + 1-ary relation, is definable. However, to make formulae more readable, we often use functions instead of their graphs. First, we note that the linear order is definable by x ≤ y ⇔ ∃z +++(x, z, y) (i.e., ∃z (x+z = y)), and thus the minimum element 0, and the maximum element, denoted by max, are definable. Lemma 6.11. The integer division ⌊x/y⌋ and (x mod y) are definable in FO(+++,×××). 96 6 Complexity of First-Order Logic Proof. If y = 0, then u = ⌊x/y⌋ ⇔ (u · y) ≤ x ∧ (∃v < y (x = u · y + v)) . Furthermore, u = (x mod y) ⇔ ∃v (v = ⌊x/y⌋) ∧ (u + y · v = x) . In particular, we can express divisibility x | y as (x mod y) = 0. Now our goal is to show the following. Theorem 6.12. BIT is expressible in FO(+++,×××). Proof. We shall prove this in several stages. First, note that the following tests if x is a power of 2: pow2(x) ≡ ∀u, v (x = u · v) ∧ (v = 1) → ∃z (v = z + z) . This is because pow2(x) asserts that 2 is the only prime factor of x. Next, we define the predicate BIT′ (x, y) ≡ (⌊x/y⌋ mod 2) = 1. Note that if y = 2z , then BIT′ (x, y) is true iff the zth bit of x is 1. Assume that we can define the predicate y = 2z . Then BIT(x, y) ≡ ∃u u = 2y ∧ BIT′ (x, u) . Thus, it remains to show how to express the binary predicate x = 2y . We do so by coding an iterative computation of 2y . The codes of such computations will be numbers, and as we shall see, those numbers can be as large as x4 . Since we only quantify over {0, . . . 
, n − 1}, where n is the size of the finite structure, we show below how to express the predicate P2(x, y) ≡ x = 2y ∧ x4 ≤ n − 1. With P2, we can define x = 2y as follows: ∃u∃v     y = 4v ∧ P2(u, v) ∧ x = u4 ∨ y = 4v + 1 ∧ P2(u, v) ∧ x = 2 · u4 ∨ y = 4v + 2 ∧ P2(u, v) ∧ x = 4 · u4 ∨ y = 4v + 3 ∧ P2(u, v) ∧ x = 8 · u4     . We now show how to express P2(x, y). Let y = k−1 i=0 yi · 2i , so that y is yk−1yk−2 . . . y1y0 in binary (we assume that the most significant bit yk−1 is 1). Then 2y = k−1 i=0 2yi·2i . We now define the following recurrences for i < k: p0 = 1 a0 = 0 b0 = 1 pi+1 = 2pi ai+1 = ai + yi · 2i bi+1 = bi · 2yi·2i 6.4 Uniformity and AC0 97 Thus, pi = 2i , ai is the number whose binary representation is yi−1 . . . y0, and bi = 2ai . We define sequences p = (p0, . . . , pk), a = (a0, . . . , ak), b = (b0, . . . , bk). Next, we explain how to code these sequences. Notice that in all three of them, the ith element needs at most 2i bits to be represented in binary. Suppose we have an arbitrary sequence c = (c0, . . . , ck), where each ci has at most 2i bits in binary. Such a sequence will be coded by a number c such that its 2i bits from 2i to 2i+1 − 1 form the binary representation of ci. These codes, when applied to p, a, and b, result in numbers p, a, and b, respectively. These numbers turn out to be relatively small. Since the length of the binary representation of y is k, we know that y ≥ 2k−1 . If x = 2y , then x ≥ 22k−1 and x4 ≥ 22k+1 . The binary representation of p, a, and b has at most 2k+1 − 1 bits, and hence the maximum value of those codes is 22k+1 −1 − 1, which is bounded above by x4 . Hence, for defining P2, codes of all the sequences will be bounded by the size of the universe. How can one extract numbers ci from the code c of c ? Notice that ⌊x/22i ⌋ mod 22i is ci. In general, we define extract(x, u) ≡ ⌊x/u⌋ mod u, and thus ci = extract(c, 22i ). Notice that since (22i )2 = 22i+1 , for u = 22i we have ci = extract(c, u) and ci+1 = extract(c, u2 ). Assume now that we have an extra predicate ppow2(u) which holds iff u is of the form 22i . With this, we express P2(x, y) by stating the existence of a, b, p (coding a, b, p) such that: • extract(p, 2) = 1, extract(a, 2) = 0, extract(b, 2) = 1 (the initial conditions of the recurrences hold). • If u < x and ppow2(u), then extract(p, u2 ) = 2 · extract(p, u) (the recurrence for p is correct). • If u < x and ppow2(u), then either 1. extract(a, u2 ) = extract(a, u) and extract(b, u2 ) = extract(b, u), or 2. extract(a, u2 ) = extract(a, u) + extract(p, u) and extract(b, u2 ) = u · extract(b, u). That is, the recurrences for a and b are coded correctly: the first case corresponds to yi = 0, and hence ai+1 = ai and bi+1 = bi; the second case corresponds to yi = 1, and hence ai+1 = ai +pi and bi+1 = bi ·22i = bi ·u. • There is u such that ppow2(u) holds, extract(a, u) = y, and extract(b, u) = x. That is, the sequences show that 2y = x. Clearly, the above can be expressed as an FO formula. 98 6 Complexity of First-Order Logic All that remains is to show how to express the predicate ppow2(u). This in turn is done in two steps. First, we define a predicate P1(v) that holds iff v is of the form s i=1 22i (i.e., in its binary representation, ones appear only in positions corresponding to powers of 2). With this predicate, we define ppow2(u) ≡ pow2(u) ∧ ∃w P1(w) ∧ BIT′ (w, u). 
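Before completing the definition of P1, note that the arithmetic behind this coding is easy to check numerically. The sketch below (not from the book) computes the sequences p, a, b for a concrete y, packs each of them into a single number as just described, and verifies that extract(x, u) = ⌊x/u⌋ mod u recovers the entries and that the conditions listed above hold; the helper names are our own.

```python
# A numeric sketch (not from the book) of the coding used in this proof: the recurrences
# p_i = 2^i, a_i = (y_{i-1} ... y_0 in binary), b_i = 2^{a_i}, and the packing of a
# sequence (c_0, ..., c_k) into one number whose bits 2^i .. 2^{i+1}-1 hold c_i.

def extract(x, u):
    return (x // u) % u            # extract(x, u) = floor(x/u) mod u, as in the text

def pack(cs):
    # place c_i in bit positions 2^i .. 2^(i+1)-1; each c_i is assumed to fit in 2^i bits
    return sum(c << (2 ** i) for i, c in enumerate(cs))

def sequences(y):
    bits = [int(d) for d in reversed(bin(y)[2:])]      # y_0, y_1, ..., y_{k-1}
    p, a, b = [1], [0], [1]
    for i, yi in enumerate(bits):
        p.append(2 * p[i])
        a.append(a[i] + yi * 2 ** i)
        b.append(b[i] * 2 ** (yi * 2 ** i))
    return p, a, b

y = 11                                                 # 1011 in binary, so k = 4
p, a, b = sequences(y)
cp, ca, cb = pack(p), pack(a), pack(b)
k = y.bit_length()
for i in range(k):
    u = 2 ** (2 ** i)                                  # the numbers for which ppow2 holds
    assert extract(cp, u) == p[i] and extract(ca, u) == a[i] and extract(cb, u) == b[i]
    assert extract(cp, u * u) == 2 * extract(cp, u)                       # recurrence for p
    assert extract(ca, u * u) in (extract(ca, u), extract(ca, u) + extract(cp, u))
    assert extract(cb, u * u) in (extract(cb, u), u * extract(cb, u))     # cases y_i = 0 / 1
u = 2 ** (2 ** k)
assert extract(ca, u) == y and extract(cb, u) == 2 ** y                   # the final condition
print("all checks pass")
```

The i = 0 case of the first assertion is exactly the initial condition extract(p, 2) = 1, extract(a, 2) = 0, extract(b, 2) = 1 stated above.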
Note that if ppow2(u) holds, one can find w with BIT′ (w, u) such that w ≤ u2 ; given that all numbers for which ppow2(·) is checked are below 4 √ n − 1, the ∃w is guaranteed to range over the finite universe. To express P1, we need an auxiliary formula pow4(u) = pow2(u) ∧ (u mod 3 = 1) testing if u is a power of 4. Now P1(u) is the conjunction of ¬BIT′ (u, 1) ∧ BIT′ (u, 2) and the following formula: ∀v 2 < v ≤ u → BIT′ (u, v) ↔ (pow4(v)∧∃w [(w ·w = v)∧BIT′ (u, w)]) . This formula states that 1-bits in the binary representation of u are 2 and others given by the sequence e1 = 2, e2 = 4, . . . , ei+1 = e2 i ; that is, bits in positions of the form 22i . This defines P1, and thus completes the proof of the theorem. The BIT predicate turns out to be quite powerful. First note the following. Lemma 6.13. Addition is definable in FO(<, BIT). Proof. We use the standard carry-lookahead algorithm. Given x, y, and u, we define carry(x, y, u) to be true if, while adding x, y given as binary numbers, the carry bit with number u is 1: ∃v v < u ∧ BIT(x, v) ∧ BIT(y, v) ∧ ∀w (w < u ∧ w > v) → (BIT(x, w) ∨ BIT(y, w)) . Then x + y = z iff ∀u BIT(z, u) ↔ (BIT(x, u) ⊕ BIT(y, u)) ⊕ carry(x, y, u) , where ϕ ⊕ ψ is an abbreviation for ϕ ↔ ¬ψ. A more complicated result (see Exercise 6.5) states the following. Lemma 6.14. Multiplication is definable in FO(<, BIT). We thus obtain: Corollary 6.15. FO(<, BIT) = FO(+++,×××). Hence, uniform AC0 can be characterized as the class of structures definable in FO(<, BIT). 6.6 Parametric Complexity and Locality 99 6.5 Combined Complexity of FO We have seen that the data complexity of FO(All) is nonuniform AC0 , and the data complexity of FO is AC0 . What about the combined and expression complexity of FO? It turns out that they belong to a much larger class than AC0 . Theorem 6.16. The combined complexity of FO is Pspace-complete. Proof. The membership in Pspace follows immediately from the evaluation method used in the proof of Proposition 6.6. To show hardness, recall the problem QBF, satisfiability of quantified Boolean formulae. Problem: QBF Input: A formula Φ = Q1x1 . . . Qnxn α(x1, . . . , xn), where: each Qi is either ∃ or ∀, and α is a propositional formula in x1, . . . , xn. Question: If all xi’s range over {true, false}, is Φ true? It is known that QBF is Pspace-hard (see the bibliographic notes at the end of the chapter). We now prove Pspace-hardness of FO by reduction from QBF. Given a formula Φ = Q1x1 . . . Qnxn α(x1, . . . , xn), construct a structure A whose vocabulary includes one unary relation U as follows: A = {0, 1}, and UA = {1}. Then modify α by changing each occurrence of xi to U(xi), and each occurrence of ¬xi to ¬U(xi). Let αU be the resulting formula. For example, if α(x1, x2, x3) = (x1 ∧x2)∨(¬x1 ∧x3), then αU is (U(x1)∧U(x2))∨ (¬U(x1) ∧ U(x3)). Then Φ is true ⇔ A |= Q1x1 . . . Qnxn αU (x1, . . . , xn), which proves Pspace-hardness. Since the structure A constructed in the proof of Theorem 6.16 is fixed, we obtain: Corollary 6.17. The expression complexity of FO is Pspace-complete. For most of the logics we study, the expression and combined complexity coincide; however, this need not be the case in general. 6.6 Parametric Complexity and Locality Proposition 6.6 says that checking whether A |= Φ can be done in time O( Φ · A k ), where k is the width of Φ: the maximum number of free variables of a subformula of Φ. 
In particular, this gives a polynomial time 100 6 Complexity of First-Order Logic algorithm for evaluating FO queries on finite structures, for a fixed sentence Φ. Although polynomial time is good, in many cases it is not sufficient: for example, in the database context where A is very large, even for small k the running time O( A k ) may be prohibitively expensive (in fact, the goal of most join algorithms in database systems is to reduce the running time from the impractical O(n2 ) to O(n log n) – at least if the result of the join is not too large – and running time of the order n10 is completely out of the question). The question is, then, whether sometimes (or always) one can find better algorithms for evaluating FO queries on finite structures. In particular, it would be ideal if one could always guarantee time linear in A . Since the combined complexity of FO queries is Pspace-complete, something must be exponential, so in that case we would expect the complexity to be O g( Φ )· A , where g : N → N is some function. This is the setting of parameterized complexity, where the standard input of a problem is split into the input part and the parameter part, and one looks for fixed parameter tractable problems that admit algorithms with running time O(g(π)·np ) for a fixed p; here π is the size of the parameter, and n is the size of the input. It is known that even some NP-hard problems become fixed parameter tractable if the parameters are chosen correctly. For example, SET COVER is the problem whose input is a set V , a family F of its subsets, and a number k, and the output is “yes” if there is a subset of V of size at most k that intersects every member of F. This problem is NP-complete, but if we choose π = k + maxF ∈F |F | to be the parameter, it becomes solvable in time O(ππ+1 · |F |), thus becoming linear in what is likely the largest part of the input. We now formalize the concept of fixed-parameter tractability. Definition 6.18. Let L be a logic, and C a class of structures. The modelchecking problem for L on C is the problem to check, for a given structure A ∈ C and an L-sentence Φ, whether A |= Φ. We say that the model-checking problem for L on C is fixed-parameter tractable, if there is a constant p and a function g : N → N such that for every A ∈ C and every L-sentence Φ, checking whether A |= Φ can be done in time g( Φ ) · A p . We say that the model-checking problem for L on C is fixed-parameter linear, if p = 1; that is, if there is a function g : N → N such that for every A ∈ C and every L-sentence Φ, checking whether A |= Φ can be done in time g( Φ )· A . We now prove that on structures of bounded degree, model-checking for FO is fixed-parameter linear. The proof is based on Hanf-locality of FO. 6.6 Parametric Complexity and Locality 101 Theorem 6.19. Fix l > 0. Then the model-checking problem for FO on STRUCTl[σ] is fixed-parameter linear. Proof. We use threshold equivalence and Theorem 4.24. Given l and Φ, we can find numbers d and m such that for every A, B ∈ STRUCTl[σ], it is the case that A⇆thr d,mB implies that A and B agree on Φ. We know that for structures of fixed degree l, the upper bound on the number of isomorphism types of radius d neighborhoods of a point is determined by d, l, and σ. We assume that τ1, . . . , τM enumerate isomorphism types of all the structures of the form NA d (a) for A ∈ STRUCTl[σ]. Let ni(A) =| {a | NA d (a) of type τi} |. With each structure A, we now associate an M-tuple t(A) = (t1, . . . 
, tM ) such that ti = ni(A), if ni(A) ≤ m, ∗ otherwise. Let T be the set of all M-tuples whose elements come from {1, . . ., m} ∪ {∗}. Note that the number of such tuples is (m + 1)M , which depends only on l and Φ, and that each t(A) is a member of T . From Theorem 4.24, t(A) = t(B) implies that A and B agree on Φ. Let T0 be the set of t ∈ T such that for some structure A ∈ STRUCTl[σ], we have A |= Φ and t(A) = t. We leave it as an exercise for the reader (see Exercise 6.7) to show that T0 is computable. The idea of the algorithm then is to compute, for a given structure A, the tuple t(A) in linear time. Once this is done, we check if t ∈ T0. The computation of T0 depends entirely on Φ and l, but not on A; hence the resulting algorithm has linear running time. For simplicity, we present the algorithm for computing t(A) for the case when A is an undirected graph; extension to the case of arbitrary A is straightforward. We compute, for each node i (assuming that nodes are numbered 0, . . . , n − 1), τ(i), the isomorphism type of its d-neighborhood. For this, we first do a pass over the code of A, and construct an array that, for each node i, has the list of all nodes j such that there is an edge (i, j). Note that the size of any such list is at most l. Next, we construct the radius d neighborhood of each node by looking up its neighbors, then the neighbors of its neighbors, etc., in the array constructed in the first step. After d iterations, we have radius d neighborhood, whose size is bounded by a number that depends on the Φ and l but not on A. Now for each i, we find j ≤ M such that τ(i) = τj; since the enumeration τ1, . . . , τM does not depend on A, each such step takes constant time. Finally, we do one extra pass over (τ(i))i and compute t(A). Hence, t(A) is computed in linear time. As we already explained, to check if A |= Φ, we check if t ∈ T0, which takes constant time. Hence, the entire algorithm has linear running time. Can one prove a similar result for FO queries on arbitrary structures? The answer is most likely no, assuming some separation results in complexity 102 6 Complexity of First-Order Logic theory (see Exercise 6.9). In fact, these results show that even fixed-parameter tractability is very unlikely for arbitrary structures. Nevertheless, fixed-parameter tractability can be shown for some interesting classes of structures. Recall that a graph H is a minor of a graph G if H can be obtained from a subgraph of G by contracting edges. A class C of graphs is called minor-closed if for any G ∈ C and H a minor of G, we have H ∈ C. Theorem 6.20. If C is a minor-closed class of graphs which does not include all the graphs, then model-checking for FO on C is fixed-parameter tractable. The proof of this (hard) theorem is not given here (see Exercise 6.10). Corollary 6.21. Model-checking for FO on the class of planar graphs is fixedparameter tractable. 6.7 Conjunctive Queries In this section we introduce a subclass of FO queries that plays a central role in database theory. This is the class of conjunctive queries. These are the queries most commonly asked in relational databases; in fact any SQL SELECT-FROM-WHERE query that only uses conjunction of attribute equalities in the WHERE clause is such. Logically this class has a simple characterization. Definition 6.22. A first-order formula ϕ(x) over a relational vocabulary σ is called a conjunctive query if it is built from atomic formulae using only conjunction ∧ and existential quantification ∃. 
By renaming variables and pushing existential quantifiers outside, we can see that every conjunctive query can be expressed as ϕ(x) = ∃y k i=1 αi(x, y), (6.3) where each αi is either of the form R(u), where R ∈ σ and u is a tuple of variables from x, y, or u = v, where u, v are variables from x, y or constant symbols. We have seen an example of a conjunctive query in Chap. 1: to test if there is a path of length k + 1 between x and x′ in a graph E, one can write ∃y1, . . . , yk R(x, y1) ∧ R(y1, y2) ∧ . . . ∧ R(yk−1, yk) ∧ R(yk, x′ ). To see how conjunctive queries can be evaluated, we introduce the concept of a join of two relations. Suppose we have a formula ϕ(x1, . . . , xm) over vocabulary σ. For each A ∈ STRUCT[σ], this formula defines an m-ary relation ϕ(A) = {a | A |= ϕ(a)}. We can view ϕ(A) as an m-ary relation with 6.7 Conjunctive Queries 103 attributes x1, . . . , xm: that is, a set of finite mappings {x1, . . . , xm} → A. Viewing ϕ(A) as a relation with columns and rows lets us name individual columns. Suppose now we have two relations over A: an m-ary relation S and an l-ary relation R, such that R is viewed as a set of mappings t : X → A and S is viewed as a set of mappings t : Y → A. Then the join of R and S is defined as R ⋊⋉ S = {t : X ∪ Y → A | t|X∈ R, t|Y ∈ S}. (6.4) Suppose that R is ϕ(A) where ϕ has free variables (x, z), and S is ψ(A) where ψ has free variables (y, z). How can one construct R ⋊⋉ S? According to (6.4), it consists of tuples (a, b, c) such that ϕ(a, c) and ψ(b, c) hold. Thus, R ⋊⋉ S = [ϕ ∧ ψ](A). As another operation corresponding to conjunctive queries, consider again a relation R viewed as a set of finite mappings t : X → A, and let Y ⊆ X. Then the projection of R on Y is defined as πY (R) = {t : Y → A | ∃t′ ∈ R : t′ |Y = t}. (6.5) Again, if R is ϕ(A), where ϕ has free variables (x, y), then πy(R) is simply [∃x ϕ(x, y)](R). Now suppose we have a conjunctive query ϕ(y) ≡ ∃x α1(u1) ∧ . . . ∧ αn(un) , (6.6) where each αi(ui) is an atomic formula S(ui) for some S ∈ σ, and ui is a list of variables among x, y. Then for any structure A, ϕ(A) = πy α1(A) ⋊⋉ . . . ⋊⋉ αn(A) . (6.7) A slight extension of the correspondence between conjunctive queries and the join and projection operations involves queries of the form ϕ(y) ≡ ∃x α1(u1) ∧ . . . ∧ αn(un) ∧ β(x, u) , (6.8) where β is a conjunction of formulae u1 = u2, where u1 and u2 are variables occurring among u1, . . . , un. Suppose we have a relation R, again viewed as a set of finite mappings t : X → A, and a set C of conditions xi = xj, for xi, xj ∈ X. Then the selection operation, σC(R), is defined as {t : X → A | t ∈ R, t(xi) = t(xj) for all xi = xj ∈ C}. If R is ϕ(A), then σC(R) is simply [ϕ ∧ β](R), where β is the conjunction of all the conditions xi = xj that occur in C. For β being as in (6.8), let Cβ be the list of all equalities listed in β. Then, using the selection operation, the most general form of a conjunctive query above can be translated into 104 6 Complexity of First-Order Logic πy σCβ α1(A) ⋊⋉ . . . ⋊⋉ αn(A) . (6.9) Many common database queries are of the form (6.9): they compute the join of two or more relations, select some tuples from them, and output only certain elements of those tuples. These can be expressed as conjunctive queries. The data complexity of conjunctive queries is the same as for general FO queries: uniform AC0 . For the combined and expression complexity, we can lower the Pspace bound of Theorem 6.16. Theorem 6.23. 
The combined and expression complexity of conjunctive queries are NP-complete (even for Boolean conjunctive queries). Proof. It is easy to see that the combined complexity is NP: for the query given by (6.3) and a tuple a, to check if ϕ(a) holds, one has to guess a tuple b and then check in polynomial time if i αi(a, b) holds. For completeness, we use reduction from 3-colorability, defined in Chap. 1 (and known to be NP-complete). Define a structure A = {0, 1, 2}, N , where N is the binary inequality relation: N = {(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)}. Suppose we are given a graph with the set of nodes U = {a1, . . . , an}, and a set of edges E ⊆ U × U. We then define the following Boolean conjunctive query: ∃x1 . . . ∃xn (ai,aj )∈E N(xi, xj). (6.10) Note that for a given graph U, E , the query Φ can be constructed in deterministic logarithmic time. For the query Φ given by (6.10), A |= Φ iff there is an assignment of variables xi, 1 ≤ i ≤ n, to {0, 1, 2} such that for every edge (ai, aj), the corresponding values xi and xj are different. That is, A |= Φ iff U, E is 3-colorable, which provides the desired reduction, and thus proves NP-completeness for the combined (and expression, since A is fixed) complexity of conjunctive queries. As for the data complexity of conjunctive queries, so far we have seen no results that would distinguish it from the data complexity of FO. We shall now see one result that lowers the complexity of conjunctive query evaluation rather significantly, under certain assumptions on the structure of queries. Unlike Theorem 6.19, this result will apply to arbitrary structures. Recall that in general, an FO sentence Φ can be evaluated on a structure A in time O( Φ · A k ), where k is the width of Φ. We shall now lower this to O( Φ · A ) for the class of acyclic conjunctive queries. That is, for a certain class of queries, we shall prove that they are fixed-parameter linear on the class of all finite structures. To define this class of queries, we need a few preliminary definitions. 6.7 Conjunctive Queries 105 Let H be a hypergraph: that is, a set U and a set E of hyper-edges, or subsets of U. A tree decomposition of H is a tree T together with a set Bt ⊆ U for each node t of T such that the following two conditions hold: 1. For every a ∈ U, the set {t | a ∈ Bt} is a subtree of T . 2. Every hyper-edge of H is contained in one of the Bt’s. A hypergraph H is acyclic if there exists a tree decomposition of H such that each Bt, t ∈ T , is a hyper-edge of H. Definition 6.24. Given a conjunctive query ϕ(y) ≡ ∃x α1(u1) ∧ . . . ∧ αn(un) , its hypergraph H(ϕ) is defined as follows. Its set of nodes is the set of all variables used in ϕ, and its hyper-edges are precisely u1, . . . , un. We say that ϕ is acyclic if the hypergraph H(ϕ) is acyclic. For example, let Φ ≡ ∃x∃y∃z R(x, y) ∧ R(y, z). Then H(Φ) is a hypergraph on {x, y, z} with edges {(x, y), (y, z)}. A tree decomposition of H(Φ) would have two nodes, say t1 and t2, with an edge from t1 to t2, and Bt1 = {x, y}, Bt2 = {y, z}. Hence, Φ is acyclic. As a different example, let Φ′ ≡ ∃x∃y∃z R(x, y) ∧ R(y, z) ∧ R(z, x). Then H(Φ′ ) is a hypergraph on {x, y, z} with edges {(x, y), (y, z), (z, x)}. Assume it is acyclic. Then there is some tree decomposition of H(Φ′ ) in which the sets Bt include {x, y}, {y, z}, {x, z}. By a straightforward inspection, there is no way to assign these sets to nodes of a tree so that condition 1 of the definition of tree decomposition would hold. Hence, Φ′ is not acyclic. 
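As an aside, acyclicity in the sense of Definition 6.24 can also be tested with the classical GYO reduction: repeatedly delete a vertex that occurs in only one hyper-edge and a hyper-edge contained in another one; the hypergraph is acyclic iff nothing remains. This equivalent characterization is not introduced in the text, so the sketch below (not from the book) is only an illustration; it confirms that H(Φ) above is acyclic and H(Φ′) is not.

```python
# A sketch (not from the book) testing hypergraph acyclicity via the GYO reduction,
# an equivalent characterization of the tree-decomposition-based definition above.

def is_acyclic(hyperedges):
    edges = [set(e) for e in hyperedges]
    changed = True
    while changed:
        changed = False
        # drop empty hyper-edges and hyper-edges contained in some other hyper-edge
        for i, e in enumerate(edges):
            if not e or any(i != j and e <= f for j, f in enumerate(edges)):
                edges.pop(i)
                changed = True
                break
        # drop vertices occurring in exactly one hyper-edge
        for e in edges:
            lonely = {v for v in e if sum(v in f for f in edges) == 1}
            if lonely:
                e -= lonely
                changed = True
    return not edges

# The hypergraphs of Φ and Φ′ from the examples above.
print(is_acyclic([{"x", "y"}, {"y", "z"}]))                 # True:  Φ is acyclic
print(is_acyclic([{"x", "y"}, {"y", "z"}, {"z", "x"}]))     # False: Φ′ is not
```

The same function applies verbatim to hypergraphs with larger hyper-edges, such as the ones discussed next.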
In general, for binary relations, hypergraph and graph acyclicity coincide. To give an example involving hyper-edges, consider a query
Ψ ≡ ∃x∃y∃z∃u∃v (R(x, y, z) ∧ R(z, u, v) ∧ S(u, z) ∧ S(x, y) ∧ S(v, w)).
Its hypergraph has hyper-edges {x, y, z}, {z, u, v}, {u, z}, {x, y}, {v, w}. The maximal edges of this hypergraph are shown in Fig. 6.2 (a). This hypergraph is acyclic. Indeed, consider a tree with three nodes, t1, t2, t3, and edges (t1, t2) and (t1, t3). Define Bt1 as {z, u, v}, Bt2 as {x, y, z}, and Bt3 as {v, w} (see Fig. 6.2 (b)). This defines an acyclic tree decomposition of H(Ψ). If, on the other hand, we consider a query
Ψ′ ≡ ∃x∃y∃z∃u∃v (R(x, y, z) ∧ R(z, u, v) ∧ R(x, v, w)),
then one can easily check that H(Ψ′) (shown in Fig. 6.2 (c)) is not acyclic.
We now show that acyclic conjunctive queries are fixed-parameter tractable (in fact, fixed-parameter linear) over arbitrary structures. The result below is given for Boolean conjunctive queries; for extension to queries with free variables, see Exercise 6.13.
[Fig. 6.2. Cyclic and acyclic hypergraphs: (a) the maximal hyper-edges of H(Ψ); (b) a tree decomposition with Bt1 = {z, u, v}, Bt2 = {x, y, z}, Bt3 = {v, w}; (c) the hypergraph of Ψ′.]
Theorem 6.25. Let Φ be a Boolean acyclic conjunctive query over σ-structures, and let A ∈ STRUCT[σ]. Then checking whether A |= Φ can be done in time O(‖Φ‖ · ‖A‖).
Proof. Let Φ be Φ ≡ ∃x1 . . . ∃xm (α1(u1) ∧ . . . ∧ αn(un)), where each αi(ui) is of the form S(ui) for S ∈ σ, and ui contains some variables from x. The case when some of the αi's are variable equalities can be shown by essentially the same argument, by adding one selection over the join of all the αi(A)'s.
We use a known result that if H is acyclic, then its tree decomposition satisfying the condition that each Bt is a hyper-edge of H can be computed in linear time. Furthermore, one can construct this decomposition so that Bt1 ⊈ Bt2 for any t1 ≠ t2. Hence, we assume that we have such a decomposition (T, (Bt)t∈T) for H(Φ), computed in time O(‖Φ‖). Let ⪯ denote the partial order of T, with the root being the smallest node.
From the acyclicity of H, it follows that there is a bijection between the maximal, with respect to ⊆, sets ui and the nodes t of T. For each i, let νi be the node t such that ui is contained in Bt. This node is unique: we look for the maximal uj that contains ui, and find the unique node t such that Bt = uj. We now define
Rt = ⋈ { αi(A) | 1 ≤ i ≤ n, νi = t }. (6.11)
Our goal is now to compute the join of all the Rt's, since (6.7) implies that
A |= Φ ⇔ ⋈t∈T Rt ≠ ∅. (6.12)
To show that (6.11) and (6.12) yield a linear time algorithm, we need two complexity bounds on computing projections and joins: πX(R) can be computed in time O(‖R‖), and R ⋈ S can be computed in time O(‖R‖ + ‖S‖ + ‖R ⋈ S‖) (see Exercise 6.12).
To see that each Rt can be computed in linear time, let i_t be an index such that u_{i_t} = Bt (it exists since the query is acyclic). Then Rt = α_{i_t}(A) ⋈ α_{i_1}(A) ⋈ . . . ⋈ α_{i_k}(A), where all u_{i_j} ⊆ u_{i_t}, j ≤ k. Hence Rt ⊆ α_{i_t}(A). Using the above bounds for computing joins and projections, we conclude that the entire family Rt, t ∈ T, can be computed in time O(‖Φ‖ · ‖A‖).
We define Pt = ⋈ { Rv | t ⪯ v }, where ⪯ is the partial order of T, with the root r being the smallest element. If t is a leaf of T, then Pt = Rt. Otherwise, let t be a node with children t1, . . . , tl. Then
Pt = Rt ⋈ ⋈1≤i≤l ( ⋈ { Rv | ti ⪯ v } ) = Rt ⋈ ⋈1≤i≤l Pti. (6.13)
Using (6.13) inductively, we compute Pr = ⋈t∈T Rt in time O(|T| · maxt ‖Rt‖).
We saw that Rt ≤ A for each t, and, furthermore, T can be computed from Φ in linear time. Hence, Pr can be found in time O( Φ · A ), which together with (6.12) implies that A |= Φ can be tested with the same bounds. This completes the proof. There is another interesting way to connect tree decompositions with tractability of conjunctive queries. Suppose we have a conjunctive query ϕ(x) given by (6.3). We define its graph G(ϕ), whose set of vertices is the set of variables used in ϕ, with an edge between two variables u and v if there is an atom αi such that both u and v are its free variables. For example, if ϕ(x, y) ≡ ∃z∃v R(x, y, z) ∧ S(z, v), then G(ϕ) has undirected edges (x, y), (x, z), (y, z), and (z, v). A tree decomposition of G(ϕ) is a tree decomposition, as defined earlier, when we view G(ϕ) as a hypergraph. In other words, it consists of a tree T , and a set Bt of nodes of G(ϕ) for each t ∈ T , such that 1. {t | v ∈ Bt} forms a subtree of T for each v, and 2. for every edge (u, v), both u and v are in one of the Bt’s. The width of a tree decomposition is maxt |Bt | −1. The treewidth of G(ϕ) is the minimum width of a tree decomposition of G(ϕ). It is easy to see that the treewidth of a tree is 1. For k > 0, let CQk be the class of conjunctive queries ϕ such that the treewidth of G(ϕ) is at most k. Then the following can be shown. Theorem 6.26. Let k > 0 be fixed, and let ϕ be a query from CQk. Then, for every structure A, one can compute ϕ(A) in polynomial time in Φ + A + ϕ(A) . In particular, Boolean queries from CQk can be evaluated in polynomial time in Φ + A . 108 6 Complexity of First-Order Logic In other words, conjunctive-query evaluation becomes tractable for queries whose graphs have bounded treewidth. Exercise 6.15 shows that the converse holds, under certain complexity-theoretic assumptions. 6.8 Bibliographic Notes The notions of data, expression, and combined complexity are due to Vardi [244], see also [3]. Representation of first-order formulae by Boolean circuits is fairly standard, see, e.g., books [133] and [247]. Proposition 6.6 was explicitly shown by Vardi [245]. Theorem 6.7 is perhaps the deepest result in circuit complexity. It was proved by Furst, Saxe, and Sipser [86] (see also Ajtai [10] and Denenberg, Gurevich, and Shelah [55]). The notion of uniformity and its connection with logical descriptions of complexity classes was studied by Barrington, Immerman, and Straubing [16]. Proofs of FO(<, BIT) = FO(+++,×××) are given in [133] and – partially – in [247]. The proof of expressibility of BIT (Theorem 6.12) follows closely the presentation in Buss [29] and Cook [40]. Pspace-completeness of FO (expression complexity) and of QBF is due to Stockmeyer [222]. The idea of using parameterized complexity as a refinement of the notions of the data and expression complexity was proposed by Yannakakis [250], and developed by Papadimitriou and Yannakakis [196]. Parameterized complexity is treated in a book by Downey and Fellows [58]; see also surveys by Grohe [109, 111]. Theorem 6.19 is from Seese [219], Theorem 6.20 is from Flum and Grohe [81]. The notion of conjunctive queries is a fundamental one in database theory, see [3]. NP-completeness of conjunctive queries (combined complexity) is due to Chandra and Merlin [34]. Fixed-parameter linearity of acyclic conjunctive queries is due to Yannakakis [249]; the presentation here follows closely Flum, Frick, and Grohe [80]. 
A linear time algorithm for producing tree decompositions of hypergraphs, used in Theorem 6.25, is due to Tarjan and Yannakakis [228]. Flum, Frick, and Grohe [80] show how to extend the notion of acyclicity to FO formulae. Theorem 6.26 and Exercise 6.15 are from Grohe, Schwentick, and Segoufin [114]. See also Gottlob, Leone, and Scarcello [96] for additional results on the complexity of acyclic conjunctive queries. 6.9 Exercises 109 Sources for exercises: Exercise 6.4: Dawar et al. [50] Exercise 6.5: Immerman [133], Vollmer [247] Exercise 6.9: Papadimitriou and Yannakakis [196] Exercise 6.12: Flum, Frick, and Grohe [80] Exercises 6.13 and 6.14: Flum, Frick, and Grohe [80] Yannakakis [249] Exercise 6.15: Grohe, Schwentick, and Segoufin [114] Exercise 6.16: Flum and Grohe [81] Exercise 6.18: Gottlob, Leone, and Scarcello [96] Exercise 6.19: Chandra and Merlin [34] 6.9 Exercises Exercise 6.1. Show that none of the following is expressible in FO(All): transitive closure of a graph, testing for planarity, acyclicity, 3-colorability. Exercise 6.2. Prove that ⌊ √ x⌋ is expressible in FO(+++,×××). Exercise 6.3. Consider two countable undirected graphs. For the first one, the universe is N, and we have an edge between i and j iff BIT(i, j) or BIT(j, i) is true. In the other graph, the universe is N+ = {n ∈ N | n > 0} and there is an edge between n and m, for n > m, iff n is divisible by pm, the mth prime. Prove that these graphs are isomorphic. Hint: if you find it hard to do all the calculations required for the proof, you may want to wait until Chap. 12, which introduces some powerful logical tools that let you prove results of this kind without using any number theory at all (see Exercise 12.9, part a). Exercise 6.4. Show that the standard linear order is expressible in FO(BIT). Conclude that FO(+++,×××) = FO(BIT). Exercise 6.5. Prove Lemma 6.14. You may find it useful to show that the following predicate is expressible in FO(+++,×××): BitSum(x, y) iff the number of ones in the binary representation of x is y. Exercise 6.6. Prove that QBF is Pspace-complete. Exercise 6.7. We stated in the proof of Theorem 6.19 that the set of tuples t ∈ T for which there exists a structure A with t(A) = t and A |= Φ is computable. Prove this statement, using the assumption that A is of bounded degree. Derive bounds on the constant in the O( A ) running time. Exercise 6.8. Give an example of a two-element structure over which the expression complexity of conjunctive queries is NP-hard. Recall that in the proof of Theorem 6.23, we used a structure whose universe had three elements. 110 6 Complexity of First-Order Logic Exercise 6.9. In this exercise, we refer to parameterized complexity class W [1] whose definition can be found in [58, 81]. This class is believed to contain problems which are not fixed-parameter tractable. Prove that checking A |= Φ, with Φ being the parameter, is W [1]-hard, even if Φ is a conjunctive query. Thus, it is unlikely that FO (or even conjunctive queries) are fixed-parameter tractable. Exercise 6.10. Derive Theorem 6.20 from the following facts. H is an excluded minor of a class of graphs C if no G ∈ C has H as a minor. If such an H exists, then C is called a class of graphs with an excluded minor. • If C is a minor-closed class of graphs, membership in C can be verified in Ptime (see Robertson and Seymour [205]). • If C is a Ptime-decidable class of graphs with an excluded minor, then checking Boolean FO queries on C is fixed-parameter tractable (see Flum and Grohe [81]). 
Exercise 6.11. Prove that an order-invariant conjunctive query is FO-definable without the order relation. That is, (CQ+ <)inv ⊆ FO. Exercise 6.12. Prove that R ⋊⋉ S can be evaluated in O( R + S + R ⋊⋉ S ). Exercise 6.13. Extend the proof of Theorem 6.25 to deal conjunctive queries with free variables, by showing that ϕ(A), for an acyclic ϕ, can be computed in time O( ϕ · A · ϕ(A) ). Also show that if the set of free variables of ϕ is contained in one of the Bt’s, for a tree decomposition of H(ϕ), then the evaluation can be done in time O( ϕ · A ). Exercise 6.14. Extend Theorem 6.25 and Exercise 6.13 to conjunctive queries with negation; that is, conjunctive queries in which some atoms are of the form x = y, where x and y are variables. Exercise 6.15. Under the complexity-theoretic assumption that W [1] contains problems which are not fixed-parameter tractable (see Exercise 6.9), the converse to Theorem 6.26 holds: if for a class of graphs C, it is the case that every conjunctive query ϕ with G(ϕ) ∈ C can be evaluated in time polynomial in Φ + A + ϕ(A) , then C has bounded treewidth (i.e., there is a constant k > 0 such that every graph in C has treewidth at most k). Exercise 6.16. We say that a class of structures C ⊆ STRUCT[σ] has bounded treewidth if there is k > 0 such that for every A ∈ C, the treewidth of its Gaifman graph is at most k. Prove that FO is fixed-parameter tractable on classes of structures of bounded treewidth. Exercise 6.17. Give an example of a conjunctive query which is of treewidth 2 but not acyclic. Also, give an example of a family of acyclic conjunctive queries that has queries of arbitrarily large treewidth. Exercise 6.18. Given a hypergraph H, its hypertree decomposition is a triple (T, (Bt)t∈T , (Ct)t∈T ) such that (T, (Bt)t∈T ) is a tree decomposition of H, and each Ct is a set of hyper-edges. It is required to satisfy the following two properties for every t ∈ T: 6.9 Exercises 111 1. Bt ⊆ S Ct; 2. S Ct ∩ S v t Bv ⊆ Bt. The hypertree width of H is defined as the minimum value of maxt∈T | Ct |, taken over all hypertree decompositions of H. Prove the following: (a) A hypergraph is acyclic iff its hypertree width is 1. (b) For each fixed k, conjunctive queries whose hypergraphs have hypertree width at most k can be evaluated in polynomial time. Note that this does not contradict the result of Exercise 6.15 which refers to graph-based (as opposed to hypergraph-based) classes of conjunctive queries. Exercise 6.19. Suppose ϕ1(x) and ϕ2(x) are two conjunctive queries. We write ϕ1 ⊆ ϕ2, if ϕ1(A) ⊆ ϕ2(A) for all A (in other words, ∀x ϕ1(x) → ϕ2(x) is valid in all finite structures). We write ϕ1 = ϕ2 if both ϕ1 ⊆ ϕ2 and ϕ2 ⊆ ϕ1 hold. Prove that testing both ϕ1 ⊆ ϕ2 and ϕ1 = ϕ2 is NP-complete. Exercise 6.20.∗ Use Ehrenfeucht-Fra¨ıss´e games to prove that parity is not expressible in FO(+++,×××). 7 Monadic Second-Order Logic and Automata We now move to extensions of first-order logic. In this chapter we introduce second-order logic, and consider its often used fragment, monadic secondorder logic, or MSO, in which one can quantify over subsets of the universe. We study the expressive power of this logic over graphs, proving that its existential fragment expresses some NP-complete problems, but at the same time cannot express graph connectivity. Then we restrict our attention to strings and trees, and show that, over them, MSO captures regular string and tree languages. We explore the connection with automata to prove further definability and complexity results. 
7.1 Second-Order Logic and Its Fragments We have seen a few examples of second-order formulae in Chap. 1. The idea is that in addition to quantification over the elements of the universe, we can also quantify over subsets of the universe, as well as binary, ternary, etc., relations on it. For example, to express the query even, we can say that there are two disjoint subsets U1 and U2 of the universe A such that A = U1 ∪ U2 and there is a one-to-one mapping F : U1 → U2. This is expressed by a formula ∃U1 ∃U2 ∃F ϕ, where ϕ is an FO formula in the vocabulary (U1, U2, F) stating that U1 and U2 form a partition of the universe (∀x (U1(x) ↔ ¬U2(x))), and that F ⊆ U1 ×U2 is functional, onto, and one-to-one. Note that the formula ϕ in this example has three second-order free variables U1, U2, and F. We now formally define second-order logic. Definition 7.1 (Second-order logic). The definition of second-order logic, SO, extends the definition of FO with second-order variables, ranging over subsets and relations on the universe, and quantification over such variables. We 114 7 Monadic Second-Order Logic and Automata assume that for every k > 0, there are infinitely many variables Xk 1 , Xk 2 , . . ., ranging over k-ary relations. A formula of SO can have both first-order and second-order free variables; we write ϕ(x, X) to indicate that x are free firstorder variables, and X are free second-order variables. Given a vocabulary σ that consists of relation and constant symbols, we define SO terms and formulae, and their free variables, as follows: • Every first-order variable x, and every constant symbol c, are first-order terms. The only free variable of a term x is the variable x, and c has no free variables. • There are three kinds of atomic formulae: – FO atomic formulae; that is, formulae of the form – t = t′ , where t, t′ are terms, and – R(t ), where t is a tuple of terms, and R ∈ σ, and – X(t1, . . . , tk), where t1, . . . , tk are terms, and X is a second-order variable of arity k. The free first-order variables of this formula are free first-order variables of t1, . . . , tk; the free second-order variable is X. • The formulae of SO are closed under the Boolean connectives ∨, ∧, ¬, and first-order quantification, with the usual rules for free variables. • If ϕ(x, Y, X) is a formula, then ∃Y ϕ(x, Y, X) and ∀Y ϕ(x, Y, X) are formulae, whose free variables are x and X. The semantics is defined as follows. Suppose A ∈ STRUCT[σ]. For each formula ϕ(x, X), we define the notion A |= ϕ(b, B), where b is a tuple of elements of A of the same length as x, and for X = (X1, . . . , Xl), with each Xi being of arity ni, B = (B1, . . . , Bl), where each Bi is a subset of Ani . We give the semantics only for constructors that are different from those for FO: • If ϕ(x, X) is X(t1, . . . , tk), where X is k-ary and t1, . . . , tk are terms, with free variables among x, then A |= ϕ(b, B) iff the tuple (tA 1 (b), . . . , tA k (b)) is in B. • If ϕ(x, X) is ∃Y ψ(x, Y, X), where Y is k-ary, then A |= ϕ(b, B) if for some C ⊆ Ak , it is the case that A |= ψ(b, C, B). • If ϕ(x, X) is ∀Y ψ(x, Y, X), and Y is k-ary, then A |= ϕ(b, B) if for all C ⊆ Ak , we have A |= ψ(b, C, B). We know that every FO formula can be written in the prenex normal form Q1x1 . . . Qnxn ψ, where Qi’s are ∃ or ∀, and ψ is quantifier-free. Likewise, every SO formula can be written as a sequence of first- and second-order quantifiers, followed by a quantifier-free formula. 
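Returning to the even example above, the semantics of second-order quantification can be made completely concrete on a finite structure by brute force: a set quantifier ∃U ranges over all subsets of the universe, and a binary quantifier ∃F over all sets of pairs. The following sketch in Python is an illustration of ours, not part of the text (the helper names are invented); it evaluates the sentence ∃U1 ∃U2 ∃F ϕ exactly according to the semantics just given.

```python
from itertools import chain, combinations, product

def subsets(xs):
    """All subsets of a finite set: the range of a second-order set quantifier."""
    xs = list(xs)
    return (set(c) for c in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1)))

def binary_relations(xs, ys):
    """All F contained in xs x ys: the range of a binary second-order quantifier."""
    pairs = list(product(xs, ys))
    return (set(c) for c in
            chain.from_iterable(combinations(pairs, r) for r in range(len(pairs) + 1)))

def phi(U1, U2, F):
    """The first-order part: F is a functional, one-to-one map from U1 onto U2."""
    functional = all(len({y for (a, y) in F if a == x}) == 1 for x in U1)
    one_to_one = len({y for (_, y) in F}) == len(F)
    onto = {y for (_, y) in F} == U2
    return functional and one_to_one and onto

def even_by_so_guess(A):
    """Brute-force reading of  exists U1 exists U2 exists F . phi  on universe A.
    Since phi forces U1 and U2 to partition A, we take U2 = A - U1 directly
    instead of enumerating a second set quantifier."""
    A = set(A)
    return any(phi(U1, A - U1, F)
               for U1 in subsets(A)
               for F in binary_relations(U1, A - U1))

assert even_by_so_guess(range(4)) and not even_by_so_guess(range(5))
```

The enumeration is, of course, exponential in the size of the universe; the sketch is only meant to make the semantics of second-order guessing tangible.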
Furthermore, note the following equivalences: 7.1 Second-Order Logic and Its Fragments 115 ∃x Q ϕ(x, ·) ↔ ∃X Q ∃x (X(x) ∧ ϕ(x, ·)) (7.1) ∀x Q ϕ(x, ·) ↔ ∀X Q ∃!x X(x) → ∀x (X(x) → ϕ(x, ·)) , (7.2) where Q stands for an arbitrary sequence of first- and second-order quantifiers. Using those inductively, we can see that every SO formula is equivalent to a formula in the form Q1X1 . . . QnXnQ1x1 . . . Qlxl ψ, (7.3) where QiXi are second-order quantifiers, Qjxj are first-order quantifiers, and ψ is quantifier-free. We now define some restrictions of the full SO logic of interest to us. The first one is the central notion studied in this chapter. Definition 7.2. Monadic SO logic, or MSO, is defined as the restriction of SO where all second-order variables have arity 1. In other words, in MSO, second-order variables range over subsets of the universe. Rules (7.1) and (7.2) do not take us out of MSO, and hence every MSO formula is equivalent to one in the normal form (7.3), where the second-order quantifiers precede the first-order quantifiers. Definition 7.3. Existential SO logic, or ∃SO, is defined as the restriction of SO that consists of the formulae of the form ∃X1 . . . ∃Xn ϕ, where ϕ does not have any second-order quantification. If, furthermore, all Xi’s have arity 1, the resulting restriction is called existential monadic SO, or ∃MSO. If the second-order quantifier prefix consists only of universal quantifiers, we speak of the universal SO logic, or ∀SO, and its further restriction to monadic quantifiers is referred to as ∀MSO. In other words, an ∃SO formula starts with a second-order existential prefix ∃X1 . . . ∃Xn, and what follows is an FO formula ϕ (in the original vocabulary expanded with X1, . . . , Xn). Formula (1.2) from Chap. 1 stating the 3-colorability of a graph is an example of an ∃MSO formula, while (1.3) stating the existence of a clique of a given size is an example of an ∃SO formula. Definition 7.4. The quantifier rank of an SO formula is defined as the maximum depth of quantifier-nesting, including both first-order and second-order quantifiers. That is, the rules for the quantifier rank for FO are augmented with • qr(∃X ϕ) = qr(∀X ϕ) = qr(ϕ) + 1. 116 7 Monadic Second-Order Logic and Automata 7.2 MSO Games and Types MSO can be characterized by a type of Ehrenfeucht-Fra¨ıss´e game, which is fairly close to the game we have used for FO. As in the case of FO, the game is also closely connected with the notion of type. Let MSO[k] consist of all MSO formulae of quantifier-rank at most k. An MSO rank-k m, l-type is a consistent set S of MSO[k] formulae with m free first-order variables and l free second-order variables such that for every ϕ(x1, . . . , xm, X1, . . . , Xl) from MSO[k], either ϕ ∈ S or ¬ϕ ∈ S. Given a structure A, an m-tuple a ∈ A, and an l-tuple V of subsets of A, the MSO rank-k type of (a, U) in A is the set mso-tpk(A, a, V ) = {ϕ(x, X) ∈ MSO[k] | A |= ϕ(a, V )}. Clearly, mso-tpk(A, a, V ) is an MSO rank-k type. When both a and V are empty, mso-tpk(A) is the set of all MSO[k] sentences that are true in A. Just as for FO, a simple inductive argument shows that for each m and l, up to logical equivalence, there are only finitely many different formulae ϕ(x1, . . . , xm, X1, . . . , Xl) in MSO[k]. Hence, MSO rank-k m, l types (where m and l stand for the number of free first-order and second-order variables, respectively) are essentially finite objects. In fact, just as for FO, one can show the following result for MSO. Proposition 7.5. Fix k, l, m. 
• There exist only finitely many MSO rank-k m, l types. • Let T1, . . . , Ts enumerate all the MSO rank-k m, l types. There exist MSO[k] formulae αi(x, X), i = 1, . . . , s, such that for every structure A, every m-tuple a of elements of A, and every l-tuple U of subsets of A, it is the case that A |= αi(a, U) iff mso-tpk(A, a, U) = Ti. Furthermore, each MSO[k] formula with m free first-order variables and l free second-order variables is equivalent to a disjunction of some of the αi’s. Hence, just as in the case of FO, we shall associate rank-k types with their defining formulae, which are also of quantifier rank k. We now present the modification of Ehrenfeucht-Fra¨ıss´e games for MSO. Definition 7.6. An MSO game is played by two players, the spoiler and the duplicator, on two structures A and B of the same vocabulary σ. The game has two different kinds of moves: 7.2 MSO Games and Types 117 Point move This is the same move as in the Ehrenfeucht-Fra¨ıss´e game for FO: the spoiler chooses a structure, A or B, and an element of that structure; the duplicator responds with an element in the other structure. Set move The spoiler chooses a structure, A or B, and a subset of that structure. The duplicator responds with a subset of the other structure. Let a1, . . . , ap ∈ A and b1, . . . , bp ∈ B be the point moves played in the k-round game, with V1, . . . , Vs ⊆ A and U1, . . . , Us ⊆ B being the set moves (i.e., p+s = k, and the moves of the same round have the same index). Then the duplicator wins the game if (a, b) is a partial isomorphism of (A, V ) and (B, U). If the duplicator has a winning strategy in the k-round MSO game on A and B, we write A ≡MSO k B. Furthermore, we write (A, a0, V0) ≡MSO k (B, b0, U0) if the duplicator has a winning strategy in the k-round MSO game on A and B starting with position ((a0, V0), (b0, U0)). That is, when k rounds of the game a, b, V , U are played, (a0a, b0b) is a partial isomorphism between (A, V0, V ) and (B, U0, U). This game captures the expressibility in MSO[k]. Theorem 7.7. Given two structures A and B, two m-tuples a0, b0 of elements of A and B, and two l-tuples V0, U0 of subsets of A and B, we have mso-tpk(A, a0, V0) = mso-tpk(B, b0, U0) ⇔ (A, a0, V0) ≡MSO k (B, b0, U0). That is, (A, a0, V0) ≡MSO k (B, b0, U0) iff for every MSO[k] formula ϕ(x, X), A |= ϕ(a0, V0) ⇔ B |= ϕ(b0, U0). The proof is essentially the same as the proof of Theorem 3.9, and is left to the reader as an exercise (see Exercise 7.1). In the case of sentences, Theorem 7.7 gives us the following. Corollary 7.8. If A and B are two structures of the same vocabulary, then A ≡MSO k B iff A and B agree on all the sentences of MSO[k]. As for FO, the method of games is complete for expressibility in MSO. Proposition 7.9. A property P of σ-structures is expressible in MSO iff there is a number k such that for every two σ-structures A, B, if A has the property P and B does not, then the spoiler wins the k-round MSO game on A and B. Proof. Assume P is expressible by a sentence Φ of quantifier rank k. Let α1, . . . , αs enumerate all the MSO rank-k types (without free variables). Then P is equivalent to a disjunction of some of the αi’s. Hence, if A has P and B does not, there is some i such that A |= αi and B |= ¬αi, and thus A ≡MSO k B. 118 7 Monadic Second-Order Logic and Automata Conversely, suppose that we can find k ≥ 0 such that for every A having P and B not having P, we have A ≡MSO k B. Now take any two structures A1 and A2 such that A1 ≡MSO k A2. Suppose A1 has P. 
If A2 does not have P, we would conclude A1 ≡MSO k A2, which contradicts the assumption; hence A2 has P as well. Thus, P is a union of rank-k MSO types. Since there are finitely many of them, and each is definable by a rank-k MSO sentence, we conclude that P is MSO[k]-definable. Most commonly, we use the contrapositive of this proposition, which tells us when some property is not expressible in MSO. Corollary 7.10. A property P of σ-structures is not expressible in MSO iff for every k ≥ 0, one can find Ak, Bk ∈ STRUCT[σ] such that: • Ak has the property P, • Bk does not have the property P, and • Ak ≡MSO k Bk. Our next goal it to use games to study expressibility in MSO. A useful technique is the composition of MSO games, which allows us to construct more complex games from simpler ones. Similarly to Exercise 3.15, we can show the following. Lemma 7.11. Let A1, A2, B1, B2 be σ-structures, and let A be the disjoint union of A1 and A2, and B the disjoint union of B1 and B2. Assume A1 ≡MSO k B1 and A2 ≡MSO k B2. Then A ≡MSO k B. Proof sketch. Assume the spoiler makes a point move, say a in A. Then a is in A1 or A2. Suppose a is in A1; then the duplicator selects a response b in B1 according to his winning strategy on A1 and B1. Assume the spoiler makes a set move, say U ⊆ A. The universe A is the disjoint union of A1 and A2, the universes of A1 and A2. Let Ui = U ∩ Ai, i = 1, 2. Let Vi be the response of the duplicator to Ui in Bi, i = 1, 2, according to the winning strategy. Then the response to U is V = V1 ∪ V2. It is routine to verify that, using this strategy, the duplicator wins in k rounds. As an application of the composition argument, we prove the following. Proposition 7.12. Let σ = ∅. Then even is not MSO-expressible. Proof. We claim that for every A and B with |A|, |B| ≥ 2k , it is the case that A ≡MSO k B. Clearly this implies that even is not MSO-definable. Since σ = ∅, we shall write U ≡MSO k V instead of the more formal (U, ∅) ≡MSO k (V, ∅). We prove the statement by induction on k. The cases of k = 0 and k = 1 are easy, so we show how to go from k to k + 1. Suppose A and B with | A |, |B| ≥ 2k+1 are given. We only consider a set move by the spoiler, since any point move a can be identified with the set move {a}. Assume that in the first move, the spoiler plays U ⊆ A. We distinguish the following cases: 7.3 Existential and Universal MSO on Graphs 119 1. |U| ≤ 2k . Then pick an arbitrary set V ⊆ B such that |V |=|U |. We have U ∼= V (and thus U ≡MSO k V ), and A − U ≡MSO k B − V – the latter is by the induction hypothesis, since |A − U|, |B − V | ≥ 2k . Combining the two games, we see that from the position (U, V ) on A and B, the duplicator can continue the game for k rounds, and hence A ≡MSO k+1 B. 2. |A−U| ≤ 2k . This case is treated in exactly the same way as the previous one. 3. |U| > 2k and |A − U| > 2k . Since |B| ≥ 2k+1 , we can find a subset V ⊆ B such that both |V | and |B − V | are at least 2k . By the induction hypothesis, we know that U ≡MSO k V and A − U ≡MSO k B − V , and hence from (U, V ), the duplicator can play for k more rounds, thus proving A ≡MSO k+1 B. Suppose now that the vocabulary is expanded by one binary symbol < interpreted as a linear ordering; that is, we deal with finite linear orders. Then even is expressible in MSO. To see this, we let our MSO sentence guess the set that consists of alternating elements a1, a3, . . . , a2n+1, . . . in the ordering a1 < a2 < a3 < . . 
., such that the first element is in this set, and the last element is not: ∃X   ∀x (first(x) → X(x)) ∧ ∀x (last(x) → ¬X(x)) ∧ ∀x∀y succ<(x, y) → (X(x) ↔ ¬X(y))   , where first(x) stands for ∀y (y ≥ x), last(x) stands for ∀y (y ≤ x), and succ<(x, y) stands for (x < y) ∧ ¬∃z (x < z ∧ z < y). Thus, as for FO, we have a separation between the ordered and unordered case. Noticing that even is an order-invariant query, we obtain the following. Corollary 7.13. MSO (MSO+ <)inv. Note the close connection between Corollary 7.13 and Theorem 5.3: the latter showed that FO (FO+ <)inv, and the separating example was the parity of the number of atoms of a Boolean algebra. We used the Boolean algebra to simulate monadic second-order quantification; in MSO it comes for free, and hence even worked as a separating query. 7.3 Existential and Universal MSO on Graphs In this section we study two restrictions of MSO: existential MSO, or ∃MSO, and universal MSO, or ∀MSO, whose formulae are respectively of the form ∃X1 . . . ∃Xn ϕ and 120 7 Monadic Second-Order Logic and Automata ∀X1 . . . ∀Xn ϕ, where ϕ is first-order. These also are commonly found in the literature under the names monadic Σ1 1 for ∃MSO and monadic Π1 1 for ∀MSO, where monadic, of course, refers to second-order quantification over sets. In general, Σ1 k consists of formulae whose prefix of second-order quantifiers consists of k blocks, with the first block being existential. For example, a formula ∃X1∃X2∀Y1∃Z1ψ is a Σ1 3 -formula. The class Π1 k is defined likewise, except that the first block of quantifiers is universal. Another name for ∃MSO is monadic NP, and ∀MSO is referred to as monadic coNP. The reason for these names will become clear in Chap. 9, when we prove Fagin’s theorem. We now give an example of a familiar property that separates monadic Π1 1 from monadic Σ1 1 (i.e., ∀MSO from ∃MSO). Proposition 7.14. Graph connectivity is expressible in ∀MSO, but is not expressible in ∃MSO. Proof. A graph is not connected if its nodes can be partitioned into two nonempty sets with no edges between them: ∃X ∃x X(x) ∧ ∃x ¬X(x) ∧ ∀x∀y (X(x) ∧ ¬X(y) → ¬E(x, y)) (7.4) Since (7.4) is an ∃MSO sentence, its negation, expressing graph connectivity, is a universal MSO sentence. For the converse, we use Hanf-locality. Suppose that connectivity is definable by an ∃MSO sentence Φ ≡ ∃X1 . . . ∃Xmϕ. Assume without loss of generality that m > 0. Since ϕ is a first-order sentence (over structures of vocabulary σ extended with X1, . . . , Xn), it is Hanf-local. Let d = hlr(ϕ), the Hanf-locality rank of ϕ. That is, if (G, U1, . . . , Um)⇆d(G′ , V1, . . . , Vm), where G, G′ are graphs and the Ui’s and the Vi’s interpret Xi’s over them, then (G, U1, . . . , Um) and (G′ , V1, . . . , Vm) agree on ϕ. We now set K = 2m(2d+1) and r = (4d+4)K. We claim the following: if G is an m-colored graph (i.e., a graph on which m unary predicates are defined), which is a cycle of length at least r, then there exist two nodes a and b such that the distance between them is at least 2d + 2, and their d-neighborhoods are isomorphic. Indeed, for a long enough cycle, the d-neighborhood of each node a is a chain of length 2d + 1 with a being the middle node. Each node on the chain can belong to some of the Ui’s, and there are 2m possibilities for choosing a subset of indexes 1, . . . , m of Ui’s such that a ∈ Ui. Hence, there are at most K different isomorphism types of d-neighborhoods. 
If the length of the cycle is at least (4d + 4)K, then there is one type of d-neighborhoods which 7.3 Existential and Universal MSO on Graphs 121 . . . .. . . . . . .. a a′ bb′ Fig. 7.1. Illustration for the proof of Proposition 7.14 is realized by at least 4d + 4 elements, and hence two of those elements will be at distance at least 2d + 2 from each other. Now let G be a cycle of length at least r. Since G is a connected graph, we have G |= Φ. Let U1, . . . , Um witness it; that is, (G, U1, . . . , Um) |= ϕ. Choose a, b such that a ≈ (G,U1,...,Um) d b and d(a, b) > 2d + 1, and let a′ , b′ be their successors (in an arbitrarily chosen orientation of G; the one shown in Fig. 7.1 is the clockwise orientation). We now construct a new graph G′ by removing edges (a, a′ ) and (b, b′ ) from G, and adding edges (a, b′ ) and (b, a′ ). We claim that for every node c, N (G,U1,...,Um) d (c) ∼= N (G′ ,U1,...,Um) d (c). (7.5) First, since a and b are at the distance at least 2d + 2, the d-neighborhood of any point in G or G′ is a chain of length 2d + 1. If c is at the distance d or greater from a and b, its d-neighborhood is the same in (G, U1, . . . , Um) and (G′ , U1, . . . , Um), which means that (7.5) holds. Suppose now that the distance between c and a is d0 < d, and assume that c precedes a in the clockwise orientation of G. Then the d predecessors of c are the same in both structures. Furthermore, since a ≈ (G,U1,...,Um) d b, in both structures the d − d0 successors of a agree on all the Ui’s. Hence, (7.5) holds for c. The remaining cases (again, viewing G in the clockwise order) are those of c preceding b, or following a or a′ and being at the distance less than d from them. In all of those cases the same argument as above proves (7.5). We have thus established a bijection f between the universes of (G, U1, . . . , Um) and (G′ , U1, . . . , Um) (which is in fact the identity) that wit- nesses (G, U1, . . . , Um) ⇆d (G′ , U1, . . . , Um). Since d = hlr(ϕ), we conclude that (G′ , U1, . . . , Um) |= ϕ, and hence G′ |= ∃X1 . . . ∃Xm ϕ; that is, G′ |= Φ. But G′ is not a connected graph, which contradicts our assumption that Φ is an ∃MSO sentence defining graph con- nectivity. 122 7 Monadic Second-Order Logic and Automata Notice that the formula (7.4) from the proof of Proposition 7.14 shows that the negation of graph connectivity is ∃MSO-expressible, which means that ∃MSO can express queries that are not Hanf-local. One can also show that other forms of locality are violated in ∃MSO (see Exercise 7.6). We now consider a related property of reachability. We assume that the language of graphs is augmented by two constants, s and t, and we are interested in the property, called (s, t)-reachability, that asks whether there is a path from s to t in a given graph. We have seen that undirected connectivity is not ∃MSO-definable; surprisingly, undirected (s, t)-reachability is! Proposition 7.15. For undirected graphs without loops, (s, t)-reachability is expressible in ∃MSO. Proof. Consider the sentence ϕ in the language of graphs expanded with one unary relation X that says the following: 1. both s and t are in X, 2. both s and t have an edge to exactly one member of X, and 3. every member of X except s and t has edges to precisely two members of X. Let Φ be ∃X ϕ. We claim that G |= Φ iff there is a path from s to t in G. Indeed, if there is a path from s to t, we can take X to be the shortest path from s to t. 
Conversely, if (G, X) |= ϕ, then X is a path that starts in s; since the graph G is finite, X must contain the last node on the path, which could be only t. The approach of Proposition 7.15 does not work for directed graphs, because of back edges. Consider, for example, a directed graph which consists of a chain {(s, a1), (a1, a2), (a2, a3), (a3, t)} together with the edge (a3, a1). The only path between s and t consists of edges s, a1, a2, a3, t; however, if we let X = {s, a1, a2, a3, t}, the sentence ϕ from the proof of Proposition 7.15 is false, since a3 has one incoming edge, and two outgoing edges. It seems that the approach of Proposition 7.15 could be generalized if there is a bound on degrees in the input graph, and this is indeed the case (Exercise 7.7). However, in general, one can show a negative result. Theorem 7.16. Reachability for directed graphs is not expressible in ∃MSO. We conclude this section by showing that there are games that characterize expressibility in ∃MSO, much in the same way as Ehrenfeucht-Fra¨ıss´e games and MSO games characterize expressibility in FO and MSO. Definition 7.17. The l, k-Fagin game on two structures A, B ∈ STRUCT[σ] is played as follows. The spoiler selects l subsets U1, . . . , Ul of A. Then the duplicator selects l subsets V1, . . . , Vl of B. After that, the spoiler and the 7.3 Existential and Universal MSO on Graphs 123 duplicator play k rounds of the Ehrenfeucht-Fra¨ıss´e game on (A, U1, . . . , Ul) and (B, V1, . . . , Vl). The winning condition for the duplicator is that after k rounds of the Ehrenfeucht-Fra¨ıss´e game, the elements played on (A, U1, . . . , Ul) and (B, V1, . . . , Vl) form a partial isomorphism between these two structures. A fairly simple generalization of the previous game proofs shows the fol- lowing. Proposition 7.18. A property P of σ-structures is ∃MSO-definable iff there exist l and k such that for every A ∈ STRUCT[σ] having P, and for every B ∈ STRUCT[σ] not having P, the spoiler wins the l, k-Fagin game on A and B. This game, however, is often rather inconvenient for the duplicator to play (after all, we use games to show that a certain property is inexpressible in a logic, so we need the win for the duplicator). A somewhat surprising result (see Exercise 7.9) shows that a different game that is easier for the duplicator to win, also characterizes the expressiveness of ∃MSO. Definition 7.19. Let P be a property of σ-structures (that is, a class of σstructures closed under isomorphism). The P, l, k-Ajtai-Fagin game is played as follows: 1. The duplicator selects a structure A ∈ P. 2. The spoiler selects l subsets U1, . . . , Ul of A. 3. The duplicator selects a structure B ∈ P, and l subsets V1, . . . , Vl of B. 4. The spoiler and the duplicator play k rounds of the Ehrenfeucht-Fra¨ıss´e game on (A, U1, . . . , Ul) and (B, V1, . . . , Vl). The winning condition for the duplicator is that after k rounds of the Ehrenfeucht-Fra¨ıss´e game, the elements played on (A, U1, . . . , Ul) and (B, V1, . . . , Vl) form a partial isomorphism between these two structures. Intuitively, this game is easier for the duplicator to win, because he selects the second structure B and the coloring of it only after he has seen how the spoiler chose to color the first structure A. Proposition 7.20. A property P of σ-structures is ∃MSO-definable iff there exist l and k such that the spoiler has a winning strategy in the P, l, k-AjtaiFagin game. 
Hence, to show that a certain property P is not expressible in ∃MSO, it suffices to construct, for every l and k, a winning strategy for the duplicator in the P, l, k-Ajtai-Fagin game. This is easier than a winning strategy in the l, k-Fagin game, since the duplicator sees the sets Ui’s before choosing the second structure B for the game. An example is given in Exercise 7.10. 124 7 Monadic Second-Order Logic and Automata 7.4 MSO on Strings and Regular Languages We now study MSO on strings. Recall that a string over a finite alphabet can be represented as a first-order structure. For example, the string s = abaab is represented as {1, 2, 3, 4, 5}, <, Pa, Pb , where < is the usual ordering, and Pa and Pb contain positions in s where a (or b, respectively) occurs: that is, Pa = {1, 3, 4} and Pb = {2, 5}. In general, for a finite alphabet Σ, we define the vocabulary σΣ that contains a binary symbol < and unary symbols Pa for each a ∈ Σ. A string s ∈ Σ∗ of length n is then represented as a structure Ms ∈ STRUCT[σΣ] whose universe is {1, . . . , n}, with < interpreted as the order on the natural numbers, and Pa being the set of positions where the letter a occurs, for each a in Σ. Suppose we have a sentence Φ of some logic L, in the vocabulary σΣ. Such a sentence defines a language, that is, a subset of Σ∗ , given by L(Φ) = {s ∈ Σ∗ | Ms |= Φ}. (7.6) We say that a language L is definable in a logic L if there exists an L-sentence Φ such that L = L(Φ). The following is a fundamental result that connects MSO-definability and regular languages. Theorem 7.21 (B¨uchi). A language is definable in MSO iff it is regular. Proof. We start by showing how to define every regular language L in MSO. If L is regular, then its strings are accepted by a deterministic finite automaton A = (Q, q0, F, δ), where Q = {q0, . . . , qm−1} is the set of states, q0 ∈ Q is the initial state, F ⊆ Q is the set of final states, and δ : Q × Σ → Q is the transition function. We take Φ to be the MSO sentence ∃X0 . . . ∃Xm−1 ϕpart ∧ ϕstart ∧ ϕtrans ∧ ϕaccept. (7.7) In this sentence, we are guessing m sets X0, . . . , Xm−1 that correspond to elements of the universe of Ms (i.e., positions of s) where the automaton A is in the state q0, q1, . . . , qm−1, respectively, and the remaining three first-order formulae ensure that the behavior of A is simulated correctly. That is: • ϕpart asserts that X0, . . . , Xm−1 partition the universe of Ms. This is easy to express in FO: ∀x m−1 i=0 Xi(x) ∧ j=i ¬Xj(x) . 7.4 MSO on Strings and Regular Languages 125 • ϕstart asserts that the automaton starts in state q0: ∀x a∈Σ Pa(x) ∧ ∀y (y ≥ x) → Xδ(q0,a)(x) . Note some abuse of notation: δ(q0, a) = qi for some i, but we write Xδ(q0,a) instead of Xi. • ϕtrans asserts that transitions are simulated correctly: ∀x∀y m−1 i=0 a∈Σ (x ≺ y) ∧ Xi(x) ∧ Pa(y) → Xδ(qi,a)(y) , where x ≺ y means that y is the successor of x. • ϕaccepts asserts that at the end of the string, A enters an accepting state: ∀x ∀y (y ≤ x) → qi∈F Xi(x) . Hence, (7.7) captures the behavior of A, and thus L(Φ) = L. For the converse, let Φ be an MSO sentence in the vocabulary σΣ, and let k = qr(Φ). Let τ0, . . . , τm enumerate all the rank-k MSO types of σΣ structures (more precisely, rank-k 0, 0 types, with zero free first- and secondorder variables, or, in other words, sentences). Let Ψi be an MSO sentence of quantifier rank k defining the type τi. That is, Ms |= Ψi ⇔ mso-tpk(Ms) = τi. Since qr(Φ) = k, the sentence Φ is a disjunction of some of the Ψi’s. We define F ⊆ {τ0, . . . 
, τm} to be the set of types consistent with Φ. Then Φ is equivalent to τi∈F Ψi. We further assume that τ0 is the type of Mǫ, where ǫ denotes the empty string. That is, this is the only type among the τi’s that is consistent with ¬∃x (x = x). We now define the automaton AΦ = ({τ0, . . . , τm}, τ0, F, δΦ), (7.8) with the set of states S = {τ0, . . . , τm}, the initial state τ0, the set of final states F, and the transition function δΦ : S × Σ → 2S defined as follows: τj ∈ δF (τi, a) ⇔ ∃s ∈ Σ∗ mso-tpk(Ms) = τi and mso-tpk(Ms·a) = τj . (7.9) We now claim that the automaton AΦ is deterministic (i.e., for every τi and a ∈ Σ there is exactly one τj satisfying (7.9)). For that, notice that by a 126 7 Monadic Second-Order Logic and Automata composition argument similar to that of Lemma 7.11, if s1, s2, t1, t2 ∈ Σ∗ are such that Ms1 ≡MSO k Mt1 and Ms2 ≡MSO k Mt2 , then Ms1·s2 ≡MSO k Mt1·t2 . Now suppose that mso-tpk(Ms1 ) = mso-tpk(Ms2 ) = τi. In particular, Ms1 ≡MSO k Ms2 . Then Ms1·a ≡MSO k Ms2·a. Suppose also that we have j1 = j2 such that mso-tpk(Ms1·a) = τj1 and mso-tpk(Ms2·a) = τj2 . Then Ms1·a |= Ψj1 , but since Ms2·a |= Ψj2 and qr(Ψj2 ) = k, we obtain Ms1·a |= Ψj2 , which implies mso-tpk(Ms1·a) = τj2 = τj1 . This contradiction proves that the automaton (7.8) is deterministic. Now by a simple induction on the length of the string we prove that for any string s, after reading s the automaton AΦ ends in the state τi such that mso-tpk(Ms) = τi. For the empty string, this is our choice of τ0. Suppose now that mso-tpk(Ms) = τi and AΦ is in state τi after reading s. By the definition of the transition function δΦ and the fact that it is deterministic, if AΦ reads a, it moves to the state τj such that mso-tpk(Ms·a) = τj, which proves the statement. Therefore, AΦ accepts a string s iff mso-tpk(Ms) is in F, that is, is consistent with Φ. The latter happens iff Ms |= Φ, which proves that the language accepted by AΦ is L(Φ). This completes the proof. We have seen that over graphs, there are universal MSO-sentences which are not expressible in ∃MSO. In contrast, over strings every MSO sentence can be represented by an automaton, and (7.7) shows that the behavior of every automaton can be captured by an ∃MSO sentence. Hence, we obtain the following. Corollary 7.22. Over strings, MSO = ∃MSO. As an application of Theorem 7.21, we prove a few bounds on the expressive power of MSO. We have seen before that MSO over the empty vocabulary cannot express even. What about the power of MSO on linear orderings? Recall that Ln denotes a linear ordering on n elements. From Theorem 7.21, we immediately derive the following. Corollary 7.23. Let X ⊆ N. Then the set {Ln | n ∈ X} is MSO-definable iff the language {an | n ∈ X} is regular. Thus, MSO can test, for example, if the size of a linear ordering is even, or – more generally – a multiple of k for any fixed k. On the other hand, one cannot test in MSO if the cardinality of a linear ordering is a square, or the kth power, for any k > 1; nor is it possible to test if such a cardinality is a power of k > 1. As a more interesting application, we show the following. Corollary 7.24. It is impossible to test in MSO if a graph is Hamiltonian. 7.5 FO on Strings and Star-Free Languages 127 Proof. Let Kn,m denote the complete bipartite graph on sets of cardinalities n and m; that is, an undirected graph G whose nodes can be partitioned into two sets X, Y such that |X| = n, |Y | = m, and the set of edges is {(x, y), (y, x) | x ∈ X, y ∈ Y }. 
Notice that Kn,m is Hamiltonian iff n = m. Assume that Hamiltonicity is definable in MSO. Let Σ = {a, b}. Given a string s, we define, in FO, the following graph over the universe of Ms: ϕ(x, y) ≡ Pa(x) ∧ Pb(y) ∨ Pb(x) ∧ Pa(y) . That is, ϕ(Ms) is Kn,m, where n is the number of a’s in s, and m is the number of b’s. Thus, if Hamiltonicity were definable in MSO, the language {s ∈ Σ∗ | the number of a’s in s equals the number of b’s} would have been a regular language, but it is well known that it is not (by a pumping lemma argument). 7.5 FO on Strings and Star-Free Languages Since MSO on strings captures regular languages, what can be said about the class of languages captured by FO? It turns out that FO corresponds to a well-known class of languages, which we define below. Definition 7.25. A star-free regular expression over Σ is an expression built from the symbols ∅ and a, for each a in Σ, using the operations of union (+), complement (¯), and concatenation (·). Such a regular expression e denotes a language L(e) over Σ as follows: • L(∅) = ∅; L(a) = {a} for a ∈ Σ. • L(e1 + e2) = L(e1) ∪ L(e2). • L(¯e) = Σ∗ − L(e). • L(e1 · e2) = {s1 · s2 | s1 ∈ L(e1), s2 ∈ L(e2)}. A language denoted by a star-free expression is called a star-free language. Note that some of the regular expressions that use the Kleene star ∗ are actually star-free, because in the definition of star-free expressions one can use the operation of complementation. For example, suppose Σ = {a, b}. Then (a + b)∗ defines a star-free language, denoted by the star-free expression ¯∅. Likewise, e = a∗ b∗ also denotes a star-free language, since it can be characterized as a language in which there is no b preceding an a. A language with a b preceding an a can be defined as (a + b)∗ · ba · (a + b)∗ , and hence L(e) is defined by the star-free expression ¯∅ · b · a · ¯∅. Theorem 7.26. A language is definable in FO iff it is star-free. 128 7 Monadic Second-Order Logic and Automata Proof. We show that every star-free language is definable in FO by induction on the star-free expression. The empty language is definable by false, the language {a} is definable by ∃!x (x = x) ∧ ∀x Pa(x). If e = ¯e1 and L(e1) is definable by Φ, then ¬Φ defines L(e). If e = e1 + e2, with L(e1) and L(e2) definable by Φ1 and Φ2 respectively, then Φ1 ∨ Φ2 defines L(e). Now assume that e = e1 · e2, and again L(e1) and L(e2) are definable by Φ1 and Φ2. Let x be a variable that does not occur in Φ1 and Φ2, and let ϕi(x), i = 1, 2, be the formula obtained from Φ1 by relativizing each quantifier to the set of positions {y | y ≤ x} for ϕ1, and to {y | y > x} for ϕ2. More precisely, we inductively replace each subformula ∃yψ of Φ1 by ∃y (y ≤ x)∧ψ, and each such subformula of Φ2 by ∃y (y > x) ∧ ψ. Then, for a string s and a position p, we have Ms |= ϕ1(p) iff M≤p s |= Φ1, where M≤p s is the substructure of Ms with the domain {1, . . . , p}. Furthermore, Ms |= ϕ2(p) iff M>p s |= Φ2, where M>p s is the substructure of Ms whose universe is the complement of {1, . . . , p}. Hence, s ∈ L(e) iff Ms |= ∃x ϕ1(x) ∧ ϕ2(x), which proves that every star-free language is FO-definable. We now prove the other direction: every FO-definable language is star-free. For technical reasons (to get the induction off the ground), we expand σΣ with a constant max, to be interpreted as the largest element of the universe. Since max is FO-definable, this does not affect the set of FO-definable languages. The proof is now by induction on the quantifier rank k of a sentence Φ. 
Note that since star-free languages are closed under the Boolean operations, an arbitrary Boolean combination of sentences defining star-free languages also defines a star-free language. For k = 0, we have Boolean combinations of the sentences of the form Pa(max), as well as true and false. The sentence Pa(max) defines the language denoted by ¯∅ · a, true defines L(¯∅), and false defines L(∅). Given the closure under Boolean combinations, for the inductive step it suffices to consider sentences Φ = ∃xϕ(x), where qr(ϕ) = k. Let τ0, . . . , τm enumerate all the rank-k FO-types (again, with respect to sentences: we do not have free variables). We define SΦ = (τi, τj) for some s and a position p, Ms |= ϕ(p), tpk(M≤p s ) = τi and tpk(M>p s ) = τj . Our goal is now to show the following: for every string u, Mu |= Φ iff there exists a position p in u such that for some (τi, τj) in SΦ, we have tpk(M≤p u ) = τi and tpk(M>p u ) = τj. (7.10) First, we notice that this claim implies that the language L(Φ) is star-free. Indeed, each of τi is definable by an FO sentence Ψi of quantifier rank k, and hence by the induction hypothesis, each language L(Ψi) is star-free. Thus, L(Φ) = (τi,τj)∈SΦ L(Ψi) · L(Ψj). 7.6 Tree Automata 129 That is, L(Φ) is a union of concatenations of star-free languages, and hence it is star-free. If Mu |= Φ, then the existence of p and a pair (τi, τj) follows from the definition of SΦ. Conversely, suppose we have a string u and a position p such that (7.10) holds. Since (τi, τj) ∈ SΦ, we can find a string s with a position p′ in it such that Ms |= ϕ(p′ ), tpk(M≤p′ s ) = τi, and tpk(M>p s ) = τj. Hence, M≤p u ≡k M≤p′ s , M>p u ≡k M>p′ s , and thus (Mu, p) ≡k (Ms, p′ ). Since qr(ϕ) = k, it follows that Mu |= ϕ(p), and hence Mu |= Φ, as claimed. This completes the proof. Corollary 7.27. There exist regular languages which are not star-free. Proof. The language denoted by (aa)∗ is regular, but clearly not star-free, since even is not FO-definable over linear orders. 7.6 Tree Automata We now move from strings to trees. Our goal is to define trees as first-order structures, and study MSO over them. We shall connect MSO with the notion of tree automata. Tree automata play an important role in many applications, including rewriting systems, automated theorem proving, verification, and recently database query languages, especially in the XML context. We consider two kinds of trees in this section. Ranked trees have the property that every node which is not a leaf has the same number of children (in fact we shall fix this number to be 2, but all the results can be generalized to any fixed k > 1). On the other hand, in unranked trees different nodes can have a different number of children. We shall start with ranked (binary) trees. Definition 7.28. A tree domain is a subset D of {1, 2}∗ that is prefix-closed; that is, if s ∈ D and s′ is a prefix of D, then s′ ∈ D. Furthermore, if s ∈ D, then either both s · 1 and s · 2 are in D, or none of them is in D. A Σ-tree T is a pair (D, f) where D is a tree domain and f is a function from D to Σ (the labeling function). We refer to the elements of D as the nodes of T . Every nonempty tree domain has the node ǫ, which is called the root. A node s such that s·1, s·2 ∈ D is called a leaf. The first tree in Fig. 7.2 is a binary tree. We show both the nodes and the labeling in that picture. The nodes 111, 112, 12, 21, 22 are the leaves. 
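A tree domain and its labeling function are easy to represent explicitly: a finite map from strings over {1, 2} to labels. The following sketch is an illustration of ours, not part of the text; the labeling chosen for the ranked tree of Fig. 7.2 is one consistent with the figure and with the automaton run later in this section. The code checks the two conditions of Definition 7.28 and recovers the leaves.

```python
# The ranked Sigma-tree of Fig. 7.2 as a pair (D, f): the keys form the tree
# domain D (the empty string "" is the root epsilon), the values give the labeling f.
ranked_tree = {
    "": "a", "1": "a", "2": "b",
    "11": "b", "12": "b", "21": "a", "22": "a",
    "111": "a", "112": "b",
}

def is_tree_domain(D):
    """Definition 7.28: D is prefix-closed, and every node has either
    both children s.1 and s.2 in D, or neither of them."""
    prefix_closed = all(s[:i] in D for s in D for i in range(len(s)))
    both_or_none = all((s + "1" in D) == (s + "2" in D) for s in D)
    return prefix_closed and both_or_none

def leaves(D):
    """The leaves are the nodes with no children in D."""
    return {s for s in D if s + "1" not in D}

assert is_tree_domain(set(ranked_tree))
assert leaves(set(ranked_tree)) == {"111", "112", "12", "21", "22"}
```

The same representation is used implicitly when such a tree is turned into a first-order structure below.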
We represent a tree T = (D, f) as a first-order structure MT = D, ≺, (Pa)a∈Σ, succ1, succ2 130 7 Monadic Second-Order Logic and Automata 1 11 111 112 12 2 21 22 ǫ a a b b a b b a a ǫ 1 11 111 113 112 2 21 22 3 31 32 33 331 Fig. 7.2. Examples of a ranked and an unranked tree of vocabulary σΣ expanded with two binary relations succ1 and succ2. Here ≺ is interpreted as the prefix relation on D (in particular, it is a partial order, rather than a linear order, as was the case with strings), Pa is interpreted as {s ∈ D | f(s) = a}, and succi is {(s, s · i) | s, s · i ∈ D}, for i = 1, 2. We let Trees(Σ) be the set of all Σ-trees. If we have a sentence Φ of some logic, it defines the set of trees (also called a tree language) LT (Φ) = {T ∈ Trees(Σ) | MT |= Φ}. Thus, we shall be talking about tree languages definable in various logics. Definition 7.29 (Tree automata and regular tree languages). A (nondeterministic) tree automaton is a tuple A = (Q, q0, δ, F), where Q is a finite set of states, q0 ∈ Q, F ⊆ Q is the set of final (accepting) states, and δ : Q × Q × Σ → 2Q is the transition function. Given a tree T = (D, f), a run of A on T is a function r : D → Q such that • if s is a leaf labeled a, then r(s) ∈ δ(q0, q0, a); • if r(s · 1) = q, r(s · 2) = q′ and f(s) = a, then r(s) ∈ δ(q, q′ , a). A run is called successful if r(ǫ) ∈ F (the root is in the accepting state). The set of trees accepted by A is the set of all trees T for which there exists a successful run. A tree language is called regular if it is accepted by a tree automaton. 7.6 Tree Automata 131 In a deterministic tree automaton, the transition function is δ : Q × Q × Σ → Q, and the definition of a run is modified as follows: • if s is a leaf labeled a, then r(s) = δ(q0, q0, a); • if r(s · 1) = q, r(s · 2) = q′ and f(t) = a, then r(s) = δ(q, q′ , a). For example, consider a deterministic tree automaton A whose set of states is {q0, qa, qb, q, q′ }, with F = {q′ }, and the transition function has the follow- ing: δ(q0, q0, a) = qa δ(q0, q0, b) = qb δ(qa, qb, b) = q δ(qa, qa, b) = q′ δ(q, qb, a) = q δ(q, q′ , a) = q′ . Then this automaton accepts the ranked tree shown in Fig. 7.2: following the definition of the transition function, we define the run r such that: • for the leaves, r(111) = r(21) = r(22) = qa and r(112) = r(12) = qb; • r(11) = δ(qa, qb, b) = q; • r(1) = δ(q, qb, a) = q; • r(2) = δ(qa, qa, b) = q′ ; and finally, • r(ǫ) = δ(q, q′ , a) = q′ , and since q′ ∈ F, the automaton accepts. We now establish the analog of Theorem 7.21 for trees, by showing that regular tree languages are precisely those definable in MSO. Theorem 7.30. A set of trees is definable in MSO iff it is regular. Proof. The proof is similar to that of Theorem 7.21. To find an MSO definition of the tree language accepted by an automaton A, we guess, for each state q, the set Xq of nodes where the run of A is in state q, and then check, in FO, that each leaf labeled a is in Xq for some q ∈ δ(q0, q0, a), that transitions are modeled properly, and that the root is in one of the accepting states. The sentence looks very similar to (7.7), and is in fact an ∃MSO sentence. The proof of the converse, i.e., that MSO only defines regular languages, again follows the proof in the string case. Suppose an MSO sentence Φ of quantifier rank k is given. We let τ0, . . . , τm enumerate all the rank-k MSO types, with τ0 being the type of the empty tree, and take {τ0, . . . , τm} as the set of states of an automaton AΦ. 
Since Φ is equivalent to a disjunction of types, we let F = {τi | τi is consistent with Φ}. Finally, τl ∈ δ(τi, τj, a) 132 7 Monadic Second-Order Logic and Automata a T1 T2 τi τj Fig. 7.3. Illustration for the proof of Theorem 7.30 if there are trees T1 and T2 whose rank-k MSO types are τi and τj, respectively, such that the rank-k MSO type of the tree obtained by hanging T1 and T2 as children of a root node labeled a is τl (see Fig. 7.3). Again, similarly to the proof of Theorem 7.21, one can show that AΦ is a deterministic tree automaton accepting the tree language {T | T |= Φ}. Corollary 7.31. Every tree automaton is equivalent to a deterministic tree automaton, and every MSO sentence over trees is equivalent to an ∃MSO sentence. The connection between FO-definability and star-free languages does not, however, extend to trees. There are several interesting logics between FO and MSO, and some of them will be introduced in exercises. We next show how to extend these results to unranked trees. Definition 7.32 (Unranked trees). An unranked tree domain is a subset D of {1, 2, . . .}∗ (finite words over positive integers) that is prefix-closed, and such that for s · i ∈ D and j < i, the string s · j is in D as well. An unranked tree is a pair (D, f), where D is an unranked tree domain, and f is the labeling function f : D → Σ. Thus, a node in an unranked tree can have arbitrarily many children. An example is shown in Fig. 7.2 (the second tree). Some nodes – the root, nodes 11 and 3 – have three children; some have two (node 2), some have one (nodes 1 and 33). The transition function for an automaton working on binary trees was of the form δ : Q × Q × Σ → Q, based on the fact that each nonleaf node has exactly two children. In an unranked tree, the number of children could be arbitrary. The idea of extending the notion of tree automata to the unranked case is then as follows: we have additional string automata that run on the children of each node, and the acceptance conditions of those automata determine the state of the parent node. This is formalized in the definition below. 7.7 Complexity of MSO 133 Definition 7.33 (Unranked tree automata). An unranked tree automaton is a triple A = (Q, q0, δ), where as before Q is the set of states, q0 is an element of Q, and δ is the transition function δ : Q × Σ → 2Q∗ such that δ(q, a) is a regular language over Q for every q ∈ Q and a ∈ Σ. Given an unranked tree T = (D, f), a run of A on T is defined as a function r : D → Q such that the following holds: • if s is a node labeled a, with children s · 1, . . . , s · n, then the string r(s · 1)r(s · 2) . . . r(s · n) is in δ(r(s), a). In particular, if s is a leaf, then r(s) = q implies that the empty string belongs to δ(q, a). A run is successful if r(ǫ) = q0, and T is accepted by A if there exists an accepting run. An unranked tree language L is called regular if it is accepted by an unranked tree automaton. To connect regular languages with MSO-definability, we have to represent unranked trees as structures. It is no longer sufficient to model just two successor relations, since a node can have arbitrarily many successors. Instead, we introduce an ordering on successor relations. That is, an unranked tree T = (D, f) is represented as a structure D, ≺, (Pa)a∈Σ, |B |, are inexpressible too. We first introduce two possible ways of extending FO that add counting power to it: one is to use counting quantifiers and two-sorted structures, the other is to use generalized unary quantifiers. 
We shall mostly concentrate on counting quantifiers, as unary quantifiers can be simulated with them. We shall see a very powerful counting logic, expressing arbitrary properties of cardinalities, and yet we show that this logic is local. We also address the problem of complexity of some of the counting extensions of FO. 8.1 Counting and Unary Quantifiers Suppose we want to find an extension of FO capable of expressing the parity query: if U is a unary predicate in the vocabulary σ, and A ∈ STRUCT[σ], is |UA | even? How can one do it? One approach is to add enough expressiveness to the logic to find cardinalities of some sets: for example, sets definable by other formulae. Thus, if we have a formula ϕ(x), we want to find the cardinality of ϕ(A) = {a | A |= ϕ(a)}. The problem is that | ϕ(A) | is a number, and hence the logic must be adequately equipped to deal with numbers. To be able to use |ϕ(A)|, we introduce counting quantifiers: ∃ix ϕ(x) 142 8 Logics with Counting is a formula with a new free variable i, which states that there are at least i elements a of A such that ϕ(a) holds. The variable i must range over some numerical domain (which, as we shall see, is different for different counting logics). On that numerical domain, we should have some arithmetic operations available (e.g., addition and multiplication), as well as quantification over it, so that sentences in the logic could be formed. Without yet giving a formal definition of the logic that extends FO with counting quantifiers, we show, as an example, how parity is definable in it: ∃i∃j (i = j + j) ∧ ∃ixϕ(x) ∧ ∀k (k > i) → ¬∃kx ϕ(x) . This sentence says that we can find an even number i (since it is of the form 2j) such that exactly i elements satisfy ϕ(x): that is, at least i elements satisfy ϕ, and for every k > i, we cannot find k elements that satisfy ϕ. Note that we really have two different kinds of variables: variables that range over the domain of A, and variables that range over some numerical domain. Such a logic is called two-sorted. Formally, a structure for such a logic has two universes: one is the non-numerical universe (we shall normally refer to it as first-sort universe) and the numerical, second-sort universe. We now give the formal definition of the logic FO(Cnt). Definition 8.1 (FO with counting). Given a vocabulary σ, a σ-structure for FO with counting, FO(Cnt), is a structure of the form {a0, . . . , an−1}, {0, . . ., n − 1}, (Ri)A , +, ×, min, max where {a0, . . . , an−1}, (Ri)A is a structure from STRUCT[σ] (Ri ranges over the symbols in σ), + and × are ternary relations {(i, j, k) | i + j = k} and {(i, j, k) | i·j = k} on {0, . . . , n−1}, min denotes 0 and max denotes n−1. We shall assume that the universes {a0, . . . , an−1} and {0, . . . , n−1} are disjoint. Formulae of FO(Cnt) can have free variables of two sorts, ranging over the two universes. We normally use i, j, k, ı,  for second-sort variables. FO(Cnt) extends the definition of FO by the following rules: • min, max are terms of the second sort. Also, every second-sort variable i is a term of the second sort. • If t1, t2, t3 are terms of the second sort, then +(t1, t2, t3) and ×(t1, t2, t3) are formulae (which we shall normally write as t1+t2 = t3 and t1·t2 = t3). • If ϕ(x, ı) is a formula, then ∃i ϕ(x, ı) is a formula. The quantifier ∃i binds the second-sort variable i. • If ϕ(y, x, ı) is a formula, then ψ(x, i, ı) ≡ ∃iyϕ(y, x, ı) is a formula. The quantifier ∃iy binds the first-sort variable y but not the second-sort variable i. 
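Before the semantics of the last rule is spelled out below, the intended reading of counting quantifiers can be made concrete with a small sketch in Python (an illustration of ours, not part of the text; the function names are invented). It evaluates the parity sentence given at the beginning of this section by letting the second-sort variables range over {0, . . . , n − 1} and counting directly.

```python
def holds_at_least(A, phi, i):
    """The counting quantifier  E^i x . phi(x):  at least i elements of A satisfy phi."""
    return sum(1 for a in A if phi(a)) >= i

def parity_sentence(A, phi):
    """The FO(Cnt) parity sentence above, with the second-sort variables i, j, k
    ranging over the numeric universe {0, ..., |A|-1}."""
    numbers = range(len(A))
    return any(
        i == j + j                                  # i is even
        and holds_at_least(A, phi, i)               # at least i elements satisfy phi
        and all(not holds_at_least(A, phi, k)       # ... and not more than i of them
                for k in numbers if k > i)
        for i in numbers for j in numbers
    )

# A six-element universe with one unary predicate U.
A = range(6)
U = {0, 2, 5}                                       # |U| = 3, odd
assert not parity_sentence(A, lambda x: x in U)
assert parity_sentence(A, lambda x: x in U | {4})   # four elements satisfy U(x), even
```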
8.1 Counting and Unary Quantifiers 143 For the semantics of this logic, only the last item needs explanation. Suppose we have a structure A, and we fix an interpretation a for x (from {a0, . . . , an−1}), ı0 for i, and i0 for i (from {0, . . . , n − 1}). Then A |= ψ(a, i0, ı0) iff |{b ∈ {a0, . . . , an−1} | A |= ϕ(b, a, ı0)}| ≥ i0. If we have a σ-structure A, there is a two-sorted structure A′ naturally associated with A. Assuming A = {a0, . . . , an−1}, we let the numerical domain of A′ be {0, . . . , n − 1}, with min and max interpreted as 0 and n − 1, and + and × getting their usual interpretations. Hence, for A ∈ STRUCT[σ], we shall write A |= ϕ whenever ϕ is an FO(Cnt) formula, instead of the more formal A′ |= ϕ. Let us see a few examples of definability in FO(Cnt). First, the usual linear ordering on numbers is definable: i ≤ j iff ∃k (i + k = j). Note that this does not imply definability of ordering on the first-sort universe; in fact we shall see that with such an ordering, FO(Cnt) is more powerful than FO(Cnt) on unordered first-sort structures (similarly to the case of FO, shown in Theorem 5.3, and MSO, shown in Corollary 7.13). We can define a formula ∃!ixϕ(x, · · · ) saying that there are exactly i elements satisfying ϕ: ∃!ixϕ(x, · · · ) ≡ ∃ixϕ(x, · · · ) ∧ ∀k (k > i) → ¬∃kxϕ(x, · · · ) . We can also compare cardinalities of two sets. Suppose we have two formulae ϕ(x) and ψ(x); to test if |ϕ(A)|>|ψ(A)|, one could write ∃i ∃ixϕ(x) ∧ ¬∃ixψ(x) . One can also write a formula for the majority predicate MAJ(ϕ, ψ) testing if the set ϕ(A) contains at least half of the set ψ(A): ∃i∃j (∃!ix(ϕ(x) ∧ ψ(x))) ∧ (∃!jxψ(x)) ∧ (i + i ≥ j) . Note that the definition of FO(Cnt) allows us to use formulae of the form t1(ı) {=, >, ≥} t2(ı), where t1 and t2 are terms. For example, (i + i ≥ j) is ∃k (k = i + i ∧ k ≥ j). We now present another way of adding counting power to FO that does not involve two-sorted structures. Suppose we want to state that | ϕ(A) | is even. We define a new quantifier, Qeven, that binds one variable, and write Qevenx ϕ(x). In fact, more generally, for a formula with several free variables ϕ(x, y), we can construct a new formula Qevenx ϕ(x, y), with free variables y. Its semantics is defined as follows. If a is the interpretation for y, then A |= Qevenx ϕ(x, a) ⇔ |{b | A |= ϕ(b, a)}| mod 2 = 0. 144 8 Logics with Counting Using the same approach, we can do cardinality comparisons. For example, let QH be a quantifier that binds two variables; then for two formulae ϕ1(x, y) and ϕ2(z, y), we have a new formula ψ(y) ≡ QHx, z (ϕ1(x, y), ϕ2(z, y)) such that A |= ψ(a) ⇔ |{b | A |= ϕ1(b, a)}| = |{b | A |= ϕ2(b, a)}|. The quantifier QH is known as the H¨artig, or equicardinality, quantifier. Another example is the Rescher quantifier QR. The formation rule is the same as for the H¨artig quantifier, and A |= QRx, z (ϕ1(x, a), ϕ2(z, a)) ⇔ |{b | A |= ϕ1(b, a)}| > |{b | A |= ϕ2(b, a)}|. What is common to these definitions? In all the cases, we construct sets of the form ϕ(A, a) = {b ∈ A | A |= ϕ(b, a)} ⊆ A, and then make some cardinality statements about those sets. This idea admits a nice generalization. Definition 8.2 (Unary quantifiers). Let σu k be a vocabulary of k unary relation symbols U1, . . . , Uk, and let K ⊆ STRUCT[σu k ] be a class of structures closed under isomorphisms. Then QK is a unary quantifier and FO(QK) extends the set of formulae of FO with the following additional rule: if ψ1(x1, y1), . . . , ψk(xk, yk) are formulae, then QKx1 . . . xk(ψ1(x1, y1), . . . 
, ψk(xk, yk)) is a formula. (8.1) Here QK binds xi in the ith formula, for each i = 1, . . . , k. A free occurrence of a variable y in ψi(xi, yi) remains free in this new formula unless y = xi. The semantics of QK is defined as follows: A |= QKx1 . . . xk(ψ1(x1, a1), . . . , ψk(xk, ak)) ⇔ A, ψ1(A, a1), . . . , ψk(A, ak) ∈ K. (8.2) In this definition, ai is a tuple of parameters that gives the interpretation for those free variables of ψi(xi, yi) which are not equal to xi. If Q is a set of unary quantifiers, then FO(Q) is the extension of FO with the formation rule above for each QK ∈ Q. The quantifier rank of formulae with unary quantifiers is defined by the additional rule: qr(QKx1, . . . , xk(ψ1(x1, y1), . . . , ψk(xk, yk))) = max{qr(ψi(xi, yi)) | i ≤ k} + 1. (8.3) The three examples seen earlier are all unary quantifiers: for Qeven, the class K consists of structures A, U such that |U | is even; for QH, it consists of structures A, U1, U2 with |U1 |=|U2 |, and for QR, it consists of structures A, U1, U2 with | U1 |>| U2 |. Note that the usual quantifiers ∃ and ∀ are 8.2 An Infinitary Counting Logic 145 examples of unary quantifiers too: the classes of structures corresponding to them consist of A, U with U = ∅ and U = A, respectively. We shall see that the two ways of adding counting power to a logic – by means of counting quantifiers, or unary quantifiers – are essentially equivalent in their expressiveness. Formulae with counting quantifiers tend to be easier to understand, but the logic becomes two-sorted. Unary quantifiers, on the other hand, let us keep the logic one-sorted, but then a new quantifier has to be introduced for each counting property we wish to express. 8.2 An Infinitary Counting Logic The goal of this section is to introduce a very powerful counting logic: so powerful, in fact, that it can express arbitrary properties of cardinalities, even nonrecursive ones. Yet we shall see that this logic cannot address another limitation of FO, namely, expressing iterative computations. We shall later see another logic that expresses very powerful forms of iteration, and yet is unable to count. Both of these logics are based on the idea of expanding FO with infinitary connectives. Definition 8.3 (Infinitary connectives and L∞ω). The logic L∞ω is defined as an extension of FO with infinitary connectives and : if ϕi’s are formulae, for i ∈ I, where I is not necessarily finite, and the free variables of all the ϕi’s are among x, then i∈I ϕi and i∈I ϕi are formulae. Their free variables are those variables in x that occur freely in one of the ϕ’s. The semantics is defined as follows: A |= i∈I ϕi(a) if for some i ∈ I, it is the case that A |= ϕi(a), and A |= i∈I ϕ(a) if A |= ϕi(a) for all i ∈ I. This logic per se is too powerful to be of interest in finite model theory, in view of the following. Proposition 8.4. Let C be a class of finite structures closed under isomorphism. Then there is an L∞ω sentence ΦC such that A ∈ C iff A |= ΦC. Proof. Recall that by Lemma 3.4, for every finite B, there is a sentence ΦB such that A |= ΦB iff A ∼= B. Hence we take ΦC to be B∈C ΦB. Clearly, A |= ΦC iff A ∈ C. 146 8 Logics with Counting However, we can make logics with infinitary connectives useful by putting some restrictions on them. Our goal now is to define a two-sorted counting logic L∗ ∞ω(Cnt). We do it in two stages: first, we extend L∞ω with some counting features, and second, we impose restrictions that make the logic suitable in the finite model theory context. 
The structures for this logic are two-sorted, but the second sort is no longer interpreted as an initial segment of the natural numbers: now it is the whole set N. Furthermore, there is a constant symbol for each k ∈ N (which we also denote by k). Hence, a structure is of the form {a1, . . . , an}, N, (RA i ), {k}k∈N , (8.4) where again {a1, . . . , an}, (RA i ) is a finite σ-structure, and Ri’s range over symbols in σ. We now define L∞ω(Cnt), an extremely powerful two-sorted logic, that extends infinitary logic L∞ω. Its structures are two-sorted structures (8.4), and the logic extends L∞ω by the following rules: • Each variable or constant of the second sort is a term of the second sort. • If ϕ is a formula and x is a tuple of free first-sort variables in ϕ, then #x.ϕ is a term of the second sort, and its free variables are those in ϕ except x. The interpretation of this term is the number of tuples a over the finite first-sort universe that satisfy ϕ. That is, given a structure A with the first-sort universe A, a formula ϕ(x, y, ı) and the interpretations b and ı0 for y and ı, respectively, the value of the term #x.ϕ(x, b, ı0) is |{a ∈ A|x| | A |= ϕ(a, b, ı0)}| . • Counting quantifiers ∃ixϕ, with the same semantics as before, except that i could be an arbitrary natural number. The logic L∞ω(Cnt) is enormously powerful: it can define not only every property of finite models (since it contains L∞ω), but also every predicate or function on N. That is, P ⊆ Nk is definable by ϕP (i1, . . . , ik) = (n1,...,nk)∈P (i1 = n1) ∧ . . . ∧ (ik = nk) . (8.5) Note that the definition is also redundant: for example, ∃ix ϕ can be replaced by #x.ϕ ≥ i. However, we need counting quantifiers separately, as will become clear soon. Next, we restrict the logic by defining the rank of a formula, rk(ϕ). Its definition is similar to that of quantifier rank, but there is one important difference. In a two-sorted logic, we may have quantification over two different universes. In the definition of the rank, we disregard quantification over N. Thus, rk(ϕ) and rk(t), where t is a term, are defined inductively as follows: 8.2 An Infinitary Counting Logic 147 • rk(t) = 0 if t is a variable, or a term k for k ∈ N. • rk(ϕ) = 0 if ϕ is an atomic formula of vocabulary σ (i.e., an atomic firstsort formula). • rk(t1 = t2) = max{rk(t1), rk(t2)}, where t1 and t2 are terms. • rk(¬ϕ) = rk(ϕ). • rk(#x.ϕ) = rk(ϕ)+ |x|. • rk( ϕj) = rk( ϕj) = supj rk(ϕj). • rk(∀x ϕ) = rk(∃x ϕ) = rk(∃ix ϕ) = rk(ϕ) + 1. • rk(∀i ϕ) = rk(∃i ϕ) = rk(ϕ). Note that if ϕ is an FO formula, then rk(ϕ) = qr(ϕ). Definition 8.5. L∗ ∞ω(Cnt) is defined as the restriction of L∞ω(Cnt) to formulae and terms that have finite rank. This logic is clearly closed under the Boolean connectives and both firstand second-sort quantification. It is not closed under infinitary connectives: for example, if Φi, i > 0, are L∗ ∞ω(Cnt) sentences such that rk(Φi) = i, then i Φi is not an L∗ ∞ω(Cnt) sentence. Note also that (8.5) implies that every subset of Nk , k > 0, is definable by an L∗ ∞ω(Cnt) formula of rank 0. Thus, we assume that +, ·, −, ≤, and in fact every predicate on natural numbers is available. To give an example, we can express properties like: there is a node in the graph whose in-degree i and out-degree j satisfy p2 i > pj where pi stands for the ith prime. This is done by ∃x∃i∃j (i = #y.E(y, x))∧(j = #y.E(x, y))∧P(i, j), where P is the predicate on N for the property p2 i > pj. Known expansions of FO with counting properties are contained in L∗ ∞ω(Cnt). 
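As a concrete check of the example above, the following sketch evaluates the property "some node has in-degree i and out-degree j with p_i² > p_j" directly: the counting terms #y.E(y, x) and #y.E(x, y) are just in-degrees and out-degrees, and the arbitrary numerical predicate P is applied to their values. The prime-indexing convention and all names below are our own choices, not the book's.

```python
# A sketch of evaluating  ∃x ∃i ∃j (i = #y.E(y,x)) ∧ (j = #y.E(x,y)) ∧ P(i,j),
# where P(i,j) says p_i^2 > p_j for p_i the i-th prime.  The predicate P on N
# is available "for free" in L*_{∞ω}(Cnt); here we simply compute it.

def nth_prime(i):
    """The i-th prime with p_1 = 2 (the indexing convention is ours)."""
    count, n = 0, 1
    while count < i:
        n += 1
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return n

def has_prime_degree_witness(universe, edges):
    for x in universe:
        i = sum(1 for y in universe if (y, x) in edges)   # value of #y.E(y,x)
        j = sum(1 for y in universe if (x, y) in edges)   # value of #y.E(x,y)
        if i >= 1 and j >= 1 and nth_prime(i) ** 2 > nth_prime(j):
            return True     # P(i, j) holds for this witness x
    return False

if __name__ == "__main__":
    V = range(4)
    E = {(0, 1), (2, 1), (3, 1), (1, 0)}   # node 1: in-degree 3, out-degree 1
    print(has_prime_degree_witness(V, E))  # p_3 = 5, p_1 = 2, 25 > 2 -> True
```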
Proposition 8.6. For every FO, FO(Cnt), or FO(Q) formula, where Q is a collection of unary quantifiers, there exists an equivalent L∗ ∞ω(Cnt) formula of the same rank. Proof. The proof is trivial for FO and FO(Cnt). For FO(Q), assume we have a formula ψ(y1, . . . , yk) ≡ QKx1 . . . xk.(ψ1(x1, y1), . . . , ψk(xk, yk)), (8.6) where K is a class of σu k -structures A = A, U1, . . . , Uk closed under isomorphism. Let Π be the set of all 2k mapping π : {1, . . . , k} → {0, 1}, and for a structure A ∈ K, let π(A) = i:π(i)=1 UA i ∩ j:π(j)=0 (A − UA j ) . 148 8 Logics with Counting With each structure A, we then associate a tuple Π(A) = (π(A))π∈Π, with π’s ordered lexicographically. Since K is a class of unary structures closed under isomorphism, A ∈ K and Π(A) = Π(B) imply B ∈ K. This provides a translation of (8.6) into L∗ ∞ω(Cnt) as follows. Let PK(n0, . . . , n2k−1) be the predicate on N that holds iff (n0, . . . , n2k−1) is of the form Π(A) for some A ∈ K. Then (8.6) translates into PK #x.ψπ0 (x, y1, . . . , yk), . . . , #x.ψπ2k−1 (x, y1, . . . , yk) , (8.7) where π0, . . . , π2k−1 is the enumeration of Π in the lexicographic ordering, and ψπ(x, y1, . . . , yk) = i:π(i)=1 ψi(x, yi) ∧ j:π(j)=0 ¬ψi(x, yi). Thus, if b1, . . . , bk interpret y1, . . . , yk, respectively, in a structure B, then the value of #x.ψπ(x, b1, . . . , bk) in B is precisely π B, ψ1(B, b1), . . . , ψk(B, bk) . Therefore, (8.7) holds for b1, . . . , bk in B iff the σu k -structure B, ψ1(B, b1), . . . , ψk(B, bk) is in K. This proves the equivalence of (8.6) and (8.7). Finally, since PK is a numerical predicate, it has rank 0, and hence the rank of (8.7) is max{rk(ψ1), . . . , rk(ψk)} + 1 = rk(ψ), which proves the proposition. In general, L∗ ∞ω(Cnt) can be viewed as an extremely powerful counting logic: we can define arbitrary cardinalities of sets of tuples over a structure, and on those, we can use arbitrary numerical predicates. Compared to L∗ ∞ω(Cnt), a logic such as FO(Cnt) restricts us in what sort of cardinalities we can define (only those of sets given by formulae in one free variable), and what operations we can use on those cardinalities (those definable with addition and multiplication). We now introduce what seems to be a drastic simplification of L∗ ∞ω(Cnt). Definition 8.7. The logic L◦ ∞ω(Cnt) is defined as L∗ ∞ω(Cnt) where counting terms #x.ϕ and quantification over N are not allowed. On the surface, L◦ ∞ω(Cnt) is a lot simpler than L∗ ∞ω(Cnt), mainly because counting terms for vectors, #x.ϕ, are very convenient for defining complex counting properties. But it turns out that the power of L◦ ∞ω(Cnt) and L∗ ∞ω(Cnt) is identical. Proposition 8.8. There is a translation ϕ → ϕ◦ of L∗ ∞ω(Cnt) formulae into L◦ ∞ω(Cnt) formulae such that ϕ and ϕ◦ are equivalent and rk(ϕ) = rk(ϕ◦ ). 8.2 An Infinitary Counting Logic 149 Proof. It is easy to eliminate quantifiers over N without increasing the rank: ∃i ϕ(i, · · · ) and ∀i ϕ(i, · · · ) are equivalent to k∈N ϕ(k, · · · ) and k∈N ϕ(k, · · · ), respectively. Thus, in the formulae below, we shall be using such quantifiers, assuming that they are eliminated in the last step of the translation from L∗ ∞ω(Cnt) to L◦ ∞ω(Cnt). To eliminate counting terms, assume without loss of generality that every occurrence of #x.ϕ is of the form #x.ϕ = #y.ψ or #x.ϕ = i, where i is a variable or a constant (if #x.ϕ occurs inside an arithmetic predicate P, we replace P by its explicit definition, using infinitary connectives). 
Since #x.ϕ = #y.ψ is equivalent to ∃i (#x.ϕ = i) ∧ (#y.ψ = i), whose rank is the same as the rank of #x.ϕ = #y.ψ, and #x.ϕ = k, for a constant k, is equivalent to ∃i (#x.ϕ = i ∧ i = k), we may assume that all occurrences of #-terms are of the form #x.ϕ = i, where i is a second-sort variable. The proof is now by induction on the formula. The only nontrivial case is ψ(y, ) ≡ (#x.ϕ(x, y, ) = i). Throughout this proof, we assume that i is in . By the hypothesis, there exists an L◦ ∞ω(Cnt) formula ϕ◦ which is equivalent to ϕ and has the same rank. We must now produce an L◦ ∞ω(Cnt) formula ψ◦ equivalent to ψ such that rk(ψ◦ ) = rk(ϕ)+ | x |. The existence of such a formula will follow from the lemma below. Lemma 8.9. Let ϕ(x, y, ) be an L◦ ∞ω(Cnt) formula. Then there exists an L◦ ∞ω(Cnt) formula γ(y, ) of rank rk(ϕ) + |x| such that γ is equivalent to #x.ϕ = i. Proof of the lemma is by induction on | x |. If x has a single component x, γ(y, ) is defined as ∃l (l = i) ∧ (∃!lx ϕ(x, y, )) , which has rank rk(ϕ) + 1. The quantifier ∃l denotes an infinite disjunction, as explained earlier. We next assume that x = zx0. By the hypothesis, there is an L◦ ∞ω(Cnt) formula α(x0, y, , l) equivalent to (l = #z.ϕ(z, x0, y, )) such that rk(α) = rk(ϕ)+ |z|. We define β(y, , k, l) ≡ ∃!kx0 α(x0, y, , l). Then rk(β) = rk(α) + 1 = rk(ϕ)+ |x|. The formula β(y, , k, l) holds iff there exist exactly k elements x0 such that the number of vectors x with x0 in the last position that satisfy ϕ(x, · · · ) is precisely l. Note that if β(y, , k, l) and β(y, , k′ , l) hold, then k′ must equal k. Thus, to check if #x.ϕ = i, one must check if 150 8 Logics with Counting β(··· ,k,l) holds (k · l) = i. This is done as follows. Let γp(y, ) be defined as: ∃i1 . . . ip∃j1 . . . jp          p s=1 β(y, , is, js) ∧ ∀i, j β(y, , i, j) → p s=1 (i = is ∧ j = js) ∧ s=s′ (¬(is = is′ ) ∨ ¬(js = js′ )) ∧ i1 · j1 + . . . + ip · jp = i          That is, γp says that there are precisely p pairs (is, js) that satisfy β(y, , k, l), and p s=1 is · js = i. When p = 0, we define γp(y, ) as (i = 0) ∧ ∀i′ , j′ (¬β(y, , i′ , j′ )). We can see that rk(γp) = rk(β). We finally define γ(y, ) ≡ p∈N γp(y, ). It follows that γ is an L◦ ∞ω(Cnt) formula of rank that is equal to rk(β), and hence to rk(ϕ)+ | x |, and that γ is equivalent to #x.ϕ = i. This completes the proof of the lemma and the proposition. We next consider L∗ ∞ω(Cnt)+ <; that is, L∗ ∞ω(Cnt) over ordered structures. We shall see in the next section that, as for FO, there is a separation L∗ ∞ω(Cnt) (L∗ ∞ω(Cnt)+ <)inv. As the first step, we show that L∗ ∞ω(Cnt)+ < defines every property of finite structures. Intuitively, with <, one can say that a given element of A is the first, second, etc., element of A. Then the unlimited counting power allows us to code finite structures with numbers. Proposition 8.10. Every property of finite ordered structures is definable in L∗ ∞ω(Cnt). Proof. We show this for sentences in the language of graphs. Let C be a class of ordered graphs. We assume without loss of generality that the set of nodes of each such graph is a set of the form {0, . . . , n}. Then the membership in C is tested by the following L∗ ∞ω(Cnt) sentence of rank 3: G∈C ∀x∀y E(x, y) ↔ (k,l)∈EG k = #z.(z < x) ∧ l = #z.(z < y) , where EG stands for the set of edges of G. 8.3 Games for L∗ ∞ω(Cnt) 151 We finish this section by presenting a one-sorted version of L∗ ∞ω(Cnt) that has the same expressiveness. 
This logic is obtained by adding infinitary connectives and unary quantifiers to FO. Let QAll be the collection of all unary quantifiers; that is, all quantifiers QK where K ranges over all collections of unary structures closed under isomorphism. We define a logic L∞ω(QAll) by extending L∞ω with the formation rules (8.1) for each QK ∈ QAll, with the semantics given by (8.2), and quantifier rank defined as in (8.3). We then define L∗ ∞ω(QAll) as the restriction of L∞ω(QAll) to formulae of finite quantifier rank. This logic turns out to express the same sentences as L∗ ∞ω(Cnt). The proof of the proposition below is left as an exercise for the reader. Proposition 8.11. For every L∗ ∞ω(Cnt) formula ϕ(x) without free secondsort variables, there is an equivalent L∗ ∞ω(QAll) formula ψ(x) such that rk(ϕ) = qr(ψ), and conversely, for every L∗ ∞ω(QAll) formula ψ(x), there is an equivalent L∗ ∞ω(Cnt) formula ϕ(x) with rk(ϕ) = qr(ψ). 8.3 Games for L∗ ∞ω(Cnt) We know that the expressive power of FO can be characterized via Ehrenfeucht-Fra¨ıss´e games. Is there a similar game characterization for L∗ ∞ω(Cnt)? We give a positive answer to this question, by showing that bijective games, introduced in Sect. 4.5, capture the expressiveness of L∗ ∞ω(Cnt). We first review the definition of the game. Definition 8.12 (Bijective games). A bijective Ehrenfeucht-Fra¨ıss´e game is played by two players, the spoiler and the duplicator, on two structures A, B ∈ STRUCT[σ]. If | A |=| B |, the spoiler wins the game. If | A |=| B |, in each round i = 1, . . . , n, the duplicator selects a bijection fi : A → B, and the spoiler selects a point ai ∈ A. The duplicator responds by bi = f(ai) ∈ B. The duplicator wins the n-round game if the relation {(ai, bi) | 1 ≤ i ≤ n} is a partial isomorphism between A and B. If the duplicator has a winning strategy in the n-round bijective game on A and B, we write A ≡bij n B. Note that it is harder for the duplicator to win the bijective game. First, if |A|=|B |, the duplicator immediately loses the game. Even if |A|=|B |, in each round the duplicator must figure out what his response to each possible move by the spoiler is, before the move is made, and there must be a one-to-one correspondence between the spoiler’s moves and the duplicator’s responses. In particular, any strategy where the same element b ∈ B could be used as a response to several moves by the spoiler is disallowed. Theorem 8.13. Given two structures A, B ∈ STRUCT[σ], and k ≥ 0, the following are equivalent: 1. A ≡bij k B; 152 8 Logics with Counting 2. A and B agree on all L∗ ∞ω(Cnt) sentences of rank k. Proof. Both implications 1 → 2 and 2 → 1 are proved by induction on k. We start with the easier implication 1 → 2. By Proposition 8.8, assume that there is no quantification over the numerical domain, and that all quantifiers are of the form ∃ix. For the base case k = 0, the proof is the same as in the case of Ehrenfeucht-Fra¨ıss´e games. We now assume that the implication holds for k, and we prove it for k +1. Suppose A ≡bij k+1 B. First consider a sentence of the form Φ ≡ ∃nxϕ(x) for a constant n ∈ N. Suppose A |= Φ, and let c1, . . . , cn be distinct elements of A such that A |= ϕ(ci), i = 1, . . . , n. Since A ≡bij k+1 B, there is a bijection f : A → B such that (A, a) ≡bij k (B, f(a)) for all a ∈ A; in particular, (A, ci) ≡bij k (B, f(ci)) for all i ≤ n. By the hypothesis, (A, ci) and (B, f(ci)) agree on sentences of rank k; hence A |= ϕ(ci) implies B |= ϕ(f(ci)). 
Since f is a bijection, all f(ci)’s are distinct, and thus B |= ∃nxϕ(x). The converse, that B |= Φ implies A |= Φ, is proved in exactly the same way, using the bijection f−1 . Since every sentence of rank k + 1 can be obtained from sentences of the form ∃nxϕ(x) by using the Boolean and infinitary connectives, we see that A |= Φ ⇔ B |= Φ for any rank k + 1 sentence Φ. For the other direction, we use a proof similar to the proof of the Ehrenfeucht-Fra¨ıss´e theorem given in Exercise 3.11. We want to define explicitly formulae specifying rank-k types in L∗ ∞ω(Cnt). The number of types can be infinite, but this is not a problem since we can use infinitary connectives, and rank-k types will be given by formulae of rank k. We let ϕ0,m i (x) be an enumeration of all the formulae that define distinct atomic types of x with |x|= m; that is, all consistent conjunctions of the form α1(x) ∧ . . . ∧ αM (x), where αi(x) enumerate all (finitely many) atomic and negated atomic formulae in x. Next, inductively, let {ϕk+1,m i (x) | i ∈ N} be an enumeration of all the formulae of the form ∃!l1 y ϕk,m+1 i1 (x, y)∧. . .∧∃!lp y ϕk,m+1 ip (x, y) ∧ ∀y p j=1 ϕk,m+1 ij (x, y) , (8.8) as p ranges over N and (l1, . . . , lp) ranges over p-tuples of positive integers. Intuitively, each ϕk,m+1 ij (x, y) defines the rank-k m + 1-type of a tuple (x, y). Hence rank-k + 1 types of the form (8.8) say that a given x can be extended to p different rank-k types in such a way that for each ij, there are precisely lj elements y such that ϕk,m+1 ij (x, y) defines the ijth rank-k of the tuple (x, y). Note that if the formula (8.8) is true in (A, a), then |A|= l1 + . . . + lp. 8.4 Counting and Locality 153 It follows immediately from the definition of formulae ϕk,m i that for every A, a ∈ Am , and every k ≥ 0, there is exactly one ϕk,m i such that A |= ϕk,m i (a). Next, we prove the following lemma by induction on k. Lemma 8.14. For every m, every two structures A, B, and every a ∈ Am , b ∈ Bm , suppose there is a formula ϕk,m i (x) such that A |= ϕk,m i (a) and B |= ϕk,m i (b). Then (A, a) ≡bij k (B, b). Proof of the lemma. The case k = 0 is the same as in the proof of the Ehrenfeucht-Fra¨ıss´e theorem. For the induction step, assume that the statement holds for k, and let ϕk+1,m i (x) be given by (8.8). If A |= ϕk+1,m i (a) and B |= ϕk+1,m i (b), then both A and B have exactly l1 + . . . + lp elements. Furthermore, for each j ≤ p, let Aj = {a ∈ A | A |= ϕk,m+1 ij (aa)} and Bj = {b ∈ B | B |= ϕk,m+1 ij (bb)}. Then | Aj |=| Bj |= lj, and hence there exists a bijection f : A → B that maps each Aj to Bj. For any a ∈ A, if j is such that A |= ϕk,m+1 ij (aa), then B |= ϕk,m+1 ij (bf(a)), and hence by the induction hypothesis, (A, aa) ≡bij k (B, bf(a)). Thus, the bijection f proves that (A, a) ≡bij k+1 (B, b). The implication 2 → 1 of Theorem 8.13 is now a special case of Lemma 8.14, since rk(ϕk,m i ) = k. 8.4 Counting and Locality Theorem 8.13 and Corollary 4.21 stating that (A, a) ⇆(3k−1)/2 (B, b) implies (A, a) ≡bij k (B, b), immediately give us the following result. Theorem 8.15. Every L∗ ∞ω(Cnt) formula ϕ(x) without free second-sort variables is Hanf-local (and hence Gaifman-local, and has the BNDP). Thus, despite its enormous counting power, L∗ ∞ω(Cnt) remains local, and cannot express properties such as graph connectivity. Combining Theorem 8.15 and Proposition 8.6, we obtain the following. Corollary 8.16. 
If ϕ(x) is an FO(Cnt) formula without free second-sort variables, or an FO(Q) formula, where Q is an arbitrary collection of unary quantifiers, then ϕ(x) is Hanf-local (and hence Gaifman-local, and has the BNDP). Furthermore, we obtain the separation L∗ ∞ω(Cnt) (L∗ ∞ω(Cnt)+ <)inv, (8.9) since (L∗ ∞ω(Cnt)+ <) expresses every property of ordered structures (including nonlocal ones, such as graph connectivity), by Proposition 8.10. 154 8 Logics with Counting Theorem 8.15 says nothing about formulae that may have free numerical variables. Next, we show how to extend the notions of Hanf- and Gaifmanlocality to such formulae. Definition 8.17. An L∗ ∞ω(Cnt) formula ϕ(x, ı) is Hanf-local if there exists d ≥ 0 such that for all ı0 ∈ N|ı| , any two structures A, B, and a ∈ A|x| , b ∈ B|x| , (A, a)⇆d(B, b) implies A |= ϕ(a, ı0) ⇔ B |= ϕ(b, ı0) . Furthermore, ϕ(x, i) is Gaifman-local if there is d ≥ 0, such that for all ı0 ∈ N|ı| , every structure A, and a1, a2 ∈ A|x| , a1 ≈A d a2 implies A |= ϕ(a1, ı0) ↔ ϕ(a2, ı0). The locality rank lr(·) and the Hanf-locality rank hlr(·) are defined as before: these are the smallest d that witnesses Gaifman-locality (Hanf-locality, respectively) of a formula. In other words, the formula must be Hanf-local or Gaifman-local for any instantiation of its free second-sort variables, with the locality rank being uniformly bounded for all such instantiations. A simple extension of Theorem 4.11 shows: Proposition 8.18. If an L∗ ∞ω(Cnt) formula ϕ(x, ı) is Hanf-local, then it is Gaifman-local. Furthermore, we can show Hanf-locality of all L∗ ∞ω(Cnt) formulae (not just those without free numerical variables) by using essentially the same argument as in Theorem 4.12. Theorem 8.19. Every L∗ ∞ω(Cnt) formula ϕ(x, ı) is Hanf-local, and hence Gaifman-local. Furthermore, hlr(ϕ) ≤ (3k − 1)/2, and lr(ϕ) ≤ (3k+1 − 1)/2, where k = rk(ϕ). Proof. We give the proof for Hanf-locality; it is by induction on the structure of the formulae. For atomic formulae and Boolean connectives, it is the same as the proof of Theorem 4.12. For infinitary connectives, the argument is the same as for ∧ and ∨. By Proposition 8.8, the only remaining case is that of counting quantifiers: ϕ(x, ı) ≡ ∃jy ψ(y, x, ı). We assume j is in ı. Let rk(ψ) = k, so that rk(ϕ) = k + 1. Let d = hlr(ψ). It suffices to show that hlr(ϕ) ≤ 3d + 1. Fix an interpretation ı0 for ı (and j0 for j). Assume (A, a)⇆3d+1(B, b). By Corollary 4.10, there is a bijection f : A → B such that (A, ac) ⇆d (B, bf(c)) for every c ∈ A. Assume A |= ϕ(a, ı); then we can find c1, . . . , cj0 such that A |= ψ(cl, a, ı), l = 1, . . . , j0. Since hlr(ψ) = d, by the hypothesis, (A, acl) ⇆d (B, bf(cl)) implies B |= ψ(f(cl), b, ı), l = 1, . . . , j0. Thus, B |= ϕ(b, ı), since f is a bijection. The converse, that B |= ϕ(b, ı) implies A |= ϕ(a, ı), is identical. This proves hlr(ϕ) ≤ 3d + 1. 8.5 Complexity of Counting Quantifiers 155 8.5 Complexity of Counting Quantifiers In this section we revisit the logic FO(Cnt), and give a circuit model that corresponds to it. This circuit model defines a complexity class that extends AC0 ; the class is called TC0 , where TC stands for threshold circuits. There are different ways of defining the class TC0 ; the one chosen here uses majority circuits, which have special gates for the majority function. Definition 8.20. Majority circuits are defined as the usual Boolean circuits except that they have additional majority gates. Such a gate has 2k inputs, x1, . . . , xn, y1, . . . , yn, for k > 0. 
The output of the gate is 1 if n i=1 xi ≥ n i=1 yi, and 0 otherwise. A circuit family C has one circuit Cn for each n, where n is the number of inputs. The size, the depth, and the language accepted by C, are defined in exactly the same way as for Boolean circuits. The class nonuniform TC0 is defined as the class of languages (subsets of {0, 1}∗ ) accepted by polynomialsize constant-depth families of majority circuits. We now extend FO(Cnt) to a logic FO(Cnt)All. This logic, in addition to FO(Cnt), has the linear ordering < on the non-numerical universe, and, furthermore, the restriction of every predicate P ⊆ Nk to the numerical universe {0, . . . , n − 1}; that is, P ∩ {0, . . . , n − 1}k . Theorem 8.21. The class of structures definable by an FO(Cnt)All sentence is in nonuniform TC0 . Consequently, the data complexity of FO(Cnt)All is nonuniform TC0 . Proof. As in the proof of Theorem 6.4, we code formulae by circuits. We first note that if a linear order is available on the non-numerical universe A, there is no need for the numerical universe {0, . . ., n − 1}, where n =| A |, since we can interpret min, max, <, and the arithmetic operations directly on A, associating the ith element of A in the ordering < with i ∈ N. Thus, counting quantifiers will be assumed to be of the form ∃yxϕ(x, · · · ), stating that there exist at least i elements x satisfying ϕ, where y is the ith element of A in the ordering <. Recall that for each structure A with |A| = n, its encoding enc(A) starts with 0n 1 that represents the size of the universe. For each formula ϕ(x1, . . . , xm), and each tuple b = (b1, . . . , bm) in Am , we construct a circuit Cn ϕ(b) with the input enc(A) which outputs 1 iff A |= ϕ(b). If ϕ(b) is an atomic formula of the form S(b), where S ∈ σ, then we simply output the corresponding bit from enc(A). If ϕ is a numerical formula, we 156 8 Logics with Counting output 1 or 0 depending on whether ϕ(b) is true. For Boolean connectives, we simply use ∨, ∧ or ¬ gates. Thus, it remains to show how to handle the case of counting quantifiers. Let ϕ(x1, . . . , xm) ≡ ∃x1 y ψ(y, x). That is, there exist x1 elements y satisfying ϕ (since structures are ordered, we associate an element x1 with its ranking in the linear order). Let b ∈ Am be given, and let a0, . . . , an−1 enumerate all the elements of A. Let Ci be the circuit Cn ψ(ai,b) . We then collect the n outputs of such circuits, and for each of the first n inputs (which are the first n zeros of enc(A)), we produce 1 for the first a1 zeros, and 0 for the remaining n − a1 zeros. This can easily be done with small constant-depth circuits. We then feed all the 2n inputs to a majority gate as shown in Fig. 8.1. C0 C1 Cn−1 MAJ 1 1 1 0 0 a1 . . . . . . . . . Fig. 8.1. Circuit for the proof of Theorem 8.21 It is clear from the construction that the family of circuits defined this way has a fixed constant depth (in fact, linear in the size of the formula), and polynomial size in terms of A . This completes the proof. As with nonuniform AC0 , the nonuniform version of TC0 can define even noncomputable problems, since every predicate on N is available. The uniform version of TC0 is defined as FO(Cnt)+ <: that is, FO(Cnt) with ordering available on the non-numerical universe. Thus, we restrict ourselves to addition and multiplication on natural numbers, and other functions and predicates definable with them (e.g., the BIT predicate). Uniform TC0 is a proper extension of uniform AC0 : for example, parity is in TC0 but not in AC0 . 
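The following sketch, with names of our own choosing, mimics the gadget from the proof of Theorem 8.21 (Fig. 8.1): a single majority gate, fed the outputs of the subcircuits on one side and a padded block of ones on the other, decides whether at least a given number of elements satisfy ψ.

```python
# A sketch of the circuit gadget used for counting quantifiers in the proof of
# Theorem 8.21: "there exist at least t elements y satisfying psi" is decided
# by one majority gate.  One side receives the outputs of the subcircuits
# C_0, ..., C_{n-1} (one per element of the universe); the other side receives
# t ones followed by n - t zeros, which a constant-depth circuit can produce.

def majority_gate(xs, ys):
    """Output 1 iff x_1 + ... + x_n >= y_1 + ... + y_n (Definition 8.20)."""
    return int(sum(xs) >= sum(ys))

def at_least_t(sub_circuit_outputs, t):
    n = len(sub_circuit_outputs)
    threshold_side = [1] * t + [0] * (n - t)
    return majority_gate(sub_circuit_outputs, threshold_side)

if __name__ == "__main__":
    # Suppose psi(a_i, b) holds for elements 0, 2, 3 of a 5-element universe.
    outputs = [1, 0, 1, 1, 0]
    print(at_least_t(outputs, 2))   # 1: at least 2 elements satisfy psi
    print(at_least_t(outputs, 4))   # 0: fewer than 4 elements satisfy psi
```

The same gate, applied with both sides of equal length, computes majority itself, which is why parity and threshold counting sit comfortably inside this circuit class.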
It appears to be a rather modest extension: all we add is a simple form of counting. In particular, TC0 is contained in Ptime, and in fact even in DLog. Nevertheless, we still do not know if TC0 NP. We know, however, that FO(Cnt) is subsumed by L∗ ∞ω(Cnt), and that L∗ ∞ω(Cnt) is local – and hence it cannot express many Ptime problems such as graph connectivity, acyclicity, etc. Would not this give us the desired separation? Unfortunately, it would not, since we can only prove locality of FO(Cnt) but not FO(Cnt)+ <. We have seen that for FO, its extension with order, 8.5 Complexity of Counting Quantifiers 157 that is, (FO+ <)inv, is local too. The same result, however, is not true for FO(Cnt). We now show a counterexample to locality of (FO(Cnt)+<)inv. Proposition 8.22. There exist queries expressible in (FO(Cnt)+<)inv which are not Gaifman-local. Proof. The vocabulary σ contains a binary relation E and a unary relation P. We call a σ-structure good if three conditions are satisfied: 1. E has exactly one node of in-degree 0 and out-degree 1, exactly one node of out-degree 0 and in-degree 1, and all other nodes have both in-degree 1 and out-degree 1. That is, the relation E is a disjoint union of a chain {(a0, a1), (a1, a2), . . . , (ak−1, ak)} and zero or more cycles. 2. P contains a0, does not contain ak, and with each a ∈ P, except a0, it contains its predecessor in E (the unique node b such that (b, a) ∈ E). Thus, P contains an initial segment of the successor part of E, and may contain some of the cycles in E. 3. |P| ≤ log n, where n is the size of the universe of the structure. We claim that there is an FO(Cnt) sentence Φgood that tests if a structure A ∈ STRUCT[σ] is good. Clearly, conditions 1 and 2 can be verified by FO sentences. For condition 3, it suffices to check that the predicate j ≤ log k is definable. Since j ≤ log k iff 2j ≤ k, and the predicate i = 2j is definable even in FO in the presence of addition and multiplication (see Sect. 6.4), we see that all three conditions can be defined in FO(Cnt). We now consider the following binary query Q: If A is good, return the transitive closure of E restricted to P. The result will follow from two claims. First, Q is definable in FO(Cnt)+<. Second, Q is not Gaifman-local. The latter is simple: assume, to the contrary, that Q is Gaifman-local and let d = lr(Q). Let k = 4d + 5, and n = 2k . Take E to be a successor (chain) of length n, with P interpreted as its initial k elements. Notice that this is a good structure. Then in P, we can find two elements a, b with isomorphic and disjoint d-neighborhoods. Hence, (a, b) ≈d (b, a), but the transitive closure query would distinguish (a, b) from (b, a). It remains to show that Q is expressible in FO(Cnt)+<. First, we assume, without loss of generality, that in a given structure A, elements of P precede elements of A − P in the ordering <. Indeed, if this is not true of <, we can always define, in FO, a new ordering <1 which coincides with < on P and on A − P, and, furthermore, a <1 b for all a ∈ P and b ∈ P. Let S ⊆ P, with S = {s1, . . . , sm}. Let each sj be the ijth element in the ordering <; that is, ∃!ijx (x ≤ sj) holds. Define aS as the pth element of A in the ordering <, where BIT(p, i1), . . . , BIT(p, im) are all true, and for 158 8 Logics with Counting every i ∈ {i1, . . . , im}, the value of BIT(p, i) is false. Since |P| ≤ log n, such an element aS exists for every S ⊆ P. 
Moreover, since BIT is definable, there is a definable (in FO(Cnt)) predicate Code(u, v) which is true iff v is of the form aS for a set S, and u ∈ S. The query Q will now be definable by a formula ∃z ψ(x, y, z), where ψ says that z codes the path from x to y. That is, it says the following: • Code(x, z) and Code(y, z) hold. • If x0 is the predecessor of x and y0 is the successor of y, then Code(x0, z) and Code(y0, z) do not hold. • For every other element u = x, y such that Code(u, z) holds, it is the case that Code(u1, z) and Code(u2, z) hold, where u1 and u2 are the predecessor and the successor of u. • Code(a0, z) holds iff a0 = x, and Code(ak, z) does not hold. Here a0 and ak are the elements of in-degree and out-degree 0, respectively. Clearly, all these conditions can be expressed in FO(Cnt). Given the special form of E, one can easily verify that this defines the transitive closure restricted to P. As a corollary of Proposition 8.22, we get a separation FO(Cnt) (FO(Cnt)+<)inv, since all FO(Cnt)-expressible queries are Gaifman-local, by Corollary 8.16. 8.6 Aggregate Operators Aggregate operators occur in most practical database query languages. They allow one to apply functions for entire columns of relations. For example, if we have a ternary relation R whose tuples are (d, e, s), where d is the department name, e is the employee name, and s is his/her salary, a typical aggregate query would ask for the total salary for each department. Such a query would construct, for each department d, the set of all tuples {(e1, s1), . . . , (en, sn)} such that (d, ei, si) ∈ R for i = 1, . . . , n, and then output (d, n i=1 si). We view this as applying the aggregate function SUM to the multiset {s1, . . . , sn} (it is a multiset since some of the si’s can be the same, but we have to sum them all). Logics with counting seen so far are not well suited for proving results about languages with aggregations, as they cannot talk about entire columns of relations. Nevertheless, we shall show here that aggregate operators can be simulated in L∗ ∞ω(Cnt), thereby giving us expressibility bounds for practical database query languages. We first define the notion of an aggregate operator. 8.6 Aggregate Operators 159 Definition 8.23. An aggregate operator is a collection F = {f0, f1, f2, . . . , fω} of functions, where each fn, 0 < n < ω, takes an n-element multiset (bag) of natural numbers, and returns a number in N. Furthermore, f0 and fω are constants; fω is the fixed value associated to all infinite multisets. For example, the aggregate SUM will be represented as FSUM = {f0, f1, f2, . . . , fω}, where f0 = fω = 0, and fn({a1, . . . , an}) = a1 + . . . + an. Definition 8.24 (Aggregate logic). The aggregate logic Laggr is defined as the following extension of L∗ ∞ω(Cnt). For every possible aggregate operator F, a numerical term t(x, y) and a formula ϕ(x, y), we have a new numerical term t′ (x) = AggrF y t(x, y), ϕ(x, y) . Variables y become bound in AggrF y t(x, y), ϕ(x, y) . The value t′ (a) is calculated as follows. If there are infinitely many b such that ϕ(a, b) holds, then t′ (a) = fω. If there is no b such that ϕ(a, b) holds, then t′ (a) = f0. Otherwise, let b1, . . . , bm enumerate all the b such that ϕ(a, b) holds. Then t′ (a) = fm({t(a, b1), . . . , t(a, bm)}). Note that the argument of fm is in general a multiset, since some of t(a, bi) may be the same. The rank of t′ is defined as max(rk(t), rk(ϕ))+ |y|. 
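A small sketch of the semantics of Definition 8.24 may be helpful; it instantiates F with the aggregate SUM on the department/salary relation from the beginning of this section. The bound variable y ranges over (employee, salary) pairs, mirroring the tuple (e, s) bound in the formula below, and all Python names are illustrative choices of ours.

```python
# A sketch of the value of the aggregate term Aggr_F y (t(x,y), phi(x,y))
# from Definition 8.24, instantiated with F_SUM.  The witnesses are collected
# as a list, since term values may repeat and each occurrence of a value in
# the multiset must be passed to f_m.

def aggr(F, term, phi, x, domain):
    """Value of Aggr_F y (term(x,y), phi(x,y)) for a fixed interpretation x."""
    witnesses = [y for y in domain if phi(x, y)]
    if not witnesses:
        return F["f0"]                                   # the constant f_0
    return F["fm"]([term(x, y) for y in witnesses])      # finite multiset; f_omega
                                                         # never arises on finite structures

F_SUM = {"f0": 0, "fm": sum}        # f_m({a_1, ..., a_m}) = a_1 + ... + a_m

if __name__ == "__main__":
    # R(d, e, s): department, employee, salary
    R = {("toys", "ann", 10), ("toys", "bob", 10), ("books", "eve", 25)}
    pairs = {(e, s) for (_, e, s) in R}
    for d in ("toys", "books"):
        total = aggr(
            F_SUM,
            term=lambda x, y: y[1],                  # t(x, y) = the salary in y
            phi=lambda x, y: (x, y[0], y[1]) in R,   # phi(x, y) = R(x, e, s)
            x=d,
            domain=pairs,
        )
        print(d, total)        # toys 20, books 25
```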
For example, the query that computes the total salary for each department is given by the following Laggr formula ϕ(d, v): ∃e∃s R(d, e, s) ∧ v = AggrFSUM (e, s) s, R(d, e, s) . The above query assumes that some of the columns in a relation could be numerical. The results below are proved without this assumption, but it is easy to extend the proofs to relations with columns of different types (see Exercise 8.16). It turns out that this seemingly powerful extension does not actually provide any additional power. Theorem 8.25. The expressive power of Laggr and L∗ ∞ω(Cnt) is the same. Proof. It suffices to show that for every formula ϕ(x) of Laggr, there exists an equivalent formula ϕ◦ (x) of L∗ ∞ω(Cnt) such that rk(ϕ◦ ) ≤ rk(ϕ). We prove this theorem by induction on the formulae and terms. We also produce, for each second-sort term t(x) of Laggr, a formula ψt(x, z) of L∗ ∞ω(Cnt), with z of the second sort, such that A |= ψt(a, n) iff the value of t(a) on A is n. Below we show how to produce such formulae ψt. 160 8 Logics with Counting For a second-sort term t which is a variable i, we define ψt(i, z) to be (z = i). If t is a constant c, then ψt(z) ≡ (z = c). For a term t′ (x) = AggrF y t(x, y), ϕ(x, y) , ψt′ (x, z) is defined as ϕ◦ ∞(x) ∧ (z = fω) ∨ ¬ϕ◦ ∞(x) ∧ ψ′ (x, z) , where ϕ◦ ∞(x) tests if the number of y satisfying ϕ(x, y) is infinite, and ψ′ produces the value of the term in the case when the number of such y is finite. The formula ϕ◦ ∞(x) can be defined as i:yi of 2nd sort C⊆N, C infinite c∈C ϕ◦ i (x, c) where ϕ◦ i (x, yi) ≡ ∃y1, . . . , yi−1, yi+1, . . . , ym ϕ◦ (x, y). The formula ψ′ (x, z) is defined as the disjunction of ¬∃yϕ◦ (x, y)∧(z = f0) and c,(c1,n1),...,(cl,nl)         z = c ∧ ∃!n1y (ϕ◦ (x, y) ∧ ψt(x, y, c1)) ∧ · · · ∧ ∃!nly (ϕ◦ (x, y) ∧ ψt(x, y, cl)) ∧ ∀y a∈N (ϕ◦ (x, y) ∧ ψt(x, y, a) → l i=1 (a = ci))         where the disjunction is taken over all tuples (c1, n1), . . . , (cl, nl), l > 0, ni > 0, and values c ∈ N such that F({c1, . . . , c1 n1 times , . . . , cl, . . . , cl nl times }) = c. Indeed, this formula asserts either that ϕ(x, ·) does not hold and then z = f0, or that c1, . . . , cl are exactly the values of the term t(x, y) when ϕ(x, y) holds, and that ni’s are the multiplicities of the ci’s. A straightforward analysis of the produced formulae shows that rk(ψt′ ) ≤ max(rk(ϕ◦ ), rk(ψt)) plus the number of first-sort variables in y; that is, rk(ψt′ ) ≤ rk(t′ ). This completes the proof of the theorem. Corollary 8.26. Every query expressible in Laggr is Hanf-local and Gaifman- local. Thus, practical database query languages with aggregate functions still cannot express queries such as graph connectivity or transitive closure. 8.8 Exercises 161 8.7 Bibliographic Notes Extension of FO with counting quantifiers was proposed by Immerman and Lander [135]; the presentation here follows closely Etessami [68]. Generalized quantifiers are used extensively in logic, see V¨a¨an¨anen [237, 238]. The infinitary counting logic L∗ ∞ω(Cnt) is from Libkin [166], although a closely related logic with unary quantifiers was studied in Hella [121]. Proposition 8.8 is a standard technique for eliminating counting terms over tuples, see, e.g., Kolaitis and V¨a¨an¨anen [149], and [166]. Bijective games were introduced by Hella [121], and the connection between bijective games and L∗ ∞ω(Cnt) is essentially from that paper (it used a slightly different logic though). Locality of L∗ ∞ω(Cnt) is from [166]. 
Connection between FO(Cnt) and TC0 is from Barrington, Immerman, and Straubing [16]. The name TC0 refers to threshold circuits that use threshold gates: such a gate has a threshold i, and it outputs 1 if at least i of its inputs are set to 1. The equivalence of threshold and majority gates is well known, see, e.g., Vollmer [247]. Proposition 8.22 is from Hella, Libkin, and Nurmonen [123]. Our treatment of aggregate operators follows Gr¨adel and Gurevich [98]; the definition of the aggregate logic and Theorem 8.25 are from Hella et al. [124]. Sources for exercises: Exercise 8.6: Libkin [166] Exercises 8.7 and 8.8: Libkin [167] Exercises 8.9 and 8.10: Libkin and Wong [170] Exercise 8.11: Immerman and Lander [135] Exercises 8.12 and 8.13: Barrington, Immerman, and Straubing [16] Exercises 8.14 and 8.15: Nurmonen [189] Exercise 8.16: Hella et al. [124] 8.8 Exercises Exercise 8.1. Show that none of the following is expressible in L∗ ∞ω(Cnt): transitive closure of a graph, testing for planarity, acyclicity, 3-colorability. Exercise 8.2. Prove Proposition 8.10 for arbitrary vocabularies. Exercise 8.3. Prove Proposition 8.11. Exercise 8.4. Prove Proposition 8.18. Exercise 8.5. Prove Theorem 8.19 for Gaifman-locality. Exercise 8.6. Extend Exercise 4.11 to counting logics. That is, define functions Hanf rankL, Gaifman rankL : N → N, for a logic L, as follows: Hanf rankL(n) = max{hlr(ϕ) | ϕ ∈ L, rk(ϕ) = n}, 162 8 Logics with Counting Gaifman rankL(n) = max{lr(ϕ) | ϕ ∈ L, rk(ϕ) = n}. Assume that we deal with purely relational vocabularies. Prove that for every n > 1, Hanf rankL(n) = 2n−1 − 1 and Gaifman rankL(n) = 2n − 1, when L is one of the following: FO(Cnt), FO(Q) for any Q, L∗ ∞ω(Cnt). Exercise 8.7. Extend L∗ ∞ω(Cnt) by additional atomic formulae ιd(x, y) (where |x|=|y|), such that A |= ιd(a, b) iff a ≈A d b. Let L∗,r ∞ω(Cnt) be the resulting logic, where every occurrence of ιd satisfies d ≤ r. Prove that L∗,r ∞ω(Cnt) is Hanf-local. Exercise 8.8. Extend L∗ ∞ω(Cnt) by adding local second-order quantification: that is, second-order quantification restricted to Nd(a), where a is the interpretation of free first-order variables. Such an extension, like the one of Exercise 8.7, must have the radii of neighborhoods, over which local second-order quantification is done, uniformly bounded in infinitary formulae. Complete the definition of this logic, and prove that it captures precisely all the Hanf-local queries. Exercise 8.9. Let k be the class of preorders in which every equivalence class has size at most k. The equivalence associated with a preorder is x ∼ y ⇔ (x y) ∧ (y x). Prove that graph connectivity is not in (L∗ ∞ω(Cnt)+ k )inv. Exercise 8.10. The goal of this exercise is to prove a statement much stronger than that of Exercise 8.9. Given a preorder , let [x] be the equivalence class of x with respect to ∼. Let g : N → N be a nondecreasing function which is not bounded by a constant. Let g be the class of preorders such that on an n-element set, for at most g(n) elements we have |[x]| = 2, and for the remaining at least n − g(n) elements, |[x]| = 1; furthermore, if |[x]| = 2 and |[y]| = 1, then x ≺ y. In other words, such preorders are linear orders everywhere, except at most g(n) initial elements. Prove the following: 1. There are functions g for which (L∗ ∞ω(Cnt)+ g)inv contains nonlocal queries. 2. For every g, every query in (L∗ ∞ω(Cnt)+ g)inv has the BNDP. Exercise 8.11. Define Ehrenfeucht-Fra¨ıss´e games for FO(Cnt), and prove their correctness. Exercise 8.12. 
Consider the logic FO(MAJ) defined as follows. A universe of σstructure is ordered, and is thus associated with {0, . . . , n − 1}. Furthermore, for each k > 0, and a formula ϕ(x, z), with |x|= k, we have a new formula ψ(z) ≡ MAJ x ϕ(x, z), binding x, such that A |= ψ(c) iff |ϕ(A, c)| ≥ 1 2 · |A|k . Recall that ϕ(A, c) stands for {b | A |= ϕ(b, c)}. Prove the following: • Over ordered structures, the logics FO(MAJ) and FO(Cnt) express all the same queries. 8.8 Exercises 163 • In the definition of FO(MAJ), it suffices to consider k ≤ 2: that is, the majority quantifier MAJ (x1, x2) ϕ(x1, x2, z). • Over ordered structures with the BIT predicate, the fragment of FO(MAJ) in which k = 1 (i.e., only new formulae of the form MAJ x ϕ(x, z) are allowed) is as expressive as FO(Cnt). Exercise 8.13. Prove the converse of Theorem 8.21: that is, any class of structures in nonuniform TC0 is definable in FO(Cnt)All. Exercise 8.14. Consider the generalized quantifier Dn defined as follows. If ϕ(x, z) is a formula, then ψ(z) ≡ Dnx ϕ(x, z) is a formula, such that A |= ϕ(a) iff |ϕ(A, a)| mod n = 0. Next, consider strings over the alphabet {0, 1} as finite structure (see Chap. 7), and prove that none of the following properties of strings s0 . . . sm−1 is expressible in FO(Dn): • Majority: Pm−1 i=0 si ≥ m 2 ; • m mod p = 0, for every prime p that does not divide n; • `Pm−1 i=0 si ´ mod p = 0, again for every prime p that does not divide n. Exercise 8.15. Consider the generalized quantifier Dn from Exercise 8.14. Consider ordered structures (in which we can associate elements with numbers), and define an additional predicate y = nx over them. Prove that even in the presence of such an additional predicate, FO(Dn) cannot express the predicate y = (n + 1)x. Exercise 8.16. Aggregate operators in database query languages normally operate on rational numbers; for example, one of the standard aggregates is AVG = {f0, f1, f2, . . . , fω}, where f0 = fω = 0, and fn({a1, . . . , an}) = (a1 + . . . + an)/n. Define LQ aggr as an extension of Laggr where the numerical domain is Q, each q ∈ Q is a numerical term, and all aggregate operators F on Q are available. Prove the following: 1. For every LQ aggr formula ϕ(x) without free numerical variables, there exists an equivalent L∗ ∞ω(Cnt) formula of the same rank. 2. Conclude that LQ aggr is Hanf-local and Gaifman-local. Next, extend all the results to the case when different columns of σ-relations could be of different types: some of the universe of the first sort, and some numerical. Exercise 8.17.∗ Prove that transitive closure is not expressible in FO(Cnt)+<. 9 Turing Machines and Finite Models In this chapter we introduce the technique of coding Turing machines in various logics. It is precisely this technique that gave rise to numerous applications of finite model theory in computational complexity. We start by proving the earliest such result, Trakhtenbrot’s theorem, stating that finite satisfiability is not decidable. For the proof of Trakhtenbrot’s theorem, we code Turing machines with no inputs. By a refinement of this technique, we code nondeterministic polynomial time Turing machines in existential second-order logic (∃SO), proving Fagin’s theorem stating that ∃SO-definable properties of finite structures are precisely those whose complexity is NP. 9.1 Trakhtenbrot’s Theorem and Failure of Completeness Recall the completeness theorem for FO: a sentence Φ is valid (is true in all models) iff it is provable in some formal system. 
In particular, this implies that the set of all valid FO sentences is r.e. (recursively enumerable), since one can enumerate all the formal proofs of valid FO sentences. We now show that completeness fails over finite models. What does it mean that Φ is valid? It means that all structures A, finite or infinite, are models of Φ: that is, A |= Φ. Since we are interested in finite models only, we want to refine the notions of satisfiability and validity in the finite context. Definition 9.1. Given a vocabulary σ, a sentence Φ in that vocabulary is called finitely satisfiable if there is a finite structure A ∈ STRUCT[σ] such that A |= Φ. The sentence Φ is called finitely valid if A |= Φ holds for all finite structures A ∈ STRUCT[σ]. Theorem 9.2 (Trakhtenbrot). For every relational vocabulary σ with at least one binary relation symbol, it is undecidable whether a sentence Φ of vocabulary σ is finitely satisfiable. 166 9 Turing Machines and Finite Models In the proof that we give, the vocabulary σ contains several binary relation symbols and a constant symbol. But it is easy to modify it to prove the result with just one binary relation symbol (this is done by coding several relations into one; see Exercise 9.1). Before we prove Trakhtenbrot’s theorem, we point out two corollaries. First, as we mentioned earlier, completeness fails in the finite. Corollary 9.3. For any vocabulary containing at least one binary relation symbol, the set of finitely valid sentences is not recursively enumerable. Proof. Notice that the set of finitely satisfiable sentences is recursively enumerable: one simply enumerates all pairs (A, Φ), where A is finite, and outputs Φ whenever A |= Φ. Assume that the set of finitely valid sentences is r.e. Since ¬Φ is finitely valid iff Φ is not finitely satisfiable, we conclude that the set of sentences which are not finitely satisfiable is r.e., too. However, if both a set X and its complement ¯X are r.e., then X is recursive; hence, we conclude that the set of finitely satisfiable sentences is recursive, which contradicts Trakhtenbrot’s theorem. Another corollary states that one cannot have an analog of the L¨owenheimSkolem theorem for finite models. Corollary 9.4. There is no recursive function f such that if Φ has a finite model, then it has a model of size at most f(Φ). Indeed, with such a recursive function one would be able to decide finite satisfiability. We now prove Trakhtenbrot’s theorem. The idea of the proof is to code Turing machines in FO: for every Turing machine M, we construct a sentence ΦM of vocabulary σ such that ΦM is finitely satisfiable iff M halts on the empty input. The latter is well known to be undecidable (this is an easy exercise in computability theory). Let M = (Q, Σ, ∆, δ, q0, Qa, Qr) be a deterministic Turing machine with a one-way infinite tape. Here Q is the set of states, Σ is the input alphabet, ∆ is the tape alphabet, q0 is the initial state, Qa (Qr) is the set of accepting (rejecting) states, from which there are no transitions, and δ is the transition function. Since we are coding the problem of halting on the empty input, we can assume without loss of generality that ∆ = {0, 1}, with 0 playing the role of the blank symbol. We define σ so that its structures represent computations of M. 
More precisely, σ = {<, min, T0(·, ·), T1(·, ·), (Hq(·, ·))q∈Q}, where
• < is a linear order and min is a constant symbol for the minimal element with respect to <; hence the finite universe will be associated with an initial segment of the natural numbers.
• T0 and T1 are tape predicates; Ti(p, t) indicates that position p at time t contains i, for i = 0, 1.
• Hq's are head predicates; Hq(p, t) indicates that at time t, the machine is in state q, and its head is in position p.
The sentence ΦM states that <, min, the Ti's, and the Hq's are interpreted as indicated above, and that the machine eventually halts. Note that if the machine halts, then Hq(p, t) holds for some p, t, and q ∈ Qa ∪ Qr, and after that the configuration of the machine does not change. That is, all the configurations of the halting computation can be represented by a finite σ-structure. We define ΦM to be the conjunction of the following sentences:
• A sentence stating that < is a linear order and min is its minimal element.
• A sentence defining the initial configuration of M (it is in state q0, the head is in the first position, and the tape contains only zeros): Hq0(min, min) ∧ ∀p T0(p, min).
• A sentence stating that in every configuration of M, each cell of the tape contains exactly one element of ∆: ∀p∀t (T0(p, t) ↔ ¬T1(p, t)).
• A sentence imposing the basic consistency conditions on the predicates Hq (at any time the machine is in exactly one state): ∀t ∃!p ⋁_{q∈Q} Hq(p, t) ∧ ¬∃p∃t ⋁_{q,q′∈Q, q≠q′} (Hq(p, t) ∧ Hq′(p, t)).
• A set of sentences stating that the Ti's and Hq's respect the transitions of M (with one sentence per transition). For example, assume that δ(q, 0) = (q′, 1, ℓ); that is, if M is in state q reading 0, then it writes 1, moves the head one position to the left, and changes the state to q′. This transition is represented by the conjunction of
∀p∀t ( (p ≠ min ∧ T0(p, t) ∧ Hq(p, t)) → ( T1(p, t + 1) ∧ Hq′(p − 1, t + 1) ∧ ∀p′ (p′ ≠ p → ⋀_{i=0,1} (Ti(p′, t + 1) ↔ Ti(p′, t))) ) )
and
∀p∀t ( (p = min ∧ T0(p, t) ∧ Hq(p, t)) → ( T1(p, t + 1) ∧ Hq′(p, t + 1) ∧ ∀p′ (p′ ≠ p → ⋀_{i=0,1} (Ti(p′, t + 1) ↔ Ti(p′, t))) ) ).
We use the abbreviations p − 1 and t + 1 for the predecessor of p and the successor of t in the ordering <; these are, of course, FO-definable. The first sentence above ensures that the tape content in position p changes from 0 to 1, the state changes from q to q′, the rest of the tape remains the same, and the head moves to position p − 1, assuming p is not the first position on the tape. The second sentence is very similar, and handles the case when p is the initial position: then the head does not move and stays in p.
• Finally, a sentence stating that at some point, M is in a halting state: ∃p∃t ⋁_{q∈Qa∪Qr} Hq(p, t).
If ΦM has a finite model, then such a model represents a computation of M that starts with the tape containing all zeros and ends in a halting state. If, on the other hand, M halts on the empty input, then the set of all configurations of the halting computation of M, coded as the relations <, Ti, and Hq, is a model of ΦM (necessarily finite). Thus, M halts on the empty input iff ΦM has a finite model. Since testing whether M halts on the empty input is undecidable, so is finite satisfiability of ΦM.
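To emphasize that ΦM is produced mechanically from the transition table of M, here is a small sketch that generates the transition conjuncts as strings; the ASCII rendering of the connectives and all names are our own choices, and only the non-boundary case p ≠ min is spelled out.

```python
# A sketch of generating the transition conjuncts of Phi_M from the transition
# function delta of a machine M with tape alphabet {0, 1}.  Formulas are built
# as plain strings; predicate names follow the proof (T0, T1, H_q), and the
# connectives are rendered as &, ->, <->, <> for readability.

def transition_conjunct(q, a, q_next, b, move):
    """Conjunct for delta(q, a) = (q_next, b, move), move in {'l', 'r'},
    for the case p <> min; the boundary case p = min is analogous."""
    head_pos = "p-1" if move == "l" else "p+1"
    frame = ("forall p' (p' <> p -> "
             "(T0(p',t+1) <-> T0(p',t)) & (T1(p',t+1) <-> T1(p',t)))")
    return (f"forall p forall t ( (p <> min & T{a}(p,t) & H_{q}(p,t)) -> "
            f"( T{b}(p,t+1) & H_{q_next}({head_pos},t+1) & {frame} ) )")

if __name__ == "__main__":
    # delta(q, 0) = (q', 1, left): the transition used as the example above.
    print(transition_conjunct("q", 0, "q'", 1, "l"))
```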
9.2 Fagin’s Theorem and NP Fagin’s theorem provides a purely logical characterization of the complexity class NP, by means of coding computations of nondeterministic polynomial time Turing machines in a fragment of second-order logic. Before stating the result, we give the following general definition. Recall that by properties, we mean Boolean queries, namely, collections of structures closed under isomor- phism. Definition 9.5. Let K be a complexity class, L a logic, and C a class of finite structures. We say that L captures K on C if the following hold: 1. The data complexity of L on C is K; that is, for every L-sentence Φ, testing if A |= Φ is in K, provided A ∈ C. 2. For every property P of structures from C that can be tested with complexity K, there is a sentence ΦP of L such that A |= ΦP iff A has the property P, for every A ∈ C. 9.2 Fagin’s Theorem and NP 169 If C is the class of all finite structures, we say that L captures K. Theorem 9.6 (Fagin). ∃SO captures NP. Before proving this theorem, we make several comments and point out some corollaries. Fagin’s theorem is a very significant result as it was the first machine-independent characterization of a complexity class. Normally, we define complexity classes in terms of resources (time, space) that computations can use; here we use a purely logical formalism. Following Fagin’s theorem, logical characterizations have been proven for many complexity classes (we already saw them for uniform AC0 and TC0 , and later we shall see how to characterize NLog, Ptime, and Pspace over ordered structures). The hardest open problems in complexity theory concern separation of complexity classes, with the “Ptime vs. NP” question being undoubtedly the most famous such problem. Logical characterizations of complexity classes show that such separation results can be formulated as inexpressibility results in logic. Suppose that we have two complexity classes K1 and K2, captured by logics L1 and L2. To prove that K1 = K2, it would then suffice to separate the logics L1 and L2; that is, to show that some problem definable in L2 is inexpressible in L1, or vice versa. Since the class coNP consists of the problems whose complements are in NP, and the negation of an ∃SO sentence is an ∀SO sentence, we obtain: Corollary 9.7. ∀SO captures coNP. Hence, to show that NP = coNP, it would suffice to exhibit a property definable in ∀SO but not definable in ∃SO. While we still do not know if such a property exists, recall that we have a property definable in ∀MSO but not definable in ∃MSO: graph connectivity. In fact, for reasons obvious from Fagin’s theorem and Corollary 9.7, ∃MSO is sometimes referred to as “monadic NP”, and ∀MSO as “monadic coNP”. Hence, Proposition 7.14 tells us that monadic NP = monadic coNP. Note that separating ∀SO from ∃SO would also resolve the “Ptime vs. NP” question: ∀SO = ∃SO ⇒ NP = coNP ⇒ Ptime = NP (if Ptime and NP were the same, NP would be closed under the complement, and hence NP and coNP would be the same). As another remark, we point out that the above remark concerning the separation of ∃SO and ∀SO is specific to the finite case. Indeed, by Fagin’s theorem, ∃SO = ∀SO over finite structures iff NP = coNP, but over some infinite structures (e.g., N, +, · ), the logics ∃SO and ∀SO are known to be different. 170 9 Turing Machines and Finite Models We now prove Fagin’s theorem. First, we show that every ∃SO sentence Φ can be evaluated in NP. Suppose Φ is ∃S1 . . . ∃Sn ϕ, where ϕ is FO. 
Given A, the nondeterministic machine first guesses S1, . . . , Sn, and then checks if ϕ(S1, . . . , Sn) holds. The latter can be done in polynomial time in A plus the size of S1, . . . , Sn, and thus in polynomial time in A (see Proposition 6.6). Hence, Φ can be evaluated in NP. Next, we show that every NP property of finite structures can be expressed in ∃SO. The proof of this direction is very close to the proof of Trakhtenbrot’s theorem, but there are two additional elements we have to take care of: time bounds, and the input. Suppose we are given a property P of σ-structures that can be tested, on encodings of σ-structures, by a nondeterministic polynomial time Turing machine M = (Q, Σ, ∆, δ, q0, Qa, Qr) with a one-way infinite tape. Here Q = {q0, . . . , qm−1} is the set of states, and we assume without loss of generality that Σ = {0, 1} and ∆ extends Σ with the blank symbol “ ”. We assume that M runs in time nk . Notice that n is the size of the encoding, so we always assume n > 1. We can also assume without loss of generality that M always visits the entire input; that is, that nk always exceeds the size of the encodings of n-element structures (this is possible because the size of enc(A), defined in Chap. 6, is polynomial in A ). The sentence describing acceptance by M on encodings of structures from STRUCT[σ] will be of the form ∃L ∃T0∃T1∃T2 ∃Hq0 . . . ∃Hqm−1 Ψ, (9.1) where Ψ is a sentence of vocabulary σ ∪ {L, T0, T1, T2} ∪ {Hq | q ∈ Q}. Here L is binary, and other symbols are of arity 2k. The intended interpretation of these relational symbols is as follows: • L is a linear order on the universe. With L, one can define, in FO, the lexicographic linear order ≤k on ktuples. Since M runs in time nk and visits at most nk cells, we can model both positions on the tape (p) and time (t) by k-tuples of the elements of the universe. With this, the predicates Ti’s and Hq’s are defined similarly to the proof of Trakhtenbrot’s theorem: • T0, T1, and T2 are tape predicates; Ti(p, t) indicates that position p at time t contains i, for i = 0, 1, and T2(p, t) says that p at time t contains the blank symbol. • Hq’s are head predicates; Hq(p, t) indicates that at time t, the machine is in state q, and its head is in position p. 9.2 Fagin’s Theorem and NP 171 The sentence Ψ must now assert that when M starts on the encoding of A, the predicates Ti’s and Hq’s correspond to its computation, and eventually M reaches an accepting state. Note that the encoding of A depends on a linear ordering of the universe of A. We may assume, without loss of generality, that this ordering is L. Indeed, since queries are closed under isomorphism, choosing one particular ordering to be used in the representation of enc(A) does not affect the result. We now define Ψ as the conjunction of the following sentences: • The sentence stating that L defines a linear ordering. • The sentence stating that – in every configuration of M, each cell of the tape contains exactly one element of ∆; – at any time the machine is in exactly one state; – eventually, M enters a state from Qa. All these are expressed in exactly the same way as in the proof of Trakhtenbrot’s theorem. • Sentences stating that Ti’s and Hq’s respect the transitions of M. These are written almost as in the proof of Trakhtenbrot’s theorem, but one has to take into account nondeterminism. 
For every a ∈ ∆ and q ∈ Q, we have a sentence (q′,b,move)∈δ(q,a) α(q,a,q′,b,move), where move ∈ {ℓ, r} and α(q,a,q′,b,move) is the sentence describing the transition in which, upon reading a in state q, the machine writes b, makes the move move, and enters state q′ . Such a sentence is written in exactly the same way as in the proof of Trakhtenbrot’s theorem. • The sentence defining the initial configuration of M. Suppose we have formulae ι(p) and ξ(p) of vocabulary σ∪{L} such that A |= ι(p) iff the pth position of enc(A) is 1 (in the standard encoding of structures presented in Chap. 6), and A |= ξ(p) iff p exceeds the length of enc(A). Note that we need L in these formulae since the encoding refers to a linear order on the universe. With such formulae, we define the initial configuration by ∀p ∀t ¬∃u (u 1, then ι is false. Assume p1 = 0. Then we are talking about the position p2n + p3. Positions 0 to n−1 have zeros, so if p2 = 0, then again ι is false. If p3 = 0, then (p2 −1)n+(p3 −1)+(n+1) = p2n+p3, and hence the position corresponds to E(p2 − 1, p3 − 1). If p3 = 0, then this position corresponds to E(p2 − 2, n − 1). Hence, the formula ι(p1, p2, p3) is of the form (p1 = 0) ∧(p2 > 1) ∧ (p3 = 0) ∧ E(p2 − 1, p3 − 1) ∨ (p3 = 0) ∧ E(p2 − 2, n − 1) ∨ (p1 = 0) ∧ (p2 = 1) ∧ (p3 = 0) ∨ (p1 = 1) ∧ . . . , where for the case of p1 = 1 a similar case analysis is done. Clearly, with the linear order L, both 0 and n − 1, and the predecessor function are definable, and hence ι is FO. (The details of writing down ι for arbitrary k are left as an exercise to the reader, see Exercise 9.4.) The formula ξ(p) simply says that p, considered as a number, exceeds n2 + n + 1. This completes the proof of Fagin’s theorem. We now show several more corollaries of Fagin’s theorem. The first one is Cook’s theorem stating that SAT, propositional satisfiability, is NP-complete. 9.2 Fagin’s Theorem and NP 173 Corollary 9.8 (Cook). SAT is NP-complete. Proof. Let P be a problem (a class of σ-structures) in NP. By Fagin’s theorem, there is an ∃SO sentence Φ ≡ ∃S1 . . . ∃Sn ϕ such that A is in P iff A |= Φ. Let X = {Si(a) | i = 1, . . . , n, a ∈ Aarity(Si) }. We construct a propositional formula αA ϕ with variables from X such that A |= Φ iff αA ϕ is satisfiable. The formula αA ϕ is obtained from ϕ by the following three transformations: • replacing each ∃x ψ(x, ·) by a∈A ψ(a, ·); • replacing each ∀x ψ(x, ·) by a∈A ψ(a, ·); and • replacing each R(a), for R ∈ σ, by its truth value in A. In the resulting formula, the variables are of the form Si(a); that is, they come from the set X. Clearly, A |= Φ iff αA ϕ is satisfiable, and αA ϕ can be constructed by a deterministic logarithmic space machine. This proves NP-completeness of SAT. The logics ∃SO and ∀SO characterize NP and coNP, the first level of the polynomial hierarchy PH. Recall that the levels of PH are defined inductively: Σp 1 = NP, and Σp k+1 = NPΣp k . The level Πp k is defined as the set of complements of problems from Σp k. Also recall that Σ1 k is the class of SO sentences of the form (∃ . . . ∃)(∀ . . . ∀)(∃ . . . ∃) . . . ϕ, with k quantifier blocks, and Π1 k is defined likewise but the first block of quantifiers is universal. We now sketch an inductive argument showing that Σ1 k captures Σp k, for every k. The base case is Fagin’s theorem. Now consider a problem in Σp k+1. By Fagin’s theorem, there is an ∃SO sentence Φ (corresponding to the NP machine) with additional predicates expressing Σp k properties. 
We know, by the hypothesis, that those properties are definable by Σ1 k formulae. Then pushing the second-order quantifier outwards, we convert Φ into a Σ1 k+1 sentence. The extra quantifier alternation arises when these predicates for Σp k properties are negated: suppose we have a formula ∃ . . . ∃ϕ(P), where P is expressed by a formula ∃ . . . ∃ψ, with ψ being FO, and P may occur negatively. Then putting the resulting formula in the prenex form, we have a second-order quantifier prefix of the form (∃ . . . ∃)(∀ . . . ∀). For example, ∃ . . . ∃ ¬ ∃ . . . ∃ψ is equivalent to ∃ . . . ∃∀ . . . ∀ ¬ψ. Filling all the details of this inductive proof is left to the reader as an exercise (Exercise 9.5). Thus, we have the the following result. Corollary 9.9. For each k ≥ 1, • Σ1 k captures Σp k, and • Π1 k captures Πp k . In particular, SO captures the polynomial hierarchy. 174 9 Turing Machines and Finite Models 9.3 Bibliographic Notes Trakhtenbrot’s theorem, one of the earliest results in finite model theory, was published in 1950 [234]. Fagin’s theorem was published in 1974 [70, 71]. His motivation came from the complementation problem for spectra. The spectrum of a sentence Φ is the set {n ∈ N | Φ has a finite model of size n}. The complementation problem (Asser [14]) asks whether spectra are closed under complement; that is, where the complement of the spectrum of Φ is the spectrum of some sentence Ψ. If σ = {R1, . . . , Rn} is the vocabulary of Φ, then the spectrum of Φ can be alternatively viewed as finite models (of the empty vocabulary) of the ∃SO sentence ∃R1 . . . ∃Rn Φ (by associating a universe of size n with n). Fagin defined generalized spectra as finite models of ∃SO sentences (i.e., the vocabulary no longer needs to be empty). The complementation problem for generalized spectra is then the problem whether NP equals coNP. The result that ∃SO and ∀SO are different on N, +, · is due to Kleene [146]. In fact, over N, +, · , the intersection of ∃SO and ∀SO collapses to FO, while over finite structures it properly contains FO. Cook’s theorem is from [39] (and is presented in many texts of complexity and computability, e.g. [126, 195]). The polynomial hierarchy and its connection with SO are from Stockmeyer [223]. Sources for exercises: Exercises 9.6 and 9.7: Gr¨adel [97] Exercise 9.8: Jones and Selman [140] Exercise 9.9: Lautemann, Schwentick, and Th´erien [162] Exercise 9.10: Eiter, Gottlob, and Gurevich [63] Exercise 9.11: Gottlob, Kolaitis, and Schwentick [95] Exercise 9.12: Makowsky and Pnueli [178] Exercise 9.13: (a) from Fagin [72] (b) from Ajtai [10] (see also Fagin [74]) 9.4 Exercises Exercise 9.1. Prove Trakhtenbrot’s theorem for an arbitrary vocabulary with at least one binary relation symbol. Hint: use the binary relation symbol to code several binary relations, used in our proof of Trakhtenbrot’s theorem. Exercise 9.2. Prove that Trakhtenbrot’s theorem fails for unary vocabularies: that is, if all the symbols in σ are unary, then finite satisfiability is decidable. Exercise 9.3. Use Trakhtenbrot’s theorem to prove that order invariance for FO queries is undecidable. 9.4 Exercises 175 Exercise 9.4. Give a general definition of the formula ι from the proof of Fagin’s theorem (i.e., for arbitrary σ and k). Exercise 9.5. Complete the proof of Corollary 9.9. Exercise 9.6. 
Show that there is an encoding schema for finite σ-structures such that the formulae ι from the proof of Fagin’s theorem can be assumed to be quantifier-free, if the successor relation and the minimal and maximal element with respect to it can be used in formulae. Exercise 9.7. Use the encoding scheme of Exercise 9.6 to prove that every NP can be defined by an ∃SO sentence whose first-order part is universal (i.e., of the form ∀ . . . ∀ ψ, where ψ is quantifier-free), under the assumption that we consider structures with explicitly given order and successor relations, as well as constants for the minimal and the maximal elements. Prove that without these assumptions, universal first-order quantification in ∃SO formulae is not sufficient to capture all of NP. What kind of quantifier prefixes does one need in the general case? Exercise 9.8. Prove that a set X ⊆ N is a spectrum iff it is in NEXPTIME. Explain why this does not contradict Fagin’s theorem. Exercise 9.9. Consider the vocabulary σΣ = (<, (Pa)a∈Σ) used in Chap. 7 for coding strings as finite structures. Recall that a sentence Φ over such vocabulary defines a language (a subset of Σ∗ ) given by {s ∈ Σ∗ | Ms |= Φ}. Consider a restriction ∃SOmatch of ∃SO in which existential second-order variables range over matchings: that is, binary relations of the form {(xi, yi) | i ≤ k} where all xi’s and yi’s are distinct. Prove that a language is definable in ∃SOmatch iff it is context-free. Exercise 9.10. Let S be a set of quantifier prefixes, and let ∃SO(S) be the fragment of ∃SO which consists of sentences of the form ∃R1 . . . ∃Rn ϕ, where ϕ is a prenex formula whose quantifier prefix is in S. We call ∃SO(S) regular if over strings it only defines regular languages. Prove the following: • ∃SO(∀∗ ∃∀∗ ) is regular; • ∃SO(∃∗ ∀∀) is regular; • if ∃SO(S) is regular, then it is contained in the union of ∃SO(∀∗ ∃∀∗ ) and ∃SO(∃∗ ∀∀); • if ∃SO(S) is not regular, then it defines some NP-complete language. Exercise 9.11. We now consider ∃SO(S) and ∃MSO(S) over directed graphs. Prove the following: • ∃SO(∃∗ ∀) only defines polynomial time properties of graphs; • ∃SO(∀∀) and ∃MSO(∃∗ ∀∀) in which at most one second-order quantifier is used only define polynomial time properties of graphs; • each of the following defines some NP-complete problems on graphs: – ∃SO(∃∀∀), where only one second-order quantifier over binary relations is used; 176 9 Turing Machines and Finite Models – ∃MSO(∀∃) and ∃MSO(∀∀∀), where only one second-order quantifier is used; – ∃MSO(∀∀), where only two second-order quantifiers are used. Exercise 9.12. Define SO(k, m) as the union of Σ1 k and Π1 k where all quantification is over relations of arity at most m. That is, SO(k, m) is the restriction of SO to at most k − 1 alternations of quantifiers, and quantification is over relations of arity m. This is usually referred to as the alternation-arity hierarchy. Prove that the alternation-arity hierarchy is strict: that is, there is a constant c such that SO(k, m) SO(k + c, m + c) for all k, m. Exercise 9.13. Define ∃SO(m) as the restriction of class of ∃SO to second-order quantification over relations of arity at most m. Prove the following: (a) If ∃SO(m) = ∃SO(m + 1), then ∃SO(k) = ∃SO(m) for every k ≥ m. (b) If σ contains an m-ary relation symbol P, then the class of structures in which P has an even number of tuples is not ∃SO(m − 1)-definable. (c) Conclude from (a) and (b) that, if σ contains an m-ary relation symbol P, then ∃SO(i) ∃SO(j) over σ-structures, for every 1 ≤ i < j ≤ m. 
Exercise 9.14.∗ Now consider just the arity hierarchy for SO: that is, SO(m) is defined as ⋃k∈N SO(k, m). Is the arity hierarchy strict?

Exercise 9.15.∗ We call a sentence categorical if it has at most one model of each finite cardinality. Is it true that every spectrum is a spectrum of a categorical sentence?

10 Fixed Point Logics and Complexity Classes

Most logics we have seen so far are not well suited for expressing many tractable graph properties, such as graph connectivity, reachability, and so on. The limited expressiveness of FO and counting logics is due to the fact that they lack mechanisms for expressing fixed point computations. Other logics we have seen, such as MSO, ∃SO, and ∀SO, can express intractable graph properties. Consider, for example, the transitive closure query. Given a binary relation R, we can express relations R0, R1, R2, R3, . . ., where Ri contains pairs (a, b) such that there is a path from a to b of length at most i. To compute the transitive closure of R, we need the union of all those relations: that is, R∞ = ⋃i≥0 Ri. How could one compute such a union? Since relation R is finite, starting with some n, the sequence Ri, i ≥ 0, stabilizes: Rn = Rn+1 = Rn+2 = . . .. Indeed, in this case n can be taken to be the number of elements of relation R. Hence, R∞ = Rn; that is, Rn is the limit of the sequence Ri, i ≥ 0. But we can also view Rn as a fixed point of an operator that sends each Ri to Ri+1.

In this chapter we study logics extended with operators for computing fixed points of various operators. We start by presenting the basics of fixed point theory (in a rather simplified way, adapted for finite structures). We then define various extensions of FO with fixed point operators, study their expressiveness, and show that on ordered structures these extensions capture the complexity classes Ptime and Pspace. Finally, we show how to extend FO with an operator for computing just the transitive closure, and prove that this extension captures NLog on ordered structures.

10.1 Fixed Points of Operators on Sets

Typically the theory of fixed point operators is presented for complete lattices: that is, partially ordered sets ⟨U, ≺⟩ where every – finite or infinite – subset of U has a greatest lower bound and a least upper bound in the ordering ≺. However, here we deal only with finite sets, which somewhat simplifies the presentation. Given a set U, let ℘(U) be its powerset. An operator on U is a mapping F : ℘(U) → ℘(U). We say that an operator F is monotone if X ⊆ Y implies F(X) ⊆ F(Y), and inflationary if X ⊆ F(X) for all X ∈ ℘(U).

Definition 10.1. Given an operator F : ℘(U) → ℘(U), a set X ⊆ U is a fixed point of F if F(X) = X. A set X ⊆ U is a least fixed point of F if it is a fixed point, and for every other fixed point Y of F we have X ⊆ Y. The least fixed point of F will be denoted by lfp(F).

Let us now consider the following sequence:

X0 = ∅, Xi+1 = F(Xi). (10.1)

We call F inductive if the sequence (10.1) is increasing: Xi ⊆ Xi+1 for all i. Every monotone operator F is inductive, which is shown by a simple induction. Of course X0 ⊆ X1 since X0 = ∅. If Xi ⊆ Xi+1, then, by monotonicity, F(Xi) ⊆ F(Xi+1); that is, Xi+1 ⊆ Xi+2. This shows that Xi ⊆ Xi+1 for all i ∈ N. If F is inductive, we define

X∞ = ⋃i≥0 Xi. (10.2)

Since U is assumed to be finite, the sequence (10.1) actually stabilizes after some finite number of steps, so there is a number n such that X∞ = Xn.
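The iteration (10.1)–(10.2) is easy to carry out mechanically. The following minimal sketch (all names are ours, not from the text) computes X∞ for an inductive operator, instantiated with the operator F(X) = R ∪ (R ◦ X) discussed in the example that follows:

```python
# A minimal sketch of the iteration (10.1)-(10.2): start from the empty set and
# apply an inductive operator F until the sequence stabilizes. Names are ours.

def iterate_to_fixed_point(F):
    """Return X_infinity for the stages X^0 = {} and X^{i+1} = F(X^i)."""
    X = frozenset()
    while True:
        X_next = frozenset(F(X))
        if X_next == X:        # X^n = X^{n+1}: the sequence has stabilized
            return X
        X = X_next

# The operator F(X) = R u (R o X) from the example that follows; its least
# fixed point is the transitive closure of R.
R = {(1, 2), (2, 3), (3, 4)}
F = lambda X: R | {(a, b) for (a, c) in R for (c2, b) in X if c2 == c}
print(sorted(iterate_to_fixed_point(F)))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```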
To give an example, let R be a binary relation on a finite set A, and let F : ℘(A2) → ℘(A2) be the operator defined by F(X) = R ∪ (R ◦ X). Here ◦ is relational composition: R ◦ X = {(a, b) | (a, c) ∈ R, (c, b) ∈ X, for some c ∈ A}. Notice that this operator is monotone: if X ⊆ Y, then R ◦ X ⊆ R ◦ Y. Let us now define the sequence Xi, i ≥ 0, as in (10.1). First, X0 = ∅. Since R ◦ ∅ = ∅, we have X1 = R. Then X2 = R ∪ (R ◦ R) = R ∪ R2; that is, the set of pairs (a, b) such that there is a path of length at most 2 from a to b. Continuing, we see that Xi = R ∪ . . . ∪ Ri, the set of pairs connected by paths of length at most i. This sequence reaches a fixed point X∞, which is the transitive closure of R.

We now prove that every monotone operator has a least fixed point, which is the set X∞ (10.2), defined as the union of the increasing sequence (10.1).

Theorem 10.2 (Tarski-Knaster). Every monotone operator F : ℘(U) → ℘(U) has a least fixed point lfp(F), which can be defined as lfp(F) = ⋂{Y | Y = F(Y)}. Furthermore, lfp(F) = X∞ = ⋃i Xi, for the sequence Xi defined by (10.1).

Proof. Let W = {Y | F(Y) ⊆ Y}. Clearly, W ≠ ∅, since U ∈ W. We first show that S = ⋂W is a fixed point of F. Indeed, for every Y ∈ W, we have S ⊆ Y and hence F(S) ⊆ F(Y) ⊆ Y; therefore, F(S) ⊆ ⋂W = S. On the other hand, since F(S) ⊆ S, we have F(F(S)) ⊆ F(S), and thus F(S) ∈ W. Hence, S = ⋂W ⊆ F(S), which proves S = F(S). Let W′ = {Y | F(Y) = Y} and S′ = ⋂W′. Then S ∈ W′ and hence S′ ⊆ S; on the other hand, W′ ⊆ W, so S = ⋂W ⊆ ⋂W′ = S′. Hence, S = S′. Thus, S = ⋂{Y | Y = F(Y)} is a fixed point of F. Since it is the intersection of all the fixed points of F, it is the least fixed point of F. This shows that lfp(F) = ⋂{Y | Y = F(Y)} = ⋂{Y | F(Y) ⊆ Y}.

To prove that lfp(F) = X∞, note that the sequence Xi increases, and hence for some n ∈ N, Xn = Xn+1 = . . . = X∞. Thus, F(X∞) = X∞ and X∞ is a fixed point. To show that it is the least fixed point, it suffices to prove that Xi ⊆ Y for every i and every Y ∈ W. We prove this by induction on i. Clearly X0 ⊆ Y for all Y ∈ W. Suppose Xi ⊆ Y for all Y ∈ W; we prove the statement for Xi+1. Let Y ∈ W. We have Xi+1 = F(Xi). By the hypothesis, Xi ⊆ Y, and by monotonicity, F(Xi) ⊆ F(Y) ⊆ Y. Hence, Xi+1 ⊆ Y. This shows that all the Xi's are contained in all the sets of W, and completes the proof of the theorem.

Not all the operators of interest are monotone. We now present two different constructions by means of which the fixed point of a non-monotone operator can be defined. Suppose F is inflationary: that is, Y ⊆ F(Y) for all Y. Then F is inductive; that is, the sequence (10.1) is increasing, and hence it reaches a fixed point X∞. Now suppose G is an arbitrary operator. With G, we associate an inflationary operator Ginfl defined by Ginfl(Y) = Y ∪ G(Y). Then X∞ for Ginfl is called the inflationary fixed point of G and is denoted by ifp(G). In other words, ifp(G) is the union of all sets Xi where X0 = ∅ and Xi+1 = Xi ∪ G(Xi).

Finally, we consider an arbitrary operator F : ℘(U) → ℘(U) and the sequence (10.1). This sequence need not be inductive, so there are two possibilities. The first is that this sequence reaches a fixed point; that is, for some n ∈ N we have Xn = Xn+1, and thus Xm = Xn for all m > n. If there is such an n, it must be the case that n ≤ 2^|U|, since there are only 2^|U| subsets of U. The second possibility is that no such n exists.
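To make the two possibilities concrete before the formal definition, here is a tiny illustration (a sketch with made-up operators, not from the text):

```python
# A small illustration of the two possibilities for an arbitrary operator on
# the powerset of U: the iteration may stabilize, or it may never reach a
# fixed point. Both operators below are invented for illustration.

U = {0, 1, 2}

def stages(F, limit):
    """The stages X^0 = {}, X^1, ..., X^limit of the iteration (10.1)."""
    xs, X = [frozenset()], frozenset()
    for _ in range(limit):
        X = frozenset(F(X))
        xs.append(X)
    return xs

F_grow = lambda X: X | {len(X)} if len(X) < 3 else X   # adds 0, 1, 2 and then stops
F_flip = lambda X: frozenset(U) - X                    # complementation: oscillates

print(stages(F_grow, 5))   # stabilizes at U after three steps
print(stages(F_flip, 5))   # alternates {} and U forever; no fixed point is reached
```

Under the definition of the partial fixed point given next, the first operator's partial fixed point is U, while the second's is ∅.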
We now define the partial fixed point of F as

pfp(F) = Xn, if Xn = Xn+1 for some n,
pfp(F) = ∅, if Xn ≠ Xn+1 for all n ≤ 2^|U|.

The definition is unambiguous: since Xn = Xn+1 implies that the sequence (10.1) stabilizes, Xn = Xn+1 and Xm = Xm+1 together imply that Xn = Xm. We leave the following as an easy exercise to the reader.

Proposition 10.3. If F is monotone, then lfp(F) = ifp(F) = pfp(F).

10.2 Fixed Point Logics

We now show how to add fixed point operators to FO. Suppose we have a relational vocabulary σ, and an additional relation symbol R ∉ σ of arity k. Let ϕ(R, x1, . . . , xk) be a formula of vocabulary σ ∪ {R}. We put the symbol R explicitly as a parameter, since this formula will give rise to an operator on σ-structures. For each A ∈ STRUCT[σ], the formula ϕ(R, x) gives rise to an operator Fϕ : ℘(Ak) → ℘(Ak) defined as follows:

Fϕ(X) = {a | A |= ϕ(X/R, a)}. (10.3)

Here the notation ϕ(X/R, a) means that R is interpreted as X in ϕ; more precisely, if A′ is the (σ ∪ {R})-structure expanding A in which R is interpreted as X, then A′ |= ϕ(a). The idea of fixed point logics is that we add formulae for computing fixed points of the operators Fϕ. This already gives us formal definitions of the logics IFP and PFP.

Definition 10.4. The logics IFP and PFP are defined as extensions of FO with the following formation rules:
• (For IFP): if ϕ(R, x) is a formula, where R is k-ary, and t is a tuple of terms, where |x| = |t| = k, then [ifpR,x ϕ(R, x)](t) is a formula, whose free variables are those of t.
• (For PFP): if ϕ(R, x) is a formula, where R is k-ary, and t is a tuple of terms, where |x| = |t| = k, then [pfpR,x ϕ(R, x)](t) is a formula, whose free variables are those of t.
The semantics is defined as follows:
• (For IFP): A |= [ifpR,x ϕ(R, x)](a) iff a ∈ ifp(Fϕ).
• (For PFP): A |= [pfpR,x ϕ(R, x)](a) iff a ∈ pfp(Fϕ).

Why could we not define an extension with the least fixed point in exactly the same way? The reason is that least fixed points are guaranteed to exist only for monotone operators. However, monotonicity is not an easy property to deal with.

Lemma 10.5. Testing whether Fϕ is monotone is undecidable for FO formulae ϕ.

Proof. Let Φ be an arbitrary sentence, and let ϕ(S, x) ≡ (S(x) → Φ). Suppose Φ is valid. Then ϕ(S, x) is always true, and hence Fϕ is monotone in every structure. Suppose now that A |= ¬Φ for some nonempty structure A. Then, over A, ϕ(S, x) is equivalent to ¬S(x), and hence Fϕ is not monotone. Therefore, Fϕ is monotone iff Φ is true in every nonempty structure, which is undecidable by Trakhtenbrot's theorem.

Thus, to ensure that least fixed points are only taken for monotone operators, we impose some syntactic restrictions. Given a formula ϕ that may contain a relation symbol R, we say that an occurrence of R is negative if it is under the scope of an odd number of negations, and positive if it is under the scope of an even number of negations. For example, in the formula ∃x ¬R(x) ∨ ¬∀y∀z ¬(R(y) ∧ ¬R(z)), the first occurrence of R (i.e., R(x)) is negative, the second (R(y)) is positive (as it is under the scope of two negations), and the last one (R(z)) is negative again. We say that a formula is positive in R if there are no negative occurrences of R in it; in other words, either all occurrences of R are positive, or there are none at all.

Definition 10.6.
The logic LFP extends FO with the following formation rule: • if ϕ(R, x) is a formula positive in R, where R is k-ary and t is a tuple of terms, where |x|=|t|= k, then [lfpR,xϕ(R, x)](t) is a formula, whose free variables are those of t. The semantics is defined as follows: A |= [lfpR,xϕ(R, x)](a) iff a ∈ lfp(Fϕ). Of course, there is something to be proven here: 182 10 Fixed Point Logics and Complexity Classes Lemma 10.7. If ϕ(R, x) is positive in R, then Fϕ is monotone. The proof is by an easy induction on the structure of the formula (which includes the cases of Boolean connectives, quantifiers, and lfp operators) and is left as an exercise to the reader. We now give a few examples of queries definable in fixed point logics. Transitive Closure and Acyclicity Let E be a binary relation, and let ϕ(R, x, y) be E(x, y) ∨ ∃z (E(x, z) ∧ R(z, y)). Clearly, this is positive in R. Let ψ(u, v) be [lfpR,x,yϕ(R, x, y)](u, v). What does this formula define? To answer this, we must consider the operator Fϕ. For a set X, we have Fϕ(X) = E ∪(E ◦X). We have seen this operator in the previous section, and know that its least fixed point is the transitive closure of E. Hence, ψ(u, v) defines the transitive closure of E. This also implies that graph connectivity is LFP-definable by the sentence ∀u∀v ψ(u, v). As the next example, we again consider graphs whose edge relation is E, and the formula α(S, x) given by ∀y E(y, x) → S(y) . This formula is again positive in S. The operator Fα associated with this formula takes a set X and returns the set of all nodes a such that all the nodes b from which there is an edge to a are in X. Let us now iterate this operator. Clearly, Fα(∅) is the set of nodes of in-degree 0. Then Fα(Fα(∅)) is the set of nodes a such that all nodes b with edges (b, a) ∈ E have in-degree 0. Reformulating this, we can state that Fα(Fα(∅)) is the set of nodes a such that all paths ending in a have length at most 1. Following this, at the ith stage of the iteration we get the set of nodes a such that all the paths ending in a have length at most i. When we reach the fixed point, we have nodes such that all the paths ending in them are finite. Hence, the formula ∀u [lfpS,xα(S, x)](u) tests if a graph is acyclic. Arithmetic on Successor Structures As a third example, consider structures of vocabulary (min, succ), where succ is interpreted as a successor relation on the universe, and min is the minimal element with respect to succ. That is, the structures will be of the form {0, . . . , n − 1}, 0, {(i, i + 1) | i + 1 ≤ n − 1} . We show how to define +++ = {(i, j, k) | i + j = k} and ××× = {(i, j, k) | i · j = k} 10.2 Fixed Point Logics 183 on such structures. For +++, we use the recursive definition: x + 0 = x x + (y + 1) = (x + y) + 1. Let R be ternary and β+(R, x, y, z) be y = min ∧ z = x ∨ ∃u∃v R(x, u, v) ∧ succ(u, y) ∧ succ(v, z) . Intuitively, it states the conditions for (x, y, z) to be in the graph of addition: either y = 0 and x = z, or, if we already know that x + u = v, and y = u + 1, z = v + 1, then we can infer x + y = z. This formula is positive in R, and the least fixed point computes the graph of addition: ϕ+++(x, y, z) = [lfpR,x,y,zβ+(R, x, y, z)](x, y, z). Using addition, we can define multiplication: x · 0 = 0 x · (y + 1) = x · y + x. Similarly to the case of addition, we define β×(S, x, y, z) as y = min ∧ z = min ∨ ∃u∃v S(x, u, v) ∧ succ(u, y) ∧ ϕ+++(x, v, z) . This formula is positive in S. 
Then ϕ×××(x, y, z) = [lfpS,x,y,zβ×(S, x, y, z)](x, y, z) defines the graph of multiplication. Since it uses ϕ+++ as a subformula, this gives us an example of nested least fixed point operators. Combining this example with Theorem 6.12, we conclude that BIT is LFPdefinable over successor structures. A Game on Graphs Consider the following game played on a graph G = V, E with a distinguished start node a. There are two players: player I and player II. At each round i, first player I selects a node bi and then player II selects a node ci, such that (a, b1), as well as (bi, ci) and (ci, bi+1), are edges in E, for all i. The player who cannot make a legal move loses the game. Let S be unary, and define α(S, x) as ∀y E(x, y) → ∃z E(y, z) ∧ S(z) . What is Fα(∅)? It is the set of nodes b of out-degree 0; that is, nodes in which player II wins, since player I does not have a single move. In general, Fα(X) is the set of nodes b such that no matter where player I moves from b, player 184 10 Fixed Point Logics and Complexity Classes II will have a response from X. Thus, iterating Fα, we see that the ith stage consists of nodes from which player II has a winning strategy in at most i − 1 rounds. Hence, [lfpS,xα(S, x)](a) holds iff player II has a winning strategy from node a. We conclude this section by a remark concerning free variables in fixed point formulae. So far, in the definition and all the examples we dealt with iterating formulae ϕ(R, x) where x matched the arity of R. However, in general one can imagine that ϕ has additional free variables. For example, if we have a formula ϕ(R, x, y) positive in R, we can, for each tuple b, define an operator Fb ϕ(X) = {a | A |= ϕ(X/R, a, b)}, and a formula ψ(t, y) ≡ [lfpR,xϕ(R, x, y)](t), with the semantics A |= ψ(c, b) iff c ∈ lfp(Fb ϕ). It turns out, however, that free variables in fixed point formulae can always be avoided, at the expense of relations of higher arity. Indeed, the formula ψ(t, y) above is equivalent to [lfpR′,x,yϕ′ (R′ , x, y)](t, y), where R′ is of arity | x | + | y |, and ϕ′ is obtained from ϕ by changing every occurrence of a subformula R(z) to R′ (z, y). This is left as an exercise to the reader. Thus, we shall normally assume that no extra parameters are present in fixed point formulae. 10.3 Properties of LFP and IFP In this section we study logics LFP and IFP. We start by introducing a very convenient tool of simultaneous fixed points, which allows one to iterate several formulae at once. We then analyze fixed point computations, and show how to define and compare their stages (that is, sets Xi as in (10.1)). From this analysis we shall derive two important conclusions. One is that LFP = IFP on finite structures. The other is a normal form for LFP, showing that nested occurrences of fixed point operators (which we saw in the multiplication example in the previous section) can be eliminated. Let σ be a relational vocabulary, and R1, . . . , Rn additional relation symbols, with Ri being of arity ki. Let xi be a tuple of variables of length ki. Consider a sequence Φ of formulae ϕ1(R1, . . . , Rn, x1), · · · , ϕn(R1, . . . , Rn, xn) (10.4) of vocabulary σ ∪ {R1, . . . , Rn}. Assume that all ϕi’s are positive in all Rj’s. Then, for a σ-structure A, each ϕi defines an operator Fi : ℘(Ak1 ) × . . . × ℘(Akn ) → ℘(Aki ) given by 10.3 Properties of LFP and IFP 185 Fi(X1, . . . , Xn) = {a ∈ Aki | A |= ϕi(X1/R1, . . . , Xn/Rn, a)}. We can combine these operators Fi’s into one operator F : ℘(Ak1 ) × . . . × ℘(Akn ) → ℘(Ak1 ) × . . . 
× ℘(Akn ) given by F(X1, . . . , Xn) = (F1(X1, . . . , Xn), . . . , Fn(X1, . . . , Xn)). A sequence of sets (X1, . . . , Xn) is a fixed point of F if F(X1, . . . , Xn) = (X1, . . . , Xn). Furthermore, if for every fixed point (Y1, . . . , Yn) we have X1 ⊆ Y1, . . . , Xn ⊆ Yn, then we speak of the least fixed point of F. The product ℘(Ak1 )×. . .×℘(Akn ) is partially ordered component-wise by ⊆, and the operator F is component-wise monotone. Hence, it can be iterated in the same way as usual monotone operators on ℘(U); that is, X0 = (∅, . . . , ∅) Xi+1 = F(Xi ) X∞ = ∞ i=1 Xi = ∞ i=1 Xi 1, . . . , ∞ i=1 Xi n . (10.5) Just as for the case of the usual operators on sets, one can prove that X∞ = lfp(F). We then enrich the syntax of LFP with the rule that if Φ is a family of formulae (10.4), and t is a tuple of terms of length ki, then [lfpRi,Φ](t) is a formula with the semantics A |= [lfpRi,Φ](a) iff a belongs to the ith component of X∞ . The resulting logic will be denoted by LFPsimult . As an example of a property expressible in LFPsimult , consider the following query Q on undirected graphs G = V, E : it returns the set of nodes (a, b) such that there is a simple path of even length from a to b. Let T be a ternary relation symbol, and R, S binary relation symbols. We consider the following system Φ of formulae: ϕ1(T, R, S, x, y, z) ≡ (E(x, y) ∧ ¬(x = z) ∧ ¬(y = z)) ∨ ∃u E(x, u) ∧ T (u, y, z) ∧ ¬(x = z) ϕ2(T, R, S, x, y) ≡ E(x, y) ∨ ∃u S(x, u) ∧ E(u, y) ∧ T (x, u, y) ϕ3(T, R, S, x, y) ≡ ∃u R(x, u) ∧ R(u, y) ∧ T (x, u, y) . Notice that these formulae are positive in R, S, T . We leave it to the reader to verify that the simultaneous least fixed point of this system Φ computes the following relations: 186 10 Fixed Point Logics and Complexity Classes • T ∞ (a, b, c) holds iff there is a simple path from a to b that does not pass through c; • R∞ (a, b) holds iff there is a simple path from a to b of odd length; and • S∞ (a, b) holds iff there is a simple path from a to b of even length. Thus, [lfpS,Φ](x, y) expresses the query Q. (See Exercise 10.2.) Simultaneous fixed points are often convenient for expressing complex properties, when several sets need to be defined at once. The question is then whether such fixed points enrich the expressiveness of the logic. The answer, as we are about to show, is negative. Theorem 10.8. LFPsimult = LFP. Proof. We give the proof for the case of a system Φ consisting of two formulae, ϕ1(R, S, x) and ϕ2(R, S, y). Extension to an arbitrary system is rather straightforward, and left as an exercise for the reader (Exercise 10.3). The idea is that we combine a simultaneous fixed point into two fixed point formulae, in which the lfp operators are nested. We need an auxiliary result first. Assume we have two monotone operators F1 : ℘(U) × ℘(V ) → ℘(U) and F2 : ℘(U) × ℘(V ) → ℘(V ). Following (10.5), we define the stages of the operator (F1, F2) as X0 = (X0 1 , X0 2 ) = (∅, ∅), Xi+1 = (Xi+1 1 , Xi+1 2 ) = (F1(Xi ), F2(Xi )), with the fixed point (X∞ 1 , X∞ 2 ). Fix a set Y ⊆ U, and define two operators: FY 2 : ℘(V ) → ℘(V ), FY 2 (Z) = F2(Y, Z); G1 : ℘(U) → ℘(U), G1(Y ) = F1(Y, lfp(FY 2 )). Clearly, FY 2 is monotone, and hence lfp(FY 2 ) is well-defined. The operator G1 is monotone as well (since for Y ⊆ Y ′ , it is the case that lfp(FY 2 ) ⊆ lfp(FY ′ 2 )), and hence it has a least fixed point. To prove the theorem, we need the following lemma, which is sometimes referred to as the Bekic principle. Lemma 10.9. X∞ 1 = lfp(G1). 
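As a quick sanity check of Lemma 10.9, the simultaneous and the nested constructions can be compared directly on a toy example. The sketch below uses invented monotone operators F1, F2 (not the ones from the proof; all names are ours) and verifies that the first component of the simultaneous least fixed point equals lfp(G1):

```python
# A toy check of Lemma 10.9: simultaneous iteration of (F1, F2) versus the
# nested construction G1(Y) = F1(Y, lfp of Z -> F2(Y, Z)). The operators and
# the edge relation E are invented monotone operators on subsets of {0,...,4}.

E = {(0, 1), (1, 2), (2, 3), (1, 4)}            # an arbitrary edge relation

def lfp(F):
    """Least fixed point of a monotone operator, by the iteration (10.1)."""
    X = frozenset()
    while True:
        Y = frozenset(F(X))
        if Y == X:
            return X
        X = Y

F1 = lambda X, Z: frozenset({0}) | {b for (a, b) in E if a in Z}   # monotone
F2 = lambda X, Z: frozenset({1}) | {b for (a, b) in E if a in X}   # monotone

# Simultaneous iteration, as in (10.5).
X1, X2 = frozenset(), frozenset()
while True:
    Y1, Y2 = F1(X1, X2), F2(X1, X2)
    if (Y1, Y2) == (X1, X2):
        break
    X1, X2 = Y1, Y2

# Nested construction: G1(Y) = F1(Y, lfp(F2(Y, .))).
G1 = lambda Y: F1(Y, lfp(lambda Z: F2(Y, Z)))

assert X1 == lfp(G1)        # first components agree, as Lemma 10.9 states
print(sorted(X1), sorted(X2))
```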
Before we prove the lemma, we show that the theorem follows from it. Since X∞ 1 = lfp(G1), we have to express G1 in lfp, which can be done, as G1 is defined as the least fixed point of a certain operator. In fact, it follows from the definition of G1 that [lfpR,Φ](t) is equivalent to lfpR,x ϕ1 R, [lfpS,yϕ2(R, S, y)] / S, x (t). 10.3 Properties of LFP and IFP 187 The roles of F1 and F2 can be reversed; that is, we can define FY 1 (Z) = F1(Z, Y ) : ℘(U) → ℘(U) and G2 : ℘(V ) → ℘(V ) by G2(Y ) = F2(lfp(FY 1 ), Y ), and prove, as in Lemma 10.9, that X∞ 2 = lfp(G2). Therefore, lfpS,y ϕ2 [lfpR,xϕ1(R, S, x)] / R, S, y (t) is equivalent to [lfpS,Φ](t). It remains to prove Lemma 10.9. First, notice that lfp(F X∞ 1 2 ) ⊆ X∞ 2 , because F X∞ 1 2 (X∞ 2 ) = F2(X∞ 1 , X∞ 2 ) = X∞ 2 . That is, X∞ 2 is a fixed point of F X∞ 1 2 , and thus it must contain its least fixed point. Hence, G1(X∞ 1 ) = F1(X∞ 1 , lfp(F X∞ 1 2 )) ⊆ F1(X∞ 1 , X∞ 2 ) = X∞ 1 . Since lfp(G1) is the intersection of all the set S such that G1(S) ⊆ S, we conclude that lfp(G1) ⊆ X∞ 1 . Next, we prove the reverse inclusion X∞ 1 ⊆ lfp(G1). We use Z to denote lfp(G1). We show inductively that for each i, Xi 1 ⊆ Z and Xi 2 ⊆ lfp(FZ 2 ). This is clear for i = 0. To go from i to i + 1, calculate Xi+1 1 = F1(Xi 1, Xi 2) ⊆ F1(Z, lfp(FZ 2 )) = G1(lfp(G1)) = lfp(G1) = Z, and Xi+1 2 = F2(Xi 1, Xi 2) ⊆ F2(Z, lfp(FZ 2 )) = FZ 2 (lfp(FZ 2 )) = lfp(FZ 2 ). Thus, X∞ 1 = ∞ i=0 Xi 1 ⊆ lfp(G1). This completes the proof of Lemma 10.9 and Theorem 10.8. One can similarly define logics IFPsimult and PFPsimult , by allowing simultaneous inflationary and partial fixed points. It turns out that for IFP and PFP, simultaneous fixed points do not increase expressiveness either. The proof presented for LFP would not work, as it relies on the monotonicity of operators defined by formulae, which cannot be guaranteed for arbitrary formulae used in the definition of the logics IFP and PFP. Nevertheless, a different technique works for these logics. We explain it now by means of an example; details are left as an exercise for the reader. Assume that the vocabulary σ has two constant symbols c1 and c2 interpreted as two distinct elements of σ-structure. This assumption is easy to get rid of, by existentially quantifying over two variables, u and w, and stating that u = w; however, formulae with constants will be easier to deal with. Furthermore, we can assume without loss of generality that structures have at least two elements, since the case of one-element structures can be dealt with explicitly by specifying the value of a fixed point operator on them. Suppose we have two formulae, ϕ1(R1, R2, x) and ϕ2(R1, R2, x), where the arities of R1 and R2 are n, and the length of x is n. Let S be a relation symbol of arity n + 1, and let ψ(S, u, x) be the formula 188 10 Fixed Point Logics and Complexity Classes (u = c1) ∧ ϕ1 S(c1, z)/R1(z), S(c2, z)/R2(z), x ∨ (u = c2) ∧ ϕ2 S(c1, z)/R1(z), S(c2, z)/R2(z), x , where S(ci, z)/Ri(z) indicates that every occurrence of Ri(z) is replaced by S(ci, z). Then the fixed point – inflationary or partial – of this formula ψ computes the simultaneous fixed point of the system {ϕ1, ϕ2}: the fixed point corresponding to Ri is the set of all n-tuples of the fixed point of ψ where the first coordinate is ci. This argument is generalized to arbitrary systems of formulae, thereby giving us the following result. Theorem 10.10. IFPsimult = IFP and PFPsimult = PFP. We now come back to single fixed point definitions and analyze them in detail. 
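Before doing so, here is a small executable illustration of the tagging trick behind Theorem 10.10 (a sketch; the toy operators phi1, phi2 and all names are ours, and the string tags stand in for the constants c1, c2):

```python
# Two unary intensional relations R1, R2 are packed into one binary relation S
# whose first coordinate is a tag playing the role of c1, c2.

A = range(6)
c1, c2 = "c1", "c2"

phi1 = lambda R1, R2: {x for x in A if x % 2 == 0} | {x - 1 for x in R2 if x - 1 in A}
phi2 = lambda R1, R2: {x + 1 for x in R1 if x + 1 in A}

# Simultaneous inflationary iteration of (phi1, phi2).
R1, R2 = set(), set()
while True:
    N1, N2 = R1 | phi1(R1, R2), R2 | phi2(R1, R2)
    if (N1, N2) == (R1, R2):
        break
    R1, R2 = N1, N2

# The same computation with a single tagged relation S, a subset of {c1,c2} x A.
def proj(S, t):
    return {a for (tag, a) in S if tag == t}

psi = lambda S: ({(c1, x) for x in phi1(proj(S, c1), proj(S, c2))} |
                 {(c2, x) for x in phi2(proj(S, c1), proj(S, c2))})

S = set()
while True:
    N = S | psi(S)
    if N == S:
        break
    S = N

# The two components of the simultaneous fixed point are recovered from S.
assert R1 == proj(S, c1) and R2 == proj(S, c2)
```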
Suppose we have a formula ϕ(R, x). Assume for now that ϕ is positive in R. To construct the least fixed point of ϕ on a structure A, we inductively calculate X0 = ∅, Xi+1 = Fϕ(Xi ), and then the fixed point is X∞ = i Xi . We shall refer to Xi ’s as stages of the fixed point computation, with Xi being the ith stage. First, we note that each stage is definable by an LFP formula, if ϕ is positive in R. Indeed, for each stage i, we have a formula ϕi (xi), such that ϕi (A) is exactly Xi . These are defined inductively as follows: ϕ0 (x0) ≡ ¬(x = x) x is a variable in x0 ϕi+1 (xi+1) ≡ ϕ(ϕi /R, xi+1). (10.6) Here the notation ϕ(ϕi /R, xi+1) means that every occurrence R(y) in ϕ is replaced by ϕi (y) and, furthermore, all the bound variables in ϕ have been replaced by fresh ones. For example, consider the formula ϕ(R, x, y) ≡ E(x, y) ∨ ∃z E(x, z) ∧ R(z, y) . Following (10.6), we obtain the formulae ϕ0 (x0, y0) ≡ ¬(x0 = x0) ϕ1 (x1, y1) ≡ E(x1, y1) ∨ ∃z1 (E(x1, z1) ∧ ϕ0 (z1, y1)) ↔ E(x1, y1) ϕ1 (x2, y2) ≡ E(x2, y2) ∨ ∃z2 (E(x2, z2) ∧ ϕ1 (z2, y2)) ↔ E(x2, y2) ∨ ∃z2 (E(x2, z2) ∧ E(z2, y2)) . . . . . . computing the stages of the transitive closure operator. For an arbitrary ϕ, we can give formulae for computing stages of the inflationary fixed point computation. These are given by ϕ0 (x0) ≡ ¬(x = x) ϕi+1 (xi+1) ≡ ϕi (xi+1) ∨ ϕ(ϕi /R, xi+1). (10.7) 10.3 Properties of LFP and IFP 189 Thus, each stage of the inflationary fixed point computation is definable by an IFP formula. What is more interesting is that we can write formulae that compare stages at which various tuples get into the sets Xi of fixed point computations. Suppose we are given a formula ϕ(R, x) that gives rise to an inductive operator Fϕ, where R is k-ary and x has k variables. For example, if we are interested in inflationary fixed point computation, we can always pass from ϕ(R, x) to R(x) ∨ ϕ(R, x), whose induced operator is inductive. Given a structure A, we define |ϕ|A as the least n such that Xn = X∞ . Furthermore, for a tuple a ∈ Ak , we define |a|A ϕ as the least number i such that a ∈ Xi in the fixed point computation, and |ϕ|A + 1 if no such i exists. Notice that if ϕ is positive in R, then the stages of the least and inflationary fixed point computation are the same. We next define two relations ≺ϕ and ϕ on Ak as follows: a ≺ϕ b ≡ |a|A ϕ < |b|A ϕ , a ϕ b ≡ |a|A ϕ ≤ |b|A ϕ and |a|A ϕ ≤ |ϕ|A . The theorem below shows that these can be defined with least fixed points of positive formulae. Theorem 10.11 (Stage comparison). If ϕ is in LFP, then the binary relations ≺ϕ and ϕ are LFP-definable. Proof. The idea of the proof is as follows. We want to define both ≺ϕ and ϕ as a simultaneous fixed point. This has to be done somehow from ϕ, but in ϕ we may have both positive and negative occurrences of R. So to find some relations to substitute for the negative occurrences of R, we explicitly introduce the complements of ≺ϕ and ϕ : a ≺ϕ b ≡ |a|A ϕ ≥ |b|A ϕ , a ϕ b ≡ |a|A ϕ > |b|A ϕ or |a|A ϕ = |ϕ|A + 1. We shall be using formulae of the form ϕ(≺(y)/R, x) and ϕ( (y)/R, x). This means that, for ϕ(≺ (y)/R, x), every positive occurrence R(z) of R is replaced by z ≺ϕ y, and every negative occurrence of R(z) of R is replaced by z ≺ϕ y, and likewise for ϕ . Note that all the occurrences of the four relations ≺ϕ , ϕ , ≺ϕ , ϕ become positive. Also, we shall write ϕ(¬≺(y)/R, x), meaning that every positive occurrence R(z) of R is replaced by ¬(z ≺ϕ y), and every negative occurrence of R(z) of R is replaced by ¬(z ≺ϕ y). 
These 190 10 Fixed Point Logics and Complexity Classes will be used in subformulae ¬ϕ(¬ ≺ (y)/R, x), again ensuring that all the occurrences of ≺ϕ , ϕ , ≺ϕ , ϕ are positive. These four relations will be defined by a simultaneous fixed point. For technical reasons, we shall add one more relation: a ⊳ϕ b ≡ |a|A ϕ + 1 = |b|A ϕ, and show how to define (≺, , ⊳, ≺, ) by a simultaneous fixed point. For readability only, we may omit the superscript ϕ. We define the system Ψ of five formulae ψi(≺, , ⊳, ≺, , x, y), i = 1, . . . , 5, as follows: ψ1 ≡ ∃z x z ∧ z ⊳ y , ψ2 ≡ ϕ(≺(y)/R, x), ψ3 ≡ ϕ(≺(x)/R, x) ∧ ¬ϕ(≺(x)/R, y) (10.8) ∧ ϕ( (x)/R, y) ∨ ∀z ¬ϕ(¬ (x)/R, z) ∨ ϕ(≺(x)/R, z) , ψ4 ≡ ∃z x z ∧ z ⊳ y ∨ ϕ(∅/R, y) ∨ ∀z¬ϕ(∅/R, z), ψ5 ≡ ¬ϕ(¬ ≺(y)/R, x) where ϕ(∅/R, ·) means that all occurrences of R are eliminated and replaced by false. Note that all the occurrences of ≺, , ⊳, ≺, in Ψ are positive. We next claim that the simultaneous least fixed point of Ψ indeed defines ≺ϕ , ϕ , ⊳ϕ , ≺ϕ , ϕ . To prove the result, we have to show that (≺ϕ , ϕ , ⊳ϕ , ≺ϕ , ϕ ) satisfy (10.8), and that for each ∗ ∈ {≺ϕ , ϕ , ⊳ϕ , ≺ϕ , ϕ }, if a ∗ b holds, then (a, b) is in the corresponding fixed point of Ψ (10.8). This will be proved by induction on |b|A ϕ . Below, we prove a few cases for both directions. The remaining cases are very similar, and are left as an exercise for the reader. First, we prove that ⊳ϕ satisfies (10.8). Consider a tuple (a, b) in this relation. The result is immediate if |a|A ϕ = |ϕ|A + 1. If |a|A ϕ < |ϕ|A , then the third conjunct in ψ3(a, b) is equivalent to ϕ( ϕ (a)/R, b) and, therefore, ψ3(a, b) holds iff |b|A ϕ = |a|A ϕ + 1 iff a ⊳ϕ b. Finally, if |a|A ϕ = |ϕ|A , then the third conjunct in ψ3 is equivalent to the formula ∀z (¬ϕ(¬ ϕ (a)/R, z)∨ ϕ(≺ϕ (a)/R, z)) and, thus, ψ3(a, b) holds iff b is not in the fixed point of ψ3 iff |b|A ϕ = |ϕ|A + 1 = |a|A ϕ + 1. Second, we prove by induction on |b|A ϕ that, for every a, if a⊳ϕ b or a ≺ϕ b, then (a, b) is in the corresponding fixed point of Ψ. Induction Basis: |b|A ϕ = 1. • The case for ⊳ϕ . This is the simplest case, since |b|A ϕ = 1 implies that a ⊳ϕ b holds for no a. 10.3 Properties of LFP and IFP 191 • The case for ≺ϕ . Since |b|A ϕ = 1, we conclude that ϕ(∅/R, b) holds. We have a ≺ϕ b for all a, and since ϕ(∅/R, b) is true, (a, b) is in the fixed point of ψ4 for every a. Induction Step: Assume that |b|A ϕ = k + 1 and that the property holds for all c such that |c|A ϕ ≤ k. • The case for ⊳ϕ . Suppose that a ⊳ϕ b. Then |a|A ϕ ≤ k. We show that the three conjuncts in ψ3 hold for (a, b) and, thus, we conclude that (a, b) is in the fixed point of ψ3. Since |a|A ϕ < |b|A ϕ, we have |a|A ϕ ≤ |ϕ|A and, therefore, ϕ(≺ϕ (a)/R, a) holds. By the induction hypothesis, ≺ϕ (a) =≺(a), so ϕ(≺(a)/R, a) holds. Since |a|A ϕ < |b|A ϕ , ¬ϕ(¬ ≺ϕ (a)/R, b) holds. By the induction hypothesis, ≺ϕ (a) =≺(a) and, hence, ¬ϕ(¬ ≺(a)/R, b) holds. To prove that the third conjunct in ψ3 holds, we consider two cases. If |b|A ϕ ≤ |ϕ|A , then ϕ( ϕ (a)/R, b) holds. By the hypothesis, ϕ (a) = (a) and, therefore, ϕ( (a)/R, b) holds. Otherwise |b|A ϕ = |ϕ|A + 1 and |a|A ϕ = |ϕ|A . In this case all the elements generated at stage |a|A ϕ + 1 are already in stage |a|A ϕ and, therefore, the formula ∀z (¬ϕ(¬ ϕ (a)/R, z) ∨ ϕ(≺ ϕ (a)/R, z)) holds. As in the previous cases, by the induction hypothesis we conclude that ∀z (¬ϕ(¬ (a)/R, z) ∨ ϕ(≺(a)/R, z)) holds. • The case for ≺ϕ . Suppose that a ≺ϕ b, and that the second and third disjuncts in ψ4 do not hold. 
Then we show that the first disjunct in ψ4 holds and conclude that (a, b) is in the fixed point of ψ4. Since ϕ(∅/R, b) and ∀z¬ϕ(∅/R, z) do not hold, we have |b|A ϕ > 1 and the fixed point of ψ4 contains at least one element. Thus, there exists c such that c ⊳ϕ b. Given that a ≺ϕ b, we have a ϕ c and |c|A ϕ ≤ k. Therefore, we have a tuple c with |c|A ϕ ≤ k such that both a ϕ c and c⊳ϕ b hold. Now using the equivalence from the previous case for c ⊳ϕ b, and applying the induction hypothesis to a ϕ c, we conclude that (a, b) satisfies ∃z (a z ∧ z ⊳ b), which finishes the proof. Corollary 10.12 (Gurevich-Shelah). IFP = LFP. Proof. The inclusion LFP ⊆ IFP is immediate. For the converse, proceed by induction on the formulae. The only case to consider is ifpR,xϕ(R, x). We can assume, without loss of generality, that ϕ defines an inductive operator (if not, consider R(x) ∨ ϕ). Then [ifpR,xϕ(R, x)](t) is equivalent to ϕ(≺ϕ (t)/R, t ), which, by the stage comparison theorem, is an LFP formula. 192 10 Fixed Point Logics and Complexity Classes As another corollary of stage comparison, we establish a normal form for LFP formulae. Define a logic LFP0 which extends FO with the following. If Φ is a system of FO formulae ϕi(R1, . . . , Rn, x) positive in all the Ri’s, then [lfpRi,Φ](x) is an LFP0 formula. Note the difference between this and general LFP: we only allow fixed points to be applicable to FO formulae, and we do not close those fixed points under the Boolean connectives and quantification. In other words, every formula of LFP0 is either FO, or of the form [lfpRi,Φ](x), where Φ consists of FO formulae. Corollary 10.13. LFP = LFP0. Proof. We first show that LFP0 is closed under ∨, ∧, and ¬. For ∨ and ∧ this is easy: just introduce an extra relation to hold the union or intersection of two fixed points. For example, given ϕ1(R1, x) and ϕ2(R2, x), we define a system Φ that consists of formulae ϕ1(R1, R2, S, x), ϕ2(R1, R2, S, x), and ϕ3(R1, R2, S, x) ≡ (R1(x) ∨ R2(x)). Then lfpS,Φ is the union of fixed points of ϕ1 and ϕ2. The closure under negation follows from the stage comparison: ¬[lfpR,xϕ](t) is equivalent to t ϕ t. The closure of LFP0 under fixed point operators is immediate (one simply adds an extra formula to the system). Thus, LFP0 = LFP. 10.4 LFP, PFP, and Polynomial Time and Space The goal of this section is to show that the fixed point logics we introduced capture familiar complexity classes over ordered structures. A structure is ordered if one of the symbols of its vocabulary σ is <, interpreted as a linear order on the universe. Recall that we used a linear order for defining an encoding of a structure: indeed, a string on the tape of a Turing machine is naturally ordered from left to right. For capturing NP and the polynomial hierarchy, we did not need the assumption that the structures are ordered, since we could guess an order by second-order quantifiers. However, fixed point logics are not sufficiently expressive for guessing a linear order (in fact, this will be proved formally). Theorem 10.14 (Immerman-Vardi). Both LFP and IFP capture Ptime over the class of ordered structures. That is, LFP+< = IFP+< = Ptime. Proof. By the Gurevich-Shelah theorem (Corollary 10.12), we can use IFP and LFP interchangeably. First, we show that LFP formulae can be evaluated in polynomial time. The proof is by induction on the formulae. 
The cases of the Boolean connectives and quantifiers are handled in exactly the same way 10.4 LFP, PFP, and Polynomial Time and Space 193 as for FO (see, e.g., Proposition 6.6). For formulae of the form lfpR,xϕ, it suffices to observe the following: if F : ℘(U) → ℘(U) is a Ptime-computable monotone operator, then lfp(F) can be computed in polynomial time in |U |. Indeed, we know that the fixed point computation stops after at most | U | iterations, and each iteration is Ptime-computable. Hence, every LFP formula can be evaluated in polynomial time. For the converse, we use the same technique as in the proofs of Trakhtenbrot’s and Fagin’s theorems. Suppose we are given a property P of σ-structures which can be tested, on encodings of σ-structures, by a deterministic polynomial time Turing machine M = (Q, Σ, ∆, δ, q0, Qa, Qr) with a one-way infinite tape. We assume, without loss of generality, that there is only one accepting state, qa, that Σ = {0, 1}, and that ∆ extends Σ with the blank symbol. Let M run in time nk . As before, we assume that nk exceeds the size of the encodings of n-element structures. With the linear order <, we can again define the lexicographic linear order ≤k on k-tuples, and use the ordered k-tuples to model both positions of M and time. We shall define, by means of fixed point formulae, the 2k-ary predicates T0, T1, T2, (Hq)q∈Q, where Ti(p, t) indicates that position p at time t contains i, for i = 0, 1, and blank, for i = 2, and Hq(p, t) indicates that at time t, the machine is in state q, and its head is in position p. We shall provide a system Ψ of formulae whose simultaneous inflationary fixed point is exactly (T0, T1, T2, (Hq)q∈Q). Once we have such a system, the sentence testing P will be given by ∃p ∃t [ifpHqa ,Ψ ](p, t ). (10.9) Since IFPsimult = IFP and IFP = LFP, the formula (10.9) can be expressed in LFP. The system Ψ contains formulae ψi(p, t, T0, T1, T2, (Hq)q∈Q), i = 0, 1, 2, defining Ti’s, and ψq(p, t, T0, T1, T2, (Hq)q∈Q), q ∈ Q, defining Hq’s. It has the property that the jth iteration for each of the relations it defines, Rj , contains {(p, t) | R(p, t) and t < j}, where t < j means that t is among the first j − 1 k-tuples in the lexicographic ordering 0) ∧ αq(t − 1, p, T0, T1, (Hq)q∈Q), where αq again lists conditions under which at the next time instant, M will enter state q while having the head pointing at p. The first disjunct in ψq0 states that at time 0, M is in state q0 with its head in position 0. We leave it as a routine exercise to the reader to write the αi’s and αq’s, based on M’s transitions, and verify that that jth stage of the fixed point computation for the system Ψ indeed computes the configuration of M for times not exceeding j − 1. Hence, the fixed point formula (10.9) checks membership in P, which completes the proof. Note that using inflationary fixed points instead of least fixed points in the proof of Theorem 10.14 gives us extra freedom in writing down formulae of the system Ψ: we do not have to ensure that these are positive in Ti’s and Hq’s. However, one can write those formulae carefully so that they would be positive in all those relation symbols. In that case, one can replace ifp with lfp in (10.9). Hence, the proof of Theorem 10.14 then shows that every LFPdefinable property over ordered structures can be defined by a formula of the form ∃x [lfpRi,Ψ ](x), where Ψ is a system of FO formulae positive in relation symbols R1, . . . , Rn. 
This, of course, would follow from Corollary 10.13, stating that LFP = LFP0, but notice that for ordered structures, we obtained the normal form result without using the stage comparison theorem. We have seen that for several logics, adding an order increases their expressiveness; that is, L (L+ <)inv for L being FO, or one of its counting extensions, or MSO. The same is true for LFP, IFP, and PFP; the proof of this will be given in the next chapter when we describe additional tools such as finite variable logics and pebble games. At this point we only say that the query that separates these logics on ordered and unordered structures is even: it is not expressible in any of the fixed point logics without a linear order, but is obviously already in LFP+<, since it is Ptime-computable. We conclude this section by considering the partial fixed point logic, PFP. Over ordered structures, it corresponds to another well-known complexity class. Theorem 10.15. Over ordered structures, PFP captures Pspace. 10.5 Datalog and LFP 195 The proof, of course, follows the proofs of Trakhtenbrot’s, Fagin’s, and Immerman-Vardi’s theorems. We only explain why PFP formulae can be evaluated in Pspace. Consider pfpR,xϕ(R, x), where R is k-ary, and let Xi ’s be the stages of the partial fixed point computation on A with |A|= n. There are two possibilities. Either Xm+1 = Xm for some m, in which case a fixed point is reached. Otherwise, for some 0 ≤ i, j ≤ 2nk , i + 1 < j, we have Xi = Xj , and in this case the formula [pfpR,xϕ(R, x)](t) would evaluate to false, since the partial fixed point is the empty set. Hence, one has to check which of these cases is true. For that, it suffices to enumerate all the subsets of Ak , one by one (which can be done in Pspace), and proceed with computing the sequence Xi , checking whether a fixed point is reached. Since only 2nk steps need to be made, the entire computation is in Pspace. To show that Pspace ⊆ PFP+<, one modifies the proof of the Immerman-Vardi theorem, to simulate the accepting condition of a Turing machine by means of a partial fixed point formula. We leave the details to the reader (Exercise 10.9). 10.5 Datalog and LFP In this section we review a database query language Datalog, and relate it to fixed point logics. Recall that FO is used as the basic relational query language (it is known under the name relational calculus in the database literature). Conjunctive queries, seen in Sect. 6.7, constitute an important subclass of FO queries. They can be defined in the fragment of FO that only includes conjunction ∧ and existential quantification ∃. There is another convenient form for writing conjunctive queries that in fact is used most often in the literature. Instead of ψ(x) ≡ ∃y i αi(x, y), one omits the existential quantifiers and replaces the ∧’s with commas: Rψ(x) :– α1(x, y), α2(x, y), . . . , αm(x, y). (10.10) Here Rψ is a new relation symbol; the meaning of (10.10) is that, for a given structure A, this new relation contains the set of all tuples a such that A |= ψ(a). Expressions of the form (10.10) are called rules; the part of the rule that appears on the left of the :– (in this case, Rψ(x)) is called its head, and the part of the rule on the right of the :– is called its body. A rule is converted into a conjunctive query by replacing commas with conjunctions, and existentially quantifying all the variables that appear in the body but not in the head. 
For example, the rule q(x, y) :– E(x, z), E(z, v), E(v, y) is translated into ∃z∃v E(x, z) ∧ E(z, v) ∧ E(v, y) . 196 10 Fixed Point Logics and Complexity Classes Datalog programs contain several rules some of which may be recursive: that is, the same predicate symbol may appear in both the head and the body of a rule. A typical Datalog program would be of the following form: trcl(x, y) :– E(x, y) trcl(x, y) :– E(x, z), trcl(z, y) (10.11) This program computes the transitive closure of E: it says that (x, y) is in the transitive closure if there is an edge (x, y), or there is an edge (x, z) such that (z, y) is in the transitive closure. As with the fixed point definition of the transitive closure, to evaluate this program we iterate this definition, starting with the empty set, until a fixed point is reached. Definition 10.16. A Datalog program over vocabulary σ is a pair (Π, Q), where Π is a set of rules of the form P(x) :– α1(x, y), . . . , αm(x, y). (10.12) Here the relation symbol P in the head of rule (10.12) does not occur in σ, and each αi is an atomic formula of the form R(x, y), for R ∈ σ, or P′ (x, y), for P′ that occurs as a head of one of the rules of Π. Furthermore, Q is the head of one of the rules of Π. By Datalog¬ we mean the extension of Datalog where negated atomic formulae of the form ¬R(·), for R ∈ σ, can appear in the bodies of rules (10.12). For example, the transitive closure program consists of the rules (10.11), and trcl is the output predicate Q. In the standard Datalog terminology, relation symbols from σ are called extensional predicates, and symbols not in σ that appear as heads of rules are called intensional predicates. These are the predicates computed by the program, and Q is its output. To define the semantics of a Datalog (or Datalog¬) program (Π, Q), we introduce the immediate consequence operator FΠ. Let P1, . . . , Pk list all the intensional predicates (with Q being one of them). Let ni be the arity of Pi, i = 1, . . . , k. Let Pi(x) :– γ1 1 (x, y1), . . . , γ1 m1 (x, y1) · · · · · · · · · · · · Pi(x) :– γl 1(x, yl), . . . , γl ml (x, yl) (10.13) enumerate all the rules in Π with Pi as the head. Given a structure A and a tuple of sets Y = (Y1, . . . , Yk), Yi ⊆ Ani , i = 1, . . . , k, we define FΠ(Y ) = (Z1, . . . , Zk), where Zi = a ∈ Ani (A, Y1, . . . , Yk) |= l j=1 ∃yj γj 1(a, yj) ∧ . . . ∧ γj mj (a, yj) , 10.5 Datalog and LFP 197 where formulae γj l are the formulae from the rules (10.13) for the intensional predicate Pi. In other words, a ∈ Zi can be derived by applying one of the rules of Π whose head is Pi, using Y as the interpretation for the intensional predicates. Since the formula above is positive in all the intensional predicates (even for a Datalog¬ program), the operator FΠ is monotone. Hence, starting with (∅, . . . , ∅) and iterating this operator, we reach the least fixed point lfp(FΠ ) = (P∞ 1 , . . . , P∞ k ). The output of (Π, Q) on A is defined as Q∞ (recall that Q is one of the Pi’s). Returning to the transitive closure example, the stages of the fixed point computation of the immediate consequence operator are exactly the same as the stages of computing the least fixed point of E(x, y)∨∃z (E(x, z)∧R(z, y)), and hence, on an arbitrary finite graph, the program (10.11) computes its transitive closure. Analyzing the semantics of a Datalog program (Π, Q), we can see that it is simply a simultaneous least fixed point of a system Ψ of formulae ψi(x, P1, . . . , Pk) ≡ j ∃yj γj 1(a, yj) ∧ . . . ∧ γj mj (a, yj) . 
(10.14) That is, the answer to (Π, Q) on A is {a | A |= [lfpQ,Ψ ](a) }. Hence, each Datalog or Datalog¬ program can be expressed in LFPsimult , and thus in LFP. What fragment of LFP does Datalog¬ correspond to? The special form of formulae ψi (10.14) indicates that there are some syntactic restrictions on LFP formulae into which Datalog¬ is translated. We can capture these syntactic restrictions by a notion of existential least fixed point logic. Definition 10.17. The existential least fixed point logic, ∃LFP, over vocabulary σ, is defined as a restriction of LFP over σ, where: • negation can only be applied to atomic formulae of vocabulary σ (i.e., formulae R(·), where R ∈ σ), and • universal quantification is not allowed. Theorem 10.18. ∃LFP = Datalog¬. Proof. We have seen one direction already, since every Datalog¬ query can be translated into one simultaneous fixed point of a system of FO formulae ψi (10.14), in which no universal quantifiers are used, and negation only applies to atomic σ-formulae. Elimination of the simultaneous fixed point introduces no negation and no universal quantification, and hence Datalog¬ ⊆ ∃LFP. 198 10 Fixed Point Logics and Complexity Classes For the converse, we translate each ∃LFP formula ϕ(x1, . . . , xk) into an equivalent Datalog¬ program (Πϕ, Qϕ), which, on any structure A, computes Q∞ ϕ = ϕ(A). Moreover, the translation ensures that no relation symbol that appears positively in ϕ is negated in Πϕ. The translation proceeds by induction on the structure of the formulae as follows: • If ϕ(x) is an atomic or negated atomic formula (i.e., R(x) or ¬R(x)), then Πϕ contains one rule Qϕ(x) :– ϕ(x). • If ϕ ≡ α ∧ β, then Πϕ = Πα ∪ Πβ ∪ {Qϕ(x) :– Qα(x), Qβ(x)}. • If ϕ ≡ α ∨ β, then Πϕ = Πα ∪ Πβ ∪ {Qϕ(x) :– Qα(x), Qϕ(x) :– Qβ(x)}. • If ϕ(x) ≡ ∃yα(y, x), then Πϕ = Πα ∪ {Qϕ(x) :– Qα(y, x)}. • Let ϕ(x) ≡ [lfpR,yα(R, y)](x). By the induction hypothesis, we have a program (Πα, Qα) for α; notice that R appears positively in α, and thus does not appear negated in Πα. Hence, we can define the following program, in which R is an intensional predicate: Πϕ = Πα ∪ {R(y) :– Qα(y), Qϕ(x) :– R(x)}, and which computes the least fixed point of α. Thus, Datalog and Datalog¬ correspond to syntactic restrictions of LFP. But could they still be sufficient for capturing Ptime? Let us first look at a Datalog program (Π, Q), and suppose we have two σ-structures, A1 and A2, on the same universe A, such that for every symbol R ∈ σ, we have RA1 ⊆ RA2 . Then a straightforward induction on the stages of the immediate consequence operator shows that (Π, Q)[A1] ⊆ (Π, Q)[A2], where by (Π, Q)[A] we denote the result of (Π, Q) on A. Hence, Datalog only expresses monotone properties, and thus cannot capture Ptime (exercise: exhibit a non-monotone Ptime property). Queries expressible in Datalog¬ satisfy a slightly different monotonicity property. Suppose A is a substructure of B; that is, A ⊆ B, and for each R ∈ σ, RA is the restriction of RB to A. Then (Π, Q)[A] ⊆ (Π, Q)[B], where (Π, Q) is a Datalog¬ program. Indeed, when you look at the formulae (10.14), it is clear that if a witness a is found in A, it will be a witness for the existential quantifiers in B. Since it is again not hard to find a Ptime property that fails this notion of monotonicity, Datalog¬ fails to capture Ptime. Furthermore, even adding order preserves monotonicity, and hence Datalog¬ fails to capture Ptime even over ordered structures. 
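The immediate consequence semantics is easy to prototype. Here is a minimal sketch of naive evaluation of the transitive closure program (10.11); the relation E and all names are illustrative only:

```python
# Naive Datalog evaluation: iterate the immediate consequence operator until
# no new facts are derived.

E = {("a", "b"), ("b", "c"), ("c", "d")}

def immediate_consequence(trcl):
    """One application of the operator F_Pi for the intensional predicate trcl."""
    facts = set(E)                                              # trcl(x,y) :- E(x,y)
    facts |= {(x, y) for (x, z) in E for (w, y) in trcl if w == z}
    return facts                                                # trcl(x,y) :- E(x,z), trcl(z,y)

trcl = set()
while True:
    new = immediate_consequence(trcl)
    if new == trcl:              # least fixed point of the operator reached
        break
    trcl = new

print(sorted(trcl))              # the transitive closure of E
```

Since the rule bodies are positive, each round only adds facts, so on a finite structure the loop terminates after polynomially many rounds.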
10.6 Transitive Closure Logic 199 But now assume that on all the structures, we have a successor relation succ available, as well as constants min, max for the minimal and maximal element with respect to the successor relation. It is impossible for A, succA , minA , maxA , . . . to be a substructure of B, succB , minB , maxB , . . . , and hence the previous monotonicity argument does not work. In fact, the following theorem can be shown. Theorem 10.19. Over structures with successor relation and constants for the minimal and maximal elements, Datalog¬ captures Ptime. The proof mimics the proofs of Fagin’s and Immerman-Vardi’s theorems, by directly coding deterministic polynomial time Turing machines in Datalog¬, and is left to the reader as an exercise. 10.6 Transitive Closure Logic One of the standard examples of queries expressible in LFP is the transitive closure. In this section, we study a logic based on the transitive closure operator, rather than the least or inflationary fixed point, and prove that it corresponds to a well-known complexity class. Definition 10.20. The transitive closure logic TrCl is defined as an extension of FO with the following formation rule: if ϕ(x, y, z) is a formula, where |x|=|y|= k, and t1, t2 are tuples of terms of length k, then [trclx,yϕ(x, y, z)](t1, t2) is a formula whose free variables are z plus the free variables of t1, t2. The semantics is defined as follows. Given a structure A, values a for z and ai for ti, i = 1, 2, construct the graph G on Ak with the set of edges {(b1, b2) | A |= ϕ(b1, b2, a)}. Then A |= [trclx,yϕ(x, y, a)](a1, a2) iff (a1, a2) is in the transitive closure of G. For example, connectivity of directed graphs can be expressed by the TrCl formula ∀u∀v [trclx,y(E(x, y) ∨ E(y, x))](u, v). We now state the main result of this section. Theorem 10.21. Over ordered structures, TrCl captures NLog. 200 10 Fixed Point Logics and Complexity Classes Having seen a number of results of this type, one might be tempted to think that the proof is by a simple modification of the proofs of Trakhtenbrot’s, Fagin’s, and Immerman-Vardi’s theorems. However, in this case we are running into problems, and the problems arise in the “easy” part of the proof: TrCl ⊆ NLog. It is well known that the transitive closure of a graph can be computed by a nondeterministic logspace machine. Hence, trying to show the inclusion TrCl ⊆ NLog by induction on the structure of the formulae, we have no problems with the transitive closure operator. The problematic operation is negation. Since NLog is a nondeterministic class, acceptance means that some computation ends in an accepting state. The negation of this statement is that all computations end in rejecting states, and it is not clear whether this can be reformulated as an existential statement. Our strategy for proving Theorem 10.21 is to split it into two statements. First, we define a logic posTrCl in which all occurrences of the transitive closure operator are positive (i.e., occur under the scope of an even number of negations). In fact, one can always convert such a formula into an equivalent formula in which no trcl operator would be contained in the scope of any negation symbol. We then prove two results. Proposition 10.22. Over ordered structures, posTrCl captures NLog. Proposition 10.23. Over ordered structures, posTrCl = TrCl. Clearly, Theorem 10.21 will follow from these. Furthermore, they yield the following corollary. Corollary 10.24 (Immerman–Szelepcs´enyi). NLog is closed under complementation. 
This closure under complementation stands in sharp contrast to other nondeterministic classes such as NP or the levels Σ^p_i of the polynomial hierarchy, for which closure under complementation remains a major unsolved problem. In particular, for NP this is the problem of whether NP = coNP.

We start by showing how to prove Proposition 10.22. With negation gone, this proof becomes very similar to the other capture proofs seen in this and the previous chapters. Indeed, the inclusion posTrCl ⊆ NLog is proved by a straightforward induction (since negation is only applied to FO formulae). For the converse, suppose we have a nondeterministic logspace machine M. Such a machine has one read-only tape that stores the input, enc(A), and one work tape, whose size is bounded by c log n for some constant c (where n = |A|). Let Q be the set of states. To model a configuration of M, we need to model both tapes. The input tape can be described by a tuple of variables p, where p indicates a position on the tape, just as in the proofs of Fagin's and the Immerman-Vardi theorems. For the work tape, we need to describe its content, the position of the head, and the state. The latter (position and state) can be described with |Q| variables (assuming c log n is shorter than the encoding of structures with an n-element universe). If the alphabet of the work tape is {0, 1}, there are 2^{c log n} = n^c possible contents of the work tape, which can be described with c variables. Hence, the entire configuration can be described by tuples s of length at most c(σ) + |Q| + c, where c(σ) is a constant depending on σ that gives an upper bound on the size of the tuples p describing positions in the input. Then the class of structures accepted by M is definable by the formula

∃s0 ∃s1 (ϕ_init(s0) ∧ ϕ_final(s1) ∧ [trcl_{x,y} ϕ_next(x, y)](s0, s1)).    (10.15)

Here ϕ_init(s0) says that s0 is the initial configuration, with the input tape head pointing at the first position, the machine in the initial state, and the work tape containing all zeros; ϕ_final(s1) says that s1 is an accepting configuration (i.e., its state is accepting); and ϕ_next(x, y) says that the configuration y is obtained from the configuration x in one move. It is a straightforward (but somewhat tedious) task to write these three formulae in FO, and it is done similarly to the proofs of the other capture theorems. This proves Proposition 10.22.

Before we prove Proposition 10.23, we re-examine (10.15). Let min and max, as before, stand for the constants denoting the minimal and the maximal element with respect to the ordering, and let min and max also stand for the tuples (min, ..., min) and (max, ..., max) of the same length as the configuration description. Suppose that instead of ϕ_next(x, y) we use the formula ϕ'_next:

ϕ_next(x, y) ∨ (x = min ∧ ϕ_init(y)) ∨ (ϕ_final(x) ∧ y = max),

allowing jumps from min to the initial configuration, and from any final configuration to max. Then (10.15) is equivalent to

[trcl_{x,y} ϕ'_next(x, y)](min, max).    (10.16)

Thus, every posTrCl formula over ordered structures defines an NLog property, which can be expressed by (10.15), and hence by (10.16). We therefore obtain the following.

Corollary 10.25. Over ordered structures, every posTrCl formula is equivalent to a formula of the form [trcl_{x,y} ϕ](min, max), where ϕ is FO.

We now prove Proposition 10.23. The proof is by induction on the structure of TrCl formulae, and the only nontrivial case is that of negation.
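Before handling negation, it may help to see the endpoint trick behind (10.16) and Corollary 10.25 written out as an ordinary algorithm. The sketch below uses assumed names (configs, next_rel, is_init, is_final), and the two endpoints are fresh sentinel objects rather than the all-min and all-max tuples: acceptance is reachability in the configuration graph, and adding jump edges from a single source to every initial configuration and from every accepting configuration to a single sink turns "some accepting configuration is reachable from some initial one" into a single reachability question between two fixed endpoints.

```python
def reachable(nodes, edge, s, t):
    """Is there a path of length >= 1 from s to t?"""
    reached, frontier = set(), [s]
    while frontier:
        u = frontier.pop()
        for v in nodes:
            if edge(u, v) and v not in reached:
                reached.add(v)
                frontier.append(v)
    return t in reached

def accepts(configs, next_rel, is_init, is_final):
    """M accepts iff some accepting configuration is reachable from an initial one,
    i.e., iff MAX is reachable from MIN after adding the two kinds of jump edges
    that phi'_next provides."""
    MIN, MAX = object(), object()          # fixed endpoints (stand-ins for the
    nodes = list(configs) + [MIN, MAX]     # all-min and all-max tuples)
    def edge(u, v):                        # the analogue of phi'_next
        if u is MIN:
            return v in configs and is_init(v)
        if v is MAX:
            return u in configs and is_final(u)
        return u in configs and v in configs and next_rel(u, v)
    return reachable(nodes, edge, MIN, MAX)

# Toy instance: three configurations, q0 initial, qacc accepting.
steps = {('q0', 'q1'), ('q1', 'qacc')}
print(accepts({'q0', 'q1', 'qacc'},
              lambda u, v: (u, v) in steps,
              lambda c: c == 'q0',
              lambda c: c == 'qacc'))      # True
```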
By Corollary 10.25, we may assume that negation is applied to a formula of the form (10.16); that is, we have to show that

¬[trcl_{x,y} ϕ(x, y)](min, max),    (10.17)

where ϕ is FO, is equivalent to a posTrCl formula. Assume |x| = k. For an arbitrary formula α(x, y) with |x| = |y| = k, and a structure A, let d_α^A(a, b) be the shortest distance between a and b in α(A) (viewed as a graph on A^k). If no path between a and b exists, we set d_α^A(a, b) = ∞. We define

Reach_α^A(a) = {b ∈ A^k | d_α^A(a, b) ≠ ∞}.

Thus, (10.17) holds in A iff

|Reach_ϕ^A(min)| = |Reach_{ϕ(x,y) ∧ ¬(y=max)}^A(min)|.    (10.18)

Notice that the maximal finite value of d_α^A(a, b) is |A|^k. Since structures are ordered, we can count up to |A|^k using (k+1)-tuples of variables: associating the universe A with {0, ..., n−1}, we let a (k+1)-tuple (c_1, ..., c_{k+1}) represent

c_1·n^k + c_2·n^{k−1} + ... + c_k·n + c_{k+1}.    (10.19)

As this will not cause any confusion, we shall use the notation c both for the tuple and for the number (10.19) it represents. Note also that the constants 0 = min and 1, as well as the successor and predecessor c + 1 and c − 1, are FO-definable in the presence of order, so we shall use them in formulae. Also notice that the maximum value of d_α^A(a, b), namely |A|^k, is represented by the tuple 10 = (1, 0, ..., 0).

One useful property of posTrCl is that over ordered structures it can count: for a formula β(x) of posTrCl, one can construct another posTrCl formula count_β(y) such that A |= count_β(c) iff there are at least c tuples a in β(A). Indeed, we can enumerate all the tuples a, go over them one by one, and check whether β(a) holds. Since β can be checked in NLog, the whole algorithm has NLog complexity, and thus is definable in posTrCl. One can also express this counting directly: if ψ(x_1 v_1, x_2 v_2) is

((x_2 = x_1 + 1) ∧ (v_2 = v_1)) ∨ ((x_2 = x_1 + 1) ∧ β(x_2) ∧ (v_2 = v_1 + 1)),

then

∃z ([trcl_{x_1 v_1, x_2 v_2} ψ(x_1 v_1, x_2 v_2)](min, min, max, z) ∧ ((y = z) ∨ (β(min) ∧ y = z + 1)))

expresses count_β(y) (exercise: explain why).

Our next goal is to prove the following lemma.

Lemma 10.26. For every FO formula α(x, y), there exists a posTrCl formula ρ_α(x, z) such that for every A, we have A |= ρ_α(a, c) iff |Reach_α^A(a)| = c.

Before proving this, notice that Lemma 10.26 immediately implies Proposition 10.23, since by (10.18), (10.17) is equivalent to

∃z (ρ_ϕ(min, z) ∧ ρ_{ϕ(x,y) ∧ ¬(y=max)}(min, z)),

which is a posTrCl formula.

Let r_α^A(a, c) denote the cardinality of {b | d_α^A(a, b) ≤ c}, so that the cardinality of the set Reach_α^A(a) is r_α^A(a, 10). Assume that there is a formula γ_α(x, v, z_1, z_2) such that A |= γ_α(a, e, c_1, c_2) means that if r_α^A(a, e) = c_1, then r_α^A(a, e + 1) = c_2. With such a formula γ_α, the formula ρ_α(x, z) is definable as

[trcl_{v_1 z_1, v_2 z_2} ((v_2 = v_1 + 1) ∧ γ_α(x, v_1, z_1, z_2))](min, min, 10, z),

since this formula says that r_α(x, 10) = z. Thus, it remains to show how to define γ_α.

In preparation for writing down the formula γ_α, notice that there is a posTrCl formula d_α(x, y, z) such that A |= d_α(a, b, c) iff d_α^A(a, b) ≤ c. Indeed, it is given by

[trcl_{x_1 z_1, x_2 z_2} (α(x_1, x_2) ∧ (z_1 < z_2))](x, min, y, z).

Coming back to γ_α, notice that r_α^A(a, e + 1) = c_2 iff

c_2 + |{b | d_α^A(a, b) > e + 1}| = 10 (= n^k).

Hence, if we could write a posTrCl formula expressing this condition, we would be able to express γ_α in posTrCl. Suppose we can express d_α^A(a, b) > e + 1 in posTrCl.
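Before finishing the argument, it may help to see the inductive counting that γ_α formalizes, written out as an ordinary algorithm. The sketch below uses assumed names and the simplifying convention d(a, a) = 0, so that r_0 = 1; a deterministic program could of course just negate reachability directly, but the code mirrors the structure of the logical argument: r_e is computed from r_{e−1}, and knowing r_{e−1} is what allows non-reachability to be certified by positive checks only.

```python
def dist_at_most(alpha, nodes, a, b, e):
    """d_alpha(a, b) <= e, with the convention d(a, a) = 0 (layered search to depth e)."""
    reached, frontier = {a}, {a}
    for _ in range(e):
        frontier = {v for u in frontier for v in nodes
                    if alpha(u, v) and v not in reached}
        reached |= frontier
    return b in reached

def count_reachable(alpha, nodes, a, max_e):
    """Inductive counting: r_e = |{b : d_alpha(a, b) <= e}|, computed from r_{e-1}."""
    r_prev = 1                                  # r_0 = |{a}|
    for e in range(1, max_e + 1):
        r_curr = 0
        for b in nodes:
            # d(a, b) <= e  iff  some f with d(a, f) <= e-1 has f == b or alpha(f, b).
            # Logically, the point of knowing r_{e-1} is that the *negation* can be
            # certified positively: exhibit r_{e-1} distinct f's with d(a, f) <= e-1
            # and check that none of them is b or an alpha-predecessor of b.
            close = [f for f in nodes if dist_at_most(alpha, nodes, a, f, e - 1)]
            assert len(close) == r_prev         # the invariant the proof maintains
            if any(f == b or alpha(f, b) for f in close):
                r_curr += 1
        r_prev = r_curr
    return r_prev                               # = |Reach(a)| once max_e >= |nodes| - 1

edges = {(1, 2), (2, 3), (3, 4)}
nodes = [1, 2, 3, 4, 5]
print(count_reachable(lambda u, v: (u, v) in edges, nodes, 1, len(nodes)))   # 4
```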
Assuming this for the moment, γ_α is straightforward to write, since we already saw how to count: we start with c_2 and increment the count every time a b with d_α^A(a, b) > e + 1 is found; trcl is then applied to see whether 10 is reached (we leave the details of this formula to the reader). Thus, our last task is to express the condition d_α^A(a, b) > e + 1 in posTrCl.

Even though we have a formula d_α(x, y, z) in posTrCl (meaning d_α(x, y) ≤ z), what we need now is the negation of such a formula, which is not in posTrCl. However, it is possible to express d_α^A(a, b) > e + 1 in posTrCl under the condition r_α^A(a, e) = c_1 (which is all we need anyway, by the definition of γ_α). If e = min, then d_α^A(a, b) > 1 is equivalent to ¬α(a, b). Otherwise, d_α^A(a, b) > e + 1 iff one can find c_1 tuples f, all different from b, such that d_α^A(a, f) ≤ e and ¬α(f, b) holds for each such f. Now the distance formula (which itself is a posTrCl formula) occurs positively, and to express d_α^A(a, b) > e + 1, we simply count the number of f satisfying the conditions above and compare that number with c_1. As we have seen earlier, such counting of f's can be done by a posTrCl formula. Thus, γ_α is expressible in posTrCl, which completes the proof of Lemma 10.26 and Theorem 10.21.

10.7 A Logic for Ptime?

We have seen that LFP and IFP capture Ptime on the class of ordered structures. On the other hand, for classes such as NP and coNP we have logics that capture them over all structures. The question that immediately arises is whether there is a logic that captures Ptime without the additional restriction to ordered structures. If there were such a logic, answering the "Ptime vs. NP" question would become a purely logical problem: one would have to separate two logics over the class of all finite structures.

However, all attempts to produce a logic that captures Ptime have failed so far. In fact, it is even conjectured that no such logic exists:

Conjecture (Gurevich). There is no logic that captures Ptime over the class of all finite structures.

This is a very strong conjecture: since there is a logic for NP, by Fagin's theorem, it would imply that Ptime ≠ NP! The precise statement of the conjecture also describes what a logic is. We shall not go into technical details, but the main idea is to rule out the possibility of taking an arbitrary collection of properties and declaring that it constitutes a logic. For example, is the collection of all Ptime properties a logic? If we want the conjecture to hold, clearly the answer ought to be no.

In this short section, we shall present a few attempts to refute Gurevich's conjecture and find a logic for Ptime – and show how they all failed. The results here will be presented without proofs; the interested reader should consult the bibliographic notes section for the references.

What are examples of properties not expressible in LFP or IFP over unordered structures? Although we have not proved this yet, we mentioned one example: the query even. We shall see later, in Chap. 11, that in general IFP cannot express nontrivial counting properties over unordered structures. Hence, one might try to add counting to IFP (it is better to use IFP, so that positiveness would not constrain us), and hope that such an extension captures Ptime. This extension of IFP, denoted by IFP(Cnt), can be defined in the same way as we defined FO(Cnt) from FO: one introduces the additional universe
{0, 1, ..., n − 1}, where n is the cardinality of the universe of a σ-structure A, and extends the logic with counting quantifiers ∃i x. However, this extension still falls short of Ptime, and the separating example is very complicated.

Theorem 10.27. There are Ptime properties which are not definable in IFP(Cnt).

Another attempt to expand IFP is to introduce generalized quantifiers, already seen in Chap. 8. There, we only dealt with unary generalized quantifiers; here we present a general definition, but for notational simplicity we deal with the case of one additional relation per quantifier.

Let R be a relation symbol of arity k, with R ∉ σ. Let C ⊆ STRUCT[{R}] be a class of structures closed under isomorphism. This gives rise to a generalized quantifier Q_C and the extension of IFP with Q_C, denoted by IFP(Q_C), which is defined as follows. If ϕ(x, y) is an IFP(Q_C) formula of vocabulary σ, and |x| = k, then

ψ(y) ≡ Q_C x ϕ(x, y)    (10.20)

is an IFP(Q_C) formula. The other formation rules are exactly the same as for IFP. The semantics of (10.20) is as follows:

A |= ψ(b) ⇔ ⟨A, {a | A |= ϕ(a, b)}⟩ ∈ C,

that is, the {R}-structure with universe A in which R is interpreted as {a | A |= ϕ(a, b)} belongs to C. For example, if C is the class of connected graphs, then the sentence Q_C x, y E(x, y) simply tests whether the input graph is connected. If Q is a set of generalized quantifiers, then by IFP(Q) we mean the extension of IFP with the formulae (10.20) for all the generalized quantifiers in Q.

There is a "simple" way of getting a logic that captures Ptime: it is IFP(Q_P), where Q_P is the collection of all Ptime properties. However, this is cheating: we define the logic in terms of itself. But perhaps there is a nicely behaved set Q of generalized quantifiers such that IFP(Q) captures Ptime. The first result, showing that such a class – if it exists – will be hard to find, says the following.

Proposition 10.28. Let Q_n be a collection of generalized quantifiers of arity at most n. Then there exists a vocabulary σ_n such that over σ_n-structures, IFP(Q_n) fails to capture Ptime.

The reason this result is not completely satisfactory is that the arity of the relations in σ_n depends on n. For example, Proposition 10.28 says nothing about the impossibility of capturing Ptime over graphs. And in fact there is a collection Q_gr of binary generalized quantifiers (i.e., of arity 2) such that IFP(Q_gr) expresses all the Ptime properties of graphs (why?). In fact, one can even show that there is a single ternary generalized quantifier Q_3 such that IFP(Q_3) expresses all the Ptime properties of graphs (intuitively, it is possible to code Q_gr with one ternary generalized quantifier), but Q_3 itself is not Ptime-computable, and hence IFP(Q_3) fails to capture Ptime on graphs.

The existence of such a quantifier Q_3 raises the intriguing possibility that for some finite collection Q_fin of Ptime-computable generalized quantifiers, IFP(Q_fin) captures Ptime on unordered graphs. However, this attempt to refute Gurevich's conjecture does not work either.

Theorem 10.29. There is no finite collection Q_fin of Ptime-computable generalized quantifiers such that IFP(Q_fin) captures Ptime on unordered graphs.

Thus, given all that we know today, Gurevich's conjecture may well be true, as it has withstood a number of attempts to produce a logic for Ptime over unordered structures.

10.8 Bibliographic Notes

Inductive operators and fixed point logics are studied extensively in Moschovakis [185] in the context of arbitrary models.
The systematic study of fixed point logics in finite model theory originated with Chandra and Harel [33], who introduced the least fixed point operator in the context of database query languages to overcome well-known limitations of FO. The subject is treated in detail in Ebbinghaus and Flum [60], Immerman [133], Grohe [106]; see also a recent survey by Dawar and Gurevich [51]. All of these references present the Tarski-Knaster theorem, least and inflationary fixed point logics, and simultaneous fixed points. The “even simple path” example is taken from Kolaitis [148], where it is attributed to Yannakakis. See also Exercise 10.2. The stage comparison theorem was proved in Moschovakis [185], and specialized for the finite case in Immerman [130] and Gurevich and Shelah [119]; the proof presented here follows Leivant [165]. Corollary 10.12 is from Gurevich and Shelah [119], and Corollary 10.13 from [130]. The connection between fixed point logics and polynomial time was discovered by several people in the early 1980s. Sazonov [212] showed in 1980 that a certain least fixed point construction – of recursive-theoretic flavor – captures Ptime. Then, in 1982, Immerman [129], Vardi [244], and Livchak [172] proved what is now known as the Immerman-Vardi theorem. Both Immerman’s and Vardi’s papers appeared in the proceedings of the STOC 1982 conference; Livchak’s paper was published in Russian and became known much later; hence Theorem 10.14 is usually referred to as the Immerman-Vardi theorem. In 1986, Immerman published a full version of his 1982 paper (see [130]). Theorem 10.15 is from Vardi [244]. Datalog has been studied extensively in the database literature, see, e.g., Abiteboul, Hull, and Vianu [3] for many additional results and references. Theorem 10.19 is from Papadimitriou [194]. Theorem 10.21 is from Immerman [130, 132]: the first of these papers showed that posTrCl captures NLog, and the other paper proved closure under complementation (see also Szelepcs´enyi [226]). A number of references discuss Gurevich’s conjecture in detail (e.g., Otto [191], Kolaitis [147], as well as [60]); they also discuss the notion of a “logic” suitable for capturing Ptime. Theorem 10.27 is from Cai, F¨urer, and Immerman [30] (see also Otto [191], as well as Gire and Hoang [91] for extensions). Theorem 10.29 is from Dawar and Hella [52]. Sources for exercises: Exercise 10.10: Ajtai and Gurevich [13] Exercise 10.11: Immerman [130] Exercises 10.12 and 10.13: Gr¨adel [97] Exercise 10.14: Immerman [131] Exercise 10.15: Gr¨adel and McColm [101] 10.9 Exercises 207 Exercise 10.16: Abiteboul and Vianu [5] Exercises 10.17 and 10.18: Afrati, Cosmadakis, and Yannakakis [8] Exercise 10.19: Gr¨adel and Otto [102] Exercises 10.20 and 10.21: Grohe [107] Exercise 10.22: Shmueli [220] and Cosmadakis et al. [43] Exercise 10.23: Marcinkowski [179] Exercise 10.24: Gottlob and Koch [94] Exercise 10.25: Gurevich, Immerman, and Shelah [118] Exercise 10.26: Dawar and Hella [52] Exercise 10.27: Dawar, Lindell, and Weinstein [54] 10.9 Exercises Exercise 10.1. Prove Proposition 10.3. Exercise 10.2. Prove that the simultaneous fixed point shown before Theorem 10.8 defines pairs of nodes connected by a simple path of even length. Hint: use Menger’s theorem in graph theory. Also show that this does not generalize to directed graphs. Exercise 10.3. Prove Theorem 10.8 for a system involving an arbitrary number of formulae. Exercise 10.4. Prove Theorem 10.10. Exercise 10.5. Prove Theorem 10.15. Exercise 10.6. Prove Theorem 10.19. 
Exercise 10.7. Prove that the combined complexity of LFP is Exptime-complete. Exercise 10.8. Consider an alternative semantics for Datalog programs. Given a set of rules Π and a structure A, an instantiation P of all the intensional predicates is called a model of Π on A if every rule of Π is satisfied. Show that for any Π, there exists a minimal, with respect to inclusion, model Pmin. The minimal model semantics of Datalog defines the answer to (Π, Q) on A as the interpretation of Q in Pmin. Prove that the fixed point and the minimal model semantics of Datalog coin- cide. Exercise 10.9. Write down the formulae ψi and ψq from the proof of the Immerman-Vardi theorem, and show that their simultaneous least fixed point computes the relations Ti and Hq. Exercise 10.10. Show that over finite structures, monotone and positive are two different concepts (they are known to be the same over infinite structures, see Lyndon [175]). That is, give an example of an FO formula ϕ(P, ·) which is monotone in P, but not equivalent to any FO formula positive in P. 208 10 Fixed Point Logics and Complexity Classes Exercise 10.11. Assume that the vocabulary σ contains at least two distinct constants. Prove a stronger normal form result for LFP: every LFP formula is equivalent to a formula of the form [lfpR,xϕ(R, x)](t), where ϕ is an FO formula. Hint: use two constants to eliminate nested fixed points. Exercise 10.12. Consider a restriction of SO that consists of formulae of the form QR1 . . . QRn∀x ^ l αl, where each Q is either ∃ or ∀, and each αl is Horn with respect to R1, . . . , Rn. That is, it is of the form γ1 ∧ . . . ∧ γm → β, where each γj either does not mention Ri’s, or is of the form Ri(u), and β is either of the form Ri(u), or false. We denote such restriction by SO-Horn. If all the quantifiers Q are existential, we speak of ∃SO-Horn. Prove that over ordered structures, SO-Horn and ∃SO-Horn capture Ptime. Exercise 10.13. The class SO-Krom is defined similarly to SO-Horn, except that each αl is a disjunction of at most two atoms of the form Ri(u) or ¬Rj (u), and a formula that does not mention the Ri’s. ∃SO-Krom is defined as the restriction where all second-order quantifiers are existential. Prove that both SO-Krom and ∃SO-Krom capture NLog over ordered struc- tures. Exercise 10.14. Define a variant of the transitive closure logic, denoted by DetTrCl, where the transitive closure operator trcl is replaced by the deterministic transitive closure. When applied to a graph V, E , it finds pairs (a, b) which are connected by a deterministic path: on such a path, every node except b must be of out-degree 1. Prove that DetTrCl captures DLog over ordered structures. Exercise 10.15. Prove that over unordered structures, DetTrCl TrCl LFP. Exercise 10.16. Consider the following language that computes queries over STRUCT[σ]. Given an input structure A, its programs compute sequences of relations, and are defined inductively as follows: • ∅ is a program that computes no relation. • If Π(R1, . . . , Rn) is a program that computes relations R1, . . . , Rn, where R1, . . . , Rn ∈ σ, then Π(R1, . . . , Rn); R(x) :– ϕ(x); where R ∈ σ ∪ {R1, . . . , Rn}, and ϕ is an FO formula in the vocabulary of σ expanded with R1, . . . , Rn, is a program that computes relations R1, . . . , Rn, R, with R obtained by evaluating ϕ on the expansion of A with R1, . . . , Rn, R. • If Π(R1, . . . , Rn) is a program that computes relations R1, . . . , Rn, and Π′ (T1, . . . , Tk) is a program over STRUCT[σ ∪ {R1, . . . , Rn} ∪ {S1, . . . 
, Sk}], where the arity of each Si matches the arity of Ti, then Π(R1, . . . , Rn); while change do Π′ (T1, . . . , Tk) end; 10.9 Exercises 209 is a program that computes (R1, . . . , Rn, T1, . . . , Tk) over σ-structures. The meaning of the last statement is that starting with (∅, . . . , ∅) as the interpretation of the Si’s, one iterates Π′ ; it computes the Ti’s, which are then reused as Si’s, and so on. This is done as long as it changes one relation among the Si’s. If this program terminates, the values of the relations (T1, . . . , Tk) in that state become the output. For example, the while loop while change do T(x, y) :– E(x, y) ∨ ∃z (E(x, z) ∧ S(z, y)) end; computes the transitive closure of E. Prove that over ordered structures, such while programs compute precisely the Pspace queries. Exercise 10.17. Let monotone Ptime be the class of all monotone Ptime properties. Show that Datalog, even in the presence of a successor relation, fails to capture monotone Ptime. Hint: Let σ = {R, S}, where R is ternary, and S is unary. The separating query is defined as follows: Q is true in A iff the system of linear equations {x1 + x2 + x3 = 1 | (x1, x2, x3) ∈ RA } ∪ {x = 0 | x ∈ SA } does not have a non-negative solution. Exercise 10.18. Prove that without the successor relation, Datalog¬ fails to capture Ptime on ordered structures, even if one allows atoms ¬(x = y). Hint: The separating query takes a graph, and outputs pairs of nodes (a, b) such that there is a path from a to b whose length is a perfect square. Exercise 10.19. Show how to expand Datalog with counting, and prove that the resulting language is equivalent to the expansion of IFP with counting. Exercise 10.20. Prove that the expansion of IFP with counting captures Ptime on the class of planar graphs. Exercise 10.21. Prove that the class of planar graphs is definable in IFP. Exercise 10.22. You may recall that containment of conjunctive queries is NPcomplete (Exercise 6.19). Prove that containment of arbitrary Datalog queries is undecidable, but becomes decidable if all intensional predicates are unary. Exercise 10.23. We say that a Datalog program Π is uniformly bounded if there is a number n such that on every structure A, the fixed point of FΠ is reached after at most n steps. Prove that uniform boundedness is undecidable for Datalog, even for programs that consist of a single rule. Exercise 10.24. Consider trees represented as in Chap. 7, i.e., structures with two successor predicates, labeling predicates, and, furthermore, assume that we have unary predicates Leaf and Root interpreted as the set of leaves, and the singleton set containing the root. 210 10 Fixed Point Logics and Complexity Classes Define monadic Datalog as the restriction of Datalog where all intensional predicates are unary. Prove that over trees, Boolean and unary queries definable in monadic Datalog and in MSO are precisely the same. In particular, a tree language is definable in monadic Datalog iff it is regular. Exercise 10.25. Prove that there exists a class C of graphs which admits fixed points of unbounded depth (i.e., for every n there is an inductive operator that reaches its fixed point on some graph from C in at least n iterations), and yet LFP = FO on C. Remark: this exercise says that it is possible for LFP and FO to coincide on a class of graphs which admits fixed points of unbounded depth. The negation of this was known as McColm’s conjecture; hence the goal of this exercise is to disprove McColm’s conjecture. 
McColm [181] made two conjectures relating boundedness of fixed points and collapse of logics; the second conjecture that talks about FO and the finite variable logic is known to be true (see Exercise 11.19). For the next three exercises, consider the following statement, known as the ordered conjecture (see Kolaitis and Vardi [153]): If C is an infinite class of finite ordered structures, then FO LFP on C. Exercise 10.26. Prove that if the ordered conjecture does not hold, then Ptime = Pspace. Exercise 10.27. Prove that if the ordered conjecture holds, then Linh = Etime. Here Linh is the linear time hierarchy: the class of languages computed in linear time by alternating Turing machines, with a constant number of alternations, and Etime is the class of languages computed by deterministic Turing machines in time 2O(n) . Exercise 10.28.∗ Does the ordered conjecture hold? 11 Finite Variable Logics In this chapter, we introduce finite variable logics: a unifying tool for studying fixed point logics. These logics use infinitary connectives already seen in Chap. 8, but here we impose a different restriction: each formula can use only finitely many variables. We show that fixed point logics LFP, IFP, and PFP can be embedded in such a finite variable logic. Furthermore, the finite variable logic is easier to study: it can be characterized by games, and this gives us bounds on the expressive power of fixed point logics; in particular, we show that without a linear ordering, they fail to capture complexity classes. We then study definability and ordering of types in finite variable logics, and use these techniques to relate separating complexity classes to separating some fixed point logics over unordered structures. 11.1 Logics with Finitely Many Variables Let us revisit the example of the transitive closure of a relation. Suppose E is a binary relation. We know how to write FO formulae ϕn(x, y) stating that there is a path from x to y of length n (that is, formulae defining the stages of the fixed point computation of the transitive closure). One can express ϕn(x, y), n > 1, as ∃x1 . . . ∃xn−1 E(x, x1)∧. . .∧E(xn−1, y) , and ϕ1(x, y) as E(x, y). If we could use infinitary disjunctions (i.e., the logic L∞ω of Chap. 8), we could express the transitive closure query by n≥1 ϕn(x, y). (11.1) One could even define ϕn(x, y) by induction, as we did in Chap. 10: ϕ1(x, y) ≡ E(x, y), ϕn+1(x, y) ≡ ∃zn E(x, zn) ∧ ϕn(zn, y) , (11.2) where zn is a fresh variable. The problem with either definition of the ϕn’s together with (11.1) is that the logic L∞ω is useless in the context of finite 212 11 Finite Variable Logics model theory: as we saw in Chap. 8, it defines every property of finite structures (Proposition 8.4). However, if we look carefully at the definition of the ϕn’s given in (11.2), we can see that there is no need to introduce a fresh variable zn for each new formula. In fact, we can define formulae ϕn as follows: ϕ1(x, y) ≡ E(x, y) . . . . . . . . . ϕn+1(x, y) ≡ ∃z E(x, z) ∧ ∃x z = x ∧ ϕn(x, y) . (11.3) In definition (11.3), each formula ϕn uses only three variables, x, y, and z, by carefully reusing them. To define ϕn(x, y), we need to say that there is a z such that E(x, z) holds, and ϕn(z, y) holds. But with three variables, we only know how to say that ϕn(x, y) holds. 
So once z is used in E(x, z), it is no longer needed, and we replace it by x: that is, we say that there is an x such that x happens to be equal to z, and ϕn(x, y) holds: and we know that the latter is definable with three variables. With these formulae (11.3), we can still define the transitive closure by (11.1). What makes the difference now is the fact that the resulting formula only uses three variables. If one checks the proof of Proposition 8.4, one discovers that, to define arbitrary classes of finite structures in L∞ω, one needs, in general, infinitely many variables. So perhaps an infinitary logic in which the number of variables is finite could be useful after all? The answer to this question is a resounding yes: we shall see that all fixed point logics can be coded in a way very similar to (11.3), and that the resulting infinitary logic can be analyzed by the same techniques we have seen in previous chapters. Definition 11.1 (Finite variable logics). The class of FO formulae that use at most k distinct variables will be denoted by FOk . The class of L∞ω formulae that use at most k variables will be denoted by Lk ∞ω (reminder: L∞ω extends FO with infinitary conjunctions and disjunctions ). Finally, we define the finite variable infinitary logic Lω ∞ω by Lω ∞ω = k∈N Lk ∞ω. That is, Lω ∞ω has formulae of L∞ω that only use finitely many variables. The quantifier rank qr(·) of Lω ∞ω formulae is defined as for FO for Boolean connectives and quantifiers; for infinitary connectives, we define qr( i ϕi) = qr( i ϕi) = sup i qr(ϕi). Thus, in general the quantifier rank of an infinitary formula is an ordinal. For example, if the ϕn’s are FO formulae with qr(ϕn) = n, then 11.1 Logics with Finitely Many Variables 213 qr( n<ω ϕn) = ω, and qr(∃x n<ω ϕn) = ω + 1. When we establish a normal form for Lω ∞ω, we shall see that over finite structures it suffices to consider only formulae of quantifier rank up to ω. Let us give a few examples of definability in Lω ∞ω. We first consider linear orderings: that is, the vocabulary contains one binary relation <. With the same trick of reusing variables, we define the formulae ψ1(x) ≡ (x = x) . . . . . . . . . ψn+1(x) ≡ ∃y (x > y) ∧ ∃x y = x ∧ ψn(x) . (11.4) The formula ψn(a) is true in a linear order L iff the set {b | b ≤ a} contains at least n elements. Indeed, ψ1(x) is true for every x, and ψn+1(x) says that there is y < x such that there are at least n elements that do not exceed y. Thus, for each n we have a sentence Ψn ≡ ∃x ψn(x) that is true in L iff |L|≥ n. Now let C be an arbitrary subset of N. Consider the sentence n∈C Ψn ∧ ¬Ψn+1 . This is a sentence of L2 ∞ω, as it uses only two variables, x and y, and it is true in L iff |L|∈ C. Hence, arbitrary cardinalities of linear orderings can be tested in L2 ∞ω. Next, consider fixed point computations. Suppose that an FO formula ϕ(R, x) defines an inductive operator; that is, either ϕ is monotone in R, or we are considering an inflationary fixed point. We have seen in Chap. 10 that stages of the fixed point computation can be defined by FO formulae ϕn (x); the formulae we used, however, may potentially involve arbitrarily many variables. To be able to express the least fixed point as n ϕn (x), we need to define those formulae ϕn (x) more carefully. Assume that ϕ, in addition to x = (x1, . . . , xk), uses variables z1, . . . , zl. We introduce additional variables y = (y1, . . . 
, yk), and define ϕ0 (x) as ¬(x1 = x1) (i.e., false), and then inductively ϕn+1 (x) as ϕ(R, x) in which every occurrence of R(u1, . . . , uk), where u1, . . . , uk are variables among x and z, is replaced by ∃y (y = u) ∧ ∃x((x = y) ∧ ϕn (x)) . (11.5) As usual, x = y is an abbreviation for (x1 = y1)∧. . .∧(xk = yk) . Notice that in the resulting formula, variables from y cannot appear in any subformula of the form R(·). The effect of the substitution is that we use ϕ with R being given the interpretation of the nth stage, so n ϕn (x) does compute the fixed point. 214 11 Finite Variable Logics Furthermore, we at most doubled the number of variables in ϕ. Hence, if ϕ ∈ FOm , then both lfpR,xϕ and ifpR,xϕ are expressible in L2m ∞ω. If we have a complex fixed point formula (e.g., involving nested fixed points), we can then apply the construction inductively, using the same substitution (11.5), since ϕn need not be an FO formula, and can have infinitary connectives. This shows that every LFP or IFP formula is equivalent to a formula of Lω ∞ω (since for every fixed point, we at most double the number of variables). Hence, we have the following. Theorem 11.2. LFP, IFP, PFP ⊆ Lω ∞ω. Proof. We have proved it already for LFP and IFP; for PFP, the construction is modified slightly: instead of taking the disjunction of all the ϕn ’s, we define the sentence goodn as ∀x ϕn (x) ↔ ϕn+1 (x) (indicating that the fixed point was reached). Then [pfpR,xϕ](y) is expressed by ψ(y) ≡ n∈N goodn ∧ ϕn (x) . Indeed, if there is no n such that goodn holds, then the partial fixed point is the empty set, and ψ(y) is equivalent to false. Otherwise, let n0 be the smallest natural number n for which goodn holds. Then, for all m ≥ n0, we have ∀x ϕn0 (x) ↔ ϕm (x) , and hence ψ(y) defines the partial fixed point. Therefore, ψ defines pfpR,xϕ, and it at most doubles the number of variables. Using this construction inductively, we see that PFP ⊆ Lω ∞ω. We now revisit the case of orderings. We have shown before that arbitrary cardinalities of linear orderings are definable in Lω ∞ω; in other words, every query on finite linear orderings is Lω ∞ω-definable. It turns out that this extends to all ordered structures. Proposition 11.3. Every query over ordered finite σ-structures is expressible in Lω ∞ω. In fact, if m is the maximum arity of a relation symbol in σ, then it suffices to use Lm+1 ∞ω . Proof. To keep the notation simple, we consider ordered graphs G = V, E , with a linear ordering < on V (i.e., m = 2, and in this case we show definability in L3 ∞ω). Recall that we have an L2 ∞ω formula ψn(x), that uses variables x, y, and tests if there are at least n elements in V which do not exceed x in the ordering <. Hence, for each n we have an L2 ∞ω formula ψ=n(x) which holds iff x is the nth element in the ordering <. Now, for each G we define a formula χG as ∀x∀z E(x, z) ↔ (i,j)∈E ψ=i(x) ∧ ψ=j(z) ∧ ∃x ψp(x) ∧ ¬∃x ψp+1(x), viewing the universe V of cardinality p as {1, . . . , p}. Here ψ=j(z) is obtained from ψ=j(x) by replacing x by z; that is, this formula uses variables z and y. 11.2 Pebble Games 215 Note that χG ∈ L3 ∞ω and G′ |= χG iff G′ is isomorphic to G (as an ordered graph). Finally, for a class P of ordered graphs, we let ΦP ≡ G∈P χG. Clearly, this formula defines P. 11.2 Pebble Games In this section we present Ehrenfeucht-Fra¨ıss´e-style games which characterize finite variable logics. There are two elements of these games that we have not seen before. 
First, these are pebble games: the spoiler and the duplicator have a fixed set of pairs of pebbles, and each move consists of placing a pebble on an element of a structure, or removing a pebble and placing it on another element. Second, the game does not have to end in a finite number of rounds (but we can still determine who wins it). Definition 11.4 (Pebble games). Let A, B ∈ STRUCT[σ]. A k-pebble game over A and B is played by the spoiler and the duplicator as follows. The players have a set of pairs of pebbles {(p1 A, p1 B), . . . , (pk A, pk B)}. In each move, the following happens: • The spoiler chooses a structure, A or B, and a number 1 ≤ i ≤ k. For the description of the other moves, we assume the spoiler has chosen A. The other case, when the spoiler chooses B, is completely sym- metric. • The spoiler places the pebble pi A on some element of A. If pi A was already placed on A, this means that the spoiler either leaves it there or removes it and places it on some other element of A; if pi A was not used, it means that the spoiler picks that pebble and places it on an element of A. • The duplicator responds by placing pi B on some element of B. We denote the game that continues for n rounds by PGn k (A, B), and the game that continues forever by PG∞ k (A, B). After each round of the game, the pebbles placed on A and B define a relation F ⊆ A × B: if pi A, for some i ≤ k, is placed on a ∈ A and pi B is placed on b ∈ B, then the pair (a, b) is in F. The duplicator has a winning strategy in PGn k (A, B) if he can ensure that after each round j ≤ n, the relation F defines a partial isomorphism. That is, F is a graph of a partial isomorphism. In this case we write A ≡∞ω k,n B. The duplicator has a winning strategy in PG∞ k (A, B) if he can ensure that after every round the relation F defines a partial isomorphism. This is denoted by A ≡∞ω k B. 216 11 Finite Variable Logics L4L5 L5 L5 L5L4 L4 L4 (a) (b) (c) (d) Fig. 11.1. Spoiler winning the pebble game on L5 and L4 These games characterize finite variable logics as follows. Theorem 11.5. a) Two structures A, B ∈ STRUCT[σ] agree on all sentences of Lk ∞ω of quantifier rank up to n iff A ≡∞ω k,n B. b) Two structures A, B ∈ STRUCT[σ] agree on all sentences of Lk ∞ω iff A ≡∞ω k B. Before we prove this theorem, we give a few examples of pebble games. First, consider two arbitrary linear orderings Ln, Lm of lengths n and m, n = m. Here we show that it is the spoiler who wins PG∞ 2 (Ln, Lm). The strategy for L5 and L4 is shown in Fig. 11.1; the general strategy is exactly the same. We have two pairs of pebbles, and elements pebbled by pebble 1 are shown as circled, and those pebbled by pebble 2 are shown in dashed boxes. The spoiler starts by placing pebble 1 on the top element of L5; the duplicator is forced to respond by placing the matching pebble on the top element of L4. Then the spoiler places the second pebble on the second element of L5, and the duplicator matches it in L4 (if he does not, he loses in the next round). This is the configuration shown in Fig. 11.1 (a). Next, the spoiler removes pebble 1 from the top element of L5 and places it on the third element. The spoiler is forced to mimic the move in L4, to preserve the order relation. We are now in the position shown in Fig. 11.1 (b). The spoiler then moves the second pebble two levels down; the duplicator matches it. We are now in position (c). 
At this point the spoiler places pebble 1 on the last element of L5, and the duplicator has no place for the matching pebble, and thus he loses in the position shown in Fig. 11.1 (d). Note that we could not have expected any other result here, since we know that all queries over finite linear orderings are expressible in L2 ∞ω; hence, the duplicator should not be able to win PG∞ 2 (Ln, Lm) unless n = m. 11.2 Pebble Games 217 As another example, consider structures of the empty vocabulary: that is, just sets. We claim the following: if |A|, |B| ≥ k, then the duplicator wins PG∞ k (A, B); in other words, A ≡∞ω k B. Indeed, the strategy for the duplicator is very similar to his strategy in the Ehrenfeucht-Fra¨ıss´e game: at all times, he has to maintain the condition that pi A and pj A are placed on the same element iff pi B and pj B are placed on the same element. Since both sets have at least k elements, this condition is easily maintained, and the duplicator can win the infinite game. This gives us the following. Corollary 11.6. The query even is not expressible in Lω ∞ω. Proof. Assume, to the contrary, that even is expressible by a sentence Φ of Lω ∞ω. Let k be such that Φ ∈ Lk ∞ω. Choose two sets A and B of cardinalities k and k + 1, respectively. By the above, A ≡∞ω k B and hence A |= Φ iff B |= Φ. This, however, contradicts the assumption that Φ defines even. From Corollary 11.6, we derive a result mentioned, but not proved, in Chap. 10. Corollary 11.7. • LFP (LFP+<)inv. • IFP (IFP+<)inv. • PFP (PFP+<)inv. Proof. Since LFP, IFP, PFP ⊆ Lω ∞ω, none of them defines even; however, over ordered structures these logics capture Ptime and Pspace, and hence can define even. Before proving Theorem 11.5, we make two additional observations. First, consider an infinitary disjunction ϕ ≡ i∈I ϕi, where all ϕi are FO formulae, and assume that qr(ϕ) ≤ n. This means that qr(ϕi) ≤ n for all i ∈ I. We know that, up to logical equivalence, there are only finitely many different FO formulae of quantifier rank n. Hence, there is a finite subset I0 ⊂ I such that ϕ is equivalent to i∈I0 ϕi; that is, to an FO formula. Using this argument inductively on the structure of Lω ∞ω formulae, we conclude that for every k, every Lk ∞ω formula of quantifier rank n is equivalent to an FOk formula of the same quantifier rank. Hence, if A and B agree on all FOk sentences of quantifier rank at most n, then A ≡∞ω k,n B. Now assume that A and B agree on all FOk sentences. That is, for every n, we have A ≡∞ω k,n B. Since A and B are finite, so is the number of different maps from Ak to Bk , and hence every infinite strategy in PG∞ k (A, B) is completely determined by a finite strategy for sufficiently large n: the one in which all (finitely many) possible configurations of the game appeared. Thus, for sufficiently large n (that depends on A and B), winning PGn k (A, B) implies winning PG∞ k (A, B). We therefore obtain the following. Proposition 11.8. For every two structures A, B, the following are equiva- lent: 218 11 Finite Variable Logics 1. A and B agree on all FOk sentences, and 2. A and B agree on all Lk ∞ω sentences. The second observation is about formulae with free variables. We write (A, a) ≡∞ω k,n (B, b) (or (A, a) ≡∞ω k (B, b)), where | a | = | b | = m ≤ k, if the duplicator wins the game PGn k (A, B) (or PG∞ k (A, B)) from the position where the first m pebbles have been placed on the elements of a and b respectively. A slight modification of the proof of Theorem 11.5 shows the following. Corollary 11.9. 
Given two structures, A, B, and a ∈ Am , b ∈ Bm , m ≤ k, a) (A, a) ≡∞ω k,n (B, b) iff for every ϕ(x) ∈ Lk ∞ω with qr(ϕ) ≤ n, it is the case that A |= ϕ(a) ⇔ B |= ϕ(b). b) (A, a) ≡∞ω k (B, b) iff for every ϕ(x) ∈ Lk ∞ω, it is the case that A |= ϕ(a) ⇔ B |= ϕ(b). We are now ready to prove Theorem 11.5. As with the Ehrenfeucht-Fra¨ıss´e theorem, we shall use a certain back-and-forth property in the proof. We start with a few definitions. Given a partial map f : A → B, its domain and range will be denoted by dom(f) and rng(f); that is, f is defined on dom(f) ⊆ A, and f(dom(f)) = rng(f) ⊆ B. We let symbols α and β range over finite and infinite ordinals. Given two structures A and B and an ordinal β, let Iβ be a set of partial isomorphisms between A and B, and let Iα = {Iβ | β < α}. We say that Iα has the k-back-and-forth property if the following conditions hold: • Every set Iβ is nonempty. • Iβ′ ⊆ Iβ for β < β′ . • Each Iβ is downward-closed: if g ∈ Iβ and f ⊆ g (i.e., dom(f) ⊆ dom(g), and f and g coincide on dom(f)), then f ∈ Iβ. • If f ∈ Iβ+1 and |dom(f)| < k, then forth: for every a ∈ A, there is g ∈ Iβ such that f ⊆ g and a ∈ dom(g); back: for every b ∈ B, there is g ∈ Iβ such that f ⊆ g and b ∈ rng(g). As before, games are nothing but a reformulation of the back-and-forth property. Indeed, for a finite α, having a family Iα with the k-back-and-forth property is equivalent to A ≡∞ω k,α−1 B: the collection Iβ simply consists of configurations from which the duplicator wins with β moves remaining. This also suffices for infinitely long games: as we remarked earlier, for every two finite structures A and B, and for some n, depending on A and B, it is the case that A ≡∞ω k,n B implies A ≡∞ω k B. Furthermore, if we have a sufficiently 11.2 Pebble Games 219 long finite chain Iα, some Iβ’s will be repeated, as there are only finitely many partial isomorphisms between A and B. Hence, such a chain can then be extended to arbitrary ordinal length. Therefore, it will be sufficient to establish equivalence between indistinguishability in Lk ∞ω and the existence of a family of partial isomorphisms with the k-back-and-forth property. This is done in the following lemma. Lemma 11.10. Given two structures A and B, they agree on all sentences of Lk ∞ω of quantifier rank < α iff there is a family Iα = {Iβ | β < α} of partial isomorphisms between A and B with the k-back-and-forth property. In the rest of the section, we prove Lemma 11.10. Suppose A and B agree on all sentences of Lk ∞ω of quantifier rank < α. Let β < α. Define Iβ as the set of partial isomorphisms f with |dom(f)|≤ k such that for every ϕ ∈ Lk ∞ω with qr(ϕ) ≤ β, and every a contained in dom(f), A |= ϕ(a) ⇔ B |= ϕ(f(a)). We show that Iα = {Iβ | β < α} has the k-back-and-forth property. Since A and B agree on all sentences of Lk ∞ω of quantifier rank < α, each Iβ is nonempty as it contains the empty partial isomorphism. The containment Iβ′ ⊆ Iβ for β < β′ is immediate from the definition, as is downward-closure. Thus, it remains to prove the back-and-forth property. Assume, to the contrary, that we found f ∈ Iβ+1, with β+1 < α, such that |dom(f)| = m < k, and f violates the forth condition. That is, there exists a ∈ A such that there is no g ∈ Iβ extending f with a ∈ dom(g). In this case, by the definition of Iβ, for every b ∈ B we can find a formula ϕb(x0, x1, . . . , xm) of quantifier rank at most β such that for some a1, . . . , am ∈ dom(f), we have A |= ϕb(a, a1, . . . , am) and B |= ¬ϕb(b, f(a1), . . . , f(am)). Now let ϕ(x1, . . . 
, xm) ≡ ∃x0 b∈B ϕb(x0, x1, . . . , xm). Clearly, A |= ϕ(a1, . . . , am), but B |= ¬ϕ(f(a1), . . . , f(am)), which contradicts our assumption f ∈ Iβ+1 (since qr(ϕ) ≤ β + 1). The case when f violates the back condition is handled similarly. For the other direction, assume that we have a family Iα with the k-backand-forth property. We use (transfinite) induction on β to show that for every ϕ(x1, . . . , xm) ∈ Lk ∞ω, m ≤ k, with qr(ϕ) ≤ β < α, for every f ∈ Iβ, a1, . . . , am ∈ dom(f) : A |= ϕ(a1, . . . , am) ⇔ B |= ϕ(f(a1), . . . , f(am)). (11.6) Clearly, (11.6) suffices, since it implies that A and B agree on Lk ∞ω sentences of quantifier rank < α. 220 11 Finite Variable Logics The basis case is β = 0. Then ϕ is a Boolean combination of atomic formulae (for finite quantifier ranks, as we saw, infinitary connectives are superfluous), and hence (11.6) follows from the assumption that f is a partial isomorphism. We now use induction on the structure of ϕ. The case of Boolean combinations is trivial. If ϕ ≡ i ϕi and qr(ϕ) > qr(ϕi) for all i, then β is a limit ordinal and again (11.6) for ϕ easily follows by applying the hypothesis to all the ϕi’s of smaller quantifier rank. Thus, it remains to consider the case of ϕ(x1, . . . , xm) ≡ ∃x0 ψ(x0, . . . , xm), with qr(ϕ) = β + 1 and qr(ψ) = β for some β with β + 1 < α. We can assume without loss of generality that x0 is not among x1, . . . , xm (exercise: why?) and hence m < k. Let f ∈ Iβ+1 and a1, . . . , am ∈ dom(f). Assume that A |= ϕ(a1, . . . , am); that is, for some a0 ∈ A, A |= ψ(a0, a1, . . . , am). Since Iβ+1 is downwardclosed, we can further assume that dom(f) = {a1, . . . , am}. Since |dom(f)| = m < k, by the k-back-and-forth property we find g ∈ Iβ extending f such that a0 ∈ dom(g). Applying (11.6) inductively to ψ, we derive B |= ψ(g(a0), g(a1), . . . , g(am)). That is, B |= ψ(g(a0), f(a1), . . . , f(am)) since f and g agree on a1, . . . , am. Hence, B |= ϕ(f(a1), . . . , f(am)). The other direction, that B |= ϕ(f(a1), . . . , f(am)) implies A |= ϕ(a1, . . . , am), is completely symmetric. This finishes the proof of (11.6), Lemma 11.10, and Theorem 11.5. 11.3 Definability of Types For logics like FO and MSO, we have used rank-k types, which are collections of all formulae of quantifier rank k that hold in a given structure. An extremely useful feature of types is that they can be defined by formulae of quantifier rank k, and we have used this fact many times. When we move to finite variable logics, the role of parameter k is played by the number of variables rather than the quantifier rank. We can, therefore, define, FOk -types, but then it is not immediately clear if every such type is itself definable in FOk . In this section we prove that this is the case. As with the case of FO or MSO types, this definability result proves very useful, and we derive some interesting corollaries. In particular, we establish a normal form for Lk ∞ω, and prove that every class of finite structures that is closed under ≡∞ω k is definable in Lk ∞ω. Definition 11.11 (FOk -types). Given a structure A and a tuple a, the FOk -type of (A, a) is tpFOk (A, a) = {ϕ(x) ∈ FOk | A |= ϕ(a)}. An FOk -type is any set of formulae of FOk of the form tpFOk (A, a). 11.3 Definability of Types 221 One could have defined Lk ∞ω-types as well, as the set of all Lk ∞ω formulae that hold in (A, a). 
This, however, would be unnecessary, since every FOk type completely determines the Lk ∞ω-type: this follows from Proposition 11.8 stating that two structures agree on all Lk ∞ω formulae iff they agree on all FOk formulae. Note that unlike in the cases of FO and MSO, the number of different FOk -types need not be finite, since we do not restrict the quantifier rank. In fact we saw in the example of finite linear orderings that there are infinitely many different FO2 -types, since every finite cardinality of a linear ordering can be characterized by an FO2 sentence. Each FOk -type τ is trivially definable in Lk ∞ω by ϕ∈τ ϕ. More interestingly, we can show that FOk -types are definable without infinitary connectives. Theorem 11.12. For every FOk -type τ, there is an FOk formula ϕτ (x) such that, for every structure A, tpFOk (A, a) = τ ⇔ A |= ϕτ (a). Before we prove Theorem 11.12, let us state a few corollaries. First, restricting our attention to sentences, we obtain the following. Corollary 11.13. For every structure A, there is a sentence ΨA of FOk such that for any other structure B, we have B |= ΨA iff A ≡∞ω k B. We know that without restrictions on the number of variables, we can write a sentence that tests if B is isomorphic to A, and this is why the full infinitary logic defines every class of finite structures. Corollary 11.13 shows that, rather than testing isomorphism as in the full infinitary logic, in Lk ∞ω one can write a sentence that tests ≡∞ω k -equivalence. We can also see that closure under ≡∞ω k is sufficient for definability in Lk ∞ω. Corollary 11.14. If a class C of structures is closed under ≡∞ω k (i.e., A ∈ C and A ≡∞ω k B imply B ∈ C), then C is definable in Lk ∞ω. Proof. Let T be the collection of Lk ∞ω-types τ such that there is a structure A in C with tpFOk (A) = τ. From closure under ≡∞ω k it follows that τ∈T ϕτ defines C. Definability of Lk ∞ω-types also yields a normal form result, stating that only countable disjunctions of FOk formulae suffice. Corollary 11.15. Every Lk ∞ω formula is equivalent to a single countable disjunction of FOk formulae. 222 11 Finite Variable Logics Proof. Let ϕ(x) be an Lk ∞ω formula. Consider the set Cϕ = {(A, a) | A |= ϕ(a)}, such that no two elements of Cϕ are isomorphic (this ensures that Cϕ is countable, since there are only countably many isomorphism types of finite structures). Let ϕA,a(x) be the FOk formula defining tpFOk (A, a). Let ψ(x) ≡ (A,a)∈Cϕ ϕ(A,a)(x). We claim that ϕ and ψ are equivalent. Suppose B |= ϕ(b). Let (B′ , b′ ) be an isomorphic copy of (B, b) present in Cϕ. Then B′ |= ϕ(B′,b′)(b′ ) and thus B′ |= ψ(b′ ) and B |= ψ(b). Conversely, if B |= ψ(b), then for some A and a with A |= ϕ(a), we have tpFOk (A, a) = tpFOk (B, b); that is, (A, a) ≡∞ω k (B, b). Since ϕ is an Lk ∞ω formula, this implies B |= ϕ(b), showing that ϕ and ψ are equivalent. Since the negation of an Lk ∞ω formula is an Lk ∞ω formula, we obtain a dual result. Corollary 11.16. Every Lk ∞ω formula is equivalent to a single countable conjunction of FOk formulae. We now present the proof of Theorem 11.12. To keep the notation simple, we look at the case when there are no free variables; that is, we deal with tpFOk (A). Another assumption that we make is that the vocabulary σ is purely relational. Adding free variables and constant symbols poses no problem (Exercise 11.1). Fix a structure A, and let A≤k be the set of all tuples of elements of A of length up to k. For any a = (a1, . . . , al) ∈ A≤k , where l ≤ k, we define a formula ϕm a (x1, . . . 
, xl). Intuitively, these formulae will have the property that they precisely characterize what one can say about a in FOk , with quantifier rank at most m: that is, B |= ϕm a (b) iff (A, a) and (B, b) agree on all the FOk formulae of quantifier rank up to m. To define these formulae, consider partial functions h : {x1, . . . , xk} → A, and first define formulae ϕm h (y), with free variables y being those in dom(h), as follows: • ϕ0 h(y) is the conjunction of all atomic and negated atomic formulae true in A of h(y). • To define ϕm+1 h (y), consider two cases: 1. Suppose |dom(h)|< k. Let i be the least index such that xi ∈ dom(h), and ha be the extension of h defined on dom(h) ∪ {xi} such that ha(xi) = a. Then ϕm+1 h (y) ≡ ϕm h (y) ∧ a∈A ∃xi ϕm ha (y, xi) ∧ ∀xi a∈A ϕm ha (y, xi). 11.3 Definability of Types 223 2. Suppose |dom(h)| = k. Let hi be the restriction of h which is not defined only on xi. Then ϕm+1 h (x) ≡ ϕm h (x) ∧ k i=1 ϕm+1 hi (xi), where xi is x with the variable xi excluded. Finally, we define ϕm a (x1, . . . , xl) as ϕm h (x), where h is given by h(xi) = ai, for i = 1, . . . , l. To show that formulae ϕm a do what they are supposed to do, we show that if they hold, a certain sequence of sets of partial isomorphisms with the k-back-and-forth property must exist. Lemma 11.17. Let a = (a1, . . . , al) ∈ A≤k . Then B |= ϕm a (b) iff there exists a collection Im = {I0, I1, . . . , Im} of sets of partial isomorphism between A and B with the k-back-and-forth property such that Im ⊆ Im−1 ⊆ . . . ⊆ I0, and g = {(a1, b1), . . . , (al, bl)} ∈ Im. Proof of Lemma 11.17. Since qr(ϕm a ) = m and A |= ϕm a (a), the existence of Im implies, by Lemma 11.10, that B |= ϕm a (b). For the converse, we establish the existence of Im by induction on m. If m = 0, we let I0 consist of all the restrictions of g. Clearly, I0 is not empty, and since g is a partial isomorphism (because, by the assumption, B |= ϕ0 a(b), and thus a and b satisfy the same atomic formulae), all elements of I0 are partial isomorphisms. For the induction step, to go from m to m + 1, we distinguish two cases. Case 1: l < k. From B |= ϕm+1 a (b) and the definition of ϕm+1 a it follows that B |= ϕm a (b), and thus we have, by the induction hypothesis, a sequence I′ m = {I′ 0, . . . , I′ m} of partial isomorphisms with the k-back-and-forth property such that g ∈ I′ m. Looking at the second conjunct of ϕm+1 a and applying the induction hypothesis for m, we see that for every a ∈ A there exists b ∈ B and a sequence Ia m = {Ia 0 , . . . , Ia m} of partial isomorphisms with the k-back-and-forth property such that ga,b = {(a1, b1), . . . , (al, bl), (a, b)} ∈ Ia m. We now define: Ii = I′ i ∪ a∈A Ia i for i ≤ m Im+1 = {f | f ⊆ g}. It is easy to see that component-wise unions like this preserve the k-backand-forth property. Furthermore, since g ∈ I′ m, then Im+1 ⊆ I′ m ⊆ Im. Thus, we only have to check the k-back-and-forth property with respect to Im+1 and Im. But this is guaranteed by the second and the third conjunct of ϕm+1 a . 224 11 Finite Variable Logics Indeed, consider g and a ∈ A − dom(g). Since B |= ϕm+1 a (b), by the second conjunct we see that B |= ∃xϕm aa(b, x) and hence for some b ∈ B, we have B |= ϕm aa(bb). But then g ∪ {(a, b)} ∈ I′ m ⊆ Im. The back property is proved similarly. This completes the proof for case 1. Case 2: l = k. By the definition of ϕm+1 a for the case of l = k, we see that B |= ϕm a (b), and hence by the induction hypothesis, g is a partial isomorphism. 
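How such an LFP definition works is made precise in Proposition 11.19 below; as a preview, here is a sketch of the underlying computation for the case of pairs of k-tuples (assumed representation: atomic_type is a caller-supplied function returning the atomic type of a k-tuple, i.e., which relations of the structure and which equalities hold on it). One starts with the pairs of tuples of different atomic type, positions in which the spoiler has already won, and repeatedly adds pairs from which the spoiler can force a winning pair in one pebble move; the pairs that are never added are exactly the ≈FOk-equivalent ones.

```python
from itertools import product

def fo_k_equivalence(universe, k, atomic_type):
    """Return a predicate deciding whether two k-tuples have the same FO^k type,
    by computing the set of pairs from which the spoiler wins the k-pebble game."""
    tuples = list(product(universe, repeat=k))
    win = {(s, t) for s in tuples for t in tuples
           if atomic_type(s) != atomic_type(t)}          # spoiler wins immediately
    changed = True
    while changed:                                       # least fixed point
        changed = False
        for s, t in product(tuples, repeat=2):
            if (s, t) in win:
                continue
            # Spoiler re-places pebble i on one side; duplicator answers on the other.
            spoiler_wins = any(
                all((s[:i] + (a,) + s[i+1:], t[:i] + (b,) + t[i+1:]) in win
                    for b in universe)
                for i in range(k) for a in universe
            ) or any(
                all((s[:i] + (a,) + s[i+1:], t[:i] + (b,) + t[i+1:]) in win
                    for a in universe)
                for i in range(k) for b in universe
            )
            if spoiler_wins:
                win.add((s, t))
                changed = True
    return lambda s, t: (s, t) not in win

# Example: a directed 4-cycle with k = 2; atomic types record equality and edges.
E = {(0, 1), (1, 2), (2, 3), (3, 0)}
def atomic_type(t):
    return (t[0] == t[1], (t[0], t[1]) in E, (t[1], t[0]) in E)
equiv = fo_k_equivalence(range(4), 2, atomic_type)
print(equiv((0, 1), (1, 2)))   # True: a rotation of the cycle maps one pair to the other
```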
For each i ≤ k, let gi be g without the pair (ai, bi). Applying the argument for the case l < k to each gi, we get a sequence of partial isomorphisms {Ii 0, . . . , Ii m+1} with the k-back-and-forth property such that Ii m+1 ⊆ . . . ⊆ Ii 0. Now we define Ij = {g} ∪ k i=1 Ii j, j ≤ m + 1. One can easily verify all the properties of a sequence of partial isomorphisms with the k-back-and-forth property: in fact, all of the properties are preserved under component-wise union, and since |dom(g)| = k, the k-back-and-forth extension for g is not required. This completes the proof case 2 and Lemma 11.17. For each a ∈ A≤k , consider ϕm a (A) = {a0 | A |= ϕm a (a0)}. By definition, ϕm+1 a is of the form ϕm a ∧ . . ., and hence ϕ0 a(A) ⊇ ϕ1 a(A) ⊇ . . . ⊇ ϕm a (A) ⊇ ϕm+1 a (A) ⊇ . . . . Since A is finite, this sequence eventually stabilizes. Let ma be the number such that ϕma a (A) = ϕm a (A) for all m > ma. Then we define M = max a∈A≤k ma, and ΨA ≡ ϕM ǫ ∧ a∈A≤k ∀x1 . . . ∀xk ϕM a (x) → ϕM+1 a (x) . (11.7) Here ǫ stands for the empty sequence. By the definition of M, A |= ΨA. Furthermore, ΨA ∈ FOk . Thus, to conclude the proof, we show that ΨA defines tpFOk (A). In other words, we need the following. Lemma 11.18. If B is a finite structure, then B |= ΨA iff tpFOk (A) = tpFOk (B); that is, A ≡∞ω k B. Proof of Lemma 11.18. Since ΨA ∈ FOk and A |= ΨA, it suffices to show that A ≡∞ω k B whenever B |= ΨA. Let B |= ΨA. We define a set G of partial maps between A and B by {(a1, b1), . . . , (al, bl)} ∈ G ⇔ B |= ϕM+1 (a1,...,al)(b1, . . . , bl). 11.4 Ordering of Types 225 Since B |= ΨA, the sentence ϕM+1 ǫ is true in B, and thus G is nonempty, as the empty partial map is a member of G. Applying Lemma 11.17 to each g = {(a1, b1), . . . , (al, bl)} ∈ G, we see that there is a sequence Ig = {Ig 0 , . . . , Ig M+1} of partial isomorphisms with the k-back-and-forth property such that Ig 0 ⊇ . . . ⊇ Ig M+1 and g ∈ Ig M+1. We now define a family I = {Ii | i ∈ N} by Ii = g∈G Ig i for i ≤ M + 1 Ii = IM+1 for i > M + 1. It remains to show that I has the k-back-and-forth property. As we have seen in the proof of Lemma 11.17, the k-back-and-forth property is preserved through component-wise union, and since all Ii, i > M + 1, are identical, it suffices to prove that every partial isomorphism in IM+2 can be extended in IM+1. Fix f ∈ IM+2 such that |dom(f)| < k. We show the forth part; the back part is identical. Let a ∈ A. Since f ∈ IM+2, and the sequence {I0, . . . , IM+1} has the k-back-and-forth property, we can find f′ ∈ IM with f ⊆ f′ and a ∈ dom(f′ ). Let f′ = {(a1, b1), . . . , (al, bl)}. Since f′ is a partial isomorphism from IM , from Lemma 11.17 we conclude that B |= ϕM (a1,...,al)(b1, . . . , bl). Now from the implication in (11.7), we see that B |= ϕM+1 (a1,...,al)(b1, . . . , bl); therefore, f′ ∈ G. But then f′ ∈ If′ M+1 and hence f′ ∈ IM+1, which proves the forth part. Since the back part is symmetric, this concludes the proof of Lemma 11.18 and Theorem 11.12. 11.4 Ordering of Types In this section, we show that many interesting properties of types can be expressed in LFP. In particular, consider the following equivalence relation ≈FOk on tuples of elements of a structure A: a ≈FOk b ⇔ tpFOk (A, a) = tpFOk (A, b). Clearly this relation is definable by an Lk ∞ω formula ψ(x, y) ≡ τ ϕτ (x) ∧ ϕτ (y) , where τ ranges over all FOk -types. It is more interesting, however, that this relation is definable in a weaker logic LFP. 
Furthermore, it turns out that there is an LFP formula that defines a certain preorder $\prec_{\mathrm{FO}^k}$ on tuples such that the equivalence relation induced by this preorder is precisely $\approx_{\mathrm{FO}^k}$. This means that on structures in which all elements have different $\mathrm{FO}^k$-types, we can define a linear order in LFP, and hence, by the Immerman-Vardi theorem, on such structures LFP captures Ptime.
We start by showing how to define $\approx_{\mathrm{FO}^k}$.
Proposition 11.19. Fix a vocabulary $\sigma$. For every $k$ and every $l \le k$, there is an LFP formula $\eta(\vec x, \vec y)$ in $2l$ free variables such that for every $A \in \mathrm{STRUCT}[\sigma]$,
$$A \models \eta(\vec a, \vec b) \;\Leftrightarrow\; \vec a \approx_{\mathrm{FO}^k} \vec b.$$
Proof. The atomic $\mathrm{FO}^k$-type of $(A, \vec a)$, with $|\vec a| = l \le k$, is the conjunction of all atomic and negated atomic formulae true of $\vec a$ in $A$. Since there are only finitely many atomic $\mathrm{FO}^k$-formulae, there are, up to logical equivalence, finitely many atomic types, and each of them is definable by an $\mathrm{FO}^k$ formula. Let $\alpha_1(\vec x), \ldots, \alpha_s(\vec x)$ list all such formulae. Then we define
$$\psi_0(\vec x, \vec y) \;\equiv\; \bigvee_{i,j \le s,\; i \ne j} \big(\alpha_i(\vec x) \wedge \alpha_j(\vec y)\big).$$
This is a formula of quantifier rank 0, and $A \models \psi_0(\vec a, \vec b)$ iff the atomic $\mathrm{FO}^k$-types of $\vec a$ and $\vec b$ are different. Next, we define a formula $\psi$ in the vocabulary $\sigma$ expanded with a $2l$-ary relation $R$:
$$\psi(R, \vec x, \vec y) \;\equiv\; \psi_0(\vec x, \vec y) \;\vee\; \bigvee_{i=1}^{l} \exists x_i \forall y_i\, R(\vec x, \vec y) \;\vee\; \bigvee_{i=1}^{l} \exists y_i \forall x_i\, R(\vec x, \vec y), \qquad (11.8)$$
and let $\varphi(\vec x, \vec y) \equiv [\mathbf{lfp}_{R,\vec x,\vec y}\, \psi(R, \vec x, \vec y)](\vec x, \vec y)$.
Consider the fixed point computation for $\psi$. Initially, we have the tuples $(\vec a, \vec b)$ with different atomic types, that is, the tuples corresponding to positions of the pebble game in which the spoiler has already won. At the next stage, we get all the positions $(\vec a, \vec b)$ of the pebble game such that, in one move, the spoiler can force a winning position. In general, the $i$th stage consists of the positions from which the spoiler can win the pebble game in $i-1$ moves, and hence $A \models \varphi(\vec a, \vec b)$ iff from the position $(\vec a, \vec b)$ the spoiler can win the game. In other words, $A \models \varphi(\vec a, \vec b)$ iff $(A, \vec a) \not\equiv^{\infty\omega}_{k} (A, \vec b)$ or, equivalently, $\mathrm{tp}_{\mathrm{FO}^k}(A, \vec a) \ne \mathrm{tp}_{\mathrm{FO}^k}(A, \vec b)$. Hence, $\eta$ can be defined as $\neg\varphi$, which is an LFP formula.
We now extend this technique to define a preorder $\prec_{\mathrm{FO}^k}$ on tuples whose associated equivalence relation is precisely $\approx_{\mathrm{FO}^k}$. Suppose we have a set $X$ partitioned into subsets $X_1, \ldots, X_m$. Consider the binary relation $\prec$ on $X$ given by
$$x \prec y \;\Leftrightarrow\; x \in X_i,\; y \in X_j,\; \text{and } i < j.$$
We call relations obtained in such a way strict preorders. With each strict preorder $\prec$ we associate the equivalence relation whose equivalence classes are precisely $X_1, \ldots, X_m$; it can be defined by the formula $\neg(x \prec y) \wedge \neg(y \prec x)$.
Theorem 11.20. For every vocabulary $\sigma$ and every $k$, there exists an LFP formula $\chi(\vec x, \vec y)$, with $|\vec x| = |\vec y| = k$, such that on every $A \in \mathrm{STRUCT}[\sigma]$, the formula $\chi$ defines a strict preorder $\prec_{\mathrm{FO}^k}$ whose equivalence relation is $\approx_{\mathrm{FO}^k}$.
As we mentioned before, this result becomes useful when one deals with structures $A$ such that $\mathrm{tp}_{\mathrm{FO}^k}(a) \ne \mathrm{tp}_{\mathrm{FO}^k}(b)$ whenever $a, b \in A$ and $a \ne b$. Such structures are called $k$-rigid. Theorem 11.20 tells us that in a $k$-rigid structure there is an LFP-definable strict preorder whose equivalence classes are of size 1: that is, a linear order. Hence, from the Immerman-Vardi theorem we obtain:
Corollary 11.21. Over $k$-rigid structures, LFP captures Ptime.
Now we prove Theorem 11.20. We shall use the following notation. If $\vec a = (a_1, \ldots, a_k)$ is a tuple, then $\vec a_{i \leftarrow a}$ is the tuple in which $a_i$ is replaced by $a$, i.e., $(a_1, \ldots, a_{i-1}, a, a_{i+1}, \ldots, a_k)$. Recall the formula $\psi(\vec x, \vec y)$ of (11.8).
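The fixed point computation for $\psi$ described in the proof of Proposition 11.19 is easy to carry out explicitly on a small structure. The following sketch (Python; graphs again encoded as a dictionary with a 'universe' list and an edge set 'E', a convention chosen only for this illustration) iterates the operator of (11.8) for $l = k$: it starts from the pairs of $k$-tuples with distinct atomic types and repeatedly adds the pairs from which the spoiler can force an already-marked pair in one move, until nothing changes; the complement of the resulting relation is $\approx_{\mathrm{FO}^k}$, and its equivalence classes are returned.

from itertools import product

def replace_at(t, i, x):
    return t[:i] + (x,) + t[i + 1:]

def fo_k_classes(A, k):
    U, E = A['universe'], A['E']
    tuples = list(product(U, repeat=k))
    def atp(t):   # the atomic type of a k-tuple
        return (tuple(t[i] == t[j] for i in range(k) for j in range(k)),
                tuple((t[i], t[j]) in E for i in range(k) for j in range(k)))
    # stage 1 of the fixed point: psi_0, i.e., pairs with different atomic types
    R = {(s, t) for s in tuples for t in tuples if atp(s) != atp(t)}
    changed = True
    while changed:
        changed = False
        for s in tuples:
            for t in tuples:
                if (s, t) in R:
                    continue
                for i in range(k):
                    # the spoiler re-places pebble i in one of the two tuples so
                    # that every duplicator answer is already a winning position
                    if any(all((replace_at(s, i, a), replace_at(t, i, b)) in R for b in U) for a in U) or \
                       any(all((replace_at(s, i, a), replace_at(t, i, b)) in R for a in U) for b in U):
                        R.add((s, t))   # adding within a sweep only speeds convergence;
                        changed = True  # the least fixed point is the same
                        break
    # the complement of R is an equivalence relation; group the tuples into classes
    classes = []
    for t in tuples:
        for c in classes:
            if (t, c[0]) not in R:
                c.append(t)
                break
        else:
            classes.append([t])
    return classes

# The undirected 4-cycle with k = 2: the classes are the constant pairs, the
# adjacent pairs, and the non-adjacent pairs of distinct vertices.
C4 = {'universe': [0, 1, 2, 3],
      'E': {(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2), (3, 0), (0, 3)}}
print(len(fo_k_classes(C4, 2)))    # 3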
The fixed point of this formula defined the complement of ≈FOk , and it follows from the proof of Proposition 11.19 that the jth stage of the fixed point computation for ψ, ψj (x, y), defines the set of positions from which the spoiler wins with j − 1 moves remaining. In other words, A |= ψj (a, a) iff (A, a) and (B, b) disagree on some FOk formula of quantifier rank up to j − 1. We now use this formula ψ to define a formula γ(S, x, y) such that the jth stage of the inflationary fixed point computation for γ defines a strict preorder whose equivalence relation is the complement of the relation defined by ψj (x, y). In other words, γj (A) defines a relation ≺j on Ak such that the equivalence relation ∼j associated with this preorder is a ∼j b ⇔ (A, a) ≡∞ω k,j−1 (A, b). We now explain the idea of the construction. In the beginning, we have to deal with atomic FOk -types. Since these can be explicitly defined (see the proof of Proposition 11.19), we can choose an arbitrary ordering on them. Now, suppose we have defined ≺j, the jth stage of the fixed point computation for γ, whose equivalence relation is the set of positions from which the duplicator can play for j − 1 moves (i.e., the complement of the jth stage of ψ). Let Y1, . . . , Ys be the equivalence classes. We have to refine ≺j to come up with a preorder ≺j+1. For that, we have to order tuples (a, b) which were equivalent at the jth stage, but become nonequivalent at stage j + 1. But these are precisely the tuples that get into the fixed point of ψ at stage j + 1. Looking at the definition of ψ (11.8), we see that there are two ways for ψj+1 (a, b) to be true (i.e., for (a, b) to get into the fixed point at stage j + 1): 1. There is a ∈ A such that ϕj (ai←a, bi←b) holds for every b ∈ A. In other words, the equivalence class of ai←a contains no tuple of the form bi←b which is different from b. 228 11 Finite Variable Logics 2. Symmetrically, there is b ∈ A such that the equivalence class of bi←b contains no tuple of the form ai←a = a. Assume that i′ is the minimum number ≤ k such that either 1 or 2 above, or both, happen. Let Y be the set of all the tuples ai′←a for case 1 and bi′←b for case 2. We then consider the smallest, with respect to ≺j, equivalence class Yp’s into which elements of Y may fall. Note that it is impossible that for some a, b, both ai′←a and bi′←b are in Yp. Hence, either 1′ . for some a, ai′←a is in Yp, or 2′ . for some b, bi′←b is in Yp. In case 1′ , we let a ≺j+1 b, and in case 2′ , we let b ≺j+1 a. This is the algorithm; it remains to express it in LFP. The formula χ(x, y) will be defined as [ifpS,x,yγ(S, x, y)](x, y). To express γ, we first deal with the atomic case. Since we have an explicit listing α1, . . . , αs of formulae defining atomic types, we can use γ0(x, y) ≡ i 0, and a purely relational vocabulary σ = {R1, . . . , Rl} such that the arity of each Ri is at most k (since we shall be dealing with FOk formulae, we can impose this additional restriction without loss of generality). We shall use the preorder relation ≺FOk defined in the previous section; its equivalence relation is a ≈FOk b given by tpFOk (A, a) = tpFOk (A, b), for a, b ∈ Ak . Whenever k and A are clear from the context, we shall write [a] for the ≈FOk -equivalence class of a. Definition 11.22. Given a vocabulary σ = {R1, . . . , Rl}, where the arities of all the Ri’s do not exceed k, and a σ-structure A, we define a new vocabulary ck(σ) and a structure Ck(A) ∈ STRUCT[ck(σ)] as follows. Let t = kk , and let π1, . . . 
, πt enumerate all the functions π : {1, . . ., k} → {1, . . . , k}. Then ck(σ) = {<, U, U1, . . . , Ul, S1, . . . , Sk, P1, . . . , Pt}, where <, the Si’s, and the Pj’s are binary, and U, U1, . . . , Ul are unary. The universe of Ck(A) is Ak / ≈FOk , the set of ≈FOk -equivalence classes of k-tuples from A. The interpretation of the predicates is as follows (where a stands for (a1, . . . , ak)): • < is interpreted as ≺FOk . • U([a]) holds iff a1 = a2. • Ui([a]) holds iff (a1, . . . , am) ∈ RA i , where m ≤ k is the arity of Ri. • Si([a], [b]) holds iff a and b differ at most in their ith component. • Pπ contains pairs ([a], [(aπ(1), . . . , aπ(k))]) for all a ∈ Ak . Lemma 11.23. The structure Ck(A) is well-defined, and < is interpreted as a linear ordering on its universe. 230 11 Finite Variable Logics Proof. Suppose U([a]) holds and b ∈ [a]. Then a1 = a2, and since tpFOk (a) = tpFOk (b), we have b1 = b2. Since other predicates of Ck(A) are defined in terms of atomic formulae over A, they are likewise independent of particular representatives of the equivalence classes. Finally, Theorem 11.20 implies that < is a linear ordering on Ak / ≈FOk . The structure Ck(A) can be viewed as a canonical structure in terms of Lk ∞ω-definability. Proposition 11.24. For every A, B ∈ STRUCT[σ], A ≡∞ω k B ⇔ Ck(A) ∼= Ck(B). Proof sketch. Suppose A ≡∞ω k B. Since every FOk -type is definable by an FOk formula, every type that is realized in A is realized in B. Hence, | A |=| B |. Furthermore, since ≺FOk is definable by the same formula on all σ-structures, we have an order-preserving map h : Ak / ≈FOk → Bk / ≈FOk . It is easy to verify that such a map is an isomorphism between Ck(A) and Ck(B). For the converse, one can use the isomorphism h : Ck(A) → Ck(B) together with relations Si to establish a winning strategy for the duplicator in the kpebble game. Details are left as an easy exercise for the reader. We next show how to translate formulae of LFP and PFP over Ck(A) to formulae over A, and vice versa. We assume, as throughout most of Chap. 10, that fixed point formulae do not have parameters. Lemma 11.25. 1. For every LFP or PFP formula ϕ(x) over vocabulary σ that uses at most k variables, there is an LFP (respectively, PFP) formula ϕ◦ over vocabulary ck(σ) in one free variable such that A |= ϕ(a) ⇔ Ck(A) |= ϕ◦ ([a]). (11.9) 2. For every LFP or PFP formula ϕ(x1, . . . , xm) in the language of ck(σ), there is an LFP (respectively, PFP) formula ϕ∗ (y) over vocabulary σ in km free variables such that Ck(A) |= ϕ([a1], . . . , [am]) ⇔ A |= ϕ∗ (a1, . . . , am). Before proving Lemma 11.25, we present its main application. Theorem 11.26 (Abiteboul-Vianu). Ptime = Pspace iff LFP = PFP. Proof. Suppose Ptime = Pspace. Let ϕ be a PFP formula, and let it use k variables. By Lemma 11.25 (1), we have a PFP formula ϕ◦ over ck(σ). Since ϕ◦ is in PFP, it is computable in Pspace, and thus, by the assumption, in Ptime. Since ϕ◦ is defined over ordered structures of the vocabulary ck(σ), by the Immerman-Vardi theorem it is definable in LFP over ck(σ), by a formula ψ(x). Now applying Lemma 11.25 (2), we get an LFP formula ψ∗ (x) over vocabulary σ which is equivalent to ϕ. Hence, LFP = PFP. For the other direction, if LFP = PFP, then LFP+ < = PFP+ <, and hence Ptime = Pspace. 11.5 Canonical Structures and the Abiteboul-Vianu Theorem 231 Corollary 11.27. The following are equivalent: • LFP = PFP; • LFP+< = PFP+<; • Ptime = Pspace. 
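Exercise 11.15 below asks to show that $C^k(A)$ can be constructed in polynomial time, and the construction of Definition 11.22 is indeed entirely mechanical once the classes of $k$-tuples and their order are available. Here is a small sketch for graphs (Python; the dictionary encoding, and the fact that the classifier cls and the ordered list of classes are passed in as parameters, are choices made only for this illustration: in general they must come from $\approx_{\mathrm{FO}^k}$ and the preorder of Theorem 11.20).

from itertools import product

def canonical_structure(A, k, cls, class_order):
    # C^k(A) for sigma = {E}: cls maps a k-tuple to its class, and class_order
    # lists the classes from least to greatest
    U, E = A['universe'], A['E']
    tuples = list(product(U, repeat=k))
    pos = {c: i for i, c in enumerate(class_order)}
    return {
        'universe': list(class_order),
        '<':  {(c, d) for c in class_order for d in class_order if pos[c] < pos[d]},
        'U':  {cls(t) for t in tuples if t[0] == t[1]},
        'U1': {cls(t) for t in tuples if (t[0], t[1]) in E},            # R_1 = E is binary
        'S':  [{(cls(s), cls(t)) for s in tuples for t in tuples
                if all(s[j] == t[j] for j in range(k) if j != i)}        # differ at most in i
               for i in range(k)],
        'P':  {pi: {(cls(t), cls(tuple(t[j] for j in pi))) for t in tuples}
               for pi in product(range(k), repeat=k)},                   # all maps {1,...,k} -> {1,...,k}
    }

# For the 4-cycle and k = 2, the atomic type of a pair already determines its
# FO^2-type, so it can serve as the classifier in this small demo.
C4 = {'universe': [0, 1, 2, 3],
      'E': {(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2), (3, 0), (0, 3)}}
def atp(t):
    return 'eq' if t[0] == t[1] else ('edge' if (t[0], t[1]) in C4['E'] else 'far')
CkA = canonical_structure(C4, 2, atp, ['eq', 'edge', 'far'])
print(CkA['universe'], len(CkA['S'][0]))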
Notice that this picture differs drastically from what we have seen for logics capturing DLog, NLog, and Ptime: while the exact relationships between DetTrCl+ < = DLog, TrCl+ < = NLog, and LFP+ < = Ptime are not known, we do know that DetTrCl TrCl LFP. However, for the case of LFP and PFP, we cannot even conclude LFP PFP without resolving the Ptime vs. Pspace question. We now prove Lemma 11.25. As the first step, we prove part 1 for the case of ϕ being an FOk formula. Note that in general, x may have fewer than k variables. However, in this proof we shall treat any such formula as defining a k-ary relation; that is, ϕ(xj1 , . . . , xjs ) defines the relation ϕ(A) = {(a1, . . . , ak) | A |= ϕ(aj1 , . . . , ajs )}, and when we write A |= ϕ(a), we actually mean that a ∈ Ak and a ∈ ϕ(A). Using this convention, we define ϕ◦ by induction on the structure of the formula: • If ϕ is xi = xj, then choose π so that π(1) = i, π(2) = j, and let ϕ◦ (x) ≡ ∃y Pπ(x, y) ∧ U(y) . • If ϕ is an atomic formula of the form Ri(xj1 , . . . , xjs ), choose π so that π(1) = j1, . . . , π(s) = js, and let ϕ◦ (x) ≡ ∃y Pπ(x, y) ∧ Ui(y) . • (¬ϕ)◦ ≡ ¬ϕ◦ . • (ϕ1 ∨ ϕ2)◦ ≡ ϕ◦ 1 ∨ ϕ◦ 2. • If ϕ is ∃xiψ(x), then ϕ◦ (x) ≡ ∃y Si(x, y) ∧ ψ◦ (y) . It is routine to verify, by induction on formulae, that the above translation guarantees (11.9). For example, if ϕ is xi = xj, then A |= ϕ(a) implies that ai = aj, and hence Ck(A) |= Pπ([a], [b]) for π(i) = 1, π(j) = 2, and b = (ai, aj, . . .). Since Ck(A) |= U([b]), we conclude that Ck(A) |= ϕ◦ ([a]). Conversely, if Ck(A) |= Pπ([a], [b]) ∧ U([b]) for π as above and some b, we conclude that there is c ∈ [a] with ci = cj. Since tpFOk (a) = tpFOk (c), it follows that ai = aj and A |= ϕ(a). The other basis case is similar. For the induction step, the only nontrivial case is that of ϕ being ∃xiψ(x). If A |= ϕ(a), then for some ai that differs from a in at most the ith position we have A |= ψ(ai), and hence by the induction hypothesis, Ck(A) |= Si([a], [ai])∧ ψ◦ ([ai]) and, therefore, Ck(A) |= ϕ◦ ([a]). Conversely, assume that for some b, 232 11 Finite Variable Logics Ck(A) |= Si([a], [b])∧ψ◦ ([b]). Then we can find a0 ≈FOk a and b0 ≈FOk b such that a0 and b0 differ in at most the ith position. Consider the k-pebble game on (A, a0) and (A, a). Suppose that in one move the spoiler goes from (A, a0) to (A, b0). Since the duplicator can play from position (a0, a), he can respond to this move and find b′ such that (A, b0) ≡∞ω k (A, b′ ). Hence, b′ ∈ [b], and it differs from a in at most the ith position. Since [b′ ] = [b], by the induction hypothesis we conclude that A |= ψ(b′ ), which witnesses A |= ϕ(a). This concludes the proof of (11.9) for FOk formulae. Furthermore, (11.9) is preserved if we expand the vocabulary by an extra relation symbol R, with a corresponding R′ added to ck(σ), and interpret R as a relation closed under ≡∞ω k . Since we know that all the stages of lfp and pfp operators define such relations (see Exercise 11.6), we conclude that (11.9) holds for LFP and PFP formulae. The proof of part 2 of Lemma 11.25 is by straightforward induction on the formulae, using the fact that ≺FOk is definable in LFP (Theorem 11.20). Details are left to the reader as an exercise. 11.6 Bibliographic Notes Infinitary logics have been studied extensively in model theory, see, e.g., Barwise and Feferman [18]. The finite variable logic was introduced by Barwise [17], who also defined the notion of a family of partial isomorphisms with the k-back-and-forth property. 
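The translation $\varphi \mapsto \varphi^{\circ}$ in the FO$^k$ case is purely syntactic, and it may help to see it spelled out on formula syntax trees. In the sketch below (Python), formulae over $\sigma = \{E\}$ are nested tuples ('eq', i, j), ('E', i, j), ('not', f), ('or', f, g) and ('exists', i, f) with variable indices $0, \ldots, k-1$; the output uses symbolic nodes for the predicates $U$, $U_1$, $S_i$ and $P_\pi$ of $c_k(\sigma)$, and the bookkeeping of the single free variable and the freshly quantified one is left implicit. All of these conventions are chosen for the illustration only.

def translate(phi, k):
    # phi -> phi° for FO^k formulae over graphs, following the clauses in the text
    kind = phi[0]
    if kind == 'eq':                       # x_i = x_j  ~>  exists y (P_pi(x, y) and U(y))
        i, j = phi[1], phi[2]
        pi = tuple([i, j] + list(range(k)))[:k]      # any pi with pi(0) = i, pi(1) = j
        return ('exists_y', ('and', ('P', pi), ('U',)))
    if kind == 'E':                        # E(x_i, x_j)  ~>  exists y (P_pi(x, y) and U1(y))
        i, j = phi[1], phi[2]
        pi = tuple([i, j] + list(range(k)))[:k]
        return ('exists_y', ('and', ('P', pi), ('U1',)))
    if kind == 'not':
        return ('not', translate(phi[1], k))
    if kind == 'or':
        return ('or', translate(phi[1], k), translate(phi[2], k))
    if kind == 'exists':                   # exists x_i psi  ~>  exists y (S_i(x, y) and psi°(y))
        return ('exists_y', ('and', ('S', phi[1]), translate(phi[2], k)))
    raise ValueError(kind)

# with k = 2, the formula  exists x_1 E(x_0, x_1)  becomes
print(translate(('exists', 1, ('E', 0, 1)), 2))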
Pebble games were introduced by Immerman [128] and Poizat [200]. Kolaitis and Vardi [152, 153] studied many aspects of finite variable logics; in particular, they showed that it subsumes fixed point logics, and proved normal forms for Lk ∞ω. A systematic study of finite variable logics was undertaken by Dawar, Lindell, and Weinstein [53], and our presentation here is based on that paper. In particular, definability of FOk -types in FOk is from [53], as well as the definition of a linear ordering on FOk -types. Theorem 11.26 is due to Abiteboul and Vianu [6], but the presentation here is based on the model-theoretic approach of [53] rather than the more computational approach of [6]. The approach of [6] is based on relational complexity. Relational complexity classes are defined using machines that compute directly on structures rather than on their encodings as strings. Abiteboul and Vianu [6] and Abiteboul, Vardi, and Vianu [4] establish a tight connection between fixed point logics and relational complexity classes, and show that questions about containments among standard complexity classes can be translated to questions about containments among relational complexity classes. Otto’s book [191] is a good source for information on finite variable logics over finite models. 11.7 Exercises 233 Sources for exercises: Exercises 11.6 and 11.7: Dawar, Lindell, and Weinstein [53] Exercises 11.8 and 11.9: Dawar [49] Exercise 11.10: de Rougemont [56] Exercise 11.11: Dawar, Lindell, and Weinstein [53] Exercise 11.12: Lindell [171] Exercise 11.13: Grohe [108] Exercises 11.14 and 11.15: Dawar, Lindell, and Weinstein [53] Exercises 11.16 and 11.17: Kolaitis and Vardi [154] Exercise 11.18: Grohe [110] Exercise 11.19: McColm [181] Kolaitis and Vardi [153] 11.7 Exercises Exercise 11.1. Extend the proof of Theorem 11.12 to handle free variables, and constants in the vocabulary. Exercise 11.2. Fill in the details at the end of the proof of Theorem 11.20. Exercise 11.3. Complete the proof of Proposition 11.24. Exercise 11.4. Complete the proof of Lemma 11.25, part 2. Exercise 11.5. Prove that the FOk hierarchy is strict: there are properties expressible in FOk+1 which are not expressible in FOk . Exercise 11.6. The goal of this exercise is to find a tight (as far as the number of variables is concerned) embedding of fixed point logics into Lω ∞ω. Let LFPk , IFPk , and PFPk stand for restrictions of LFP, IFP, and PFP to formulae that use at most k distinct variables (we assume that fixed point formulae have no parameters). Prove that LFPk , IFPk , PFPk ⊆ Lk ∞ω. Hint: Let ϕ(R, x) be a formula, and let ϕi (x) define the ith stage of a fixed point computation. Show by induction on i that the query defined by ϕi is closed under ≡∞ω k , and use Corollary 11.14. Exercise 11.7. Prove that if A and B agree on all FOk sentences of quantifier rank up to nk + k + 1 and |A|≤ n, then A ≡∞ω k B. Exercise 11.8. Consider the complete bipartite graph Kn,m. Show that Kk,k ≡∞ω k Kk,k+1 for every k. Also show that Kn,m is Hamiltonian iff n = m. Conclude that Hamiltonicity is not Lω ∞ω-definable. Exercise 11.9. Prove that 3-colorability is not Lω ∞ω-definable. Exercise 11.10. Let In be a graph with n isolated vertices and Cm an undirected cycle of length m. For two graphs G1 = V1, E1 and G2 = V2, E2 with V1 and V2 disjoint, let G1 × G2 be the graph whose nodes are V1 ∪ V2, and the edges include E1, E2, as well as all the edges (v1, v2) for v1 ∈ V1, v2 ∈ V2. 
Prove that for a graph of the form In × Cm, it is impossible to test, in Lω ∞ω, if n = m. Use this result to give another proof (cf. Exercise 11.8) that Hamiltonicity is not Lω ∞ω-definable. 234 11 Finite Variable Logics Exercise 11.11. A binary tree is balanced if all the leaves are at the same distance from the root. Prove that L4 ∞ω defines a Boolean query Q on graphs such that if Q(G) is true, then G is a balanced binary tree. Exercise 11.12. Prove that there is a Ptime query on balanced binary trees which is not LFP-definable. Conclude that LFP Lω ∞ω ∩ Ptime. Exercise 11.13. Prove that the following problems are Ptime-complete for each fixed k. • Given two σ-structures A and B, is it the case that A ≡∞ω k B? • Given a σ-structure A and a, b ∈ Ak , are tpFOk (A, a) and tpFOk (A, b) the same? Exercise 11.14. Prove that if A is a finite rigid structure (i.e., a structure that has no nontrivial automorphisms), then there is a number k such that A is k-rigid. Exercise 11.15. Prove that the structure Ck(A) can be constructed in polynomial time. Exercise 11.16. Define ∃Lk ∞ω as the fragment of Lk ∞ω that contains all atomic formulae and is closed under infinitary conjunctions and disjunctions, and existential quantification. Let ∃Lω ∞ω = [ k ∃Lk ∞ω. Prove that Datalog ⊆ ∃Lω ∞ω. Exercise 11.17. Consider the following modification of the k-pebble game. For two structures A and B, the spoiler always plays in A and the duplicator always responds in B. The spoiler wins if at some point, the position (a, b) does not define a partial homomorphism (as opposed to a partial isomorphism in the standard game). The duplicator wins (which is denoted by A ⊳∞ω k B) if the spoiler does not win; that is, if after each round the position defines a partial homomorphism. Prove that the following are equivalent: • A ⊳∞ω k B. • If Φ ∈ ∃Lk ∞ω and A |= Φ, then B |= Φ. Exercise 11.18. By an FOk theory we mean a maximally consistent set of FOk sentences. Define the k-size of an FOk theory T as the number of different FOk types realized by finite models of T. Prove that there is no recursive bound on the size of the smallest model of an FOk theory in terms of its k-size. That is, for every k there is a vocabulary σk such that is no recursive function f with the property that every FOk theory T in vocabulary σk has a model of size at most f(n), where n is the k-size of T. Exercise 11.19. Let C be a class of σ-structures. We call it bounded if for every relation symbol R ∈ σ, there exists a number n such that every FO formula ϕ(R, x) positive in R reaches its least fixed point on any structure in C in at most n iterations. Prove that the following are equivalent: • C is bounded; • Lω ∞ω collapses to FO on C. Exercise 11.20.∗ Is the FOk hierarchy strict over ordered structures? That is, are there properties which, over ordered structures, are definable in FOk+1 but not in FOk , for arbitrary k? 12 Zero-One Laws In this chapter we show that properties expressible in many logics are almost surely true or almost surely false; that is, either they hold for almost all structures, or they fail for almost all structures. This phenomenon is known as the zero-one law. We prove it for FO, fixed point logics, and Lω ∞ω. We shall also see that the “almost everywhere” behavior of logics is drastically different from their “everywhere” behavior. For example, while satisfiability in the finite is undecidable, it is decidable if a sentence is true in almost all finite models. 
12.1 Asymptotic Probabilities and Zero-One Laws
To talk about asymptotic probabilities of properties of finite models, we adopt the convention that the universe of a structure $A$ with $|A| = n$ is $\{0, \ldots, n-1\}$.
Let us start by considering the case of undirected graphs. By $\mathrm{Gr}_n$ we denote the set of all graphs with the universe $\{0, \ldots, n-1\}$. The number of undirected graphs on $\{0, \ldots, n-1\}$ is $|\mathrm{Gr}_n| = 2^{\binom{n}{2}}$. Let $P$ be a property of graphs. We define
$$\mu_n(P) \;=\; \frac{|\{G \in \mathrm{Gr}_n \mid G \text{ has } P\}|}{|\mathrm{Gr}_n|}.$$
That is, $\mu_n(P)$ is the probability that a randomly chosen graph on the set of nodes $\{0, \ldots, n-1\}$ has $P$. Randomly here means with respect to the uniform distribution: each graph is equally likely to be chosen. We then define the asymptotic probability of $P$ as
$$\mu(P) \;=\; \lim_{n \to \infty} \mu_n(P), \qquad (12.1)$$
if the limit exists. If $P$ is expressed by a sentence $\Phi$ of some logic, then we refer to $\mu_n(\Phi)$ and $\mu(\Phi)$.
In general, we can deal with arbitrary $\sigma$-structures. In that case, we define $s^n_\sigma$ as the number of different $\sigma$-structures with the universe $\{0, \ldots, n-1\}$, and $s^n_\sigma(P)$ as the number of different $\sigma$-structures with the universe $\{0, \ldots, n-1\}$ that have the property $P$, and let $\mu_n(P) = s^n_\sigma(P)/s^n_\sigma$. Then the asymptotic probability $\mu(P)$ is again defined by (12.1).
We now consider a few examples.
• Let $P$ be the property "there are no isolated nodes". We claim that $\mu(P) = 1$. For that, we show that $\mu(\bar P) = 0$, where $\bar P$ is "there is an isolated node". To calculate $\mu_n(\bar P)$, note that there are $n$ ways to choose an isolated node, and $2^{\binom{n-1}{2}}$ ways to put edges on the remaining nodes. Hence
$$\mu_n(\bar P) \;\le\; \frac{n \cdot 2^{\binom{n-1}{2}}}{2^{\binom{n}{2}}} \;=\; \frac{n}{2^{n-1}},$$
and thus $\mu(\bar P) = 0$.
• Let $P$ be the property of being connected. Again, we show that $\mu(\bar P) = 0$, and thus the asymptotic probability of graph connectivity is 1. To calculate $\mu(\bar P)$, we have to count the graphs with at least two connected components. If the vertex set of one component is a $k$-element set $X$, then
– there are $\binom{n}{k}$ ways to choose the subset $X \subseteq \{0, \ldots, n-1\}$;
– there are $2^{\binom{k}{2}}$ ways to put edges on $X$; and
– there are $2^{\binom{n-k}{2}}$ ways to put edges on the complement of $X$.
Hence,
$$\mu_n(\bar P) \;\le\; \sum_{k=1}^{n-1} \frac{\binom{n}{k}\, 2^{\binom{k}{2}}\, 2^{\binom{n-k}{2}}}{2^{\binom{n}{2}}} \;=\; \sum_{k=1}^{n-1} \frac{\binom{n}{k}}{2^{k(n-k)}} \;\le\; \frac{2n}{2^{n-1}} + \frac{1}{2^{2(n-2)}} \sum_{k=2}^{n-2} \binom{n}{k} \;\le\; \frac{n}{2^{n-2}} + \frac{2^{n}}{2^{2n-4}} \;\longrightarrow\; 0$$
(the terms $k = 1$ and $k = n-1$ are separated out, each being $n/2^{n-1}$, and for $2 \le k \le n-2$ we use $k(n-k) \ge 2(n-2)$).
• Consider the query even. Then $\mu_n(\textit{even})$ is 1 if $n$ is even and 0 if $n$ is odd. Hence, $\mu(\textit{even})$ does not exist.
• The last example is the parity query. If $\sigma$ has a unary relation $U$, then $A$ satisfies parity$_U$ iff $|U^A| \bmod 2 = 0$. For $\sigma = \{U\}$ we therefore get
$$\mu_n(\text{parity}_U) \;=\; \frac{1}{2^n} \sum_{k \le n,\ k \text{ even}} \binom{n}{k} \;=\; \frac{1}{2}$$
(and the value is the same for larger $\sigma$, since the choices of the other relations cancel out); hence $\mu(\text{parity}_U) = \frac{1}{2}$.
Thus, for some properties $P$ the asymptotic probability $\mu(P)$ is 0 or 1; for some, like parity, $\mu(P)$ can be a number strictly between 0 and 1; and for some, like even, it may not exist at all.
Definition 12.1 (Zero-one law). Let $\mathcal{L}$ be a logic. We say that it has the zero-one law if for every property $P$ (i.e., a Boolean query) definable in $\mathcal{L}$, either $\mu(P) = 0$ or $\mu(P) = 1$.
The first property $P$ for which we proved $\mu(P) = 1$ was the absence of isolated nodes: this property is FO-definable. Graph connectivity, which also has asymptotic probability 1, is not FO-definable, but it is definable in LFP and hence in $L^\omega_{\infty\omega}$. On the other hand, the even and parity$_U$ queries, which violate the zero-one law, are not $L^\omega_{\infty\omega}$-definable, as we saw in Chap. 11. It turns out that $\mu(P)$ is 0 or 1 for every property definable in $L^\omega_{\infty\omega}$.
Theorem 12.2. $L^\omega_{\infty\omega}$ has the zero-one law.
Corollary 12.3.
FO, LFP, IFP, and PFP all have the zero-one law.
Zero-one laws can be seen as statements that a logic cannot do nontrivial counting. For example, if a logic $\mathcal{L}$ has the zero-one law, then even is not expressible in it, and neither are divisibility properties (e.g., is the size of a certain set congruent to $q$ modulo $p$?), cardinality comparisons (e.g., is $|X|$ bigger than $|Y|$?), etc.
Note also that while LFP, IFP, PFP, and $L^\omega_{\infty\omega}$ all have the zero-one law, their extensions with ordering no longer have it, since LFP+$<$ defines even, a Ptime query. In the presence of a linear order (in fact, even of a successor relation), FO fails to have the zero-one law too. To see this, let $S$ be the successor relation, and consider the sentence
$$\forall x\, \forall y\, \Big( \forall z\, \big(\neg S(z,x) \wedge \neg S(y,z)\big) \;\to\; E(x,y) \Big),$$
saying that if $x$ is the initial and $y$ the final element of the successor relation, then there is an edge between them. Since this sentence states the existence of one specific edge, its asymptotic probability is $\frac{1}{2}$.
We shall prove Theorem 12.2 in the next section, after we introduce the main tool for the proof: extension axioms.
Fig. 12.1. Extension axiom: a vertex $z$ joined to every element of $T$ and to no element of $S - T$.
12.2 Extension Axioms
Extension axioms are statements defined as follows. Let $S$ be a finite set of cardinality $n$, and let $T \subseteq S$ be of cardinality $m$. Then the extension axiom $EA_{n,m}$ says that there exists a vertex $z \notin S$ such that for all $x \in T$ there is an edge between $z$ and $x$, and for all $x \in S - T$ there is no edge between $z$ and $x$. This is illustrated in Fig. 12.1.
Extension axioms can be expressed in FO in the language of graphs. In fact, $EA_{n,m}$ is given by the following sentence:
$$\forall x_1 \ldots \forall x_n\; \Big( \bigwedge_{i \ne j} x_i \ne x_j \;\to\; \exists z\, \big( \bigwedge_{i=1}^{n} z \ne x_i \;\wedge\; \bigwedge_{i \le m} E(z, x_i) \;\wedge\; \bigwedge_{i > m} \neg E(z, x_i) \big) \Big). \qquad (12.2)$$
The extension axiom $EA_{n,m}$ is vacuously true in a structure with fewer than $n$ elements, but we shall normally consider it in structures with at least $n$ elements.
We shall be using special cases of extension axioms, where $|S| = 2k$ and $|T| = k$. Such an extension axiom will be denoted by $EA_k$. That is, $EA_k$ says that if $X \cap Y = \emptyset$ and $|X| = |Y| = k$, then there is a vertex $z \notin X \cup Y$ such that there is an edge $(x, z)$ for every $x \in X$, but there is no edge $(y, z)$ for any $y \in Y$.
Proposition 12.4. $\mu(EA_k) = 1$ for each $k$.
Proof. We show instead that $\mu(\neg EA_k) = 0$. Let $n > 2k$. Note that for $EA_k$ to fail, there must be disjoint sets $X$ and $Y$ of cardinality $k$ such that there is no $z \notin X \cup Y$ with $E(x,z)$ for all $x \in X$ and $\neg E(y,z)$ for all $y \in Y$. We now calculate $\mu_n(\neg EA_k)$, for $n > 2k$.
• There are $\binom{n}{k}$ ways to choose $X$, and $\binom{n-k}{k}$ ways to choose $Y$. Therefore, there are at most $\binom{n}{k} \cdot \binom{n-k}{k} \le n^{2k}$ ways to choose $X$ and $Y$.
• Since there are no restrictions on edges within $X \cup Y$, there are $2^{\binom{2k}{2}}$ ways to put edges on $X \cup Y$.
• Again, since there are no restrictions on edges outside of $X \cup Y$, there are $2^{\binom{n-2k}{2}}$ ways to put edges outside of $X \cup Y$.
• The only restriction concerns the edges between $X \cup Y$ and its complement $\overline{X \cup Y}$: for each of the $n - 2k$ elements $z \notin X \cup Y$, we can put edges between $z$ and the $2k$ elements of $X \cup Y$ in every possible way except one, namely the one in which $z$ is connected to every member of $X$ and to no member of $Y$. Hence, for each such $z$ there are $2^{2k} - 1$ ways of putting edges between $z$ and $X \cup Y$, and therefore the number of ways to put edges between $X \cup Y$ and $\overline{X \cup Y}$ is $(2^{2k} - 1)^{n-2k}$.
Thus,
$$\mu_n(\neg EA_k) \;\le\; \frac{n^{2k} \cdot 2^{\binom{2k}{2}} \cdot 2^{\binom{n-2k}{2}} \cdot (2^{2k} - 1)^{n-2k}}{2^{\binom{n}{2}}}. \qquad (12.3)$$
A simple calculation shows that
$$\frac{2^{\binom{2k}{2}} \cdot 2^{\binom{n-2k}{2}}}{2^{\binom{n}{2}}} \;\le\; \frac{1}{2^{2k(n-2k)}}.$$
(12.4) Combining (12.3) and (12.4) we obtain µn(¬EAk) ≤ n2k · 1 − 1 22k n−2k → 0, proving that µ(¬EAk) = 0 and µ(EAk) = 1. Corollary 12.5. µ(EAn,m) = 1, for any n and m ≤ n. Proof. For graphs of size > 2n, EAn implies EAn,m for any m ≤ n. Corollary 12.6. Each EAk has arbitrarily large finite models. Notice that it is not immediately obvious from the statement of EAk that there are finite graphs with at least 2k elements satisfying it. However, Proposition 12.4 tells us that we can find such graphs; in fact, almost all graphs satisfy EAk. We now move to the proof of the zero-one law for Lω ∞ω. First, we need a lemma. 240 12 Zero-One Laws Lemma 12.7. Let G1, G2 be finite graphs such that G1, G2 |= EAn,m for all m ≤ n ≤ k. Then G1 ≡∞ω k G2. Proof. The extension axioms provide the strategy. Suppose we have a position in the game where (a1, . . . , ak) have been played in G1 and (b1, . . . , bk) in G2. Let the spoiler move the ith pebble from ai to some element a. Let I ⊆ {1, . . . , k} − {i} be all the indices such that there is an edge from a to aj, for all j ∈ I. Then by the extension axioms we can find b ∈ G2 such that there is an edge from b to every bj, for j ∈ I, and there are no edges from b to any bl, for l ∈ I. Hence, the duplicator can play b as the response to a. This shows that the pebble game can continue indefinitely, and thus G1 ≡∞ω k G2. And finally, we prove the zero-one law. Let Φ be from Lk ∞ω. Suppose there is a model G of EAk, of size at least 2k, that is also a model of Φ. Suppose G′ is a graph that satisfies EAk and has at least 2k elements. Then, by Lemma 12.7, we have G′ ≡∞ω k G and hence G′ |= Φ. Therefore, µ(ϕ) ≥ µ(EAk) = 1. Conversely, assume that no model of EAk of size ≥ 2k is a model of Φ. Then µ(Φ) ≤ µ(¬EAk) = 0. We now revisit the example of graph connectivity, for which the asymptotic probability was shown to be 1. If we look at EA2, then for graphs with at least four nodes it implies that, for any x = y, there exists z such that E(x, z) and E(y, z) hold. Hence, every graph with at least four nodes satisfying EA2 is connected, and thus µ(connectivity) = 1. As another example of using extension axioms for computing asymptotic probabilities, consider EA2 and an edge (x, y). As before, we can find a node z such that E(x, z) and E(y, z) hold, and hence a graph satisfying EA2 has a cycle (x, y, z). This means that µ(acyclicity) = 0. Finally, we explain how to state the extension axioms for an arbitrary vocabulary σ that contains only relation symbols. Given variables x1, . . . , xn, let Aσ(x1, . . . , xn) be the collection of all atomic σ-formulae of the form R(xi1 , . . . , xim ), where R ranges over relations from σ, and m is the arity of R. Let F ⊆ Aσ(x1, . . . , xn). With F, we associate a formula χF (x1, . . . , xn) (called a complete description) given by ϕ∈F ϕ ∧ ψ∈Aσ(x1,...,xn)−F ¬ψ. That is, a complete description states precisely which atomic formulae in x1, . . . , xn are true, and which are not. Let F now be a subset of Aσ(x1, . . . , xn), and G a subset of Aσ(x1, . . . , xn, xn+1) such that G extends F; that is, F ⊆ G. Then the extension axiom EAF,G is the sentence 12.3 The Random Graph 241 ∀x1 . . . xn     i=j xi = xj ∧ χF (x1, . . . , xn) → ∃xn+1 i≤n xn+1 = xi ∧ χG(x1, . . . , xn)     saying that every complete description in n variables can be extended to every consistent complete description in n + 1 variables. A similar argument shows that µ(EAF,G) = 1. Therefore, the zero-one law holds for arbitrary finite structures, not only graphs. 
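Proposition 12.4 is also easy to observe experimentally. The following sketch (Python) samples graphs uniformly at random and estimates $\mu_n(EA_k)$; for small $n$ many samples still violate the axiom, but the estimate should approach 1 as $n$ grows, in line with the bound computed above. The parameters are chosen only to keep the running time small.

import random
from itertools import combinations

def random_graph(n, p=0.5):
    E = set()
    for i in range(n):
        for j in range(i + 1, n):
            if random.random() < p:
                E.add((i, j)); E.add((j, i))
    return E

def satisfies_EA_k(n, E, k):
    # EA_k: for all disjoint X, Y with |X| = |Y| = k there is z outside X ∪ Y
    # adjacent to every element of X and to no element of Y
    for X in combinations(range(n), k):
        for Y in combinations([v for v in range(n) if v not in X], k):
            if not any(all((z, x) in E for x in X) and all((z, y) not in E for y in Y)
                       for z in range(n) if z not in X and z not in Y):
                return False
    return True

def estimate_mu_EA(n, k, trials=100):
    return sum(satisfies_EA_k(n, random_graph(n), k) for _ in range(trials)) / trials

print(estimate_mu_EA(15, 1))   # small n: many random graphs still violate EA_1
print(estimate_mu_EA(50, 1))   # larger n: the estimate should already be close to 1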
12.3 The Random Graph In this section we deal with a certain infinite structure. This structure, called the random graph, has an interesting FO theory: it consists of precisely all the sentences Φ for which µ(Φ) = 1. By analyzing the random graph, we prove that it is decidable, for an FO sentence Φ, whether µ(Φ) = 1. First, recall the BIT predicate: BIT(i, j) is true iff the jth bit of the binary expansion of i is 1. Definition 12.8. The random graph is defined as the infinite (undirected) graph RG = N, E where there is an edge between i and j, for j < i, iff BIT(i, j) is true. Why is this graph called random? After all, the construction is completely deterministic. It turns out there is a probabilistic construction that results in this graph. Suppose someone wants to randomly build a countable graph whose nodes are natural numbers. When reaching a new node n, this person would look at all nodes k < n, and for each of them will toss a coin to decide if there is an edge between k and n. What kind of graph does one get as the result? It turns out that with probability 1, the constructed graph is isomorphic to RG. However, for our purposes, we do not need the probabilistic construction. What is important to us is that the random graph satisfies all the extension axioms. Indeed, to see that RG |= EAn,m, let S ⊂ N be of size n and X ⊆ S be of size m. Let l be a number which, when given in binary, has ones in positions from X, and zeros in positions from S − X. Furthermore, assume that l has a one in some position whose number is higher than the maximal number in S. Then l witnesses EAn,m for S and T . To give a concrete example, if S = {0, 1, 2, 3, 4} and X = {0, 2, 3}, then the number l is 45, or 101101 in binary. Next, we define a theory EA = {EAk | k ∈ N}. (12.5) Recall that a theory T (a set of sentences over vocabulary σ) is complete if for each sentence Φ, either T |= Φ or T |= ¬Φ; it is ω-categorical if, up to 242 12 Zero-One Laws isomorphism, it has only one countable model, and decidable, if it is decidable whether T |= Φ. Theorem 12.9. EA is complete, ω-categorical, and decidable. Proof. For ω-categoricity, we claim that up to isomorphism, RG is the only countable model of EA. Suppose that G is another model of EA (and thus it satisfies all the extension axioms EAn,m). We claim that RG ≡ω G; that is, the duplicator can play countably many moves of the EhrenfeuchtFra¨ıss´e game on RG and G. Indeed, suppose after round r we have a position ((a1, . . . , ar), (b1, . . . , br)) defining a partial isomorphism, and suppose the spoiler plays ar+1 in RG. Let I = {i ≤ r | RG |= E(ar+1, ai)}. Since G |= EA, by the appropriate extension axiom we can find br+1 such that G |= E(br+1, bi) iff i ∈ I. Thus, the resulting position ((a1, . . . , ar, ar+1), (b1, . . . , br, br+1)) still defines a partial isomorphism. If we have two countable structures such that A ≡ω B, then A ∼= B. Indeed, if A = {ai | i ∈ N} and B = {bi | i ∈ N}, let the spoiler play, in each even round, the smallest unused element of A, and in each odd round the smallest unused element of B. Then the union of the sequence of partial isomorphisms generated by this play is an isomorphism between A and B. Thus, we have shown that G |= EA implies G ∼= RG and hence EA is ω-categorical. The next step is to show completeness of EA. Suppose that we have a sentence Φ such that neither EA |= Φ nor EA |= ¬Φ. Thus, both theories EA∪{Φ} and EA∪{¬Φ} are consistent. 
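The witness construction for the extension axioms in $RG$ described before Theorem 12.9 is easy to check mechanically. A small sketch (Python, using integer shifts as a stand-in for the BIT predicate) reproduces the concrete example from the text:

def bit_edge(i, j):
    # RG has an edge between i and j (for j < i) iff the jth bit of i is 1
    hi, lo = max(i, j), min(i, j)
    return ((hi >> lo) & 1) == 1

def witness(S, X):
    # ones exactly in the positions of X, plus one extra high bit so that the
    # witness is larger than every element of S (and in particular outside S)
    return sum(1 << x for x in X) + (1 << (max(S) + 1))

S, X = {0, 1, 2, 3, 4}, {0, 2, 3}
z = witness(S, X)
print(z)                                           # 45, as in the text
print(all(bit_edge(z, x) for x in X) and
      not any(bit_edge(z, y) for y in S - X))      # True: z witnesses EA_{5,3}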
By the L¨owenheim-Skolem theorem, we get two countable models G′ , G′′ of EA such that G′ |= Φ and G′′ |= ¬Φ. However, by ω-categoricity, this means that G′ ∼= G′′ ∼= RG. This contradiction proves that EA is complete. Finally, a classical result in model theory says that a recursively axiomatizable complete theory is decidable. Since (12.5) provides a recursive axiomatization, we conclude that EA is decidable. Corollary 12.10. If Φ is an FO sentence, then RG |= Φ iff µ(Φ) = 1. Proof. Let RG |= Φ. Since EA is complete, EA |= Φ, and hence, by compactness, for some k > 0, {EAi | i ≤ k} |= Φ. Thus, EAk |= Φ and hence µ(Φ) ≥ µ(EAk) = 1. Conversely, if RG |= ¬Φ, then µ(¬Φ) = 1 and µ(Φ) = 0. Hence, for any Φ with µ(Φ) = 1, we have RG |= ϕ. Combining Corollary 12.10 and decidability of EA, we obtain the follow- ing. Corollary 12.11. For an FO sentence Φ it is decidable whether µ(Φ) = 1. Thus, Trakhtenbrot’s theorem tells us that it is undecidable whether a sentence is true in all finite models, but now we see that it is decidable whether a sentence is true in almost all finite models. 12.4 Zero-One Law and Second-Order Logic 243 12.4 Zero-One Law and Second-Order Logic We have proved the zero-one law for the finite variable logic Lω ∞ω and its fragments such as FO and fixed point logics. It is natural to ask what other logics have it. Since the zero-one law can be seen as a statement saying that a logic cannot count, counting logics cannot have it. Another possibility is second-order logic and its fragments. Even such a simple fragment as ∃SO, the existential second-order logic, does not have the zero-one law: since ∃SO equals NP, the query even is in ∃SO. But we shall see that some nontrivial restrictions of ∃SO have the zero-one law. One way to obtain such restrictions is to look at quantifier prefixes of the first-order part. Recall that an ∃SO sentence can be written as ∃X1 . . . ∃XnQ1x1 . . . Qmxm ϕ(X1, . . . , Xn, x1, . . . , xm), (12.6) where each Qi is ∀ or ∃, and ϕ is quantifier-free. If r is a regular expression over the alphabet {∃, ∀}, by ∃SO(r) we denote the set of all sentences (12.6) such that the string Q1 . . . Qm is in the language denoted by r. For example, ∃SO(∃∗ ∀∗ ) is a fragment of ∃SO that consists of sentences (12.6) for which the first-order part has all existential quantifiers in front of the universal quantifiers. Theorem 12.12. ∃SO(∃∗ ∀∗ ) has the zero-one law. Proof. To keep the notation simple, we shall prove this for undirected graphs, but the result is true for arbitrary vocabularies that contain only relation symbols. The result will follow from two lemmas. Lemma 12.13. Let S1, . . . , Sm be relation symbols, and ϕ an FO sentence of vocabulary {S1, . . . , Sm, E} such that RG |= ∀S1 . . . ∀Sm ϕ(S1, . . . , Sm). Then there is an FO sentence Φ of vocabulary {E} such that µ(Φ) = 1 and Φ → ∀S ϕ is a valid sentence. Lemma 12.14. Let S1, . . . , Sm be relation symbols, and ϕ(x, y) a quantifierfree FO formula of vocabulary {S1, . . . , Sm, E} such that RG |= ∃S1 . . . ∃Sm ∃x ∀y ϕ(S, x, y). Then there is an FO sentence Ψ of vocabulary {E} such that µ(Φ) = 1 and Φ → ∃S ∃x ∀y ϕ is a finitely valid sentence. First, these lemmas imply the theorem. Indeed, assume that we are given an ∃SO(∃∗ ∀∗ ) sentence Θ ≡ ∃S ∃x ∀y ϕ. Let RG |= Θ. Then, by Lemma 12.14, there is a sentence Φ with µ(Φ) = 1 such that Θ is true in every finite 244 12 Zero-One Laws model of Φ, and hence µ(Θ) = 1. Conversely, assume RG |= ¬Θ. 
Since ¬Θ is an ∀SO sentence, by Lemma 12.13 we find a sentence Φ with µ(Φ) = 1 such that ¬Θ is true in every model of Φ, and thus µ(¬Θ) = 1 and µ(Θ) = 0. Hence, µ(Θ) is either 0 or 1. It remains to prove the lemmas. Proof of Lemma 12.13. Assume that RG |= ∀Sϕ(S), but for every FO sentence Φ with µ(Φ) = 1, it is the case that (Φ → ∀S ϕ) is not a valid sentence (i.e., Φ ∧ ∃S¬ϕ(S) has a model). Consider the theory T = EA∪{¬ϕ} of vocabulary {S1, . . . , Sm, E}. Since every finite conjunction of extension axioms has asymptotic probability 1, by compactness we conclude that T is consistent, and by the L¨owenheim-Skolem theorem, it has a countable model A. Since EA is ω-categorical, the {E}reduct of A is isomorphic to RG. But then RG |= ∃S¬ϕ(S), a contradiction. This proves Lemma 12.13. Proof of Lemma 12.14. Let |S | = m and |x| = n. Let A1, . . . , Am witness the second-order quantifiers, and let a1, . . . , an be the elements of RG witnessing FO existential quantifiers. Let RG0 be the finite subgraph of RG with the universe {a1, . . . , an}. We can find finitely many extension axioms {EAk,l} such that their conjunction implies the existence of a subgraph isomorphic to RG0. Let Φ be the conjunction of all such extension axioms. Let A be a finite model of Φ. By the extension axioms, there is a subgraph RGA of RG that is isomorphic to A and contains RG0. Now we claim that RGA |= ∃S∃x∀y ϕ. To witness the second-order quantifiers, we take the restrictions of the Ai’s to RGA; as witnesses of FO existential quantifiers we take a1, . . . , an. Since universal sentences are preserved under substructures, we conclude that RGA |= ∀y ϕ(A, a, y), and thus RGA |= ∃S∃x∀y ϕ. Therefore, A |= ∃S∃x∀y ϕ, which proves the lemma. There are more results concerning zero-one laws for fragments of SO, but they are significantly more complicated, and we present them without proofs. One other prefix class which admits the zero-one law is ∃∗ ∀∃∗ ; that is, exactly one universal quantifier is present. Theorem 12.15. ∃SO(∃∗ ∀∃∗ ) has the zero-one law. Going to two universal quantifiers, however, creates problems. Theorem 12.16. ∃SO(∀∀∃) does not have the zero-one law, even if the FO part does not use equality. For some prefix classes, the failure of the zero-one law is fairly easy to show. Consider, for example, the sentence ∃S ∀x∃y∀z   S(x, y) ∧ ¬S(x, x) ∧ S(x, z) → y = z ∧ S(x, z) ↔ S(z, x)   . 12.5 Almost Everywhere Equivalence of Logics 245 This in an ∃SO(∀∃∀) sentence saying there is a permutation S in which every element has order 2; that is, this sentence expresses even and thus ∃SO(∀∃∀) fails the zero-one law. A similar sentence can be written in ∃SO(∀∀∀∃). The result can further be strengthened to show that both ∃SO(∀∃∀) and ∃SO(∀∀∀∃) fail to have the zero-one law even if the FO order part does not mention equality. 12.5 Almost Everywhere Equivalence of Logics In this short section, we shall prove a somewhat surprising result that on almost all structures, there is no difference between FO, LFP, PFP, and Lω ∞ω. Definition 12.17. Given a logic L, its fragment L′ , and a vocabulary σ, we say that L and L′ are almost everywhere equivalent over σ, if there is a class C of finite σ-structures such that µ(C) = 1 and for every L formula ϕ, there is an L′ formula ψ such that ϕ and ψ coincide on structures from C. Theorem 12.18. Lω ∞ω and FO are almost everywhere equivalent over σ, for any purely relational vocabulary σ. Proof sketch. For simplicity, we deal with undirected graphs. 
Let Ck be the class of finite graphs satisfying EAk. We claim that on Ck, every Lk ∞ω formula is equivalent to an FOk formula. Indeed, for a tuple a = (a1, . . . , ak) in a structure A ∈ Ck, its FOk type tpFOk (A, a) is completely determined by the atomic type of a; that is, by the atomic formulae E(ai, aj) that hold for a. To see this, notice that if a and b have the same atomic type, then (a, b) is a partial isomorphism, and by EAk from the position (a, b) the duplicator can play indefinitely in the k-pebble game; hence, (A, a) ≡∞ω k (A, b). Therefore, there are only finitely many FOk types, and each Lk ∞ω formula is a disjunction of those, and thus equivalent to an FOk formula. (In fact, we proved a stronger statement that on Ck, every Lk ∞ω formula is equivalent to a quantifier-free FOk formula.) We now consider the classes C1 ⊆ C2 ⊆ . . ., and observe that since each µ(Ck) is 1, then for any sequence ǫ1 > ǫ2 > . . . > 0 such that limn→∞ ǫn = 0, we can find an increasing sequence of numbers n1 < n2 < . . . < nk < . . . such that µn(Ck ∩ Grn) > 1 − ǫk, for n > nk. We then define C = A ∈ STRUCT[{E}] if |A| ≥ nk, then A ∈ Ck . One can easily check that µ(C) = 1. We claim that every Lω ∞ω formula is equivalent to an FO formula on C. Indeed, let ϕ be an Lk ∞ω formula. We know that on Ck, it is equivalent to an FOk formula ϕ′ . Thus, to find a formula ψ 246 12 Zero-One Laws to which ϕ is equivalent on C, one explicitly enumerates all the structures of cardinality up to nk and evaluates ϕ on them. Then, one writes an FO formula ψk saying that if A is one of the structures with |A| < nk, then ψk(A) = ϕ(A), and for all the structures with |A| ≥ nk, ψk agrees with ϕ′ . Since the number of structures of cardinality up to nk is fixed, this can be done in FO. This result has complexity-theoretic implications. While we know that LFP and PFP queries have respectively Ptime and Pspace data complexity, Theorem 12.18 shows that their complexity can be reduced to AC0 on almost all structures. 12.6 Bibliographic Notes That FO has the zero-one law was proved first by Glebskii et al. [92] in 1969, and independently by Fagin (announced in 1972, but the journal version [73] appeared in 1976). Fagin used extension axioms introduced by Gaifman [87]. Blass, Gurevich, and Kozen [22] and – independently – Talanov and Knyazev [227] proved that LFP has the zero-one law, and the result for Lω ∞ω is due to Kolaitis and Vardi [152]. The random graph was discovered by Erd¨os and R´enyi [67] (the probabilistic construction); the deterministic construction used here is due to Rado [203]. In fact, RG is sometimes referred to as the Rado graph. This is also a standard construction in model theory (the Fra¨ıss´e limit of finite graphs, see [125]). The results about the theory of the random graph are from Gaifman [87]. Fagin [74] offers some additional insights into the history of extension axioms. The fact that the infinite Ehrenfeucht-Fra¨ıss´e game implies isomorphism of countable structures is from Karp [143]. The study of the zero-one law for fragments of ∃SO was initiated by Kolaitis and Vardi [150], where they proved Theorem 12.12. Theorem 12.15 is from Kolaitis and Vardi [151], and Theorem 12.16 is from Le Bars [163]. A good survey on zero-one laws and SO is Kolaitis and Vardi [155] (in particular, it explains how to prove that the zero-one law fails for ∃SO(∀∃∀) and ∃SO(∀∀∀∃) without equality). Theorem 12.18 is from Hella, Kolaitis, and Luosto [122]. 
For related results in the context of database query evaluation, see Abiteboul, Compton, and Vianu [1]. Sources for exercises: Exercises 12.3 and 12.4: Fagin [73] Exercise 12.5: Lynch [173] Exercise 12.6: Kaufmann and Shelah [144] and Le Bars [164] Exercise 12.7: Grandjean [104] Exercise 12.8: Hodges [125] Exercise 12.9 (b): Cameron [31] 12.7 Exercises 247 Exercise 12.11: Le Bars [163] Exercise 12.12: Kolaitis and Vardi [150] Kolaitis and Vardi [155] Blass, Gurevich, and Kozen [22] 12.7 Exercises Exercise 12.1. Calculate µ(P) for the following properties P: • rigidity; • 2-colorability; • being a tree; • Hamiltonicity; • having diameter 2. Exercise 12.2. Prove the zero-one law for arbitrary vocabularies, using extension axioms EAF,G. Exercise 12.3. Instead of µn(P), consider νn(P) as the ratio of the number of different isomorphism types of graphs on {0, . . . , n−1} that have P and the number of all different isomorphism types of graphs on {0, . . . , n−1}. Let ν(P) be defined as the limit of νn(P). Prove that if P is an FO-definable property, then ν(P) = µ(P), and thus is either 0 or 1. Exercise 12.4. If constant or function symbols are allowed in the vocabulary, the zero-one law may not be true. Specifically, prove that: • if c is a constant symbol and U a unary predicate symbol, then U(c) has asymptotic probability 1 2 ; • if f is a unary function symbol, then ∀x ¬(x = f(x)) has asymptotic probability 1 e . Exercise 12.5. Instead of the usual successor relation, consider a circular successor: a relation of the form {(a1, a2), (a2, a3), . . . , (an−1, an), (an, a1)}. Prove that in the presence of a circular successor, FO continues to have the zero-one law. Exercise 12.6. Prove that MSO does not have the zero-one law. Hint: choose a vocabulary σ to consist of several binary relations, and prove that there is an FO formula ϕ(x, y) of vocabulary σ ∪ {U}, where U is unary, such that the MSO sentence ∃U ϕ′ almost surely holds, where ϕ′ states that the set of pairs for (x, y) for which ϕ(x, y) holds is a linear ordering. Then the failure of the zero-one law follows since we know that MSO+ < can define even. Prove a stronger version of this failure, for the vocabulary of one binary relation. Exercise 12.7. Prove that for vocabularies with bounded arities, the problem of deciding whether µ(Φ) = 1, where Φ is FO, is Pspace-complete. Exercise 12.8. Prove that the random graph admits quantifier elimination: that is, every formula ϕ(x) is equivalent to a quantifier-free formula ϕ′ (x). 248 12 Zero-One Laws Exercise 12.9. (a) Consider the following undirected graph G: its universe is N+ = {n ∈ N | n > 0} and there is an edge between n and m, for n > m, iff n is divisible by pm, the mth prime. Prove that G is isomorphic to the random graph RG. Hint: the proof does not require any number theory, and is a simple application of extension axioms. (b) Consider another countable graph G′ whose universe is the set of primes congruent to 1 modulo 4. Put an edge between p and q if p is a quadratic residue modulo q. Prove that G′ is isomorphic to the random graph RG. Exercise 12.10. Let Φ be an arbitrary ∃SO sentence. Prove that it is undecidable whether µ(Φ) = 1. Exercise 12.11. Prove that the restriction of ∃SO, where the first-order part is a formula of FO2 , does not have the zero-one law. Exercise 12.12. 
Prove that for vocabularies with bounded arities, the problem of deciding whether µ(Φ) = 1 is • Nexptime-complete, if Φ is an ∃SO(∃∗ ∀∗ ) sentence, or an ∃SO(∃∗ ∀∃∗ ) sentence; • Exptime-complete, if Φ is an LFP sentence. Exercise 12.13.∗ Does ∃SO(∀∀∃) have the zero-one law over graphs? 13 Embedded Finite Models In finite model theory, we deal with logics over finite structures. In embedded finite model theory, we deal with logics over finite structures embedded into infinite ones. For example, one assumes that nodes of graphs are numbers, and writes sentences like ∃x∃y E(x, y) ∧ (x · y = x · x + 1) saying that there is an edge (x, y) in a graph with xy = x2 + 1. The infinite structure in this case could be R, +, · , or N, +, · , or Q, +, · . What kinds of queries can one write in this setting? We shall see in this chapter that the answer depends heavily on the properties of the infinite structure into which the finite structures are embedded: for example, queries such as even and graph connectivity turn out to be expressible on structures embedded into N, +, · , or Q, +, · , but not R, +, · . The main motivation for embedded finite models comes from database theory. Relational calculus – that is, FO – is the basic relational query language. However, databases store interpreted elements such as numbers or strings, and queries in all practical languages use domain-specific operations, like arithmetic operations for numbers, or concatenation and prefix comparison for strings, etc. Embedded finite model theory studies precisely these kinds of languages over finite models, where the underlying domain is potentially infinite, and operations over that domain can be used in formulae. 13.1 Embedded Finite Models: the Setting Assume that we have two vocabularies, Ω and σ, where σ is finite and relational. Let M be an infinite Ω-structure U, Ω , where U is an infinite set. For example, if Ω contains two binary functions + and ·, then R, +, · and N, +, · are two possible infinite Ω-structures, with + and · interpreted, in both cases, as addition and multiplication respectively. 250 13 Embedded Finite Models Definition 13.1. Let M = U, Ω be an infinite Ω-structure, and let σ = {R1, . . . , Rm}. Suppose the arity of each Ri is pi > 0. Then an embedded finite model (i.e., a σ-structure embedded into M) is a structure A = A, RA 1 , . . . , RA l , where each RA i is a finite subset of Upi , and A is the set of all the elements of U that occur in the relations RA 1 , . . . , RA l . The set A is called the active domain of A, and is denoted by adom(A). So far this is not that much different from the usual finite model, except that the universe comes from a given infinite set U. What makes the setting different, however, is the presence of the underlying structure M, which makes it possible to use rich logics for defining queries on embedded finite models. That is, instead of just FO over A, we shall use FO over (M, A) = (U, Ω, RA 1 , . . . , RA l , making use of operations available on M. Before we define this logic, denoted by FO(M, σ), we shall address the issue of quantification. The universe of (M, A) is U, so saying ∃xϕ(x) means that there is an element of U that witnesses ϕ. But while we are dealing with finite structures A embedded into M, quantification over the entire set U is not always very convenient. Consider, for example, the simple property of reflexivity. In the usual finite model theory context, to state that a binary relation E is reflexive, we would say ∀x E(x, x). 
However, if the interpretation of ∀x is “for all x ∈ U”, this sentence would be false in all embedded finite models! What we really want to say here is: “for all x in the active domain, E(x, x) holds”. The definition of FO(M, σ) thus provides additional syntax to quantify over elements of the active domain. Definition 13.2. Given M = U, Ω and a relational vocabulary σ, first-order logic (FO) over M and σ, denoted by FO(M, σ), is defined as follows: • Any atomic FO formula in the language of M is an atomic FO(M, σ) formula. For any p-ary symbol R from σ and terms t1, . . . , tp in the language of M, R(t1, . . . , tp) is an atomic FO(M, σ) formula. • Formulae of FO(M, σ) are closed under the Boolean connectives ∨, ∧, and ¬. • If ϕ is an FO(M, σ) formula, then the following are FO(M, σ) formulae: – ∃x ϕ, – ∀x ϕ, – ∃x∈adom ϕ, and – ∀x∈adom ϕ. 13.1 Embedded Finite Models: the Setting 251 The class of first-order formulae in the language of M will be denoted by FO(M) (i.e., the formulae built up from atomic M-formulae by Boolean connectives and quantification ∃, ∀). The class of formulae not using the symbols from Ω will be denoted by FO(σ) (in this case all four quantifiers are allowed). The notions of free and bound variables are the usual ones. To define the semantics, we need to define the relation (M, A) |= ϕ(a), for a formula ϕ(x) and a tuple a over U of values of free variables. All the cases are standard, except quantification. If we have a formula ϕ(x, y), and a tuple of elements b (values for y), then (M, A) |= ∃x ϕ(x, b) iff (M, A) |= ϕ(a, b) for some a ∈ U. On the other hand, (M, A) |= ∃x∈adom ϕ(x, b) iff (M, A) |= ϕ(a, b) for some a ∈ adom(A). The definitions for the universal quantification are: (M, A) |= ∀x ϕ(x, b) iff (M, A) |= ϕ(a, b) for all a ∈ U (M, A) |= ∀x∈adom ϕ(x, b) iff (M, A) |= ϕ(a, b) for all a ∈ adom(A). Since M is most of the time clear from the context, we shall often write A |= ϕ(a) instead of the more formal (M, A) |= ϕ(a). The quantifiers ∃x ∈ adom ϕ and ∀x ∈ adom ϕ are called active-domain quantifiers. We shall sometimes refer to the usual quantifies ∃ and ∀ as unrestricted quantifiers. From the point of view of expressive power, active-domain quantifiers are a mere convenience: since adom(A) is definable with unrestricted quantification, so are these quantifiers. But we use them separately in order to define an important sublogic of FO(M, σ). Definition 13.3. By FOact(M, σ) we denote the fragment of FO(M, σ) that only uses quantifiers ∃x ∈ adom and ∀x ∈ adom. Formulae in this fragment are called the active-domain formulae. Before moving on to the expressive power of FO(M, σ), we briefly discuss evaluation of such formulae. Since quantification is no longer restricted to a finite set, it is not clear a priori that formulae of FO(M, σ) can be evaluated – and, indeed, in some cases there is no algorithm for evaluating them. However, there is one special case when evaluation of formulae is “easy” (that is, easy to explain, not necessarily easy to evaluate). Suppose we have a sentence Φ of FO(M, σ), and an embedded finite model A. We further assume that every element c ∈ adom(A) is definable over M: that is, there is an FO(M) formula αc(x) such that M |= αc(x) iff x = c. In such a case, we replace every occurrence of an atomic formula R(t1(x), . . . , tm(x)), where R ∈ σ and the ti’s are terms, by 252 13 Embedded Finite Models (c1,...,cm)∈RA αc1 (t1(x)) ∧ . . . ∧ αcm (tm(x)). That is, we say that the tuple of values of the ti(x)’s is one of the tuples in RA . 
Thus, if ΦA is the sentence obtained from Φ by such a replacement, then (M, A) |= Φ ⇔ M |= ΦA . (13.1) Notice that ΦA is an FO(M) sentence, since all the σ-relations disappeared. Now using (13.1) we can propose the following evaluation algorithm: given Φ, construct ΦA , and check if M |= ΦA . The last is possible if the theory of M is decidable. 13.2 Analyzing Embedded Finite Models When we briefly looked at the standard model-theoretic techniques in Chap. 3, we noticed that they are generally inapplicable in the setting of finite model theory. For embedded finite models, we mix the finite and the infinite: we study logics over pairs (M, A), where M is infinite and A is finite. So the question arises: can we use techniques of either finite or infinite model theory? It turns out that we cannot use finite or infinite model-theoretic techniques directly; as we are about to show, in general, they fail over embedded finite models. Then we outline a new kind of tools that is used with embedded finite models: by using infinite model-theoretic techniques, we reduce questions about embedded finite models to questions about finite models, for which the preceding 12 chapters give us plenty of answers. In general, we shall see that the behavior of FO(M, σ) depends heavily on model-theoretic properties of the underlying structure M. We now discuss standard (finite) model-theoretic tools and their applicability to the study of embedded finite models. First, notice that compactness fails over embedded finite models for the same reason as for finite models. One can write sentences λn, n ≥ 0, stating that adom(A) contains at least n elements. Then T = {λn | n ≥ 0} is finitely consistent: every finite set of sentences has a finite model. However, T itself does not have a finite model. One tool that definitely applies in the embedded setting is EhrenfeuchtFra¨ıss´e games. However, playing a game is very hard. Assume, for example, that M is the real field R, +, · . Suppose σ is empty, and we want to show that the query even, testing if |adom(A)| is even, is not expressible (which, as we shall see later, is a true statement). As in the proof given in Chap. 3, suppose even is expressible by a sentence Φ of quantifier rank k. Before, we picked two structures, A1 of cardinality k and A2 of cardinality k + 1, and showed that A1 ≡k A2. Our problem now is that showing A1 ≡k A2 no longer suffices, as we have to prove 13.2 Analyzing Embedded Finite Models 253 (M, A1) ≡k (M, A2) (13.2) instead. For example, in the old strategy for winning the game on A1 and A2, if the spoiler plays any point a1 in A1 in the first move, the duplicator can respond by any point A2. But now we have to account for additional atomic formulae such as p(x) = 0, where p is a polynomial. So if we know that p(a1) = 0 for some given p, the strategy must also ensure that p(a2) = 0. It is not at all clear how one can play a game like that, to satisfy (13.2). The next obvious approach is to try finite model-theoretic techniques that avoid Ehrenfeucht-Fra¨ıss´e games, such as locality and zero-one laws. This approach, however, cannot be used for all structures M, as the following example shows. Let N be the well-known structure N, +, · ; that is, natural numbers with the usual arithmetic operations. A σ-structure over N is a σ-structure whose active domain is a finite subset of N, and hence it can be encoded by some reasonable encoding (e.g., a slight modification of the encoding of Chap. 
6, where in addition all numbers in the active domain are encoded in binary). A Boolean query on σ-structures embedded into N is a function Q from such structures into {true, false}. It is computable if there is a computable function fQ : {0, 1}∗ → {0, 1} such that fQ(s) = 1 iff s is an encoding of a structure A such that Q(A) = true.

Proposition 13.4. Every computable Boolean query on σ-structures embedded into N can be expressed in FO(N, σ).

Proof. Without loss of generality, we assume that σ contains a single binary relation E. We use the following well-known fact about N: every computable predicate P ⊆ N^m is definable by an FO(N) formula, which we shall denote by ψP(x1, . . . , xm). The idea of the proof then is to code finite σ-structures with numbers. For a query Q, the sentence defining it will be

ΦQ = ∃x (χ(x) ∧ ψPQ(x)),   (13.3)

where χ(x) says that the input structure A is coded by the number x, and the predicate PQ is the computable predicate such that PQ(n) holds iff n is the code of a structure A with Q(A) = true. Thus, we have to show how to code structures.

Let pn denote the nth prime, with the numeration starting at p0 = 2. Suppose we have a structure A with adom(A) = {n1, . . . , nk}. We first code the active domain by

code0(A) = Π_{i=1}^{k} p_{ni}.

There is a formula χ0(x) of FO(N, σ) such that A |= χ0(n) iff code0(A) = n. Such a formula states the following condition:
• for each l ∈ adom(A), n is divisible by pl but not divisible by pl^2, and
• if n is divisible by a prime number p, then p is of the form pl for some l ∈ adom(A).
Since the binary relation {(n, pn) | n ≥ 0} is computable and thus definable in FO(N), χ0 can be expressed as an FO(N, σ) formula.

We next code the edge relation E. Let pair : N × N → N be the standard pairing function. We then code E^A by

code1(A) = Π_{(ni,nj) ∈ E^A} p_{pair(ni,nj)}.

As in the case of coding the active domain, there exists a formula χ1(x) such that A |= χ1(n) iff code1(A) = n; the proof is the same as for χ0. Finally, we code the whole structure by

code(A) = pair(code0(A), code1(A)).

Clearly, A ≠ B implies code(A) ≠ code(B), so we did define a coding function. Moreover, since χ0 and χ1 are FO(N, σ) formulae, the formula χ(x) can be defined as ∃y∃z (χ0(y) ∧ χ1(z) ∧ ψP(y, z, x)), where P is the graph of the pairing function. This completes the coding scheme, and thus shows that (13.3) defines Q on structures embedded into N.

Therefore, in FO(N, σ) we can express queries that violate locality notions (e.g., connectivity) and queries that do not obey the zero-one law (e.g., parity). Hence, we need a totally different set of techniques for proving bounds on the expressive power of FO(M, σ).

If we want to prove results about FO(M, σ), perhaps we can relate this logic to something we know how to deal with: the pure finite model theory setting. In our new terminology, this would be FOact(U∅, σ), where U∅ = ⟨U, ∅⟩ is a structure of the empty vocabulary. That is, there are no functions or predicates from M used in formulae, and all quantification is restricted to the finite universe adom(A). (Notice that the setting of FOact(U∅, σ) is in fact a bit more restrictive than the usual finite model theory setting: in the latter, we quantify over a finite universe that may be larger than the active domain.) For technical reasons that will become clear a bit later, we shall deal not with U∅ but rather with U< = ⟨U, <⟩, where < is a linear order on U. Then FOact(U<, σ) corresponds to what we called FO+< in the finite model theory setting.
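Before using this correspondence, it may help to see the coding from the proof of Proposition 13.4 carried out concretely. The sketch below codes a finite graph by a single natural number exactly as in that proof; the Cantor pairing function is one possible choice of the "standard" pairing function (the proof only needs some computable pairing), and all helper names are ours.

```python
# A sketch of the coding scheme from the proof of Proposition 13.4 (names and
# the choice of pairing function are ours): a finite graph with active domain
# contained in N is coded by a single natural number.
from itertools import count

def nth_prime(n):
    """p_n, with the numeration starting at p_0 = 2."""
    primes = []
    for m in count(2):
        if all(m % p for p in primes):
            primes.append(m)
            if len(primes) == n + 1:
                return m

def pair(a, b):
    """Cantor pairing N x N -> N, one standard computable pairing function."""
    return (a + b) * (a + b + 1) // 2 + b

def code(adom, edges):
    code0 = 1
    for n in adom:                 # code0(A) = product of p_n for n in adom(A)
        code0 *= nth_prime(n)
    code1 = 1
    for (i, j) in edges:           # code1(A) = product of p_pair(i,j) for (i,j) in E^A
        code1 *= nth_prime(pair(i, j))
    return pair(code0, code1)

# The one-edge graph 1 -> 2 with active domain {1, 2}:
print(code({1, 2}, {(1, 2)}))      # pair(3 * 5, p_8) = pair(15, 23) = 764
```

With the coding illustrated, we return to the comparison of FO(M, σ) with FOact(U<, σ).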
We know a number of results about this logic: in particular, it cannot express the query even (Theorem 3.6) nor can it express graph connectivity (Theorem 5.8). We now present the first of our two new tools. First, we need the following. Suppose Ω′ expands Ω by adding some (perhaps infinitely many) predicate symbols. We call a structure M′ = U, Ω′ a definitional expansion of M = U, Ω if for every predicate P ∈ Ω′ − Ω, there exists a formula ψP (x) in the language of M such that PM′ = {a | M |= ψP (a)}. 13.2 Analyzing Embedded Finite Models 255 Definition 13.5. We say that M admits the restricted quantifier collapse, or RQC, if there exists a definitional expansion M′ of M such that FO(M, σ) = FOact(M′ , σ) for every σ. The notion of RQC can be formulated without using a definitional expansion as follows. For every FO(M, σ) formula ϕ(x), there is an equivalent formula ϕ′ (x) such that no σ-relation appears within the scope of an unrestricted quantifier ∃ or ∀ (i.e., σ-relations only appear within the scope of restricted quantifiers ∃x∈adom and ∀x∈adom). There is one special form of the restricted quantifier collapse, which arises for structures M that have the collapse and also have quantifier elimination (that is, every FO(M) formula is equivalent to a quantifier-free one). In this case, if FOact(M′ , σ) refers to a definable predicate P ∈ Ω′ −Ω, we know that P is definable by a quantifier-free formula over M. Hence, using the definition of P, we obtain an equivalent FO(M, σ) formula. Thus, we have: Proposition 13.6. If M admits the restricted quantifier collapse (RQC) and has quantifier elimination, then FO(M, σ) = FOact(M, σ). (13.4) The condition in (13.4) is usually called the natural-active collapse, since the standard unrestricted interpretation of quantifiers is sometimes called the “natural interpretation”. Using RQC, or the natural-active collapse, eliminates quantification outside of the active domain. To reduce the expressiveness of FO(M, σ) to that of FOact(U<, σ), we would also like to eliminate all references to M functions and predicates, except possibly order. This, however, in general is impossible: how could one express a query like ∃x∈adom∃y∈adom E(x, y)∧x·y = x+1? To deal with this problem, we use the notion of genericity which comes from the classical relational database setting. Informally, it states the following: when one evaluates formulae on embedded finite models, exact values of elements in the active domain do not matter. For example, the answer to the query “Does a graph have diameter 2?” is the same for the graph {(1, 2), (1, 3), (1, 4)} and for the graph {(5, 10), (5, 15), (5, 20)}, which is obtained by the mapping 1 → 5, 2 → 10, 3 → 15, 4 → 20. In general, generic queries commute with permutations of the universe. Queries expressible in FO(M, σ) need not be generic: for example, the query given by ∃x∈adom∃y∈adom E(x, y)∧x·y = x+1 is true on E = {(1, 2)} but false on E = {(1, 3)}. However, as all queries definable in standard logics over finite structures are generic, to reduce questions about FO(M, σ) to those in ordinary finite model theory, it suffices to restrict one’s attention to generic queries. 256 13 Embedded Finite Models We now define genericity for queries (which map a finite σ-structure A to a finite subset of Am , m ≥ 0). Given a function π : U → U, we extend it to finite σ-structures A by replacing each occurrence of a ∈ adom(A) with π(a). Definition 13.7. 
• A query Q is generic if for every partial injective function π : U → U which is defined on adom(A), it is the case that Q(A) = Q(π(A)). • The class of generic queries definable in FO(M, σ) or FOact(M, σ) is denoted by FOgen (M, σ) or FOgen act (M, σ), respectively. While it is undecidable in general if an FO(M, σ) query is generic, most queries whose inexpressibility we want to prove are generic. Definition 13.8. We say that M admits the active-generic collapse, if FOgen act (M, σ) ⊆ FOact(U<, σ). Now using the different notions of collapse together, we come up with the following methodology of proving bounds on FO(M, σ). Proposition 13.9. Let M admit both the restricted-quantifier collapse (RQC) and the active-generic collapse. Then every generic query expressible in FO(M, σ) is also expressible in FOact(U<, σ). For example, it would follow from Theorem 3.6 that for M as in the proposition above, even is not expressible in FO(M, σ). Furthermore, for such M, every query in FOgen (M, σ) is Gaifman-local, by Proposition 13.9 and Theorem 5.8. Thus, our next goal is to see for what structures collapse results can be established. We start with the active-generic collapse, and prove, in the next section, that it holds for all structures. The situation with RQC is not nearly as simple. We shall see that it fails for N, +, · and Q, +, · , but we shall prove it for the ordered real field R, +, ·, < , 0, 1 . This structure motivated much of the initial work on embedded finite models due to its database applications; this will be explained in Sect. 13.6. More examples of RQC (or its failure) are given in the exercises. We shall also revisit the random graph of the previous chapter and relate queries over it to those definable in MSO. 13.3 Active-Generic Collapse Our goal is to prove the following result. Theorem 13.10. Every infinite structure M admits the active-generic col- lapse. 13.3 Active-Generic Collapse 257 We shall assume that M is ordered: that is, one of its predicates is < interpreted as a linear order on its universe U. If this were not the case, we could have expanded M to M< by adding a linear order. Since FO(M, σ) ⊆ FO(M<, σ), the active-generic collapse for M< would imply the collapse for M: FOgen act (M, σ) ⊆ FOgen act (M<, σ) ⊆ FOact(U<, σ). The idea behind the proof of Theorem 13.10 is as follows: we show that for each formula, its behavior on some infinite set is described by a first-order formula which only uses < and no other symbol from the vocabulary of M. This is called the Ramsey property. We then show how genericity and the Ramsey property imply the collapse. Definition 13.11. Let M = U, Ω be an ordered structure. We say that an FOact(M, σ) formula ϕ(x) has the Ramsey property if the following is true: Let X be an infinite subset of U. Then there exists an infinite set Y ⊆ X and an FOact(U<, σ) formula ψ(x) such that for every σstructure A with adom(A) ⊂ Y , and for every a over Y , it is the case that A |= ϕ(a) ↔ ψ(a). We now prove the Ramsey property for an arbitrary ordered M. The following simple lemma will often be used as a first step in proofs of collapse results. Before stating it, note that for an FO(M, σ) formula (x = y) can be viewed as both an atomic FO(σ) formula and an atomic FO(M) formula. We choose to view it as an atomic FO(M) formula; that is, atomic FO(σ) formulae are only those of the form R(· · · ) for R ∈ σ. Lemma 13.12. Let ϕ(x) be an FO(M, σ) formula. 
Then there exists an equivalent formula ψ(x) such that every atomic subformula of ψ is either an FO(σ) formula, or an FO(M) formula. Furthermore, it can be assumed that none of the free variables x occurs in an FO(σ)-atomic subformula of ψ(x). If ϕ is an FOact(M, σ) formula, then ψ is also an FOact(M, σ) formula. Proof. Introduce m fresh variables z1, . . . , zm, where m is the maximal arity of a relation in σ, and replace any atomic formula of the form R(t1(y), . . . , tl(y)), where l ≤ m and the ti’s are M-terms, by ∃z1 ∈adom . . . ∃zl ∈adom i(zi = ti(y))∧R(z1, . . . , zl). Similarly use existential quantifiers to eliminate the free x-variables from FO(σ)-atomic formulae. The key in the inductive proof of the Ramsey property is the case of FO(M) subformulae. For this, we first recall the infinite version of Ramsey’s theorem, in the form most convenient for our purposes. Theorem 13.13 (Ramsey). Given an infinite ordered set X, and any partition of the set of all ordered m-tuples x1, . . . , xm , x1 < . . . < xm, of elements of X into l classes A1, . . . , Al, there exists an infinite subset Y ⊆ X such that all ordered m-tuples of elements of Y belong to the same class Ai. 258 13 Embedded Finite Models The following is a standard model-theoretic result that we prove here for the sake of completeness. Lemma 13.14. Let ϕ(x) be an FO(M) formula. Then ϕ has the Ramsey property. Proof. Consider a (finite) enumeration of all the ways in which the variables x may appear in the order of U. For example, if x = (x1, . . . , x4), one possibility is x1 = x3, x2 = x4, and x1 < x2. Let P be such an arrangement, and ζ(P) a first-order formula that defines it (x1 = x3 ∧ x2 = x4 ∧ x1 < x2 in the above example). Note that there are finitely many such arrangements P; let P be the set of all of those. Each P induces an equivalence relation on x: for example, {(x1, x3), (x2, x4)} for P above. Let xP be a subtuple of x containing a representative for each class (e.g., (x1, x4)) and let ϕP (xP ) be obtained from ϕ by replacing all variables from an equivalence class by the chosen representative. Then ϕ(x) is equivalent to P ∈P ζ(P) ∧ ϕP (xP ). We now show the following. Let P′ ⊆ P and P0 ∈ P′ . Let X ⊆ U be an infinite set. Assume that ψ(x) is given by P ∈P′ ζ(P) ∧ ϕP (xP ). Then there exists an infinite set Y ⊆ X and a quantifier-free formula γP0 (x) of the vocabulary {<} such that ψ is equivalent to γP0 (x) ∨ P ∈P′−{P0} ζ(P) ∧ ϕP (xP ) for tuples x of elements of Y . To see this, suppose that P0 has m equivalence classes. Consider a partition of tuples of Xm ordered according to P0 into two classes: A1 of those tuples for which ϕP0 (xP0 ) is true, and A2 of those for which ϕP0 (xP0 ) is false. By Ramsey’s theorem, for some infinite set Y ⊆ X either all ordered tuples over Y m are in A1, or all are in A2. In the first case, ψ is equivalent to ζ(P0) ∨ P ∈P′−{P0} ζ(P) ∧ ϕP (xP ), and in the second case ψ is equivalent to P ∈P′−{P0} ζ(P) ∧ ϕP (xP ), proving the claim. The lemma now follows by applying this claim inductively to every partition P ∈ P, passing to smaller infinite sets, while getting rid of all the formulae containing symbols other than = and <. At the end we have an infinite set over which ϕ is equivalent to a quantifier-free formula in the vocabulary {<}. The next lemma lifts the Ramsey property from FO(M) formulae to arbitrary FOact(M, σ) formulae. 13.3 Active-Generic Collapse 259 Lemma 13.15. Every FOact(M, σ) formula has the Ramsey property. Proof. 
By Lemma 13.12, we assume that every atomic subformula is an FOact(σ) formula or an FO(M) formula. The base cases for the induction are the case of FOact(σ) formulae, where there is no need to change the formula or find a subset, and the case of FO(M) atomic formulae, which is given by Lemma 13.14. Let ϕ(x) ≡ ϕ1(x)∧ϕ2(x), where X ⊆ U is infinite. First, find ψ1, Y1 ⊆ X, such that for every A and a over Y1, it is the case that A |= ϕ1(a) ↔ ψ1(a). Next, by using the hypothesis for ϕ2 and Y1, find an infinite set Y2 ⊆ Y1 such that for every A and a over Y2, it is the case that A |= ϕ2(a) ↔ ψ2(a). Then take ψ ≡ ψ1 ∧ ψ2 and Y = Y2. The case of ϕ = ¬ϕ′ is trivial. For the existential case, let ϕ(x) ≡ ∃y∈adom ϕ1(y, x). By the hypothesis, find Y ⊆ X and ψ1(y, x) such that for every A and a over Y and every b ∈ Y we have A |= ϕ1(b, a) ↔ ψ1(b, a). Let ψ(x) ≡ ∃y ∈ adom ψ1(y, x). Then, for every A and a over Y , A |= ψ(a) iff A |= ψ1(b, a) for some b ∈ adom(A) iff A |= ϕ1(b, a) for some b ∈ adom(A) iff A |= ϕ1(a), thus finishing the proof. To finish the proof of Theorem 13.10, we have to show the following. Lemma 13.16. Assume that every FOact(M, σ) formula has the Ramsey property. Then M admits the active-generic collapse. Proof. Let Q be a generic query definable in FOact(M, σ). By the Ramsey property, we find an infinite X ⊆ U and an FOact(U<, σ)-definable Q′ that coincides with Q on X. We claim they coincide everywhere. Let A be a σstructure. Since X is infinite, there exists a partial monotone injective function π from adom(A) into X such that for every pair of elements a < a′ of adom(A), there exist x1, x2, x3 ∈ X − π(adom(A)) with the property that x1 < π(a) < x2 < π(a′ ) < x3. By the genericity of Q, we have π(Q(A)) = Q(π(A)). Thus, Q(π(A)) coincides with the restriction of Q′ (π(A)) to X. We now notice that Q′ does not extend its active domain. Indeed, if adom(Q′ (π(A))) contained an element b ∈ π(adom(A)), we could have replaced this element by b′ ∈ X −π(adom(A)) such that for every a ∈ π(adom(A)), a < b iff a < b′ . Since Q′ is FOact(U<, σ)definable, this would imply that b′ ∈ adom(Q′ (π(A))), which contradicts the fact that over X, the queries Q and Q′ coincide. Hence, π(Q(A)) = Q(π(A)) = Q′ (π(A)). Again, since Q′ is FOact(U<, σ)definable, it commutes with any monotone injective map, and thus Q′ (π(A)) = π(Q′ (A)). We have shown that π(Q(A)) = π(Q′ (A)), from which Q(A) = Q′ (A) follows. This completes the proof of Theorem 13.10. Thus, no matter what functions and predicates there are in M, FO cannot express more generic active-domain semantics queries over it than just FOact(U<, σ). In particular, we have the following. 260 13 Embedded Finite Models Corollary 13.17. Let M be an arbitrary structure. Then queries such as even, parity, majority, connectivity, transitive closure, and acyclicity are not definable in FOact(M, σ). 13.4 Restricted Quantifier Collapse One part of our program for establishing bounds on FO(M, σ) has been very successful: we prove the active-generic collapse for arbitrary structures. Can we hope to achieve the same success with the restricted-quantifier collapse (RQC)? The answer is clearly negative. Corollary 13.18. The restricted-quantifier collapse fails over N = N, +, · . Proof. By Corollary 13.17, parity is not definable in FOact(N, σ), but by Proposition 13.4, it is expressible in FO(N, σ). 
Furthermore, RQC fails over ⟨Q, +, ·⟩, since it is possible to define the natural numbers within this structure, and then emulate the proof of Proposition 13.4 to show that every computable query is expressible.

However, the situation becomes very different when we move to the real numbers. We shall consider the real ordered field: that is, the structure R = ⟨R, +, ·, <, 0, 1⟩. This is the structure that motivated much of the initial development in embedded finite models, due to its close connections with questions about the expressiveness of languages for geographical databases.

Consider the following FO(R, {E}) sentence, where E is a binary relation symbol:

∃u∃v ∀x∈adom ∀y∈adom (E(x, y) → y = u · x + v),   (13.5)

saying that all elements of E ⊂ R^2 lie on a line. Notice that it is essential that the first two quantifiers range over the entire set R. For example, if E is interpreted as {(2, 2), (3, 3), (4, 4)}, then the sentence (13.5) is true, and the witnesses for the existential quantifiers are u = 1 and v = 0. But neither 0 nor 1 is in the active domain of E.

Nevertheless, (13.5) can be expressed by an FOact(R, {E}) sentence. To see this, notice that E lies on a line iff every three points in E are collinear. This can be expressed as

∀x1∈adom ∀y1∈adom ∀x2∈adom ∀y2∈adom ∀x3∈adom ∀y3∈adom ((E(x1, y1) ∧ E(x2, y2) ∧ E(x3, y3)) → collinear(x, y)),   (13.6)

where collinear(x, y) is a formula, over R, stating that (x1, y1), (x2, y2), and (x3, y3) are collinear. It is easy to check that collinear(x, y) can be written as a quantifier-free formula (in fact, due to the quantifier elimination for the real field, every formula over R is equivalent to a quantifier-free formula, but the condition for collinearity can easily be expressed directly). Hence, (13.6) is an FOact(R, {E}) formula, equivalent to (13.5).

This example is an instance of a much more general result, stating that the real field R admits RQC. In fact, we show the natural-active collapse for R (since R has quantifier elimination). Moreover, the proof is constructive.

Theorem 13.19. The real field R = ⟨R, +, ·, <, 0, 1⟩ admits the restricted quantifier collapse. That is, for every FO(R, σ) formula ϕ(x), there is an equivalent FOact(R, σ) formula ϕact(x). Moreover, there is an algorithm that constructs ϕact from ϕ.

Proof. The proof of this result is by induction on the structure of the formula. We shall always assume, by Lemma 13.12, that all atomic FO(σ) formulae are of the form S(y), where y contains only variables. Thus, the base cases of the induction are as follows:
• ϕ(x) is S(x). In this case ϕact ≡ ϕ.
• ϕ(x) is an atomic FO(R) formula. Again, ϕact ≡ ϕ in this case.
The cases of Boolean operations are simple:
• If ϕ ≡ ψ ∨ χ, then ϕact ≡ ψact ∨ χact;
• if ϕ ≡ ¬ψ, then ϕact ≡ ¬ψact.

We now move to the case of an unrestricted existential quantifier. We shall first treat the case of σ-structures A with adom(A) ≠ ∅; at the end of the proof, we shall explain how to deal with empty structures. Suppose ϕ(x) ≡ ∃z β(x, z). By the induction hypothesis, β can be assumed to be of the form

β(x, z) ≡ Qy1∈adom . . . Qym∈adom BC(αi(x, y, z)),

where each Q is either ∃ or ∀, and:
1. BC(αi(x, y, z)) is a Boolean combination of atomic formulae α1, . . . , αs;
2. each FO(σ) atomic formula is of the form S(u), where u ⊆ y;
3. all atomic FO(R) formulae are of the form p(x, y, z) = 0 or p(x, y, z) > 0, where p is a polynomial; and
4.
n, m > 0, and at least one of the FO(R) atomic formulae involves a multivariate polynomial p(x, y, z) = yi − z for some yi. The reason for this is that, under the assumption adom(A) = ∅, we can always replace β by β ∧ ∃y∈adom (y − y = 0) ∧ (y − z = 0) ∨ ¬(y − z = 0) . Putting the resulting formula in the prenex normal form fulfills the conditions listed in this item. We now assume that αi(x, y, z), 1 ≤ i ≤ n, are FO(R) atomic formulae pi(x, y, z)  = > ff 0, and αi, n < i ≤ s, are FO(σ) atomic formulae. We let di be the degree, in z, of pi. For each a, b, by pa,b i (z) we denote the univariate polynomial pi(a, b, z). Note that the degree of pa,b i is at most di. We let d = maxi di. Whenever we refer to the jth root of a univariate polynomial p, we mean its jth real root in the usual ordering, if such a root exists, and 0 otherwise. Note that there exists an FO(R) formula rootj p(x) which holds iff x is the jth root of p. We now prove the following. Lemma 13.20. Let ϕ(x) be as above, where the assumptions 1–4 hold. Let A be such that adom(A) = ∅. Fix a tuple of real numbers a. Then (R, A) |= ϕ(a) iff there exist i, k ≤ n, and j, l ≤ d and two tuples b, c over adom(A) of length |y|, such that (R, A) |= β a, ra,b ij + ra,c kl 2 ∨ β a, ra,b ij + 1 ∨ β a, ra,b ij − 1 , where ra,b ij is the jth root of pa,b i and ra,c kl is the kth root of pa,c j . Proof of Lemma 13.20. One direction is trivial: if there is a witness of a given form, then there is a witness. For the other direction, assume that (R, A) |= ϕ(a). We then must show that there exists a0 ∈ R of the form ra,b ij +ra,c kl 2 or ra,b ij ± 1 such that (R, A) |= β(a, a0). Let b1, . . . , bM be the enumeration of all the tuples of length |y| consisting of elements of adom(A). Consider all univariate polynomials p a,bj i (z), and let rijk be the kth root of p a,bj i (z), for k ≤ d. Let S be the family of all elements of the form rijk, i ≤ n, j ≤ M, k ≤ d. It follows from our assumptions that S = ∅ and adom(A) ⊆ S, since one of the polynomials is yi − z. We let rmin and rmax be the minimum and the maximum elements of S, respectively. Suppose (R, A) |= β(a, a0). If a0 ∈ S, then there is a polynomial pi, a tuple b, and j ≤ d such that a0 = ra,b ij . By selecting c = b, k = i, l = j, we see that a0 is of the required form. 13.4 Restricted Quantifier Collapse 263 Assume a0 ∈ S. There are three possible cases: 1. a0 < rmin, or 2. a0 > rmax, or 3. there exist r1, r2 ∈ S such that r1 < a0 < r2, and there is no other r ∈ S with r1 < r < r2. We claim that for every pi and every bj: sign p a,bj i (a0) = sign p a,bj i (rmin − 1) in case 1 sign p a,bj i (a0)) = sign p a,bj i (rmax + 1) in case 2 (13.7) sign p a,bj i (a0)) = sign p a,bj i r1 + r2 2 in case 3. Indeed, in the third case, suppose sign p a,bj i (a0) = sign p a,bj i (r1+r2 2 ) . Then the interval [a0, r1+r2 2 ] contains a real root of p a,bj i (z), which then must be in S. We conclude that there is an element of S between r1 and r2, a contradiction. The other two cases are similar. Let a1 be (rmin − 1) for case 1, (rmax + 1) for case 2, and r1+r2 2 for case 3. Then for every tuple bj, j ≤ M, and every atomic formula αi, we have αi(a, bj, a0) ↔ αi(a, bj, a1). (13.8) This follows from (13.7) and the fact that FO(σ) atomic formulae may not contain variable z. We can now use (13.8) to conclude that β(a, a0) ↔ β(a, a1). Clearly, the equivalence (13.8) propagates through Boolean combinations of formulae. 
Furthermore, notice that if for a finite set A and m > 0, α(a, b, b, a0) ↔ α(a, b, b, a1) for every b ∈ A and every b ∈ Am , then (∃x ∈ A α(a, x, b, a0)) ↔ (∃x ∈ A α(a, x, b, a1)) for every b ∈ Am . This shows that (13.8) propagates through active-domain quantification, and hence β(a, a0) ↔ β(a, a1). Thus, if (R, A) |= β(a, a0), then (R, A) |= β(a, a1). Since a1 is of the right form (either r − 1, or r + 1 for r ∈ S, or r+r′ 2 for r, r′ ∈ S), this concludes the proof of the lemma. To conclude the proof of the theorem, we note that Lemma 13.20 can be translated into an FO definition as follows. For each FO(R) atomic formula α(x, y, z), and for any two tuples u, v of the same length as y, we define the following formulae: 264 13 Embedded Finite Models • α 1/2 ikjl(x, y, u, v), for i, k ≤ n, j, l ≤ d, says that α(x, y, z) holds when z is equal to rx,u ij +rx,v kl 2 . That is, ∃z∃z1∃z2 rootj [px,u i ](z1) ∧ rootl [px,v k ](z2) ∧ (2z = z1 + z2) ∧ α(x, y, z) . • α+ ij(x, y, u) for i ≤ n, j ≤ d, says that α(x, y, z) holds for z = rx,u ij + 1; that is, ∃z∃z1 rootj [px,u i ](z1) ∧ (z = z1 + 1) ∧ α(x, y, z) . • α− ij(x, y, u) for i ≤ n, j ≤ d, says that α(x, y, z) holds for z = rx,u ij − 1; the FO definition is similar to the one given above, except that we use a conjunct z = z1 − 1. Note that by quantifier elimination for R, we may assume that all formulae α 1/2 ikjl(x, y, u, v), α+ ij(x, y, u), and α− ij(x, y, u) are quantifier-free. For i, k ≤ n, and j, l ≤ d, let γ 1/2 ikjl(x, y, u, v) be the Boolean combination BC(αs) where each atomic FO(R) formula α is replaced by α 1/2 ikjl(x, y, u, v). Let β 1/2 ijkl(x, u, v) be Qy1 ∈adom . . . Qym ∈adom γ 1/2 ikjl(x, y, u, v). Likewise, we define γ+ ij (x, y, u) to be the Boolean combination BC(αs) where each atomic FO(R) formula α is replaced by α+ ij(x, y, u), and let β+ ij(x, u) be γ+ ij (x, y, u) preceded by the quantifier prefix of β. Finally, we define β− ij (x, u) as β+ ij(x, u), except by using formulae α− ij(x, y, u). Now Lemma 13.20 says that ∃z β(x, z) is equivalent to ∃u∈adom∃v∈adom i,k≤n j,l≤d β 1/2 ijkl(x, u, v) ∨ β+ ij (x, u) ∨ β− ij (x, u) , which is an FOact(R, σ) formula. This completes the proof of the translation for the case of structures A with adom(A) = ∅. To deal with empty structures A, consider a formula ϕ(x), and let ϕ′ (x) be an FO(R) formula obtained from ϕ(x) by replacing each atomic FO(σ) subformula by false. Note that if adom(A) = ∅, then (R, A) |= ϕ(a) iff R |= ϕ′ (a). By quantifier elimination, we may assume that ϕ′ is quantifierfree. Hence, ϕ is equivalent to ¬∃y∈adom(y = y) ∧ ϕ′ (x) ∨ ∃y∈adom(y = y) ∧ ϕact(x) , (13.9) where ϕact is constructed by the algorithm for the case of nonempty structures. Clearly, (13.9) will work for both empty and nonempty structures. Since (13.9) is an FOact(R, σ) formula, this completes the proof. 13.5 The Random Graph and Collapse to MSO 265 Corollary 13.21. Every generic query in FO(R, σ) is expressible in FOact( R, < , σ). In particular, every such query is local, and even is not expressible in FO(R, σ). What other structures have RQC? There are many known examples, some of them presented as exercises at the end of the chapter. It follows immediately from Theorem 13.19 that R, +, < has RQC. Another example is given by R, +, ·, ex , the expansion of the real field with the function x → ex . The field of complex numbers is known to have RQC, as well as several structures on finite strings. See Exercises 13.10 – 13.14. 
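Before moving on to the random graph, here is a small numerical illustration of the finite witness set used in Lemma 13.20, under simplifying assumptions: the active domain is a concrete finite set of reals, the quantifier-free matrix is built only from the polynomials listed in the code, and roots are computed in floating point with numpy, so this is an illustration of the idea rather than an exact decision procedure. The polynomials, the active domain, and all names are ours.

```python
# A numerical illustration of the finite witness set in Lemma 13.20, under the
# simplifying assumptions described in the lead-in above.
import numpy as np
from itertools import combinations

A = [-1.0, 0.5, 2.0]                       # active domain (made up)

# z-coefficients (highest degree first) of each polynomial, as a function of y.
poly_coeffs = [
    lambda y: [1.0, -y],                   # p1(y, z) = z - y  (cf. assumption 4)
    lambda y: [1.0, 0.0, -y],              # p2(y, z) = z^2 - y
]

def real_roots(coeffs):
    return [r.real for r in np.roots(coeffs) if abs(r.imag) < 1e-9]

def psi(y, z):                             # the matrix: "p2(y, z) > 0"
    return z * z - y > 0

# Candidates: all roots of the p_i(b, .) for b in A, all midpoints of pairs of
# such roots, and each root shifted by +1 or -1 (the finite set of Lemma 13.20).
roots = sorted({r for f in poly_coeffs for b in A for r in real_roots(f(b))})
candidates = set(roots)
candidates.update((r + s) / 2 for r, s in combinations(roots, 2))
candidates.update(r + 1 for r in roots)
candidates.update(r - 1 for r in roots)

# Deciding  "exists z, for all y in adom: psi(y, z)"  by testing only candidates:
witnesses = [z for z in candidates if all(psi(y, z) for y in A)]
print(bool(witnesses), sorted(witnesses)[:1])
```

The point is only that the unrestricted quantifier ∃z can be replaced by a search over finitely many candidates determined by the active domain; this is exactly what the formulae β^{1/2}, β^+, and β^- do symbolically in the proof above.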
13.5 The Random Graph and Collapse to MSO The real field is a structure with a decidable theory. So is the structure Z = Z, +, < , which also admits RQC (see Exercise 13.10). In fact both admit quantifier elimination: for Z, one has to add all the definable relations (x − y) mod k = 0, as well as constant 1. Could it be true that one can guarantee RQC for every structure M with decidable theory? We give a negative answer here, which establishes a different kind of collapse: of FO(M, σ) to MSO under the active-domain semantics. The structure is the random graph RG = U, E , introduced in Chap. 12. This is any undirected graph on a countably infinite set U that satisfies every sentence that is true in almost all finite undirected graphs. Recall that the set of all such sentences forms a complete theory with infinite models, and that this theory is decidable and ω-categorical. The random graph satisfies the extension axioms EAn,m (12.2), for each n ≥ m ≥ 0. These say that for every finite n-element subset S of U, and an m-element subset T of S, there exists z ∈ S such that (z, x) ∈ E for all x ∈ T , and (z, x) ∈ E for all x ∈ S − T . Recall that MSO (see Chap. 7), is a restriction of second-order logic in which second-order variables range over sets. We define MSOact(M, σ) as MSO over the vocabulary that consists of both Ω and σ, every first-order quantifier is an active-domain quantifier (i.e., ∃x∈adom or ∀x∈adom), and every MSO quantifier is restricted to the active domain. We write such MSO quantifiers as ∃X ⊆ adom or ∀X ⊆ adom. The semantics is as follows: (M, A) |= ∃X ⊆ adom ϕ(X, ·) if for some set C ⊆ adom(A), it is the case that (M, A) |= ϕ(C, ·). Theorem 13.22. For every σ, FO(RG, σ) = MSOact(RG, σ). Proof. The idea is to use the extension axioms to model MSO queries. Consider an MSOact formula ϕ(x) QX1 ⊆adom . . . QXm ⊆adom Qy1 ∈adom . . . Qyn ∈adom α(X, x, y), 266 13 Embedded Finite Models where the Xi’s are second-order variables, the yj’s are first-order variables, and α is a Boolean combination of σ- and RG-formulae in variables x, y, and formulae Xi(xj) and Xi(yj). Construct a new FO(RG, σ) formula ϕ′ (x) by replacing each QXi ⊆ adom with Qzi ∈ adom ∪ x (which is FO-definable), and changing every atomic subformula Xi(u) to E(zi, u). In other words, a subset Xi of the active domain is identified by an element zi from which there are edges to all elements of Xi, and no edges to the elements of the active domain which do not belong to Xi. It is then easy to see, from the extension axioms, that ϕ′ is equivalent to ϕ. Hence, MSOact(RG, σ) ⊆ FO(RG, σ). For the other direction, proceed by induction on the FO(RG, σ) formulae. The only nontrivial case is that of unrestricted existential quantification. Suppose we have an MSOact(RG, σ) formula ϕ(x, z) ≡ QX ⊆adom Qy∈adom α(X, x, y, z), where x = (x1, . . . , xn), and α again is a Boolean combination of atomic σand RG-formulae, as well as formulae Xi(u), where u is one of the first-order variables z, x, y. We want to find an MSOact formula equivalent to ∃z ϕ. Such a formula is a disjunction of the form ∃z ∈adom ϕ ∨ i ϕ(x, xi) ∨ ∃z ∈ adom ϕ. Both ∃z ∈ adom ϕ and ϕ(x, xi) are MSOact(RG, σ) formulae. To eliminate z from ∃z ∈ adom ϕ, all we have to know about z is its connections to x and to the active domain in the random graph; the former is taken care of by a disjunction listing all subsets of {1, . . . , n}, and the latter by a secondorder quantifier over the active domain. For I ⊆ {1, . . . 
, n}, let χI(x) be a quantifier-free formula saying that no xi, xj with i ∈ I, j ∈ I, could be equal. We introduce a new second-order variable Z and define an MSOact formula ψ(x) as ∃Z ⊆adom I⊆{1,...,n} χI(x) ∧ QX ⊆adom Qy∈adom αZ I (X, Z, x, y) , where αZ I (X, Z, x, y) is obtained from α by: 1. replacing each E(z, xi) by true for i ∈ I and false for i ∈ I, 2. replacing each E(z, yj) by Z(yj), and 3. replacing each Xi(z) by false. The extension axioms then ensure that ψ is equivalent to ∃z ∈ adom ϕ. The active-generic collapse, as it turns out, can be extended to MSO. Proposition 13.23. Every generic query in MSOact(RG, σ) is expressible in MSO over σ-structures. 13.6 An Application: Constraint Databases 267 Proof. First, we notice that there exists an infinite subset Z of RG such that for every pair a, b ∈ Z, there is no edge between a and b (such a subset is easy to construct using one of the concrete representations of the random graph). Next, we show by induction on the formulae that for every MSOact(RG, σ) formula ϕ(X, x) and every infinite set Z′ ⊆ Z, there is an infinite set Z′′ ⊆ Z and an MSO formula ϕ′ (X, x) of vocabulary σ such that for every σ-structure A, and an interpretation of x, X as c, C over adom(A), (RG, A) |= ϕ(C, c) ↔ ϕ′ (C, c). Indeed, atomic formulae E(x, y) can be replaced by false. The rest of the proof is exactly the same as the proof of Lemma 13.15: the active-domain MSO quantifiers are handled exactly as the active-domain FO quantifiers. Next, the same proof as in Lemma 13.16 shows that if ϕ defines a generic query, then it is equivalent to ϕ′ over all σ-structures. This proves the propo- sition. Corollary 13.24. The class of generic queries expressible in FO(RG, σ) is precisely the class of queries definable in MSO over σ-structures. Thus, RG provides an example of a structure with quantifier elimination and decidable first-order theory (see Exercise 12.8) that does not admit RQC, but at the same time, one can establish meaningful bounds on the expressiveness of queries. For example, each generic query in FO(RG, σ) can be evaluated in PH, and string languages definable in FO(RG, σ) are precisely the regular languages. 13.6 An Application: Constraint Databases The framework of constraint databases can be described formally as the logic FO(M, σ), where each m-relation S in σ is interpreted not as a finite set, but as a definable subset of Um . That is, there is a formula αS(x1, . . . , xm) of FO(M) such that S is the set {a | M |= αS(a)}. The main application of constraint databases is in querying spatial information. The key idea of constraint databases is that regions are represented by FO formulae over some underlying structure: typically either the real field R, or Rlin = R, +, −, 0, 1, < . That is, they are described by polynomial or linear constraints over the reals. To illustrate how linear constraints can be used to describe a specific spatial database, consider the following example, representing an approximate map of Belgium (a real map will have many more constraints, but the basic ideas are the same). Fig. 13.1 shows the map itself, while Fig. 13.2 shows how regions and cities are described by constraints. One can then use FO(R, σ) or FO(Rlin, σ) to query those databases as if they were usual relational databases that store infinitely many points. 
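To make this data model concrete, here is a small sketch in which spatial objects are represented exactly this way, by quantifier-free constraints, encoded simply as Python predicates over (x, y). The region and the "city" used here are made up for illustration; they are not the Belgium data of Fig. 13.2 below.

```python
# A toy constraint database: each spatial object is an infinite (or singleton)
# set of points in R^2 represented by a quantifier-free formula, encoded here
# as a Python predicate.  The data is made up for illustration.
def region(x, y):            # a triangle, given by three linear constraints
    return x >= 0 and y >= 0 and x + y <= 10

def city(x, y):              # a single point, also given by constraints
    return x == 4 and y == 3

# The query "points of the region strictly east of the city", in the spirit of
# (13.10):  region(x, y) AND exists u, v (city(u, v) AND x > u).  Since the
# city is the single point (4, 3), the existential quantifier can be
# eliminated by hand, and the answer is again given by constraints.
def answer(x, y):
    return region(x, y) and x > 4

for p in [(5.0, 1.0), (3.0, 3.0), (9.0, 2.0)]:
    print(p, answer(*p))     # (5,1): True; (3,3): False; (9,2): False
```

Note that the answer is itself represented by constraints, and hence is in general an infinite set of points; this closure property is what allows FO to serve as a query language over such databases.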
[Fig. 13.1. Spatial information map of Belgium: a schematic map on a coordinate grid showing the Flemish, Brussels, and Walloon regions and the cities of Antwerp, Bruges, Brussels, Hasselt, Liège, Charleroi, and Bastogne.]

Cities (Name, Geometry):
  Antwerp:    (x = 10) ∧ (y = 16)
  Bastogne:   (x = 19) ∧ (y = 6)
  Bruges:     (x = 5) ∧ (y = 16)
  Brussels:   (x = 10.5) ∧ (y = 12.5)
  Charleroi:  (x = 10) ∧ (y = 8)
  Hasselt:    (x = 16) ∧ (y = 14)
  Liège:      (x = 17) ∧ (y = 11)

Regions (Name, Geometry):
  Brussels:   (y ≤ 13) ∧ (x ≤ 11) ∧ (y ≥ 12) ∧ (x ≥ 10)
  Flanders:   (y ≤ 17) ∧ (5x − y ≤ 78) ∧ (x − 14y ≤ −150) ∧ (x + y ≥ 45) ∧ (3x − 4y ≥ −53) ∧ ¬((y ≤ 13) ∧ (x ≤ 11) ∧ (y ≥ 12) ∧ (x ≥ 10))
  Walloon:    ((x − 14y ≥ −150) ∧ (y ≤ 12) ∧ (19x + 7y ≤ 375) ∧ (x − 2y ≤ 15) ∧ (x ≥ 13) ∧ (5x + 4y ≥ 89)) ∨ ((3y − x ≥ 5) ∧ (x + y ≥ 45) ∧ (x − 14y ≥ −150) ∧ (x ≥ 13))

Fig. 13.2. A spatial database of Belgium

For example, to find all points in the Walloon region that are east of Hasselt, one would write

ϕ(x, y) = Walloon(x, y) ∧ ∃u, v (Hasselt(u, v) ∧ x > u).   (13.10)

To find all the points in the Walloon region that are on the direct line from Hasselt to Liège, one writes a formula ϕ(x, y) as the conjunction of Walloon(x, y) and

∃u, v, s, t, λ (Hasselt(u, v) ∧ Liège(s, t) ∧ 0 ≤ λ ∧ λ ≤ 1 ∧ x = λu + (1 − λ)s ∧ y = λv + (1 − λ)t).   (13.11)

In these examples, (13.10) is an FO(⟨R, <⟩, σ) query, while (13.11) needs to be expressed in the more expressive language FO(R, σ).

We now give one simple application of embedded finite models to constraint databases. A basic property of regions is their topological connectivity. Most regions represented in geographical databases are connected (and the few examples of disconnected ones tend to be rather well known, as they usually lead to nasty political problems). But can we test this property in FO-based query languages? We now give a simple proof of the negative answer, by reduction to collapse results.

Theorem 13.25. Topological connectivity is not expressible in FO(R, σ).

Proof. Assume, to the contrary, that topological connectivity of sets in R^3 is definable (one can show that connectivity of sets on the plane is undefinable as well; the proof involves a slightly more complicated reduction and is the subject of Exercise 13.5). We show that graph connectivity is then definable. Suppose we have a finite undirected graph G with adom(G) ⊂ R. For each edge (a, b) in G, we define the segment s(a, b) in R^3 between (a, 1, 0) and (0, 0, b). Each point in s(a, b) is of the form (λa, λ, (1 − λ)b) for some 0 ≤ λ ≤ 1. Note that this implies that s(a, b) ∩ s(c, d) ≠ ∅ can only happen if a = c or b = d, since (λa, λ, (1 − λ)b) = (µc, µ, (1 − µ)d) implies λ = µ, and thus for λ ≠ 0, 1 we have a = c and b = d, for λ = 0 we get b = d, and for λ = 1 we get a = c. Now we encode each edge (a, b) by the set e(a, b) = s(a, b) ∪ s(b, a) ∪ s(a, a) ∪ s(b, b) (see Fig. 13.3).

[Fig. 13.3. Embedding an edge (a, b) into R^3.]

Note that e(a, b) is a connected set, and that e(a, b) ∩ e(c, d) ≠ ∅ iff the edges (a, b) and (c, d) have a common node. We then define a new set XG in R^3 as XG = ∪_{(a,b)∈G} e(a, b). It follows that XG is topologically connected iff G is connected as a graph. Since the transformation G → XG is definable in FO(R, σ), the assumption
This contradiction proves the theorem. 13.7 Bibliographic Notes The framework of embedded finite models originated in database theory, in connection with attempts to understand query languages that use interpreted operations, as well as query languages for constraint databases. Constraint databases were introduced by Kanellakis, Kuper, and Revesz [142] (see also the surveys by Kuper, Libkin, and Paredaens [158], Libkin [168], and Van den Bussche [242]). Soon after [142] was published, it became clear that many questions about languages for constraint databases reduce to questions about embedded finite models. For example, Grumbach and Su [115] present many reductions to the finite case. Collapse results as a technique for proving bounds on FO(M, σ) were introduced by Paredaens, Van den Bussche, and Van Gucht [197], where the restrcited-quantifier collapse for Rlin was proved. The collapse for the real field was shown by Benedikt and Libkin [19] (in fact the proof in [19] applies to a larger class of o-minimal structures; see [243]). The active-generic collapse was shown by Otto and Van den Bussche [193]; the proof given here follows [19]. For the basics of Ramsey theory, see Graham, Rothschild, and Spencer [103]. The collapse to MSO over the random graph is from [168], although one direction was proved earlier by [193]. 13.8 Exercises 271 Inexpressibility of connectivity by reduction to the finite case was first shown in [115]; for a different approach that characterizes topological properties expressible in FO(R, {S}), where S is binary, see Kuijpers, Paredaens, and Van den Bussche [157]. For a study of these problems over complex numbers, we refer to Chapuis and Koiran [36]. See also Exercise 13.6. Although we said in the beginning of the chapter that no collapse results were proved with the help of Ehrenfeucht-Fra¨ıss´e games, results by Fournier [83] show how to use games to establish bounds on the quantifier rank for expresssing certain properties over embedded finite models. An example is presented in Exercise 13.8. In this chapter we used a number of well-known results in classical model theory, such as decidability and quantifier elimination for the real field R (see Tarski [229]) and undecidability of the FO theory of Q, +, · (see Robinson [206]). Sources for exercises: Exercise 13.4: Benedikt and Libkin [19] Exercise 13.6: Chapuis and Koiran [36] Exercise 13.7: Grumbach and Su [115] Exercise 13.8: Fournier [83] Exercise 13.9: Hull and Su [127] Exercise 13.10: Flum and Ziegler [82] (see also [168] for a self-contained proof) Exercise 13.11: Benedikt and Libkin [19] Exercise 13.12: Flum and Ziegler [82] Exercise 13.13: Barrington et al. [15] Exercises 13.14–13.16: Benedikt et al. [21] 13.8 Exercises Exercise 13.1. Give an example of a noncomputable query expressible in FO(N, σ). Exercise 13.2. Prove that it is undecidable if a query expressible in FO(M, σ) is generic (even if the theory of M is decidable). Exercise 13.3. Suppose that S is a binary relation symbol, and R is a ternary one, and both are interpreted as sets definable over the real field R = R, +, ·, 0, 1, < . Show how to express the following in FO(R, {S, R}): • S is a graph of a function f : R → R; • S is a graph of a continuous function f : R → R; • S is a graph of a differentiable function f : R → R; • R is a trajectory of an object: that is, a triple (x, y, t) ∈ R gives a position (x, y) at time t; • a formula ϕ(x, y, v) which holds iff v is the speed of the object at time t (assuming that R defines a trajectory). 
272 13 Embedded Finite Models Exercise 13.4. Prove a generalization of the Ramsey property (i.e., each activesemantics sentence expressing a generic query can be written using just the order relation) for SO, ∃SO, FO(Cnt), and a fixed point logic of your choice. Also prove that Lω ∞ω does not have such a generalized Ramsey property. Exercise 13.5. Use a reduction different from the one in the proof of Theorem 13.25 to show that topological connectivity of subsets of R2 is not definable in FO(R, {S}), where S is binary. Exercise 13.6. Prove that topological connectivity of subsets of C2 which are definable in C, +, −, ·, 0, 1 cannot be expressed in FO( C, +, −, ·, 0, 1 , {S}), where S is binary. Exercise 13.7. Prove that if S and S′ are interpreted as subsets of R2 definable in R, then none of the following is expressible in FO(R, {S, S′ }): • S contains at least one hole (assuming S is a closed set). • S has a Eulerian traversal. That is, if S is a union of line segments, then it has a traversal going through each line segment exactly once. • S and S′ are homeomorphic. Use reductions to the finite case for all three problems. Exercise 13.8. Show that in FO(R, σ) one can express even for sets of cardinality up to n using a sentence of quantifier rank O( √ log n). Exercise 13.9. Prove the natural-active collapse for U∅ = U, ∅ . Exercise 13.10. Prove the restricted quantifier collapse for Z, +, < . Exercise 13.11. An ordered structure M = U, Ω, < is called o-minimal if every definable subset of U is a finite union of points and open intervals (a, b), (−∞, a), (a, ∞). Prove the restricted quantifier collapse for an arbitrary o-minimal structure. Hint: you will need the following uniform bounds result of Pillay and Steinhorn [198]. If ϕ(x, y) is an FO(M) formula, then there exists a constant k such that for every b, the set {a | M |= ϕ(a, b)} is a union of fewer than k points and open intervals. One can use this result to infer that R, +, ·, ex admits the restricted quantifier collapse, since Wilkie [248] proved that it is o-minimal. Exercise 13.12. We say that a structure M has the finite cover property if there is a formula ϕ(x, y) such that for every n > 0, one can find tuples a1, . . . , an such that ∃x V j=i ϕ(x, aj) holds for each i ≤ n, but ∃x V j≤n ϕ(x, aj) does not hold. • Prove that if M does not have the finite cover property, then it admits the restricted quantifier collapse. • Conclude that C, +, · and N, succ admit the restricted quantifier collapse. 13.8 Exercises 273 Exercise 13.13. We say that a language L ⊆ Σ∗ has a neutral letter if there exists a ∈ Σ such that for every two strings s, s′ ∈ Σ∗ , we have s · s′ ∈ L iff s · a · s′ ∈ L. Now let Ω be a set of arithmetic predicates. We say that a language L is FO(Ω)definable if there is an FO sentence ΦL of vocabulary σΣ ∪ Ω such that MΩ s |= ΦL iff s ∈ L. Here MΩ s is the structure Ms expanded with the interpretation of Ωpredicates on its universe. The following statement is known as the Crane Beach conjecture for Ω: if L is FO(Ω)-definable and has a neutral letter, then it is star-free. • Use Exercise 13.10 to prove that the Crane Beach conjecture is true when Ω = {+++} (the graph of the addition operation). • Prove that the Crane Beach conjecture is false when Ω = {+++,×××} (hint: use Theorem 6.12). Exercise 13.14. Consider the structure Σ∗ , ≺, (fa)a∈Σ , where ≺ is the prefix relation, and fa : Σ∗ → Σ∗ is defined by fa(x) = x · a. Prove that this structure has the restricted quantifier collapse. 
Prove that it still has the restricted quantifier collapse when augmented with the following: • The predicate PL, for each regular language L, that is true of s iff s is in L. • The functions ga : Σ∗ → Σ∗ defined by ga(x) = a · x. Exercise 13.15. Suppose S is an infinite set, and C ⊆ 2S is a family of subsets of S. Let F ⊂ S be finite; we say that C shatters F if the collection {F ∩ C | C ∈ C} is ℘(F), the powerset of F. The Vapnik-Chervonenkis (VC) dimension of C is the maximal cardinality of a finite set shattered by C. If arbitrarily large finite sets are shattered by C, we let the VC dimension be ∞. If M is a structure and ϕ(x, y) is an FO(M) formula, with | x |= n, | y |= m, then for each a ∈ Un , we define ϕ(a, M) = {b ∈ Um | M |= ϕ(a, b)}, and let Fϕ(M) be {ϕ(a, M) | a ∈ Un }. Families of sets arising in such a way are called definable families. We say that M has finite VC dimension if every definable family in M has finite VC dimension. Prove that if M admits the restricted quantifier collapse, then it has finite VC dimension. Exercise 13.16. Consider an expansion M of Σ∗ , ≺, (fa)a∈Σ with the predicate el(x, y) which is true iff |x |=|y |. We have seen this structure in Chap. 7 (Exercise 7.20); it defines precisely the regular relations. Prove that FO(M, σ) cannot express even. Exercise 13.17.∗ For the structure M of Exercise 13.16, is FOgen act (M, σ) contained in FOact(U<, σ)? 14 Other Applications of Finite Model Theory In this final chapter, we briefly outline three different application areas of finite model theory. In mathematical logic, finite models are used as a tool for proving decidability results for satisfiability of FO sentences. In the area of temporal logics and verification, one analyzes the behavior of certain logics on some special finite structures (Kripke structures). And finally, it was recently discovered that many constraint satisfaction problems can be reduced to the existence of a homomorphism between two finite structures. 14.1 Finite Model Property and Decision Problems The classical decision problem in mathematical logic is the satisfiability problem for FO sentences: that is, Given a first-order sentence Φ, does it have a model? We know that in general, satisfiability is undecidable. However, a complete classification of decidable fragments in terms of quantifier-prefix classes exists. For the rest of the section, we assume that the vocabulary is purely relational. We have already seen classes of formulae defined by their quantifier prefixes in Sect. 12.4. For a regular expression r over the alphabet {∃, ∀}, we denote by FO(r) the set of all prenex sentences Q1x1 . . . Qnxn ϕ(x1, . . . , xn), where the string Q1 . . . Qn is in the language denoted by r. Here, each Qi is either ∃ or ∀, and ϕ is quantifier-free. It is known that there are precisely two maximal prefix classes for which the satisfiability problem is decidable: these are FO(∃∗ ∀∗ ) (known as the BernaysSch¨onfinkel class), and FO(∃∗ ∀∃∗ ) (known as the Ackermann class). The proof technique in both cases relies on the following property. 276 14 Other Applications of Finite Model Theory Definition 14.1. We say that a class K of sentences has the finite model property if for every sentence Φ in K, either Φ is unsatisfiable, or it has a finite model. In other words, in a class K that has the finite model property, every satisfiable sentence has a finite model. 
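The definition already suggests a very naive decision procedure: if every satisfiable sentence of the class has a model whose size is bounded by a computable function of the sentence, then satisfiability can be decided by enumerating all structures up to that bound and testing each one. The sketch below does this for a vocabulary with a single binary relation E and the sample sentence ∃x ∀y E(x, y); both the sentence and the size bound of 1 are our own illustration (the bound for the Bernays-Schönfinkel class is justified by Proposition 14.2 right below), and for simplicity we enumerate all labeled structures rather than only nonisomorphic ones.

```python
# A naive satisfiability check exploiting the finite model property: enumerate
# all structures (here: a single binary relation E) up to a size bound and
# test the sentence on each.  Sample sentence:  exists x, for all y: E(x, y).
from itertools import chain, combinations, product

def all_binary_relations(domain):
    pairs = list(product(domain, repeat=2))
    return chain.from_iterable(combinations(pairs, k) for k in range(len(pairs) + 1))

def models(domain, E):
    # (domain, E) |= exists x, for all y: E(x, y)
    return any(all((x, y) in E for y in domain) for x in domain)

def satisfiable(size_bound):
    for n in range(1, size_bound + 1):
        domain = list(range(n))
        for rel in all_binary_relations(domain):
            E = set(rel)
            if models(domain, E):
                return n, E
    return None

print(satisfiable(1))     # (1, {(0, 0)}): a one-element model with a self-loop
```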
It turns out that both FO(∃∗ ∀∗ ) and FO(∃∗ ∀∃∗ ) have the finite model property, and, furthermore, there is an upper bound on the size of a finite model of Φ in terms of Φ , the size of Φ. We prove this for the BernaysSch¨onfinkel class. Proposition 14.2. If Φ is a satisfiable sentence of FO(∃∗ ∀∗ ), then it has a model whose size is at most linear in Φ . Proof. Let Φ be ∃x1 . . . ∃xn∀y1 . . . ∀ym ϕ(x, y), where ϕ is quantifier-free. Let ψ(x) be ∀y ϕ(x, y). Since Φ is satisfiable, it has a model A. Let a1, . . . , an witness the existential quantifiers: that is, A |= ψ(a). Let A′ be the finite substructure of A whose universe is {a1, . . . , an}. Since ψ is a universal formula, it is preserved under taking substructures. Hence, A′ |= ψ(a), and therefore, A′ |= Φ. Thus, we have shown that Φ has a model whose universe has at most n elements. This immediately gives us the decision procedure for the class FO(∃∗ ∀∗ ): given a sentence Φ with n existential quantifiers, look at all nonisomorphic structures whose universes are of size up to n, and check if any of them is a model of Φ. This algorithm also suggests a complexity bound: one can guess a structure A with |A| ≤ n, and check if A |= Φ. Notice that in terms of Φ , the size of such a structure could be exponential. For each relation symbol R of arity m, there could be up to nm different tuples in RA . Since there is no a priori bound on the arity of R, it may well depend on Φ , which gives us an exponential upper bound on A . Hence, the algorithm runs in nondeterministic exponential time. It turns out that one cannot improve this bound. Theorem 14.3. The satisfiability problem for FO(∃∗ ∀∗ ) is Nexptime- complete. If we have a vocabulary of bounded arity (i.e., there is a constant k such that every relation symbol has arity at most k), then the size of a structure on n elements is at most polynomial in n. Thus, in this case one has to check if A |= ϕ, where A is polynomial in n. As we know from the results on the combined complexity of FO, this can be done in Pspace. Hence, for a vocabulary of bounded arity, the satisfiability problem for FO(∃∗ ∀∗ ) is in Pspace. 14.1 Finite Model Property and Decision Problems 277 We now see an application of this decidability result in database theory. In Chap. 6, we studied conjunctive queries: those of the form ∃xϕ, where ϕ is a conjunction of atomic formulae. We also saw (Exercise 6.19) that containment of conjunctive queries is NP-complete. Another class of queries often used in database theory is unions of conjunctive queries; that is, queries of the form Q1 ∪ . . . ∪ Qm, where each Qi is a conjunctive query. Can the decidability of containment be extended to union of conjunctive queries? That is, is it decidable whether Q(A) ⊆ Q′ (A) for all A, when Q and Q′ are unions of conjunctive queries? We now give the positive answer using the decidability of the Bernays-Sch¨onfinkel class. Putting all existential quantifiers in front, we can assume without loss of generality that Q is given by ϕ(x) ≡ ∃y α(x, y), and Q′ by ψ(x) ≡ ∃y β(x, y), where α and β are monotone Boolean combinations of atomic formulae. Our goal is to check whether Φ ≡ ∀x (ϕ(x) → ψ(x)) is a valid sentence. Assuming that y and z are distinct variables, we can rewrite Φ as ∀x ∀y ∃z ¬α(x, y) ∨ β(x, z) . We know that Φ is valid iff ¬Φ is not satisfiable. But ¬Φ is equivalent to ∃x ∃y ∀z α ∧ ¬β ; that is, to an FO(∃∗ ∀∗ ) sentence. This gives us the fol- lowing. Proposition 14.4. Fix a relational vocabulary σ. 
Let Q and Q′ be unions of conjunctive queries over σ. Then testing whether Q ⊆ Q′ is decidable in Pspace. The complexity bound given by the reduction to the Bernays-Sch¨onfinkel class is not the optimal one, but it is not very far off: for a fixed vocabulary σ, the complexity of containment of unions of conjunctive queries is known to be Πp 2 -complete. We now move to the Ackermann class FO(∃∗ ∀∃∗ ). Again, we have the finite model property. Theorem 14.5. Let Φ be an FO(∃∗ ∀∃∗ ) sentence. If Φ is satisfiable, then it has a model whose size is at most exponential in Φ . Even though the size of the finite model jumps from linear to exponential, the complexity of the decision problem does not get worse, and in fact in some cases the problem becomes easier. Theorem 14.6. The satisfiability problem for FO(∃∗ ∀∃∗ ) is Nexptimecomplete. Furthermore, when restricted to sentences that do not mention equality, the problem becomes Exptime-complete. Finally, we consider finite variable restrictions of FO. Recall that FOk refers to the fragment of FO that consists of formulae in which at most k distinct variables are used. 278 14 Other Applications of Finite Model Theory green ¬red ¬yellow ¬red ¬green yellow red ¬green ¬yellow Fig. 14.1. An example of a Kripke structure Theorem 14.7. FO2 has the finite model property: each satisfiable FO2 sentence has a finite model whose size is at most exponential in Φ . Furthermore, the satisfiability problem for FO2 is Nexptime-complete. The satisfiability problem for FOk , k > 2, is undecidable. 14.2 Temporal and Modal Logics In this section, we look at logics that are used in verifying temporal properties of reactive systems. The finite structure in this case is usually a transition system, or a Kripke structure. It can be viewed as a labeled directed graph, where the nodes describe possible states the system could be in, and the edges indicate when a transition from one state to another is possible. To describe possible states of the system, one uses a collection of propositional variables, and specifies which of them are true in a given state. An example of a Kripke structure is given in Fig. 14.1. We have three propositional variables, red, green, and yellow. The states are those in which only one variable is true, and the other two are false. As expected, from a red light one can go to green, from green to yellow, and from yellow to red, and the system can stay in any of these states. Sometimes edges of Kripke structures are labeled too, but since it is easy to push those labels back into the states, we shall assume that edges are not labeled. Thus, formally, a Kripke structure, for a finite alphabet Σ, is a finite structure K = S, E, (Pa)a∈Σ , where S is the set of states, E is a binary relation on S, and for each a ∈ Σ, Pa is a unary relation on S, i.e., a subset of S. Since assigning relations Pa can be viewed as labeling states with letters from Σ, we shall also refer to the labeling function λ : S → 2Σ , given by 14.2 Temporal and Modal Logics 279 λ(s) = {a ∈ Σ | s ∈ Pa}. We now define the simplest of the logics we deal with in this section: the propositional modal logic, ML. Its formulae are given by the following grammar: ϕ, ψ ::= a (a ∈ Σ) | ϕ ∧ ψ | ¬ϕ | ϕ | ♦ϕ. (14.1) The semantics of ML formulae is given with respect to a Kripke structure K and a state s. That is, each formula defines a set of states where it holds. 
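Concretely, the traffic-light structure of Fig. 14.1 can be written down as a labeled directed graph in a few lines; the state names used below are our own abbreviations.

```python
# The Kripke structure of Fig. 14.1 as a labeled directed graph: a cyclic
# traffic light with a self-loop on every state.
S = {"r", "g", "y"}                                    # states
E = {("r", "g"), ("g", "y"), ("y", "r"),               # red -> green -> yellow -> red
     ("r", "r"), ("g", "g"), ("y", "y")}               # the system may stay in any state
labels = {"r": {"red"}, "g": {"green"}, "y": {"yellow"}}   # the labeling from states to 2^Sigma

def post(s):
    """Successor states: those reachable in one transition."""
    return {t for (u, t) in E if u == s}

print(post("y"))   # {'y', 'r'}: from yellow the light stays yellow or turns red
```

The modal and temporal operators introduced next are evaluated by inspecting such successor sets.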
The formal definition of the semantics is as follows: • (K, s) |= a, a ∈ Σ iff a ∈ λ(s); • (K, s) |= ϕ ∧ ψ iff (K, s) |= ϕ and (K, s) |= ψ; • (K, s) |= ¬ϕ iff (K, s) |= ϕ; • (K, s) |= ϕ iff (K, s′ ) |= ϕ for all s′ such that (s, s′ ) ∈ E; • (K, s) |= ♦ϕ iff (K, s′ ) |= ϕ for some s′ such that (s, s′ ) ∈ E. Thus, is the “for all” modality, and ♦ is the “there exists” modality: ϕ (♦ϕ) means that ϕ holds in every (in some) state to which there is an edge from the current state. Notice also that ♦ is superfluous since ♦ϕ is equivalent to ¬ ¬ϕ. ML can be translated into FO as follows. For each ML formula ϕ, we define an FO formula ϕ◦ (x) such that (K, s) |= ϕ iff K |= ϕ◦ (s). This is done as follows: • a◦ ≡ Pa(x); • (ϕ ∧ ψ)◦ ≡ ϕ◦ ∧ ψ◦ ; • (¬ϕ)◦ ≡ ¬ϕ◦ ; • ( ϕ)◦ ≡ ∀y R(x, y) → ∀x x = y → ϕ◦ (x) . For the translation of ϕ, we employed the technique of reusing variables that was central in Chapter 11. Thus, ϕ◦ is always an FO2 formula, as it uses only two variables: x and y. Summing up, we obtained the following. Proposition 14.8. Every formula of the propositional modal logic ML is equivalent to an FO2 formula. Consequently, every satisfiable formula ϕ of ML has a model which is at most exponential in ϕ . The expressiveness of ML is rather limited; in particular, since it is a fragment of FO, it cannot express reachability properties which are of utmost importance in verifying properties of finite-state systems. We thus move to more expressive logics, LTL and CTL. 280 14 Other Applications of Finite Model Theory The formulae of the linear time temporal logic, LTL, are given by the following grammar: ϕ, ϕ′ ::= a (a ∈ Σ) | ¬ϕ | ϕ ∧ ϕ′ | Xϕ | ϕUϕ′ . (14.2) The formulae of the computation tree logic, CTL, are given by ϕ, ϕ′ ::= a (a ∈ Σ) | ¬ϕ | ϕ ∧ ϕ′ | EXϕ | AXϕ | E(ϕUϕ′ ) | A(ϕUϕ′ ). (14.3) In both of these logics, we talk about properties of paths in the Kripke structure. A path in K is an infinite sequence of nodes π = s1s2 . . . such that (si, si+1) ∈ E for all i. Of course, in a finite structure, some of the nodes must occur infinitely often on a path. The connective X means “next time”, or “for the next node on the path”. The connective U is “until”: ϕ holds until some point where ϕ′ holds. E is the existential quantifier “there is a path”, and A is the universal quantifier: “for all paths”. To give the formal semantics, we introduce a logic that subsumes both LTL and CTL. This logic, denoted by CTL∗ , has two kinds of formulae: state formulae denoted by ϕ, and path formulae denoted by ψ. These are given by the following two grammars: ϕ, ϕ′ ::= a (a ∈ Σ) | ¬ϕ | ϕ ∧ ϕ′ | Eψ | Aψ ψ, ψ′ ::= ϕ | ¬ψ | ψ ∧ ψ′ | Xψ | ψUψ′ . (14.4) The semantics of a state formula is again given with respect to a Kripke structure K and a state s. The semantics of a path formula ψ is given with respect to K and a path π in K. If π = s1s2s3 . . ., we shall write πk for the path starting at sk; that is, sksk+1 . . .. Formally, we define the semantics as follows: • (K, s) |= a, a ∈ Σ iff a ∈ λ(s); • (K, s) |= ϕ ∧ ϕ′ iff (K, s) |= ϕ and (K, s) |= ϕ′ ; • (K, s) |= ¬ϕ iff (K, s) |= ϕ; • (K, s) |= Eψ iff there is a path π = s1s2 . . . such that s1 = s and (K, π) |= ψ; • (K, s) |= Aψ iff for every path π = s1s2 . . . such that s1 = s, we have (K, π) |= ψ; • if ϕ is a state formula, and π = s1s2 . . 
The expressiveness of ML is rather limited; in particular, since it is a fragment of FO, it cannot express reachability properties, which are of utmost importance in verifying properties of finite-state systems. We thus move to more expressive logics, LTL and CTL.

The formulae of the linear time temporal logic, LTL, are given by the following grammar:

ϕ, ϕ′ ::= a (a ∈ Σ) | ¬ϕ | ϕ ∧ ϕ′ | Xϕ | ϕUϕ′.   (14.2)

The formulae of the computation tree logic, CTL, are given by

ϕ, ϕ′ ::= a (a ∈ Σ) | ¬ϕ | ϕ ∧ ϕ′ | EXϕ | AXϕ | E(ϕUϕ′) | A(ϕUϕ′).   (14.3)

In both of these logics, we talk about properties of paths in the Kripke structure. A path in K is an infinite sequence of nodes π = s_1 s_2 . . . such that (s_i, s_{i+1}) ∈ E for all i. Of course, in a finite structure, some of the nodes must occur infinitely often on a path. The connective X means "next time", or "for the next node on the path". The connective U is "until": ϕ holds until some point where ϕ′ holds. E is the existential quantifier "there is a path", and A is the universal quantifier "for all paths".

To give the formal semantics, we introduce a logic that subsumes both LTL and CTL. This logic, denoted by CTL*, has two kinds of formulae: state formulae, denoted by ϕ, and path formulae, denoted by ψ. These are given by the following two grammars:

ϕ, ϕ′ ::= a (a ∈ Σ) | ¬ϕ | ϕ ∧ ϕ′ | Eψ | Aψ
ψ, ψ′ ::= ϕ | ¬ψ | ψ ∧ ψ′ | Xψ | ψUψ′.   (14.4)

The semantics of a state formula is again given with respect to a Kripke structure K and a state s. The semantics of a path formula ψ is given with respect to K and a path π in K. If π = s_1 s_2 s_3 . . ., we shall write π^k for the path starting at s_k, that is, s_k s_{k+1} . . .. Formally, we define the semantics as follows:
• (K, s) |= a, a ∈ Σ, iff a ∈ λ(s);
• (K, s) |= ϕ ∧ ϕ′ iff (K, s) |= ϕ and (K, s) |= ϕ′;
• (K, s) |= ¬ϕ iff (K, s) ⊭ ϕ;
• (K, s) |= Eψ iff there is a path π = s_1 s_2 . . . such that s_1 = s and (K, π) |= ψ;
• (K, s) |= Aψ iff for every path π = s_1 s_2 . . . such that s_1 = s, we have (K, π) |= ψ;
• if ϕ is a state formula, and π = s_1 s_2 . . ., then (K, π) |= ϕ iff (K, s_1) |= ϕ;
• (K, π) |= ψ ∧ ψ′ iff (K, π) |= ψ and (K, π) |= ψ′;
• (K, π) |= ¬ψ iff (K, π) ⊭ ψ;
• (K, π) |= Xψ iff (K, π^2) |= ψ;
• (K, π) |= ψUψ′ iff there exists k ≥ 1 such that (K, π^k) |= ψ′ and (K, π^i) |= ψ for all i < k.

Note that LTL formulae are path formulae, and CTL formulae are state formulae. LTL formulae are typically evaluated along a single infinite path (hence the name linear temporal logic). On the other hand, CTL is well suited to describe branching processes (hence the name computation tree logic). If we want to talk about an LTL formula ψ being true in a given state of a Kripke structure, we shall mean that the formula Aψ is true in that state.

Some derived formulae are often useful in describing temporal properties. For example, Fψ ≡ true U ψ means "eventually", or sometime in the future, ψ holds, and Gψ ≡ ¬F¬ψ means "always", or "globally", ψ holds (true itself can be assumed to be a formula in any of the logics: for example, a ∨ ¬a). Thus, AGψ means that ψ holds along every path starting from a given state, and EFψ means that along some path, ψ eventually holds.

For the example in Fig. 14.1, consider the CTL formula AG(yellow → AF green), saying that if the light is yellow, it will eventually become green. This formula is actually false in the structure shown in Fig. 14.1, since yellow can continue to hold indefinitely due to the loop. However, AG(yellow → (AG yellow ∨ AF green)), saying that either yellow holds forever or eventually changes to green, is true in that structure.

The main difference between CTL and LTL is that CTL is better suited for talking about branching paths that start in a given node (this is the reason logics like CTL are sometimes referred to as branching-time logics), while LTL is better suited for talking about properties of a single path starting in a given node (and thus one speaks of a linear-time logic). For example, consider the CTL formula AG(EFa). It says that along every path from a given node, from every node there is a path that leads to a state labeled a. It is known that this formula is not expressible in LTL. The formula A(FGa), saying that on every path, starting from some node, a will hold forever, is a state formula obtained by applying the A quantifier to the LTL formula FGa; this formula is not expressible in CTL.

While all the examples seen so far could have been specified in other logics used in this book – for example, MSO or LFP – the main advantage of these temporal logics is that the model-checking problem for them can be solved efficiently. The model-checking problem is to determine whether (K, s) |= ϕ, for a Kripke structure K, a state s, and a formula ϕ. The data complexity for CTL* and its sublogics can easily be seen to be polynomial (since CTL* formulae can be expressed in LFP), but it turns out that the situation is much better than this.

Theorem 14.9. The model-checking problem for ML, LTL, CTL, and CTL* is fixed-parameter linear. For the logics ML and CTL it can be solved in time O(‖ϕ‖ · ‖K‖), and for LTL and CTL*, the bound is 2^{O(‖ϕ‖)} · ‖K‖.

We illustrate the idea of the proof for the case of ML. Suppose we have a formula ϕ and a Kripke structure K. Consider all the subformulae ϕ_1, . . . , ϕ_k of ϕ listed in an order that ensures that if ϕ_j is a subformula of ϕ_i, then j < i. The algorithm then inductively labels each state s of K with either ϕ_i or ¬ϕ_i, depending on which formula holds in that state. For the base case, there is nothing to do, since the states are already labeled with either a or ¬a for each a ∈ Σ. For the induction, the only nontrivial case is when ϕ_i ≡ □ϕ_j for some j < i. Then for each state s, we check all the states s′ with (s, s′) ∈ E, and see if all such s′ have been labeled with ϕ_j in the jth step: if so, we label s by ϕ_i; if not, we label it by ¬ϕ_i. This algorithm can be implemented in time O(‖ϕ‖ · ‖K‖).
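The following Python sketch implements the labeling algorithm just described, for the tuple encoding of ML formulae used in the earlier sketch; it reuses the dictionary representation and the successors helper from above, and it is not tuned to meet the stated O(‖ϕ‖ · ‖K‖) bound, only to follow the same bottom-up strategy.

def subformulae(phi):
    # List subformulae so that every formula comes after its subformulae.
    subs = []
    for child in phi[1:]:
        if isinstance(child, tuple):
            subs.extend(subformulae(child))
    subs.append(phi)
    return subs

def check_ml(K, phi):
    # Returns the set of states of K in which the ML formula phi holds.
    sat = {}
    for f in subformulae(phi):
        kind = f[0]
        if kind == "prop":
            sat[f] = {s for s in K["states"] if f[1] in K["labels"][s]}
        elif kind == "not":
            sat[f] = K["states"] - sat[f[1]]
        elif kind == "and":
            sat[f] = sat[f[1]] & sat[f[2]]
        elif kind == "box":      # every successor satisfies the subformula
            sat[f] = {s for s in K["states"]
                      if successors(K, s) <= sat[f[1]]}
        elif kind == "diamond":  # some successor satisfies the subformula
            sat[f] = {s for s in K["states"]
                      if successors(K, s) & sat[f[1]]}
    return sat[phi]

# Example: ("diamond", ("prop", "green")) holds in the states "r" and "g"
# of TRAFFIC_LIGHT, since each of them has a successor labeled green.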
Next, we look at the connection between temporal and modal logics and other logics for finite structures we have seen. We already mentioned that ML can be embedded into FO^2. What about LTL? We can answer this question for a simple kind of Kripke structures used in Chap. 7: these are structures of the vocabulary σ_Σ = (<, (P_a)_{a∈Σ}), used to represent strings.

Theorem 14.10. Over finite strings viewed as structures of vocabulary σ_Σ, LTL and FO are equally expressive: LTL = FO.

Interestingly, Theorem 14.10 holds for ω-strings as well, but this is outside the scope of this book.

For CTL, one needs to talk about different paths, and hence one should be able to express reachability properties such as "can a state labeled a be reached from a state labeled b?". This suggests a close connection between CTL and logics that can express the transitive closure operator. We illustrate this by means of the following example. Consider the CTL formula AFa, stating that along every path, a eventually holds. We now express this in a variant of Datalog. Let (Π_1, R) be the following Datalog¬ program:

R(x, y) :– ¬P_a(x), E(x, y)
R(x, y) :– ¬P_a(z), R(x, z), E(z, y)

This program computes a subset of the transitive closure: the set of pairs (b, b′) for which there is a path b = b_1, b_2, . . . , b_{n−1}, b_n = b′ such that none of the b_i's, i < n, is labeled a. Next, we define a program (Π_2, U) that uses R as an extensional predicate:

U(x) :– R(x, x)
U(x) :– ¬P_a(x), E(x, y), U(y)

Suppose we have an infinite path over K. Since K is finite, it must have a loop. If there is a loop such that R(x, x) holds, then there is an infinite path from x such that ¬a holds along this path. If we have any other path such that ¬a holds along it, then it starts with a few edges and eventually enters a loop in which no node is labeled a. Hence, U is the set of nodes from which there is an infinite path on which ¬a holds. Thus, taking the program (Π_3, Q) given by

Q(x) :– ¬U(x)

we get a program that computes AFa. Notice that this program is stratified (for each stratum, the negated predicates are those defined in the previous strata) and linear (each intensional predicate appears at most once in the right-hand sides of rules).
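A naive bottom-up evaluation of the three strata makes this computation of AFa concrete. The Python sketch below is illustrative (the function name afa and the reuse of the TRAFFIC_LIGHT encoding are mine, not the book's): it computes R, then U, and returns the complement, i.e., the states satisfying AFa.

def afa(K, a):
    states, edges = K["states"], K["edges"]
    not_a = {s for s in states if a not in K["labels"][s]}

    # Stratum 1: R(x, y) -- a path from x to y avoiding a everywhere except y.
    R = set()
    while True:
        new = {(x, y) for (x, y) in edges if x in not_a}
        new |= {(x, y) for (x, z) in R if z in not_a
                       for (z2, y) in edges if z2 == z}
        if new <= R:
            break
        R |= new

    # Stratum 2: U(x) -- there is an infinite path from x avoiding a.
    U = {x for (x, y) in R if x == y}
    while True:
        new = {x for (x, y) in edges if x in not_a and y in U}
        if new <= U:
            break
        U |= new

    # Stratum 3: Q(x) :- not U(x), i.e., the states satisfying AFa.
    return states - U

# In the traffic-light structure, AF green holds only in the green state:
# from "r" and "y" the light may loop forever without turning green.
assert afa(TRAFFIC_LIGHT, "green") == {"g"}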
The above translation techniques can be extended to prove the following.

Theorem 14.11. CTL formulae can be expressed in either of the following:
• linear stratified Datalog¬;
• the transitive closure logic TrCl.

Next, we define a fixed point modal logic, called the µ-calculus and denoted by Calc_µ, that subsumes LTL, CTL, and CTL*. Consider the propositional modal logic ML, and extend its syntax with propositional variables x, y, . . ., viewed as monadic second-order variables (i.e., each such variable denotes a set of states). Now formulae have free variables. Suppose we have a formula ϕ(x, ȳ) in which x occurs positively. Then µx.ϕ(x, ȳ) is a formula with free variables ȳ.

To define the semantics of ψ(ȳ) ≡ µx.ϕ(x, ȳ) on a Kripke structure K, assume that each y_i from ȳ is interpreted as a propositional variable, that is, as a subset Y_i of S consisting of the nodes where it holds. Then ϕ(x, Ȳ) defines an operator F_ϕ^Ȳ : 2^S → 2^S given by

F_ϕ^Ȳ(X) = {s ∈ S | (K, s) |= ϕ(X, Ȳ)}.

If x occurs positively, then this operator is monotone. We define the semantics of the µ operator by

(K, s) |= µx.ϕ(x, Ȳ) ⇔ s ∈ lfp(F_ϕ^Ȳ).

Consider, for example, the formula µx.(a ∨ □x). This formula is true in (K, s) if along each path starting in s, a will eventually become true. Hence, this is the CTL formula AFa. In general, every CTL* formula can be expressed in Calc_µ.

Each Calc_µ formula ϕ can be translated into an LFP formula ϕ°(x) such that (K, s) |= ϕ iff K |= ϕ°(s). Furthermore, one can show that Calc_µ formulae can be translated into MSO formulae as well. Summing up, we have the following relationship between the temporal logics:

ML, LTL, CTL ⊆ CTL* ⊆ Calc_µ ⊆ LFP, MSO.

Fig. 14.2. Bisimulation equivalence (two structures K_1 and K_2, all of whose nodes are labeled a)

In the µ-calculus, it is common to use both least and greatest fixed points. The latter are definable by νx.ϕ(x) ≡ ¬µx.¬ϕ(¬x), assuming that x occurs positively in ϕ. Notice that negating both ϕ and each occurrence of x in it ensures that if x occurs positively in ϕ, then it occurs positively in ¬ϕ(¬x), and hence the least fixed point is well defined. Using the greatest and the least fixed points, the formulae of the µ-calculus can be written in a normal form in which negations are applied only to propositions. We shall denote the fragment of the µ-calculus that consists of such formulae with alternation depth at most k by Calc^k_µ.

Theorem 14.12. The complexity of the model-checking problem for Calc^k_µ is O(‖ϕ‖ · ‖K‖^k).

Since Calc_µ can be embedded into LFP, its data complexity is polynomial. The combined complexity is known to be in NP ∩ coNP. Furthermore, Calc_µ has the finite model property: if ϕ is a satisfiable formula of Calc_µ, then there is a Kripke structure K of size at most exponential in ‖ϕ‖ such that (K, s) |= ϕ for some s ∈ S.

Finally, we present another way to connect temporal logics with other logics seen in this book. Since logics like Calc_µ talk about temporal properties of paths, they cannot distinguish structures in which all paths agree on all temporal properties, even if the structures themselves are different. For example, consider the structures K_1 and K_2 shown in Fig. 14.2. Even though they are different, all the paths realized in these structures are the same: an infinite path on which every node is labeled a. Calc_µ cannot see the difference between them, although these structures are easily distinguished by the FO sentence "there is a node with two distinct successors".

One can formally capture this notion of indistinguishability using the definition of bisimilarity. Let K = ⟨S, E, (P_a)_{a∈Σ}⟩ and K′ = ⟨S′, E′, (P′_a)_{a∈Σ}⟩. We say that (K, s) and (K′, s′) are bisimilar if there is a binary relation R ⊆ S × S′ such that
• (s, s′) ∈ R;
• if (u, u′) ∈ R, then P_a(u) iff P′_a(u′), for all a ∈ Σ;
• if (u, u′) ∈ R and (u, v) ∈ E, then there is v′ ∈ S′ such that (v, v′) ∈ R and (u′, v′) ∈ E′;
• if (u, u′) ∈ R and (u′, v′) ∈ E′, then there is v ∈ S such that (v, v′) ∈ R and (u, v) ∈ E.
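The largest bisimulation between two finite Kripke structures can be computed by starting from all pairs of states with the same labels and repeatedly discarding pairs that violate the last two conditions. The following Python sketch is my own illustration (it assumes the same dictionary representation and successors helper as before), not an algorithm given in the text.

def bisimilar(K, s, K2, s2):
    # Start with all label-respecting pairs and refine.
    R = {(u, u2) for u in K["states"] for u2 in K2["states"]
         if K["labels"][u] == K2["labels"][u2]}
    changed = True
    while changed:
        changed = False
        for (u, u2) in set(R):
            forth = all(any((v, v2) in R for v2 in successors(K2, u2))
                        for v in successors(K, u))
            back = all(any((v, v2) in R for v in successors(K, u))
                       for v2 in successors(K2, u2))
            if not (forth and back):
                R.discard((u, u2))
                changed = True
    return (s, s2) in R

# The structures K1 and K2 of Fig. 14.2, encoded in this format, would be
# accepted as bisimilar from their initial states by this test.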
A property of Kripke structures is bisimulation-invariant if whenever it holds in (K, s), it also holds in every (K′, s′) which is bisimilar to (K, s). As we have seen, even FO can express properties which are not bisimulation-invariant, but Calc_µ and its sublogics only express bisimulation-invariant properties. The following result shows how to use bisimulation-invariance to relate temporal logics and other logics seen in this book.

Theorem 14.13.
• The class of bisimulation-invariant properties expressible in FO is precisely the class of properties expressible in ML.
• The class of bisimulation-invariant properties expressible in MSO is precisely the class of properties expressible in Calc_µ.

14.3 Constraint Satisfaction and Homomorphisms of Finite Models

Constraint satisfaction problems are problems of the following kind. Suppose we are given a set V of variables, a finite domain D where the variables can take values, and a set C of constraints. The problem is whether there exists an assignment of values to variables that satisfies all the constraints. Each constraint in the set C is specified as a pair (v, R), where v is a tuple of variables from V, of length n, and R is an n-ary relation on D. The assignment of values to variables is then a mapping h : V → D. Such a mapping satisfies the constraint (v, R) if h(v) ∈ R.

For example, satisfiability of certain propositional formulae can be viewed as a constraint satisfaction problem. Consider the MONOTONE 3-SAT problem. That is, we have a CNF formula ϕ(x_1, . . . , x_n) in which every clause is either (x_i ∨ x_j ∨ x_k) or (¬x_i ∨ ¬x_j ∨ ¬x_k). Consider the constraint satisfaction problem where V = {x_1, . . . , x_n}, D = {0, 1}, for each clause (x_i ∨ x_j ∨ x_k) we have a constraint ((x_i, x_j, x_k), {0, 1}^3 − {(0, 0, 0)}), and for each clause (¬x_i ∨ ¬x_j ∨ ¬x_k) we have a constraint ((x_i, x_j, x_k), {0, 1}^3 − {(1, 1, 1)}). Then the resulting constraint satisfaction problem (V, D, C) has a solution iff ϕ is satisfiable.

There is a nice representation of constraint satisfaction problems in terms of the existence of a certain homomorphism between finite structures. Suppose we are given a constraint satisfaction problem P = (V, D, C). Let R^P_1, . . . , R^P_l list all the relations mentioned in C. Let σ_P = (R_1, . . . , R_l). We define two σ_P-structures as follows:

A_P = ⟨V, {v | (v, R^P_1) ∈ C}, . . . , {v | (v, R^P_l) ∈ C}⟩
B_P = ⟨D, R^P_1, . . . , R^P_l⟩.

Then

P has a solution ⇔ there exists a homomorphism h : A_P → B_P.

Thus, the constraint satisfaction problem is really the problem of checking whether there is a homomorphism between two structures. We thus use the notation

CSP(A, B) ⇔ there exists a homomorphism h : A → B.

To see another example, let K_m be the clique on m elements. Then CSP(G, K_m) holds iff G is m-colorable.

The constraint satisfaction problem can easily be related to conjunctive query evaluation. Suppose we have a vocabulary σ that consists only of relation symbols, and a σ-structure A. Let A = {a_1, . . . , a_n}. We define the Boolean conjunctive query CQ_A as

CQ_A ≡ ∃x_1 . . . ∃x_n ⋀_{R∈σ} ⋀_{(a_{i1}, . . . , a_{im}) ∈ R^A} R(x_{i1}, . . . , x_{im}).

Proposition 14.14. CSP(A, B) is true iff B |= CQ_A.
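The homomorphism formulation suggests an obvious, if exponential, test: try all mappings from the universe of A to the universe of B. The Python sketch below illustrates the definition only; it is not an efficient CSP solver, and the dictionary encoding of structures and the helper clique are assumptions of the illustration.

from itertools import product

def csp(A, B):
    # A, B: {"universe": list, "relations": {symbol: set of tuples}}.
    dom, rng = A["universe"], B["universe"]
    for image in product(rng, repeat=len(dom)):
        h = dict(zip(dom, image))
        if all(tuple(h[x] for x in t) in B["relations"][R]
               for R, tuples in A["relations"].items()
               for t in tuples):
            return True      # h is a homomorphism from A to B
    return False

def clique(m):
    # K_m: the clique on m elements, with no loops.
    return {"universe": list(range(m)),
            "relations": {"E": {(i, j) for i in range(m)
                                for j in range(m) if i != j}}}

# CSP(G, K_3) is 3-colorability: a 4-cycle is 3-colorable (indeed 2-colorable),
# but it has no homomorphism into a single loop-free node.
C4 = {"universe": [0, 1, 2, 3],
      "relations": {"E": {(0, 1), (1, 2), (2, 3), (3, 0),
                          (1, 0), (2, 1), (3, 2), (0, 3)}}}
assert csp(C4, clique(3))
assert not csp(C4, clique(1))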
If C and C′ are two classes of structures, then we write CSP(C, C′) for the class of problems CSP(A, B) where A ∈ C and B ∈ C′. We use All for the class of all finite structures. The m-colorability example shows that CSP(All, All) contains NP-hard problems. Furthermore, each problem in CSP(All, All) can be solved in NP: given A and B, we simply guess a mapping h : A → B and check, in polynomial time, whether it is a homomorphism between A and B. Thus, CSP(All, All) is NP-complete.

This naturally leads to the following question: under what conditions is CSP(C, C′) tractable? We first answer this question in the setting suggested by the examples of MONOTONE 3-SAT and m-colorability. In both of these examples, we were interested in a problem of the form CSP(All, B); that is, in the existence of a homomorphism into a fixed structure. This is a very common class of constraint satisfaction problems. We shall write CSP(B) for CSP(All, B). Thus, the first question we address is when CSP(B) can be guaranteed to be tractable.

All problems of the form CSP(B) whose complexity is known fall into two categories: they are either tractable, or NP-complete. This is a real dichotomy: if Ptime ≠ NP, there are NP problems which are neither tractable nor NP-complete. In fact, it has been conjectured that for any B, the problem CSP(B) is either tractable, or NP-complete. In general, this conjecture remains unproven, but some partial solutions are known. For example:

Theorem 14.15. For every B with |B| ≤ 3, CSP(B) is either tractable, or NP-complete.

Moreover, for the case of |B| = 2 (the so-called Boolean constraint satisfaction problem), one can classify precisely for which structures B the corresponding problem CSP(B) is tractable.

For more general structures B, one can use logical definability to find some fairly large classes that guarantee tractability. If one tries to think of a logic in which CSP(B) can be expressed, one immediately thinks of MSO. Indeed, suppose that the universe of B is {b_0, . . . , b_{n−1}}. Then the MSO sentence characterizing CSP(B) is of the form ∃X_0 . . . ∃X_{n−1} Ψ, where Ψ is an FO sentence stating that, on a structure A expanded with n sets X_0, . . . , X_{n−1}, the sets X_i form a partition of A, and the map defined by sending all elements of X_i to b_i, for i = 0, . . . , n − 1, is a homomorphism from A to B. However, while in many cases MSO is tractable, in general it is not suitable for establishing tractability results without putting restrictions on the class of structures A, since MSO can express NP-complete problems.

To express CSP(B) in a tractable logic, we instead consider the negation of CSP(B): that is,

¬CSP(B) = {A | there is no homomorphism h : A → B}.

If A ∈ ¬CSP(B) and A is a substructure of A′, then A′ ∈ ¬CSP(B). This monotonicity property suggests that for some B, the class ¬CSP(B) could be definable in a rather expressive tractable monotone language such as Datalog. If this were the case, then CSP(B) would be tractable as well. Trying to express ¬CSP(B) in Datalog may be a bit hard, but it turns out that instead one could attempt to express ¬CSP(B) in a richer infinitary logic.

Theorem 14.16. For each B, the problem ¬CSP(B) is expressible in Datalog iff it is expressible in ∃L^ω_∞ω.

Thus, one general way of achieving tractability is to show that the negation of the constraint satisfaction problem is expressible in the existential fragment of the very rich finite variable logic L^ω_∞ω.
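For a concrete instance of this approach, take B = K_2, so that CSP(B) is 2-colorability. A graph has no homomorphism to K_2 iff some vertex lies on an odd closed walk, and odd walks are defined by the Datalog rules Odd(x, y) :– E(x, y) and Odd(x, y) :– Odd(x, z), E(z, w), E(w, y). The Python sketch below (the standard odd-walk example, written out here for illustration rather than taken from the text) evaluates this program bottom-up, giving a polynomial test for ¬CSP(K_2).

def not_two_colorable(vertices, edges):
    # edges is a set of ordered pairs; an undirected graph lists both
    # (u, v) and (v, u).  Computes Odd by naive fixpoint iteration.
    odd = set(edges)
    while True:
        new = {(x, y) for (x, z) in odd
                      for (z2, w) in edges if z2 == z
                      for (w2, y) in edges if w2 == w}
        if new <= odd:
            break
        odd |= new
    return any((x, x) in odd for x in vertices)

# A triangle is not 2-colorable; a 4-cycle is.
tri = {(0, 1), (1, 2), (2, 0), (1, 0), (2, 1), (0, 2)}
sq = {(0, 1), (1, 2), (2, 3), (3, 0), (1, 0), (2, 1), (3, 2), (0, 3)}
assert not_two_colorable({0, 1, 2}, tri)
assert not not_two_colorable({0, 1, 2, 3}, sq)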
Moving back to the general problem CSP(C, C′), one may ask whether CSP(C, C′) is tractable whenever CSP(C, B) is tractable for all B ∈ C′. The answer to this is negative: for each fixed graph G, the problem CSP({K_m | m ∈ N}, G) is tractable, but CSP({K_m | m ∈ N}, All) is not. However, for the class of structures above, a uniform version of the tractability result can be shown.

Theorem 14.17. Let CDatalog^k be the class of structures B such that ¬CSP(B) is expressible by a Datalog program that uses at most k distinct variables. Then CSP(All, CDatalog^k) is in Ptime.

Yet another tractable restriction uses the notion of treewidth encountered in Chap. 6. If we let TW_k be the class of graphs of treewidth at most k, then one can show that ¬CSP(TW_k, B) is expressible in Datalog (in fact, in the k-variable fragment of Datalog). Hence, CSP(TW_k, B) is tractable. In fact, this can be generalized as follows. We call two structures A and B homomorphically equivalent if there exist homomorphisms h : A → B and h′ : B → A. Let HTW_k be the class of all structures homomorphically equivalent to a structure in TW_k.

Theorem 14.18. CSP(HTW_k, All) can be expressed in LFP (in fact, using at most 2k variables) and consequently is in Ptime.

Thus, definability results for fixed point and finite variable logics describe rather large classes of tractable constraint satisfaction problems.

14.4 Bibliographic Notes

A comprehensive survey of decidable and undecidable cases of the satisfiability problem is given in Börger, Grädel, and Gurevich [25]. It describes both the Bernays-Schönfinkel and Ackermann classes, and proves complexity bounds for them. The finite model property for FO^2 is due to Mortimer [184]; the complexity bound is from Grädel, Kolaitis, and Vardi [100]. The Π^p_2-completeness of containment of unions of conjunctive queries is due to Sagiv and Yannakakis [211].

There are a number of books and surveys in which temporal and modal logics are described in detail: van Benthem [240], Clarke, Grumberg, and Peled [37], Emerson [64, 65], Vardi [246]. Theorem 14.10 is from Kamp [141]. Abiteboul, Herr, and Van den Bussche [2] showed that Kamp's theorem no longer holds if one moves from strings to arbitrary structures. It is also known that for the translation from LTL to FO, three variables suffice (i.e., over strings, LTL equals FO^3; see, e.g., Schneider [214]), but two variables do not suffice (as shown by Etessami, Vardi, and Wilke [69]). The example of expressing a CTL property in Datalog is from Gottlob, Grädel, and Veith [93], and Theorem 14.11 is from [93] and Immerman and Vardi [136]. Equivalence of bisimulation-invariant FO and modal logic is from van Benthem [240], and the corresponding result for MSO and Calc_µ is from Janin and Walukiewicz [138]; for a related result about CTL*, see Moller and Rabinovich [183].

Constraint satisfaction is a classical AI problem (see, e.g., Tsang [235]). The idea of viewing constraint satisfaction as the existence of a homomorphism between two structures is due to Feder and Vardi [77]. They also suggested using expressibility in Datalog as a tool for proving tractability, and formulated the dichotomy conjecture. Theorem 14.15 is due to Schaefer [213] (for |B| = 2) and Bulatov [28] (for |B| = 3). The existence of NP problems that are neither tractable nor NP-complete, mentioned before Theorem 14.15, is due to Ladner [159]. Other results in that section are from Kolaitis and Vardi [156] and Dalmau, Kolaitis, and Vardi [48]. The converse of Theorem 14.18 was proved recently by Grohe [112].

References

1. S. Abiteboul, K.
Compton, and V. Vianu. Queries are easier than you thought (probably). In ACM Symp. on Principles of Database Systems, 1992, ACM Press, pages 23–32. 2. S. Abiteboul, L. Herr, and J. Van den Bussche. Temporal connectives versus explicit timestamps to query temporal databases. Journal of Computer and System Sciences, 58 (1999), 54–68. 3. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases, AddisonWesley, 1995. 4. S. Abiteboul, M. Y. Vardi, and V. Vianu. Fixpoint logics, relational machines, and computational complexity. Journal of the ACM, 44 (1997), 30–56. 5. S. Abiteboul and V. Vianu. Fixpoint extensions of first-order logic and dataloglike languages. In Proc. IEEE Symp. on Logic in Computer Science, 1989, pages 71–79. 6. S. Abiteboul and V. Vianu. Computing with first-order logic. Journal of Computer and System Sciences, 50 (1995), 309–335. 7. J.W. Addison, L. Henkin, A. Tarski, eds. The Theory of Models. NorthHolland, 1965. 8. F. Afrati, S. Cosmadakis, and M. Yannakakis. On datalog vs. polynomial time. Journal of Computer and System Sciences, 51 (1995), 177–196. 9. A. Aho and J. Ullman. The universality of data retrieval languages. In Proc. ACM Symp. on Principles of Programming Languages, 1979, ACM Press, pages 110–120. 10. M. Ajtai. Σ1 1 formulae on finite structures. Annals of Pure and Applied Logic, 24 (1983), 1–48. 11. M. Ajtai and R. Fagin. Reachability is harder for directed than for undirected graphs. Journal of Symbolic Logic, 55 (1990), 113–150. 12. M. Ajtai, R. Fagin, and L. Stockmeyer. The closure of monadic NP. Journal of Computer and System Sciences, 60 (2000), 660–716. 13. M. Ajtai and Y. Gurevich. Monotone versus positive. Journal of the ACM, 34 (1987), 1004–1015. 292 References 14. G. Asser. Das Repr¨asentantenproblem im Pr¨adikatenkalk¨ul der Ersten Stufe mit Identit¨at. Zeitschrift f¨ur Mathematische Logik und Grundlagen der Mathematik, 1 (1955), 252–263. 15. D.A.M. Barrington, N. Immerman, C. Lautemann, N. Schweikardt, and D. Th´erien. The Crane Beach conjecture. In IEEE Symp. on Logic in Computer Science, 2001, pages 187–196. 16. D.A.M. Barrington, N. Immerman, and H. Straubing. On uniformity within NC1 . Journal of Computer and System Sciences, 41 (1990), 274–306. 17. J. Barwise. On Moschovakis closure ordinals. Journal of Symbolic Logic, 42 (1977), 292–296. 18. J. Barwise and S. Feferman, eds. Model-Theoretic Logics. Springer-Verlag, 1985. 19. M. Benedikt and L. Libkin. Relational queries over interpreted structures. Journal of the ACM, 47 (2000), 644–680. 20. M. Benedikt and L. Libkin. Tree extension algebras: logics, automata, and query languages. In IEEE Symp. on Logic in Computer Science, 2002, pages 203–212. 21. M. Benedikt, L. Libkin, T. Schwentick, and L. Segoufin. Definable relations and first-order query languages over strings. Journal of the ACM, 50 (2003), 694–751. 22. A. Blass, Y. Gurevich, and D. Kozen. A zero-one law for logic with a fixed-point operator. Information and Control, 67 (1985), 70–90. 23. A. Blumensath and E. Gr¨adel. Automatic structures. In IEEE Symp. on Logic in Computer Science, 2000, pages 51–62. 24. H. Bodlaender. A linear-time algorithm for finding tree-decompositions of small treewidth. SIAM Journal on Computing, 25 (1996), 1305–1317. 25. E. B¨orger, E. Gr¨adel, and Y. Gurevich. The Classical Decision Problem. Springer-Verlag, 1997. 26. V. Bruy`ere, G. Hansel, C. Michaux, and R. Villemaire. Logic and precognizable sets of integers. Bulletin of the Belgian Mathematical Society, 1 (1994), 191–238. 27. J.R. B¨uchi. 
Weak second-order arithmetic and finite automata. Zeitschrift f¨ur Mathematische Logik und Grundlagen der Mathematik, 6 (1960), 66–92. 28. A. Bulatov. A dichotomy theorem for constraints on a three-element set. IEEE Symp. on Foundations of Computer Science, 2002, pages 649–658. 29. S. R. Buss. First-order proof theory of arithmetic. In Handbook of Proof Theory, Elsevier, Amsterdam, 1998, pages 79–147. 30. J. Cai, M. F¨urer, and N. Immerman. On optimal lower bound on the number of variables for graph identification. Combinatorica, 12 (1992), 389–410. 31. P.J. Cameron. The random graph revisited. In Eur. Congr. of Mathematics, Vol. 1, Progress in Mathematics, Birkh¨auser, 2001, pages 267–274. 32. A. Chandra and D. Harel. Computable queries for relational databases. Journal of Computer and System Sciences, 21 (1980), 156–178. 33. A. Chandra and D. Harel. Structure and complexity of relational queries. Journal of Computer and System Sciences, 25 (1982), 99–128. References 293 34. A. Chandra and P. Merlin. Optimal implementation of conjunctive queries in relational data bases. In ACM Symp. on Theory of Computing, 1977, pages 77–90. 35. C.C. Chang and H.J. Keisler. Model Theory. North-Holland, 1990. 36. O. Chapuis and P. Koiran. Definability of geometric properties in algebraically closed fields. Mathematical Logic Quarterly, 45 (1999), 533–550. 37. E. Clarke, O. Grumberg, and D. Peled. Model Checking. The MIT Press, 1999. 38. H. Comon, M. Dauchet, R. Gilleron, F. Jacquemard, D. Lugiez, S. Tison, and M. Tommasi. Tree Automata: Techniques and Applications. Available at www.grappa.univ-lille3.fr/tata. October 2002. 39. S.A. Cook. The complexity of theorem-proving procedures. In Proc. ACM Symp. on Theory of Computing, 1971, ACM Press, pages 151–158. 40. S.A. Cook. Proof complexity and bounded arithmetic. Manuscript, Univ. of Toronto, 2002. 41. S.A. Cook and Y. Liu. A complete axiomatization for blocks world. In Proc. 7th Int. Symp. on Artificial Intelligence and Mathematics, January, 2002. 42. S. Cosmadakis. Logical reducibility and monadic NP. In Proc. IEEE Symp. on Foundations of Computer Science, 1993, pages 52–61. 43. S. Cosmadakis, H. Gaifman, P. Kanellakis, and M. Vardi. Decidable optimization problems for database logic programs. In ACM Symp. on Theory of Computing, 1988, pages 477–490. 44. B. Courcelle. Graph rewriting: an algebraic and logic approach. In Handbook of Theoretical Computer Science, Vol. B, North-Holland, 1990, pages 193–242. 45. B. Courcelle. On the expression of graph properties in some fragments of monadic second-order logic. In [134], pages 33–62. 46. B. Courcelle. The monadic second-order logic on graphs VI: on several representations of graphs by relational structures. Discrete Applied Mathematics, 54 (1994), 117–149. 47. B. Courcelle and J. Makowsky. Fusion in relational structures and the verification of monadic second-order properties. Mathematical Structures in Computer Science, 12 (2002), 203–235. 48. V. Dalmau, Ph. Kolaitis, and M. Vardi. Constraint satisfaction, bounded treewidth, and finite-variable logics. Proc. Principles and Practice of Constraint Programming, Springer-Verlag LNCS 2470, 2002, pages 310–326. 49. A. Dawar. A restricted second order logic for finite structures. Logic and Computational Complexity, Springer-Verlag, LNCS 960, 1994, pages 393–413. 50. A. Dawar, K. Doets, S. Lindell, and S. Weinstein. Elementary properties of finite ranks. Mathematical Logic Quarterly, 44 (1998), 349–353. 51. A. Dawar and Y. Gurevich. Fixed point logics. 
Bulletin of Symbolic Logic, 8 (2002), 65-88. 52. A. Dawar and L. Hella. The expressive power of finitely many generalized quantifiers. Information and Computation, 123 (1995), 172–184. 53. A. Dawar, S. Lindell, and S. Weinstein. Infinitary logic and inductive definability over finite structures. Information and Computation, 119 (1995), 160–175. 294 References 54. A. Dawar, S. Lindell, and S. Weinstein. First order logic, fixed point logic, and linear order. In Computer Science Logic, Springer-Verlag LNCS Vol. 1092, 1995, pages 161–177. 55. L. Denenberg, Y. Gurevich, and S. Shelah. Definability by constant-depth polynomial-size circuits. Information and Control, 70 (1986), 216–240. 56. M. de Rougemont. Second-order and inductive definability on finite structures. Zeitschrift f¨ur Mathematische Logik und Grundlagen der Mathematik, 33 (1987), 47–63. 57. G. Dong, L. Libkin, and L. Wong. Local properties of query languages. Theoretical Computer Science, 239 (2000), 277–308. 58. R. Downey and M. Fellows. Parameterized Complexity. Springer-Verlag, 1999. 59. D.-Z. Du, K.-I. Ko. Theory of Computational Complexity. Wiley-Interscience, 2000. 60. H.-D. Ebbinghaus and J. Flum. Finite Model Theory. Springer-Verlag, 1995. 61. H.-D. Ebbinghaus, J. Flum, and W. Thomas. Mathematical Logic. SpringerVerlag, 1984. 62. A. Ehrenfeucht. An application of games to the completeness problem for formalized theories. Fundamenta Mathematicae, 49 (1961), 129–141. 63. T. Eiter, G. Gottlob, and Y. Gurevich. Existential second-order logic over strings. Journal of the ACM, 47 (2000), 77–131. 64. E.A. Emerson. Temporal and modal logic. In Handbook of Theoretical Computer Science, Vol. B, North-Holland, 1990, pages 995–1072. 65. E.A. Emerson. Model checking and the mu-calculus. In [134], pages 185–214. 66. H. Enderton. A Mathematical Introduction to Logic. Academic-Press, 1972. 67. P. Erd¨os and A. R´enyi. Asymmetric graphs. Acta Mathematicae Academiae Scientiarum Hungaricae, 14 (1963), 295–315. 68. K. Etessami. Counting quantifiers, successor relations, and logarithmic space. Journal of Computer and System Sciences, 54 (1997), 400–411. 69. K. Etessami, M.Y. Vardi, and T. Wilke. First-order logic with two variables and unary temporal logic. Information and Computation, 179 (2002), 279–295. 70. R. Fagin. Generalized first-order spectra and polynomial-time recognizable sets. In Complexity of Computation, R. Karp, ed., SIAM-AMS Proceedings, 7 (1974), 43–73. 71. R. Fagin. Monadic generalized spectra. Zeitschrift f¨ur Mathematische Logik und Grundlagen der Mathematik, 21 (1975), 89–96. 72. R. Fagin. A spectrum hierarchy. Zeitschrift f¨ur Mathematische Logik und Grundlagen der Mathematik, 21 (1975), 123–134. 73. R. Fagin. Probabilities on finite models. Journal of Symbolic Logic, 41 (1976), 50–58. 74. R. Fagin. Finite-model theory — a personal perspective. Theoretical Computer Science, 116 (1993), 3–31. 75. R. Fagin. Easier ways to win logical games. In [134], pages 1–32. 76. R. Fagin, L. Stockmeyer, and M.Y. Vardi. On monadic NP vs monadic co-NP. Information and Computation, 120 (1994), 78–92. References 295 77. T. Feder and M.Y. Vardi. The computational structure of monotone monadic SNP and constraint satisfaction: a study through datalog and group theory. SIAM Journal on Computing, 28 (1998), 57–104. 78. T. Feder and M.Y. Vardi. Homomorphism closed vs. existential positive. IEEE Symp. on Logic in Computer Science, 2003, pages 311–320. 79. S. Feferman and R. Vaught. The first order properties of products of algebraic systems. 
Fundamenta Mathematicae, 47 (1959), 57–103. 80. J. Flum, M. Frick, and M. Grohe. Query evaluation via tree-decompositions. Journal of the ACM, 49 (2002), 716–752. 81. J. Flum and M. Grohe. Fixed-parameter tractability, definability, and modelchecking. SIAM Journal on Computing 31 (2001), 113–145. 82. J. Flum and M. Ziegler. Pseudo-finite homogeneity and saturation. Journal of Symbolic Logic, 64 (1999), 1689–1699. 83. H. Fournier. Quantifier rank for parity of embedded finite models. Theoretical Computer Science, 295 (2003), 153–169. 84. R. Fra¨ıss´e. Sur quelques classifications des syst`emes de relations. Universit´e d’Alger, Publications Scientifiques, S´erie A, 1 (1954), 35–182. 85. M. Frick and M. Grohe. The complexity of first-order and monadic secondorder logic revisited. In IEEE Symp. on Logic in Computer Science, 2002, pages 215–224. 86. M. Furst, J. Saxe, and M. Sipser. Parity, circuits, and the polynomial-time hierarchy. Mathematical Systems Theory, 17 (1984), 13–27. 87. H. Gaifman. Concerning measures in first-order calculi. Israel Journal of Mathematics, 2 (1964), 1–17. 88. H. Gaifman. On local and non-local properties, Proc. Herbrand Symp., Logic Colloquium ’81, North-Holland, 1982. 89. H. Gaifman and M.Y. Vardi. A simple proof that connectivity is not first-order definable. Bulletin of the EATCS, 26 (1985), 43–45. 90. F. G´ecseg and M. Steinby. Tree languages. In Handbook of Formal Languages, Vol. 3. Springer-Verlag, 1997, pages 1–68. 91. F. Gire and H. K. Hoang. A more expressive deterministic query language with efficient symmetry-based choice construct. In Logic in Databases, Int. Workshop LID’96, Springer-Verlag, 1996, pages 475–495. 92. Y.V. Glebskii, D.I. Kogan, M.A. Liogon’kii, and V.A. Talanov Range and degree of realizability of formulas in predicate calculus (in Russian). Kibernetika,2 (1969), 17–28. 93. G. Gottlob, E. Gr¨adel, and H. Veith. Datalog LITE: a deductive query language with linear time model checking. ACM Transactions on Computational Logic, 3 (2002), 42–79. 94. G. Gottlob and C. Koch. Monadic datalog and the expressive power of languages for Web information extraction. Journal of the ACM, 51 (2004), 74–113. 95. G. Gottlob, Ph. Kolaitis, and T. Schwentick. Existential second-order logic over graphs: charting the tractability frontier. In IEEE Symp. on Foundations of Computer Science, 2000, pages 664–674. 296 References 96. G. Gottlob, N. Leone, and F. Scarcello. The complexity of acyclic conjunctive queries. Journal of the ACM, 48 (2001), 431–498. 97. E. Gr¨adel. Capturing complexity classes by fragments of second order logic. Theoretical Computer Science, 101 (1992), 35–57. 98. E. Gr¨adel and Y. Gurevich. Metafinite model theory. Information and Computation, 140 (1998), 26–81. 99. E. Gr¨adel, Ph. Kolaitis, L. Libkin, M. Marx, J. Spencer, M.Y. Vardi, Y. Venema, S. Weinstein. Finite Model Theory and its Applications. Springer-Verlag, 2004. 100. E. Gr¨adel, Ph. Kolaitis, and M.Y. Vardi. On the decision problem for twovariable first-order logic. Bulletin of Symbolic Logic, 3 (1997), 53–69. 101. E. Gr¨adel and G. McColm. On the power of deterministic transitive closures. Information and Computation, 119 (1995), 129–135. 102. E. Gr¨adel and M. Otto. Inductive definability with counting on finite structures. Proc. Computer Science Logic, 1992, Springer-Verlag, pages 231–247. 103. R.L. Graham, B.L. Rothschild and J.H. Spencer. Ramsey Theory. John Wiley & Sons, 1990. 104. E. Grandjean. Complexity of the first-order theory of almost all finite structures. 
Information and Control, 57 (1983), 180–204. 105. E. Grandjean and F. Olive. Monadic logical definability of nondeterministic linear time. Computational Complexity, 7 (1998), 54–97. 106. M. Grohe. The structure of fixed-point logics. PhD Thesis, University of Freiburg, 1994. 107. M. Grohe. Fixed-point logics on planar graphs. In IEEE Symp. on Logic in Computer Science, 1998, pages 6–15. 108. M. Grohe. Equivalence in finite-variable logics is complete for polynomial time. Combinatorica, 19 (1999), 507–532. 109. M. Grohe. The parameterized complexity of database queries. In ACM Symp. on Principles of Database Systems, 2001, ACM Press, pages 82–92. 110. M. Grohe. Large finite structures with few Lk -types. Information and Computation, 179 (2002), 250–278. 111. M. Grohe. Parameterized complexity for the database theorist. SIGMOD Record, 31 (2002), 86–96. 112. M. Grohe. The complexity of homomorphism and constraint satisfaction problems seen from the other side. In IEEE Symp. on Foundations of Computer Science, 2003, pages 552–561. 113. M. Grohe and T. Schwentick. Locality of order-invariant first-order formulas. ACM Transactions on Computational Logic, 1 (2000), 112–130. 114. M. Grohe, T. Schwentick, and L. Segoufin. When is the evaluation of conjunctive queries tractable? In ACM Symp. on Theory of Computing, 2001, pages 657–666. 115. S. Grumbach and J. Su. Queries with arithmetical constraints. Theoretical Computer Science, 173 (1997), 151–181. 116. Y. Gurevich. Toward logic tailored for computational complexity. In Computation and Proof Theory, M. Richter et al., eds., Springer Lecture Notes in References 297 Mathematics, Vol. 1104, 1984, pages 175–216. 117. Y. Gurevich. Logic and the challenge of computer science. In Current trends in theoretical computer science, E. B¨orger, ed., Computer Science Press, 1988, pages 1–57. 118. Y. Gurevich, N. Immerman, and S. Shelah. McColm’s conjecture. In IEEE Symp. on Logic in Computer Science, 1994, 10–19. 119. Y. Gurevich and S. Shelah. Fixed-point extensions of first-order logic. Annals of Pure and Applied Logic, 32 (1986), 265–280. 120. W. Hanf. Model-theoretic methods in the study of elementary logic. In [7], pages 132–145. 121. L. Hella. Logical hierarchies in PTIME. Information and Computation, 129 (1996), 1–19. 122. L. Hella, Ph. Kolaitis, and K. Luosto. Almost everywhere equivalence of logics in finite model theory. Bulletin of Symbolic Logic, 2 (1996), 422–443. 123. L. Hella, L. Libkin, and J. Nurmonen. Notions of locality and their logical characterizations over finite models. Journal of Symbolic Logic, 64 (1999), 1751–1773. 124. L. Hella, L. Libkin, J. Nurmonen, and L. Wong. Logics with aggregate operators. Journal of the ACM, 48 (2001), 880–907. 125. W. Hodges. Model Theory. Cambridge University Press, 1993. 126. J. Hopcroft and J. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 1979. 127. R. Hull and J. Su. Domain independence and the relational calculus. Acta Informatica, 31 (1994), 513–524. 128. N. Immerman. Upper and lower bounds for first order expressibility. Journal of Computer and System Sciences, 25 (1982), 76–98. 129. N. Immerman. Relational queries computable in polynomial time (extended abstract). In ACM Symp. on Theory of Computing, 1982, ACM Press, pages 147–152. 130. N. Immerman. Relational queries computable in polynomial time. Information and Control, 68 (1986), 86–104. 131. N. Immerman. Languages that capture complexity classes. SIAM Journal on Computing, 16 (1987), 760–778. 132. N. 
Immerman. Nondeterministic space is closed under complementation. SIAM Journal on Computing, 17 (1988), 935–938. 133. N. Immerman. Descriptive Complexity. Springer-Verlag, 1998. 134. N. Immerman and Ph. Kolaitis, eds. Descriptive Complexity and Finite Models, Proc. of a DIMACS workshop. AMS, 1997. 135. N. Immerman and E. Lander. Describing graphs: a first order approach to graph canonization. In Complexity Theory Retrospective, Springer-Verlag, Berlin, 1990. 136. N. Immerman and M.Y. Vardi. Model checking and transitive-closure logic. In Proc. Int. Conf. on Computer Aided Verification, Springer-Verlag LNCS 1254, 1997, pages 291–302. 298 References 137. D. Janin and J. Marcinkowski. A toolkit for first order extensions of monadic games. Proc. of Symp. on Theoretical Aspects of Computer Science, SpringerVerlag LNCS vol. 2010, Springer Verlag, 2001, 353–364. 138. D. Janin and I. Walukiewicz. On the expressive completeness of the propositional mu-calculus with respect to monadic second order logic. In Proc. of CONCUR’96, Springer-Verlag LNCS 1119, 1996, pages 263–277. 139. D.S. Johnson. A catalog of complexity classes. In Handbook of Theoretical Computer Science, Vol. A, North-Holland, 1990, pages 67–161. 140. N. Jones and A. Selman. Turing machines and the spectra of first-order formulas. Journal of Symbolic Logic, 39 (1974), 139–150. 141. H. Kamp. Tense logic and the theory of linear order. PhD Thesis, University of California, Los Angeles, 1968. 142. P. Kanellakis, G. Kuper, and P. Revesz. Constraint query languages. Journal of Computer and System Sciences, 51 (1995), 26–52. 143. C. Karp. Finite quantifier equivalence. In [7], pages 407–412. 144. M. Kaufmann and S. Shelah. On random models of finite power and monadic logic. Discrete Mathematics, 54 (1985), 285–293. 145. B. Khoussainov and A. Nerode. Automata Theory and its Applications. Birkh¨auser, 2001. 146. S. Kleene. Arithmetical predicates and function quantifiers. Transactions of the American Mathematical Society, 79 (1955), 312–340. 147. Ph. Kolaitis. Languages for polynomial-time queries – an ongoing quest. In Proc. 5th Int. Conf. on Database Theory, Springer-Verlag, 1995, pages 38–39. 148. Ph. Kolaitis. On the expressive power of logics on finite models. In [99]. 149. Ph. Kolaitis and J. V¨a¨an¨anen. Generalized quantifiers and pebble games on finite structures. Annals of Pure and Applied Logic, 74 (1995), 23–75. 150. Ph. Kolaitis and M.Y. Vardi. The decision problem for the probabilities of higher-order properties. In ACM Symp. on Theory of Computing, 1987, pages 425–435. 151. Ph. Kolaitis and M.Y. Vardi. 0-1 laws and decision problems for fragments of second-order logic. Information and Computation, 87 (1990), 301–337. 152. Ph. Kolaitis and M.Y. Vardi. Infinitary logic and 0-1 laws. Information and Computation, 98 (1992), 258–294. 153. Ph. Kolaitis and M.Y. Vardi. Fixpoint logic vs. infinitary logic in finite-model theory. In IEEE Symp. on Logic in Computer Science, 1992, pages 46–57. 154. Ph. Kolaitis and M.Y. Vardi. On the expressive power of Datalog: tools and a case study. Journal of Computer and System Sciences, 51 (1995), 110–134. 155. Ph. Kolaitis and M.Y. Vardi. 0-1 laws for fragments of existential secondorder logic: a survey. In Proc. Mathematical Foundations of Computer Science, Springer-Verlag LNCS 1893, 2000, pages 84–98. 156. Ph. Kolaitis and M.Y. Vardi. Conjunctive-query containment and constraint satisfaction. Journal of Computer and System Sciences, 61 (2000), 302–332. 157. B. Kuijpers, J. Paredaens, and J. 
Van den Bussche. Topological elementary equivalence of closed semi-algebraic sets in the real plane. Journal of Symbolic Logic, 65 (2000), 1530–1555. References 299 158. G. Kuper, L. Libkin, and J. Paredaens, eds. Constraint Databases. SpringerVerlag, 2000. 159. R.E. Ladner. On the structure of polynomial time reducibility. Journal of the ACM, 22 (1975), 155–171. 160. R.E. Ladner. Application of model theoretic games to discrete linear orders and finite automata. Information and Control, 33 (1977), 281–303. 161. C. Lautemann, N. Schweikardt, and T. Schwentick. A logical characterisation of linear time on nondeterministic Turing machines. In Proc. Symp. on Theoretical Aspects of Computer Science, Springer-Verlag LNCS 1563, 1999, pages 143– 152. 162. C. Lautemann, T. Schwentick, and D. Th´erien. Logics for context-free languages. In Proc. Computer Science Logic 1994, Springer-Verlag, 1995, pages 205–216. 163. J.-M. Le Bars. Fragments of existential second-order logic without 0-1 laws. In IEEE Symp. on Logic in Computer Science, 1998, pages 525–536. 164. J.-M. Le Bars. The 0-1 law fails for monadic existential second-order logic on undirected graphs. Information Processing Letters, 77 (2001), 43–48. 165. D. Leivant. Inductive definitions over finite structures. Information and Computation 89 (1990), 95–108. 166. L. Libkin. On counting logics and local properties. ACM Transactions on Computational Logic, 1 (2000), 33–59. 167. L. Libkin. Logics capturing local properties. ACM Transactions on Computational Logic, 2 (2001), 135–153. 168. L. Libkin. Embedded finite models and constraint databases. In [99]. 169. L. Libkin and L. Wong. Query languages for bags and aggregate functions. Journal of Computer and System Sciences, 55 (1997), 241–272. 170. L. Libkin and L. Wong. Lower bounds for invariant queries in logics with counting. Theoretical Computer Science, 288 (2002), 153–180. 171. S. Lindell. An analysis of fixed-point queries on binary trees. Theoretical Computer Science, 85 (1991), 75–95. 172. A.B. Livchak. Languages for polynomial-time queries (in Russian). In Computer-based Modeling and Optimization of Heat-power and Electrochemical Objects Sverdlovsk, 1982, page 41. 173. J. Lynch. Almost sure theories. Annals of Mathematical Logic, 18 (1980), 91–135. 174. J. Lynch. Complexity classes and theories of finite models. Mathematical Systems Theory, 15 (1982), 127–144. 175. R.C. Lyndon. An interpolation theorem in the predicate calculus. Pacific Journal of Mathematics, 9 (1959), 155–164. 176. J. Makowsky. Model theory and computer science: an appetizer. In Handbook of Logic in Computer Science, Vol. 1, Oxford University Press, 1992. 177. J. Makowsky. Algorithmic aspects of the Feferman-Vaught Theorem. Annals of Pure and Applied Logic, 126 (2004), 159–213. 178. J. Makowsky and Y. Pnueli. Arity and alternation in second-order logic. Annals of Pure and Applied Logic, 78 (1996), 189–202. 300 References 179. J. Marcinkowski. Achilles, turtle, and undecidable boundedness problems for small datalog programs. SIAM Journal on Computing, 29 (1999), 231–257. 180. O. Matz, N. Schweikardt, and W. Thomas. The monadic quantifier alternation hierarchy over grids and graphs. Information and Computation, 179 (2002), 356–383. 181. G.L. McColm. When is arithmetic possible? Annals of Pure and Applied Logic, 50 (1990), 29–51. 182. R. McNaughton and S. Papert. Counter-Free Automata. MIT Press, 1971. 183. F. Moller and A. Rabinovich. On the expressive power of CTL. In IEEE Symp. 
on Logic in Computer Science, 1999, pages 360-369. 184. M. Mortimer. On language with two variables. Zeitschrift f¨ur Mathematische Logik und Grundlagen der Mathematik, 21 (1975), 135–140. 185. Y. Moschovakis. Elementary Induction on Abstract Structures. North-Holland, 1974. 186. F. Neven. Automata theory for XML researchers. SIGMOD Record, 31 (2002), 39–46. 187. F. Neven and T. Schwentick. Query automata on finite trees. Theoretical Computer Science, 275 (2002), 633–674. 188. J. Nurmonen. On winning strategies with unary quantifiers. Journal of Logic and Computation, 6 (1996), 779–798. 189. J. Nurmonen. Counting modulo quantifiers on finite structures. Information and Computation, 160 (2000), 62–87. 190. M. Otto. A note on the number of monadic quantifiers in monadic Σ1 1 . Information Processing Letters, 53 (1995), 337–339. 191. M. Otto. Bounded Variable Logics and Counting: A Study in Finite Models. Springer-Verlag, 1997. 192. M. Otto. Epsilon-logic is more expressive than first-order logic over finite structures. Journal of Symbolic Logic, 65 (2000), 1749–1757. 193. M. Otto and J. Van den Bussche. First-order queries on databases embedded in an infinite structure. Information Processing Letters, 60 (1996), 37–41. 194. C. Papadimitriou. A note on the expressive power of Prolog. Bulletin of the EATCS, 26 (1985), 21–23. 195. C. Papadimitriou. Computational Complexity. Addison-Wesley, 1994. 196. C. Papadimitriou and M. Yannakakis. On the complexity of database queries. Journal of Computer and System Sciences, 58 (1999), 407–427. 197. J. Paredaens, J. Van den Bussche, and D. Van Gucht. First-order queries on finite structures over the reals. SIAM Journal on Computing, 27 (1998), 1747–1763. 198. A. Pillay and C. Steinhorn. Definable sets in ordered structures. III. Transactions of the American Mathematical Society, 309 (1988), 469–476. 199. E. Pezzoli. Computational complexity of Ehrenfeucht-Fra¨ıss´e games on finite structures. Computer Science Logic 1998, Springer-Verlag, LNCS 1584, pages 159–170. 200. B. Poizat. Deux ou trois choses que je sais de Ln. Journal of Symbolic Logic, 47 (1982), 641–658. References 301 201. B. Poizat. A Course in Model Theory: An Introduction to Contemporary Mathematical Logic. Springer-Verlag, 2000. 202. M. Rabin. Decidability of second-order theories and automata on infinite trees. Transactions of the American Mathematical Society, 141 (1969), 1–35. 203. R. Rado. Universal graphs and universal functions. Acta Arithmetica, 9 (1964), 331–340. 204. N. Robertson and P. Seymour. Graph minors V. Excluding a planar graph. Journal of Combinatorial Theory, Series B, 41 (1986), 92–114. 205. N. Robertson and P. Seymour. Graph minors XIII. The disjoint paths problem. Journal of Combinatorial Theory, Series B, 63 (1995), 65–110. 206. J. Robinson. Definability and decision problems in arithmetic. Journal of Symbolic Logic, 14 (1949), 98–114. 207. E. Rosen. Some aspects of model theory and finite structures. Bulletin of Symbolic Logic, 8 (2002), 380–403. 208. E. Rosen and S. Weinstein. Preservation theorems in finite model theory. In Logic and Computational Complexity, Springer-Verlag LNCS 960, 1994, pages 480–502. 209. J. Rosenstein. Linear Orderings. Academic Press, 1982. 210. B. Rossman. Successor-invariance in the finite. In IEEE Symp. on Logic in Computer Science, 2003, pages 148–157. 211. Y. Sagiv and M. Yannakakis. Equivalences among relational expressions with the union and difference operators. Journal of the ACM, 27 (1980), 633–655. 212. V. Sazonov. 
Polynomial computability and recursivity in finite domains. Elektronische Informationsverarbeitung und Kybernetik, 16 (1980), 319–323. 213. T. Schaefer. The complexity of satisfiability problems. In Proc. 10th Symp. on Theory of Computing, 1978, pages 216–226. 214. K. Schneider. Verification of Reactive Systems. Springer-Verlag, 2004. 215. T. Schwentick. On winning Ehrenfeucht games and monadic NP. Annals of Pure and Applied Logic, 79 (1996), 61–92. 216. T. Schwentick. Descriptive complexity, lower bounds and linear time. In Proc. of Computer Science Logic, Springer-Verlag LNCS 1584, 1998, pages 9–28. 217. T. Schwentick and K. Barthelmann. Local normal forms for first-order logic with applications to games and automata. In Proc. 15th Symp. on Theoretical Aspects of Computer Science (STACS’98), Springer-Verlag, 1998, pages 444– 454. 218. D. Seese. The structure of models of decidable monadic theories of graphs. Annals of Pure and Applied Logic, 53 (1991), 169–195. 219. D. Seese. Linear time computable problems and first-order descriptions. Mathematical Structures in Computer Science, 6 (1996), 505–526. 220. O. Shmueli. Decidability and expressiveness of logic queries. In ACM Symp. on Principles of Database Systems, 1987, ACM Press, pages 237–249. 221. M. Sipser. Introduction to the Theory of Computation. PWS Publishing, 1997. 222. L. Stockmeyer. The complexity of decision problems in automata and logic. PhD Thesis, MIT, 1974. 302 References 223. L. Stockmeyer. The polynomial-time hierarchy. Theoretical Computer Science, 3 (1977), 1–22. 224. L. Stockmeyer and A. Meyer. Cosmological lower bound on the circuit complexity of a small problem in logic. Journal of the ACM, 49 (2002), 753–784. 225. H. Straubing. Finite Automata, Formal Logic, and Circuit Complexity. Birkh¨auser, 1994. 226. R. Szelepcs´enyi. The method of forced enumeration for nondeterministic automata. Acta Informatica, 26 (1988), 279–284. 227. V.A. Talanov and V.V. Knyazev. The asymptotic truth value of infinite formulas (in Russian), Proc. All-Union seminar on discrete mathematics and its applications, Moscow State University, Faculty of Mathematics and Mechanics, 1986, pages 56–61. 228. R. Tarjan and M. Yannakakis. Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM Journal on Computing, 13 (1984), 566–579. 229. A. Tarski. A Decision Method for Elementary Algebra and Geometry. Univ. of California Press, 1951. Reprinted in Quantifier Elimination and Cylindrical Algebraic Decomposition, B. Caviness and J. Johnson, eds. Springer-Verlag, 1998, pages 24–84. 230. J. Thatcher and J. Wright. Generalized finite automata theory with an application to a decision problem of second-order logic. Mathematical Systems Theory, 2 (1968), 57–81. 231. W. Thomas. Classifying regular events in symbolic logic. Journal of Computer and System Sciences, 25 (1982), 360–376. 232. W. Thomas. Logical aspects in the study of tree languages. In Proc. 9th Int. Colloq. on Trees in Algebra and Programming (CAAP’84), Cambridge University Press, 1984, pages 31–50. 233. W. Thomas. Languages, automata, and logic. In Handbook of Formal Languages, Vol. 3, Springer-Verlag, 1997, pages 389–455. 234. B. A. Trakhtenbrot.The impossibilty of an algorithm for the decision problem for finite models (in Russian), Doklady Academii Nauk SSSR, 70 (1950), 569– 572. 235. E. Tsang. Foundations of Constraint Satisfaction. Academic Press, 1993. 236. G. Tur´an. 
On the definability of properties of finite graphs. Discrete Mathematics, 49 (1984), 291–302. 237. J. V¨a¨an¨anen. Generalized quantifiers. Bulletin of the EATCS, 62 (1997), 115–136. 238. J. V¨a¨an¨anen. Unary quantifiers on finite models. Journal of Logic, Language and Information, 6 (1997), 275–304. 239. J. V¨a¨an¨anen. A Short Course in Finite Model Theory. University of Helsinki. 44pp. Available at www.math.helsinki.fi/logic/people/jouko.vaananen. 240. J. van Benthem. Modal Logic and Classical Logic. Bibliopolis, 1983. 241. D. Van Dalen. Logic and Structure. Springer-Verlag, 1994. 242. J. Van den Bussche. Constraint databases: a tutorial introduction. SIGMOD Record, 29 (2000), 44–51. 243. L. van den Dries. Tame Topology and O-Minimal Structures. Cambridge University Press, 1998. 244. M.Y. Vardi. The complexity of relational query languages. In Proc. ACM Symp. on Theory of Computing, 1982, 137–146. 245. M.Y. Vardi. On the complexity of bounded-variable queries. In ACM Symp. on Principles of Database Systems, ACM Press, 1995, pages 266–276. 246. M.Y. Vardi. Why is modal logic so robustly decidable? In [134], pages 149–183. 247. H. Vollmer. Introduction to Circuit Complexity. Springer-Verlag, 1999. 248. A.J. Wilkie. Model completeness results for expansions of the ordered field of real numbers by restricted Pfaffian functions and the exponential function. Journal of the American Mathematical Society, 9 (1996), 1051–1094. 249. M. Yannakakis. Algorithms for acyclic database schemes. In Proc. Conf. on Very Large Databases, 1981, pages 82–94. 250. M. Yannakakis. Perspectives on database theory. In IEEE Symp. on Foundations of Computer Science, 1995, pages 224–246.
95 majority, or threshold 155 Class of structures MSO-inductive 140 of bounded treewidth 110,135 of small degree 55 Collapse active-generic 256,257 natural-active 255 restricted quantifier 255 and VC dimension 273 fails over integers 260 for the real field 261 to MSO 265 Combined complexity of conjunctive queries 104 of FO 99 of LFP 207 of MSO 139 Completeness fails over finite models 166 of games for FO 35 of games for MSO 117 Complexity combined 88 data 88 expression 88 fixed-parameter linear 100 fixed-parameter tractable 100 parameterized 100 Complexity class AC0 91 capturing of 168 coNP 20 DLog 20 Nexptime 21 Nlin 139 NLog 20 NP 19 306 INDEX 307 PH 20 Ptime 19 TC0 155 Composition method for FO 30–31,42 for MSO 118,140 Conjecture Crane Beach 273 Gurevich’s 204 McColm’s 210,234 Conjunctive query (CQ) 102 acyclic 105 combined complexity of 104 containment of 111 evaluation of 106,107, 110,111 union of 277 Connective Boolean 15 infinitary 145 Connectivity 23 and L∗ ∞ω(Cnt) 153 and embedded finite models 254,260, 265 and FO 23, 37 and Hanf-locality 48 and MSO 120 topological 268, 272 Constraint satisfaction 285–288 and bounded treewidth 288 and conjunctive queries 286 and homomorphism 286 dichotomy for 287 Data complexity of FO 92 of FO(Cnt) 155 of LFP 194 of MSO 134 over strings and trees 135 of µ-calculus 284 of temporal logics 281 of TrCl 200–203 Database constraint 267–270 relational 1–4 Datalog 196 and existential least fixed point logic 197 and Ptime 199 monotonicity of 197 with negation 196 Duplicator 26 Encoding of formulae 87 of structures 88 Extension axioms 238 and random graph 241 and zero-one law 240,244 asymptotic probability of 238 using in collapse results 265 Extensional predicates 196 Failure in the finite Beth’s theorem 42 compactness theorem 24 completeness theorem 166 Craig’s theorem 42 L¨owenheim-Skolem theorem 166 Los-Tarski theorem 42 Finite variable logic (Lω ∞ω) and fixed point logics 214 and pebble games 216 definition of 212 First-order logic (FO) 14 expressive power of 28–31,37– 40 games for 32 Fixed-parameter linearity of acyclic conjunctive queries 106 of FO for small degrees 101 of MSO and bounded treewidth 135 of MSO over strings and trees 135 of temporal logics 281 Fixed-parameter tractability and bounded treewidth 110 308 INDEX of FO on planar graphs 102 Fixed point 178 inflationary 179 least 178 partial 180 simultaneous 184 stages of 184, 186,188 FO with counting (FO(Cnt)) 142 Formula atomic 14 C-invariant 68 Hintikka 40 quantifier-free 14 FPL 100 FPT 100 Gaifman graph 45 Gaifman-locality 48 Game Ajtai-Fagin 123 and ∃MSO 123 bijective 59, 151 and L∗ ∞ω(Cnt) 151 Ehrenfeucht-Fra¨ıss´e 26 for FO 26 for MSO 116 Fagin 122 pebble 215 and Lω ∞ω 216 Halting problem 19, 166 Hanf-locality 47 Hypergraph 105 tree decomposition of 105–108 Inexpressibility of connectivity in FO(All) 94 in ∃MSO 120 in L∗ ∞ω(Cnt) 153 of arbitrary graphs in FO 23 of finite graphs in FO 37, 52 using Hanf-locality 48 even in fixed point logics 217 in FO 25 in Lω ∞ω 217 in MSO 118 of ordered sets 28 Hamiltonicity in MSO 126 parity in FO(All) 94 Inflationary fixed point logic (IFP) 180 Intensional predicates 196 Isomorphism 17 partial 27 with the k-back-and-forth property 218 Join 102 Kripke structure 278 bisimilarity of 284 Language 17 regular 18 and MSO 124 star-free 127 and FO 127 Least fixed point logic (LFP) 181 Linear order affects expressive power 69, 119,150, 153, 214 definability of 227 FO definability of 28–31 Locality of aggregate logic 160 of FO 52 of L∗ ∞ω(Cnt) 153 of 
Locality rank 49: bounds on 54, 64; Hanf 47
Logic: aggregate 159; CTL 280; CTL* 280; existential fixed point 197; finite variable 212; first-order 14; FO with counting 142; infinitary 145, 212; inflationary fixed point 180; least fixed point 181; L^ω_∞ω 212; L*_∞ω(Cnt) 147; LTL 280; monadic second-order 115; µ-calculus 283; partial fixed point 180; propositional modal 279; second-order 113 (existential (∃SO) 115; universal (∀SO) 115); SO-Horn 208; SO-Krom 208; transitive closure 199
Model 13: embedded finite 250; finite 13
Model-checking problem 87, 100, 281
Monadic second-order logic (MSO) 115: existential (∃MSO) 115 (equals MSO over strings 126); universal (∀MSO) 115 (different from ∃MSO 120)
µ-calculus (Calcµ) 283
Neighborhood 46
Normal form: for LFP 192, 194; for SO 115; for TrCl 201
Occurrence: negative 181; positive 181
Operator 178: based on a formula 180; inductive 178
Order invariance 69: separation results for (fixed point logics 217; FO 69; FO(Cnt) 158; L^ω_∞ω 214; L*_∞ω(Cnt) 153; MSO 119); undecidability of 174
Ordered conjecture 210
Partial fixed point logic (PFP) 180
Polynomial hierarchy 20: and MSO 134; capturing of 173
Polynomial time 19: capturing of (in ∃SO 208; over ordered structures 192; over unordered structures 204–205)
Projection 103
Property: bisimulation-invariant 285; finite model 276 (and satisfiability 276–278); Ramsey 257 (and collapse 259)
Propositional modal logic (ML) 279
Quantifier: active domain 251; counting 141; existential 14; generalized (and Ptime 204; Härtig 144; Rescher 144; unary 144); prefix 173, 175, 243, 275; rank 32; second-order 114; universal 14; unrestricted 251
Quantifier elimination: and collapse results 255; for the random graph 247; for the real field 261
Query 17: Boolean 17; complexity of 88; conjunctive 102; definable in a logic 17; Gaifman-local 48; Hanf-local 47; invariant 68; order-invariant 69; weakly local 73
r.e., see Recursively enumerable
Random graph 241: and quantifier elimination 247; collapse over 265; representations of 248; theory of 242
Rank: in L*_∞ω(Cnt) 146; quantifier 32 (for unary quantifiers 144; in FO 32; in SO 115)
Reachability 2, 122: and Gaifman-locality 49; for directed graphs in ∃MSO 122; for undirected graphs in ∃MSO 122
Recursive 19
Recursively enumerable 19
RQC (restricted quantifier collapse) 255
Satisfiability: for Ackermann class 277; for Bernays-Schönfinkel class 276; for FO^2 278
Second-order logic (SO) 113
Selection 103
Sentence 15: atomic 33; finitely satisfiable 165; finitely valid 165; quantifier rank of 32; satisfiable 16; valid 16
Simultaneous fixed point 184: elimination of 186
Sphere 74
Spoiler 26
Structure 13: canonical for FO^k 229; Kripke 278; rigid 234; k-rigid 227
Symbol: constant 13; function 13; relation 13
Term 14: counting 146
Theorem: Abiteboul-Vianu 230; Ajtai's 94; Beth's 42; Büchi's 124; compactness 16; completeness 16; Cook's 173; Courcelle's 135; Craig's 42; Ehrenfeucht-Fraïssé 32; Fagin's 169; Furst-Saxe-Sipser 94; Gaifman's 60; Grohe-Schwentick 73; Gurevich's 69; Gurevich-Shelah 191; Immerman-Szelepcsényi 200; Immerman-Vardi 192; Löwenheim-Skolem 16; Łoś-Tarski 42; Lyndon's 43; Ramsey's 257; stage comparison 189; Tarski-Knaster 179; Trakhtenbrot's 165
Theory 16: complete 242; consistent 16; decidable 242; ω-categorical 242
Threshold equivalence 61
Transitive closure 3, 17: expressible in Datalog 196; expressible in fixed point logics 182; inexpressible in aggregate logic 160; inexpressible in FO 52; violates locality 49
Transitive closure logic (TrCl) 199: positive 200
Tree 129: automata 130; decomposition 105, 107; regular languages and MSO 131; unranked 132 (automata 133; regular and MSO 133)
Treewidth 107: bounded 108, 110, 135, 140
Turing machine 18: and logic 166–168, 170–172, 193–194, 201; deterministic 18; time and space bounds 19
Type: atomic 226; FO^k 220 (expressibility of 221–225; ordering of 227–229); in L*_∞ω(Cnt) 152; rank-k, FO 34 (expressibility of 35; finite number of 35); rank-k, MSO 116 (and automata 125–126; expressibility of 116)
Variable: bound 15; free 14
Vocabulary 13: purely relational 14; relational 14
Zero-one law 237: and extension axioms 240; failure for MSO 247; for L^ω_∞ω 237; for FO and fixed point logics 237; for fragments of SO 243–245

Name Index

Abiteboul, S. VIII, 206, 207, 229, 230, 232, 246, 288
Afrati, F. 207
Aho, A. VII
Ajtai, M. 94, 108, 123, 136, 174, 206
Asser, G. 174
Barrington, D. A. M. 108, 161, 271
Barthelmann, K. 63
Barwise, J. 232
Benedikt, M. 137, 270
Blass, A. 246, 247
Blumensath, A. 137
Bodlaender, H. 137
Börger, E. 288
Bruyère, V. 137
Büchi, J. VIII, 11, 124, 136
Bulatov, A. 289
Buss, S. 108
Cai, J. 206
Cameron, J. 246
Chandra, A. VII, 108, 109, 206
Chang, C.C. 21
Chapuis, O. 271
Clarke, E. 288
Compton, K. 246
Cook, S. A. 40, 108, 174
Cosmadakis, S. 137, 207
Courcelle, B. 135, 137
Dalmau, V. 289
Dawar, A. 109, 206, 207, 232, 233
Denenberg, L. 108
de Rougemont, M. 136, 233
Dong, G. 63
Downey, R. 108
Ebbinghaus, H.-D. VIII, 21, 40, 83, 136, 206
Ehrenfeucht, A. 26, 32, 40
Eiter, T. 174
Emerson, E. A. 288
Enderton, H. 21
Erdős, P. 246
Etessami, K. 161, 288
Fagin, R. VII, 6, 62, 120, 122, 123, 136, 165, 168–174, 193–195, 200, 204, 246
Feder, T. 40, 289
Feferman, S. 137, 232
Fellows, M. 108
Flum, J. VIII, 21, 40, 83, 108–110, 136, 206, 271
Fournier, H. 271
Fraïssé, R. 26, 32, 40
Frick, M. 108, 109, 137
Fürer, M. 206
Furst, M. 94, 108
Gaifman, H. 40, 45, 48, 63, 246
Gire, F. 206
Glebskii, Y. 246
Gottlob, G. 108, 109, 174, 207, 288
Grädel, E. 137, 161, 174, 206, 288
Graham, R. 270
Grandjean, E. 137, 246
Grohe, M. 73, 83, 108–110, 137, 206, 207, 233, 289
Grumbach, S. 270
Grumberg, O. 288
Gurevich, Yu. 40, 69, 73, 83, 108, 161, 174, 191, 192, 204, 206, 228, 246, 247, 288
Hanf, W. 47, 62
Harel, D. VII, 206
Hella, L. 63, 161, 206, 207, 246
Herr, L. 288
Hoang, H. 206
Hodges, W. 21, 246
Hopcroft, J. 11, 21
Hull, R. VIII, 206, 271
Immerman, N. VII, 108, 109, 161, 192, 195, 200, 206, 226, 232, 271, 288
Janin, D. 136, 289
Johnson, D. 21
Jones, N. 174
Kamp, H. 288
Kanellakis, P. 136, 270
Karp, C. 246
Kaufmann, M. 246
Keisler, H. J. 21
Khoussainov, B. 21
Kleene, S. 174
Knaster, B. 179
Knyazev, V. 246
Koch, C. 207
Koiran, P. 271
Kolaitis, Ph. 161, 174, 206, 210, 232, 233, 246, 247, 288, 289
Kozen, D. 246, 247
Kuijpers, B. 271
Kuper, G. 270
Ladner, R. 136, 289
Lander, E. 161
Lautemann, C. 174, 271
Le Bars, J.-M. 246, 247
Leivant, D. 206
Leone, N. 108, 109
Libkin, L. 63, 83, 137, 161, 270
Lindell, S. 207, 232, 233
Livchak, A. B. 206
Luosto, K. 246
Lynch, J. 137, 246
Lyndon, R. 43, 207
Makowsky, J. 40, 136, 137, 174
Marcinkowski, J. 136, 207
Matz, O. 137
McColm, G. 206, 210, 233
McNaughton, R. 136
Merlin, P. 108, 109
Meyer, A. 137
Moller, F. 289
Mortimer, M. 288
Moschovakis, Y. 206
Nerode, A. 21
Neven, F. 136, 137
Nurmonen, J. 63, 161
Olive, F. 137
Otto, M. 83, 137, 206, 207, 232, 270
Papadimitriou, C. 21, 108, 109, 206
Papert, S. 136
Paredaens, J. 270, 271
Peled, D. 288
Pezzoli, E. 40
Pillay, A. 272
Pnueli, Y. 174
Poizat, B. 21, 232
Rabin, M. 140
Rabinovich, A. 289
Rado, R. 246
Rényi, A. 246
Revesz, P. 270
Robertson, N. 110, 140
Robinson, J. 271
Rosen, E. 40
Rosenstein, J. 40
Rossman, B. 83
Rothschild, B. 270
Sagiv, Y. 288
Saxe, J. 94, 108
Sazonov, V. 206
Scarcello, F. 108, 109
Schaefer, T. 289
Schweikardt, N. 137, 271
Schwentick, T. 63, 73, 83, 108, 109, 136, 137, 174
Seese, D. 108, 137
Segoufin, L. 108, 109
Selman, A. 174
Seymour, P. 110, 140
Shelah, S. 108, 191, 192, 206, 207, 228, 246
Shmueli, O. 207
Sipser, M. 21, 94, 108
Spencer, J. 270
Steinhorn, C. 272
Stockmeyer, L. 62, 108, 136, 137, 174
Straubing, H. 108, 161
Su, J. 270
Szelepcsényi, R. 200, 206
Talanov, V. 246
Tarjan, R. 108
Tarski, A. 179, 271
Thatcher, J. 137
Thérien, D. 174, 271
Thomas, W. VIII, 21, 136, 137
Trakhtenbrot, B. VII, 165, 166, 170, 171, 174, 193, 195
Turán, G. 136
Ullman, J. D. VII, 11, 21
Väänänen, J. VIII, 161
van Benthem, J. 288
van Dalen, D. 21
Van den Bussche, J. 270, 271, 288
Van Gucht, D. 270
Vardi, M. Y. VII, 40, 62, 108, 136, 192, 195, 200, 206, 210, 226, 232, 233, 246, 247, 288, 289
Vaught, R. 137
Veith, H. 288
Vianu, V. VIII, 206, 207, 229, 230, 232, 246
Vollmer, H. 109, 161
Walukiewicz, I. 289
Weinstein, S. 40, 207, 232, 233
Wilke, T. 288
Wilkie, A. 272
Wong, L. 63, 83
Wright, J. 137
Yannakakis, M. 108, 109, 206, 207, 288
Ziegler, M. 271