Set theory Giulio Lo Monaco Academic year 2019/2020, second semester Masaryk University, Brno Contents 1 Basics of first-order logic 1 1.1 Formal languages . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Well-formed formulas . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.4 Interpretations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.5 Truth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2 Some consequences of the axioms of ZFC 4 2.1 ZFC set theory, a starter . . . . . . . . . . . . . . . . . . . . . . . 4 2.2 A couple of important theorems . . . . . . . . . . . . . . . . . . . 6 2.3 Numeric sets and their cardinalities . . . . . . . . . . . . . . . . . 6 3 Ordinal numbers and the cumulative hierarchy 8 3.1 Ordinals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.2 The cumulative hierarchy and regularity . . . . . . . . . . . . . . 10 1 Basics of first-order logic The first lecture will be dedicated to the fundamentals of first-order logic, so that we have the means to understand what it means to formulate a mathematical statement in a rigorous manner, and what it means that such a statement is satisfied. With this in mind, we will proceed to all the structures, generally called sets, that satisfy all the axioms of ZFC axiomatic system of set theory. 1.1 Formal languages A language is the collection of the following data: • a non-empty set of sorts, denoted as si; • symbols for variables, each assigned to a sort; • the symbols for equality ≈, disjunction ∨, conjunction ∧, negation ¬, implication →; • the existential quantifier ∃ and the universal quantifier ∀; 1 • for each n ≥ 0, symbols for n-ary predicates, equipped with an assignment of sorts s1, . . . , sn; • for each n ≥ 0, symbols for n-ary functions, equipped with an assignment of sorts s1, . . . , sn and an additional sort s. Remark 1.1. When we say “set”, for the moment, we don’t mean the same thing as the object of study of set theory. These are rather just naive collections of things, with no axioms to be respected and no real mathematical content. The context in which we use the word “set” with this meaning is in fact often called naive set theory, as opposed to axiomatic set theory. Remark 1.2. In writing s1, s2 etc., one shouldn’t think of subscript numbers as actual numbers, rather as pure symbols used with no other purpose than giving names to sorts (or variables, or functions...) Definition 1.3. For n = 0, an n-ary (or nullary) function symbol is also called a constant symbol. 1.2 Terms Our next step will be to inductively define terms in a given language. Terms are particular strings of symbols that can be thought as representing the objects about which formal statements can be formulated. Every term comes with an assignment of a sort. Definition 1.4. Fixed a language L, the collection of terms is inductively defined by: 1. every constant or variable symbol is a term of the corresponding sort; 2. if t1, . . . , tn are terms of respective sorts s1, . . . , sn and f is a function of sorts s1, . . . , sn, s, then the string ft1 . . . tn is a term of sort s. 1.3 Well-formed formulas Finally, well-formed formulas, or wff’s for short, are those string of symbols in the given language that are meant to represent what we commonly call mathematical statements. They can also be defined inductively, starting from the already known terms, and using the remaining part of the language, which we didn’t use in order to define terms. Definition 1.5. Atomic formulas are strings of one of the following forms: 1. t1 ≈ t2, where t1 and t2 are terms; 2. Pt1 . . . tn where P is a symbol for predicate of sorts s1, . . . , sn and t1, . . . tn are terms such that ti is of sort si. Definition 1.6. Well-formed formulas (of first-order, finitary logic) are defined inductively by the following steps: 1. all atomic formulas are wff’s; 2 2. if φ and ψ are wff’s, then so are ¬φ and φ → ψ; 3. if Θ is a finite set of wff’s, then ∨Θ and ∧Θ are wff’s; 4. if φ is a wff and x is a symbol for variable, then ∃xφ and ∀xφ are wff’s. 1.4 Interpretations So far, terms and well-formed formulas are just meaningless strings of symbols, satisfying the only condition that they have been obtained as explained in the previous sections. We would like to associate some meaning to such strings, and subsequently to be able to decide whether the statement that they represent is true or false. First, we will say what an interpretation is, without attaching any judgment of truthfulness or falsehood to its outcomes for the moment. Let us fix a language L. We will describe an interpretation of its basic components, and then we will move on to interpret terms and wff’s as well. • To every sort s, we associate a set I(s) (again, in the naive sense). • To every n-ary predicate symbol P of sorts s1, . . . , sn we associate a subset I(P) ⊆ I(s1) × . . . × I(sn). • Finally, to every n-ary function symbol f of sorts s1, . . . , sn, s we associate a function I(f) : I(s1) × . . . × I(sn) → I(s). Now we need some notation. For a sequence ¯x = (x1, . . . , xn) of variables, where xi is of sort si, we write C¯x for I(s1) × . . . × I(sn), and we call it the “context of variables ¯x”. Now the idea is that a term t of sort s will be interpreted as a function C¯x → I(s), where ¯x is the sequence of all variable symbols appearing at least once in the explicit formulation of t. Heuristically, we should think of this as taking every possible choice of values for the given variables to the result of plugging this choice in the definition of t. More explicitly, this is done as follows: 1. if t = x of sort s, then I(t) = id : I(s) → I(s); 2. if t = ft1 . . . tn with ti of sort si, then I(t) is the composite C¯x n i=1 I(si) I(s). (I(ti))n i=1 I(f) A formula φ will be interpreted as a subset of C¯x, where C¯x is now the context of all variables appearing in the formulation of φ. Heuristically, it selects all the possible choices for the given variables that make φ hold once substituted to the respective variable symbols in it. • Given terms t1 and t2, then I(t1 ≈ t2) = {¯a ∈ C¯x|I(t1)(¯a) = I(t2)(¯a)}; • For t = Pt1 . . . tn, with ti of sort si, then I(t) = {¯a ∈ C¯x|g(¯a) ∈ I(P)}, where g = (I(ti))n i=1 : C¯x → n i=1 I(si); 3 • I(¬φ) = C¯x \ I(φ); • I(∨j∈J φj) = ∪j∈J I(φj); • I(∧j∈J φj) = ∩j∈J I(φj); • I(φ → ψ) = (C¯x \ I(φ)) ∪ I(ψ) Remark 1.7. In order to understand the interpretation of the implication symbol, remember that the expression φ → ψ is logically equivalent to ¬φ ∨ ψ Finally, we interpret the quantifiers as follows. Observe that, if y is a variable of sort s , not necessarily among the variables of ¯x, if variable y appears in φ, then I(φ) ⊆ C¯x,y, whereas we want to find an interpretation of ∃yφ and ∀yφ as subsets of C¯x. • I(∃yφ) = {¯a ∈ C¯x| there exists b ∈ I(s ) such that (¯a, b) ∈ I(φ)}; • I(∀yφ) = {¯a ∈ C¯x| for all b ∈ I(s ) we have that (¯a, b) ∈ I(φ)}. 1.5 Truth Now we turn our attention to deciding whether a formula is true under a given interpretation. We use the machinery of sequent calculus, which apparently brings in an additional complication, in that it introduces a new symbol to the pile that we already have, but in fact offers a slick and elegant way of defining truth. Definition 1.8. A sequent is a string of the form Θ ⇒ φ, where Θ is a finite set of formulas, φ is a single formula and ⇒ is a new symbol, not comprised in the language. We will say that a sequent of the form above is true under a given interpretation I if, being C¯x the context of all variables in Θ and φ, then I(∧Θ) ⊆ I(φ) as subsets of C¯x. An closer inspection will reveal that this definition of truth reflects in fact the intuitive notion of it that we commonly have (think: if all the premises in the antecedent Θ are true, then we conclude that φ is also true). 2 Some consequences of the axioms of ZFC The purpose is to understand what a theory is and what it means to model it. We will then narrow our attention to the case of ZFC set theory. 2.1 ZFC set theory, a starter Definition 2.1. • A theory is a pair (L, A), where L is a formal language and A is a set of sequents, called the axioms of the theory. • A model for a theory is an interpretation I making all its axioms true. 4 Remark 2.2. Up to this point, whenever we used the word “set” we were doing so in a naive sense. The sets which are used in order to define an interpretation, or the set of axioms of a theory, are all objects of naive set theory. Sets in a more rigorous sense are elements of a model for an axiomatic set theory, e.g. ZFC. Definition 2.3. Let x be a variable symbol and φ a formula in which x appears. Then x is said to be free in φ if its first occurrence is not quantified, i.e. it is not preceded by ∀ or ∃. Given a formula φ(x1, . . . , xn) where the variables xis are free, and given other a1, . . . , an variable symbols, then φ(a1, . . . , an) will denote the formula φ where all the occurrences of the variable symbols xis have been replaced by the corresponding symbols ais. Example 2.4. ZFC set theory has a language containing a binary relation symbol ∈ and a constant symbol ∅. We use the following shortcuts: • ∃!xφ(x) ↔ [∃xφ(x) ∧ ∀(φ(y) → y ≈ x) • x ≈ y ∪ z ↔ ∀w[w ∈ x ↔ (w ∈ y ∨ w ∈ z)] • x ⊆ y ↔ ∀z(z ∈ x → z ∈ y) Omitting the axioms of regularity and choice for the moment, the remaining part of the theory is: • Axiom of extensionality ∀x∀y∀z(z ∈ x ↔ z ∈ y) ↔ x ≈ y • Axiom schema of separation Given a well-formed formula φ(x) where x is free, then there is an axiom ∀z∃ya ∈ y ↔ (a ∈ z ∧ φ(a)) • Axiom of pairing ∀x∀y∃z(x ∈ z ∧ y ∈ z) • Axiom of union ∀F∃A∀x∀y(x ∈ y ∧ y ∈ F) → x ∈ A • Axiom schema of replacement Given a well-formed formula φ(x, y, A) where x, y and A are free, then there is an axiom ∀A([∀x ∈ A∃!yφ(x, y, A)] → ∃B∀y[y ∈ B ↔ ∃xφ(x, y, A)]) • Axiom of infinity ∃x[∅ ∈ x ∧ ∀y(y ∈ x → y ∪ {y} ∈ x)] • Axiom of power set ∀x∃y∀z(z ⊆ y → z ∈ y) Remark 2.5. Separation and replacement are not axioms, rather families of axioms, indexed by the (naive) set of formulas in our language. ZFC set theory is thereforse comprised of an infinite set of axioms. Example 2.6. Given a family of sets (zi)i∈I, then ∧i∈I(x ∈ zi) is a well-formed formula, therefore we can define intersection by choosing one zj and then setting ∩i∈Izi = {x ∈ zj| ∧i∈I x ∈ zi}, where we used an axiom of separation. Note that, although ∨i∈I(x ∈ zi) is also a well-formed formula, we cannot similarly define union by using separation, because there is a priori no set containing all of zis where we can apply our formula. This is precisely why we need the axiom of union. 5 2.2 A couple of important theorems Remember that we say that two sets have the same cardinality, and we write |A| = |B|, if there is a bijective function between them. We also write |A| ≤ |B| if there is an injective function A → B, and |A| < |B| will then be an abbreviation for |A| ≤ |B| ∧ |A| = |B|. We denote the power set of A, whose existence is ensured by the corresponding axiom, as P(A). Theorem 2.7 (Cantor’s theorem). Given a set A, then |A| < |P(A)|. Proof. The function A → P(A) defined by a → {a} is evidently injective, therefore |A| ≤ |P(A)|. We now show that no map from A to its power set can be surjective. Consider an arbitrary map f : A → P(A), and use separation to define a subset X = {a ∈ A|a /∈ f(a)}. Suppose there is b ∈ A such that f(b) = X. Then by examining the definition we will obtain that b ∈ X ⇔ b /∈ X, which is a contradiction. It is then impossible to have such a b, so f is not surjective. Theorem 2.8 (Cantor-Bernstein theorem). Given two sets A and B, suppose there are two injective functions A → B and B → A. Then |A| = |B|. This is proven in the theory section of the course. In particular, the proof uses the following Theorem 2.9 (Tarski’s fixed point theorem). Let X be a complete lattice and f : X → X an order-preserving function. Then there is a fixed point for f, i.e. ∃xf(x) = x. Proof. By separation, define Y = {y ∈ X|y ≤ f(y)}. Now take x = supY which exists because X is complete. We have then that ∀y ∈ Y, y ≤ f(y) ≤ f(x) because f is order-preserving, therefore by definition of supremum we obtain x ≤ f(x). Moreover, this relation implies, again since f is order-preserving, that f(x) ≤ f2 (x), which means that f(x) ∈ Y . Consequently, f(x) ≤ x. Now we have that x ≤ f(x) and f(x) ≤ x which immediately gives x = f(x). 2.3 Numeric sets and their cardinalities We now use the axioms of ZFC to construct the most common numeric sets, namely N, Z, Q and R. Definition 2.10. Given a set X, we define its successor s(X) := X ∪ {X}. Now we set 0 := ∅ and, inductively, n + 1 := s(n) = {0, . . . , n}. In particular, for instance, we have 1 = {∅}, then 2 = {∅, {∅}}, then 3 = {∅, {∅}, {∅, {∅}}} and so on. We now want to have a set containing all the objects thus defined, i.e. the set of natural numbers. In order to do that, we have to resort to the axiom of infinity. Definition 2.11. We will say that a set X is inductive if ∅ ∈ X and y ∈ X → s(y) ∈ X. 6 Observe now that what the axiom of infinity really says is “there exists an inductive set”. Now define ω := ∩X taken over the class of all inductive sets. Note that this is indeed well-defined, because it may be expressed formally as follows. Granted by the axiom of infinity that there is at least one inductive set X, we will use separation to define ω = {x ∈ X|Y inductive → x ∈ Y }. The resulting set is sometimes denoted as ω and called the set of finite ordinal numbers, sometimes as N and called the set of natural numbers. The reason for this double notation will be made clear later on. For the moment, we can just take the two as synonyms for the same object. By basic abstract algebra, we know that there is an operation of sum on N. We will use that to define integer numbers. The idea is to express an integer as a pair of natural numbers, the first of which can be thought of as having a plus sign in front, and the second as having a minus sign. The formal definition is as follows. Define an equivalence relation on N × N by (n, m) ∼ (n , m ) if and only if n + m = m + n (check that it is indeed an equivalence relation). Now set Z := N × N/ ∼. Observe that whenever n ≥ m we have that (n, m) ∼ (n − m, 0), and whenever n < m we have that (n, m) ∼ (0, m − n). Therefore, the information of an integer amounts to a natural number with the addition of a binary option, that we interpret as sign, precisely as we wanted. In a very similar way, we want to define rational numbers starting from integers. Again by abstract algebra, we know that there are both an addition and a multiplication on Z. We exploit multiplication in a manner that is much alike what we did above, just being careful not to allow division by zero. So, on the set Z × (Z\{0}) we define an equivalence relation via (p, q) ∼ (p , q ) if and only if pq = qp . Now set Q := Z × (Z \ {0})/ ∼. There are a few ways to define real numbers starting from rationals. We choose the construction through Dedekind cuts, because it turns out that it is the most convenient for what our next purpose. Definition 2.12. A subset A ⊆ Q is called downclosed if ∀x, y ∈ Q, if x ∈ A and y ≤ x then y ∈ A. Definition 2.13. A Dedekind cut is a proper downclosed subset of Q that has no maximal element. Now we call R the set of all Dedekind cuts. The idea here is to express each real number as rational approximations from below. We need to requirement that Dedekind cuts be proper subsets in order to prevent +∞ to be an element of R. Furthermore, we need them not to have a maximal element in order to avoid having double representations for all real numbers that happen to be rational. The next step is, we wonder what the cardinalities of the sets thus defined are. We set by definition ℵ0 := |N|. Proposition 2.14. |Z| = ℵ0. Proof. By Cantor-Bernstein theorem, it will suffice to define two injective functions N → Z and Z → N. The first one can be defined by n → +n. 7 For the second one, we have to distinguish two cases. Whenever n ≥ 0, it will be taken to 2n. If instead n < 0, it will be taken to −2n − 1. The two functions are clearly injective, so we are finished. Proposition 2.15. |Q| = ℵ0. Proof. We use Cantor-Bernstein theorem again. The assignment n → n 1 defines an injective function N → Q. For the opposite direction, we proceed by steps. For each rational number, there exists a unique representation p q such that p and q are coprime and q is positive. Therefore the assignment p q → (p, q) defines an injective function Q → Z × Z. By the previous proposition, we also have Z × Z ∼= N × N. We are now reduced to proving that there is an injective function N × N → N. We graphically represent the elements of the domain as in the table (0,0) (0,1) (0,2) (0,3) · · · (1,0) (1,1) (1,2) (1,3) · · · (2,0) (2,1) (2,2) (2,3) · · · (3,0) (3,1) (3,2) (3,3) · · · ... ... ... ... ... With this in mind, what we do is sending (0, 0) to 0, then the second ascending diagonal (pairs with sum 1) to the next two naturals, then the third ascending diagonal (pairs with sum 2) to the next three naturals, and so on. More precisely, we define a function by the formula (just for n + m > 0) (n, m) → n+m k=1 k + m + 1. It is straightforward to check that this function is indeed injective. Moreover, since for each natural number s there is only a finite number of pairs (n, m) such that n + m = s, it will eventually hit all natural numbers, and is thus in fact already surjective, as well. Proposition 2.16. |R| = |P(N)|. Proof. As above, we use Cantor-Bernstein. The set of real numbers is overtly a subset of P(Q), so there is an inclusion R ⊆ P(Q) ∼= P(N) by the previous proposition. For the opposite direction, observe that for a set X, its power set P(X) is isomorphic to the set of functions X → 2. Such a function can be seen as the subset of X containing precisely all elements that are sent to 1. Now, we define a function P(N) → R by sending a function h : N → 2 to the decimal representation 0.h(0)h(1)h(2) . . ., which in turn gives a Dedekind cut by successive approximations. This assignment is also easily checked to be injective. 3 Ordinal numbers and the cumulative hierar- chy We mentioned that there is a double notation for the set of natural numbers, which are sometimes called N, some other times ω. In the context of numeric 8 sets, the former notation is preferred. The latter is instead used whenever regarding the set of natural numbers as the set of finite ordinals. The ways of extending this set in the two contexts are therefore very different. While we are interested in the algebraic operations defined on natural numbers in order to extend them to integer, rational, real numbers and so on, we will be here interested in their order structure. 3.1 Ordinals Definition 3.1. A set X is said to be transitive if ∀y(y ∈ X) → (y ⊆ X). Definition 3.2. Let E be a binary relation defined on a set X. We say that X is strictly well-ordered by E if the relation E+ defined by xE+ y ↔ xEy or x = y gives a well-ordering of X. Definition 3.3. An ordinal is a transitive set that is strictly well-ordered by the relation ∈. Remark 3.4. The natural numbers as defined in the previous lectures are ordinals. Moreover, the set ω exhausts all finite ordinals. We now proceed to define new ordinals that extend the hierarchy we started building with ω. We will do this by using essentially two different tools: the notion of successor and the axiom of replacement. Given an ordinal α, we can always define its successor α+1 by α∪{α}, the same way as we did with finite ordinals. Moreover, given a set of ordinals (αi)i∈I, the axiom of union yields that we can define their union α = ∪i∈Iαi. Exercise 3.5. Show that the two procedures above indeed give rise to ordinals. In particular, iterating the successor construction gives ordinals ω + n for every finite n. By the axiom of replacement, all the sets of this form form a set, therefore we can again find their union, and we call this ω +ω. Keeping in mind the same observations, we go on defining: • ω × n as ω + . . . + ω, repeated n times; • ω × ω as the union of all ordinals ω × n; • ωn as ω × . . . × ω, repeated n times; • ωω as the union of all ordinals ωn ; • ωωn and ωωω similarly; • ωω···ω and so on. Now observe that for an ordinal there are two possibilities: either it is the successor of some other ordinal, or it is not. In the latter case, we say that an ordinal is a limit ordinal. This is the case, for instance, for all the ordinal defined above except for ω + n. 9 3.2 The cumulative hierarchy and regularity Using ordinals, we now want to construct a chain of sets such that most constructions can be seen to return sets that are elements of some member of this chain. In fact, assuming the axiom of regularity, one can prove that all sets belong to this chain at some point. • Let us define V0 := ∅; • for an ordinal α, define Vα+1 := P(Vα); • if α is a limit ordinal, then Vα := ∪β<αVβ. In the third line, we know of course that the union exists because α is a set. Now we turn our attention to another of the axioms of ZFC that we’ve been deferring. • Axiom of regularity ∀x = ∅∃y ∈ x(x ∩ y = ∅) In particular, given an x as in the formula above, such an element y will be referred to as a minimal element of x. Therefore, an alternative formulation of regularity says that every set has a minimal element. Lemma 3.6. The following are true: 1. if x is a set of ordinals, then ∪x is an ordinal; 2. if x is a set of ordinals, then ∃α ordinal such that x ⊆ α. Proof. Exercise. Definition 3.7. Given a set x, let us define fx(0) = x and then inductively fx(n + 1) = ∪fx(n) for all finite ordinals. We will call the transitive closure of x the set TC(x) = ∪n∈ωfx(n). Remark 3.8. A set x is transitive if and only if TC(x) = x. Proposition 3.9. Assuming the axiom of regularity, then ∀X∃α such that α is an ordinal and X ∈ Vα. In the proof of this theorem, we will use the following abbreviations for a set s: • s ∈ ∪α∈ONVα for ∃α ordinal such that s ∈ Vα; • s ⊆ ∪α∈ONVα for ∀t ∈ s∃α ordinal such that t ∈ Vα. Proof. Suppose there is a set a that contradicts the statements, that is, a /∈ ∪α∈ONVα. By transitivity of the Vα’s, TC(a) /∈ ∪α∈ONVα. Therefore, without loss of generality, we may assume that a is transitive. Suppose then that a ⊆ ∪α∈ONVα. For all x ∈ a, let us say that the rank of x is the least α such that x ⊆ Vα. The range of the rank function on a is a set of ordinals, which is by Lemma 3.6(2) a subset of some ordinal β. Hence a ⊆ Vβ, so a ∈ Vβ+1, which is a contradiction. Therefore we may assume a ∪α∈ONVα. Hence there must be some x ∈ a with x /∈ ∪α∈ONVα. Let b = {x ∈ a|x /∈ 10 ∪α∈ONVα}. Since b = ∅, it has a minimal element y. Since y /∈ ∪α∈ONVα, we can argue as above that y ∪α∈ONVα, so there is z such that z ∈ y and z /∈ ∪α∈ONVα. Since a is transitive, we also have that z ∈ a and therefore, by the definition of b, z ∈ b. We conclude that z ∈ y ∩ b, which is a contradiction because y was taken to be minimal in b. Definition 3.10. Given a set X, we say that X has rank α, and write rk X = α, if α = min{α|X ∈ Vα}. Remark 3.11. It is clear that assuming regularity, it follows from Proposition 3.9 that every set has a well defined rank. Theorem 3.12. Regularity holds if and only if every set is an element of some Vα. Proof. One direction has already been proven in Proposition 3.9. For the converse, suppose that x = ∅ is a set and find a minimal element of it. By assumption and the remark above, we may define a rank function on x, whose range is a set a. Let α be minimal in a, and let y ∈ x such that rk y = α. Then we must have y ∩ x = ∅, because otherwise we would have z ∈ y ∩ x, therefore rk z < rk y, which contradicts the minimality of α. Lemma 3.13. The following statements are true: 1. if α < β are ordinals, then Vα ⊂ Vβ; 2. if α < β are ordinals, then Vα ∈ Vβ; 3. if α is an ordinal, then rk α = α; 4. if X is a set and ∃ max(rk x)x∈X, then rk X = max(rk x)x∈X + 1; 5. if X is a set and max(rk x)x∈X, then rk X = sup(rk x)x∈X; 6. rk (a, b) = max{rk a, rk b} + 2. 11