Brisk Guide to Mathematics

Jan Slovák, Martin Panák, Michal Bulant, Vladimir Ejov, Ray Booth

Brno, Adelaide, 2020

Authors: Ray Booth, Michal Bulant, Vladimir Ejov, Martin Panák, Jan Slovák
With further help of: Aleš Návrat, Michal Veselý
Graphics and illustrations: Petra Rychlá

2018 Masaryk University, Flinders University

Contents - theory

Chapter 1. Initial warmup
1. Numbers and functions
2. Difference equations
3. Combinatorics
4. Probability
5. Plane geometry
6. Relations and mappings

Chapter 2. Elementary linear algebra
1. Vectors and matrices
2. Determinants
3. Vector spaces and linear mappings
4. Properties of linear mappings

Chapter 3. Linear models and matrix calculus
1. Linear optimization
2. Difference equations
3. Iterated linear processes
4. More matrix calculus
5. Decompositions of the matrices and pseudoinversions

Chapter 4. Analytic geometry
1. Affine and Euclidean geometry
2. Geometry of quadratic forms
3. Projective geometry

Chapter 5. Establishing the ZOO
1. Polynomial interpolation
2. Real numbers and limit processes
3. Derivatives
4. Infinite sums and power series

Chapter 6. Differential and integral calculus
1. Differentiation
2. Integration
3. Sequences, series and limit processes

Chapter 7. Continuous tools for modelling
1. Fourier series
2. Integral operators
3. Metric spaces

Chapter 8. Calculus with more variables
1. Functions and mappings on R^n
2. Integration for the second time
3. Differential equations

Chapter 9. Continuous models - further selected topics
1. Exterior differential calculus and integration
2. Remarks on Partial Differential Equations
3. Remarks on Variational Calculus
4. Complex Analytic Functions

Chapter 10. Statistics and probability theory
1. Descriptive statistics
2. Probability
3. Mathematical statistics

Chapter 11. Elementary number theory
1. Fundamental concepts
2. Primes
3. Congruences and basic theorems
4. Solving congruences and systems of them
5. Diophantine equations
6. Applications - calculation with large integers, cryptography

Chapter 12. Algebraic structures
1. Posets and Boolean algebras
2. Polynomial rings
3. Groups
4. Coding theory
5. Systems of polynomial equations

Chapter 13. Combinatorial methods, graphs, and algorithms
1. Elements of graph theory
2. A few graph algorithms
3. Remarks on computational geometry
4. Remarks on more advanced combinatorial calculations

Contents - practice

Chapter 1. Initial warmup
A. Numbers and functions
B. Difference equations
C. Combinatorics
D. Probability
E. Plane geometry
F. Relations and mappings
G. Additional exercises for the whole chapter

Chapter 2. Elementary linear algebra
A. Vectors and matrices
B. Determinants
C. Vector spaces and linear mappings
D. Properties of linear maps
E. Additional exercises for the whole chapter

Chapter 3. Linear models and matrix calculus
A. Linear optimization
B. Difference equations
C. Population models
D. Markov processes
E. Unitary spaces
F. Matrix decompositions
G. Additional exercises for the whole chapter

Chapter 4. Analytic geometry
A. Affine geometry
B. Euclidean geometry
C. Geometry of quadratic forms
D. Further exercise on this chapter

Chapter 5. Establishing the ZOO
A. Polynomial interpolation
B. Topology of real numbers and their subsets
C. Limits
D. Continuity of functions
E. Derivatives
F. Extremal problems
G. L'Hospital's rule
H. Infinite series
I. Power series
J. Additional exercises for the whole chapter

Chapter 6. Differential and integral calculus
A. Derivatives of higher orders
B. Integration
C. Power series
D. Extra examples for the whole chapter

Chapter 7. Continuous tools for modelling
A. Orthogonal systems of functions
B. Fourier series
C. Convolution and Fourier transform
D. Laplace transform
E. Metric spaces
F. Convergence
G. Topology
H. Additional exercises to the whole chapter

Chapter 8. Calculus with more variables
A. Multivariate functions
B. The topology of E_n
C. Limits and continuity of multivariate functions
D. Tangent lines, tangent planes, graphs of multivariate functions
E. Taylor polynomials
F. Extrema of multivariate functions
G. Implicitly given functions and mappings
H. Constrained optimization
I. Volumes, areas, centroids of solids
J. First-order differential equations
K. Practical problems leading to differential equations
L. Higher-order differential equations
M. Applications of the Laplace transform
N. Numerical solution of differential equations
O. Additional exercises to the whole chapter

Chapter 9. Continuous models - further selected topics
A. Exterior differential calculus
B. Applications of Stokes' theorem
C. Equation of heat conduction
D. Variational calculus
E. Complex analytic functions

Chapter 10. Statistics and probability methods
A. Dots, lines, rectangles
B. Visualization of multidimensional data
C. Classical and conditional probability
D. What is probability?
E. Random variables, density, distribution function
F. Expected value, correlation
G. Transformations of random variables
H. Inequalities and limit theorems
I. Testing samples from the normal distribution
J. Linear regression
K. Bayesian data analysis
L. Processing of multidimensional data

Chapter 11. Number theory
A. Basic properties of divisibility
B. Congruences
C. Solving congruences
D. Diophantine equations
E. Primality tests
F. Encryption
G. Additional exercises to the whole chapter
Chapter 12. Algebraic structures
A. Boolean algebras and lattices
B. Rings
C. Polynomial rings
D. Rings of multivariate polynomials
E. Algebraic structures
F. Groups
G. Burnside's lemma
H. Codes
I. Extension of the stereographic projection
J. Elliptic curves
K. Gröbner bases

Chapter 13. Combinatorial methods, graphs, and algorithms
A. Fundamental concepts
B. Fundamental algorithms
C. Minimum spanning tree
D. Flow networks
E. Classical probability and combinatorics
F. More advanced problems from combinatorics
G. Probability in combinatorics
H. Combinatorial games
I. Generating functions
J. Additional exercises to the whole chapter

Index

Preface

The motivation for this textbook came from many years of lecturing Mathematics at the Faculty of Informatics of Masaryk University in Brno. The programme requires an introduction to genuine mathematical thinking and precision. The endeavor has been undertaken by Jan Slovák and Martin Panák since 2004, with further collaborators joining later.

Our goal was to cover seriously, but quickly, about as many of the mathematical methods as are usually seen in larger courses in the classical Science and Technology programmes. At the same time, we did not want to give up the completeness and correctness of the mathematical exposition. We wanted to introduce and explain the more demanding parts of Mathematics together with elementary explicit examples of how to use the concepts and results in practice. But we did not want to decide how much theory or practice the reader should enjoy, and in which order. All these requirements have led us to the two-column format of the textbook, where the theoretical explanations on one side are split from the practical procedures and exercises on the other. This way, we want to encourage and help the readers to find their own way.
Either they go through the examples and algorithms first, and only then come to the explanations of why things work, or the other way round. We also hope to overcome the usual stress of readers horrified by the amount of material. With our text, they are not supposed to read through the book in a linear order. On the contrary, the readers should enjoy browsing through the text and finding their own thrilling paths through the new mathematical landscapes.

In both columns, we intend to present a rather standard exposition of basic Mathematics, focusing on the essence of the concepts and their relations. The exercises address simple mathematical problems, but we also try to show the use of mathematical models in practice as much as possible. We are aware that the text is written in a very compact and non-homogeneous way. A lot of details are left to the readers, in particular in the more difficult paragraphs, while we try to provide a lot of simple intuitive explanation when introducing new concepts or formulating important theorems. Similarly, the examples range from very simple ones to those requiring independent thinking.

We would very much like to help the reader:
• to formulate precise definitions of basic concepts and to prove simple mathematical results;
• to perceive the meaning of roughly formulated properties, relations and outlooks for exploring mathematical tools;
• to understand the instructions and algorithms underlying mathematical models and to appreciate their usage.

These goals are ambitious, and there are no simple paths to reaching them without failures on the way. This is one of the reasons why we come back to basic ideas and concepts several times, with growing complexity and width of the discussions. Of course, this might also look chaotic, but we very much hope that this approach gives a better chance to those who persist in their efforts.
We also hope this textbook will be a perfect beginning and help for everybody who is ready to think and to return to earlier parts again and again. To make the task simpler and more enjoyable, we have added what we call "emotive icons". We hope they will enliven the dry mathematical text and indicate which parts should be read more carefully, or better left out in the first round. The usage of the icons follows the feelings of the authors, and we tried to use them in a systematic way; still, we hope the readers will assign their own meaning to the icons individually. Roughly speaking, some icons indicate complexity and difficulty, others indicate unpleasant technicality and the need for patience, or possible entertainment and pleasure; similar icons are used in the practical column.

The practical column, with the solved problems and exercises, should be readable nearly independently of the theory. Without the ambition to know the deeper reasons why the algorithms work, it should be possible to read mainly just this column. In order to help such readers, some definitions and descriptions in the theoretical text are marked so as to catch the eye easily when reading the exercises. The exercises and the theory are partly coordinated to allow jumping back and forth, but the links are not tight. The numbering in the two columns is distinguished by the different formats of section numbers: those like 1.2.1 belong to the theoretical column, while 1.B.4 points to the practical column. The equations are numbered within subsections, and references to them include the subsection numbers if necessary.

In general, our approach stresses the fact that the methods of so-called discrete Mathematics seem to be more important for mathematical models nowadays. They also seem simpler to perceive and grasp. However, the continuous methods are strictly necessary too.
First of all, classical continuous mathematical analysis is essential for understanding the convergence and robustness of computations. It is hard to imagine how to deal with error estimates and the computational complexity of numerical processes without it. Moreover, continuous models are often efficient and effectively computable approximations to discrete problems coming from practice.

As usual with textbooks, there are numerous figures completing the exposition. We very much advise the readers to draw their own pictures whenever necessary, in particular in the later chapters, where we provide only a few. The rough structure of the book and the dependencies between its chapters are depicted in the diagram below. The darker the colour, the more demanding the particular chapter (or at least its essential parts). In particular, chapters 7 and 9 include a lot of material which would perhaps not be covered in regular course activities or required at exams in great detail. The solid arrows mean strong dependencies, while the dashed links indicate only partial dependencies. In particular, the textbook could support courses starting with any of the white boxes, i.e. aiming at standard linear algebra and geometry (chapters 2 through 4), the discrete chapters of mathematics (11 through 13), or the rudiments of Calculus (5, 6, 8).

[Diagram: the dependency graph of chapters 1 through 13.]

All topics covered in the book are now included (in more or less detail) in our teaching of large four-semester courses within our Mathematics minor programme, complemented by numerical seminars. In our teaching, the first semester covers chapters 1 and 2 and selected topics from chapters 3 and 4. The second semester essentially includes chapters 5, 6, and 7. The third semester is now split into two parts: the first one is covered by chapter 8 (with only a few glimpses towards the more advanced topics from chapter 9), while the rest of the semester is devoted to the rudiments of graph theory in chapter 13. The last semester provides large parts of chapters 11 through 13. Actually, the second semester could be offered in parallel with the first one, while the fourth semester could follow immediately after the first one. Probability and statistics (chapter 10) are offered as a separate course in parallel.

CHAPTER 1

Initial warmup

"value, difference, position" - what is it and how do we comprehend it?

A. Numbers and functions

We can already work with natural, integer, rational and real numbers. We explain why the rational numbers are not sufficient for us (although computers are actually not able to work with any others) and we recall the complex numbers (because even the real numbers are not adequate for some calculations).

1.A.1. Show that the integer 2 does not have a rational square root.

Solution. Already the ancient Greeks knew that if we prescribe the area of a square as a^2 = 2, then we cannot find a rational side a to satisfy it. Why? Assume that (p/q)^2 = 2 for natural numbers p and q which do not have a common divisor greater than 1 (otherwise we can further reduce the fraction p/q). Then p^2 = 2q^2 is an even number. Thus the left-hand side p^2 is even, and therefore so is p, because the alternative that p is odd would imply the contradiction that p^2 is odd. Hence p is even, and so p^2 is divisible by 4.
Then 2q^2 = p^2 is divisible by 4, so q^2 is even, and hence q must be even too. This implies that p and q both have 2 as a common factor, which is a contradiction. □

The goal of this first chapter is to introduce the reader to the fascinating world of mathematical thinking. The name of the chapter can also be understood as an encouragement for patience. Even the simplest tasks and ideas are easy only for those who have already seen similar ones. A full knowledge of mathematical thinking can be reached only through a long and complicated course of study. We start with the simplest thing: numbers. They will also serve as the first example of how mathematical objects and theories are built. The entire first chapter will become a quick tour through various mathematical landscapes (including germs of analysis, combinatorics, probability, geometry). Our definitions and ideas may sometimes look too complicated and not practical enough. The simpler the objects and tasks are, the more difficult the mastering of the depth and all the nuances of the relevant tools and procedures might be. We shall come back to all of the notions again and again in the further chapters, and hopefully this will be the crucial step towards the ultimate understanding. Thus the advice: do not worry if you find some particular part of the exposition too formal or otherwise difficult - come back later for another look.

1. Numbers and functions

Since the dawn of time, people have wanted to know "how much" of something they have, or "how much" something is worth, "how long" a particular task will take, etc. The answer to such questions is usually expressed by some sort of number. We consider something to be a number if it behaves according to the usual rules - either according to all the rules we accept, or maybe only to some of them. For instance, the result of multiplication does not depend on the order of the multiplicands. We have the number zero whose addition to another number does not change the result.
We have the number one whose product with another number does not change the result. And so on. The simplest example of numbers are the positive integers, which we denote Z+ = {1, 2, 3, ...}. The natural numbers consist of either just the positive integers, or the positive integers together with the number zero. The number zero is the kind of "number" which is either considered to be a natural number, as is usual in computer science, or not a natural number, as is usual in some other contexts. Thus the set of natural numbers is either Z+, or the set N = {0, 1, 2, 3, ...}. To count "one, two, three, ..." is learned already by children at pre-school age. Later on, we meet all the integers Z = {..., -2, -1, 0, 1, 2, ...} and finally we get used to floating-point numbers. We know what a 1.19-multiple of the price means if we have a 19% tax.

1.A.2. Remark. It can be proved that for all positive natural numbers n and x, the n-th root of x is either a natural number or it is not rational; see 1.G.1.

Next, we work out some examples with complex numbers. If you are not familiar with the basic concepts and properties of complex numbers, consult the paragraphs 1.1.3 through 1.1.4 in the other column.

1.A.3. Calculate z_1 + z_2, z_1 · z_2, z̄_1, |z_2| and z_1/z_2 for a) z_1 = 1 - 2i, z_2 = 4i - 3; b) z_1 = 2, z_2 = i.

Solution.
a) z_1 + z_2 = 1 - 3 - 2i + 4i = -2 + 2i,
z_1 · z_2 = 1·(-3) - 8i^2 + 6i + 4i = 5 + 10i,
z̄_1 = 1 + 2i,
|z_2| = √(4^2 + (-3)^2) = √25 = 5,
z_1/z_2 = (1·(-3) + 8i^2 + 6i - 4i)/25 = (-11 + 2i)/25;
b) z_1 + z_2 = 2 + i, z_1 · z_2 = 2i, z̄_1 = 2, |z_2| = 1, z_1/z_2 = -2i. □

1.A.4. Determine |(2 + 3i)(1 + i√3) / (1 - i√3)|.

1.1.1. Properties of numbers. In order to be able to work properly with numbers, we need to be careful with their definition and properties. In mathematics, the basic statements about properties of objects, whose validity is assumed without the need to prove them, are called axioms.
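The complex arithmetic in exercises 1.A.3 and 1.A.4 above can be checked numerically. The following sketch is ours, not part of the book's text; it uses Python's built-in complex type (imaginary unit written with the j suffix), with the values of exercise 1.A.3 a):

```python
# Numerical check of exercise 1.A.3 a): z1 = 1 - 2i, z2 = 4i - 3.
z1 = 1 - 2j
z2 = -3 + 4j

assert z1 + z2 == -2 + 2j             # sum
assert z1 * z2 == 5 + 10j             # product
assert z1.conjugate() == 1 + 2j       # complex conjugate
assert abs(z2) == 5.0                 # absolute value: sqrt((-3)^2 + 4^2)
assert abs(z1 / z2 - (-11 + 2j) / 25) < 1e-12   # quotient, up to rounding

# Exercise 1.A.4: the absolute value of a product (quotient) is the
# product (quotient) of the absolute values, so |w| = |2 + 3i| = sqrt(13).
w = (2 + 3j) * (1 + 1j * 3**0.5) / (1 - 1j * 3**0.5)
assert abs(abs(w) - 13**0.5) < 1e-12
print("all checks passed")
```

Such one-line numerical checks are a convenient way to validate hand computations in the practical column, though they of course prove nothing in general.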
We list the basic properties of the operations of addition and multiplication for our calculations with numbers, which we denote by letters a, b, c, .... Both operations work by taking two numbers a, b. By applying addition or multiplication we obtain the resulting values a + b and a · b.

Properties of numbers

Properties of addition:
(CG1) (a + b) + c = a + (b + c), for all a, b, c;
(CG2) a + b = b + a, for all a, b;
(CG3) there exists 0 such that for all a, a + 0 = a;
(CG4) for all a there exists b such that a + b = 0.

The properties (CG1)-(CG4) are called the properties of a commutative group. They are called respectively associativity, commutativity, the existence of a neutral element (when speaking of addition we usually say zero element), and the existence of an inverse element (when speaking of addition we also say the negative of a and denote it by -a).

Properties of multiplication:
(R1) (a · b) · c = a · (b · c), for all a, b, c;
(R2) a · b = b · a, for all a, b;
(R3) there exists 1 such that for all a, 1 · a = a;
(R4) a · (b + c) = a · b + a · c, for all a, b, c.

The properties (R1)-(R4) are called respectively associativity, commutativity, the existence of a unit element, and distributivity of addition with respect to multiplication. The sets with operations +, · that satisfy the properties (CG1)-(CG4) and (R1)-(R4) are called commutative rings.

Two further properties of multiplication are:
(F) for every a ≠ 0 there exists b such that a · b = 1;
(ID) if a · b = 0, then either a = 0 or b = 0 or both.

The property (F) is called the existence of an inverse element with respect to multiplication (this element is then denoted by a^(-1); for ordinary arithmetic it is called the reciprocal of a, the same as 1/a).

Solution.
Since the absolute value of the product (or ratio) of any two complex numbers is the product (or ratio) of their absolute values, and every complex number has the same absolute value as its complex conjugate, we have
|(2 + 3i)(1 + i√3) / (1 - i√3)| = |2 + 3i| · |1 + i√3| / |1 - i√3| = |2 + 3i| = √(2^2 + 3^2) = √13. □

1.A.5. Simplify the expression (5√3 + 5i)^n for n = 2 and n = 12.

Solution. Using the binomial theorem for n = 2 we get
(5√3 + 5i)^2 = 75 + 2 · 5√3 · 5i - 25 = 50 + 50√3 i.
Taking powers one by one, or expanding via the binomial theorem, would be far too time-consuming in the case n = 12. Let us rather write the number in polar form,
5√3 + 5i = 10(√3/2 + i/2) = 10(cos(π/6) + i sin(π/6)),
and using de Moivre's theorem we easily obtain
(5√3 + 5i)^12 = 10^12 (cos 2π + i sin 2π) = 10^12. □

1.A.6. Determine the distance d between the numbers z and z̄ in the complex plane for z = √3/2 - (3/2)i.

Solution. It is not difficult to realize that complex conjugates are symmetric in the complex plane with respect to the x-axis, and the distance of a complex number from the x-axis equals the absolute value of its imaginary part. That gives d = 2 · |Im z| = 3. □

1.A.7. Express the number z_1 = 2 + 3i in polar form. Express the number z_2 = 3(cos(π/3) + i sin(π/3)) in algebraic form.

Solution. The absolute value |z_1| (the distance of the point with Cartesian coordinates [2, 3] in the plane from the origin) is √(2^2 + 3^2) = √13. From the right triangle in the diagram we compute sin(φ) = 3/√13, cos(φ) = 2/√13. Thus φ = arcsin(3/√13) = arccos(2/√13) ≈ 56.3°. In total,
z_1 = √13 (cos(arccos(2/√13)) + i sin(arcsin(3/√13))).
The transition from polar form to algebraic form is even simpler:
z_2 = 3(cos(π/3) + i sin(π/3)) = 3/2 + (3√3/2) i. □

The property (ID) then says that there exist no "divisors of zero". A divisor of zero is a number a, a ≠ 0, such that there is a number b, b ≠ 0, with ab = 0.

1.1.2. Remarks. The integers Z are a good example of a commutative group.
The natural numbers are not such an example, since they do not satisfy (CG4) (and possibly do not even contain the neutral element, if one does not consider zero to be a natural number). If a commutative ring also satisfies the property (F), we speak of a field (often also called a commutative field). The last stated property (ID) is automatically satisfied if (F) holds. However, the converse statement is false. Thus we say that the property (ID) is weaker than (F). For example, the ring of integers Z does not satisfy (F) but does satisfy (ID). In such a case we use the term integral domain.

Notice that the set of all non-zero elements in a field, along with the operation of multiplication, satisfies (R1), (R2), (R3), (F) and thus is also a commutative group. However, in this case, instead of addition we speak of multiplication. As an example, the set of all non-zero real numbers forms a commutative group under multiplication.

The elements of some set with operations + and · satisfying (not necessarily all of) the stated properties (for example, a commutative field or an integral domain) may be called scalars. To denote them we usually use lowercase Latin letters, either from the beginning or from the end of the alphabet. We will use only these properties of scalars, and thus our results will hold for any objects with such properties. This is the true power of mathematical theories - they do not hold just for a specific solved example. Quite the opposite: when we build ideas in a rational way, they are always universal. We will try to emphasise this aspect, although our ambitions are modest due to the limited size of this book.

Before coming to any use of scalars, we should make a short formal detour and pay attention to their existence. We shall come back to this at the very end of this chapter, when we deal with the formal language of Mathematics in general, cf. the constructions starting in 1.6.5.
There we indicate how to get the natural numbers N, the integers Z, and the rational numbers Q, while the real numbers R will be treated much later, in chapter 5. At this point, let us just remark that it is not enough to pose the axioms of objects. We have to be sure that the given conditions are not in conflict, so that such objects can exist. We suppose the readers are sure about the existence of the domains N, Z, Q and can handle them easily. The real numbers are usually understood as a dense and better version of Q, but what about the domain of complex numbers? As is usual in mathematics, we will use variables (letters of the alphabet or other symbols) to denote numbers, and it does not matter whether we know their value beforehand or not.

1.A.8. Express z = cos 0 + cos(π/3) + i sin(π/3) in polar form.

Solution. To express the number z in polar form, we need to find its absolute value and argument. First we calculate the absolute value:
|z| = √((cos 0 + cos(π/3))^2 + sin^2(π/3)) = √((3/2)^2 + 3/4) = √3.
For the argument φ, we have
cos φ = Re(z)/|z| = (3/2)/√3 = √3/2, sin φ = Im(z)/|z| = (√3/2)/√3 = 1/2,
therefore φ = π/6. Thus z = √3 (cos(π/6) + i sin(π/6)). □

1.A.9. Using de Moivre's theorem, calculate (cos(π/6) + i sin(π/6))^31.

Solution. We obtain
(cos(π/6) + i sin(π/6))^31 = cos(31π/6) + i sin(31π/6) = cos(7π/6) + i sin(7π/6) = -√3/2 - i/2. □

1.A.10. Is "square root" a well-defined function on the complex numbers?

Solution. No. It is well defined only as a function whose domain is the set of non-negative real numbers, with the image being the same set. In the complex domain, for any complex number z (except zero) there are two complex numbers whose square equals z. Both can be called a square root, and they differ by sign (the square root of -1 is, according to this definition, i as well as -i). □

1.1.3. Complex numbers. We are forced to extend the domain of real numbers as soon as we want to see solutions of equations like x^2 = b for all real numbers b.
We know that this equation always has a solution x in the domain of real numbers whenever b is non-negative. If b < 0, then such a real x cannot exist. Thus we need to find a larger domain where this equation has a solution. The crucial idea is to add to the real numbers the new number i, the imaginary unit, for which we require i^2 = -1.

Next we try to extend the definitions of addition and multiplication in order to preserve the usual behaviour of numbers (as summarised in 1.1.1). Clearly we need to be able to multiply the new number i by real numbers and to sum it with real numbers. Therefore we need to work in our newly defined domain of complex numbers C with formal expressions of the form z = a + ib, called the algebraic form of z. The real number a is called the real part of the complex number z, the real number b is called the imaginary part of the complex number z, and we write Re(z) = a, Im(z) = b. It should be noted that if z = a + ib and w = c + id, then z = w implies both a = c and b = d. In other words, we can equate both real and imaginary parts. For positive x we then get (i · x)^2 = -1 · x^2, and thus we can solve the equations as requested.

In order to satisfy all the properties of associativity and distributivity, we define the addition so that we add the real parts and the imaginary parts independently. Similarly, we want the multiplication to behave as if we multiply pairs of real numbers, with the additional rule that i^2 = -1, thus
(a + ib) + (c + id) = (a + c) + i(b + d),
(a + ib) · (c + id) = (ac - bd) + i(bc + ad).

Next, we have to verify all the properties (CG1-4), (R1-4) and (F) of scalars from 1.1.1. But this is an easy exercise: zero is the number 0 + i0, one is the number 1 + i0; both these numbers are for simplicity denoted as before, that is, 0 and 1. For non-zero z = a + ib we easily check that z^(-1) = (a^2 + b^2)^(-1)(a - ib). All other properties are obtained by direct calculations.

1.1.4. The complex plane and polar form.
A complex number is given by a pair of real numbers, and therefore it corresponds to a point in the real plane R^2. The algebraic form z = x + iy corresponds in this picture to understanding the x-coordinate axis as the real part and the y-coordinate axis as the imaginary part of the number. The absolute value of the complex number z is defined as its distance from the origin, thus |z| = √(x^2 + y^2). The reflection with respect to the real axis then corresponds to changing the sign of the imaginary part. We call this operation z ↦ z̄ = x - iy the complex conjugation. Let us now consider complex numbers of the form z = cos φ + i sin φ, where φ is a real parameter giving the angle between the real axis and the line from the origin to z (measured in the positive, i.e. counter-clockwise, sense).

1.A.11. Complex numbers are not just a tool to obtain "weird" solutions to quadratic equations. They are necessary to determine the solutions of cubic equations, even if these solutions are real. How can we express the solutions to the cubic equation x^3 + ax^2 + bx + c = 0 with real coefficients a, b, c? We show a method developed in the sixteenth century by Ferro, Cardano, Tartaglia and possibly others. Substitute x := t - a/3 (to remove the quadratic term from the equation) to obtain the equation
t^3 + pt + q = 0, where p = b - a^2/3 and q = c + (2a^3 - 9ab)/27.
Now introduce unknowns u, v satisfying the conditions u + v = t and 3uv + p = 0. Substitute the first condition into the previous equation to obtain
u^3 + v^3 + (3uv + p)(u + v) + q = 0.
Now use the second condition to eliminate v. This yields
u^6 + qu^3 - p^3/27 = 0,
which is a quadratic equation in the unknown s = u^3. Thus
u^3 = -q/2 ± √(q^2/4 + p^3/27).
By back substitution, we obtain x = -p/(3u) + u - a/3. The expression for u involves a cube root. In order to obtain all three solutions, we need to work with complex roots.
The equation x^3 = a, a ≠ 0, with the unknown x has exactly three solutions in the domain of complex numbers (the fundamental theorem of algebra, see (12.2.8) on page 866). All these three solutions are called cube roots of a. Therefore the expression ∛a has three meanings in the complex domain. If we want a single meaning for that expression, we usually consider it to be the solution with the smallest argument.

1.A.12. Show that the roots ζ_1, ..., ζ_n of the equation x^n = 1 form the vertices of a regular n-gon in the plane of the complex numbers.

Solution. The arguments of the roots are given by de Moivre's theorem: the argument multiplied by n has to be a multiple of 2π, and the absolute value has to be one, so the roots are
ζ_k = cos(2kπ/n) + i sin(2kπ/n), k = 1, ..., n,
which are indeed the vertices of a regular polygon. □

The numbers of the form z = cos φ + i sin φ describe all points on the unit circle in the complex plane. Every non-zero complex number z can then be written as z = |z|(cos φ + i sin φ), the polar form of z.

A mapping (function) f : A → B is a rule assigning to each element x in the domain set A the value f(x) in the codomain set B. The set of all images f(x) ∈ B is called the range of f. The set A or B can be a set of numbers, but there is nothing to stop them being sets of other objects. The mapping f, however it is described, must unambiguously determine a unique member of B for each member of A. In another terminology, the member x ∈ A is often called the independent variable. Then y = f(x) ∈ B is called the dependent variable. We also say that the value y = f(x) is a function of the independent variable x in the domain of f. For now, we shall restrict ourselves to the case where the codomain B is a subset of scalars, and we shall talk about scalar functions.

1.A.13. Show that the roots ζ_1, ζ_2, ..., ζ_n of the equation x^n = 1 satisfy
Σ_{i=1}^{n} ζ_i = 0.

Solution. Let ζ_1 be the root with the smallest positive argument. The other roots satisfy ζ_k = ζ_1^k (see the previous example), thus
Σ_{i=1}^{n} ζ_i = Σ_{i=1}^{n} ζ_1^i = ζ_1 (ζ_1^n - 1)/(ζ_1 - 1) = 0,
where we have summed up the geometric sequence with ratio ζ_1. □

More examples about complex numbers can be found at the end of the chapter, starting at 1.G.2.

1.A.14. Solve the equation x^3 + x^2 - 2x - 1 = 0.

Solution. This equation has no rational roots (methods to determine rational roots will be introduced later, see (??)). Substitution into the formulas obtained in 1.A.11 yields
p = b - a^2/3 = -7/3, q = -7/27.
It follows that
u = ∛(28 ± 84√3 i) / 6.
We could theoretically choose up to six possibilities for u (two for the choice of the sign and three independent choices of the cube root), but we obtain only three distinct values for x. By substitution into the formulas, one of the roots is of the form
x = 14/(3 ∛(28 - 84√3 i)) + ∛(28 - 84√3 i)/6 - 1/3 ≈ 1.247,
and similarly for the other two (approximately -0.445 and -1.802). As noted before, we see that even though we have used complex numbers during the computation, all the solutions are real. □

B.
Difference equations

Difference equations (also called recurrence relations) are relations between the elements of a sequence, where an element of the sequence depends on previous elements. To solve a difference equation means to find an explicit formula for the n-th (that is, an arbitrary) element of the sequence.

The simplest way to define a function appears if A is a finite set. Then we can describe the function f by a table or a listing showing the image of each member of A. We have certainly seen many examples of such functions: let f denote the pay of a worker in some company in a certain year. The values of the independent variable, that is, the domain of the function, are the individual workers x from the set of all considered workers. The value f(x) is their pay for the given year. Similarly we can talk about the age of students or their teachers in years, the litres of beer and wine consumed by individuals from a given group, etc. Another example is a food dispensing machine: the domain of a function f would be the button pushed together with the money inserted, which determine the selection of the food item.

Let A = {1, 2, 3} = B. The set of equalities f(1) = 1, f(2) = 3, f(3) = 3 defines a function f : A → B. Generally, as there are 3 possible values for f(1), and likewise for f(2) and f(3), there are 27 possible functions from A into B in total.

But there are other ways to define a function than as a table. For example, the function f can denote the area of a planar region. Here, the domain consists of subsets of the plane (e.g. all triangles, circles or other planar regions with a defined area), and the range of f consists of the respective areas of the regions. Rather than providing a list of areas for a finite number of regions, we hope for a formula allowing us to compute the functional value f(P) for any given planar region P from a suitable class. Of course, there are many simple functions given by formulas, like the formula f(x) = 3x + 7 with A = B = ℝ or A = B = ℕ.
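The count of 27 functions from A = {1, 2, 3} to B = {1, 2, 3} can be confirmed by brute force. A small sketch (ours, not from the text): a function given by a table of values is exactly one element of the Cartesian product B × B × B.

```python
from itertools import product

A = [1, 2, 3]
B = [1, 2, 3]

# A function f : A -> B is exactly a table of values (f(1), f(2), f(3)),
# i.e. one element of the Cartesian product B x B x B.
tables = list(product(B, repeat=len(A)))
print(len(tables))  # 3^3 = 27 functions

# the particular function f(1) = 1, f(2) = 3, f(3) = 3 is one of them
assert (1, 3, 3) in tables
```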
Not all functions can be given by a formula or a list. For example, let f(t) denote the speed of a car at time t. For any given car and time t, we know there will be a functional value f(t) denoting its speed, which can of course be measured approximately, but is usually not given by a formula. Another example: let f(n) be the n-th digit in the decimal expansion of π = 3.1415.... So for example f(4) = 5. The value of f(n) is defined but unknown if n is large enough.

The mathematical approach in modelling real problems often starts from the indication of certain dependencies between some quantities and aims at explicit formulas for the functions which describe them. Often a full formula is not available, but we may obtain the values f(x) at least for some instances of the independent variable x, or we may be able to find a suitable approximation. We shall see all of the following types of expressions of the requested function f in this book:

• exact finite expression (like the function f(x) = 3x + 7 above);
• infinite expression (we shall come to that only much later, in chapter 5, when introducing the limit processes);
• description of how the function's values change under a given change of the independent variable (this behaviour

If an element of the sequence is determined only by the previous element, we call the relation a first order difference equation. This is a common real world problem, for instance when we want to find out how long the repayment of a loan will take for a fixed monthly repayment, or when we want to know how much we shall pay per month if we want to repay a loan in a fixed time.

1.B.1. Michael wants to buy a new car. The car costs €30 000. Michael wants to take out a loan and repay it with a fixed monthly repayment. The car company offers him a loan to buy the car with a yearly interest rate of 6%. The repayment starts at the end of the first month of the loan. Michael would like to finish repaying the loan in three years.
How much should he pay per month?

Solution. Let P denote the sum Michael has to pay per month. After the first month Michael repays P; part of it repays the loan, part of it pays the interest. Let d_k stand for the debt after k months, write C = 30 000 for the price of the car, and u = 0.06/12 = 0.005 for the monthly interest rate. We know d_0 = C = 30 000, and after the first month d_1 = C - P + u·C. In general, after the k-th month we have

(1) d_k = d_{k-1} - P + u·d_{k-1} = (1 + u)·d_{k-1} - P.

Using the relation (1) from paragraph 1.2.3 we obtain d_k given by (we write a = 1 + u)

d_k = d_0·a^k - P·(a^k - 1)/(a - 1).

Repaying the loan in three years means d_36 = 0, thus

P = 30 000 · (1 + u)^36·u / ((1 + u)^36 - 1) = 30 000 · (0.06/12)·(12.06/12)^36 / ((12.06/12)^36 - 1) ≈ 912.7. □

Note that the recurrence relation (1) can be used for our case as long as all the d_k are positive, that is, as long as Michael still has to repay something.

1.B.2. Consider the case from the previous example. For how long would Michael have to pay, if he repays €500 per month?

Solution. Setting as before a = 1 + 0.06/12 = 1.005 and C = 30 000, the condition d_k = 0 gives the equation

a^k = 200P / (200P - C).

will be displayed under the name difference equation in a moment and under different circumstances later on);
• approximation of a function that cannot be computed with a known one (usually including some error estimates; this could be the case with the car above: say we know it goes with some known speed at the time t = 0, we brake as much as possible on a known surface, and we compute the decrease of speed with the help of some mathematical model);
• finding only the probability of possible values of the function, for example the function giving the length of life of a given group of still living people, in dependence on some health related parameters.

1.1.6. Functions defined explicitly. Let us start with the most desirable case, when the function values are defined by a computable finite formula.
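The loan in 1.B.1 and 1.B.2 is a concrete instance of such a computable formula, and it is easy to reproduce numerically. A sketch (ours; the variable names are chosen freely): it evaluates the closed formula for the monthly payment, then simulates the recurrence d_k = (1 + u)·d_{k-1} - P month by month as a cross-check.

```python
import math

C = 30_000       # price of the car
u = 0.06 / 12    # monthly interest rate
a = 1 + u

# 1.B.1: payment clearing the loan in 36 months (closed formula)
P = C * u * a**36 / (a**36 - 1)
print(round(P, 1))  # about 912.7 euros per month

# cross-check: simulate d_k = (1 + u) d_{k-1} - P
d = C
for _ in range(36):
    d = a * d - P
assert abs(d) < 1e-6  # the debt is exactly repaid after 36 months

# 1.B.2: with P = 500, solve a^k = 200 P / (200 P - C) for k
P2 = 500
k = (math.log(200 * P2) - math.log(200 * P2 - C)) / math.log(a)
print(k)             # about 71.5
print(math.ceil(k))  # 72 monthly payments in total
```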
Of course, we shall be interested also in the efficiency of the formulas, i.e. how fast the evaluations would be. In principle, real computations can involve only a finite number of summations and multiplications of numbers. This is how we define the polynomials, i.e. functions of the form

f(x) = a_n·x^n + ··· + a_1·x + a_0,

where a_0, ..., a_n are known scalars and x is the unknown variable whose value we can insert. Here x^n = x···x means the n-times repeated multiplication of the unit by x (in particular, x^0 = 1), and f(x) is the value of the indicated sum of products. This is a fairly well computable formula for each n ∈ ℕ. The choice n = 0 provides the constant a_0.

The next example is more complicated.

Factorial function. Let A = ℤ+ be the set of positive integers. For each n ∈ ℤ+, define the factorial function by

n! = n(n - 1)(n - 2)···3·2·1.

For convenience we also define 0! = 1 (we will see why this is sensible later on). It is easy to see that n! = n·(n - 1)! for all n > 1. So 1! = 1, 2! = 2·1 = 2, 3! = 3·2·1 = 6, 6! = 720, etc.

The latter example deserves more attention. Notice that we could have defined the factorial by setting A = B = ℕ and giving the equation f(n) = n·f(n - 1) for all n ≥ 1. This does not yet define f, but for each n it does determine what f(n) is in terms of its predecessor f(n - 1). This is sometimes called a recurrence relation. After choosing f(0) = 1, the recurrence now determines f(1), hence successively f(2), etc., and so a function is defined. It is the factorial function as described above.

2. Difference equations

The factorial function is one example of a function which can be defined on the natural numbers by means of a recurrence relation.

By taking logarithms of both sides, we obtain

k = (ln(200P) - ln(200P - C)) / ln a,

which for P = 500 gives approximately k = 71.5; thus Michael would be paying for 72 months (the last repayment would be less than €500). □

1.B.3.
Determine the sequence {y_n}_{n=1}^∞ which satisfies the recurrence relation

y_{n+1} = y_n/q + 1, n ≥ 1, y_1 = 1.

Linear recurrences naturally appear in geometric problems:

1.B.4. Suppose n lines divide the plane into regions. What is the maximum number of regions that can be formed in this way?

Solution. Let the number of regions be p_n. If there is no line in the plane, then the whole plane is one region, thus p_0 = 1. If there are n lines, then adding an (n + 1)-st line increases the number of regions by the number of regions this new line crosses. If no lines are parallel and no three lines intersect at the same point, the number of regions the (n + 1)-st line crosses is one plus the number of its intersections with the previous lines (each crossed region is then divided into two, thus the total number increases by one at every crossing). The new line has at most n intersections with the already-present n lines, and the segment of the line between two intersections crosses exactly one region; thus the new line crosses at most n + 1 regions. We obtain the recurrence relation

p_{n+1} = p_n + (n + 1).

Such a situation can often be seen when formulating mathematical models that describe real systems in economy, biology, etc. We will observe here only a few simple examples and return to this topic in chapter 3.

1.2.1. Linear difference equations of first order. A general difference equation of the first order (or first order recurrence) is an expression of the form

f(n + 1) = F(n, f(n)),

where F is a known function with two arguments (independent variables). If we know the "initial" value f(0), we can compute f(1) = F(0, f(0)), then f(2) = F(1, f(1)), and so on. Using this, we can compute the value f(n) for arbitrary n ∈ ℕ. An example of such an equation is provided by the factorial function f(n) = n!, where

(n + 1)! = (n + 1)·n!
In this way, the value of f(n + 1) depends on both n and the value of f(n); formally we would express this recurrence in the form F(x, y) = (x + 1)·y. A very simple example is f(n) = C for some fixed scalar C and all n. Another example is the linear difference equation of first order

(1) f(n + 1) = a·f(n) + b,

where a ≠ 0 and b are fixed numbers. Such a difference equation is easy to solve if b = 0, for then it is the well-known recurrent definition of the geometric progression: we have f(1) = a·f(0), f(2) = a·f(1) = a^2·f(0), and so on. Hence for all n we have f(n) = a^n·f(0). This is also the relation for the Malthusian population growth model, which is based on the assumption that the population size grows with a constant rate when measured at a sequence of fixed time intervals.

We will prove a general result for first order equations with variable coefficients, namely

(2) f(n + 1) = a_n·f(n) + b_n.

We use the usual notation Σ for sums and the similar notation Π for products, with the convention that when the index set is empty, the sum is zero and the product is one.

1.2.2. Proposition. The general solution of the first order difference equation (2) from the previous paragraph with the initial condition f(0) = y_0 is, for n ∈ ℕ, given by the formula

(1) f(n) = (Π_{i=0}^{n-1} a_i)·y_0 + Σ_{j=0}^{n-2} (Π_{i=j+1}^{n-1} a_i)·b_j + b_{n-1}.

for which p_0 = 1. We obtain an explicit formula for p_n either by applying the formula in 1.2.2 or directly:

p_n = p_{n-1} + n = p_{n-2} + (n - 1) + n = p_{n-3} + (n - 2) + (n - 1) + n = ··· = p_0 + (1 + 2 + ··· + n) = 1 + n(n + 1)/2 = (n^2 + n + 2)/2. □

Recurrence relations can be more complex than those of first order. We show an example of a combinatorial problem for whose solution a recurrence relation can be used.

1.B.5. How many words of length 12 are there that consist only of the letters A and B, but do not contain the sub-word BBB?

Solution. Let a_n denote the number of words of length n consisting of letters A and B but without BBB as a sub-word.
Then for a_n (n > 3) the following recurrence holds:

a_n = a_{n-1} + a_{n-2} + a_{n-3},

since the words of length n that satisfy the given condition end either with an A, or with an AB, or with an ABB. There are a_{n-1} words ending with an A (preceding the last A there can be an arbitrary word of length n - 1 satisfying the condition); analogously for the two remaining groups. Further, it is easily shown that a_1 = 2, a_2 = 4 and a_3 = 7. Using the recurrence relation we can then compute a_12 = 1705. We could also derive an explicit formula for the n-th element of the sequence using the theory which we will develop in chapter 3. □

1.B.6. Partial difference equations. The recurrence relation in the next problem has a more complex form than those we have dealt with in our theory, so we cannot evaluate an arbitrary member of the sequence P(k,l) explicitly; we can only compute it successively from the previous elements. Such an equation is called a partial difference equation, since the terms of the equation are indexed by two independent variables (k, l). The score of a basketball match between the teams of the Czech Republic and Russia after the first quarter is 12 : 9 for the Russian team. In how many ways could the score have developed?

Proof. We use mathematical induction. The result clearly holds for n = 1, since f(1) = a_0·y_0 + b_0. Assuming that the statement holds for some fixed n, we compute

f(n + 1) = a_n·((Π_{i=0}^{n-1} a_i)·y_0 + Σ_{j=0}^{n-2} (Π_{i=j+1}^{n-1} a_i)·b_j + b_{n-1}) + b_n
= (Π_{i=0}^{n} a_i)·y_0 + Σ_{j=0}^{n-1} (Π_{i=j+1}^{n} a_i)·b_j + b_n,

as can be seen directly by multiplying out. □

Note that for the proof we did not use anything about the numbers except for the properties of a commutative ring.

1.2.3. Corollary. The general solution of the linear difference equation (1) from 1.2.1 with a ≠ 1 and initial condition f(0) = y_0 is

(1) f(n) = a^n·y_0 + ((1 - a^n)/(1 - a))·b.

Proof.
If we set a_i and b_i to be constants and use the general formula 1.2.2(1), we obtain

f(n) = a^n·y_0 + b·(1 + Σ_{j=0}^{n-2} a^{n-j-1}).

We observe that the expression in the bracket is 1 + a + ··· + a^{n-1}. The sum of this geometric progression follows from 1 - a^n = (1 - a)(1 + a + ··· + a^{n-1}). □

The proof of the former proposition is a good example of a mathematical result where the verification is quite easy, as soon as someone tells us the theorem. Mathematical induction is a natural method of proof. Note that for calculating the sum of a geometric progression we required the existence of the inverse element for non-zero scalars; we could not do that with integers only. Thus the last result holds for fields of scalars, and we can use it for linear difference equations where the coefficients a, b and the initial condition f(0) = y_0 are rational, real or complex numbers. This last result also holds in the ring of remainder classes ℤ_k with prime k (we will define remainder classes in paragraph 1.6.7).

It is noteworthy that the formula (1) is valid with integer coefficients and integer initial conditions. Here, we know in advance that each f(n) is an integer, and the integers are a subset of the rational numbers; thus our formula necessarily gives correct integer solutions. Observing the proof in more detail, we see that 1 - a^n is always divisible by 1 - a, so the last paragraph should not have surprised us. However, with scalars from ℤ_4 and, say, a = 3, we fail, since 1 - a = -2 ≡ 2 is a divisor of zero and as such does not have an inverse in ℤ_4.

Solution. We can divide all possible evolutions of the quarter with the final score k : l into six mutually exclusive possibilities, according to which team scored, and how much the score was worth (1, 2 or 3 points).
If we denote by P(k,l) the number of ways in which the score could have developed for a quarter that ended with k : l, then for k, l ≥ 3 the following recurrence relation holds:

P(k,l) = P(k-3,l) + P(k-2,l) + P(k-1,l) + P(k,l-1) + P(k,l-2) + P(k,l-3).

Using the symmetry of the problem, P(k,l) = P(l,k). Further, for k ≥ 3:

P(k,2) = P(k-3,2) + P(k-2,2) + P(k-1,2) + P(k,1) + P(k,0),
P(k,1) = P(k-3,1) + P(k-2,1) + P(k-1,1) + P(k,0),
P(k,0) = P(k-3,0) + P(k-2,0) + P(k-1,0),

which, along with the initial conditions, gives P(0,0) = 1, P(1,0) = 1, P(2,0) = 2, P(3,0) = 4, P(1,1) = 2, P(2,1) = P(1,1) + P(0,1) + P(2,0) = 5, P(2,2) = P(0,2) + P(1,2) + P(2,1) + P(2,0) = 14. Hence by repeatedly using the above equations, we eventually obtain P(12,9) = 497178513. □

We will discuss recurrent formulas (difference equations) of higher order with constant coefficients in chapter 3.

C. Combinatorics

In this section we use natural numbers to describe some indivisible items located in real life space, and deal with questions such as how to compute the number of their (pre)orderings, choices, and so on. In many of these problems "common sense" is sufficient; we just need to use the rules of product and sum in the right way, as we show in the following examples.

The linear difference equation 1.2.1(1) can be neatly interpreted as a mathematical model in finance, e.g. savings or loan payoff with a fixed interest rate a and fixed repayment b. (The cases of savings and loans differ only in the sign of b.) With varying parameters a and b we obtain a similar model with varying interest rate and repayment: we can imagine for instance that n is the number of months, a_n is the interest rate in the n-th month, and b_n the repayment in the n-th month.

1.2.4. A nonlinear example. When discussing linear difference equations, we mentioned a very primitive population growth model which depends directly on the momentary population size p.
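The partial difference equation from 1.B.6 is straightforward to evaluate by dynamic programming. The following sketch (ours, not from the text) fills the table P(k,l) from the initial condition and reproduces the count for the score 12 : 9.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def P(k, l):
    """Number of ways a quarter can reach the score k : l,
    when each scoring event is worth 1, 2 or 3 points."""
    if k < 0 or l < 0:
        return 0
    if k == 0 and l == 0:
        return 1
    return (P(k - 1, l) + P(k - 2, l) + P(k - 3, l)
            + P(k, l - 1) + P(k, l - 2) + P(k, l - 3))

# the small values listed in the text
assert P(2, 1) == 5 and P(2, 2) == 14 and P(3, 0) == 4

print(P(12, 9))  # 497178513
```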
At first sight it is clear that such a model with a > 1 leads to a very rapid and unbounded growth. A more realistic model has such a population change Δp(n) = p(n + 1) - p(n) only for small values of p, that is, Δp/p ≈ r > 0. Thus if we want to let the population grow by 5% per time interval only for small p, then we choose r to be 0.05. For some limiting value p = K > 0 the population may not grow; for even greater values it may even decrease, for instance if the resources for the feeding of the population are limited, or if individuals in a large population are obstacles to each other, etc.

Assume that the values y_n = Δp(n)/p(n) change linearly in p(n). Graphically, we can imagine this dependence as a line in the plane of the variables p and y. This line passes through the point [0, r], so that y = r when p = 0. This line also passes through [K, 0], since this gives the second condition, namely that when p = K the population does not change. Thus we set

y = -(r/K)·p + r.

By setting y = y_n = Δp(n)/p(n) and p = p(n) we obtain

(p(n + 1) - p(n))/p(n) = -(r/K)·p(n) + r.

By multiplying out, we obtain a difference equation of first order with p(n) present as both a first and a second power:

(1) p(n + 1) = p(n)·(1 - (r/K)·p(n) + r).

1.C.1. Mother wants to give John and Mary five pears and six apples. In how many ways can she divide the fruit between them? (We consider the pears to be indistinguishable, and likewise the apples. The possibility that one of the children gets nothing is not excluded.)

Try to think through the behaviour of this model for various values of r and K. In the diagram we can see the results for the parameters r = 0.05 (that is, five percent growth in the ideal state) and K = 100 (resources limit the population to the size 100), with p(0) = 2, so we have initially two individuals.

Solution. The five pears can be divided in six ways (the division is determined by the number of pears given to John; the rest goes to Mary).
The six apples can be divided in seven ways. These divisions are independent, so by the rule of product the total number is 6·7 = 42. □

1.C.2. Determine the number of four-digit numbers which either start with the digit 1 and do not end with the digit 2, or end with the digit 2 but do not start with the digit 1 (of course, the first digit must not be zero).

Solution. The set of numbers described in the statement consists of two disjoint sets, so the total count is obtained by summing the counts of these two sets. The first set contains the numbers of the form "1XXY", where X is an arbitrary digit and Y is any digit except 2. We can choose the second digit in ten ways, independently of that the third digit in ten ways, and again independently the fourth digit in nine ways; these three choices then uniquely determine a number. By multiplication, there are 10·10·9 = 900 such numbers. Similarly, the second set contains 8·10·10 = 800 numbers of the form "YXX2" (for the first digit we have only eight choices, since the number can start neither with zero nor with one). By addition, the solution is 900 + 800 = 1700 numbers. □

In the following examples we will use the notions of combinations and permutations (possibly with repetitions).

1.C.3. During a conference, 8 speakers are scheduled. Determine the number of all possible orderings in which two given speakers do not speak one right after the other.

Solution. Denote the two given speakers by A and B. If B speaks directly after A, we can consider the pair as a speech by a single speaker AB. The number of all orderings where B speaks directly after A is therefore 7!, the number of permutations of seven elements. By symmetry, the number of all orderings where A speaks directly after B is also 7!. Since the number of all possible orderings of eight speakers is 8!, the solution is 8! - 2·7!. □

1.C.4.
How many rearrangements of the letters of the word PROBLEM are there, such that a) the letters B and R are next to each other, b) the letters B and R are not next to each other?

Solution. a) The pair of letters B and R can be treated as a single indivisible "double-letter". In total we have six

(Diagram: the solution p(n) of the nonlinear recurrence (1) for r = 0.05, K = 100 and p(0) = 2, with n running up to about 200.)

Note that the original almost exponential growth slows down later, and the population size approaches the desired limit of 100 individuals. For p much smaller than K, the right side of the equation (1) is approximately p(n)(1 + r); that is, the behaviour is similar to that of the Malthusian model. On the other hand, if p is almost equal to K, the right side of the equation is approximately p(n). For an initial value of p greater than K the population size will decrease; for an initial value of p less than K the population size will increase.¹

3. Combinatorics

A typical "combinatorial" problem is to count in how many ways something can happen. For instance, in how many ways can we choose two different sandwiches from the daily offering in a grocery shop? In this situation we first need to decide what we mean by different. Do we allow the choice of two "identical" sandwiches? Many such questions occur in the context of card games and other games. The solution of particular problems usually involves either some multiplication of partial results (if the individual possibilities are independent) or some addition (if their occurrences are mutually exclusive). This is demonstrated in many examples in the problem column (cf. several problems starting with 1.C.1).

1.3.1. Permutations. Suppose we have a set of n (distinguishable) objects, and we wish to arrange them in some order. We can choose a first object in n ways, then a second in n - 1 ways, a third in n - 2 ways, and so on, until we choose the last object, for which there is only one choice.
The total number of possible arrangements is the product of these; hence there are exactly

n! = n(n - 1)(n - 2)···3·2·1

distinct orders of the objects. Each ordering of the elements of a set S is called a permutation of the elements of S. The number of permutations of a set with n elements is n!.

¹ This model is called the discrete logistic model. Its continuous version was introduced already in 1845 by Pierre François Verhulst. Depending on the proportions of the parameters r, K and p(0), the behaviour can be very diverse, including chaotic dynamics. There is much literature on this model.

distinct letters, and there are 6! words of six indivisible letters. We have to multiply this by two, since the double-letter can be either BR or RB. Thus the solution is 2·6!. b) The events in b) form the complement of part a) in the set of all rearrangements of the seven letters. The solution is therefore 7! - 2·6!. □

1.C.5. In how many ways can an athlete place 10 distinct cups on 5 shelves, given that all 10 cups fit on any shelf?

Solution. Add 4 indistinguishable items, say separators, to the cups. The number of all distinct orderings of cups and separators is 14!/4! (the separators are indistinguishable). Each placement of cups on shelves corresponds to exactly one ordering of cups and separators: the cups before the first separator in the ordering are placed on the first shelf (preserving the order), the cups between the first and the second separator on the second shelf, and so on. Thus the required number is 14!/4!. □

1.C.6. Determine the number of four-digit numbers with exactly two distinct digits. (Recall that the first digit must not be 0.)

Solution. First solution. If 0 is one of the digits, then there are 9 choices for the other digit, which must also be the first digit. There are three numbers with a single 0, three numbers with two 0's, and just one number with three 0's.
Thus there are 9·(3 + 3 + 1) = 63 numbers which contain the digit 0. Otherwise, choose the first digit, for which there are 9 choices. There are then 8 choices for the other digit and 3 + 3 + 1 numbers for each choice, making 9·8·(3 + 3 + 1) = 504 numbers which do not contain the digit 0. The solution is 504 + 63 = 567 numbers.

Second solution. The two distinct digits used for the number can be chosen in c(10, 2) ways. From the two chosen digits we can compose 2^4 - 2 distinct four-digit strings (we subtract the 2 for the two strings which use only one of the chosen digits). In total we have c(10, 2)·(2^4 - 2) = 630 numbers. But in this way we have also counted the numbers that start with zero; of these there are 9·(2^3 - 1) = 63. Thus the solution is 630 - 63 = 567 numbers. □

1.C.7. There are 677 people at a concert. Do some of them have the same (ordered) pair of name initials?

Solution. There are 26 letters in the alphabet, so the number of all possible pairs of initials is 26^2 = 676. Thus at least two people have the same initials. □

We can identify the elements of S by numbering them (using the digits from one to n), that is, we identify S with the set S = {1, ..., n} of n natural numbers. Then the permutations correspond to the possible orderings of the numbers from one to n. Thus we have an example of a simple mathematical theorem, and this discussion can be considered to be its proof.

Number of permutations

Proposition. The number p(n) of distinct orderings of a finite set with n elements is given by the factorial function:

(1) p(n) = n!

Suppose S is a set with n elements, and we wish to choose and arrange in order just k of the members of S, where 1 ≤ k ≤ n. This is called a k-permutation without repetition of the n elements. The same reasoning as above shows that this can be done in

v(n, k) = n(n - 1)(n - 2)···(n - k + 1) = n!/(n - k)!

ways.
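The count v(n, k) = n!/(n - k)!, and with it the speaker problem 1.C.3, can be checked by brute force. A sketch (ours, not from the text), using only the standard library:

```python
import math
from itertools import permutations

# v(n, k): ordered selections of k out of n objects, no repetition
n, k = 8, 3
assert len(list(permutations(range(n), k))) == \
    math.factorial(n) // math.factorial(n - k)

# 1.C.3: orderings of 8 speakers in which speakers A (0) and B (1)
# never speak one right after the other
count = sum(
    1
    for order in permutations(range(8))
    if abs(order.index(0) - order.index(1)) > 1
)
assert count == math.factorial(8) - 2 * math.factorial(7)
print(count)  # 8! - 2*7! = 30240
```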
The right side of this result also makes sense for k = 0 (there is just one way of choosing nothing) and for k = n, since 0! = 1. Now we modify the problem to the case where the order of selection is immaterial.

1.3.2. Combinations. Consider a set S with n elements. A k-combination of the elements of S is a selection of k elements of S, 0 ≤ k ≤ n, where order does not matter. For k ≥ 1, the number of possible results of choosing our k elements one after another is n(n - 1)(n - 2)···(n - k + 1) (a k-permutation), and we obtain the same k-tuple in k! distinct orders. Hence the number of k-combinations is

n(n - 1)(n - 2)···(n - k + 1)/k! = n!/((n - k)!·k!).

If k = 0, the same formula is still true, since 0! = 1 and there is just one way to select nothing.

Combinations

Proposition. The number c(n, k) of combinations of k-th degree among n elements, where 0 ≤ k ≤ n, is

(1) c(n, k) = (n over k) = n(n - 1)···(n - k + 1) / (k(k - 1)···1) = n!/((n - k)!·k!).

We pronounce the binomial coefficient c(n, k) as "n over k" or "n choose k". The name stems from the binomial expansion, which is the expansion of (a + b)^n. If we expand (a + b)^n, the coefficient of a^k·b^{n-k} is the number of ways to choose a

1.C.8. New players meet in a volleyball team (6 people). How many handshakes are there when everybody shakes hands once with everybody else? How many handshakes are there if everybody shakes hands once with each opponent after playing a match?

Solution. Each pair of players shakes hands at the introduction. The number of handshakes is then the number of combinations c(6, 2) = 15. After a match, each of the six players shakes hands six times (once with each of the six opponents); thus the required number is 6^2 = 36. □

1.C.9. In how many ways can five people be seated in a car for five people, if only two of them have a driving licence? In how many ways can 20 passengers and two drivers be seated in a bus for 25 people?

Solution.
For the driver's place we have two choices, and the other places are then arbitrary: for the second seat we have four choices, for the third three choices, then two, and then one. That makes 2·4! = 48 ways. Similarly, in the bus we have two choices for the driver, and then the other driver plus the passengers can be seated among the 24 remaining seats arbitrarily. First choose the 21 seats to be occupied, that is, c(24, 21) choices; among these seats the people can be seated in 21! ways. The solution is 2·c(24, 21)·21! = 2·24!/3! ways. □

1.C.10. Determine the number of distinct arrangements which can arise by permuting the letters in each individual word in the sentence "Pull up if I pull up" (the arising arrangements and words do not have to make any sense).

Solution. Let us first compute the number of rearrangements of letters in the individual words. From each word "pull" we obtain 4!/2! distinct anagrams (permutations with repetition, P(1, 1, 2)); similarly, "up" and "if" yield two each. Therefore, using the rule of product, we have (4!/2!)·2·2·1·(4!/2!)·2 = 1152. Notice that if the resulting arrangement were required to be a palindrome again, there would be only four possibilities. □

1.C.11. In how many ways can we insert five golf balls into five holes (one ball into every hole), if we have four identical white balls, four identical blue balls and three identical red balls?

Solution. First solve the problem in the case that we have five balls of every colour. In this case it amounts to a free choice of five elements from three possibilities (there is a choice out of

k-tuple from the n parentheses in the product (from these parentheses we take a, from the others we take b). Therefore we have

(2) (a + b)^n = Σ_{k=0}^n c(n, k)·a^k·b^{n-k}.

Note that only distributivity, commutativity and associativity of multiplication and summation were necessary; the formula (2) therefore holds in every commutative ring. We present a few simple propositions about binomial coefficients, another simple example of a mathematical proof.
If needed, we define c(n, k) = 0 whenever k < 0 or k > n.

1.3.3. Proposition. For all non-negative integers n and all integers k, 0 ≤ k ≤ n, we have

(1) c(n, k) = c(n, n - k);
(2) c(n + 1, k) = c(n, k - 1) + c(n, k);
(3) Σ_{k=0}^n c(n, k) = 2^n;
(4) Σ_{k=0}^n k·c(n, k) = n·2^{n-1}.

Proof. The first two claims follow directly from the formula for the binomial coefficients. To prove (3) and (4) we use mathematical induction. Mathematical induction consists of two steps. In the initial step, we establish the claim for n = 0 (in general, for the smallest n the claim should hold for). In the inductive step we assume that the claim holds for some n (and all smaller numbers), and we use this to prove that this implies the claim for n + 1. The principle of mathematical induction then asserts that the claim holds for every n.

The claim (3) clearly holds for n = 0, since c(0, 0) = 1 = 2^0. It holds also for n = 1. Now assume that the claim holds for some n ≥ 1. Using the claim (2) and the inductive assumption, we calculate

Σ_{k=0}^{n+1} c(n + 1, k) = Σ_{k=0}^{n+1} (c(n, k - 1) + c(n, k)) = 2^n + 2^n = 2^{n+1}.

Note that the formula (3) gives the number of all subsets of an n-element set, since c(n, k) is the number of all subsets of size k. Note also that (3) follows from 1.3.2(2) by choosing a = b = 1. To prove (4) we again employ induction, as we did in (3). For n = 0 the claim clearly holds. The inductive assumption

three colours for every hole), that is, permutations with repetition (see 1.3.4). We have V(3, 5) = 3^5. Now subtract the configurations where there are five balls of one colour (there are three such), or exactly four red balls (there are 2·5 = 10 of those: we first choose the colour of the non-red ball, two ways, and then the hole it is in, five ways). Thus we can do it in 3^5 - 3 - 10 = 230 ways. □

1.C.12. In how many ways can we insert into three distinct envelopes five identical 10-bills and five identical 100-bills such that no envelope stays empty?

Solution. First compute the number of insertions ignoring the non-emptiness condition. It is an example of 3-combinations with repetition from 5 elements, and since we insert the 10-bills and the 100-bills independently, we have c(7, 2)^2 ways.
Now subtract the insertions where exactly one envelope stays empty, and then those where exactly two stay empty (for a fixed empty envelope, each kind of bill goes into the remaining two envelopes in $\binom{6}{1} = 6$ ways, and we discard the 2 distributions leaving one of those two envelopes empty as well). We obtain
$$\binom{7}{2}^2 - 3\left(\binom{6}{1}^2 - 2\right) - 3 = 21^2 - 3(6^2 - 2) - 3 = 336. \;\square$$

1.C.13. For fixed $n, k \in \mathbb{N}$, determine the number of all solutions to the equation $x_1 + x_2 + \cdots + x_k = n$ in the set of non-negative integers.

Solution. Every solution $(r_1, \dots, r_k)$, $\sum_{i=1}^{k} r_i = n$, can be uniquely encoded as a sequence of ones and separators, where we first write $r_1$ ones, then a separator, then $r_2$ ones, then another separator, and so on. Such a sequence then clearly contains $n$ ones and $k-1$ separators. Conversely, every such sequence clearly determines a solution of the given equation. Thus there are exactly as many solutions as there are sequences, that is, $\binom{n+k-1}{n}$. □

1.C.14. In how many ways could the English Premier League have finished, if we know that no two of the three teams Newcastle United, Crystal Palace and Tottenham Hotspur are "adjacent" in the final table? (There are 20 teams in the league.)

Solution. First approach. We use the inclusion-exclusion principle. From the number of all possible resulting tables

says that (4) holds for some $n$. We calculate the corresponding sum for $n+1$ using (2) and the inductive assumption. We obtain
$$\sum_{k=0}^{n+1} k\binom{n+1}{k} = \sum_{k=0}^{n+1} k\left(\binom{n}{k-1} + \binom{n}{k}\right) = \sum_{j=0}^{n} (j+1)\binom{n}{j} + \sum_{k=0}^{n} k\binom{n}{k}$$
$$= \sum_{j=0}^{n} \binom{n}{j} + \sum_{j=0}^{n} j\binom{n}{j} + n\,2^{n-1} = 2^n + n\,2^{n-1} + n\,2^{n-1} = (n+1)\,2^n.$$
This completes the inductive step and the claim is proven for all natural $n$. □

The second property from above allows us to write down all the binomial coefficients in the Pascal triangle.² Here, every coefficient is obtained as the sum of the two coefficients situated right "above" it:

n = 0:          1
n = 1:         1 1
n = 2:        1 2 1
n = 3:       1 3 3 1
n = 4:      1 4 6 4 1
n = 5:     1 5 10 10 5 1

Note that the individual rows list the coefficients of the individual powers in the expansion (2). For instance, the last row given says
$$(a+b)^5 = a^5 + 5a^4 b + 10a^3 b^2 + 10a^2 b^3 + 5ab^4 + b^5.$$

1.3.4. Choice with repetitions.
An ordering of $n$ elements, where some of them are indistinguishable, is called a permutation with repetitions. Among the $n$ given elements, suppose there are $p_1$ elements of the first kind, $p_2$ elements of the second kind, ..., $p_k$ of the $k$-th kind, where $p_1 + p_2 + \cdots + p_k = n$. The number of permutations with repetitions of these elements is denoted by $P(p_1, \dots, p_k)$. We consider orderings which differ only in the order of indistinguishable elements to be identical. Elements of the $i$-th kind can be ordered among themselves in $p_i!$ ways, thus we have:

Permutations with repetitions

The number of permutations with repetitions is
$$P(p_1, \dots, p_k) = \frac{n!}{p_1! \cdots p_k!}.$$

Let $S$ be a set with $n$ distinct elements. We wish to select $k$ elements from $S$, with repetition permitted. This is called a $k$-permutation with repetition. Since the first selection can be done in $n$ ways, and similarly the second can

² Although the name goes back to Blaise Pascal's treatise from 1653, such a neat triangle configuration of the numbers $c(n,k)$ was known for centuries earlier in China, India, Greece, etc.

we subtract the tables where some two of the three teams are adjacent and then add back the tables where all three teams are adjacent. The number is then
$$20! - \binom{3}{2}\cdot 2!\cdot 19! + 3!\cdot 18! = 18!\cdot 16\cdot 17.$$

Second approach. Let us consider the three teams to be "separators". The remaining 17 teams have to be placed so that between any two separators there is at least one team, so we choose three of the 18 gaps determined by the remaining teams. Those teams can be arbitrarily permuted, as can the separators. Thus we have
$$\binom{18}{3}\cdot 17!\cdot 3! = 18!\cdot 17\cdot 16$$
ways. □

D. Probability

We present a few simple exercises for classical probability, where we deal with an experiment having only a finite number of outcomes ("all cases") and we are interested in whether or not the outcome of the experiment belongs to a chosen subset of possible outcomes ("favourable outcomes").
The probability we are trying to determine then equals the number of favourable outcomes divided by the total number of all outcomes. Classical probability can be used when we assume, or know, that each possible outcome has the same probability of occurring (for instance, throwing a fair die).

1.D.1. What is the probability that the roll of a die results in a number greater than 4?

Solution. There are six possible outcomes (the set {1, 2, 3, 4, 5, 6}). Two are favourable ({5, 6}). Thus the probability is 2/6 = 1/3. □

1.D.2. We choose randomly a group of five people from a group of eight men and four women. What is the probability that there are at least three women in the chosen group?

Solution. We divide the favourable cases according to the number of men in the chosen group: there can be either two or one. There are eight groups of five people containing exactly one man (all four women have to be present in such a group, so it depends only on which man is chosen). There are $c(8,2)\cdot c(4,3) = \binom{8}{2}\cdot\binom{4}{3}$ groups with two men (we choose two men from eight and then independently three women from four; these two choices can be combined independently, and thus by the rule of product we obtain the number of

also be done in $n$ ways, and so on, the total number $V(n,k)$ of $k$-permutations with repetitions is $n^k$. Hence:

$k$-permutations with repetitions
$$V(n,k) = n^k.$$

If we are interested in a choice of $k$ elements without taking care of the order, we speak of $k$-combinations with repetitions. At first sight, it does not seem easy to determine this number. We reduce the problem to one we have already solved, namely combinations without repetitions:

Combinations with repetitions

Theorem. The number of $k$-combinations with repetitions from $n$ elements equals, for every $k \ge 0$ and $n \ge 1$,
$$\binom{n+k-1}{k}.$$

Proof. Label the $n$ elements as $a_1, a_2, \dots, a_n$.
Suppose each element labelled $a_i$ is selected $k_i$ times, $0 \le k_i \le k$, so that $k_1 + k_2 + \cdots + k_n = k$. Each such selection can be paired with a sequence of the symbols * and |, where each * represents one selection of an element and the individual boxes are separated by | (therefore there are $n-1$ of them). The number of *'s in the $i$-th box equals $k_i$. The other way around, from any such sequence we can determine the number of selections of each element (e.g. the number of *'s before the first | determines $k_1$). Having altogether $k$ symbols * and $n-1$ separators |, we see that there are
$$\binom{n+k-1}{n-1} = \binom{n+k-1}{k}$$
possible sequences, and therefore also the same number of the required selections. □

4. Probability

Now we are going to discuss the last type of function description, as listed at the very end of subsection 1.1.5. Instead of assigning explicit values of a function, we shall try to describe the probabilities of the individual options.

1.4.1. What is probability? As a simple example we can use common six-sided die throwing, with sides labelled 1, 2, 3, 4, 5, 6.

such groups). The total number of groups of five people is $c(12,5) = \binom{12}{5}$. The probability, being the quotient of the number of favourable outcomes and the total number of outcomes, is then
$$\frac{8 + \binom{8}{2}\binom{4}{3}}{\binom{12}{5}} = \frac{5}{33}. \;\square$$

1.D.3. From a deck with 108 cards ($2 \times 52$ cards plus 4 jokers) we randomly draw 4 cards without returning. What is the probability that at least one of them is an ace or a joker?

Solution. We can easily determine the probability of the complementary event, that none of the 4 drawn cards is among the 12 special cards (8 aces and 4 jokers). This probability is given by the ratio of the number of choices of 4 cards from 96 to the number of choices of 4 cards from 108, that is, $\binom{96}{4}/\binom{108}{4}$. The original event thus has probability $1 - \binom{96}{4}/\binom{108}{4} \approx 0.380$. □

We give an example for which the use of classical probability is not suitable:

1.D.4.
What is the probability that the reader of this exercise wins at least 25 million euro in EuroLotto during the next week?

Solution. Such a formulation is incomplete; it does not give us enough information. We present a "wrong" solution. The sample space of possible outcomes is two-element: either the reader wins or not. One outcome is favourable (the win), thus the probability is 1/2. This is clearly a wrong answer. □

Remark. In the previous exercise the basic condition for the use of classical probability was violated – every elementary event must have the same probability. In fact, the elementary event had not even been defined. EuroLotto has a daily draw with a jackpot of €25 000 000 for choosing the 5 correct numbers out of 1, ..., 50; there are $\binom{50}{5} = 2\,118\,760$ possible draws. There is no other way to win €25 000 000 than to win a jackpot on some day during the week. The elementary event would be that a single lotto card with 5 numbers wins a jackpot. Assuming that the reader submits $k$ lotto cards every day of the week, the probability of winning at least one jackpot during the week is approximately $\frac{7k}{2\,118\,760}$.

If we describe the mathematical model of such throwing with a "fair" die, we expect by symmetry that every side occurs with the same frequency. We say that "every side occurs with probability 1/6". But when throwing some less symmetric object with six faces, the actual probabilities of the individual results might be quite different. Let us build a simple mathematical model for this. We shall work with parameters $p_i$ for the probabilities of the individual sides, with two requirements: these probabilities have to be non-negative real numbers, and their sum must be one, i.e.
$$p_1 + p_2 + p_3 + p_4 + p_5 + p_6 = 1.$$
At this time, we are not concerned with the particular choice of the specific values $p_i$; they are given to us. Later on, in chapter 10, we shall link probability with mathematical statistics, and then we shall introduce methods for discussing the reliability of such a model for a specific real die.

1.4.2.
Classical probability. Let us come back to the mathematical model of the fair die. We consider the sample space $\Omega = \{1, 2, 3, 4, 5, 6\}$ of all possible elementary events (each of them corresponding to one possible result of the experiment of throwing the die). Then we can consider any event as a subset $A$ of $\Omega$. For example, $A = \{1, 3, 5\}$ describes the result of getting an odd number of points on the resulting side (we count the labels on the sides of the die). Similarly, the set $B = A^c = \{2, 4, 6\} = \Omega \setminus A$ is the complementary event of getting an even number of points. The probability of both $A$ and $B$ will be 1/2. Indeed, $P(A) = |A|/|\Omega| = 3/6 = 1/2$, where $|A|$ means the number of elements of the set $A$. This leads to the following obvious generalization:

Classical probability

Let $\Omega$ be a finite set with $n = |\Omega|$ elements. The classical probability of the event corresponding to any subset $A \subseteq \Omega$ is defined as
$$P(A) = \frac{|A|}{|\Omega|}.$$

Such a definition immediately allows us to solve problems related to throwing several fair dice simultaneously. Indeed, we may treat this as throwing one die several times independently, and thus multiplying the probabilities. For example, the probability of getting an odd sum of points on two dice is obtained by adding the probability of an even number on the first die and an odd number on the second to the probability of the reversed case. Thus the probability will be twice $1/2 \cdot 1/2$, which is 1/2, as expected.

1.4.3. Probability space. Next, we formulate a more general concept of probability, covering also the unfair die example above. We shall need a finite set $\Omega$ of all possible states of a system (e.g. results of an experiment), which we call the sample space.

1.D.5. There are $2n$ seats in a row in a cinema. We randomly seat $n$ men and $n$ women in the row. What is the probability that no two persons of the same sex sit next to each other?

Solution. There are $(2n)!$ possible seatings. The number of seatings satisfying the given condition is $2(n!)^2$.
For this, we have two ways of choosing the positions for the men (and thus also for the women): either all men sit at odd-numbered places (and the women at even-numbered places), or vice versa. Among these places, both the men and the women are seated arbitrarily. The resulting probability is thus
$$p(n) = \frac{2(n!)^2}{(2n)!}.$$
In particular, $p(2) \approx 0.33$, $p(5) \approx 0.0079$, $p(8) \approx 0.00016$. □

1.D.6. Five persons enter an elevator in a building with eight floors. Each of them leaves the elevator at any floor with the same probability. What is the probability that
i) all of them leave at the sixth floor,
ii) all of them leave at the same floor,
iii) each of them leaves at a different floor?

Solution. The sample space is the space of all possible ways in which the 5 people can leave the elevator. There are $8^5$ of them. In the first case there is only one favourable outcome, thus the probability is $1/8^5$. In the second case there are eight favourable outcomes, thus the probability is $8/8^5 = 1/8^4$. In the third case, the number of favourable outcomes is given by a five-element variation of eight elements (we choose five floors among the eight where some person leaves the elevator, and then the order in which the persons leave at the chosen floors). The probability is then (see 1.3.2 and 1.3.4)
$$\frac{8\cdot 7\cdot 6\cdot 5\cdot 4}{8^5} = 0.205078125. \;\square$$

1.D.7. Randomly choose a positive integer smaller than $10^5$. What is the probability that it consists only of the digits 0, 1, 5 and is divisible by 5? (Recall that the first digit must not be 0.)

Solution. There are $10^5 - 1$ positive integers smaller than $10^5$. Numbers satisfying the condition must begin with 1 or 5 and end with 0 or 5. Thus there are $2\cdot 3^3\cdot 2$ favourable five-digit numbers, $2\cdot 3^2\cdot 2$ favourable four-digit numbers, $2\cdot 3\cdot 2$ favourable three-digit numbers, $2\cdot 2$ favourable two-digit

Further, the space of all possible events is given as the set $\mathcal{A}$ of all subsets of $\Omega$.
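The three answers to the elevator exercise 1.D.6 can be confirmed with exact arithmetic; a short Python sketch using the standard library:

```python
from fractions import Fraction
from math import perm

total = 8 ** 5                 # each of the 5 people picks one of 8 floors
p_i = Fraction(1, total)       # (i) all leave at the sixth floor
p_ii = Fraction(8, total)      # (ii) all leave at the same floor
p_iii = perm(8, 5) / total     # (iii) all leave at different floors: 8*7*6*5*4

assert p_ii == Fraction(1, 8 ** 4)
assert p_iii == 0.205078125    # exact, since 6720/32768 is a dyadic rational
```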
Finally, we need the function describing the probabilities of occurrence of the individual events:

Probability function

Consider a non-empty fixed sample space $\Omega$. The probability function $P : \mathcal{A} \to \mathbb{R}$ satisfies
(1) $P(\Omega) = 1$,
(2) $0 \le P(A)$ for all events $A$,
(3) $P(A \cup B) = P(A) + P(B)$ whenever $A \cap B = \emptyset$.

Notice that the intersection $A \cap B$ describes the simultaneous occurrence of both events, while the union $A \cup B$ means that at least one of the events $A$ and $B$ occurs. The event $A^c = \Omega \setminus A$ is called the complementary event. There are some further straightforward consequences of the definition, for all events $A$, $B$:
(4) $P(A) = 1 - P(A^c)$,
(5) $P(\emptyset) = 0$,
(6) $P(A) \le 1$ for all events $A$,
(7) $P(A) \le P(B)$ whenever $A \subseteq B$,
(8) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

The proofs are all elementary. For example, $A \cup A^c = \Omega$, and thus (3) implies (4). Similarly, we can write $A = (A \setminus B) \cup (A \cap B)$ and $A \cup B = (A \setminus B) \cup (B \setminus A) \cup (A \cap B)$ with disjoint unions on the right-hand sides. Thus $P(A) = P(A \setminus B) + P(A \cap B)$ and $P(A \cup B) = P(A \setminus B) + P(B \setminus A) + P(A \cap B)$ by (3), which implies the last equality. The remaining three claims are even simpler.

All these properties correspond exactly to our intuition of how probability should behave. Probability should always be a real number between zero and one. The event $\Omega$ includes all possible results of the experiment, so it must have probability one. The empty event occurs with probability zero, the probabilities of disjoint events add, and so on.

favourable numbers, and one favourable one-digit number. In total there are $2\cdot(3^3 + 3^2 + 3 + 1)\cdot 2 + 1 = 2\cdot(3^4 - 1) + 1 = 2\cdot 3^4 - 1$ favourable numbers. According to classical probability, we obtain the probability $\frac{2\cdot 3^4 - 1}{10^5 - 1} \approx 0.0016$. □

1.D.8. From a sack with five white and five red balls, we draw in succession three balls at random, without returning the balls to the sack. What is the probability that two of them are white and one is red?

Solution.
Divide the event into a union of three disjoint events, according to the turn in which the red ball is drawn. The probabilities that the red ball is drawn third, second, or first, respectively, are
$$\frac{5}{10}\cdot\frac{4}{9}\cdot\frac{5}{8}, \qquad \frac{5}{10}\cdot\frac{5}{9}\cdot\frac{4}{8}, \qquad \frac{5}{10}\cdot\frac{5}{9}\cdot\frac{4}{8},$$
each equal to $\frac{5}{36}$. In total $\frac{5}{12}$.

Another solution. Consider the number of all possible triples of drawn balls, $\binom{10}{3}$. There are $\binom{5}{2}\cdot\binom{5}{1}$ triples with exactly two white balls (two white balls can be chosen in $\binom{5}{2}$ ways, and one red ball can join them in five ways). The required probability is then $\binom{5}{2}\binom{5}{1}/\binom{10}{3} = \frac{5}{12}$. We could forget the order in which the balls were drawn, because every order has the same probability of being drawn: counting ordered draws would multiply both the favourable and the total counts by $3!$, leaving the ratio unchanged. □

1.D.9. From a hat containing five white, five red and six black balls we draw balls randomly (and do not return the drawn balls). What is the probability that the fifth drawn ball is black?

Solution. We solve a more general problem: the probability that the $i$-th drawn ball is black. This probability is the same for every $i$, $1 \le i \le 16$ – we can imagine drawing all the balls one by one, and every such sequence (from the first drawn ball to the last) consisting of five white, five red and six black balls has the same probability of being drawn. Thus we can use classical probability. There are $P(5,5,6) = \frac{16!}{5!\,5!\,6!}$ such sequences. The number of sequences with a black ball in the $i$-th place and the rest arbitrary equals the number of arbitrary sequences of five white, five red and five black balls, that is, $P(5,5,5) = \frac{15!}{5!\,5!\,5!}$. Thus the probability is
$$\frac{P(5,5,5)}{P(5,5,6)} = \frac{15!}{5!\,5!\,5!}\cdot\frac{5!\,5!\,6!}{16!} = \frac{6}{16} = \frac{3}{8}. \;\square$$

Of course, the classical probability on the sample space $\Omega$ is an example of a probability function. The fact that the set of all events $\mathcal{A}$ is closed under union, intersection and taking the complement was essential in our exposition above. This will continue in all our discussions of probability in the sequel.
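Both 1.D.8 and 1.D.9 can be cross-checked with exact rational arithmetic; a short Python sketch:

```python
from fractions import Fraction
from math import comb, factorial

# 1.D.8: two white and one red among three drawn from 5 white + 5 red
p = Fraction(comb(5, 2) * comb(5, 1), comb(10, 3))
assert p == Fraction(5, 12)

# 1.D.9: P(5,5,5) / P(5,5,6) via the multinomial counts
p_black = Fraction(factorial(15) // factorial(5) ** 3,
                   factorial(16) // (factorial(5) ** 2 * factorial(6)))
assert p_black == Fraction(3, 8)
```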
Thus we could also talk about more general spaces of events $\mathcal{A}$ inside the set of all subsets of the sample space. We will return to this, and to more serious generalizations, in chapter 10.

1.4.4. Summing probabilities. By mathematical induction, the additivity of probability easily extends to any (finite) number of mutually exclusive events $A_i \subseteq \Omega$, $i = 1, \dots, n$. That is,
$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i=1}^{n} P(A_i)$$
whenever $A_i \cap A_j = \emptyset$ for all $i \ne j$, $i, j = 1, \dots, n$. Indeed, 1.4.3(3) is the result for $n = 2$. If we assume the validity of the formula for some fixed $n$, then the union of any $n+1$ events $A_0, A_1, \dots, A_n$ can be split into the union of $A_0$ and $A_1 \cup \dots \cup A_n$. The result then follows from the induction assumption together with 1.4.3(3) again.

In general, summing probabilities of event occurrences is much more difficult. The problem is that whenever the events are mutually compatible, the possible results in their intersection are counted multiple times. We have seen the simplest case of two mutually compatible events $A$ and $B$ in 1.4.3(8). For classical probability, it reduces just to counting elements of subsets. Indeed, the elements that belong to both the sets $A$ and $B$ are counted twice in the sum $P(A) + P(B)$, and thus in the formula $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ we subtract them once.

Now we look at the general case. The approach of iterative inclusion and exclusion of (potentially multiply counted) elements in a count is a standard method in combinatorics known as the inclusion-exclusion principle. We shall exploit this method in our general finite probability spaces. As we shall see, this is an example of a mathematical theorem where the hard part is to understand (and find) the formulation of the result; the proof is then relatively simple. The diagram explains the situation for three sets $A$, $B$, $C$ for classical probability:
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C).$$
Clearly, the probabilities are given by first counting the elements in each set and adding. Then we subtract the sum of the counts of the intersections of pairs of sets, since those elements were counted twice. But we must then add back the number of elements in the intersection of all three sets.

1.D.10. Inclusion-exclusion principle. A secretary has to send six letters to six different people. She puts the letters into the envelopes randomly. What is the probability that at least one person receives the letter intended for them?

Solution. We compute the probability of the complementary event: no person receives the correct letter. The sample space corresponds to all possible orderings of the six envelopes. If we denote both the letters and the envelopes by the numbers one to six, then the favourable events (no letter is assigned to the corresponding envelope) correspond to those orderings of six elements where the $i$-th element is not at the $i$-th place ($i = 1, \dots, 6$). These are the orderings without a fixed point. We compute the number of such orderings using the inclusion-exclusion principle. If we denote by $M_i$ the set of permutations for which $i$ is a fixed point (note that permutations in $M_i$ can also have other fixed points), then the resulting number $d$ of permutations without a fixed point is
$$d = 6! - |M_1 \cup \dots \cup M_6|.$$
The number of elements in the intersection $M_{i_1} \cap \dots \cap M_{i_k}$, $k = 1, \dots, 6$, is $(6-k)!$ (the positions of the elements $i_1, \dots, i_k$ are fixed, and the remaining $6-k$ elements can be ordered arbitrarily). Using the inclusion-exclusion principle we have
$$|M_1 \cup \dots \cup M_6| = \sum_{k=1}^{6} (-1)^{k+1}\binom{6}{k}(6-k)!$$
and thus for the number $d$ we obtain the relation
$$d = 6! - \sum_{k=1}^{6} (-1)^{k+1}\binom{6}{k}(6-k)! = \sum_{k=0}^{6} (-1)^{k}\binom{6}{k}(6-k)! = 6!\sum_{k=0}^{6} \frac{(-1)^k}{k!}.$$
The probability that no person receives "his" letter is then
$$\sum_{k=0}^{6} \frac{(-1)^k}{k!} = \frac{53}{144}.$$
The probability we were asked for is
$$1 - \sum_{k=0}^{6} \frac{(-1)^k}{k!} = \frac{91}{144}. \;\square$$

We shall now follow the same idea in order to write down the formula in the following theorem.
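Returning to 1.D.10, the derangement count is easy to verify numerically; a short Python sketch:

```python
from fractions import Fraction
from math import comb, factorial

n = 6
# inclusion-exclusion: |M1 ∪ ... ∪ M6| with |intersection of k sets| = (6-k)!
union = sum((-1) ** (k + 1) * comb(n, k) * factorial(n - k)
            for k in range(1, n + 1))
d = factorial(n) - union          # permutations without a fixed point
assert d == 265

p_none = Fraction(d, factorial(n))
assert p_none == Fraction(53, 144)
assert 1 - p_none == Fraction(91, 144)
```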
It seems plausible that such a formula should work with proper coefficients on the sums of probabilities of intersections of more and more of the events $A_1, \dots, A_k$, at least in the case of classical probability. The reader will perhaps appreciate that a quite straightforward mathematical induction verifies the theorem in full generality.

1.4.5. Theorem. Let $A_1, \dots, A_k \in \mathcal{A}$ be arbitrary events over the sample space $\Omega$ with the set of events $\mathcal{A}$. Then
$$P\Big(\bigcup_{i=1}^{k} A_i\Big) = \sum_{i=1}^{k} P(A_i) - \sum_{i=1}^{k-1}\sum_{j=i+1}^{k} P(A_i \cap A_j) + \sum_{i=1}^{k-2}\sum_{j=i+1}^{k-1}\sum_{\ell=j+1}^{k} P(A_i \cap A_j \cap A_\ell) - \dots + (-1)^{k-1} P(A_1 \cap A_2 \cap \dots \cap A_k).$$

Proof. For $k = 1$ the claim is obvious. The case $k = 2$ is the equality 1.4.3(8), which we have already proved. Assume that the theorem holds for any number of events up to $k$, where $k \ge 1$. In the induction step we work with the formula for $k+1$ events, where the union of the first $k$ of them plays the role of $A$ in the equality 1.4.3(8), and the remaining event plays the role of $B$:
$$P\Big(\bigcup_{i=1}^{k+1} A_i\Big) = P\Big(\Big(\bigcup_{i=1}^{k} A_i\Big) \cup A_{k+1}\Big) = P\Big(\bigcup_{i=1}^{k} A_i\Big) + P(A_{k+1}) - P\Big(\bigcup_{i=1}^{k} (A_i \cap A_{k+1})\Big).$$
Expanding the first and the last summand by the induction assumption and collecting the terms yields exactly the claimed formula for $k+1$ events. □

Recall that for an event $B$ with $P(B) > 0$, the conditional probability of an event $A$ given $B$ is $P(A|B) = P(A \cap B)/P(B)$. Hence, whenever $P(A_1) > 0$ and $P(A_2) > 0$,
$$P(A_1 \cap A_2) = P(A_2)P(A_1|A_2) = P(A_1)P(A_2|A_1).$$
All these numbers express (in different ways) the probability that both events $A_1$ and $A_2$ occur. For instance, in the last case we first look at whether the first event occurred; then, assuming that it has, we look at whether the second also occurs. Similarly, for three events $A_1, A_2, A_3$ satisfying $P(A_1 \cap A_2 \cap A_3) > 0$ we obtain
$$P(A_1 \cap A_2 \cap A_3) = P(A_1)P(A_2|A_1)P(A_3|A_1 \cap A_2).$$
The probability that three events occur simultaneously can thus be computed as follows: compute the probability that the first occurs; then the probability that the second occurs under the assumption that the first has occurred; then the probability that the third occurs under the assumption that both the first and the second have occurred; finally, multiply the results together. In general, if we have $k$ events $A_1, \dots$
$\dots, A_k$ satisfying $P(A_1 \cap \dots \cap A_k) > 0$, then
$$P(A_1 \cap \dots \cap A_k) = P(A_1)P(A_2|A_1) \cdots P(A_k|A_1 \cap \dots \cap A_{k-1}).$$
Notice that the condition $P(A_1 \cap \dots \cap A_k) > 0$ implies that all the hypotheses in the latter formula have non-zero probabilities, so all the conditional probabilities make sense. Indeed, each of the intersections $A_1 \cap \dots \cap A_i$ is at least as big as the full intersection, and thus its probability is at least as big, hence non-zero, see 1.4.3(7).

1.4.9. Geometric probability. In practical problems, the sample space may not be a finite set, and the set $\mathcal{A}$ of all events may not be the entire set of all subsets of $\Omega$. Generalising probability to such situations is beyond our scope now, but we can at least give a simple illustration. Consider the plane $\mathbb{R}^2$ of pairs of real numbers and a subset $\Omega$ with known area. Events are represented by subsets $A \subseteq \Omega$; for the event set $\mathcal{A}$ we consider some suitable system of subsets for which we can determine the area. An event $A$ then occurs if a randomly chosen point of $\Omega$ belongs to

a correct password and Michael mistyped. According to the problem statement, the probability of the first event is 1/2 and the probability of the second event is 1/20. In total, $P(A_1) = \frac{1}{2}\cdot\frac{1}{20} = \frac{1}{40}$ (we multiply the probabilities, since the events are independent). Further, we have directly from the problem statement $P(A_2) = \frac{1}{2}$. In total,
$$P(A) = P(A_1) + P(A_2) = \frac{1}{40} + \frac{1}{2} = \frac{21}{40}.$$
We can evaluate
$$P(A_1|A) = \frac{P(A_1)}{P(A)} = \frac{1}{21}. \;\square$$

The method of geometric probability can be used when the given sample space is a region of a line, of the plane, or of space (where we can measure length, area, or volume, respectively). We assume that the probability of an event equals the ratio of the measure of the corresponding subregion to the measure of the whole sample space.

1.D.16. From Edinburgh Waverley station trains depart every hour (in the direction of Aberdeen). From Aberdeen to Edinburgh they also depart every hour.
Assume that the trains move between these two stations with a uniform speed of 72 km/h and are 100 metres long. The trip takes 2 hours in either direction, and the trains meet each other somewhere along the route. After visiting an Edinburgh pub, John, who lives in Aberdeen, takes the train home and falls asleep at the departure. During the trip from Edinburgh to Aberdeen he wakes up and randomly sticks his head out of the train for five seconds, on the side where the trains travel in the opposite direction. What is the probability that he loses his head? (We assume that there are no other trains involved.)

Solution. The mutual speed of the oncoming trains is 40 metres per second, so an oncoming train passes John's window in two and a half seconds. The sample space of all outcomes is thus the interval (0, 7200) (the duration of the trip in seconds). During John's trip, two trains pass his window in the opposite direction. Any overlap of the 2.5-second passing interval with the 5-second interval when John's head may be sticking out is fatal. Thus, for each train, the set of "favourable" outcomes is an interval of length 7.5 seconds somewhere in the sample space; for the two trains it is double this amount. The probability of losing the head is therefore $15/7200 \approx 0.002$. □

1.D.17. In a certain country, a bus departs from town A to town B once a day at a random time between eight a.m. and

the subregion determined by $A$; otherwise the event does not occur.

Consider the problem of randomly choosing two numbers $a < b$ in the interval $[0,1] \subseteq \mathbb{R}$, all values $a$ and $b$ being chosen with equal probability. The question is: what is the probability that the interval $(a,b)$ has length at least one half? The choice of the pair is actually the choice of a point $[a,b]$ inside the triangle $\Omega$ with vertices $[0,0]$, $[0,1]$, $[1,1]$ (see the diagram).
We can imagine this as a description of a problem where a very tired guest at a party tries to divide a sausage with two cuts into three pieces, for himself and his two friends. What is the probability that the middle part is at least half of the sausage? We need to determine the area of the subset corresponding to points with $b \ge a + \frac{1}{2}$, that is, the interior of the triangle $A$ bounded by the points $[0, \frac{1}{2}]$, $[0,1]$, $[\frac{1}{2}, 1]$. We find $P(A) = (1/8)/(1/2) = \frac{1}{4}$.

Similarly, if we ask for the probability that some of the three guests gets at least half of the sausage, then we have to add the probabilities of two other events: $B$, given by $a \ge \frac{1}{2}$, and $C$, given by $b \le \frac{1}{2}$. These correspond to the bottom and the rightmost triangles, and thus each also has probability 1/4. The requested probability is therefore 3/4. Equivalently, we could have asked for the complementary event "all of them get less than a half", which clearly corresponds to the middle triangle and thus has probability 1/4. Try to answer on your own the question: what is the minimal prescribed length $\ell$ such that the probability of choosing an interval $(a,b)$ of length at least $\ell$ is one half?

Monte Carlo methods. One efficient method of approximate computation is simulation by the relative occurrence of a suitably chosen event. We present an example. Let $\Omega$ be the unit square with vertices $[0,0]$, $[1,0]$, $[0,1]$, $[1,1]$, and let $A$ be the intersection of $\Omega$ with the unit disc centred at the origin. Then the area of $A$ is $\frac{\pi}{4}$. Suppose we have a reliable generator of random numbers $a$ and $b$ between zero and one. We then compute the relative frequency with which $a^2 + b^2 < 1$, that is, $[a,b] \in A$. After a large number of attempts, the result should approximate the area of the quarter disc, that is $\pi/4$, quite well. (Draw a picture!)

eight p.m. Once a day, in the same time interval, another bus departs in the opposite direction.
The trip in either direction takes five hours. What is the probability that the buses meet, assuming they use the same route?

Solution. The sample space is a square $12 \times 12$. If we denote the departure times of the two buses by $x$ and $y$ respectively, then they meet on the route if and only if $|x - y| < 5$. This inequality determines the region of "favourable events" in the square. Its complement is the union of two right-angled isosceles triangles with legs of length 7, of total area 49, so the area of the "favourable part" is $144 - 49 = 95$. The probability is
$$p = \frac{95}{144} \approx 0.66. \;\square$$

1.D.18. A rod of length two metres is randomly divided into three parts. Determine the probability that at least one part is at most 20 cm long.

Solution. A random division of the rod into three parts is given by two cut points $x$ and $y$ (we first cut the rod at distance $x$ from its end and, without moving it, cut it again at distance $y$ from the same end). The sample space is a square $C$ with side 2 m (200 cm). If we place the square $C$ so that two of its sides lie on the axes in the plane, then the condition that at least one part is at most 20 cm long determines in the square the subregion
$$O = \{(x,y) \in C \mid x \le 20 \;\vee\; x \ge 180 \;\vee\; y \le 20 \;\vee\; y \ge 180 \;\vee\; |x - y| \le 20\}.$$

Of course, the well-known formula for the area of a circle with radius $r$ is $\pi r^2$, where $\pi = 3.14159\dots$. It is an interesting question why the area of a circle should be a constant multiple of the square of its radius; we will be able to prove this later. Experimentally, we can hint at it by the above approach, using squares of different sizes. Numerical approaches based on such probabilistic principles are called Monte Carlo methods.

5. Plane geometry

So far we have been using elementary notions from the geometry of the real plane in an intuitive way.
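Before moving on to geometry: the answer 95/144 to the bus problem, and the Monte Carlo estimate of $\pi$ just described, can both be checked in a few lines; a Python sketch (the seed and sample sizes are our choices):

```python
import random
from fractions import Fraction

# 1.D.17 exactly: the complement consists of two right triangles with legs 7
assert Fraction(144 - 49, 144) == Fraction(95, 144)

# the same answer by simulating the two departure times
random.seed(2020)
trials = 200_000
meet = sum(1 for _ in range(trials)
           if abs(random.uniform(0, 12) - random.uniform(0, 12)) < 5)
assert abs(meet / trials - 95 / 144) < 0.01

# Monte Carlo estimate of pi/4: frequency of a^2 + b^2 < 1 in the unit square
hits = sum(1 for _ in range(trials)
           if random.random() ** 2 + random.random() ** 2 < 1)
pi_estimate = 4 * hits / trials
assert abs(pi_estimate - 3.14159265) < 0.02
```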
Now we will investigate in more detail how to describe "position in the plane" and how to relate the positions of distinct points in the plane. Our tools will be mappings. We will consider only mappings which assign to (ordered) pairs of values $(x,y)$ pairs $(w,z) = F(x,y)$. Such a mapping consists of two functions $w(x,y)$ and $z(x,y)$, each depending on the two arguments $x$ and $y$. This will also serve as a gentle introduction to the part of mathematics called linear algebra, with which we will deal in the subsequent three chapters.

1.5.1. Vector space $\mathbb{R}^2$. We view the "plane" as the set of pairs of real numbers $(x,y) \in \mathbb{R}^2$. We will call these pairs vectors in $\mathbb{R}^2$. For such vectors we can define addition "coordinate-wise", that is, for vectors $u = (x,y)$ and $v = (x',y')$ we set
$$u + v = (x + x',\, y + y').$$
Since all the properties of commutative groups hold for the individual coordinates, they hold for our new vector addition too. In particular, there exists a zero vector $0 = (0,0)$ such that $v + 0 = v$. We use the same symbol 0 for the vector and for the number zero on purpose; the context will always make it clear which "zero" it is.

Next we define scalar multiplication of vectors. For $a \in \mathbb{R}$ and $u = (x,y) \in \mathbb{R}^2$, we set
$$a \cdot u = (ax,\, ay).$$
Usually we will omit the symbol $\cdot$ and use the juxtaposition $a\,v$ to denote the scalar multiple of a vector. We can directly check further properties of scalar multiplication by $a$ or $b$ and addition of vectors $u$ and $v$, for instance
$$a(u + v) = au + av, \qquad (a + b)u = au + bu, \qquad a(bu) = (ab)u.$$
We use the same symbol $+$ for both vector addition and scalar addition.

Now we take a very important step. Define the vectors $e_1 = (1,0)$ and $e_2 = (0,1)$. Every vector can then be written uniquely as
$$u = (x,y) = x\,e_1 + y\,e_2.$$
The expression on the right is called a linear combination of the vectors $e_1$ and $e_2$. The pair of vectors $e = (e_1, e_2)$ is called a basis of the vector space $\mathbb{R}^2$.
As we observe, this subregion has area 51/100 of the area of the square. □

E. Plane geometry

Let us start with several standard problems related to lines in the plane:

1.E.1. Write down the general equation of the line p : x = 2 − t, y = 1 + 3t, t ∈ R. Solution. By eliminating t, the solution is 3x + y − 7 = 0. □

1.E.2. We are given a line p : [2, 0] + t(3, 2), t ∈ R. Determine the general equation of this line. Determine its intersection with the line q : [−1, 2] + s(1, 3), s ∈ R.

If we choose two non-zero vectors u, v such that neither of them is a multiple of the other, then they too form a basis of R2. These operations are easy to imagine if we consider the vectors v to be arrows starting at the origin 0 = (0, 0) and ending at the position (x, y) in the plane. The addition of two such arrows is then given by the parallelogram law: given two arrows starting at the origin, their sum is the diagonal arrow (also starting at the origin) of the parallelogram with the two given arrows as adjacent sides. Multiplication by a scalar a corresponds to stretching the arrow to its a-multiple. This includes negative scalars, where the direction of the vector is reversed.

Solution. The coordinates of the points on the first line are given by the parametric equations as x = 2 + 3t and y = 0 + 2t. By eliminating t from the equations we obtain the equation 2x − 3y − 4 = 0. We obtain the intersection of p with the line q by substituting the points of q in parametric form into the equation for p: 2(−1 + s) − 3(2 + 3s) − 4 = 0. From this we obtain s = −12/7, and from the parametric equation of q we obtain the coordinates of the intersection P = [−19/7, −22/7]. □

1.5.2. Points in the plane. In geometry, we should distinguish between the points in the plane (as for instance the chosen origin O above) and the vectors as the arrows describing the difference between two such points.
We will work in fixed standard coordinates, that is, with pairs of real numbers, but for better usage we will always strictly distinguish vectors, written in parentheses and denoted for a moment by bold face letters like u, v, from points, for whose coordinates we use brackets; points are denoted by capital latin letters. Even if we view the entire plane as pairs of real numbers in R2, we may understand adding two such couples as follows. The first couple of coordinates describes a point P = [x, y], while the other one denotes a vector u = (u1, u2). Their sum P + u corresponds to adding the (arrow) vector u to the point P. If we fix the vector u, we call the resulting mapping P = [x, y] ↦ P + u = [x + u1, y + u2]

1.E.3. Determine the intersection of the lines p : x + y − 4 = 0, q : x = −1 + 2t, y = 2 + t, t ∈ R. Solution. Eliminate t to obtain q : x − 2y = −5. Then solve for x and y. The intersection has coordinates x = 1, y = 3. □

the shift of the plane (or translation) by the vector u. Thus, the vectors in R2 can be understood in a more abstract way as the shifts in the plane (sometimes called the free vectors in elementary geometry texts). The standard coordinates on R2, understood as pairs of real numbers, are not the only ones. We can put a coordinate system on the plane of our own choosing.

1.E.4. Find the equation of the line p which goes through the point [2, 3] and is parallel with the line x − 3y + 2 = 0. Find a parametric equation of the line q which goes through the points [1, 3] and [−2, 1]. Solution. Every line parallel to the line x − 3y + 2 = 0 is given by the equation x − 3y + c = 0 for some c ∈ R. Since the line p goes through the point [2, 3], putting x = 2 and y = 3 gives c = 7. We can immediately give a parametric equation of the line q:

q : [1, 3] + t (1 − (−2), 3 − 1) = [1, 3] + t (3, 2),  t ∈ R. □

1.E.5. Consider the following five lines. Determine if any two of the lines are parallel to each other.
p1 : 2x + 3y − 4 = 0,  p2 : x − y + 3 = 0,  p3 : −2x + 2y = −6,  p4 : −x − (3/2)y + 2 = 0,  p5 : x = 2 + t, y = −2 − t, t ∈ R.

Solution. It is clear that −2 · (−x − (3/2)y + 2) = 2x + 3y − 4. Thus p1 and p4 describe the same line. p2 can be rewritten as −2x + 2y − 6 = 0, thus the lines p2 and p3 are parallel and distinct. By eliminating t, the line p5 has the equation x + y = 0, which is not parallel to any of the other lines. □

1.E.6. Determine the line p which is perpendicular to the line q : 6x − 7y + 13 = 0 and which goes through the point [−6, 7]. Solution. Since the direction vector of p is perpendicular to q, we can write the result immediately as p : x = −6 + 6t, y = 7 − 7t, t ∈ R. □

1.E.7. Give an example of numbers a, b ∈ R such that the vector u is normal to AB, where A = [1, 2], B = [2b, b], u = (a − b, 3). Solution. The direction of AB is (2b − 1, b − 2) (this vector is always nonzero), and therefore the vector (2 − b, 2b − 1) is normal to AB. Setting 2 − b = a − b and 2b − 1 = 3, we obtain a = b = 2. □

Coordinates in the plane R2. Choose any point in the plane and call it the origin O. All other points P in the plane can be identified with the vectors (arrows) OP with their tails at the origin. Choose any point other than O and call it E1. This defines the vector e1 = OE1 = (1, 0). Choose any other point E2 so that O, E1, E2 are distinct and not collinear. This defines the vector e2 = OE2 = (0, 1). Then every point P = [a, b] in the plane can be described uniquely as P = O + a e1 + b e2 for real a, b, or in vector notation, OP = a e1 + b e2.

Translation, by adding a fixed vector, can be used either to shift the coordinate system (including the origin), or to shift sets of points in the plane. Notice that the vector corresponding to the shift of the point P into the point Q is given as the difference Q − P (in any coordinates). Thus we shall also use this notation for the vector PQ = Q − P. For each choice of coordinates, we have two distinct lines for the two axes.
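The point/vector distinction and the translation P ↦ P + u can be exercised in a few lines of Python (a sketch of our own; tuples stand in for both points and vectors, with the roles kept apart by the two helper functions):

```python
def shift(P, u):
    """Translate the point P = [x, y] by the vector u = (u1, u2)."""
    return (P[0] + u[0], P[1] + u[1])

def vector(P, Q):
    """The vector PQ = Q - P describing the shift taking P to Q."""
    return (Q[0] - P[0], Q[1] - P[1])

P, Q = (2, 3), (7, 1)
u = vector(P, Q)
# shifting P by the vector PQ indeed gives Q
assert shift(P, u) == Q
# a translation is invertible: shifting back by -u returns P
assert shift(Q, (-u[0], -u[1])) == P
```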
The origin is their point of intersection. The other way round, each choice of two non-parallel lines, together with a scale on each of them, defines coordinates in the plane. They are called affine coordinates. Clearly each nontrivial triangle in the plane with vertices O, E1, E2 defines coordinates in which this triangle is given by the points [0, 0], [1, 0], [0, 1]. Thus we may say that in the geometry of the plane, "all nontrivial triangles are the same, up to a choice of coordinates".

1.5.3. Lines in the plane. Every line is parallel to a (unique) line through the origin. To define a line, we therefore need two ingredients. One is a non-zero vector which describes the direction of the line; call it v = (v1, v2). The other is a point P0 = [x0, y0] on the line. Every point on the line is then of the form P(t) = P0 + t v, t ∈ R.

1.E.8. Determine the relative position of the lines p, q in the plane for p : 2x − y − 5 = 0, q : x + 2y − 5 = 0. If they are not parallel, determine the coordinates of the intersection. Solution. Eliminating y yields (4x − 2y − 10) + (x + 2y − 5) = 0, that is, 5x − 15 = 0, from which x = 3, and hence y = 1. Hence [3, 1] is the (unique) intersection and the lines are not parallel. □

1.E.9. A planar soccer player shoots a ball from the point F = [1, 0] in the direction (3, 4), hoping to hit the goal, which is a line segment from the point A = [23, 36] to B = [26, 30]. Does the ball fly towards the goal? Solution. The ball travels along the line [1, 0] + t(3, 4). The line AB has the parametrization [23, 36] + u(3, −6), where B = [23, 36] + 1 · (3, −6). The intersection of these lines is given by the equations 1 + 3t = 23 + 3u and 4t = 36 − 6u, with the solution t = 8, u = 2/3. As 0 < 2/3 < 1, the intersection lies in the segment AB. The ball hits the goal. Another solution. It is sufficient to consider only the slopes of the vectors FA, (3, 4), FB. Since 18/11 > 4/3 > 6/5, the player scores. □

1.E.10.
Consider the plane R2 with the standard coordinate system. A laser ray is sent from the origin [0, 0] in the direction (3, 1). It hits the mirror line p given by p : [4, 3] + t(−2, 1) and is then reflected (the angle of incidence equals the angle of reflection). At which points does the ray meet the line q given by q : [7, −10] + t(−1, 6)? Solution. In principle, there could be none, one or two intersections of a ray with a line. First, we inspect a possible intersection of the line q with the ray before the ray touches the mirror line p. In the standard way, we find the intersection of the line q with the line of the initial movement of the ray, [0, 0] + t(3, 1). The intersection point is [0, 0] + (32/19)(3, 1) = [96/19, 32/19]. The ray meets the mirror at the point [6, 2], that is, at [0, 0] + 2(3, 1). As 0 < 32/19 < 2, we conclude that the ray meets the line q before the reflection point. Let us now concentrate on the rebound ray. The angle between the line p and the direction of the ray can be calculated

Parametric description of a line. We may understand the line p as the set of all multiples of the vector v, shifted by the vector (x0, y0). This is called the parametric description of the line:

p = {P ∈ R2; P = P0 + t v, t ∈ R}.

The vector v is called the direction vector of the line p. In the chosen coordinates, the point P(t) = [x(t), y(t)] is given as

x = x(t) = x0 + t v1,  y = y(t) = y0 + t v2.

We can eliminate t from these two equations to obtain

−v2 x + v1 y = −v2 x0 + v1 y0.

Since the vector v = (v1, v2) is non-zero, at least one of the numbers v1, v2 is non-zero. If one of the coordinates v1 or v2 is zero, then the line is parallel to one of the coordinate axes.

Implicit description of a line. The general equation of a line in the plane is

(1) ax + by = c,

with a and b not both zero. The relation between the pair of numbers (a, b) and the direction vector of the line v = (v1, v2) is

(2) a v1 + b v2 = 0.
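The elimination of the parameter amounts to choosing (a, b) = (−v2, v1), which automatically satisfies the relation (2). A hypothetical Python helper (names and example ours) makes the conversion from the parametric to the implicit description explicit:

```python
def implicit_from_parametric(P0, v):
    """Coefficients (a, b, c) with a*x + b*y = c for the line P0 + t*v."""
    a, b = -v[1], v[0]              # then a*v1 + b*v2 = 0, as in (2)
    c = a * P0[0] + b * P0[1]
    return a, b, c

# the line p of exercise 1.E.2: [2, 0] + t(3, 2)
a, b, c = implicit_from_parametric((2, 0), (3, 2))
assert (a, b, c) == (-2, 3, -4)     # i.e. 2x - 3y - 4 = 0 after a sign change
# every point P0 + t*v satisfies the equation
t = 7
x, y = 2 + 3 * t, 0 + 2 * t
assert a * x + b * y == c
```

The coefficients (a, b, c) are only determined up to a common non-zero multiple, which is why the result matches the textbook equation 2x − 3y − 4 = 0 after multiplying by −1.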
We can view the left hand side of the equation (1) as a function z = f(x, y) mapping each point [x, y] of the plane to a scalar; the line then corresponds to a prescribed constant value of this function. We shall soon see that the vector (a, b) is perpendicular to the direction of the line.

using 1.5.7 as

cos φ = ((−2, 1) · (3, 1)) / (√5 · √10) = −√2/2,

so the angle between the mirror and the incoming ray is 135°, that is, 45° if measured as the acute angle. The rebound ray is thus perpendicular to the entering ray and its direction is (1, −3). (Be careful with the orientation! The direction vector can also be obtained via reflection (axial symmetry) of the vector perpendicular to the line p.) The ray meets the mirror at the point [6, 2], thus the reflected ray has the equation [6, 2] + t(1, −3), t > 0. The intersection of the line given by the rebound ray with the line q is at the point [4, 8]. This point lies on the opposite side of the line p to both the incident and reflected rays (t = −2). Thus the rebound ray does not meet the line q. All together, there is one intersection of the ray with the line q, namely [96/19, 32/19]. □

Remark. The reflection of a ray in three-dimensional space is studied in the exercise 3.F.4.

1.E.11. A line segment of length 1 started moving at noon with a constant speed of 1 meter per second in the direction (3, 2) from the point [−2, 0]. Another line segment of length 1 started moving, also at noon, from the point [5, −2] in the direction (−1, 1), but with double speed. Will they collide? (The segments are oriented in the direction of their movement.) Solution. The lines along which the segments move can be described parametrically:

p : [−2, 0] + r(3, 2),  q : [5, −2] + s(−1, 1).

The equation of the line p is 2x − 3y + 4 = 0. Substituting the parametric equation of the line q yields the intersection point P = [1, 2].
Now we choose a single parameter t for both lines, so that the corresponding points on p and on q describe the positions of the initial points of the first and the second line segment respectively at the time t. At time 0 the initial point of the first line segment is at [−2, 0], the second at [5, −2]. During time t (measured in seconds) the first segment travels t units of length in the direction (3, 2), the second segment travels 2t units of length in the direction (−1, 1).

Suppose we have two lines p and q. We ask about their intersection p ∩ q, that is, a point [x, y] which satisfies the equations of both lines simultaneously:

(3) ax + by = r,  cx + dy = s.

Again, we can view the left hand sides as a mapping F which to every pair of coordinates [x, y] of a point P in the plane assigns the vector of values of the two scalar functions f1 and f2 given by the left hand sides of the particular equations (3). Hence we can write our two scalar equations as one vector equation F(v) = w, where v = (x, y) and w = (r, s). Notice that the two lines are not parallel if and only if they have a unique point in their intersection.

1.5.4. Linear mappings and matrices. The mappings F appearing in this way satisfy F(a u + b v) = a F(u) + b F(v) for all scalars a, b and vectors u, v in R2. This can also be described in words: a linear combination of vectors maps to the same linear combination of their images; that is, linear mappings are those mappings which preserve linear combinations. We have already encountered the same behaviour in the equation 1.5.3(1) for the line, where the linear mapping in question was f : R2 → R together with its prescribed value c. That is also the reason why the values of the mapping z = f(x, y) are depicted in the picture as a plane in R3.

We can write such a mapping using matrices. By a matrix we mean a rectangular array of numbers, for instance

A = ( a  b ),  v = ( x ).
    ( c  d )       ( y )

We speak of a (square 2×2) matrix A and a (column) vector v.
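The defining property of linear mappings, F(a u + b v) = a F(u) + b F(v), can be tested numerically. The sketch below (Python, our own illustration) evaluates the mapping v ↦ A · v by the row-by-column rule described in the next paragraph and checks preservation of a sample linear combination:

```python
def apply(A, v):
    """Value of the linear mapping v -> A*v for a 2x2 matrix A (rows as tuples)."""
    return (A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1])

def combine(a, u, b, v):
    """The linear combination a*u + b*v in R^2."""
    return (a * u[0] + b * v[0], a * u[1] + b * v[1])

A = ((1, 2), (3, 4))        # an arbitrary sample matrix
u, v, a, b = (5, -1), (2, 7), 3, -2
# F(a*u + b*v) = a*F(u) + b*F(v): the mapping preserves linear combinations
assert apply(A, combine(a, u, b, v)) == combine(a, apply(A, u), b, apply(A, v))
```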
Multiplication of a matrix with a vector, row by column, is defined as follows:

A · v = ( ax + by ).
        ( cx + dy )

We introduce some more tools for vectors and matrices. Our goal is to compute with matrices in a similar way as we do with scalars. We define the product C = A · B of two square matrices A and B by applying the above formula to the individual columns of the matrix B and writing the resulting column vectors again as the columns of the matrix C. In order to multiply two vectors v and w in a similar way, we can write the vector w as a row of numbers (the transposed vector) wT. Then the product of wT and v is

Thus the corresponding parameterisations are:

p : [−2, 0] + t (3, 2)/√(3² + 2²) = [−2, 0] + t (3, 2)/√13,
q : [5, −2] + 2t (−1, 1)/√((−1)² + 1²) = [5, −2] + 2t (−1, 1)/√2.

The initial point of the first segment reaches the point [1, 2] at time t1 = √13 s, the initial point of the second segment at time t2 = 2√2 s, more than half a second sooner. At the time t2 + 1/2 = 2√2 + 1/2 < t1 the ending point of the second segment has already passed through P. Thus when the initial point of the first segment reaches the point P, the ending point of the second segment is already away, and the segments do not collide. □

We return for a while to complex numbers. The complex plane is basically a "normal" plane, where we have something extra:

1.E.12. Interpret multiplication by the imaginary unit i and complex conjugation as geometrical transformations of the plane. Solution. The imaginary unit i corresponds to the point (0, 1). Notice that multiplying any number z = a + i b by the imaginary unit i gives the result i · (a + i b) = −b + i a. Under the interpretation in the plane, this is a rotation around the origin of the segment joining the origin to the point z through a right angle counterclockwise (cf. 1.1.4). Taking the complex conjugate is a reflection through the axis of real numbers: z = (a + i b) ↦ (a − i b) = z̄. □

1.E.13. Determine the sum of the three angles between the vectors (1, 1), (2, 1) and (3, 1) respectively and the x-axis in the plane R2. Solution.
If we view the plane R2 as the Gauss plane (of complex numbers), then the given vectors correspond to the complex numbers 1 + i, 2 + i and 3 + i. We are to find the sum of their arguments. According to de Moivre's formula, this equals the argument of their product. Their product is

(1 + i)(2 + i)(3 + i) = (1 + 3i)(3 + i) = 10i,

which is a purely imaginary number with argument π/2. So the sum we are looking for is π/2. □

wT · v = r x + s y.

We call this the scalar product of the vectors v and w. We can easily check the associativity of multiplication (do it for general matrices A, B and a vector v in detail): (A · B) · v = A · (B · v). Instead of a vector v we can write any matrix C of the correct size. In a similar way, distributivity also holds: A · (B + C) = A · B + A · C. But commutativity does not hold. For example,

( 0  1 ) · ( 0  0 ) = ( 0  1 ),   ( 0  0 ) · ( 0  1 ) = ( 0  0 ).
( 0  0 )   ( 0  1 )   ( 0  0 )    ( 0  1 )   ( 0  0 )   ( 0  0 )

The last product also shows the existence of divisors of zero. Notice that the mapping defined by multiplication of vectors with a fixed matrix is a linear mapping, i.e. it respects linear combinations. With matrices and vectors we can thus write the equations for the intersection of two lines as A · v = u.

1.5.5. Determinant of a matrix. The procedure for finding the intersection of lines described in 1.5.3 fails in some special cases. For instance, the intersection of two parallel lines is either empty (when the lines are parallel but distinct) or the whole line (when the lines are identical). This occurs exactly when the ratios a/c and b/d are the same, that is,

(1) ad − bc = 0.

Note that this expression already takes care of the cases where either c or d is zero. The expression on the left in (1) is called the determinant of the matrix A. We write it as

det A = | a  b | = ad − bc.
        | c  d |

Our discussion can now be expressed as follows:

Proposition. The determinant is a real valued function det A defined for all square 2×2 matrices A. The (vector) equation A · v = u has a unique solution for v if and only if det A ≠ 0.
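The matrix calculus of 1.5.4 and the determinant criterion above can be put together in a short Python sketch (our own illustration; matrices are tuples of rows, and `solve` uses the explicit solution formula known as Cramer's rule, which the text derives only later):

```python
def matmul(A, B):
    """Row-by-column product of 2x2 matrices given as ((a, b), (c, d))."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def matvec(A, v):
    """Product A*v of a 2x2 matrix with a vector."""
    return (A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1])

def det(A):
    """Determinant ad - bc of a 2x2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def solve(A, u):
    """The unique solution of A*v = u when det(A) != 0."""
    D = det(A)
    if D == 0:
        raise ValueError("parallel or identical lines: no unique solution")
    (a, b), (c, d) = A
    r, s = u
    return ((r * d - b * s) / D, (a * s - c * r) / D)

# associativity: (A*B)*v = A*(B*v)
A, B, v = ((1, 2), (3, 4)), ((5, 6), (7, 8)), (9, 10)
assert matvec(matmul(A, B), v) == matvec(A, matvec(B, v))

# non-commutativity and divisors of zero, with the matrices from the text
X, Y = ((0, 1), (0, 0)), ((0, 0), (0, 1))
assert matmul(X, Y) == ((0, 1), (0, 0))
assert matmul(Y, X) == ((0, 0), (0, 0))

# intersection of p: 2x - y = 5 and q: x + 2y = 5 (exercise 1.E.8)
assert solve(((2, -1), (1, 2)), (5, 5)) == (3.0, 1.0)
```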
So far, we have worked with pairs of real numbers in the plane. Equally well we might pose exactly the same questions for points with integer coordinates and lines with equations with integer coefficients. Notice that the latter requirement is equivalent to considering rational coefficients in the equations. We have to be careful which properties of the scalars we exploit. In fact, we needed all the properties of the field of scalars when discussing the solvability of the system of two equations — try to think it through.

Next, we shall exercise the matrix calculus in the plane. We refer to 1.5.4 for the basic concepts. First we practise the operations of addition and multiplication of matrices, then we come to geometric tasks.

1.E.14. Simplify (A − B)T · 2C · u, where

A = (  0  5 ),  B = (  2  0 ),  C = ( 2  −2 ),  u = ( 3 ).
    ( −2  2 )       ( −1  1 )       ( 4   5 )       ( 2 )

Solution. By substituting we get

A − B = ( −2  5 ),  2C = ( 4  −4 ),  (A − B)T = ( −2  −1 ),
        ( −1  1 )        ( 8  10 )              (  5   1 )

and by matrix multiplication we obtain

(A − B)T · 2C · u = ( −2  −1 ) · ( 4  −4 ) · ( 3 ) = ( −52 ). □
                    (  5   1 )   ( 8  10 )   ( 2 )   (  64 )

1.E.15. Give an example of matrices A and B for which

(a) (A + B) · (A − B) ≠ A · A − B · B;
(b) (A + B) · (A + B) ≠ A · A + 2A · B + B · B.

Solution. For any two square matrices A and B we have (A + B) · (A − B) = A · A − A · B + B · A − B · B. The identity (A + B) · (A − B) = A · A − B · B thus holds if and only if −A · B + B · A is the zero matrix, that is, if and only if the matrices A and B commute. Examples are therefore provided by any pair of matrices which do not commute (the product changes when we change the order of the multiplied matrices). We can choose for instance

A = ( 1  2 ),  B = ( 4  3 ),
    ( 3  4 )       ( 2  1 )

since with this choice

A · B = (  8   5 ),  B · A = ( 13  20 ),
        ( 20  13 )           (  5   8 )

so A · B ≠ B · A. Notice that for any pair of square matrices A, B, (A + B) · (A + B) = A · A + A · B + B · A + B · B. It follows that (A + B) · (A + B) = A · A + 2A · B + B · B if and only if A · B = B · A, as in the first case. □
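Chains of matrix operations like the one in 1.E.14 are easy to double-check by machine. The sketch below (Python, our own; the concrete matrices follow our reading of the garbled print of 1.E.14, so treat them as an assumption) implements generic product, transpose and scalar multiple, and verifies the final column vector:

```python
def matmul(A, B):
    """Product of matrices given as tuples of rows (inner sizes must match)."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(len(B)))
                       for j in range(len(B[0]))) for i in range(len(A)))

def transpose(A):
    """Rows become columns and vice versa."""
    return tuple(tuple(A[i][j] for i in range(len(A))) for j in range(len(A[0])))

def scal(c, A):
    """Scalar multiple c*A."""
    return tuple(tuple(c * x for x in row) for row in A)

# data as we read them in 1.E.14 (u written as a 2x1 column matrix)
A = ((0, 5), (-2, 2))
B = ((2, 0), (-1, 1))
C = ((2, -2), (4, 5))
u = ((3,), (2,))

AmB = tuple(tuple(A[i][j] - B[i][j] for j in range(2)) for i in range(2))
result = matmul(matmul(transpose(AmB), scal(2, C)), u)
assert result == ((-52,), (64,))
```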
At least, we can be sure that the intersection of two non-parallel lines with rational coefficients is a point with rational coordinates again. The case of integer coefficients and coordinates is more difficult. We shall come back to this in the next chapter. In particular, we shall see that the system A · v = u with fixed integer coefficients a, b, c, d has a unique integer solution for all integer values (r, s) if and only if the determinant of A is ±1.

1.5.6. Affine mappings. We now investigate how the matrix notation allows us to work with simple mappings in the affine plane. We have seen that matrix multiplication defines a linear mapping. Shifting by a fixed vector w = (r, s) ∈ R2 in the affine plane can also easily be written in matrix notation:

P = ( x ) ↦ P + w = ( x + r ).
    ( y )           ( y + s )

If we add a fixed vector to the result of a linear mapping, then we have the expression

v ↦ A · v + w = ( ax + by + r ).
                ( cx + dy + s )

In this way we have described all affine mappings of the plane to itself. Such mappings allow us to recompute coordinates which arise by different choices of origins and bases. We shall come back to this in detail later.

1.5.7. The distance and angle. Now we consider distance. We define the length of the vector v = (x, y) to be ‖v‖ = √(x² + y²). Immediately we can define the notions of distance, angle and rotation in the plane.

Distance in the plane. The distance between the points P and Q in the plane is given as the length of the vector PQ, i.e. ‖Q − P‖. Obviously, the distance does not depend on the ordering of P and Q, and it is invariant under shifts of the plane by any fixed vector w. The Euclidean plane is an affine plane with the distance defined as above.

1.E.16. Decide whether the mappings F, G : R2 → R2 given by

F(x, y) = (7x − 3y, −2x + 5y),  G(x, y) = (2x + 2y − 4, 4x − 9y + 3)

are linear. Solution.
For any vector (x, y)T ∈ R2 we can express

F( x ) = (  7  −3 ) ( x ),   G( x ) = ( 2   2 ) ( x ) + ( −4 ).
 ( y )   ( −2   5 ) ( y )     ( y )   ( 4  −9 ) ( y )   (  3 )

This implies that both mappings are affine. Recall that an affine mapping is linear if and only if it maps the zero vector to zero. Since F(0, 0) = (0, 0) while G(0, 0) = (−4, 3) ≠ (0, 0), the mapping F is linear and the mapping G is not. Let us mention that f(x) = A x and g(x) = A x + b respectively, where x, b ∈ Rn and A is a square matrix n × n, are the general forms of linear and affine mappings. □

1.E.17. Compute the lengths of the sides of the triangle with vertices A = [2, 2], B = [3, 0], C = [4, 3]. Solution. Using the formula for the length of a vector, ‖u‖ = √(u1² + u2²), u = (u1, u2) ∈ R2, we obtain

|AB| = ‖A − B‖ = √((2 − 3)² + (2 − 0)²) = √5,
|BC| = ‖B − C‖ = √((3 − 4)² + (0 − 3)²) = √10,
|AC| = ‖A − C‖ = √((2 − 4)² + (2 − 3)²) = √5. □

1.E.18. Determine the angle between the two vectors (a) u = (−3, −2), v = (−2, 3); (b) u = (2, 6), v = (−3, −9). Solution. The sought angle 0 ≤ φ ≤ π can of course be computed from the formula (1) in 1.5.7. But note that the vector (−3, −2) can be obtained from the vector (−2, 3) by interchanging the coordinates and multiplying one of them by −1. These are exactly the operations used to obtain a vector normal to the direction vector of a given line (or vice versa). The vectors in the case (a) are thus perpendicular, that is, φ = π/2. In the case (b), since −3 · (2, 6) = 2 · (−3, −9), the vector u is a multiple of the vector v. If one vector is a positive multiple of another, the angle between the two is

Angles are a matter of vectors rather than points in Euclidean geometry. Let u be a vector of length 1, at angle φ measured counter-clockwise from the vector (1, 0). In coordinates, u lies on the unit circle and has first and second coordinates cos φ and sin φ respectively (this is one of the elementary definitions of the sine and cosine functions). That is, u = (cos φ, sin φ).
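The scalar-product formula for angles (formula (1) in 1.5.7) can be checked numerically on the examples of 1.E.18. A Python sketch (our own; the clamp only guards against floating point rounding pushing the cosine a hair outside [−1, 1]):

```python
import math

def angle(u, v):
    """Angle in [0, pi] between nonzero vectors, via the scalar product."""
    dot = u[0] * v[0] + u[1] * v[1]
    cos_phi = dot / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos_phi)))  # clamp rounding noise

# exercise 1.E.18: (a) perpendicular vectors, (b) opposite vectors
assert math.isclose(angle((-3, -2), (-2, 3)), math.pi / 2, abs_tol=1e-9)
assert math.isclose(angle((2, 6), (-3, -9)), math.pi, abs_tol=1e-6)
```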
This is compatible with −1 ≤ sin φ ≤ 1 and the identity (cos φ)² + (sin φ)² = 1.

Angle between vectors. The angle between two vectors v and v′ can in general be described using their coordinates v = (x, y), v′ = (x′, y′) as

(1) cos φ = (x x′ + y y′) / (‖v‖ ‖v′‖).

In the special case v = (1, 0) and v′ of length one, this general formula gives cos φ = x′, which is just the definition of the function cos φ. The general case can always be reduced to this special one. First we notice that the angle φ between two vectors u, v is always the same as the angle between the normalized vectors (1/‖u‖) u and (1/‖v‖) v. Thus we can restrict ourselves to two vectors on the unit circle. Then we can rotate our coordinates in such a way that the first of the vectors becomes (1, 0). This means it is enough to show that the scalar product is invariant with respect to rotations. We have already seen the expression x x′ + y y′ in the definition of the angle. We called it the scalar product of the vectors. In the special case when the scalar product is zero, we say that the vectors are perpendicular. Of course, the best

zero. If it is a negative multiple, as in our case, the angle is π. □

1.E.19. Determine the angle φ between the two diagonals A3A7 and A5A10 of the regular dodecagon (polygon with twelve sides) A0A1A2...A11. Solution. The angle depends neither on the size nor on the position of the given dodecagon. Choose the dodecagon inscribed in a circle with radius 1. We can put A0 at [1, 0], and then the vertices can be identified with the twelfth roots of 1 in the complex plane. We can write Ak = cos(2kπ/12) + i sin(2kπ/12). In particular A3 = cos(π/2) + i sin(π/2) ≈ [0, 1], A5 = cos(5π/6) + i sin(5π/6) ≈ [−√3/2, 1/2], A7 = cos(7π/6) + i sin(7π/6) ≈ [−√3/2, −1/2] and A10 = cos(5π/3) + i sin(5π/3) = 1/2 − i √3/2 ≈ [1/2, −√3/2]. Using the formula (1) in 1.5.7 we finish the computation:

cos φ = 1/(√2 (1 + √3)) = (√6 − √2)/4,

that is, φ = 75°. Alternative solution.
This problem can also be solved by the methods of synthetic geometry alone. Denote the centre of the regular dodecagon by S and the intersection of the diagonals A3A7 and A5A10 by T. Now |∠A7A5A10| = 45° (this is the inscribed angle corresponding to the central angle A7SA10, which is a right angle); furthermore |∠A5A7A3| = 30° (again an inscribed angle, corresponding to the central angle A5SA3, which is 60°). The angle A5TA7 is then equal to the complement of the two aforementioned angles to 180°, that is, 105°. The angle we are looking for is then 180° − 105° = 75°. □

1.E.20. Consider a regular hexagon ABCDEF with vertices labeled in the positive direction, centre at the point S = [1, 0], and the vertex A at [0, 2]. Determine the coordinates of the vertex C. Solution. The coordinates of the vertex C can be obtained by rotating the point A around the centre S of the hexagon through the angle 120° in the positive direction:

C = ( cos 120°  −sin 120° ) (A − S) + S = ( −1/2  −√3/2 ) ( −1 ) + ( 1 ) = [3/2 − √3, −1 − √3/2].
    ( sin 120°   cos 120° )               (  √3/2  −1/2 ) (  2 )   ( 0 )

example of perpendicular vectors of length 1 are the standard basis vectors (1, 0) and (0, 1). Notice that our formula for the angle between vectors is symmetric in the two vector arguments, thus the angle φ is always between 0 and π. We can easily imagine that not all affine coordinates are adequate for expressing distances, and thus for use in the Euclidean plane. Indeed, although we may again choose any point O as the origin, we also want the basis vectors e1 = OE1 and e2 = OE2 to be perpendicular and of length one. Such a basis will be called orthonormal. We shall see that the angles and distances computed in such coordinates are always the same, no matter which such coordinates are used.

1.5.8. Rotation around a point in the plane. The matrix of any given linear mapping F : R2 → R2 is easy to guess.
If the mapping is given by the matrix with columns (a, c) and (b, d), then the first column (a, c) is obtained by multiplying this matrix with the basis vector (1, 0), and the second is the value at the second basis vector (0, 1). We can see from the picture that the columns of the matrix corresponding to the rotation counter-clockwise through the angle ψ are computed as follows:

( a  b ) ( 1 ) = ( cos ψ ),   ( a  b ) ( 0 ) = ( −sin ψ ).
( c  d ) ( 0 )   ( sin ψ )    ( c  d ) ( 1 )   (  cos ψ )

The counter-clockwise direction is called the positive direction; the other direction is the negative direction.

Rotation matrix. The rotation through a given angle ψ in the positive direction about the origin is given by the matrix Rψ:

( x ) ↦ Rψ · ( x ) = ( cos ψ  −sin ψ ) ( x ).
( y )        ( y )   ( sin ψ   cos ψ ) ( y )

Now that we know what the matrix of a rotation in the plane looks like, we can check that rotation preserves distances and angles (defined by the equation (1) in 1.5.7). Denote the image of a vector v = (x, y)T as

v′ = Rψ · v = ( x cos ψ − y sin ψ ).
              ( x sin ψ + y cos ψ )

□

1.E.21. An equilateral triangle with vertices [1, 0] and [0, 1] lies entirely in the first quadrant. Find the coordinates of its third vertex. Solution. The third vertex has coordinates [(1 + √3)/2, (1 + √3)/2] (we are rotating the point [1, 0] through 60° around [0, 1] in the positive direction). □

1.E.22. An equilateral triangle has vertices A = [1, 1] and B = [2, 3]. Its third vertex lies in the same half-plane determined by the line AB as the point S = [0, 0]. The triangle is rotated through 60° in the positive direction around the point S, producing a new triangle. Determine the coordinates of the vertices of the new triangle. Solution. The points we are looking for have the coordinates [−(3/2)√3, √3 − 1/2], [(1 − √3)/2, (1 + √3)/2] and [1 − (3/2)√3, √3 + 3/2]. □

1.E.23. Find two matrices A such that

A² = ( 1/2   −√3/2 ).
     ( √3/2   1/2  )

Hint: which geometric transformation of the plane is given by the matrix A²? Solution.
A² is the matrix of the rotation through 60° in the positive direction; thus the matrices we are looking for are

A = ± ( √3/2  −1/2 ),
      ( 1/2   √3/2 )

the matrices of the rotations through 30° and through 210°. □

1.E.24. Reflection. Find the matrix of the reflection of the plane through the line y = x (that is, find the matrix of this axial symmetry). Solution. The given reflection sends the x-axis to the y-axis and vice versa. Thus the reflection applied to a vector just interchanges its coordinates, therefore the sought matrix is

( 0  1 ).
( 1  0 )

The matrix of any linear mapping in R2 can also be computed in the standard way: it is given by the images of the vectors (1, 0) (first column) and (0, 1) (second column). In our case the images are (0, 1) and (1, 0). □

and similarly w′ = Rψ · w for w = (r, s)T and w′ = (r′, s′)T. We can check that ‖v′‖ = ‖v‖ and that x′ r′ + y′ s′ = x r + y s. The latter expression can be written using vectors and matrices as follows:

(Rψ · w)T · (Rψ · v) = wT · RψT · Rψ · v.

Here the transposed vector (Rψ · w)T equals wT · RψT, where RψT is the so-called transpose of the matrix Rψ. That is the matrix whose rows consist of the columns of the original matrix, and similarly whose columns consist of the rows of the original matrix. Therefore we see that the rotation matrices satisfy the relation RψT · Rψ = I, where I (sometimes we denote this matrix just by 1, meaning the unit in the ring of matrices) is the unit matrix

I = ( 1  0 ).
    ( 0  1 )

This leads us to a remarkable observation: the matrix F with the property F · Rψ = I (we call such a matrix the inverse matrix of the rotation matrix Rψ) is the transpose of the original matrix. This makes sense, since the inverse mapping to the rotation through the angle ψ is again a rotation, but through the angle −ψ. That is, the inverse matrix of Rψ equals the matrix

R−ψ = ( cos(−ψ)  −sin(−ψ) ) = (  cos ψ  sin ψ ).
      ( sin(−ψ)   cos(−ψ) )   ( −sin ψ  cos ψ )

It is easy to write the rotation around a point P = O + w, P = [r, s], again using matrices.
One just has to note that instead of rotating around the given point P, we can first shift P into the origin, then perform the rotation, and then shift back. We calculate:

v ↦ v − w ↦ Rψ · (v − w) ↦ Rψ · (v − w) + w = ( cos ψ (x − r) − sin ψ (y − s) + r ).
                                              ( sin ψ (x − r) + cos ψ (y − s) + s )

1.E.25. Determine which linear mappings from R2 to R2 are given by the following matrices (that is, describe the geometrical meaning of the matrices):

A1 = ( 1  0 ),  A2 = ( −1  0 ),  A3 = ( √2/2  −√2/2 ).
     ( 0  0 )        (  0  1 )        ( √2/2   √2/2 )

Solution. Let (x, y)T stand for an arbitrary real vector. For the matrix A1 we have

A1 · ( x ) = ( x ),
     ( y )   ( 0 )

which means that the linear mapping given by this matrix is the projection onto the x-axis. Similarly, the matrix A2 determines the reflection with respect to the y-axis, since

A2 · ( x ) = ( −x ).
     ( y )   (  y )

The matrix A3 can be expressed in the form

( cos φ  −sin φ )
( sin φ   cos φ )

for φ = π/4; thus it gives the rotation of the plane around the origin through the angle π/4 (in the positive direction, that is, counter-clockwise). □

1.E.26. Show that the composition of an odd number of point reflections in the plane is again a point reflection. Solution. The point reflection in the plane across the point S is given by the formula X ↦ S − (X − S), that is, X ↦ 2S − X. By a repeated application of three point reflections across the points S, T and U respectively we obtain

X ↦ 2S − X ↦ 2T − (2S − X) ↦ 2U − (2T − (2S − X)) = 2(U − T + S) − X,

that is, X ↦ 2(U − T + S) − X, which is the point reflection across the point U − T + S. The composition of any odd number of point reflections can be successively reduced to a point reflection. (In principle, this is done by mathematical induction; try to formulate it by yourself.) □

1.E.27. Construct a (2n + 1)-gon, given the middle points of all its sides. Solution.
We use the fact that the composition of an odd number of point reflections is again a point reflection (see the previous exercise). Denote the vertices of the $(2n+1)$-gon we are looking for by $A_1, A_2, \ldots, A_{2n+1}$ and the middle points of the sides (starting from the middle point of $A_1A_2$) by $S_1$,

1.5.9. Reflection. Another well-known example of a length-preserving mapping is reflection through a line. It is enough to understand reflection through a line that goes through the origin $O$; all other reflections can be derived using shifts and rotations. We look first for the matrix of the reflection with respect to the line through the origin and through the point $(\cos\psi, \sin\psi)$. Notice that
$$Z_0 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$
is the matrix of the reflection with respect to the $x$-axis. Any line going through the origin can be rotated so that it has the direction $(1,0)$, and thus we can write a general reflection matrix as
$$Z_\psi = R_\psi \cdot Z_0 \cdot R_{-\psi},$$
where we first rotate via the matrix $R_{-\psi}$ so that the line is in the "zero" position, reflect with the matrix $Z_0$, and return back with the rotation $R_\psi$. Therefore we can calculate (by associativity of matrix multiplication):
$$Z_\psi = \begin{pmatrix} \cos\psi & -\sin\psi \\ \sin\psi & \cos\psi \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} \cos\psi & \sin\psi \\ -\sin\psi & \cos\psi \end{pmatrix} = \begin{pmatrix} \cos^2\psi - \sin^2\psi & 2\sin\psi\cos\psi \\ 2\sin\psi\cos\psi & -(\cos^2\psi - \sin^2\psi) \end{pmatrix} = \begin{pmatrix} \cos 2\psi & \sin 2\psi \\ \sin 2\psi & -\cos 2\psi \end{pmatrix}.$$
The last equality follows from the usual formulas for trigonometric functions:
(1) $\sin 2\psi = 2\sin\psi\cos\psi$, $\cos 2\psi = \cos^2\psi - \sin^2\psi$.

Notice what the product $Z_\psi \cdot Z_0$ gives:
$$\begin{pmatrix} \cos 2\psi & \sin 2\psi \\ \sin 2\psi & -\cos 2\psi \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} \cos 2\psi & -\sin 2\psi \\ \sin 2\psi & \cos 2\psi \end{pmatrix}.$$
This observation can be formulated as follows:

Proposition. A rotation through the angle $\psi$ can be obtained by two subsequent reflections through lines that have the angle $\frac{1}{2}\psi$ between them.

$S_2, \ldots, S_{2n+1}$. If we carry out the point reflections across the middle points (from $S_1$ to $S_{2n+1}$), then clearly the point $A_1$ is a fixed point of the resulting point reflection, thus it is its centre point.
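The conjugation $Z_\psi = R_\psi \cdot Z_0 \cdot R_{-\psi}$ and the proposition above can be verified numerically. A sketch in Python follows; the helper names are ours, not the book's.

```python
import math

def rot(psi):
    """2x2 rotation matrix through the angle psi."""
    return [[math.cos(psi), -math.sin(psi)],
            [math.sin(psi),  math.cos(psi)]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def refl(psi):
    """Reflection through the line at angle psi: R_psi . Z0 . R_{-psi}."""
    Z0 = [[1, 0], [0, -1]]
    return mat_mul(rot(psi), mat_mul(Z0, rot(-psi)))

def close(A, B, eps=1e-12):
    return all(abs(A[i][j] - B[i][j]) < eps for i in range(2) for j in range(2))

psi = 0.4
# the conjugated matrix has the closed form with the doubled angle
c, s = math.cos(2 * psi), math.sin(2 * psi)
assert close(refl(psi), [[c, s], [s, -c]])
# two reflections through lines with angle psi/2 between them give the rotation by psi
assert close(mat_mul(refl(psi / 2), refl(0)), rot(psi))
```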
In order to find it, it is enough to carry out the given point reflection with any point $X$ of the plane. The point $A_1$ then lies in the middle of the line segment $XX'$, where $X'$ is the image of $X$ in that point reflection. The rest of the vertices $A_2, \ldots, A_{2n+1}$ can be obtained by mapping the point $A_1$ in the point reflections across the points $S_1, \ldots, S_{2n+1}$. □

In the next exercises, we exploit the properties of the determinant of a matrix, cf. 1.5.5 and 1.5.7.

1.E.28. Determine the area of the triangle $ABC$, if $A = [-8,1]$, $B = [-2,0]$, $C = [5,9]$.

Solution. We know that the area equals the absolute value of half of the determinant of the matrix whose first column is given by the vector $B - A$ and whose second column is given by the vector $C - A$, that is, the determinant of the matrix
$$\begin{pmatrix} -2-(-8) & 5-(-8) \\ 0-1 & 9-1 \end{pmatrix} = \begin{pmatrix} 6 & 13 \\ -1 & 8 \end{pmatrix}.$$
A simple calculation yields the result
$$\frac{1}{2}\,\bigl|(-2-(-8))\cdot(9-1) - (5-(-8))\cdot(0-1)\bigr| = \frac{61}{2}.$$
Let us add that changing the order of the vectors changes the sign of the determinant (but the absolute value is unchanged), and that the value of the determinant would not change at all if we wrote the vectors as rows (preserving the order). Moreover, the determinant formed by the vectors $B-A$ and $C-A$ is always positive if the vertices $A$, $B$, $C$ are in the anti-clockwise direction. □

1.E.29. Compute the area $S$ of the quadrilateral given by its vertices $[1,1]$, $[6,1]$, $[11,4]$, $[2,4]$.

Solution. First, denote the vertices (in the counter-clockwise direction) as $A = [1,1]$, $B = [6,1]$, $C = [11,4]$, $D = [2,4]$. If we divide the quadrilateral $ABCD$ into the triangles $ABC$ and $ACD$, we can obtain its area as the sum of the areas of these two triangles, by evaluating the determinants
$$d_1 = \begin{vmatrix} 6-1 & 11-1 \\ 1-1 & 4-1 \end{vmatrix} = \begin{vmatrix} 5 & 10 \\ 0 & 3 \end{vmatrix},\qquad d_2 = \begin{vmatrix} 11-1 & 2-1 \\ 4-1 & 4-1 \end{vmatrix} = \begin{vmatrix} 10 & 1 \\ 3 & 3 \end{vmatrix},$$

In fact we can prove the previous proposition purely by geometrical argumentation, as shown in the above picture (try to be a "synthetic geometer").
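The determinant recipe used in 1.E.28 and 1.E.29 is a two-line computation. Here is a Python sketch (the function names are ours):

```python
def det2(u, v):
    """Determinant of the 2x2 matrix with columns u and v;
    twice the oriented area of the triangle (0, u, v)."""
    return u[0] * v[1] - u[1] * v[0]

def triangle_area(A, B, C):
    """Unsigned area of the triangle ABC via the determinant of B-A and C-A."""
    u = (B[0] - A[0], B[1] - A[1])
    v = (C[0] - A[0], C[1] - A[1])
    return abs(det2(u, v)) / 2

# 1.E.28: triangle with A = [-8,1], B = [-2,0], C = [5,9]
assert triangle_area((-8, 1), (-2, 0), (5, 9)) == 61 / 2

# 1.E.29: quadrilateral split into the triangles ABC and ACD
A, B, C, D = (1, 1), (6, 1), (11, 4), (2, 4)
assert triangle_area(A, B, C) + triangle_area(A, C, D) == 21
```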
If we believe in this proof "by picture", then the above computational derivation of the proposition provides a proof of the standard double angle formulas (1). The following is a recapitulation of the previous ideas.

Mappings that preserve length

1.5.10. Theorem. A linear mapping of the Euclidean plane is composed of one or more reflections if and only if it is given by a matrix $R = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ which satisfies
$$ab + cd = 0,\quad a^2 + c^2 = 1,\quad b^2 + d^2 = 1.$$
This happens if and only if the mapping preserves length. Such a mapping is a rotation if and only if the determinant of the matrix $R$ equals one, which corresponds to an even number of reflections. When there is an odd number of reflections, the determinant equals $-1$.

Proof. We calculate how a general matrix $A$ might look when the corresponding mapping preserves length. That is, we have a mapping
$$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} ax + by \\ cx + dy \end{pmatrix}.$$
Preserving length thus means that for every $x$ and $y$ we have
$$x^2 + y^2 = (ax+by)^2 + (cx+dy)^2 = (a^2+c^2)x^2 + (b^2+d^2)y^2 + 2(ab+cd)xy.$$
Since this equation is to hold for every $x$ and $y$, the coefficients of the individual powers $x^2$, $y^2$ and $xy$ on the left and right sides of the equation must be equal. Thus we have calculated that the conditions put on the matrix $R$ in the first part

where the columns are the vectors $B-A$, $C-A$ (for $d_1$) and $C-A$, $D-A$ (for $d_2$). Then
$$S = \frac{d_1}{2} + \frac{d_2}{2} = \frac{5\cdot 3 - 10\cdot 0}{2} + \frac{10\cdot 3 - 1\cdot 3}{2} = \frac{15 + 27}{2} = 21.$$
(Thanks to the order of the vectors, all the determinants are greater than zero.) The correctness of the result is easy to confirm, since the quadrilateral $ABCD$ is a trapezoid with bases of lengths 5 and 9 and their distance $v = 3$. □

In the following exercises, we consider non-transparent figures (a triangle, a quadrangle) in the plane $\mathbb{R}^2$. We will illustrate the power of the concepts of the determinant and the oriented area on practical visibility issues in the plane.

1.E.30. Visibility of the sides of a triangle.
Let the triangle with the vertices $A = [5,6]$, $B = [7,8]$, $C = [5,8]$ be given. Determine which of its sides are visible from the point $P = [0,1]$.

Solution. Order the vertices in the positive direction, that is, counter-clockwise: $[5,6]$, $[7,8]$, $[5,8]$. Using the corresponding determinants we can determine whether the point $[0,1]$ lies to the "left" or to the "right" of the sides of the triangle when we view them as oriented line segments:
$$\begin{vmatrix} 7 & 5 \\ 7 & 7 \end{vmatrix} > 0,\qquad \begin{vmatrix} 5 & 5 \\ 7 & 5 \end{vmatrix} < 0,\qquad \begin{vmatrix} 5 & 7 \\ 5 & 7 \end{vmatrix} = 0,$$
where the columns of the first determinant are $B-P$ and $C-P$, of the second $C-P$ and $A-P$, and of the third $A-P$ and $B-P$. Not all the determinants are positive, which means $P$ lies outside the triangle. In that case, if it is left of some oriented segment (a side of the triangle), the segment is not visible from $P$ (think this over). Because the last determinant is zero, the points $[0,1]$, $[5,6]$ and $[7,8]$ lie on a line; the side $AB$ is thus not visible. The side $BC$ is also not visible, unlike the side $AC$, for which the determinant is negative. □

1.E.31. Which sides of the quadrangle given by the vertices $[-2,-2]$, $[1,4]$, $[3,3]$ and $[2,1]$ are "visible" from the position of the point $X = [3, \pi-2]$?

Solution. In the first step we order the vertices such that their order corresponds to the counter-clockwise direction. We choose the vertex $A = [-2,-2]$; the order of the remaining vertices is then $B = [2,1]$, $C = [3,3]$, $D = [1,4]$ (think about how to

of the theorem we are proving are equivalent to the property that the given mapping preserves length. Because $a^2 + c^2 = 1$, we can assume that $a = \cos\varphi$ and $c = \sin\varphi$ for a suitable angle $\varphi$. As soon as we choose the first column of the matrix $R$, the relation $ab + cd = 0$ determines the second column up to a scalar multiple. But we also know that the length of the vector in the second column is one, and thus there are only two possibilities for the matrix $R$, namely
$$\begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix},\qquad \begin{pmatrix} \cos\varphi & \sin\varphi \\ \sin\varphi & -\cos\varphi \end{pmatrix}.$$
In the first case, we have a rotation through the angle $\varphi$; in the second case, we have a rotation composed with the reflection through the first coordinate axis.
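The sign test used in these visibility exercises is easy to automate. A minimal Python sketch follows (the function names are ours):

```python
def det2(u, v):
    """Determinant of the 2x2 matrix with columns u and v."""
    return u[0] * v[1] - u[1] * v[0]

def side_visible(P, V, W):
    """True when the oriented side VW is seen from P, i.e. when the
    determinant with columns V-P, W-P is negative (P right of the side)."""
    return det2((V[0] - P[0], V[1] - P[1]), (W[0] - P[0], W[1] - P[1])) < 0

# 1.E.30: triangle A, B, C ordered counter-clockwise, viewed from P
A, B, C, P = (5, 6), (7, 8), (5, 8), (0, 1)
assert not side_visible(P, A, B)   # zero determinant: P, A, B are collinear
assert not side_visible(P, B, C)   # positive determinant: P left of BC
assert side_visible(P, C, A)       # negative determinant: CA is visible
```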
As we have seen in the previous proposition 1.5.8, every rotation corresponds to two reflections. The determinant of the matrix $R$ is in these two cases either one or minus one, and it distinguishes between the two cases by the parity of the number of reflections. □

Notice that we have now proved our earlier claim on the invariance of the formulae for distances and angles in any orthonormal coordinates. Moreover, we have seen that all Euclidean affine mappings are generated by translations and reflections.

1.5.11. Area of a triangle. At the end of our little trip to geometry we focus on the area of planar objects. For us, triangles will be sufficient. Every triangle is determined by a pair of vectors $v$ and $w$ which, if translated so that they start from one vertex $P$ of the triangle, determine the remaining two vertices. We would like to find a formula (a scalar function area) which assigns the number $\operatorname{area}\Delta(v,w)$ equal to the area of the triangle $\Delta(v,w)$ defined in the aforementioned way. By translating, we can place $P$ at the origin, since translation does not change the area. We can see from the statement that the desired value is half of the area of the parallelogram spanned by the vectors $v$ and $w$. It is easy to calculate (using the well-known formula: base times the corresponding height), or simply observe from the diagram, that the following holds:
$$\operatorname{area}\Delta(v + v', w) = \operatorname{area}\Delta(v,w) + \operatorname{area}\Delta(v',w),$$
$$\operatorname{area}\Delta(av, w) = a\,\operatorname{area}\Delta(v,w).$$

order the points without a picture; you can actually use a procedure similar to what follows). First consider the side $AB$. Together with the point $X = [3,\pi-2]$ it determines the matrix
$$\begin{pmatrix} -2-3 & 2-3 \\ -2-(\pi-2) & 1-(\pi-2) \end{pmatrix},$$
whose first column is the difference $A - X$ and whose second column is $B - X$. Whether it can be "seen" from the point $[3, \pi-2]$ (i.e.
is left or right of the oriented line $AB$, see 1.5.12) is then determined by the sign of the determinant
$$\begin{vmatrix} -2-3 & 2-3 \\ -2-(\pi-2) & 1-(\pi-2) \end{vmatrix} = \begin{vmatrix} -5 & -1 \\ -\pi & 3-\pi \end{vmatrix} = -5\cdot(3-\pi) - (-1)\cdot(-\pi) < 0.$$
For the side $BC$ we analogously obtain
$$\begin{vmatrix} 2-3 & 3-3 \\ 1-(\pi-2) & 3-(\pi-2) \end{vmatrix} = \begin{vmatrix} -1 & 0 \\ 3-\pi & 5-\pi \end{vmatrix} = -1\cdot(5-\pi) - 0 < 0,$$
and for the sides $CD$ and $DA$ we obtain
$$\begin{vmatrix} 3-3 & 1-3 \\ 3-(\pi-2) & 4-(\pi-2) \end{vmatrix} = \begin{vmatrix} 0 & -2 \\ 5-\pi & 6-\pi \end{vmatrix} = 0 - (-2)\cdot(5-\pi) > 0,$$
$$\begin{vmatrix} 1-3 & -2-3 \\ 4-(\pi-2) & -2-(\pi-2) \end{vmatrix} = \begin{vmatrix} -2 & -5 \\ 6-\pi & -\pi \end{vmatrix} = -2\cdot(-\pi) - (-5)\cdot(6-\pi) > 0.$$
The determinants differ in signs, thus the point $X$ is outside the given quadrangle, and a side is visible (from $X$) if $X$ lies to the right of it. According to our convention of putting the vectors $A-X$, $B-X$, $C-X$, $D-X$ into the columns of the determinants, a side is visible exactly if the corresponding determinant is negative (i.e. $X$ is right of the oriented side). Thus from the point $X$, exactly the sides determined by the pairs of vertices $A = [-2,-2]$, $B = [2,1]$ and $B = [2,1]$, $C = [3,3]$ are visible. □

1.E.32. Give the sides of the pentagon with the vertices at the points $[-2,-2]$, $[-2,2]$, $[1,4]$, $[3,1]$ and $[2,-11/6]$ which are visible from the point $[300,1]$.

Solution. To simplify the notation, put
$$A = [-2,-2],\quad B = [2,-11/6],\quad C = [3,1],\quad D = [1,4],\quad E = [-2,2].$$
The sides $BC$ and $CD$ are clearly visible from the position of the point $[300,1]$. On the other hand, $DE$ and $EA$ cannot be seen. For the side $AB$ we compute
$$\begin{vmatrix} -2-300 & 2-300 \\ -2-1 & -\frac{11}{6}-1 \end{vmatrix} = \begin{vmatrix} -302 & -298 \\ -3 & -\frac{17}{6} \end{vmatrix} = -302\cdot\left(-\tfrac{17}{6}\right) - (-298)\cdot(-3) < 0.$$

Finally we add to the formulation of our problem the condition
$$\operatorname{area}\Delta(v,w) = -\operatorname{area}\Delta(w,v),$$
which corresponds to the idea that we give the area a sign according to the order in which we take the vectors. If we write the vectors $v$ and $w$ into the columns of a matrix $A$, then the mapping
$$A = (v, w) \mapsto \det A$$
satisfies all three conditions we wanted. How many such mappings could there possibly be? Every vector can be expressed using the two basis vectors $e_1 = (1,0)$ and $e_2 = (0,1)$.
By linearity, $\operatorname{area}\Delta$ is uniquely determined by its values on these vectors. We want
$$\operatorname{area}\Delta(e_1, e_2) = \frac{1}{2}.$$
In other words, we have chosen the orientation and the scale through the choice of the basis vectors, and we choose the unit square to have area equal to one. Thus we see that the determinant gives the area of the parallelogram determined by the columns of the matrix $A$. The area of the triangle is thus one half of the area of the parallelogram.

1.5.12. Visibility in the plane. The previous description of the value for the oriented area gives us an elegant tool for determining the position of a point relative to oriented line segments. By an oriented line segment we mean two points in the plane $\mathbb{R}^2$ with a selected order. We can imagine it as an arrow from one point to the other. Such an oriented line segment divides the plane into two half-planes; let us call them "left" and "right". We want to be able to determine whether a given point is in the left or the right half-plane. Such tasks are often met in computer graphics when dealing with the visibility of objects. We can imagine that an oriented line segment can be "seen" from the points to the right of it and cannot be seen from the points to the left of it.

This implies that the side can be seen from the point $[300,1]$. □

F. Relations and mappings

We conclude this chapter by considering briefly some aspects of the language of mathematics. We advise the reader to have a quick look at the definitions of the basic concepts of various relations and their properties, beginning in 1.6.1.

1.F.1. Determine whether the following relations on the set $M$ are equivalence relations:
i) $M = \{f : \mathbb{R} \to \mathbb{R}\}$, where $f \sim g$ if $f(0) = g(0)$.
ii) $M = \{f : \mathbb{R} \to \mathbb{R}\}$, where $f \sim g$ if $f(0) = g(1)$.
iii) $M$ is the set of lines in the plane, where two lines are related if they do not intersect.
iv) $M$ is the set of lines in the plane, where two lines are related if they are parallel.
v) $M = \mathbb{N}$, where $m \sim n$ if $S(m) + S(n) = 20$, where $S(n)$ stands for the sum of the digits of the integer $n$.
vi) $M = \mathbb{N}$, where $m \sim n$ if $C(m) = C(n)$, where $C(n) = S(n)$ if the digit sum $S(n)$ is less than 10, and otherwise we define $C(n) = C(S(n))$. (Thus always $C(n) < 10$.)

Solution.
i) We check the three properties of an equivalence:
a) Reflexivity: for any real function $f$, $f(0) = f(0)$.
b) Symmetry: if $f(0) = g(0)$, then also $g(0) = f(0)$.
c) Transitivity: if $f(0) = g(0)$ and $g(0) = h(0)$, then also $f(0) = h(0)$.
We conclude that the relation is an equivalence relation.
ii) No. The relation is not reflexive, since for instance for the function $\sin$ we have $\sin 0 \neq \sin 1$. It is not transitive either.
iii) No. The relation is not reflexive (every line intersects itself). It is not transitive.
iv) Yes. The equivalence classes then correspond to the unoriented directions in the plane.
v) No. The relation is not reflexive: $S(1) + S(1) = 2$. It is not transitive either.
vi) Yes.
□

1.F.2. Let the relation $R$ be defined on $\mathbb{R}^2$ such that $((a,b),(c,d)) \in R$ for arbitrary $a,b,c,d \in \mathbb{R}$ if and only if $b = d$. Determine whether or not this is an equivalence relation. If it is, describe geometrically the partitioning it determines.

We have the line segment $AB$ and are given some point $C$. We calculate the oriented area of the corresponding triangle determined by the vectors $C - A$ and $B - A$. If the point $C$ is to the left of the line segment, then with the usual positive orientation (counter-clockwise) we obtain a negative sign of the oriented area (showing non-visibility), while the positive sign corresponds to the points to the right. This approach is often used for testing relative positions in 2D graphics.

6. Relations and mappings

In the final part of this introductory chapter, we return to the formal description of mathematical structures. We will try to illustrate them on examples we already know. We can consider this part to be
an exercise in a formal approach to the objects and concepts of mathematics.

1.6.1. Relations between sets. First we define the Cartesian product $A \times B$ of two sets $A$ and $B$. It is the set of all ordered pairs $(a,b)$ such that $a \in A$ and $b \in B$. A binary relation between the two sets $A$ and $B$ is then a subset $R$ of the Cartesian product $A \times B$. We write $a \sim_R b$ to mean $(a,b) \in R$, and say that $a$ is related to $b$.

The domain of the relation is the subset
$$D = \{a \in A :\ \exists b \in B,\ (a,b) \in R\}.$$
Here the symbol $\exists b$ means that there is at least one such $b$ satisfying the rest of the claim. Similarly, the codomain of the relation is the subset
$$I = \{b \in B :\ \exists a \in A,\ (a,b) \in R\}.$$

A special case of a relation between sets is a mapping from the set $A$ to the set $B$. This is the case when every element of the domain of the relation is related to exactly one element of the codomain. Examples of mappings known to us are all functions, where the codomain of the mapping is a set of numbers, for instance the set of integers or the set of real numbers, or the linear mappings in the plane given by matrices. We write
$$f : D \subseteq A \to I \subseteq B,\quad f(a) = b,$$
to express the fact that $(a,b)$ belongs to the relation, and we say that $b$ is the value of $f$ at $a$. Furthermore we say that
• the mapping $f$ of the set $A$ to the set $B$ is surjective (or onto) if $D = A$ and $I = B$;
• the mapping $f$ of the set $A$ to the set $B$ is injective (or one-to-one) if $D = A$ and for every $b \in I$ there exists exactly one preimage $a \in A$ with $f(a) = b$.

Expressing a mapping $f : A \to B$ as a relation
$$f \subseteq A \times B,\quad f = \{(a, f(a)) :\ a \in A\},$$
is also known as the graph of the mapping $f$.

Solution. From $((a,b),(a,b)) \in R$ for all $a,b \in \mathbb{R}$ it follows that the relation is reflexive. It is equally easy to see that the relation is symmetric, since in the equality of the second coordinates we can interchange the left and right sides.
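When a relation is stored as a set of pairs, the notions of domain, codomain and mapping translate directly into code. A sketch in Python (the function names are ours), using the relation from exercise 1.F.4:

```python
def domain(R):
    """Domain of a relation given as a set of pairs."""
    return {a for (a, b) in R}

def codomain(R):
    """Codomain (set of values) of a relation."""
    return {b for (a, b) in R}

def is_mapping(R):
    """A relation is a mapping when each domain element has exactly one image,
    i.e. when no first coordinate repeats among the pairs."""
    return len(domain(R)) == len(R)

# the relation R between A and B from exercise 1.F.4
R = {("a", "v"), ("b", "x"), ("c", "x"), ("c", "u"), ("d", "v"), ("f", "y")}
assert domain(R) == {"a", "b", "c", "d", "f"}
assert codomain(R) == {"x", "y", "u", "v"}
assert not is_mapping(R)   # c has two images, x and u
```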
If $((a,b),(c,d)) \in R$ and $((c,d),(e,f)) \in R$, that is, $b = d$ and $d = f$, we easily get the transitivity condition $((a,b),(e,f)) \in R$, that is, $b = f$. The relation $R$ is an equivalence relation, where the points in the plane are related if and only if they have the same second coordinate (the line they determine is perpendicular to the $y$-axis). The corresponding partition then divides the plane into the lines parallel to the $x$-axis. □

1.F.3. Determine how many distinct binary relations can be defined between the set $X$ and the set of all subsets of $X$, if the set $X$ has exactly 3 elements.

Solution. First, notice that the set of all subsets of $X$ has exactly $2^3 = 8$ elements, and thus its Cartesian product with $X$ has $8 \cdot 3 = 24$ elements. Possible binary relations then correspond to subsets of this Cartesian product, and of those there are $2^{24}$. □

1.F.4. Give the domain $D$ and the codomain $I$ of the relation
$$R = \{(a,v),\ (b,x),\ (c,x),\ (c,u),\ (d,v),\ (f,y)\}$$
between the sets $A = \{a,b,c,d,e,f\}$ and $B = \{x,y,u,v,w\}$. Is the relation $R$ a mapping?

Solution. Directly from the definitions of the domain and the codomain of a relation we obtain
$$D = \{a,b,c,d,f\} \subseteq A,\qquad I = \{x,y,u,v\} \subseteq B.$$
It is not a mapping, since $(c,x), (c,u) \in R$, that is, $c \in D$ has two images. □

1.F.5. Determine for each of the following relations on the set $\{a,b,c,d\}$ whether it is an ordering and whether it is complete:
$$R_1 = \{(a,a), (b,b), (c,c), (d,d), (b,a), (b,c), (b,d)\},$$
$$R_2 = \{(a,a), (b,b), (c,c), (d,d), (d,a), (a,d)\},$$
$$R_3 = \{(a,a), (b,b), (c,c), (d,d), (a,b), (b,c), (b,d)\},$$
$$R_4 = \{(a,a), (b,b), (c,c), (a,b), (a,c), (a,d), (b,c), (b,d), (c,d)\},$$
$$R_5 = \{(a,a), (b,b), (c,c), (d,d), (a,b), (a,c), (a,d), (b,c), (b,d), (c,d)\}.$$

1.6.2. Composition of relations and functions. For mappings, the concept of composition is clear. Suppose we have two mappings $f : A \to B$ and $g : B \to C$.
Then their composition $g \circ f : A \to C$ is defined as
$$(g \circ f)(a) = g(f(a)).$$
Composition can also be expressed with the notation used for relations:
$$f \subseteq A \times B,\quad f = \{(a, f(a)) :\ a \in A\},$$
$$g \subseteq B \times C,\quad g = \{(b, g(b)) :\ b \in B\},$$
$$g \circ f \subseteq A \times C,\quad g \circ f = \{(a, g(f(a))) :\ a \in A\}.$$

The composition of relations is defined in a very similar way. We just add existential quantifiers to the statements, since we have to consider all possible "preimages" and all possible "images". Let $R \subseteq A \times B$ and $S \subseteq B \times C$ be relations. Then
$$S \circ R \subseteq A \times C,\quad S \circ R = \{(a,c) :\ \exists b \in B,\ (a,b) \in R,\ (b,c) \in S\}.$$

A special case of a relation is the identity relation
$$\mathrm{id}_A = \{(a,a) \in A \times A :\ a \in A\}$$
on the set $A$. It is a neutral element with respect to composition with any relation that has $A$ as its codomain or domain.

(Figure: composition of relations: the points which can be reached by a path from left to right are in the composed relation.)

For every relation $R \subseteq A \times B$, we define the inverse relation
$$R^{-1} = \{(b,a) :\ (a,b) \in R\} \subseteq B \times A.$$

Solution. $R_1$ is an ordering which is not complete (for instance, neither $(a,c) \in R_1$ nor $(c,a) \in R_1$). The relation $R_2$ is not antisymmetric, as both $(a,d) \in R_2$ and $(d,a) \in R_2$; therefore it is not an ordering (it is an equivalence). The relations $R_3$ and $R_4$ are not orderings either: $R_3$ is not transitive (for instance $(a,b), (b,c) \in R_3$, but $(a,c) \notin R_3$), and $R_4$ is not reflexive ($(d,d) \notin R_4$). The relation $R_5$ is a complete ordering (if we interpret $(a,b) \in R_5$ as $a \leq b$, then $a \leq b \leq c \leq d$). □

1.F.6. Determine whether or not the mapping $f$ is injective (one-to-one) or surjective (onto), when
(a) $f : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}$, $f((x,y)) = x + y - 10x^2$;
(b) $f : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$, $f(x) = (2x, x^2 + 10)$.

Solution. In case (a) the given mapping is surjective (it is enough to set $x = 0$) but not injective (it is enough to compare $(x,y) = (0,-9)$ and $(x,y) = (1,0)$). In case (b) it is an injective mapping (both its coordinates, that is, the functions $y = 2x$ and $y = x^2 + 10$, are clearly increasing on $\mathbb{N}$).
The mapping is not surjective (for instance, the pair $(1,1)$ has no preimage). □

1.F.7. In the following three figures, icons are connected with lines such that people in different parts of the world could have assigned them. Determine whether each connection is a mapping, and whether it is injective, surjective or bijective.

Solution. In the first figure the connection is a mapping which is surjective but not injective, because both the snake and the spider are labeled as poisonous. The second figure is not a mapping but only a relation, since the dog is labeled both as a pet and as a meal. The third connection is again a mapping. This time it is neither injective nor surjective. □

1.F.8. Determine the number of mappings from the set $\{1,2\}$ to the set $\{a,b,c\}$. How many of them are surjective and how many are injective?

Solution. To the element 1 we can assign any of the elements $a, b, c$. Similarly, to the element 2 we can assign any of the

Beware: the same term is used with mappings in a more specific situation. Of course, for every mapping there is its inverse relation, but this relation is in general not a mapping. Therefore we speak about the existence of an inverse mapping if every element $b \in B$ is the image of exactly one element in $A$. In such a case the inverse mapping is exactly the inverse relation. Note that the composition of a mapping and its inverse mapping (if it exists) is the identity mapping. In general, this is not so for relations.

1.6.3. Relation on a set. In the case when $A = B$ we speak about a relation on the set $A$. We say that the relation $R$ is:
• reflexive, if $\mathrm{id}_A \subseteq R$, that is, $(a,a) \in R$ for every $a \in A$;
• symmetric, if $R^{-1} = R$, that is, if $(a,b) \in R$, then also $(b,a) \in R$;
• antisymmetric, if $R^{-1} \cap R \subseteq \mathrm{id}_A$, that is, if $(a,b) \in R$ and also $(b,a) \in R$, then $a = b$;
• transitive, if $R \circ R \subseteq R$, that is, if $(a,b) \in R$ and $(b,c) \in R$, then also $(a,c) \in R$.

A relation is called an equivalence relation if it is reflexive, symmetric and transitive.
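For small finite relations, the defining properties just listed can be checked mechanically. A Python sketch (the function names are ours):

```python
def is_reflexive(R, A):
    """(a, a) must belong to R for every a in the underlying set A."""
    return all((a, a) in R for a in A)

def is_symmetric(R):
    """With (a, b) the relation must also contain (b, a)."""
    return all((b, a) in R for (a, b) in R)

def is_transitive(R):
    """With (a, b) and (b, c) the relation must also contain (a, c)."""
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

A = {1, 2}
# symmetric but neither reflexive nor transitive over {1, 2}
R = {(1, 2), (2, 1)}
assert is_symmetric(R) and not is_reflexive(R, A) and not is_transitive(R)
# the full relation on {1, 2} is an equivalence
E = {(1, 1), (2, 2), (1, 2), (2, 1)}
assert is_reflexive(E, A) and is_symmetric(E) and is_transitive(E)
```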
A relation is called an ordering if it is reflexive, transitive and antisymmetric. Orderings are usually denoted by the symbol $\leq$; the fact that an element $a$ is related to an element $b$ is then written as $a \leq b$. Notice that the relation $<$, that is, "to be strictly smaller than", is not an ordering on the set of real numbers, since it is not reflexive.

A good example of an ordering is set inclusion. Consider the set $2^A$ of all subsets of a finite set $A$. We have a relation $\subseteq$ on the set $2^A$ given by the property of "being a subset": thus $X \subseteq Z$ if $X$ is a subset of $Z$. Clearly all three conditions from the definition of an ordering are satisfied: if $X \subseteq Y$ and $Y \subseteq X$, then necessarily $X$ and $Y$ must be identical; if $X \subseteq Y \subseteq Z$, then also $X \subseteq Z$; and reflexivity is clear from the definition.

We say that an ordering $\leq$ on a set $A$ is complete if every two elements $a, b \in A$ are comparable, that is, either $a \leq b$ or $b \leq a$. If $A$ contains more than one element, there exist subsets $X$ and $Y$ with neither $X \subseteq Y$ nor $Y \subseteq X$, so the ordering $\subseteq$ is not complete on the set of all subsets of $A$. The set of real numbers with the usual $\leq$ is complete. Thus the subdomains $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$ come equipped with a complete ordering, too. On the other hand, there is no such natural ordering on $\mathbb{C}$. The absolute value gives only a partial ordering there (comparing the radii of the circles in the complex plane).

1.6.4. Partitions of an equivalence. Every equivalence relation $R$ on a set $A$ defines also a partition of the set $A$, consisting of subsets of mutually equivalent elements, namely the equivalence classes. For any $a \in A$ we consider the set of elements which are equivalent with $a$, that is,
$$[a] = R_a = \{b \in A :\ (a,b) \in R\}.$$
The mapping is injective if and only if the elements 1 and 2 are mapped to different elements. There are three possibilities for the image of 1, after the image of 1 is given, there remain two possibilities for the image of 2. Thus the number of injective mappings of the set {1,2} to the set {a, b, c} is 6. □ l.F.9. Determine the number of surjective mappings of the set {1,2,3,4} to the set {1,2,3}. Solution. We can determine the number by subtracting the number of non-surjective mappings from the number of all mappings. The number of all mappings is V(3,4) = 34. Non-surjective mappings have either a one element, or a two element codomain. There are just three mappings with a one element codomain. The number of mappings with a two-element codomain is (;|)(24 — 2) (there are (2) ways to choose the codomain and for a fixed two-element codomain there are 24 — 2 ways how to map four elements onto them). Thus the number of surjective mappings is 34 - f )(24 - 2) - 3 = 36. □ 1.F.10. Write down all the relations over a two-element set {1,2}, which are symmetric but are neither reflexive nor transitive. Solution. The reflexive relations are exactly those which contain both pairs (1,1), (2,2). This excludes relations {(1,1), (2,2)}, {(1,1),(2,2),(1,2)}, {(1,1),(2,2),(2,1)},{(1,1),(2,2),(1,2),(2,1)}. We claim that the remaining relations, which are symmetric but not transitive, must contain (1,2), (2,1). If sucharelation contains one of these two (ordered) pairs, it must by symmetry contain also the other. If it contains neither of these pairs, then it is clearly transitive. From the total number of 16 relations over a two-element set we have thus selected {(1,2), (2,1)}, {(1,2),(2,1),(1,1)}, {(1,2),(2,1),(2,2)}. It is clear that each of these 3 relations is symmetric but neither reflexive nor transitive. □ Clearly a e Ra by reflexivity. If (a, 6) e R, then Ra = Rb by symmetry and transitivity. 
Furthermore, if $R_a \cap R_b \neq \emptyset$, then there is an element $c$ in both $R_a$ and $R_b$, so that $R_a = R_c = R_b$. It follows that for every pair $a, b$, either $R_a = R_b$, or $R_a$ and $R_b$ are disjoint. That is, the equivalence classes are pairwise disjoint. Finally, $A = \bigcup_{a \in A} R_a$; that is, the set $A$ is partitioned into the equivalence classes. We sometimes write $[a] = R_a$, and by the above, we can represent an equivalence class by any one of its elements.

1.6.5. Existence of scalars. As before, we assume we know what sets are, and we indicate the construction of the natural numbers. We denote the empty set by $\emptyset$ (notice the difference between the symbol 0 for zero and the symbol $\emptyset$ for the empty set) and define
(1) $0 := \emptyset$, $n + 1 := n \cup \{n\}$,
in other words,
$$0 := \emptyset,\quad 1 := \{0\},\quad 2 := \{0,1\},\ \ldots,\ n+1 := \{0,1,\ldots,n\}.$$
This notation says that if we have already defined the numbers $0, 1, 2, \ldots, n$, then the number $n+1$ is defined as the set of all the previous numbers. In this way we have defined the set of natural numbers $\mathbb{N}$.

Next, we should construct the operations $+$ and $\cdot$ and deduce their required properties. In order to do that in detail, we would have to pay more attention to the basic understanding of sets. For example, once we know what a disjoint union of sets is, we may define the natural number $c = a + b$ as the unique natural number $c$ having the same number of elements as the disjoint union of $a$ and $b$. Of course, formally speaking, we need to explain what it means for two sets to have the same number of elements. Let us notice that in general, having two sets $A$ and $B$ of the same "size" should mean that there exists a bijection $A \to B$. This is completely in accordance with our intuition for finite sets. However, it is much less intuitive with infinite sets. For example, there are as many natural numbers as natural numbers with natural square roots (the bijection $a \mapsto a^2$), although the example 1.G.1 could be read as "most natural numbers do not have a rational square root".
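The set-theoretic construction of the natural numbers can be played with directly if we encode sets as Python frozensets. This is a sketch under that encoding; the names are ours.

```python
def succ(n):
    """von Neumann successor: n + 1 := n 'union' {n}, sets as frozensets."""
    return n | frozenset({n})

zero = frozenset()
one = succ(zero)
two = succ(one)
three = succ(two)

# the number n is the set of all previous numbers, so it has n elements
assert len(three) == 3
# membership and proper inclusion both recover the usual order of the numbers
assert two in three
assert two < three   # frozenset comparison: proper subset
```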
We say that a set which is bijective to the natural numbers $\mathbb{N}$ is countable. Sets bijective to some natural number $n$ (as defined above) are called finite (with the number of elements $n$), while the sets which are neither finite nor countable are called uncountable. We can also define the relation $\leq$ on $\mathbb{N}$ as follows: $m \leq n$ whenever $m \subseteq n$.

Solution. The remaining $k - 3$ racers can be ordered in $(k-3)!$ ways. For the three friends there are then $k - 2$ places (the start, the end and the $k - 4$ gaps), where we can put them in $v(k-2,3)$ ways. Using the rule of (combinatorial) product, we obtain
$$(k-3)! \cdot (k-2)(k-3)(k-4) = (k-2)! \cdot (k-3)(k-4).$$
□

1.G.14. There are 32 participants in a tournament. The organisers have stated that the participants must divide arbitrarily into four groups, such that the first one has size 10, the second and the third 8, and the fourth 6. In how many ways can this be done?

Solution. We can imagine that from the 32 participants we create a row, where the first 10 form the first group, the next 8 the second group, and so on. There are 32! orderings of all participants. Note that the division into groups is not influenced if we change the order of the people in the same group. Therefore the number of distinct divisions equals
$$P(10,8,8,6) = \frac{32!}{10! \cdot 8! \cdot 8! \cdot 6!}.$$
□

1.G.15. We need to accommodate 9 people in one four-bed room, one three-bed room and one two-bed room. In how many ways can this be done?

Solution. If we assign the number 1 to the people in the four-bed room, the number 2 in the three-bed room and the number 3 in the two-bed room, we create permutations with repetitions from the elements 1, 2, 3, where 1 occurs four times, 2 three times and 3 two times. The number of such permutations is
$$P(4,3,2) = \frac{9!}{4! \cdot 3! \cdot 2!} = 1260.$$
□

1.G.16. Determine the number of ways to divide 33 distinct coins among three people $A$, $B$ and $C$ such that $A$ and $B$ together have twice as many coins as $C$.

Solution. From the problem statement it is clear that $C$ must receive 11 coins.
That can be done in $\binom{33}{11}$ ways. Each of the remaining 22 coins can be given either to $A$ or to $B$, which gives $2^{22}$ ways. Using the rule of product we obtain the result $\binom{33}{11} \cdot 2^{22}$. □

1.G.17. In how many ways can we divide 40 identical balls among 4 boys?

Solution. Let us add three matches to the 40 balls. If we order the balls and matches in a row, the matches divide the balls into 4 sections. We order the boys at random, give the first boy all the balls from the first section, give the second boy all the balls from the second section, and so on. It is now evident that the result is $\binom{43}{3} = 12\,341$. □

1.G.18. According to quality, we divide food products into the groups I, II, III, IV. Determine the number of all possible divisions of 9 food products into these groups, if we distinguish only the numbers of products in the individual groups.

Solution. If we directly write down the groups I, II, III, IV assigned to the products, we create combinations with repetition of the ninth order from four elements. The number of such combinations is $\binom{12}{9} = 220$. □

1.G.19. In how many ways could the table of the first soccer league have ended, if we know only that at least one of the teams of Ostrava and Olomouc is in the table after the team of Brno (there are 16 teams in the league)?

Solution. Let us first determine the three places where the teams of Brno, Olomouc and Ostrava ended. Those can be chosen in $c(3,16) = \binom{16}{3}$ ways. Of the 6 possible orderings of these three teams on the given three places, only four satisfy the given condition. After that, we can independently choose the order of the remaining 13 teams at the remaining places of the table. Using the rule of product, we have the solution
$$\binom{16}{3} \cdot 4 \cdot 13! = 13\,948\,526\,592\,000.$$
□

1.G.20.
How many distinct orderings (in a row) are there for a picture of a volleyball team (6 players), if
i) Gouald and Bamba want to stand next to each other;
ii) Gouald and Bamba want to stand next to each other and in the middle;
iii) Gouald and Kamil do not want to stand next to each other?

Solution.
i) In this case Gouald and Bamba can be considered a single person; we then just multiply by two to account for their relative order. Thus we have $2 \cdot 5! = 240$ orderings.
ii) Similar, except that the position of Gouald and Bamba is fixed. We have $2 \cdot 4! = 48$ orderings.
iii) Probably the simplest approach is to subtract the cases where Kamil and Gouald stand next to each other (see (i)). We get $6! - 2 \cdot 5! = 720 - 240 = 480$. □

1.G.21. Coin flipping. We flip a coin six times.
i) How many distinct sequences of heads and tails are there?
ii) How many sequences with exactly four heads are there?
iii) How many sequences with at least two heads are there? ○

1.G.22. How many distinct anagrams (rearrangements of letters) of the word "krakatit" are there, such that between the two letters "k" there is exactly one other letter?

Solution. In the considered anagrams there are exactly six possible placements of the pair of "k"s, since the first of the two can be placed at any of the positions 1–6. If we fix the spots for the two "k"s, then the other letters can be placed arbitrarily, that is, in P(1, 1, 2, 2) ways. Using the rule of product, we have
$6 \cdot P(1,1,2,2) = 6 \cdot \frac{6!}{2! \cdot 2!} = 1080$. □

1.G.23. How many anagrams of the word BASILICA are there, such that there are no two vowels next to each other and no two consonants next to each other?

Solution. Since there are four vowels and four consonants in the word, each such anagram is either of the type BABABABA or ABABABAB. On the given four places we can permute the vowels in $\frac{4!}{2! \cdot 2!} = 6$ ways (both A and I occur twice) and, independently of that, also the consonants (4! ways). Using the rule of product, the result is then $2 \cdot 4! \cdot 6 = 288$. □

1.G.24.
In how many ways can we divide 9 girls and 6 boys into two groups such that each group contains at least two boys?

Solution. We divide the boys and the girls independently: $2^9(2^5 - 7) = 12800$. □

1.G.25. A material is composed of five layers, each of which has fibres in one of six possible directions. How many such materials are there? How many of them have no two neighbouring layers with fibres in the same direction?

Solution. $6^5$ and $6 \cdot 5^4$. □

1.G.26. For any fixed n ∈ N, determine the number of all solutions of the equation
$x_1 + x_2 + \cdots + x_k = n$
in the set of positive integers.

Solution. If we look for solutions in the domain of positive integers, we note that the natural numbers $x_1, \dots, x_k$ are a solution of the equation if and only if the non-negative integers $y_i = x_i - 1$, $i = 1, \dots, k$, are a solution of the equation
$y_1 + y_2 + \cdots + y_k = n - k$.
Using 1.C.13, there are $\binom{n-1}{k-1}$ of them. □

1.G.27. There are n forts on a circle (n ≥ 3), numbered in order 1, …, n. At one moment each of the forts shoots at one of its neighbours (fort 1 is also a neighbour of fort n). Denote by P(n) the number of all possible results of the shooting (a result of the shooting is the set of numbers of those forts that were hit, regardless of the number of hits taken). Prove that P(n) and P(n + 1) are relatively prime.

Solution. If we mark the forts that were hit by a black dot and the others by a white dot, the task is equivalent to determining the number of all possible colourings of n dots on a circle with black and white, such that no two white dots have "distance" one. For odd n this number is equal to K(n), the number of colourings of n dots on a circle with black and white such that no two white dots are adjacent (we reorder the dots so that we start with dot one and proceed increasingly through the odd numbers, and then increasingly through the even ones).
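The stars-and-bars counts of 1.G.17 and 1.G.26 can be confirmed by brute-force enumeration (a sketch in Python, checking small cases only):

```python
from math import comb

# 1.G.17: 40 identical balls among 4 boys, i.e. non-negative solutions
# of a + b + c + d = 40; we enumerate (a, b, c), d being determined.
ways = sum(1 for a in range(41) for b in range(41 - a) for c in range(41 - a - b))

# 1.G.26 for n = 7, k = 3: positive solutions of x1 + x2 + x3 = 7,
# which the formula predicts to number C(6, 2) = 15.
pos = sum(1 for a in range(1, 7) for b in range(1, 7) for c in range(1, 7)
          if a + b + c == 7)
```

Both enumerations match the binomial formulas.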
For even n this number equals $K(n/2)^2$, the square of the number of colourings of n/2 dots on a circle such that no two white dots are adjacent (we colour the dots on even and on odd positions independently). For K(n) we easily derive the recurrence $K(n) = K(n-1) + K(n-2)$. Furthermore, we can easily compute that $K(2) = 3$, $K(3) = 4$, $K(4) = 7$, that is, $K(2) = F(4) - F(0)$, $K(3) = F(5) - F(1)$, $K(4) = F(6) - F(2)$, and by induction we easily prove that $K(n) = F(n+2) - F(n-2)$, where F(n) denotes the n-th member of the Fibonacci sequence ($F(0) = 0$, $F(1) = F(2) = 1$). Since $(K(2), K(3)) = 1$, for n ≥ 3 we have, similarly as in the Fibonacci sequence,
$(K(n), K(n-1)) = (K(n) - K(n-1), K(n-1)) = (K(n-2), K(n-1)) = \cdots = 1$.
Let us now show that for every even $n = 2a$ the number $P(n) = K(a)^2$ is relatively prime with both $P(n+1) = K(2a+1)$ and $P(n-1) = K(2a-1)$. For this it is enough to show that for a ≥ 2:
$(K(a), K(2a+1)) = (K(a), F(2)K(2a) + F(1)K(2a-1)) = (K(a), F(3)K(2a-1) + F(2)K(2a-2)) = \cdots = (K(a), F(a+1)K(a+1) + F(a)K(a)) = (K(a), F(a+1)) = (F(a+2) - F(a-2), F(a+1)) = (F(a+2) - F(a+1) - F(a-2), F(a+1)) = (F(a) - F(a-2), F(a+1)) = (F(a-1), F(a+1)) = (F(a-1), F(a)) = 1$,
and
$(K(a), K(2a-1)) = (K(a), F(2)K(2a-2) + F(1)K(2a-3)) = (K(a), F(3)K(2a-3) + F(2)K(2a-4)) = \cdots = (K(a), F(a)K(a) + F(a-1)K(a-1)) = (K(a), F(a-1)) = (F(a+2) - F(a-2), F(a-1)) = (F(a+2) - F(a), F(a-1)) = (F(a+2) - F(a+1), F(a-1)) = (F(a), F(a-1)) = 1$.
This proves the claim. □

1.G.28. How much money do I save in a building savings account in five years, if I deposit 3000 Kč monthly (on the first day of the month), the yearly interest rate is 3% and once a year I obtain a state contribution of 1500 Kč (this contribution arrives on the first of May)?

Solution. Let $x_n$ be the amount of money in the account after n years.
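Returning to 1.G.27 for a moment: the key identity $K(n) = F(n+2) - F(n-2)$ can be checked by brute force over all circular colourings (a sketch in Python; the helpers `K` and `F` are ours):

```python
from itertools import product

def K(n):
    """Colourings of n dots on a circle with no two adjacent white (=1) dots."""
    return sum(1 for c in product((0, 1), repeat=n)
               if not any(c[i] and c[(i + 1) % n] for i in range(n)))

def F(n):
    """n-th Fibonacci number with F(0) = 0, F(1) = F(2) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

identity_holds = all(K(n) == F(n + 2) - F(n - 2) for n in range(2, 15))
```

The base values K(2) = 3, K(3) = 4, K(4) = 7 come out of the same enumeration.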
Then (for n ≥ 1) we obtain the following recurrence (assuming that every month is exactly one twelfth of a year):
$x_{n+1} = 1.03\,x_n + 36000 + 1500 + 0.03 \cdot 3000 \left(1 + \tfrac{11}{12} + \cdots + \tfrac{1}{12}\right) + 0.03 \cdot \tfrac{8}{12} \cdot 1500 = 1.03\,x_n + 38115$,
where $0.03 \cdot 3000(1 + \tfrac{11}{12} + \cdots + \tfrac{1}{12}) = 585$ is the interest from the deposits made during the year and $0.03 \cdot \tfrac{8}{12} \cdot 1500 = 30$ is the interest from the state contribution credited that year. Therefore
$x_n = 38115 \sum_{j=0}^{n-2} (1.03)^j + (1.03)^{n-1} x_1 + 1500$,
where
$x_1 = 36000 + 0.03 \cdot 3000 \left(1 + \tfrac{11}{12} + \cdots + \tfrac{1}{12}\right) = 36585$.
In total,
$x_5 = 38115 \cdot \frac{(1.03)^4 - 1}{0.03} + (1.03)^4 \cdot 36585 + 1500 \approx 202136$. □

Remark. In reality, interest is computed according to the number of days the money is in the account. You should obtain a real bank statement of a building savings account, determine its interest rates and try to compute the interest credited in a year. Compare the result with the sum that was credited in reality. Compute until the numbers agree …

1.G.29. What is the maximum number of areas the plane can be divided into by n circles?

Solution. For the maximum number $p_n$ of areas we derive the recurrence
$p_{n+1} = p_n + 2n$.
Note that the (n + 1)-th circle intersects the n previous circles in at most 2n points (and this can really occur).

(figure: adding the third circle)

Clearly $p_1 = 2$. Thus for $p_n$ we obtain
$p_n = p_{n-1} + 2(n-1) = p_{n-2} + 2(n-2) + 2(n-1) = \cdots = p_1 + \sum_{i=1}^{n-1} 2i = n^2 - n + 2$.

1.G.30. What is the maximum number of areas a 3-dimensional space can be divided into by n planes?

Solution. Let the number be $r_n$. We see that $r_0 = 1$. Similarly to exercise (1.B.4) we consider n planes in space, add another plane, and ask what is the maximum number of new areas. Again it is exactly the number of areas the new plane intersects. How many can that be? The number of areas intersected by the (n + 1)-th plane equals the number of areas the new (n + 1)-th plane is divided into by its lines of intersection with the n planes already situated in the space.
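The five-year total in 1.G.28 can be reproduced by simply iterating the recurrence (a sketch in Python; we follow the text's assumption that the extra 1500 Kč of the closed formula is added once at the end):

```python
# x_1 = 36000 + 0.03 * 3000 * (1 + 11/12 + ... + 1/12) = 36585
x = 36000 + 0.03 * 3000 * sum((12 - i) / 12 for i in range(12))
for _ in range(4):              # years 2 to 5: x_{n+1} = 1.03 * x_n + 38115
    x = 1.03 * x + 38115
x5 = x + 1500                   # final contribution, as in the closed formula
```

This agrees with the value $x_5 \approx 202136$ obtained above.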
However, there are at most $\frac{1}{2}(n^2 + n + 2)$ of those (according to the exercise in the plane), thus we obtain the recurrence
$r_{n+1} = r_n + \frac{n^2 + n + 2}{2}$.
This equation can again be solved directly:
$r_n = r_{n-1} + \frac{(n-1)^2 + (n-1) + 2}{2} = r_{n-2} + \frac{(n-2)^2 + (n-2) + 2}{2} + \frac{(n-1)^2 + (n-1) + 2}{2} = \cdots = r_0 + \sum_{i=0}^{n-1} \frac{i^2 + i + 2}{2} = 1 + n + \frac{1}{2}\left(\sum_{i=1}^{n-1} i^2 + \sum_{i=1}^{n-1} i\right) = \frac{n^3 + 5n + 6}{6}$,
where we have used the known relation
$\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}$,
which is easily proved by mathematical induction. □

1.G.31. What is the maximum number of areas a 3-dimensional space can be divided into by n balls? ○

1.G.32. What is the number of areas a 3-dimensional space is divided into by n mutually distinct planes which all pass through a given point?

Solution. For the number $x_n$ of areas we derive the recurrence $x_n = x_{n-1} + 2(n-1)$; furthermore $x_1 = 2$, and thus $x_n = n(n-1) + 2$. □

1.G.33. From a deck of 52 cards we randomly draw 16 cards. Express the probability that we choose exactly 10 red and 6 black cards.

Solution. We first note that we do not have to care about the order of the cards. (In the resulting fraction we would obtain ordered choices by multiplying both the numerator and the denominator by 16!.) The number of all possible (unordered) choices of 16 cards from 52 is $\binom{52}{16}$. Similarly, the number of all choices of 10 cards from 26 is $\binom{26}{10}$ and of 6 cards from 26 it is $\binom{26}{6}$. Since we are choosing independently 10 cards from the 26 red and 6 cards from the 26 black, using the (combinatorial) rule of product we obtain the result
$\binom{26}{10}\binom{26}{6} / \binom{52}{16} \approx 0.118$. □

1.G.34. In a box there are 7 white, 6 yellow and 5 blue balls. We draw (without returning) 3 balls randomly. Determine the probability that exactly 2 of them are white.

Solution. In total there are $\binom{7+6+5}{3} = \binom{18}{3}$ ways to choose 3 balls.
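The closed form for 1.G.30 can be checked against the recurrence for many values of n at once (a sketch in Python; `regions_by_planes` is our helper name):

```python
def regions_by_planes(n):
    """Iterate r_0 = 1, r_{k+1} = r_k + (k^2 + k + 2) / 2 (always an integer)."""
    r = 1
    for k in range(n):
        r += (k * k + k + 2) // 2
    return r

closed_form_ok = all(regions_by_planes(n) == (n**3 + 5 * n + 6) // 6
                     for n in range(50))
```

For instance, three planes give at most 8 areas and four planes at most 15.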
Choosing exactly two white balls allows $\binom{7}{2}$ choices of the two white balls and simultaneously $\binom{11}{1}$ choices for the third ball. Using the rule of product, the number of ways to choose exactly two white balls is $\binom{7}{2} \cdot \binom{11}{1}$. Thus the result is
$\binom{7}{2}\binom{11}{1} / \binom{18}{3} = \frac{231}{816} \approx 0.28$. □

1.G.35. When throwing a die, eleven times in a row the result was 4. Determine the probability that the twelfth roll results in 4.

Solution. The previous results (according to our assumptions) do not influence the results of further rolls. Thus the probability is 1/6. □

1.G.36. From a deck of 32 cards we randomly draw 6 cards. What is the probability that all of them have the same colour?

Solution. In order to obtain the result
$4\binom{8}{6} / \binom{32}{6} \approx 1.24 \cdot 10^{-4}$,
we just first choose one of the 4 colours and note that there are $\binom{8}{6}$ ways to choose 6 cards from the 8 cards of this colour. □

1.G.37. Three players are given 10 cards each and two cards remain (from a deck of 32 cards, 4 of which are aces). Is it more likely that somebody receives the seven, eight and nine of spades, or that the two remaining cards are both aces?

Solution. The probability that some player receives the three mentioned cards equals $3\binom{29}{7}/\binom{32}{10}$, while the probability that two aces remain equals $\binom{4}{2}/\binom{32}{2}$; it is more likely that some player receives the three mentioned cards. Note that the inequality
$3\binom{29}{7}/\binom{32}{10} > \binom{4}{2}/\binom{32}{2}$
can be proved by transforming both sides, where by repeated cancelling (after expanding the binomial coefficients according to their definition) we easily obtain 6 > 1. □

1.G.38. We throw n dice. What is the probability that the values 1, 3 and 6 are not present among the numbers that appeared?

Solution. We can reformulate the exercise: we throw one die n times. The probability that the first roll does not result in 1, 3 or 6 is 1/2. The probability that neither the first nor the second roll does is clearly 1/4 (the result of the first roll does not influence the result of the second roll).
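The probabilities in 1.G.33, 1.G.34 and 1.G.36 are all of the same hypergeometric shape and can be evaluated directly (a sketch in Python):

```python
from math import comb

p_cards = comb(26, 10) * comb(26, 6) / comb(52, 16)   # 1.G.33: 10 red, 6 black
p_white = comb(7, 2) * comb(11, 1) / comb(18, 3)      # 1.G.34: exactly 2 white
p_suit = 4 * comb(8, 6) / comb(32, 6)                 # 1.G.36: all same colour
```

The three values come out near 0.118, 0.28 and 1.24 · 10⁻⁴, matching the solutions.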
Since events determined by the results of distinct rolls are always (stochastically) independent, the probability is $1/2^n$. □

1.G.39. Two friends shoot independently of each other at one target: one shoots, then the other, then the first again, and so on. The probability that the first one hits is 0.4; the second hits with probability 0.3. Determine the probability P of the event that after one round of shooting there is exactly one hit of the target.

Solution. We determine the result by summing the probabilities of two mutually exclusive events: the first friend hit the target and the second did not; the second friend hit the target and the first did not. Since the events of hitting are independent (note that independence is preserved when taking complements), each probability is given by the product of the probabilities of the corresponding elementary events. That is,
$P = 0.4 \cdot (1 - 0.3) + (1 - 0.4) \cdot 0.3 = 0.46$. □

1.G.40. We flip three coins twelve times. What is the probability that at least one flip results in three tails?

Solution. If we realize that when repeating the flips the individual results are independent, and denote for $i \in \{1, \dots, 12\}$ by $A_i$ the event "the i-th flip results in three tails", we are determining
$P\left(\bigcup_{i=1}^{12} A_i\right) = 1 - (1 - P(A_1)) \cdot (1 - P(A_2)) \cdots (1 - P(A_{12}))$.
For every $i \in \{1, \dots, 12\}$ we have $P(A_i) = 1/8$, since each of the three coins shows tails with probability 1/2 independently of the other coins. Now we can write the final probability
$1 - \left(\tfrac{7}{8}\right)^{12}$. □

1.G.41. In a particular state there is a parliament with 200 members. During an "election", the two major political parties in this state flip a coin for every seat in the parliament. Each of the parties is associated with one side of the coin. What is the probability that each of the parties gains 100 seats? (The coin is "fair".)

Solution.
There are $2^{200}$ possible results of the elections (considered as sequences of 200 results of flips). If each party is to obtain 100 seats, then there are exactly 100 tails and 100 heads in the sequence. There are $\binom{200}{100}$ such sequences (since the sequence is uniquely determined by choosing which 100 flips of the 200 result in, say, tails). The resulting probability is
$\frac{\binom{200}{100}}{2^{200}} = \frac{200!}{100! \cdot 100! \cdot 2^{200}} \approx 0.056$. □

1.G.42. Seven Czechs and five Englishmen are randomly divided into two (nonempty) groups. What is the probability that one group consists of Czechs only?

Solution. There are $2^{11} - 1$ possible divisions. If one group consists of Czechs only, it means that all Englishmen are in the other group. It remains to divide the Czechs into two nonempty groups, which can be done in $2^6 - 1$ ways, with two choices of which Czech part joins the Englishmen; in the end we must add 1 for the division which puts all Englishmen in one group and all Czechs in the other:
$\frac{2 \cdot (2^6 - 1) + 1}{2^{11} - 1}$. □

1.G.43. From ten cards, where exactly one is an ace, we randomly draw a card and put it back. How many times must we do this, so that the probability that the ace is drawn at least once is greater than 0.9?

Solution. Let $A_i$ be the event "the ace is drawn in the i-th drawing". Since the individual events $A_i$ are (stochastically) independent, we know that
$P\left(\bigcup_{i=1}^{n} A_i\right) = 1 - (1 - P(A_1)) \cdot (1 - P(A_2)) \cdots (1 - P(A_n))$
for every n ∈ N. We are looking for an n ∈ N such that
$P\left(\bigcup_{i=1}^{n} A_i\right) = 1 - (1 - P(A_1)) \cdots (1 - P(A_n)) > 0.9$.
Clearly $P(A_i) = 1/10$ for any i ∈ N. Thus it is enough to solve the inequality
$1 - \left(\tfrac{9}{10}\right)^n > 0.9$,
from which we express
$n > \frac{\log_a 0.1}{\log_a 0.9}$, where a > 1.
Evaluating, we find that we must draw at least twenty-two times. □

1.G.44. Texas hold'em.
Let us now solve a couple of simple exercises concerning the popular card game Texas hold'em, whose rules we will not state (if the reader does not know them, she can look them up on the Internet). What is the probability that
i) the starting combination is a pair of the same symbols?
ii) in my starting pair of cards there is an ace?
iii) in the end I have one of the six best combinations of cards?
iv) I win, if I hold in my hand an ace and a two (of any colour), on the flop there are an ace and two twos, on the turn there is a three, and all these four cards have distinct colours? (The last card, the river, is not yet turned.)

Solution.
i) The number of distinct symbols is 13 and there are always four cards of each (one per colour). Thus the number of pairs with the same symbol is $13\binom{4}{2} = 78$. The number of all possible starting pairs is $\binom{52}{2} = 1326$. The probability of having a pair is then $\frac{78}{1326} \approx 0.06$.
ii) One card is an ace, that is four choices, and the second is arbitrary, that is 51 choices. But we have counted the pairs with two aces twice, and there are $\binom{4}{2} = 6$ of those. Thus we obtain $4 \cdot 51 - 6 = 198$ pairs and the probability is $\frac{198}{1326} \approx 0.15$.
iii) Let us compute the probabilities of the individual best combinations:
ROYAL FLUSH: There are exactly four such combinations, one of each colour. The number of combinations of five cards is $\binom{52}{5} = 2598960$. The probability is thus $\frac{4}{2598960} \approx 1.5 \cdot 10^{-6}$. Very small :)
STRAIGHT FLUSH: A sequence which ends with the highest card in the range 6 to K, that is, eight choices for each colour. We obtain $\frac{32}{2598960} \approx 1.2 \cdot 10^{-5}$.
POKER: Four identical symbols: 13 choices (one for each symbol). The fifth card can be arbitrary, that is, 48 choices. That makes $\frac{13 \cdot 48}{2598960} \approx 2.4 \cdot 10^{-4}$.
FULL HOUSE: Three identical symbols make $13\binom{4}{3} = 52$ choices and two identical symbols make $12\binom{4}{2} = 72$ choices. The probability is $\frac{52 \cdot 72}{2598960} \approx 1.4 \cdot 10^{-3}$.
FLUSH: All five cards of the same colour means $4\binom{13}{5} = 5148$ choices, and the probability is then $\frac{5148}{2598960} \approx 2 \cdot 10^{-3}$.
STRAIGHT: The highest card of the sequence is in the range from 6 to ace, that is, 9 choices. The colour of every card is arbitrary, which makes $9 \cdot 4^5 = 9216$ choices. But we have counted both the straight flushes and the royal flushes, which we must subtract. For determining the probability of one of the six best combinations we do not have to do that; we simply do not count the first two combinations. Therefore we obtain a probability of approximately
$3.5 \cdot 10^{-3} + 2 \cdot 10^{-3} + 1.4 \cdot 10^{-3} + 2.4 \cdot 10^{-4} \approx 7.14 \cdot 10^{-3}$.
iv) The situation is clearly pretty good, and therefore it is better to count the bad situations, that is, those where the opponent has an even better combination. At this moment I have a full house of two aces and three twos. The only combinations that could beat me at this moment are a full house of three aces and two twos, or a poker of twos. That means that the opponent must hold either an ace or the last two. If he has the two and any other card, then he clearly wins no matter what card the river is. How many ways are there for this other card in his hand? $3 + 4 + \cdots + 4 + 2 = 45$ (the remaining three and the two aces I can see cannot be in his hand). There are $\binom{46}{2} = 1035$ possible combinations for his hand, and the probability of such a loss is then $\frac{45}{1035} \approx 0.043$. If he has an ace in his hand, then the following can happen. If he holds two aces, then he again wins if a two does not come on the river — then I would have a split poker. The probability of my (conditional) loss is then $\frac{1}{1035} \cdot \frac{43}{44} \approx 10^{-3}$. If the opponent holds an ace and some card other than 2 and A, then it is a draw no matter what comes on the river. The total probability of a win is thus almost 96 %. □

1.G.45. A volleyball team (with a libero, that is, 7 people) sits in a pub after a match and drinks beer. But there are not enough mugs, and thus the publican keeps reusing the same seven.
What is the probability that
i) exactly one person does not receive the mug he had in the last round,
ii) nobody receives the mug he had in the last round,
iii) exactly three receive the mug they had in the last round?

Solution.
i) If six people receive the mug they had in the last round, then clearly the seventh person also receives his; the probability is thus zero.
ii) Let M be the set of all orderings and let the event $A_i$ occur when the i-th person receives his mug from the last round. We want to compute $|M \setminus \bigcup_i A_i|$. We obtain $7! \sum_{k=0}^{7} \frac{(-1)^k}{k!} = 1854$, and the probability is $\frac{1854}{5040} \approx 0.37$.
iii) We choose which three receive the mug they had in the last round: $\binom{7}{3} = 35$ choices. The remaining four must all receive mugs of somebody else. That is again the formula from the previous part, specifically $4! \sum_{k=0}^{4} \frac{(-1)^k}{k!} = 9$ choices. In total we have $9 \cdot 35 = 315$ choices and the probability is $\frac{315}{5040} = \frac{1}{16}$. □

1.G.46. In how many ways can we place n identical rooks on an n × n chessboard such that every unoccupied square is threatened by some of the rooks?

Solution. Such placements form a union of two sets: the set of placements with at least one rook in every row (therefore exactly one in every row; this set has $n^n$ elements — in every row we choose one position for the rook independently), and the set of placements with at least one (that is, exactly one) rook in every column (as before, this set has $n^n$ elements). The intersection of these sets has $n!$ elements (the places for the rooks are chosen row by row starting with the first: there we have n choices, in the second row only n − 1, one column already being occupied, and so on). Using the inclusion-exclusion principle, we obtain $2n^n - n!$. □

1.G.47. Determine the probability that, when throwing two dice, at least one resulted in four, given that the sum is 7.

Solution. We solve this exercise using classical probability, where the condition is interpreted as a restriction of the probability space.
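The derangement numbers used in 1.G.45 follow directly from the inclusion-exclusion formula and can be computed exactly (a sketch in Python; the helper `derangements` is ours):

```python
from math import comb, factorial

def derangements(n):
    """Inclusion-exclusion: n! * sum_{k=0}^{n} (-1)^k / k!, computed in integers."""
    return sum((-1) ** k * factorial(n) // factorial(k) for k in range(n + 1))

d7 = derangements(7)                          # part ii): nobody keeps his mug
exactly_three = comb(7, 3) * derangements(4)  # part iii): exactly three keep theirs
```

This reproduces the counts 1854 and 315 from the solution.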
Due to the condition, the space has 6 elements, and exactly 2 of them are favourable to the given event. The answer is thus $\frac{2}{6} = \frac{1}{3}$. □

1.G.48. We throw two dice. Determine the conditional probability that the first die resulted in five, under the condition that the sum is 9. Based on this result, decide whether the events "the first die results in five" and "the sum is 9" are independent.

Solution. If we denote the event "the first die resulted in five" by A and the event "the sum is 9" by H, then
$P(A \mid H) = \frac{P(A \cap H)}{P(H)} = \frac{1/36}{4/36} = \frac{1}{4}$.
Note that the sum 9 occurs when the first die is 3 and the second 6, the first is 4 and the second 5, the first is 5 and the second 4, or the first is 6 and the second 3. Of these four results (which have the same probability), only one is favourable to the event A. Since the probability of A is clearly $1/6 \neq 1/4$, the events are not mutually independent. □

1.G.49. Let us have a deck of 32 cards. If we twice draw one card, what is the probability that the second drawn card is an ace, if we return the first card; and if we do not return the first card (so that there are then 31 cards in the deck)?

Solution. If we return the card to the deck, we are just repeating the experiment, which has 32 possible results (all with the same probability), and exactly four of them are favourable. Thus we see that the probability is 1/8. In the second case, when we do not return the card, the probability is the same. It is enough to observe that, when drawing all the cards one by one, the probability that the ace comes as the first card is the same as the probability that the ace comes as the second card. We could also use conditional probability, which gives
$\frac{4}{32} \cdot \frac{3}{31} + \frac{28}{32} \cdot \frac{4}{31} = \frac{1}{8}$. □

1.G.50. Consider families with two children and for simplicity assume that all choices in the set Ω = {bb, bg, gb, gg}, where b stands for "boy" and g stands for "girl" (ordered by the age of the children), have the same probability.
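The conditional probabilities in 1.G.47 and 1.G.48 can be confirmed by enumerating all 36 equally likely outcomes of two dice (a sketch in Python):

```python
from itertools import product

rolls = list(product(range(1, 7), repeat=2))

# 1.G.47: P(at least one four | sum is 7)
sum7 = [r for r in rolls if sum(r) == 7]
p_four = sum(1 for r in sum7 if 4 in r) / len(sum7)

# 1.G.48: P(first die is five | sum is 9)
sum9 = [r for r in rolls if sum(r) == 9]
p_five = sum(1 for r in sum9 if r[0] == 5) / len(sum9)
```

The enumeration gives 1/3 and 1/4, as computed above.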
Choose the random events $H_1$ — the family has a boy, $A_1$ — the family has two boys. Compute $P(A_1 \mid H_1)$. Similarly consider families with three children, where Ω = {bbb, bbg, bgb, gbb, bgg, gbg, ggb, ggg}. If $H_2$ — the family has both a boy and a girl, and $A_2$ — the family has at most one girl, decide whether the events $A_2$ and $H_2$ are independent.

Solution. Considering which of the four elements of the set Ω are (or are not) favourable to the events $A_1$ and $H_1$, we easily obtain
$P(A_1 \mid H_1) = \frac{P(A_1 \cap H_1)}{P(H_1)} = \frac{P(A_1)}{P(H_1)} = \frac{1/4}{3/4} = \frac{1}{3}$.
Further, we have to determine whether
$P(A_2 \cap H_2) = P(A_2) \cdot P(H_2)$
holds. Again we just have to realize that exactly the elements bbb, bbg, bgb, gbb of the set Ω are favourable to the event $A_2$; to the event $H_2$ the elements bbg, bgb, gbb, bgg, gbg, ggb are favourable; and to the event $A_2 \cap H_2$ the elements bbg, bgb, gbb. Therefore
$P(A_2 \cap H_2) = \frac{3}{8} = \frac{1}{2} \cdot \frac{3}{4} = P(A_2) \cdot P(H_2)$,
which means that the events $A_2$ and $H_2$ are independent. □

1.G.51. We flip a coin five times. For every head we put a white ball into a hat, for every tail we put a black ball into the same hat. Express the probability that there are more black balls than white balls in the hat, given that there is at least one black ball in the hat.

Solution. Consider the following two events: A — there are more black balls than white balls in the hat, H — there is at least one black ball in the hat. We want to express $P(A \mid H)$. Note that the probability $P(H^c)$ of the event complementary to H is $2^{-5}$, and that the probability of the event A is the same as the probability $P(A^c)$ of the complementary event (there are more white balls in the hat). Necessarily,
$P(H) = 1 - 2^{-5}$, $P(A) = 1/2$.
Furthermore $P(A \cap H) = P(A)$, since the event H contains the event A (the event A has H as a consequence). Thus we have obtained
$P(A \mid H) = \frac{P(A \cap H)}{P(H)} = \frac{1/2}{1 - 2^{-5}} = \frac{16}{31}$. □

1.G.52. In a box there are 9 red and 7 white balls. We sequentially draw three balls (without returning).
Determine the probability that the first two are red and the third is white.

Solution. We solve this exercise using the theorem about multiplication of probabilities. First we require a red ball, which happens with probability 9/16. If a red ball has been drawn, then in the second round we draw a red ball with probability 8/15 (there are 15 balls in the box, 8 of them red). Finally, if two red balls have been drawn, the probability that a white ball is drawn is 7/14 (there are 7 white balls and 7 red balls in the box). Thus we obtain
$\frac{9}{16} \cdot \frac{8}{15} \cdot \frac{7}{14} = 0.15$. □

1.G.53. In a box there are 10 balls, 5 of them black and 5 white. We sequentially draw the balls, without returning them. Determine the probability that we first draw a white ball, then a black one, then a white one, and in the last, fourth turn again a white one.

Solution. We use the theorem about multiplication of probabilities. In the first round we draw a white ball with probability 5/10, then a black ball with probability 5/9, then a white ball with probability 4/8 and in the end a white ball with probability 3/7. That gives
$\frac{5}{10} \cdot \frac{5}{9} \cdot \frac{4}{8} \cdot \frac{3}{7} = \frac{5}{84}$. □

1.G.54. From a deck of 32 cards we randomly draw six cards. Compute the probability that the first king is drawn as the sixth card (that is, the previous five cards do not contain any king).

Solution. Using the theorem about multiplication of probabilities we have
$\frac{28}{32} \cdot \frac{27}{31} \cdot \frac{26}{30} \cdot \frac{25}{29} \cdot \frac{24}{28} \cdot \frac{4}{27} \approx 0.0723$. □

1.G.55. What is the probability that the sum of two randomly chosen positive numbers smaller than 1 is smaller than 3/7?

Solution. It is clear that this is a simple exercise on geometric probability, where the basic space Ω is the square with vertices [0,0], [1,0], [1,1], [0,1] (we are choosing two numbers in [0,1]).
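The sequential products in 1.G.52–1.G.54 are convenient to evaluate with exact rational arithmetic (a sketch in Python using `fractions.Fraction`):

```python
from fractions import Fraction as Fr

p_two_red = Fr(9, 16) * Fr(8, 15) * Fr(7, 14)              # 1.G.52
p_wbww = Fr(5, 10) * Fr(5, 9) * Fr(4, 8) * Fr(3, 7)        # 1.G.53
p_king_sixth = (Fr(28, 32) * Fr(27, 31) * Fr(26, 30)
                * Fr(25, 29) * Fr(24, 28) * Fr(4, 27))     # 1.G.54
```

The first two simplify to 3/20 and 5/84; the third evaluates to roughly 0.0723.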
We are interested in the probability of the event that a randomly chosen point [x, y] in this square satisfies x + y < 3/7, that is, the probability that the point lies in the triangle A with vertices [0,0], [3/7,0], [0,3/7]. Now we can easily compute
$P(A) = \frac{\operatorname{vol} A}{\operatorname{vol} \Omega} = \frac{(3/7)^2/2}{1} = \frac{9}{98}$. □

1.G.56. Let a pole be randomly broken into three parts. Determine the probability that the length of the second (middle) part is greater than two thirds of the length of the pole before the breaking.

Solution. Let d stand for the length of the pole. The breaking of the pole at two points is given by the choice of the points where we split it. Let x be the first point (closer to the left end of the pole), and x + y the point where the second split occurs. This says that the basic space is the set {[x, y] | x ∈ (0, d), y ∈ (0, d − x)}, that is, a triangle with vertices [0,0], [d,0], [0,d]. The length of the middle part is given by the value of y. The condition from the exercise statement can now be restated as y > 2d/3, which corresponds to the triangle with vertices [0, 2d/3], [d/3, 2d/3], [0, d]. The areas of the considered triangles are $d^2/2$ and $(d/3)^2/2$, therefore the probability is
$\frac{(d/3)^2/2}{d^2/2} = \frac{1}{9}$. □

1.G.57. A pole of length 2 m is randomly divided into three parts. Determine the probability of the event that the third part is shorter than 1.5 m.

Solution. This is an exercise on geometric probability, where we are looking for the probability that the sum of the lengths of the first two parts is greater than one fourth of the length of the pole. We determine the probability of the complementary event, that is, the probability that both randomly chosen points on the pole lie in its first quarter. The probability of this event is $1/4^2$, since the probability of picking a point in the first quarter of the pole is clearly 1/4 and this choice is repeated (once) independently.
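Geometric probabilities like the one in 1.G.55 can also be estimated by simulation, which is a useful cross-check of the area computation (a Monte Carlo sketch in Python; the seed and sample size are arbitrary choices of ours):

```python
import random

random.seed(1)                 # reproducible sketch
N = 200_000
hits = sum(random.random() + random.random() < 3 / 7 for _ in range(N))
estimate = hits / N            # should be close to 9/98, about 0.0918
```

With 200000 samples the estimate typically agrees with 9/98 to two or three decimal places.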
Thus the probability of the complementary event is 15/16. □

1.G.58. Mirek and Marek have lunch in the school canteen. The canteen is open from 11 to 14. Each of them eats his lunch in 30 minutes, and his arrival time is random. What is the probability that they meet on a given day, if they always sit at the same table?

Solution. The space of all possible events is a 3 × 3 square. Denote by x the arrival time of Mirek and by y the arrival time of Marek; the two meet if and only if |x − y| < 1/2. This inequality determines in the square of possible events a region whose area is 11/36 of the area of the whole square. Thus that is also the probability of the event. □

1.G.59. Honza drives from Brno to Prague at a random time between 12 and 16, and in the same time interval Martin drives from Prague to Brno. Both stop at a motorest in the middle of the route for thirty minutes. What is the probability that they meet there, if Honza's speed is 150 km/h and Martin's is 100 km/h? (The distance Prague–Brno is 200 km.)

Solution. If we denote the departure time of Martin by x and the departure time of Honza by y and, in order to have fewer fractions in the subsequent calculations, choose ten minutes as the time unit, then the basic space is a 24 × 24 square. The arrival time of Martin at the motorest is x + 6, the arrival time of Honza is y + 4. As in the previous exercise, the event that they meet at the motorest is equivalent to the event that their arrival times do not differ by more than thirty minutes, that is, |(x + 6) − (y + 4)| < 3. This condition determines a region of area $24^2 - \frac{1}{2}(23^2 + 19^2) = 131$ (see the figure), so the probability is $\frac{131}{576} \approx 0.23$.

1.G.60. Mirek departs at a random time between 10 and 20 o'clock from Brno to Prague. Marek departs at a random time in the same interval from Prague to Brno. The trip takes 2 hours. What is the probability that they meet on the road (they use the same road)?

Solution. We proceed analogously to the previous exercise.
The space of all events is a 10 × 10 square; Mirek, departing at time x, meets Marek, departing at time y, if and only if |x − y| < 2. The probability is
$p = \frac{10^2 - 8^2}{10^2} = \frac{36}{100} = 0.36$. □

1.G.61. A two-meter-long pole is randomly divided into three pieces. Determine the probability that a triangle can be built from the pieces.

Solution. The division of the pole is given, as in the previous exercises, by the cut points x and y, and the probability space is again a 2 × 2 square. In order to be able to build a triangle from the pieces, the lengths of the parts must satisfy the triangle inequalities, that is, the sum of the lengths of any two parts must be greater than the length of the third part. Since the sum of the lengths is 2 meters, this condition is equivalent to the condition that each part is smaller than 1 meter. In terms of the cut points x and y, it cannot simultaneously hold that x < 1 and y < 1, nor simultaneously x > 1 and y > 1 (this corresponds to the conditions that the two boundary parts of the pole are smaller than 1), and moreover |x − y| < 1 (the middle part is smaller than one). These conditions are satisfied by the shaded area in the picture, whose area is 1/4 of the square. □

1.G.62. Do the systems of equations
(a) $4x_1 - \sqrt{3}\,x_2 = 3$, $x_1 - 2\sqrt{7}\,x_2 = -2$;
(b) $4x_1 - \sqrt{3}\,x_2 = 16$, $x_1 - 2\sqrt{7}\,x_2 = -7$;
(c) $4x_1 + 2x_2 = 7$, $-2x_1 - x_2 = -3$
have a unique solution (that is, exactly one)?

Solution. A system of equations is uniquely solvable if and only if the determinant of the matrix of the left-hand-side coefficients is nonzero. Therefore the coefficients on the right-hand sides do not influence the uniqueness of the solution, and we must get the same answer in (a) and (b). Since
$\begin{vmatrix} 4 & -\sqrt{3} \\ 1 & -2\sqrt{7} \end{vmatrix} = 4 \cdot (-2\sqrt{7}) - (-\sqrt{3}) \cdot 1 \neq 0$, $\quad \begin{vmatrix} 4 & 2 \\ -2 & -1 \end{vmatrix} = 4 \cdot (-1) - 2 \cdot (-2) = 0$,
in (a) and (b) there is a unique solution, while in (c) there is not. If we multiply the second equation in (c) by −2, we see that (c) has no solution at all. □

1.G.63.
Determine A · A for the matrix

A = ( cos φ  −sin φ )
    ( sin φ   cos φ ),

where φ ∈ R.

Solution. We know that the mapping

( x )      ( cos φ  −sin φ ) ( x )
( y )  ↦   ( sin φ   cos φ ) ( y )

is the rotation of the plane R^2 around the origin through the angle φ in the positive direction. Since matrix multiplication is associative, the mapping determined by A · A is this rotation performed twice, that is, the rotation through the angle 2φ. That means that

A · A = ( cos 2φ  −sin 2φ )
        ( sin 2φ   cos 2φ ).

Note that we could instead have multiplied A · A out directly and applied the formulas for the sine and cosine of a double angle. Repeating the aforementioned argument (or using mathematical induction) yields, even more easily,

A^n = ( cos nφ  −sin nφ )
      ( sin nφ   cos nφ ),    n = 2, 3, ...

(we set A^2 = A · A, A^3 = A · A · A, etc.). □

1.G.64. The parallelogram identity. Calculation in coordinates can be useful in plane geometry. Let us demonstrate this on a proof of the "parallelogram identity": if u, v ∈ R^2, then

2(||u||^2 + ||v||^2) = ||u + v||^2 + ||u − v||^2.

Thus the sum of the squares of the lengths of the two diagonals of a parallelogram equals the sum of the squares of the lengths of its four sides.

Solution. Writing both sides of the equation in the coordinates u = (u1, u2), v = (v1, v2) yields

||u + v||^2 + ||u − v||^2
= (u1 + v1)^2 + (u2 + v2)^2 + (u1 − v1)^2 + (u2 − v2)^2
= u1^2 + 2u1v1 + v1^2 + u2^2 + 2u2v2 + v2^2 + u1^2 − 2u1v1 + v1^2 + u2^2 − 2u2v2 + v2^2
= 2(u1^2 + u2^2 + v1^2 + v2^2) = 2(||u||^2 + ||v||^2). □

1.G.65. Compute the area S of the quadrilateral given by the vertices [0, −2], [−1, 1], [1, 5], [1, −1].

Solution. In the usual notation A = [0, −2], B = [1, −1], C = [1, 5], D = [−1, 1], and with the usual division of the quadrilateral into the triangles ABC and ACD with areas S1 and S2, we obtain

S = S1 + S2 = (1/2)|det(B − A, C − A)| + (1/2)|det(C − A, D − A)| = (1/2)(7 − 1) + (1/2)(3 + 7) = 8. □

1.G.66. Determine the area of the quadrilateral ABCD with vertices A = [1, 0], B = [11, 13], C = [2, 5] and D = [−2, −5].

Solution. We divide the quadrilateral into the two triangles ABC and ACD.
We compute their areas as halves of the absolute values of the corresponding determinants, see 1.5.11. With C − A = (1, 5), B − A = (10, 13) and D − A = (−3, −5),

S = (1/2) |det( (1, 5), (10, 13) )| + (1/2) |det( (1, 5), (−3, −5) )| = 37/2 + 5 = 47/2. □

1.G.67. Compute the area of a parallelogram with vertices at [5, 5], [6, 8] and [6, 9].

Solution. Although such a parallelogram is not determined uniquely (the fourth vertex is not given), the triangle with vertices [5, 5], [6, 8] and [6, 9] is necessarily one half of every parallelogram with these three vertices (one of the sides of the triangle becomes a diagonal of the parallelogram). Twice the area of this triangle equals the absolute value of the determinant

det( 6 − 5   8 − 5 )  =  det( 1  3 )  =  1,
    ( 6 − 5   9 − 5 )       ( 1  4 )

so the area of the parallelogram is 1. □

1.G.68. Give the area of a meadow which is determined on the map by the points at positions [−7, 1], [−1, 0], [29, 0], [25, 1], [24, 2] and [17, 5]. (Ignore the measurement units; they are determined by the scale of the map.)

Solution. The given hexagon can be divided into four triangles with vertices at [−7, 1], [−1, 0], [17, 5]; [−1, 0], [24, 2], [17, 5]; [−1, 0], [25, 1], [24, 2]; and [−1, 0], [29, 0], [25, 1]. Their areas are 24, 89/2, 27/2 and 15 respectively, which gives the total area 24 + 89/2 + 27/2 + 15 = 97.

1.G.69. Determine the area of the triangle A2A3A11, where A0A1...A11 are the vertices of a regular dodecagon inscribed in a circle of radius 1.

Solution. The vertices of the dodecagon can be identified with the twelfth roots of 1 in the complex plane. As in ?? we find

A2 = cos(π/3) + i sin(π/3) = 1/2 + i√3/2,
A3 = cos(π/2) + i sin(π/2) = i,
A11 = cos(−π/6) + i sin(−π/6) = √3/2 − i/2,

which means that the coordinates of these points in the plane are A2 = [1/2, √3/2], A3 = [0, 1], A11 = [√3/2, −1/2]. According to the formula for the area of a triangle, the area is

S = (1/2) |det(A2 − A11, A3 − A11)| = (3 − √3)/4. □

1.G.70. Determine which sides of the quadrilateral with vertices A = [95, 99], B = [130, 106], C = [40, 60], D = [130, 120] are visible from the point [2, 0]. ○

1.G.71.
Determine the number of relations over the set {1, 2, 3, 4} which are both symmetric and transitive.

Solution. A relation with the given properties is an equivalence over some subset of the set {1, 2, 3, 4}. In total there are

1 + 4 · 1 + (4 choose 2) · 2 + (4 choose 3) · 5 + 15 = 52

such relations; the factors 1, 2, 5, 15 count the equivalences on subsets with one, two, three and four elements respectively. □

1.G.72. Determine the number of ordering relations over a three-element set. ○

1.G.73. Determine the number of ordering relations over the set {1, 2, 3, 4} such that the elements 1 and 2 are not comparable (that is, neither 1 R 2 nor 2 R 1, where R stands for the ordering relation). ○

1.G.74. Determine the number of surjective mappings f from the set {1, 2, 3, 4, 5} to the set {1, 2, 3} such that f(1) = f(2).

Solution. Every such mapping is uniquely given by the images of the elements {1, 3, 4, 5}, so there are exactly as many such mappings as there are surjective mappings of the set {1, 3, 4, 5} onto the set {1, 2, 3}, that is 36, as we know from the previous exercise. □

1.G.75. Give all the elements of S ∘ R, where

R = {(2, 4), (4, 4), (4, 5)} ⊂ N × N,
S = {(3, 1), (3, 2), (3, 5), (4, 1), (4, 4)} ⊂ N × N.

Solution. Considering all choices of two ordered pairs

(2, 4), (4, 1);   (2, 4), (4, 4);   (4, 4), (4, 1);   (4, 4), (4, 4)

such that the second element of the first pair, which is a member of R, equals the first element of the second pair, which is a member of S, we obtain

S ∘ R = {(2, 1), (2, 4), (4, 1), (4, 4)}. □

1.G.76. Let the binary relation

R = {(0, 4), (−3, 0), (5, π), (5, 2), (0, 2)}

between the sets A = Z and B = R be given. Express R⁻¹ and R ∘ R⁻¹.

Solution. We can immediately see that

R⁻¹ = {(4, 0), (0, −3), (π, 5), (2, 5), (2, 0)}.

Furthermore,

R ∘ R⁻¹ = {(4, 4), (0, 0), (π, π), (2, 2), (4, 2), (π, 2), (2, π), (2, 4)}. □

1.G.77. Decide whether the relation R over the set of integers Z determined by the condition

(a) (a, b) ∈ R if and only if |a| < |b|;
(b) (a, b) ∈ R if and only if |a| = |2b|

is transitive.

Solution. In the first case R is transitive, because

|a| < |b|, |b| < |c|  imply  |a| < |c|.

In the second case R is not transitive.
For instance, consider (4, 2), (2, 1) ∈ R while (4, 1) ∉ R. □

1.G.78. Find all relations over M = {1, 2} which are not antisymmetric. Which of them are transitive?

Solution. There are four relations that are not antisymmetric: they are exactly the subsets of {1, 2} × {1, 2} which contain both the elements (1, 2) and (2, 1) (otherwise the condition of antisymmetry is satisfied). Of these four, only the full relation M × M = {(1, 1), (1, 2), (2, 1), (2, 2)} is transitive, because a transitive relation which does not contain the pairs (1, 1) and (2, 2) cannot contain both (1, 2) and (2, 1). □

1.G.80. Is there an equivalence relation which is at the same time an ordering over the set of all lines in the plane?

Solution. An equivalence relation (or an ordering relation) must be reflexive, therefore every line must be in relation with itself. Furthermore, we require that the relation is both symmetric (being an equivalence) and antisymmetric (being an ordering). That means that a line can be in relation only with itself. If we define the relation so that two lines are in relation if and only if they are identical, we obtain a "very natural" relation which is both an equivalence and an ordering; we only need to check that it is transitive, which it trivially is. Thus the only relation satisfying the problem statement is the identity on the set of all lines in the plane. □

1.G.81. Determine whether the relation R = {(k, l) ∈ Z × Z; |k| ≥ |l|} over the set Z is an equivalence and/or an ordering.

Solution. The relation R is not an equivalence: it is not symmetric (take (6, 2) ∈ R, (2, 6) ∉ R). It is not an ordering either: it is not antisymmetric (take (2, −2) ∈ R, (−2, 2) ∈ R). □

1.G.82. Show that the intersection of any set of equivalence relations over a set X is again an equivalence relation, and that the union of two ordering relations over a set X does not have to be an ordering.

Solution.
We see that the intersection of equivalence relations is reflexive, symmetric and transitive. All the equivalence relations contain the pair (x, x) for every x ∈ X, therefore the intersection contains that pair too. If the pair (x, y) is in the intersection, then the pair (y, x) is also in the intersection (just use the fact that every equivalence is symmetric). If the pairs (x, y) and (y, z) are in the intersection, then both lie in each of the equivalences; since the equivalences are transitive, each of them contains the pair (x, z), and thus that pair is also in the intersection.

For the union, choose X = {1, 2} and the ordering relations

R1 = {(1, 1), (2, 2), (1, 2)},   R2 = {(1, 1), (2, 2), (2, 1)}

over X. We obtain the relation

R1 ∪ R2 = {(1, 1), (2, 2), (1, 2), (2, 1)},

which is not antisymmetric, thus not an ordering. □

1.G.79. We have the set {3, 4, 5, 6, 7}. Write explicitly the relations

i) a divides b,
ii) either a divides b or b divides a,
iii) a and b have a common divisor greater than one,

and examine their properties. ○

1.G.83. Over the set M = {1, 2, ..., 19, 20} there is an equivalence relation ∼ such that a ∼ b for a, b ∈ M if and only if the first digits of the numbers a and b are the same. Construct the partition given by this equivalence.

Solution. Two numbers from the set M lie in the same equivalence class if and only if they are in the relation (their first digits coincide). Therefore the partition consists of the sets {1, 10, 11, ..., 18, 19}, {2, 20}, {3}, {4}, {5}, {6}, {7}, {8}, {9}. □

1.G.84. We are given the partition with the two classes {b, c}, {a, d, e} of the set X = {a, b, c, d, e}. Write down the equivalence relation R over the set X which gives this partition.

Solution.
The equivalence R is determined by the fact that two elements are in relation if and only if they lie in the same partition class (note also that R must be symmetric), and every element is in relation with itself (R must be reflexive). Therefore R contains exactly the pairs

(a, a), (b, b), (c, c), (d, d), (e, e), (b, c), (c, b), (a, d), (a, e), (d, a), (d, e), (e, a), (e, d). □

1.G.85. Let {a, b, c, d} be a set with the relation {(a, a), (b, b), (a, b), (b, c), (c, b)}. What is the minimal number of pairs we have to add to the relation in order to make it an equivalence?

Solution. Let us successively ensure the three properties that define an equivalence. First, reflexivity: we must add the pairs (c, c) and (d, d). Second, symmetry: we must add (b, a). Third, we must take the transitive closure: since a is in relation with b and b is in relation with c, we must also add (a, c) and (c, a). In total, five pairs have to be added. □

1.G.86. What is the maximal domain D ⊂ R and codomain H ⊂ R such that the following mappings are bijective, and what is then the inverse function?

i) x ↦ x^4,
ii) x ↦ x^3,
iii) x ↦ 1/(x + 1).

Solution.

i) D = [0, ∞) and H = [0, ∞), or also D = (−∞, 0] and H = [0, ∞). The inverse function is then x ↦ x^(1/4) (respectively x ↦ −x^(1/4) for the second choice of D).
ii) D = R and H = R. The inverse function is x ↦ x^(1/3).
iii) D = R \ {−1} and H = R \ {0}. The inverse function is x ↦ 1/x − 1. □

1.G.87. Consider the relation R ⊂ R × R such that a point (x, y) is in the relation whenever

(x − 1)^2 + (y + 1)^2 = 1.

Can we describe the points of the relation using a function y = f(x)? Depict the points of the relation.

Solution. We cannot, because for instance y = −1 has two preimages: x = 0 and x = 2. The points lie on the circle with centre at the point (1, −1) and radius 1. □

1.G.88. Let it hold for any two integers k, l that (k, l) ∈ R whenever the number 4k − 4l is an integral multiple of 7. Is such a relation over Z an equivalence? Is it an ordering?

Solution. Since 4 and 7 are relatively prime, 4k − 4l = 4(k − l) is a multiple of 7 exactly when k − l is; hence two integers are in the relation R if and only if they have the same remainder under division by 7.
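This characterization can be checked by brute force; below is a minimal sketch (the helper name `related` is ours, not the book's).

```python
# Brute-force check (a sketch): 4k - 4l is an integral multiple of 7
# exactly when k and l leave the same remainder under division by 7.
def related(k, l):
    return (4 * k - 4 * l) % 7 == 0

assert all(related(k, l) == (k % 7 == l % 7)
           for k in range(-30, 30) for l in range(-30, 30))
```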
This is therefore an example of the equivalence given by the remainder classes of integers, so we know that the relation R is an equivalence relation. Its symmetry (for instance, (3, 10) ∈ R and (10, 3) ∈ R while 3 ≠ 10) implies that it is not antisymmetric, and hence not an ordering. □

1.G.89. Let the relation R be defined over the set N = {3, 4, 5, ..., n, n + 1, ...} so that two numbers are in the relation whenever they are relatively prime (that is, their prime decompositions contain no common prime). Determine whether this relation is reflexive, symmetric, antisymmetric, transitive.

Solution. For any pair of equal numbers we have (n, n) ∉ R, therefore the relation is not reflexive. Whether two numbers are relatively prime or not does not depend on their order; it is a property of unordered pairs. Therefore R is symmetric. From the symmetry it follows that R is not antisymmetric (for instance, (3, 5) ∈ R and (5, 3) ∈ R while 3 ≠ 5). Since R is symmetric and (n, n) ∉ R for every n ∈ N, any choice of two distinct numbers which are in the relation shows that R is not transitive. □

1.G.90. Determine the number of injective mappings of the set {1, 2, 3} into the set {1, 2, 3, 4}.

Solution. Any injective mapping between the given sets is determined by an ordered triple of distinct elements of {1, 2, 3, 4} (the elements of the triple are, in order, the images of the numbers 1, 2, 3), and vice versa, every injective mapping gives such a triple. Thus the number of injective mappings equals the number of ordered triples of distinct elements among four, that is,

v(3, 4) = 4 · 3 · 2 = 24. □

1.G.91. How many relations are there over an n-element set?

Solution. A relation is an arbitrary subset of the cartesian product of the set with itself. This cartesian product has n^2 elements, thus the number of all relations over an n-element set is 2^(n^2). □

1.G.92. How many reflexive relations are there over an n-element set?

Solution.
A relation over the set M is reflexive if and only if it contains the diagonal relation ΔM = {(a, a); a ∈ M} as a subset. For each of the remaining n^2 − n ordered pairs of the cartesian product M × M we choose independently whether or not the pair belongs to the relation. In total we have 2^(n^2 − n) different reflexive relations over an n-element set. □

1.G.93. How many symmetric relations are there over an n-element set?

Solution. A relation R over the set M is symmetric if and only if the intersection of R with each set {(a, b), (b, a)}, where a ≠ b, a, b ∈ M, is either the whole two-element set or empty. There are (n choose 2) two-element subsets of the set M. If we also declare what the intersection of R with the diagonal relation ΔM = {(a, a); a ∈ M} should be, then R is completely determined. In total we are to make (n choose 2) + n independent choices between two alternatives: each set of the type {(a, b), (b, a)}, a ≠ b, is either a subset of R or disjoint from R, and every pair (a, a), a ∈ M, is either in R or not. In total we have 2^((n choose 2) + n) symmetric relations over an n-element set. □

1.G.94. How many antisymmetric relations are there over an n-element set?

Solution. A relation R over the set M is antisymmetric if and only if the intersection of R with each set {(a, b), (b, a)}, a ≠ b, a, b ∈ M, is either empty or a one-element set (that is, either {(a, b)} or {(b, a)} but not both). The intersection of R with the diagonal relation is arbitrary. By declaring all these intersections, the relation R is completely determined: there are three choices for each of the (n choose 2) two-element sets and two choices for each of the n diagonal pairs. In total we have 3^(n choose 2) · 2^n antisymmetric relations over an n-element set. □

1.G.95. Determine the number of ordering relations on the set {1, 2, 3, 4, 5} such that exactly two pairs of elements are incomparable. ○

Solution to the exercises

1.B.3. y_n = 2 · (3/2)^n − 2.

1.G.21. i) 2^6 = 64. ii) (6 choose 2) = 15. iii) No head is one possibility; one head can occur in (6 choose 1) = 6 ways.
Thus there are 7 sequences with at most one head, and the result is 64 − 7 = 57.

1.G.31. The maximum number y_n of areas a plane can be divided into by n circles satisfies y_n = y_{n−1} + 2(n − 1), y_1 = 2, that is, y_n = n^2 − n + 2. For the maximum number p_n of areas a space can be divided into by n balls we obtain the recurrence p_{n+1} = p_n + y_n, p_1 = 2, that is, p_n = (n/3)(n^2 − 3n + 8).

1.G.70. First we order the vertices of the given quadrilateral counter-clockwise: A, C, B, D. After computing the corresponding determinants as in the previous exercises, we see that only the side CB is visible.

1.G.72. 19.

1.G.73. 87.

1.G.79. i) (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (3, 6); check that this is an ordering relation. ii) Again (i, i) for i = 3, ..., 7 and additionally (3, 6), (6, 3); check that this is an equivalence relation. iii) (i, i) for i = 3, ..., 7 and also (3, 6), (6, 3), (4, 6), (6, 4); check that this is not an equivalence, since transitivity does not hold.

1.G.95. There are three different Hasse diagrams which satisfy the given condition. In total 5! + 5! + 5!/4 = 270.

CHAPTER 2

Elementary linear algebra

Can't you count with scalars yet? - no worry, let us go straight to matrices...

A. Vectors and matrices

Before we approach vector spaces in a systematic way, we begin with something we know: systems of linear equations. Step by step we shall see that vector spaces are hidden behind them, and we shall get used to the matrix calculus on the way.

2.A.1. A colourful example. A company of painters orders 810 litres of paint, which is to contain 270 litres each of red, green and blue colour. The provider can satisfy this order

In the previous chapter we warmed up by considering relatively simple problems which did not require any sophisticated tools. It was enough to use addition and multiplication of scalars. In this and the subsequent chapters we shall add more sophisticated thoughts and tools.
First we restrict ourselves to concepts and operations involving a finite number of multiplications and additions of a finite number of scalars. This will take us three chapters, and only then will we move on to infinitesimal concepts and tools.

Typically we deal with finite collections of scalars of a given size. We speak about "linear objects" and "linear algebra". Although this might seem to be a very special tool, we shall see later that even more complicated objects are studied mostly through their "linear approximations".

In this chapter we will work with finite sequences of scalars. Such sequences arise in real-world problems whenever we deal with objects described by several parameters, which we shall call coordinates. Do not try too hard to imagine a space with more than three coordinates. You have to live with the fact that we are able to depict only one, two or three dimensions, but we will deal with an arbitrary number of dimensions. For example, when observing some parameter in a group of 500 students (for instance, their study results), our data will have 500 elements and we would like to work with them. Our goal is to develop tools which will work well even if the number of elements is large.

Do not be afraid of terms like field or ring of scalars K. Simply imagine any specific domain of numbers. Rings of scalars are for instance the integers Z and all the residue classes Z_k. Among fields we have seen only R, Q, C and the residue classes Z_k for k prime. Z_2 is very specific among them, because the equation x = −x does not imply x = 0 there, whereas in every other field it does.

1. Vectors and matrices

In the first two parts of this chapter, we will work with vectors and matrices in the simple context of finite sequences of scalars. We can imagine working with integers or residue classes as well as real or complex numbers.
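The residue classes just mentioned are easy to experiment with; the following minimal sketch (our own helper, not the book's notation) models Z_k by integers modulo k and illustrates the special behaviour of Z_2, where x = −x holds for every element.

```python
# Arithmetic in the residue classes Z_k, modelled by integers mod k (a sketch).
def neg(x, k):
    """Additive inverse of x in Z_k."""
    return (-x) % k

# In Z_2 every element satisfies x = -x ...
assert all(x == neg(x, 2) for x in range(2))
# ... whereas in the field Z_5, x = -x forces x = 0.
assert [x for x in range(5) if x == neg(x, 5)] == [0]
```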
We hope to illustrate how easily concise and formal reasoning can lead to strong results, valid in a much broader context than just for real numbers.

by mixing the colours he usually sells (he has enough in his warehouse). He has

- reddish colour: it contains 50 % of red, 25 % of green and 25 % of blue colour;
- greenish colour: it contains 12.5 % of red, 75 % of green and 12.5 % of blue colour;
- bluish colour: it contains 20 % of red, 20 % of green and 60 % of blue colour.

How many litres of each of the colours at the warehouse have to be mixed in order to satisfy the order?

Solution. Denote by

- x the number of litres of reddish colour to be used;
- y the number of litres of greenish colour to be used;
- z the number of litres of bluish colour to be used.

The mixture must contain 270 litres of red. Since reddish contains 50 % red, greenish 12.5 % red and bluish 20 % red, the following has to be satisfied:

0.5x + 0.125y + 0.2z = 270.

Similarly, we require (for green and blue colours respectively) that

0.25x + 0.75y + 0.2z = 270,
0.25x + 0.125y + 0.6z = 270.

From the first equation, x = 540 − 0.25y − 0.4z. Substituting for x into the second and third equations, we obtain two linear equations in two variables,

2.75y + 0.4z = 540 and 0.25y + 2z = 540.

From the second of these we express z = 270 − 0.125y; substituting into the first one, we obtain 2.7y = 432, that is, y = 160. Therefore z = 270 − 0.125 · 160 = 250 and hence x = 540 − 0.25 · 160 − 0.4 · 250 = 400.

An alternative approach is to deduce consequences of the given equations by repeatedly adding them and multiplying them by non-zero scalars. This is easily handled in matrix notation (which we met when solving equations in two variables in the previous chapter already).
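The values obtained by substitution can be cross-checked mechanically; here is a minimal sketch in plain floating-point arithmetic, with y and z matching the greenish and bluish coefficient columns of the system.

```python
# Cross-check of the paint-mixing solution (a sketch).
# x, y, z = litres of reddish, greenish and bluish colour.
x, y, z = 400.0, 160.0, 250.0
assert abs(0.5 * x + 0.125 * y + 0.2 * z - 270) < 1e-9    # red component
assert abs(0.25 * x + 0.75 * y + 0.2 * z - 270) < 1e-9    # green component
assert abs(0.25 * x + 0.125 * y + 0.6 * z - 270) < 1e-9   # blue component
assert x + y + z == 810.0                                 # total order
```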
The first row of the matrix consists of the coefficients of the variables in the first equation, the second row of the coefficients in the second equation, and the third row of the coefficients in the third. Therefore the matrix of the system is

( 0.5   0.125  0.2 )
( 0.25  0.75   0.2 )
( 0.25  0.125  0.6 ).

The extended matrix of the system is obtained from the matrix of the system by appending the column of the right-hand sides

Later, we follow the general terminology where the notion of vectors is related to fields of scalars only.

2.1.1. Vectors over scalars. For now, a vector is for us an ordered n-tuple of scalars from K, where the fixed n ∈ N is called the dimension. We can add and multiply scalars. We will be able to add vectors, but a vector may be multiplied only by a scalar. This corresponds to the idea we have already seen in the plane R^2: addition is realized as composition of vectors (viewed as arrows with direction and size, all emanating from the origin), and multiplication by a scalar is realized as stretching the vector. A vector u = (a1, ..., an) is multiplied by a scalar c by multiplying every element of the n-tuple u by c; addition is defined coordinate-wise.

Basic vector operations

u + v = (a1, ..., an) + (b1, ..., bn) = (a1 + b1, ..., an + bn),
c · u = c · (a1, ..., an) = (c · a1, ..., c · an).

For vector addition and multiplication by scalars we shall use the same symbols as for scalars, that is, plus and either dot or juxtaposition, respectively.

The vector notation convention. We shall not, unlike many other textbooks, use any special notation for vectors, and we leave it to the reader to pay attention to the context. For scalars we shall mostly use letters from the beginning of the alphabet, and for vectors letters from the end of the alphabet. The middle part of the alphabet can be used for indices of variables or components and also for summation indices.
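The coordinate-wise operations just defined are straightforward to express in code; a minimal sketch follows (tuple-based, with our own helper names `vadd` and `smul`).

```python
# Coordinate-wise vector operations over a ring of scalars (a sketch).
def vadd(u, v):
    return tuple(a + b for a, b in zip(u, v))

def smul(c, u):
    return tuple(c * a for a in u)

u, v = (1, 2, 3), (4, 5, 6)
assert vadd(u, v) == (5, 7, 9)
# Property (V1), a.(v + w) = a.v + a.w, checked on one instance:
assert smul(2, vadd(u, v)) == vadd(smul(2, u), smul(2, v))
```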
In the general theory at the end of this chapter and later, we will work exclusively with fields of scalars when talking about vectors. For now we work with the more relaxed properties of scalars as listed in 1.1.1. For vector addition in K^n, the properties (CG1)-(CG4) (see 1.1.1) clearly hold, with the zero element being (notice we define the addition coordinate-wise)

0 = (0, ..., 0) ∈ K^n.

We purposely use the same symbol 0 for both the zero vector and the zero scalar. Next, let us notice the following basic properties of vectors:

Vector properties

For all vectors v, w ∈ K^n and scalars a, b ∈ K we have

(V1) a · (v + w) = a · v + a · w
(V2) (a + b) · v = a · v + b · v
(V3) a · (b · v) = (a · b) · v
(V4) 1 · v = v

of the individual equations in the system:

( 0.5   0.125  0.2 | 270 )
( 0.25  0.75   0.2 | 270 )
( 0.25  0.125  0.6 | 270 )

By applying elementary row transformations sequentially (they all correspond to adding equations and multiplying them by non-zero scalars, see 2.1.7), we can eliminate the variables one by one:

( 0.5   0.125  0.2 | 270 )
( 0.25  0.75   0.2 | 270 )
( 0.25  0.125  0.6 | 270 )
∼
( 1  0.25  0.4 |  540 )
( 1  3     0.8 | 1080 )
( 1  0.5   2.4 | 1080 )
∼
( 1  0.25  0.4 |  540 )
( 0  2.75  0.4 |  540 )
( 0  0.25  2   |  540 )
∼
( 1  0.25  0.4 |  540 )
( 0  1     8   | 2160 )
( 0  11    1.6 | 2160 )
∼
( 1  0.25   0.4   |    540 )
( 0  1      8     |   2160 )
( 0  0    −86.4   | −21600 )

By back substitution we compute successively

z = −21600 / (−86.4) = 250,
y = 2160 − 8 · 250 = 160,
x = 540 − 0.4 · 250 − 0.25 · 160 = 400.

Thus it is necessary to mix 400 litres of reddish, 160 litres of greenish and 250 litres of bluish colour. □

2.A.2. Solve the system of simultaneous linear equations

x1 + 2x2 + 3x3 = 2,
2x1 − 3x2 − x3 = −3,
−3x1 + x2 + 2x3 = −3.

Solution. We write the system in the form of the extended matrix

(  1   2   3 |  2 )
(  2  −3  −1 | −3 )
( −3   1   2 | −3 )

Every row of the matrix corresponds to one equation.
As in the previous example, equivalent transformations of the equations correspond to elementary row operations on the matrix, and we use them to transform it into the row echelon form:

(  1   2   3 |  2 )
(  2  −3  −1 | −3 )
( −3   1   2 | −3 )
∼
( 1   2   3 |  2 )
( 0  −7  −7 | −7 )
( 0   7  11 |  3 )
∼
( 1  2  3 |  2 )
( 0  1  1 |  1 )
( 0  0  4 | −4 )
∼
( 1  2  3 |  2 )
( 0  1  1 |  1 )
( 0  0  1 | −1 )

First we subtracted twice the first row from the second row, and added three times the first row to the third row. Then

The properties (V1)-(V4) of our vectors are easily checked for any specific ring of scalars K, since we need just the corresponding properties of scalars as listed in 1.1.1 and 1.1.5, applied to the individual components of the vectors. In this way we shall work with, for instance, R^n, Q^n, C^n, but also with Z^n, (Z_k)^n, n = 1, 2, 3, ...

2.1.2. Matrices over scalars. Matrices are slightly more complicated objects, useful when working with vectors.

Matrices of type m/n

A matrix of the type m/n over scalars K is a rectangular scheme A with m rows and n columns

A = ( a11  a12  ...  a1n )
    ( a21  a22  ...  a2n )
    ( ................... )
    ( am1  am2  ...  amn )

where aij ∈ K for all 1 ≤ i ≤ m, 1 ≤ j ≤ n. For a matrix A with elements aij we also use the notation A = (aij).

The vector (ai1, ai2, ..., ain) ∈ K^n is called the (i-th) row of the matrix A, i = 1, ..., m. The vector (a1j, a2j, ..., amj) ∈ K^m is called the (j-th) column of the matrix A, j = 1, ..., n.

Matrices of the type 1/n or n/1 are actually just vectors in K^n. All general matrices can be understood as vectors in K^(mn); we just consider all the columns in turn. In particular, matrix addition and multiplication of matrices by scalars are defined:

A + B = (aij + bij),   a · A = (a · aij),

where A = (aij), B = (bij), a ∈ K. The matrix −A = (−aij) is called the additive inverse of the matrix A, and the matrix

0 = ( 0 ... 0 )
    ( ....... )
    ( 0 ... 0 )

is called the zero matrix. By considering matrices as mn-dimensional vectors, we obtain the following:

Proposition.
The formulas for A + B, a · A, −A, 0 define the operations of addition and multiplication by scalars on the set of all matrices of the type m/n, and these operations satisfy the properties (V1)-(V4).

2.1.3. Matrices and equations. Many mathematical models are based on systems of linear equations. Matrices are useful for the description of such systems. In order to see this, let us introduce the notion of the scalar product of two vectors, assigning to the vectors (a1, ..., an) and (x1, ..., xn) the scalar

(a1, ..., an) · (x1, ..., xn) = a1x1 + ··· + anxn.

This means that we multiply the corresponding coordinates of the vectors and sum the results.

we added the second row to the third row and multiplied the second row by −1/7; finally, we multiplied the third row by 1/4. Now we restore the system of equations:

x1 + 2x2 + 3x3 = 2,
x2 + x3 = 1,
x3 = −1.

We see immediately that x3 = −1. Substituting x3 = −1 into the equation x2 + x3 = 1 gives x2 = 2. Then, substituting x3 = −1 and x2 = 2 into the first equation, we obtain x1 = 1. □

Systems of linear equations can be written in matrix notation. But is this an advantage, when we can solve the systems even without speaking about matrices? Yes, it is: we can handle the equations more conceptually, we can easily decide how many solutions a system has, and it is much more efficient in computer-assisted computations. Thus we shall become familiar with the various operations which can be done with matrices. As we have seen in the previous examples, equivalent operations with linear equations correspond to elementary row (column) transformations. Further, we have seen that transforming a matrix into a row echelon form, a process called Gaussian elimination (see 2.1.7), solves the system very easily. We demonstrate this on some examples, where we will see that a system can have infinitely many solutions or no solution at all.

2.A.3. Solve the system of linear equations

2x1 − x2 + 3x3 = 0,
3x1 + 16x2 + 7x3 = 0,
3x1 − 5x2 + 4x3 = 0,
−7x1 + 7x2 − 10x3 = 0.

Solution.
Because the right-hand side of every equation is zero (such a system is called homogeneous), we work with the matrix of the system only. We find the solution by transforming the matrix into the row echelon form using elementary row transformations. These correspond to changing the order of the equations, multiplying an equation by a non-zero number, and adding multiples of equations to other equations. Furthermore, we can always go back and forth between the matrix notation and the original system notation with the variables xi. We obtain:

(  2  −1    3  )
(  3  16    7  )
(  3  −5    4  )
( −7   7  −10  )
∼
( 2  −1     3   )
( 0  35/2  5/2  )
( 0  −7/2 −1/2  )
( 0   7/2  1/2  )

Every system of m linear equations in n variables,

a11 x1 + a12 x2 + ··· + a1n xn = b1,
a21 x1 + a22 x2 + ··· + a2n xn = b2,
...
am1 x1 + am2 x2 + ··· + amn xn = bm,

can be seen as a constraint on the values of m scalar products of one unknown vector (x1, ..., xn) (called the vector of variables, or vector variable) with the known vectors of coordinates (ai1, ..., ain). The vector of variables can also be seen as a column in a matrix of the type n/1, and similarly the right-hand values form a vector u, again a single column of a matrix, this time of the type m/1. Our system of equations can then be formally written as A · x = u as follows:

( a11  ...  a1n )   ( x1 )   ( b1 )
( .............. ) · ( .. ) = ( .. )
( am1  ...  amn )   ( xn )   ( bm )

where the left-hand side is interpreted as m scalar products of the individual rows of the matrix (giving rise to a column vector) with the vector variable x, whose values are prescribed by the equations. That means that the identity of the i-th coordinates corresponds to the original i-th equation

ai1 x1 + ··· + ain xn = bi,

and the notation A · x = u encodes the original system of equations.

2.1.4. Matrix product. In the plane, that is, for vectors of dimension two, we have already developed a matrix calculus, and we noticed that it is effective to work with matrices (see 1.5.4). Now we generalize this calculus and develop all the tools we already know from the plane case to deal with higher dimensions n.
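The row-wise reading of A · x can be sketched directly; the code below (our helper names, not the book's) evaluates the left-hand sides of the homogeneous system from 2.A.3 at the solution (11s, s, −7s) with s = 1.

```python
# A . x computed as the scalar products of the rows of A with x (a sketch).
def dot(row, x):
    return sum(a * xi for a, xi in zip(row, x))

def matvec(A, x):
    return [dot(row, x) for row in A]

# Coefficient rows of the homogeneous system in 2.A.3.
A = [[2, -1, 3], [3, 16, 7], [3, -5, 4], [-7, 7, -10]]
assert matvec(A, [11, 1, -7]) == [0, 0, 0, 0]
```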
It is possible to define the product of two matrices only when the dimensions of their rows and columns allow it, that is, when the scalar product is defined for them as before:

Matrix product

For any matrix A = (aij) of the type m/n and any matrix B = (bjk) of the type n/q over the ring of scalars K, we define their product C = A · B = (cik) as the matrix of the type m/q with the elements

cik = Σ_j aij bjk, for arbitrary 1 ≤ i ≤ m, 1 ≤ k ≤ q.

That is, the element cik of the product is exactly the scalar product of the i-th row of the matrix on the left with the k-th column of the matrix on the right. For instance, we have

( 2   1 )   (  2  1  1 )     ( 3  2  3 )
( 1  −1 ) · ( −1  0  1 )  =  ( 3  1  0 ).

From there we see that the second, third and fourth equations are all multiples of the equation 7x2 + x3 = 0. We continue:

( 2  −1     3   )
( 0  35/2  5/2  )
( 0  −7/2 −1/2  )
( 0   7/2  1/2  )
∼
( 2  −1     3   )
( 0  35/2  5/2  )
( 0   0     0   )
( 0   0     0   )
∼
( 2  −1  3 )
( 0   7  1 )
( 0   0  0 )
( 0   0  0 )

2.1.5. Square matrices. If there is the same number of rows and columns in the matrix, we speak of a square matrix. The number of rows (or columns) is then called the dimension of the matrix. The matrix

E = (δij) = ( 1 ... 0 )
            ( ....... )
            ( 0 ... 1 )

Considered as equations, the last two rows are redundant, and we are left with just

2x1 − x2 + 3x3 = 0,
7x2 + x3 = 0.

We substitute the parameter t ∈ R for the variable x3 and express

x2 = −(1/7) x3 = −(1/7) t,
x1 = (1/2)(x2 − 3x3) = −(11/7) t.

If we now substitute t = −7s, we obtain the result in the simple form

(x1, x2, x3) = (11s, s, −7s),   s ∈ R.

The system has infinitely many solutions. □

2.A.4. Find all solutions of the system of linear equations

3x1 + 3x3 − 5x4 = −8,
x1 − x2 + x3 − x4 = −2,
−2x1 − x2 + 4x3 − 2x4 = 0,
2x1 + x2 − x3 − x4 = −3.

Solution.
The corresponding extended matrix of the system is

  (  3   0   3  -5 | -8 )
  (  1  -1   1  -1 | -2 )
  ( -2  -1   4  -2 |  0 )
  (  2   1  -1  -1 | -3 )

By changing the order of the rows (equations) we obtain

  (  1  -1   1  -1 | -2 )
  (  2   1  -1  -1 | -3 )
  ( -2  -1   4  -2 |  0 )
  (  3   0   3  -5 | -8 )

which we transform into row echelon form:

is called the unit matrix, or alternatively, the identity matrix. The numbers δ_ij defined in this way are also called the Kronecker delta. When we restrict ourselves to square matrices over K of a fixed dimension n, the matrix product is defined for any two matrices; that is, there is a well-defined multiplication operation. Its properties are similar to those of scalars:

Proposition. On the set of all square matrices of dimension n over an arbitrary ring of scalars K, the multiplication operation is defined with the following properties of rings (see 1.1.1):

(R1) Multiplication is associative.
(R3) The unit matrix E = (δ_ij) is the unit element for multiplication.
(R4) Multiplication and addition are distributive.

In general, neither the property (R2) nor (ID) holds. Therefore the square matrices for n > 1 do not form an integral domain, and consequently they cannot be a (commutative or non-commutative) field.

Proof. Associativity of multiplication (R1): since scalars are associative, distributive and commutative, we can compute for any three matrices A = (a_ij) of type m/n, B = (b_jk) of type n/p and C = (c_kl) of type p/q:

  A · B = ( Σ_j a_ij b_jk ),   B · C = ( Σ_k b_jk c_kl ),
  (A · B) · C = ( Σ_k ( Σ_j a_ij b_jk ) c_kl ) = ( Σ_{j,k} a_ij b_jk c_kl ),
  A · (B · C) = ( Σ_j a_ij ( Σ_k b_jk c_kl ) ) = ( Σ_{j,k} a_ij b_jk c_kl ).

Note that while computing we relied on the fact that it does not matter in which order we perform the sums and products, that is, we relied on the properties of scalars.
We can easily see that multiplication by a unit matrix has the property of a unit element:

  (  1  -1   1  -1 | -2 )     ( 1  -1   1  -1 | -2 )     ( 1  -1   1  -1 | -2 )     ( 1  -1   1  -1 | -2 )
  (  2   1  -1  -1 | -3 )  ~  ( 0   3  -3   1 |  1 )  ~  ( 0   3  -3   1 |  1 )  ~  ( 0   3  -3   1 |  1 )
  ( -2  -1   4  -2 |  0 )     ( 0  -3   6  -4 | -4 )     ( 0   0   3  -3 | -3 )     ( 0   0   3  -3 | -3 )
  (  3   0   3  -5 | -8 )     ( 0   3   0  -2 | -2 )     ( 0   0   3  -3 | -3 )     ( 0   0   0   0 |  0 )

The system thus has infinitely many solutions, because we have three equations in four variables. These three equations have exactly one solution for any choice of the variable

  A · E = ( a_11 ... a_1n )   ( 1 ... 0 )
          (  ...      ... ) · ( . ... . ) = A,
          ( a_m1 ... a_mn )   ( 0 ... 1 )

and similarly from the left, E · A = A. It remains to prove the distributivity of multiplication and addition. Again using the distributivity of scalars we can

x_4 ∈ R. Thus for x_4 we substitute the parameter t ∈ R and go back from the matrix notation to the system of equations. From the last equation we have x_3 = t - 1. Substituting for x_3 into the second equation gives 3x_2 - 3(t - 1) + t = 1, that is, x_2 = (1/3)(2t - 2). Finally, using the first equation, we have x_1 - (1/3)(2t - 2) + (t - 1) - t = -2, i.e. x_1 = (1/3)(2t - 5). The set of solutions can be written (for t = 3s) in the form

  {(x_1, x_2, x_3, x_4) = (2s - 5/3, 2s - 2/3, 3s - 1, 3s), s ∈ R}.

We return to the extended matrix of the system and transform it further using row transformations so that (still in row echelon form) the first non-zero number of every row (the so-called pivot) equals one, and all the other numbers in the pivot's column are zero. We have

  ( 1  -1   1   -1  |  -2  )     ( 1  -1   1   -1  |  -2  )     ( 1  -1   0    0  |  -1  )     ( 1   0   0  -2/3 | -5/3 )
  ( 0   3  -3    1  |   1  )  ~  ( 0   1  -1   1/3 |  1/3 )  ~  ( 0   1   0  -2/3 | -2/3 )  ~  ( 0   1   0  -2/3 | -2/3 )
  ( 0   0   3   -3  |  -3  )     ( 0   0   1   -1  |  -1  )     ( 0   0   1   -1  |  -1  )     ( 0   0   1   -1  |  -1  )
  ( 0   0   0    0  |   0  )     ( 0   0   0    0  |   0  )     ( 0   0   0    0  |   0  )     ( 0   0   0    0  |   0  )

because first we multiplied the second and the third row by 1/3, then we added the third row to the second and its (-1)-multiple to the first, and finally we added the second row to the first.
From the last matrix we easily obtain the result

  ( x_1 )   ( -5/3 )     ( 2/3 )
  ( x_2 ) = ( -2/3 ) + t ( 2/3 ),   t ∈ R.
  ( x_3 )   (  -1  )     (  1  )
  ( x_4 )   (   0  )     (  1  )

Free variables are those whose columns do not contain any pivot (in our case there is no pivot in the fourth column, that is, the fourth variable is free and we use it as a parameter). □

easily calculate for matrices A = (a_ij) of type m/n, B = (b_jk) and C = (c_jk) of type n/p, and D = (d_kl) of type p/q:

  A · (B + C) = ( Σ_j a_ij (b_jk + c_jk) ) = ( (Σ_j a_ij b_jk) + (Σ_j a_ij c_jk) ) = A·B + A·C,
  (B + C) · D = ( Σ_k (b_jk + c_jk) d_kl ) = ( (Σ_k b_jk d_kl) + (Σ_k c_jk d_kl) ) = B·D + C·D.

As we have seen in 1.5.4, two matrices of dimension two do not necessarily commute; for example

  ( 1 0 )   ( 0 1 )   ( 0 1 )
  ( 0 0 ) · ( 0 0 ) = ( 0 0 ),

  ( 0 1 )   ( 1 0 )   ( 0 0 )
  ( 0 0 ) · ( 0 0 ) = ( 0 0 ).

This gives us immediately a counterexample to the validity of (R2) and (ID). For matrices of type 1/1 both axioms clearly hold, because the scalars themselves have them. For matrices of greater dimension the counterexamples can be obtained similarly: simply place the counterexamples for dimension 2 in the upper left corner and select the rest to be zero. (Verify this on your own!) □

In the proof we actually worked with matrices of more general types, thus we have proved the properties in greater generality:

Associativity and distributivity

Matrix multiplication is associative and distributive, that is,

  A · (B · C) = (A · B) · C,
  A · (B + C) = A·B + A·C,

whenever all the given operations are defined. The unit matrix is a unit element for multiplication (both from the right and from the left).

2.1.6. Inverse matrices. With scalars we can do the following: from the equation a · x = b with a fixed invertible a we can express x = a⁻¹ · b for any b. We would like to be able to do this for matrices too. So we need to solve the problem: how to tell whether such a matrix exists, and if so, how to compute it? We say that B is the inverse of A if A · B = B · A = E. Then we write B = A⁻¹.
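The non-commutativity counterexample is easy to replay in code; a short Python sketch (the helper name mat_mul is ours):

```python
def mat_mul(A, B):
    """Product of matrices given as lists of rows; zip(*B) yields columns."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# The two 2x2 matrices from the text: X.Y and Y.X differ.
X = [[1, 0], [0, 0]]
Y = [[0, 1], [0, 0]]
print(mat_mul(X, Y))  # -> [[0, 1], [0, 0]]
print(mat_mul(Y, X))  # -> [[0, 0], [0, 0]]
```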
From the definition it is clear that both matrices must be square and of the same dimension n. A matrix which has an inverse is called an invertible matrix or a regular square matrix.

2.A.5. Determine the solutions of the system of equations

  3x_1 + 3x_3 - 5x_4 = 8,
  x_1 - x_2 + x_3 - x_4 = -2,
  -2x_1 - x_2 + 4x_3 - 2x_4 = 0,
  2x_1 + x_2 - x_3 - x_4 = -3.

Solution. Note that the system of equations in this exercise differs from the system of equations in the previous exercise only in the value 8 (instead of -8) on the right-hand side. If we perform the same row transformations as in the previous exercise, we obtain

  (  3   0   3  -5 |  8 )     ( 1  -1   1  -1 | -2 )     ( 1  -1   1  -1 | -2 )
  (  1  -1   1  -1 | -2 )  ~  ( 0   3  -3   1 |  1 )  ~  ( 0   3  -3   1 |  1 )
  ( -2  -1   4  -2 |  0 )     ( 0   0   3  -3 | -3 )     ( 0   0   3  -3 | -3 )
  (  2   1  -1  -1 | -3 )     ( 0   0   3  -3 | 13 )     ( 0   0   0   0 | 16 )

where the last operation was subtracting the third row from the fourth. From the fourth equation 0 = 16 it follows that the system has no solutions. Let us emphasize that whenever we obtain an equation of the form 0 = a for some a ≠ 0 (that is, a zero row on the left side and a non-zero number after the vertical bar) while doing the row transformations, the system has no solutions. □

You can find more exercises on systems of linear equations on page 124.

Now we are going to manipulate with matrices, to become more familiar with their properties.

2.A.6. Matrix multiplication. Note that, in order to be able to multiply two matrices, the necessary and sufficient condition is that the first matrix has the same number of columns as the number of rows of the second matrix. The number of rows of the resulting matrix is then given by the number of rows of the first matrix; the number of columns equals the number of columns of the second matrix.

In the subsequent paragraphs we derive (among other things) that B is actually the inverse of A whenever just one of the two required equations holds. The other is then a consequence.
We easily check that if A⁻¹ and B⁻¹ exist, then the inverse of the product A · B also exists:

(1)  (A · B)⁻¹ = B⁻¹ · A⁻¹.

Indeed, by the associativity of matrix multiplication proved a while ago,

  (B⁻¹ · A⁻¹) · (A · B) = B⁻¹ · (A⁻¹ · A) · B = E,
  (A · B) · (B⁻¹ · A⁻¹) = A · (B · B⁻¹) · A⁻¹ = E.

Because we can calculate with matrices similarly as with scalars (they are just a little more complicated), the existence of an inverse matrix can really help us with the solution of systems of linear equations: if we express a system of n equations in n unknowns as a matrix product

  A · x = u,

and if the inverse of the matrix A exists, then we can multiply from the left by A⁻¹ to obtain

  A⁻¹ · u = A⁻¹ · A · x = E · x = x,

that is, A⁻¹ · u is the desired solution. On the other hand, expanding the condition A · A⁻¹ = E for the unknown scalars in the matrix A⁻¹ gives n systems of linear equations with the same matrix on the left and different vectors on the right. Thus we should think about methods for solving systems of linear equations.

2.1.7. Equivalent operations with matrices. Let us gain some practical insight into the relation between systems of equations and their matrices. Clearly, searching for the inverse can be more complicated than finding the direct solution to the system of equations. But note that whenever we have to solve several systems of equations with the same matrix A but different right-hand sides u, computing A⁻¹ once can be really beneficial.

From the point of view of solving systems of equations A · x = u, it is natural to consider the matrices A and vectors u equivalent whenever they give a system of equations with the same solution set. Let us think about possible operations which would simplify the matrix A so that obtaining the solution is easier. We begin with simple manipulations of the rows of equations which do not influence the solution, together with the corresponding modifications of the right-hand side vector.
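Formula (1) can be spot-checked numerically. A Python sketch (helper names ours; exact arithmetic via the standard fractions module, and the explicit 2x2 inverse formula that appears later in exercise 2.A.10):

```python
from fractions import Fraction

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inv2(M):
    """Inverse of a 2x2 matrix: 1/(ad - bc) * (d -b; -c a)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)   # assumed non-zero
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 3], [3, 8]]
B = [[2, 1], [1, 1]]
AB = mat_mul(A, B)
lhs = inv2(AB)                      # (A.B)^{-1}
rhs = mat_mul(inv2(B), inv2(A))    # B^{-1}.A^{-1}
print(lhs == rhs)  # -> True
```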
If we are able to change a square matrix into the unit matrix, then the right-hand side vector is a solution of the original system. If some of the rows of the system vanish during the course of the manipulations (that is, they become zero), then we get some direct information about the solution. Our simple operations are:

Elementary row transformations
• interchanging two rows,
• multiplication of any given row by a non-zero scalar,
• adding another row to any given row.

These operations are called elementary row transformations.

Remark. Parts i) and ii) of the previous exercise show that multiplication of square matrices is not commutative in general. In part iii) we see that if two rectangular matrices can be multiplied at all, this may be possible in only one of the two orders. In parts iv) and v) note that (A · B)^T = B^T · A^T.

2.A.7. Can the matrix A be transformed into B using only elementary row transformations (we then say that such matrices are row equivalent)?

Solution. Both matrices are row equivalent with the three-dimensional identity matrix. It is easy to see that row equivalence on the set of all matrices of a given type is indeed an equivalence relation. Thus the matrices A and B are row equivalent. □

2.A.8. Find a matrix B for which the matrix C = B · A is in row echelon form, where

  A = ( 3 -1 3 5 ; -3 2 3 1 ; -3 -5 0 ... ; 7 -5 1 4 )

Solution. If we multiply the matrix A successively from the left by suitable elementary matrices E_1, ..., E_7 (consider which elementary row transformations each of them corresponds to),
It is clear that the corresponding operations at the level of the equations in the system do not change the set of solutions whenever our ring of coordinates is an integral domain. Analogously, the elementary column transformations of matrices are

• interchanging two columns,
• multiplication of any given column by a non-zero scalar,
• adding another column to any given column.

These do not preserve the solution set, since they change the variables themselves.

We can use elementary row transformations systematically for the successive elimination of variables. This gives an algorithm which is usually called the Gaussian elimination method. Henceforth, we shall assume that our scalars come from an integral domain (e.g. the integers are allowed, but not, say, Z_4).

Gaussian elimination of variables

Proposition. Any non-zero matrix over an arbitrary integral domain of scalars K can be transformed, using finitely many elementary row transformations, into row echelon form:

• for each i, if a_ik = 0 for all columns k = 1, ..., j, then a_kj = 0 for all k ≥ i;
• if a_(i-1)j is the first non-zero element in the (i-1)-st row, then a_ij = 0.

Proof. A matrix in row echelon form looks like

  ( 0 ... 0  a_1j  ...  ...  ... )
  ( 0 ... 0   0   a_2k  ...  ... )
  (                ...           )
  ( 0 ... ...  0    0   a_ip ... )

The matrix can (but does not have to) end with some zero rows. In order to transform an arbitrary matrix, we can use a simple algorithm, which brings us, row by row, to the resulting echelon form:

we obtain the matrix C = E_7 ··· E_1 · A in row echelon form, and B = E_7 ··· E_1 is a matrix with the desired property. □

Gaussian elimination algorithm

(1) By a possible interchange of rows we can obtain a matrix whose first row has a non-zero element in the first non-zero column. Let that column be column j.
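Steps (1)-(3) of the algorithm can be sketched in plain Python (helper name ours). To stay inside an integral domain, the elimination below uses the cross-multiplication of step (2) instead of division, so the resulting echelon rows may be scalar multiples of the rows obtained by hand:

```python
def row_echelon(A):
    """Fraction-free Gaussian elimination: eliminate below each pivot by
    cross-multiplying rows, so no division by scalars is ever needed."""
    M = [row[:] for row in A]
    m, n = len(M), len(M[0])
    r = 0                                   # current pivot row
    for j in range(n):
        # step (1): find a row with a non-zero entry in column j
        piv = next((i for i in range(r, m) if M[i][j] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        # step (2): new row i = a_rj * (row i) - a_ij * (row r)
        for i in range(r + 1, m):
            M[i] = [M[r][j] * M[i][k] - M[i][j] * M[r][k] for k in range(n)]
        r += 1                              # step (3): recurse on the rest
    return M

# matrix of the system from 2.A.4 (left-hand sides only):
A = [[1, -1, 1, -1], [2, 1, -1, -1], [-2, -1, 4, -2], [3, 0, 3, -5]]
for row in row_echelon(A):
    print(row)                              # last row becomes zero
```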
In other words, a_1j ≠ 0, but a_iq = 0 for all i and all q with 1 ≤ q < j.

(2) For each i = 2, ..., multiply the first row by the element a_ij, multiply the i-th row by the element a_1j, and subtract, to obtain a_ij = 0 in the i-th row.

(3) By repeated application of steps (1) and (2), each time to the not-yet-echelon part of the rows and columns of the matrix, we reach, after a finite number of steps, the final form of the matrix.

This algorithm clearly stops after a finite number of steps and provides the proof of the proposition. □

2.A.9. Complex numbers as matrices. Consider the set of matrices

  C = { ( a  b ; -b  a ),  a, b ∈ R }.

Note that C is closed under addition and matrix multiplication, and further show that the mapping f : C → C, ( a b ; -b a ) ↦ a + bi satisfies f(M + N) = f(M) + f(N) and f(M · N) = f(M) · f(N) (on the left-hand sides of the equations we have addition and multiplication of matrices, on the right-hand sides addition and multiplication of complex numbers). Thus the set C along with this multiplication and addition can be seen as the field ℂ of complex numbers. The mapping f is called an isomorphism (of fields). Thus, for instance, we have

  (  3  5 )   ( 8 -9 )   (  69  13 )
  ( -5  3 ) · ( 9  8 ) = ( -13  69 )

which corresponds to (3 + 5i) · (8 - 9i) = 69 + 13i.

The given algorithm is really the usual elimination of variables used in systems of linear equations. In a completely analogous manner we define the column echelon form of matrices, and considering elementary column transformations instead of the row ones, we obtain an algorithm for transforming matrices into the column echelon form.

Remark. Although we could formulate Gaussian elimination for general scalars from any ring, this does not make much sense in view of solving equations. Clearly, having divisors of zero among the scalars, we might get zeros during the procedure and lose information this way. Think carefully about the differences between the choices K = Z, K = R and possibly Z_2 or Z_4.

2.A.10.
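The isomorphism of 2.A.9 is easy to test in code; a Python sketch (helper names ours) replaying the example (3 + 5i)(8 - 9i):

```python
def as_matrix(z):
    """Represent the complex number a + bi as the 2x2 matrix (a b; -b a)."""
    a, b = z.real, z.imag
    return [[a, b], [-b, a]]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# multiplying the matrices corresponds to multiplying the complex numbers:
M = mat_mul(as_matrix(3 + 5j), as_matrix(8 - 9j))
print(M)  # the matrix of (3+5i)(8-9i) = 69 + 13i
```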
Solve the equations

  ( 1 3 )            ( 1 2 )          ( 1 3 )   ( 1 2 )
  ( 3 8 ) · X_1  =   ( 3 4 ),   X_2 · ( 3 8 ) = ( 3 4 )

for the matrices X_1 and X_2.

Solution. Clearly the unknowns X_1 and X_2 must be matrices of type 2 x 2 (in order for the products to be defined and the result to be a matrix of type 2 x 2). Set

  X_1 = ( a_1 b_1 ; c_1 d_1 ),   X_2 = ( a_2 b_2 ; c_2 d_2 ),

On the other hand, if we are dealing with fields of scalars, we can always arrive at a row echelon form where the non-zero entries on the "diagonal" are ones. This is done by applying the appropriate scalar multiplication to each individual row. However, this is not possible in general; think, for instance, of the integers Z.

2.1.8. Matrices of elementary row transformations. Let us now restrict ourselves to fields of scalars K, that is, every non-zero scalar has an inverse. Note that elementary row and column transformations correspond, respectively, to multiplication from the left or right by the following matrices (only the differences from the unit matrix are indicated):

(1) Interchanging the i-th and j-th rows (columns):

  ( 1           )
  (   0 ... 1   )  <- i-th row
  (   .  .  .   )
  (   1 ... 0   )  <- j-th row
  (           1 )

and multiply out the matrices in the first given equation. We obtain

  ( a_1 + 3c_1   b_1 + 3d_1 )   ( 1 2 )
  ( 3a_1 + 8c_1  3b_1 + 8d_1 ) = ( 3 4 )

that is,

  a_1 + 3c_1 = 1,
  b_1 + 3d_1 = 2,
  3a_1 + 8c_1 = 3,
  3b_1 + 8d_1 = 4.

By adding a (-3)-multiple of the first equation to the third equation we obtain c_1 = 0 and then a_1 = 1. Analogously, by adding a (-3)-multiple of the second equation to the fourth equation we obtain d_1 = 2 and then b_1 = -4. Thus we have

  X_1 = ( 1 -4 ; 0 2 ).

We can find the values a_2, b_2, c_2, d_2 by a different approach. If A is a square matrix, we write A⁻¹ to denote its inverse, so that A · A⁻¹ = A⁻¹ · A = E, the unit matrix. It is easy to check that

  ( a b )⁻¹        1     (  d -b )
  ( c d )    = ——————— · ( -c  a ),
               ad - bc

which holds for any numbers a, b, c, d ∈ K provided ad - bc ≠ 0. (This is easy to derive; it also directly follows from formula (1) in 2.2.11.) We calculate

  ( 1 3 )⁻¹   ( -8  3 )
  ( 3 8 )   = (  3 -1 ).

Multiplying the given equation by this matrix from the right gives

  X_2 = ( 1 2 ; 3 4 ) · ( -8 3 ; 3 -1 ) = ( -2 1 ; -12 5 ).

□

2.A.11.
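The second equation of 2.A.10 can be replayed with the 2x2 inverse formula; a Python sketch (helper names ours, exact arithmetic via the standard fractions module):

```python
from fractions import Fraction

def inv2(M):
    """Inverse of a 2x2 matrix: 1/(ad - bc) * (d -b; -c a)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)   # assumed non-zero
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 3], [3, 8]]
B = [[1, 2], [3, 4]]
X2 = mat_mul(B, inv2(A))   # X2 . A = B  implies  X2 = B . A^{-1}
print(X2)                  # X2 = (-2 1; -12 5)
```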
Solve the matrix equation ... ○

2.A.12. Computing the inverse matrix. Compute the inverses of the matrices

  A = ( 4 3 2 ; 5 6 3 ; 3 5 2 ),   B = ( 1 0 1 ; 3 3 4 ; 2 2 3 ).

Then determine the matrix (A^T · B)⁻¹.

Solution. We find the inverse by the following method: write the matrix A and the unit matrix next to each other. Then use elementary row transformations so that the left sub-matrix A changes into the unit matrix. This changes the original unit sub-matrix into A⁻¹. We obtain

(2) Multiplication of the i-th row (column) by the scalar a:

  ( 1       )
  (    a    )  <- i-th row
  (       1 )

(3) Adding the j-th row to the i-th row (the j-th column to the i-th column):

  ( 1    1    )  <- the extra 1 in the i-th row and j-th column
  (    .      )
  (         1 )

This trivial observation is actually very important, since the product of invertible matrices is invertible (recall 2.1.6(1)) and all elementary transformations over a field of scalars are invertible (the definition of an elementary transformation itself ensures that the inverse transformation is of the same type, and it is easy to determine the corresponding matrix). Thus, the Gaussian elimination algorithm tells us that for an arbitrary matrix A we can obtain its row echelon form A' = P · A by multiplying with a suitable invertible matrix P = P_k ··· P_1 from the left (that is, by sequential multiplication with k matrices of elementary row transformations).

If we apply the same elimination procedure to the columns, we can transform any matrix B into its column echelon form B' by multiplying it from the right by a suitable invertible matrix Q = Q_1 ··· Q_l. If we start with a matrix B = A' already in row echelon form, this procedure eliminates only the still non-zero elements outside the diagonal of the matrix, and in the end the remaining diagonal elements can be transformed into units. Thus we have verified a very important result, which we will use many times in the future:

2.1.9. Theorem.
For every matrix A of the type m/n over a field of scalars K, there exist square invertible matrices P and Q of dimensions m and n, respectively, such that the matrix P · A is in row echelon form and

  P · A · Q = ( 1        0 ... 0 )
              (    ...           )
              (       1  0 ... 0 )
              ( 0 ... 0  0 ... 0 )

The number of ones on the diagonal is independent of the particular choice of P and Q.

Proof. We have already proved everything except the last sentence. We shall see this last claim below in 2.1.11. □

In the first step we subtracted the third row from the first; in the second step we added a (-5)-multiple of the first row to the second and a (-3)-multiple of the first row to the third; in the third step we subtracted the third row from the second; in the fourth step we added a (-2)-multiple of the second row to the third; in the fifth step we added a (-5)-multiple of the third row to the second and a 2-multiple of the third row to the first; and in the last step we interchanged the second and the third rows. We have obtained the result

  A⁻¹ = (  3  -4   3 )
        (  1  -2   2 )
        ( -7  11  -9 )

Note that when calculating the matrix A⁻¹ we did not have to cope with fractions, thanks to the suitably chosen row transformations. Although we could carry on similarly for the next computation, that is, B⁻¹, we will rather do the more obvious row transformations. We have

2.1.10. Algorithm for computing inverse matrices. In the previous paragraphs we almost obtained the complete algorithm for computing the inverse matrix. Using the simple modification below, we either find that the inverse does not exist, or we compute the inverse. Keep in mind that we are still working over a field of scalars.

Equivalent row transformations of a square matrix A of dimension n lead to an invertible matrix P' such that P' · A is in row echelon form. If A has an inverse, then there exists also the inverse of P' · A.
But if the last row of P' · A is zero, then the last row of P' · A · B is also zero for any matrix B of dimension n. Thus, the existence of a zero row in the result of (row) Gaussian elimination excludes the existence of A⁻¹.

Assume now that A⁻¹ exists. As we have just seen, the row echelon form of A then has exclusively non-zero rows; in particular, all diagonal elements of P' · A are non-zero. But now we can employ row elimination by elementary row transformations from the bottom-right corner backwards, and also transform the diagonal elements into units. In this way we obtain the unit matrix E. Summarizing, we find another invertible matrix P'' such that for P = P'' · P' we have P · A = E.

Now observe that we could clearly work with column transformations instead of row transformations, and thus, under the assumption of the existence of A⁻¹, we would find a matrix Q such that A · Q = E. From this we see immediately that

  P = P · E = P · (A · Q) = (P · A) · Q = E · Q = Q.

That is, we have found the inverse matrix A⁻¹ = P = Q of the matrix A. Notice that at the moment of finding the matrix P with the property P · A = E, we do not have to do any further computation, since we have already obtained the inverse matrix. In practice, we can work as follows:

Computing the inverse matrix

Write the unit matrix E to the right of the matrix A, producing an augmented matrix (A, E). Transform the augmented matrix using elementary row transformations to row echelon form. This produces an augmented matrix (PA, PE), where P is invertible and PA is in row echelon form. By the above, either PA = E, in which case A is invertible and P = PE = A⁻¹, or PA has a row of zeros, in which case we conclude that the inverse matrix of A does not exist.

2.1.11. Linear dependence and rank. In the previous practical algorithms dealing with matrices, we worked all the time with row and column additions and scalar multiplications, seeing rows and columns as vectors. Such operations are called linear combinations.
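The boxed procedure can be sketched in Python (helper name ours): reduce the augmented matrix (A, E) by Gauss-Jordan elimination with exact fractions; a pivot column with no non-zero entry signals that no inverse exists.

```python
from fractions import Fraction

def inverse(A):
    """Reduce (A | E) by row transformations; if the left block becomes E,
    the right block is A^{-1}.  Returns None when A is not invertible."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] != 0), None)
        if piv is None:
            return None                       # zero pivot column: no inverse
        M[c], M[piv] = M[piv], M[c]           # interchange rows
        M[c] = [x / M[c][c] for x in M[c]]    # scale the pivot to 1
        for r in range(n):
            if r != c and M[r][c] != 0:       # clear the rest of the column
                M[r] = [x - M[r][c] * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

A = [[4, 3, 2], [5, 6, 3], [3, 5, 2]]         # the matrix A of 2.A.12
print(inverse(A))                             # (3 -4 3; 1 -2 2; -7 11 -9)
```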
We shall return to such operations in an abstract sense later on, in 2.3.1. But it will be useful to understand their core meaning right now.

that is,

  B⁻¹ = (  1   2  -3 )
        ( -1   1  -1 )
        (  0  -2   3 )

Using the identity (A^T · B)⁻¹ = B⁻¹ · (A^T)⁻¹ = B⁻¹ · (A⁻¹)^T and the knowledge of the inverse matrices computed before, we obtain

  (A^T · B)⁻¹ = (  1   2  -3 )   (  3   1  -7 )   ( -14  -9  42 )
                ( -1   1  -1 ) · ( -4  -2  11 ) = ( -10  -5  27 )
                (  0  -2   3 )   (  3   2  -9 )   (  17  10 -49 )

□

2.A.13. Compute the inverse of the matrix

  A = ( 1  0 -2 ; 2 -2  1 ; 5 -5  2 ).

2.A.14. Calculate A⁵ and A⁻³, if

  A = ( 2 -1  1 ; -1  2 -1 ; 0  0  1 ).

2.A.15. Compute the inverse of the matrix ... ○

2.A.16. Determine whether there exists an inverse of the matrix

  C = ( 1  1  1  1 ; 1  1 -1  1 ; 1 -1  1 -1 ; 1 -1 -1  1 ).

If yes, then compute C⁻¹.

A linear combination of rows of a matrix A = (a_ij) of type m/n is understood as an expression of the form

  c_1 u_1 + ... + c_k u_k,

where the c_i are scalars and the u_i are rows of the matrix A. Similarly, we can consider linear combinations of columns by replacing the rows above by columns.

If the zero row can be written as a linear combination of some given rows with at least one non-zero scalar coefficient, we say that these rows are linearly dependent. In the alternative case, that is, when the only possibility of obtaining the zero row is to select all the scalars c_i equal to zero, the rows are called linearly independent. Analogously, we define linearly dependent and linearly independent columns.

The previous results about the Gaussian elimination can now be interpreted as follows: the number of non-zero "steps" in the row (column) echelon form is always equal to the number of linearly independent rows (columns) of the matrix.

Let E_h be the matrix from theorem 2.1.9 with h ones on the diagonal, and assume that by two different row transformation procedures into the echelon form we obtain two different numbers h' < h.
But then, according to our algorithm, there are invertible matrices P, P', Q, Q' such that

  E_h = P · A · Q,   E_h' = P' · A · Q'.

In particular, E_h = P · P'⁻¹ · E_h' · Q'⁻¹ · Q, and so there are invertible matrices P'' and Q'' such that E_h = P'' · E_h' · Q''.

In the product P'' · E_h' there will be more zero rows in the bottom part of the echelon matrix than we see in E_h, and we would have to be able to reach E_h using only elementary column transformations. This is clearly not possible, because the zero rows remain zero under them. Therefore the number of ones in the matrix P · A · Q in theorem 2.1.9 is independent of the choice of our elimination procedure, and it is always equal to the number of linearly independent rows of A, which must be the same as the number of linearly independent columns of A. This number is called the rank of the matrix, and we denote it by h(A). We have the following theorem:

Theorem. Let A be a matrix of type m/n over a field of scalars K. The matrix A has the same number h(A) of linearly independent rows as linearly independent columns. In particular, the rank is always at most the minimum of the dimensions of the matrix A.

The algorithm for computing the inverse matrix also says that a square matrix A of dimension m has an inverse if and only if its rank equals m.

2.A.17. Compute A⁻¹, if ... ○

2.A.18. Find the inverse of the n x n matrix (n > 1)

  A = ( 2-n   1   ...   1  )
      (  1   2-n  ...   1  )
      (  .         .    .  )
      (  1    1   ...  2-n )

Solution. You can try small n (n = 2, 3, 4), which is easy to compute with the known algorithm, and then guess the general form:

  A⁻¹ = 1/(n-1) · ( 0  1  ...  1 )
                  ( 1  0  ...  1 )
                  ( .      .   . )
                  ( 1  1  ...  0 )

□

We have already encountered systems of linear equations at the beginning of the chapter. Now we will deal with them in more detail. We use the inverse matrix to assist in computing the solution to a system of linear equations. Note that we do the same computation as before.
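The rank, as the number of non-zero rows of a row echelon form, can be computed directly; a Python sketch (helper name ours, exact arithmetic via the standard fractions module):

```python
from fractions import Fraction

def rank(A):
    """Number of non-zero rows in a row echelon form of A."""
    M = [[Fraction(x) for x in row] for row in A]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

# the matrix of the system from 2.A.4 has one dependent row:
print(rank([[1, -1, 1, -1], [2, 1, -1, -1], [-2, -1, 4, -2], [3, 0, 3, -5]]))  # -> 3
```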
To express the variables is the same as to bring the matrix of the system to the identity matrix by equivalent transformations, and that is the same as to multiply the matrix of the system by the inverse matrix.

2.A.19. Participants of a trip. There were 45 participants of a two-day bus trip. On the first day, the fee for a watchtower visit was €30 for an adult, €16 for a child and €24 for a senior. The total fee for the first day was €1116. On the second day, the fee for a bus with a palace and botanical garden tour was €40 for an adult, €24 for a child and €34 for a senior. The total fee for the second day was €1542. How many adults, children and seniors were there among the participants?

Solution. Introduce the variables

x for the "number of adults";
y for the "number of children";

2.1.12. Matrices as mappings. Similarly to the way we worked with matrices in the geometry of the plane (see 1.5.7), we can interpret every matrix A of the type m/n as a mapping

  A : K^n → K^m,   x ↦ A · x.

By the distributivity of matrix multiplication, it is clear how linear combinations of vectors are mapped by such mappings:

  A · (a x + b y) = a (A · x) + b (A · y).

Straight from the definition we see, by the associativity of multiplication, that composition of mappings corresponds to matrix multiplication in the given order. Thus invertible matrices of dimension n correspond to bijective mappings A : K^n → K^n.

Remark. From this point of view, the theorem 2.1.9 is very interesting. We can see it as follows: the rank of the matrix determines how large the image of the whole K^n under this mapping is. In fact, if A = P · E_k · Q, where the matrix E_k has k ones as in 2.1.9, then the invertible Q first bijectively "shuffles" the n-dimensional vectors in K^n, the matrix E_k then "copies" the first k coordinates and completes them with the remaining m - k zeros. This "k-dimensional" image then cannot be enlarged by multiplying with P; multiplying by P can only bijectively reshuffle the coordinates.

2.1.13.
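The arithmetic of exercise 2.A.19 can be checked in code; a Python sketch (names ours) multiplying the inverse 1/6 · (16 5 -4; 30 3 -3; -40 -8 7) of the system's matrix by the right-hand side:

```python
from fractions import Fraction

def mat_vec(A, x):
    return [sum(Fraction(a) * b for a, b in zip(row, x)) for row in A]

# the inverse of the matrix (1 1 1; 30 16 24; 40 24 34), factor 1/6:
M_inv = [[Fraction(v, 6) for v in row]
         for row in [[16, 5, -4], [30, 3, -3], [-40, -8, 7]]]
u = [45, 1116, 1542]
print(mat_vec(M_inv, u))  # 22 adults, 12 children, 11 seniors
```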
Back to linear equations. We shall return to the notions of dimension, linear independence and so on in the third part of this chapter. But we should notice now what our results say about the solutions of systems of linear equations.

If we consider the matrix of the system of equations and add to it the column of the required results, we speak of the extended matrix of the system. The above Gaussian elimination approach corresponds to the sequential elimination of variables in the equations and the deletion of the linearly dependent equations (these are simply consequences of other equations).

Thus we have derived complete information about the size of the set of solutions of the system of linear equations, based on the rank of the matrix of the system. If we are left with more non-zero rows in the row echelon form of the extended matrix than in the original matrix of the system, then there cannot be a solution (simply, we cannot obtain the given vector value as an image under the corresponding linear mapping). If the rank of both matrices is the same, then the backwards elimination provides exactly as many free parameters as the difference between the number of variables n and the rank h(A). In particular, there will be exactly one solution if and only if the matrix is invertible. All this will be stated explicitly in terms of abstract vector spaces in the important Kronecker-Capelli theorem, see 2.3.5.

z for the "number of seniors". There were 45 participants, therefore x + y + z = 45. The fees for the first and second days respectively imply that

  30x + 16y + 24z = 1116,
  40x + 24y + 34z = 1542.

We write the system of three linear equations in the matrix notation as

  (  1  1  1 )   ( x )   (   45 )
  ( 30 16 24 ) · ( y ) = ( 1116 )
  ( 40 24 34 )   ( z )   ( 1542 )

We compute

  (  1  1  1 )⁻¹         (  16   5  -4 )
  ( 30 16 24 )    = 1/6 · (  30   3  -3 )
  ( 40 24 34 )           ( -40  -8   7 )

Hence the solution is

  ( x )         (  16   5  -4 )   (   45 )   ( 22 )
  ( y ) = 1/6 · (  30   3  -3 ) · ( 1116 ) = ( 12 )
  ( z )         ( -40  -8   7 )   ( 1542 )   ( 11 )

Expressed in words, there were 22 adults, 12 children and 11 seniors.
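The rank criterion just described can be sketched in Python (helper name ours): for the inconsistent system of 2.A.5, the extended matrix has larger rank than the matrix of the system.

```python
from fractions import Fraction

def rank(A):
    """Number of non-zero rows in a row echelon form of A."""
    M = [[Fraction(x) for x in row] for row in A]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

# system of 2.A.5: same matrix as in 2.A.4, right-hand side starts with 8
A = [[3, 0, 3, -5], [1, -1, 1, -1], [-2, -1, 4, -2], [2, 1, -1, -1]]
b = [8, -2, 0, -3]
ext = [row + [rhs] for row, rhs in zip(A, b)]
print(rank(A), rank(ext))  # -> 3 4
# the ranks differ, so the system has no solution
```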
□

The latter approach is particularly efficient if we have to solve several systems with the same matrix on the left-hand side but different values on the right-hand side. But what if the matrix of the system is not invertible? Then we cannot use the inverse matrix for solving the system, and such a system cannot have a unique solution. As the reader may have noticed above, a system of linear equations either has no solution, has one solution, or has infinitely many solutions depending on one or more free parameters (for instance, it cannot have exactly two solutions). We should also have noticed, when dealing with equations in two variables in the previous section, that the space of the solutions is either a vector space (in the case when the right-hand side of the system is zero, we speak of a homogeneous system of linear equations) or an affine space, see 4.1.1 (in the case when the right-hand side of at least one of the equations is non-zero, we speak of a non-homogeneous system of linear equations). We can recognize all the possibilities from the ranks of the matrices, i.e. from the number of non-zero rows left in the row echelon form.

2. Determinants

In the fifth part of the first chapter, we introduced the scalar function det on square matrices of dimension 2 over the real numbers, called the determinant, see 1.5.5. We saw that the determinant assigned a non-zero number to a matrix if and only if the matrix was invertible. We did not say it in exactly this way, but you can check for yourself in the previous paragraphs, starting with 1.5.4 and formula 1.5.5(1). We saw also that determinants were useful in another way, see the paragraphs 1.5.10 and 1.5.11. There we showed that the volume of the parallelepiped should depend linearly on each of the two vectors defining it. It was useful to require the change of the sign when changing the order of these vectors.
Because determinants (and only determinants) have these properties, up to a constant scalar multiple, we concluded that the determinant measures the volume. Now we shall see that we can proceed similarly in every finite dimension. We work again with arbitrary scalars $\mathbb{K}$ and matrices over these scalars. Our results about determinants will thus hold for all commutative rings, notably also for integer matrices or matrices over residue classes.

2.2.1. Definition of the determinant. Each bijective mapping from a set $X$ to itself is called a permutation of the set $X$, cf. 1.3.1. If $X = \{1, 2, \dots, n\}$, a permutation $\sigma$ can be written by putting the resulting ordering into a table:
\[ \begin{pmatrix} 1 & 2 & \cdots & n \\ \sigma(1) & \sigma(2) & \cdots & \sigma(n) \end{pmatrix}. \]
In this way, we may view a permutation either as a bijection or as an ordering. An element $x \in X$ is called a fixed point of the permutation $\sigma$ if $\sigma(x) = x$. If there exist exactly two distinct elements $x, y \in X$ such that $\sigma(x) = y$ and $\sigma(y) = x$, while all other elements are fixed points, then $\sigma$ is called a transposition and denoted $(x, y)$. A pair of elements $a, b \in X$ forms an inversion in $\sigma$ if $a < b$ and $\sigma(a) > \sigma(b)$. A permutation $\sigma$ is called even or odd if it contains an even or odd number of inversions, respectively. Thus the parity of the permutation $\sigma$ is $(-1)^{\text{number of inversions}}$, and we denote it by $\operatorname{sgn}(\sigma)$. This amounts to our earlier definition of the sign used for computing determinants. But we should like to know how to calculate the parity. The following theorem also reveals that the Sarrus rule really defines the determinant in dimension 3.

Theorem. Over the set $X = \{1, 2, \dots, n\}$ there are exactly $n!$ distinct permutations. These can be ordered in a sequence such that every two consecutive permutations differ in exactly one transposition. Every transposition changes the parity. For any chosen permutation $\sigma$ there is such a sequence starting with $\sigma$.

Proof. For $n = 1$ or $n = 2$, the claim is trivial. We prove the theorem by induction on the size $n$ of the set $X$. Assume that the claim holds for all sets with $n - 1$ elements and consider a permutation $\sigma(1) = a_1, \dots, \sigma(n) = a_n$.
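The parity defined above is straightforward to compute by counting inversions directly; a small illustration in plain Python (the helper name is ours, not from the book):

```python
from itertools import permutations

def sgn(p):
    """Sign of a permutation given as a tuple (sigma(1), ..., sigma(n)):
    (-1) raised to the number of inversions, i.e. pairs i < j with p[i] > p[j]."""
    n = len(p)
    inversions = sum(1 for i in range(n)
                       for j in range(i + 1, n)
                       if p[i] > p[j])
    return (-1) ** inversions

# The identity is even, a single transposition is odd, and exactly half of
# the n! permutations of {1, ..., n} turn out to be even.
even = sum(1 for p in permutations(range(1, 5)) if sgn(p) == 1)
```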
According to the induction assumption, all the permutations that end with $a_n$ can be obtained in a sequence in which every two consecutive permutations differ in exactly one transposition. There are $(n-1)!$ such permutations. In order to proceed further, we select the last of them and apply the transposition of $\sigma(n) = a_n$ with some element $a_i$ which has not yet been at the last position. Once again, we form a sequence of all permutations ending with $a_i$. After carrying out this procedure $n$ times, we obtain $n(n-1)! = n!$ distinct permutations - that

The third system is given by an extended matrix whose left-hand side is again the matrix $A$ of the previous systems; only the column after the vertical bar is different. If we try to transform this matrix into row echelon form, we necessarily obtain a row $(\,0\ 0\ 0\ 0\ |\ a\,)$ with $a \neq 0$. We see that the column on the right-hand side is not a linear combination of the columns on the left-hand side (the rank of the extended matrix is 4, while the rank of $A$ is smaller). This system thus has no solution. □ For further examples see 2.E.7.

B. Determinants

In order to define the key object of the matrix calculus, the determinant, we must first deal with permutations (bijections of a finite set) and their parities. We use the two-row notation for permutations (see 2.2.1): in the first row we list all elements of the given set, and every column then corresponds to a pair (preimage, image) in the given permutation. Because a permutation is a bijection, the second row is indeed a permutation (an ordering) of the first row, in accordance with the definition from combinatorics.

2.B.1. Decompose the permutation
\[ \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 3 & 1 & 6 & 7 & 8 & 9 & 5 & 4 & 2 \end{pmatrix} \]
into a product of transpositions.

Solution. We first decompose the permutation into a product of independent cycles. Start with the first element 1 and look at the second row to see that the image of 1 is 3. Now look at the column that starts with 3, and see that the image of 3 is 6, and so on.
Continue until we again reach the starting element 1. We obtain the following sequence of elements, which map to each other in turn under the given permutation: $1 \mapsto 3 \mapsto 6 \mapsto 9 \mapsto 2 \mapsto 1$. A mapping which permutes elements cyclically in this manner is called a cycle (see 2.2.3), and we denote it by $(1, 3, 6, 9, 2)$. Now choose any element not contained in the obtained cycle. With the same procedure as for 1, we obtain the cycle $(4, 7, 5, 8)$. It is clear from the method that the result does not depend on which cycle is obtained first. Each element of the set

is, all permutations on $n$ elements. The resulting sequence satisfies the condition. Note that the last sentence of the theorem does not seem useful in practice, but it is an important ingredient for proving the theorem by induction over the size of $X$. It remains to prove the part of the theorem about parities. Consider the ordering $(a_1, \dots, a_i, a_{i+1}, \dots, a_n)$, containing $r$ inversions. Then the ordering $(a_1, \dots, a_{i+1}, a_i, \dots, a_n)$ contains either $r - 1$ or $r + 1$ inversions. Every transposition $(a_i, a_j)$, $i < j$, is obtainable by performing $(j - i) + (j - i - 1) = 2(j - i) - 1$ transpositions of neighbouring elements, an odd number. Therefore any transposition changes the parity. Also, we already know that all permutations can be obtained by applying transpositions sequentially. □

We have found that applying a transposition changes the parity of a permutation, and that any ordering of the numbers $\{1, 2, \dots, n\}$ can be obtained through transpositions of neighbouring elements. Therefore we have proven:

Corollary. On every finite set $X = \{1, \dots, n\}$ with $n$ elements, $n > 1$, there are exactly $\frac{1}{2}n!$ even permutations and $\frac{1}{2}n!$ odd permutations.

If we compose two permutations, this means first performing all the transpositions forming the first permutation and then all the transpositions forming the second one. Therefore, for any two permutations $\sigma, \eta : X \to X$ we have
\[ \operatorname{sgn}(\sigma \circ \eta) = \operatorname{sgn}(\sigma) \cdot \operatorname{sgn}(\eta) \quad \text{and also} \quad \operatorname{sgn}(\sigma^{-1}) = \operatorname{sgn}(\sigma). \]

2.2.3. Decomposing permutations into cycles.
A good tool for practical work with permutations is the cycle decomposition, which is also a good exercise on the concept of equivalence.

Cycles

A permutation $\sigma$ over the set $X = \{1, \dots, n\}$ is called a cycle of length $k$ if we can find elements $a_1, \dots, a_k \in X$, $2 \le k \le n$, such that $\sigma(a_i) = a_{i+1}$ for $i = 1, \dots, k-1$, while $\sigma(a_k) = a_1$, and all other elements of $X$ are fixed points of $\sigma$. Cycles of length two are transpositions. Every permutation is a composition of cycles. Cycles of even length have parity $-1$, cycles of odd length have parity $1$.

Proof. The last claim has yet to be proved. Fix a permutation $\sigma$ and define a relation $R$ such that two elements $x, y \in X$ are $R$-related if and only if $\sigma^{\ell}(x) = y$ for some iteration $\ell \in \mathbb{Z}$ of the permutation $\sigma$ (notice that $\sigma^{-1}$ means the inverse bijection to $\sigma$). Clearly, it is an equivalence relation.

$(\{1, 2, \dots, 9\})$ appears in one of the obtained cycles, and we can thus write
\[ \sigma = (1,3,6,9,2) \circ (4,7,5,8) = (4,7,5,8) \circ (1,3,6,9,2), \]
since independent cycles commute. For cycles the decomposition into transpositions is simple; we have
\[ (1,3,6,9,2) = (1,3) \circ (3,6) \circ (6,9) \circ (9,2) = (1,3)(3,6)(6,9)(9,2). \]
Thus we obtain $\sigma = (1,3)(3,6)(6,9)(9,2)(4,7)(7,5)(5,8)$. □

Remark. The minimal number of transpositions in a decomposition of a permutation is obtained by carrying out exactly the procedure above: first decompose the permutation into independent cycles, then decompose the cycles canonically into transpositions. The decomposition found in this way uses the minimal number of transpositions. Note also that the operation $\circ$ is a composition of mappings, so it must be carried out "backwards", as we are used to with composition of mappings.
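The procedure of 2.B.1 is easy to mechanize; a sketch in plain Python (the function names are ours):

```python
def cycles(perm):
    """Disjoint cycles of a permutation given as a tuple (sigma(1), ..., sigma(n)),
    fixed points omitted, each cycle starting from its smallest unvisited element."""
    seen, out = set(), []
    for start in range(1, len(perm) + 1):
        if start in seen:
            continue
        cyc, x = [], start
        while x not in seen:
            seen.add(x)
            cyc.append(x)
            x = perm[x - 1]          # follow the arrow x -> sigma(x)
        if len(cyc) > 1:             # fixed points are not listed
            out.append(tuple(cyc))
    return out

def transpositions(perm):
    """Each cycle (a1, ..., ak) decomposes as (a1,a2)(a2,a3)...(a_{k-1},ak)."""
    return [(c[i], c[i + 1]) for c in cycles(perm) for i in range(len(c) - 1)]

# The permutation from example 2.B.1:
sigma = (3, 1, 6, 7, 8, 9, 5, 4, 2)
```

Here `cycles(sigma)` reproduces the two cycles found above and `transpositions(sigma)` the seven transpositions.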
Applying the given composition of transpositions, for instance, to the element 2, we can successively write
\[ [(1,3)(3,6)(6,9)(9,2)](2) = [(1,3)(3,6)(6,9)]\bigl((9,2)(2)\bigr) = [(1,3)(3,6)(6,9)](9) = [(1,3)(3,6)](6) = (1,3)(3) = 1, \]
so the mapping indeed takes the element 2 to the element 1 (it is actually just the cycle $(1,3,6,9,2)$ written in a different way). When writing a composition of permutations, we often omit the sign $\circ$ and speak of the product of permutations. When writing a cycle, we write only the elements on which the cycle (that is, the mapping) acts nontrivially (that is, the elements mapped to some other element); fixed points of the cycle are not listed. Thus it is necessary to know on which set the given cycle is considered (mostly this is clear from the context). The cycle $c = (4, 7, 5, 8)$ from the previous example is thus the mapping (permutation) which, in the two-row notation, looks like this:
\[ \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 1 & 2 & 3 & 7 & 8 & 6 & 5 & 4 & 9 \end{pmatrix}. \]
If the original permutation has some fixed points, they do not appear in the cycle decomposition (check it carefully!).

Because $X$ is a finite set, for some $\ell$ it must be that $\sigma^{\ell}(x) = x$. If we pick one equivalence class $\{x, \sigma(x), \dots, \sigma^{\ell-1}(x)\}$, the restriction of $\sigma$ to it is a cycle, and $\sigma$ is the composition of the cycles determined by the individual equivalence classes. Moreover, a cycle of length $k$ decomposes into $k - 1$ transpositions, so its parity is $(-1)^{k-1}$, which proves the claim about parities. □

Let us now look at the individual terms of the determinant of a matrix in row echelon form. There is just one term with all of its elements on the diagonal. In all other terms, there must be elements both above and below the diagonal (if we place one element outside the diagonal, we block two diagonal entries and leave only $n - 2$ diagonal positions for the other $n - 1$ elements). Therefore, if the matrix $A$ is in row echelon form, then every term of $|A|$ vanishes except the term with exclusively diagonal entries. This proves the following algorithm:

Computing determinants using elimination

If $A$ is in row echelon form, then $|A| = a_{11} a_{22} \cdots a_{nn}$.

The previous theorem gives an effective method for computing determinants using the Gaussian elimination method, see the paragraph 2.1.7.
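The elimination algorithm for determinants can be sketched as follows (NumPy assumed; partial pivoting is our addition for numerical stability, and it only contributes the recorded sign changes):

```python
import numpy as np

def det_by_elimination(M):
    """Reduce M to row echelon form, flipping the sign of the result for
    every row interchange; the determinant is then the product of the
    diagonal entries, as in the algorithm above."""
    A = np.array(M, dtype=float)
    n = A.shape[0]
    sign = 1.0
    for k in range(n):
        p = k + int(np.argmax(np.abs(A[k:, k])))   # choose a pivot row
        if np.isclose(A[p, k], 0.0):
            return 0.0                              # no pivot: |A| = 0
        if p != k:
            A[[k, p]] = A[[p, k]]                   # row interchange
            sign = -sign
        A[k + 1:] -= np.outer(A[k + 1:, k] / A[k, k], A[k])
    return sign * float(np.prod(np.diag(A)))
```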
Notice that the very same argument allows us to stop the elimination once the first $k$ columns are in the requested form, and to find the determinant of the matrix $B$ of dimension $n - k$ in the bottom right corner of $A$ in some other way. The result is then $|A| = a_{11} a_{22} \cdots a_{kk} |B|$. Let us note a nice corollary of the first claim of the previous theorem, the equality of the determinants of a matrix and its transpose: whenever we prove some claim about determinants formulated in terms of the rows of a matrix, we immediately obtain an analogous claim in terms of its columns.

We transform the matrix into row echelon form by elementary row operations; in the course of the computation we interchange rows twice, so the sign of the determinant is unchanged, and the resulting upper triangular matrix has the diagonal $1, 1, 1, 2$. Hence the determinant equals $1 \cdot 1 \cdot 1 \cdot 2 = 2$. The other way of computing the determinant is by cofactor expansion along the first column (the one with the greatest number, one, of zeros); evaluating the resulting $3 \times 3$ determinants by the Sarrus rule again gives the value $2$. □

2.B.5. Compute the determinant of the matrix
\[ A = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 \\ 0 & 2 & 0 & 2 & 0 \\ 0 & 0 & 3 & 0 & 3 \\ 4 & 0 & 0 & 4 & 4 \\ 0 & 0 & 0 & 0 & 5 \end{pmatrix}. \]

Solution. We notice that the last (fifth) row contains four zeros (and so does the second column). This is the largest number of zeros we can find in a row or a column of the matrix, so it is advantageous to use the Laplace theorem (2.3.10) and compute the determinant by expansion along the fifth row or the second column. We present the expansion along the fifth row: all its entries except $a_{55} = 5$ vanish, so
\[ |A| = 5 \cdot \begin{vmatrix} 1 & 0 & 1 & 0 \\ 0 & 2 & 0 & 2 \\ 0 & 0 & 3 & 0 \\ 4 & 0 & 0 & 4 \end{vmatrix} = 5 \cdot 2 \cdot \begin{vmatrix} 1 & 1 & 0 \\ 0 & 3 & 0 \\ 4 & 0 & 4 \end{vmatrix} = 5 \cdot 2 \cdot 12 = 120, \]

For instance, we can immediately formulate all the claims (2)-(6) of the previous theorem for linear combinations of columns.
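The cofactor expansion used in these examples translates directly into a recursive procedure. The following plain-Python sketch is purely illustrative (it takes exponential time, unlike elimination):

```python
def det_laplace(A):
    """Determinant via Laplace expansion along the first row:
    |A| = sum over j of (-1)^(1+j) * a_{1j} * |minor without row 1, column j|."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        if A[0][j] == 0:
            continue                       # zero entries contribute nothing
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det_laplace(minor)
    return total

# A small example of our own (det = -3):
M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 10]]
```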
As a useful (theoretical) illustration of this principle, we derive the following formula for the direct calculation of the solutions of systems of linear equations. For the sake of simplicity, we work over a field of scalars now.

Cramer rule

Proposition. Consider a system of $n$ linear equations in $n$ variables with the matrix of the system $A = (a_{ij})$ and the column of values $b = (b_1, \dots, b_n)$; in matrix notation, we are solving the equation $A \cdot x = b$. If the inverse $A^{-1}$ exists, then the individual components of the unique solution $x = (x_1, \dots, x_n)$ are given by
\[ x_i = |A_i| \, |A|^{-1}, \]
where the matrices $A_i$ arise from the matrix $A$ of the system by replacing the $i$-th column by the column $b$ of values.

Proof. As we have already seen, working over a field of scalars the inverse of the matrix of the system exists if and only if the system has a unique solution, and this in turn happens if and only if $|A|$ is invertible, i.e. $|A| \neq 0$. If we have such a solution $x$, we can express the column $b$ in the matrix $A_i$ as the corresponding linear combination of the columns of the matrix $A$, that is, the values $b_k = a_{k1} x_1 + \dots + a_{kn} x_n$. Then, by subtracting the $x_\ell$-multiples of all the other $\ell$-th columns, $\ell \neq i$, from the $i$-th column of $A_i$, we arrive at the $x_i$-multiple of the original $i$-th column of $A$. The number $x_i$ can thus be brought in front of the determinant to obtain the equality $|A_i| = x_i |A|$, and thus $|A_i| |A|^{-1} = x_i |A| |A|^{-1} = x_i$, which is our claim. □

Notice also that properties (3)-(5) from the previous theorem say that the determinant, considered as a mapping which assigns a scalar to $n$ vectors of dimension $n$, is an antisymmetric mapping linear in every argument, exactly as we required in analogy to the 2-dimensional case.

2.2.7. Further properties of the determinant. Later we will see that, exactly as in dimension 2, the determinant of a matrix equals the (oriented) volume of the parallelepiped determined by the columns of the matrix.
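The Cramer rule is immediate to implement for numerical matrices (NumPy assumed; for larger systems elimination is far more efficient, so this is purely illustrative):

```python
import numpy as np

def cramer(A, b):
    """Solve A x = b via x_i = |A_i| / |A|, where A_i is A with its
    i-th column replaced by the column b; requires |A| != 0."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    if np.isclose(d, 0.0):
        raise ValueError("the matrix of the system is not invertible")
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                 # replace the i-th column by b
        x[i] = np.linalg.det(Ai) / d
    return x
```

Applied to the participants system from the beginning of this section, it returns the same solution (22, 12, 11).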
We shall also see that, considering the mapping $x \mapsto A \cdot x$ given by a square matrix $A$ on $\mathbb{R}^n$, we can understand the determinant of this matrix as the ratio between the volumes of the parallelepipeds determined by the vectors $x_1, \dots, x_n$ and by their images $A \cdot x_1, \dots, A \cdot x_n$. Because the composition $x \mapsto A \cdot x \mapsto B \cdot (A \cdot x)$ of mappings corresponds to the multiplication of matrices, the Cauchy theorem below is easy to understand:

Cauchy theorem

Here we used the expansion along the second column in the second step, and computed the determinant of the $3 \times 3$ matrix directly by the Sarrus rule.

Another option is to expand the determinant along several rows at once, exploiting the vanishing of many of the sub-determinants involved. For example, we may use the last two rows. Only two of the $2 \times 2$ sub-determinants built from these two rows can be non-zero, namely those sitting in the columns $\{1, 5\}$ and $\{4, 5\}$, both equal to
\[ \begin{vmatrix} 4 & 4 \\ 0 & 5 \end{vmatrix} = 20. \]
Their complementary minors are $0$ and $6$, respectively, and the signs of the two terms are given by the parity of the sums of the respective row and column indices ($4+5+1+5$ is odd, $4+5+4+5$ is even); see the definition of the algebraic complement in 2.3.10. Thus the entire determinant must be
\[ |A| = -20 \cdot 0 + 20 \cdot 6 = 120. \] □

2.B.6. Find all the values of $a$ such that
\[ \begin{vmatrix} a & 1 & 1 & 1 \\ 0 & a & 1 & 1 \\ 0 & 1 & a & 1 \\ 0 & 0 & 0 & -a \end{vmatrix} = 1. \]
For complex $a$ give either its algebraic or polar form.

Solution. We compute the determinant $D$ by expanding along the first column of the matrix:
\[ D = a \cdot \begin{vmatrix} a & 1 & 1 \\ 1 & a & 1 \\ 0 & 0 & -a \end{vmatrix}. \]
Expanding further along the last row,
\[ D = a \cdot (-a) \cdot \begin{vmatrix} a & 1 \\ 1 & a \end{vmatrix} = -a^2 (a^2 - 1) = a^2 - a^4. \]
The condition $D = 1$ thus reads $a^4 - a^2 + 1 = 0$. Substituting $t = a^2$, we obtain the equation $t^2 - t + 1 = 0$ with roots
\[ t_1 = \tfrac{1 + i\sqrt{3}}{2} = \cos(\pi/3) + i \sin(\pi/3), \qquad t_2 = \tfrac{1 - i\sqrt{3}}{2} = \cos(-\pi/3) + i \sin(-\pi/3), \]
from which we obtain four possible values for the parameter $a$:
\[ a_1 = \cos(\pi/6) + i\sin(\pi/6) = \sqrt{3}/2 + i/2, \qquad a_2 = \cos(7\pi/6) + i\sin(7\pi/6) = -\sqrt{3}/2 - i/2, \]
\[ a_3 = \cos(-\pi/6) + i\sin(-\pi/6) = \sqrt{3}/2 - i/2, \qquad a_4 = \cos(5\pi/6) + i\sin(5\pi/6) = -\sqrt{3}/2 + i/2. \]
Alternatively, we can multiply by $a^2 + 1$ to obtain $a^6 + 1 = (a^2 + 1)(a^4 - a^2 + 1) = 0$.
The equation $a^6 = -1$ has six (complex) solutions, given by $a = \cos\varphi + i\sin\varphi$ where $\varphi = \pi/6 + k\pi/3 = (2k+1)\pi/6$, $k = 0, 1, 2, 3, 4, 5$. Of these, we must discard the two choices $k = 1$ and $k = 4$, since they solve $a^2 + 1 = 0$ and

Theorem. Let $A = (a_{ij})$, $B = (b_{ij})$ be square matrices of dimension $n$ over the ring of scalars $\mathbb{K}$. Then $|A \cdot B| = |A| \cdot |B|$.

In the next paragraphs, we derive this theorem in a purely algebraic way, in particular because the previous argumentation based on geometric intuition could hardly work for arbitrary scalars. The basic tool is the expansion of the determinant along one or more rows or columns, which we have seen in the simplest case of single rows or columns in 2.2.4. We will also need a little technical preparation. The reader who is not fond of too much abstraction can skip these paragraphs and note only the statement of the Laplace theorem and its corollaries. Notice also that the claims (2), (3) and (6) of the theorem 2.2.5 are easily deduced from the Cauchy theorem and the representation of the elementary row transformations as multiplication by suitable matrices (cf. 2.1.8).

2.2.8. Minors of the matrix. When investigating matrices and their properties, we often work only with parts of the matrices. Therefore we need some new concepts.

Submatrices and minors

Let $A = (a_{ij})$ be a matrix of the type $m/n$ and let $1 \le i_1 < \dots < i_k \le m$, $1 \le j_1 < \dots < j_\ell \le n$ be fixed natural numbers. Then the matrix
\[ M = \begin{pmatrix} a_{i_1 j_1} & \cdots & a_{i_1 j_\ell} \\ \vdots & & \vdots \\ a_{i_k j_1} & \cdots & a_{i_k j_\ell} \end{pmatrix} \]
of the type $k/\ell$ is called a submatrix of the matrix $A$ determined by the rows $i_1, \dots, i_k$ and the columns $j_1, \dots, j_\ell$. The remaining $(m-k)$ rows and $(n-\ell)$ columns determine a matrix $M^*$ of the type $(m-k)/(n-\ell)$, which is called the complementary submatrix to $M$ in $A$. When $k = \ell$, the determinant $|M|$ is called a subdeterminant or a minor of order $k$ of the matrix $A$. If $m = n$ and $k = \ell$, then $M^*$ is also a square matrix and $|M^*|$ is called the minor complement to $|M|$, or the complementary minor of the submatrix $M$ in the matrix $A$.
The scalar $(-1)^{i_1 + \dots + i_k + j_1 + \dots + j_k} \cdot |M^*|$ is then called the algebraic complement of the minor $|M|$. The submatrices formed by the first $k$ rows and columns are called leading principal submatrices, and their determinants are called leading principal minors of the matrix $A$. If we choose $k$ sequential rows and columns starting with the $i$-th row, we speak of principal submatrices and principal minors.

not $a^4 - a^2 + 1 = 0$. We conclude that $a = \cos\varphi + i \sin\varphi$ where $\varphi = (2k+1)\pi/6$, $k = 0, 2, 3$, or $5$. □

2.B.7. Vandermonde determinant. Prove the formula for the Vandermonde determinant, that is, the determinant of the Vandermonde matrix:
\[ V_n(x_1, \dots, x_n) = \begin{vmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \\ x_1^2 & x_2^2 & \cdots & x_n^2 \\ \vdots & & & \vdots \\ x_1^{n-1} & x_2^{n-1} & \cdots & x_n^{n-1} \end{vmatrix} = \prod_{1 \le i < j \le n} (x_j - x_i), \]
where $x_1, \dots, x_n \in \mathbb{K}$ and on the right-hand side of the equation stands the product of all terms $x_j - x_i$ with $j > i$.

Solution. We proceed by induction on $n$. For technical reasons we work with the transposed Vandermonde matrix, whose $i$-th row is $(1, x_i, x_i^2, \dots, x_i^{n-1})$; it has the same determinant. Subtracting the first row from all the other rows and then expanding along the first column, we obtain
\[ V_n(x_1, \dots, x_n) = \begin{vmatrix} x_2 - x_1 & x_2^2 - x_1^2 & \cdots & x_2^{n-1} - x_1^{n-1} \\ \vdots & & & \vdots \\ x_n - x_1 & x_n^2 - x_1^2 & \cdots & x_n^{n-1} - x_1^{n-1} \end{vmatrix}. \]
Taking out the factor $x_{i+1} - x_1$ from the $i$-th row for $i \in \{1, 2, \dots, n-1\}$ (using $x^j - y^j = (x - y)(x^{j-1} + x^{j-2} y + \dots + y^{j-1})$), we obtain
\[ V_n(x_1, \dots, x_n) = (x_2 - x_1) \cdots (x_n - x_1) \cdot \begin{vmatrix} 1 & x_2 + x_1 & \cdots & \sum_{s=0}^{n-2} x_2^{n-2-s} x_1^{s} \\ \vdots & & & \vdots \\ 1 & x_n + x_1 & \cdots & \sum_{s=0}^{n-2} x_n^{n-2-s} x_1^{s} \end{vmatrix}. \]
By subtracting from every column (starting with the last and ending with the second) the $x_1$-multiple of the previous column, the entry in the $i$-th row and $j$-th column reduces to $x_{i+1}^{j-1}$, and we are left with the transposed Vandermonde determinant of order $n-1$ in the variables $x_2, \dots, x_n$.

Specially, when $k = \ell = 1$ and $m = n$, we call the corresponding algebraic complement the algebraic complement $A_{ij}$ of the element $a_{ij}$ of the matrix $A$, which we have met already in 2.2.4.

2.2.9. Laplace determinant expansion.
If the principal minor $|M|$ of the matrix $A$ is of order $k$, then, directly from the definition of the determinant, each of the $k!(n-k)!$ individual terms in the product of $|M|$ with its algebraic complement is a term of $|A|$. In general, consider a square submatrix $M$, that is, a square matrix given by the rows $i_1 < i_2 < \dots < i_k$ and the columns $j_1 < \dots < j_k$. Then, using $(i_1 - 1) + \dots + (i_k - k)$ exchanges of neighbouring rows and $(j_1 - 1) + \dots + (j_k - k)$ exchanges of neighbouring columns of $A$, we can transform this submatrix $M$ into a leading principal submatrix, and the complementary matrix gets transformed into its complementary matrix. The whole matrix $A$ gets transformed into a matrix $B$ satisfying (cf. 2.2.5 and the definition of the determinant) $|B| = (-1)^{\alpha} |A|$, where $\alpha = \sum_{h=1}^{k} (i_h + j_h) - 2(1 + \dots + k)$. But $(-1)^{\alpha} = (-1)^{\beta}$ with $\beta = \sum_{h=1}^{k} (i_h + j_h)$. Therefore we have checked:

Proposition. If $A$ is a square matrix of dimension $n$ and $|M|$ is its minor of order $k < n$, then the product of any term of $|M|$ with any term of its algebraic complement is a term of the determinant $|A|$.

This claim suggests that we could express the determinant of the matrix using products of smaller determinants. We know that $|A|$ contains exactly $n!$ distinct terms, exactly one for each permutation, and these terms are mutually distinct as polynomials in the entries of a general matrix $A$. If we can show that there are exactly that many mutually distinct expressions of the form from the previous claim, we obtain the determinant $|A|$ as their sum. It remains to show that the products $|M| \cdot |M^*|$ contain exactly $n!$ distinct terms of $|A|$. From the chosen $k$ rows we can choose $\binom{n}{k}$ minors $M$, and by the previous proposition each of the $k!(n-k)!$ terms in the product of $|M|$ with its algebraic complement is a term of $|A|$. For distinct choices of $M$ we can never obtain the same terms, and the individual terms in $(-1)^{i_1 + \dots + i_k + j_1 + \dots + j_k} \cdot |M| \cdot |M^*|$ are also mutually distinct.
Therefore we have exactly the required number $k!(n-k)! \binom{n}{k} = n!$ of terms, and we have proved:

Laplace theorem

Theorem. Let $A = (a_{ij})$ be a square matrix of dimension $n$ over an arbitrary ring of scalars, with $k$ rows fixed. Then $|A|$ equals the sum of all $\binom{n}{k}$ products
\[ (-1)^{i_1 + \dots + i_k + j_1 + \dots + j_k} \, |M| \cdot |M^*| \]
of the minors $|M|$ of order $k$ placed in the fixed rows, with their algebraic complements.

Therefore
\[ V_n(x_1, x_2, \dots, x_n) = (x_2 - x_1) \cdots (x_n - x_1) \, V_{n-1}(x_2, \dots, x_n). \]
Because it is clear that $V_2(x_{n-1}, x_n) = x_n - x_{n-1}$, it follows by induction that
\[ V_n(x_1, x_2, \dots, x_n) = \prod_{1 \le i < j \le n} (x_j - x_i). \]
Note that the determinant is non-zero whenever the numbers $x_1, \dots, x_n$ are mutually distinct. □

Remark. Another (more beautiful?) proof of the formula can be found in 5.1.5.

2.B.8. Find whether or not the matrix
\[ \begin{pmatrix} 3 & 2 & -1 & 2 \\ 4 & 1 & 2 & -4 \\ -2 & 2 & 4 & 1 \\ 2 & 3 & -4 & 8 \end{pmatrix} \]
is invertible.

Solution. The matrix is invertible (that is, there is an inverse matrix) if and only if it can be transformed by elementary row transformations into the unit matrix, which is equivalent, for instance, to it having a non-zero determinant. We compute the determinant using the Laplace theorem (2.3.10), expanding along the first row:
\[ \begin{vmatrix} 3 & 2 & -1 & 2 \\ 4 & 1 & 2 & -4 \\ -2 & 2 & 4 & 1 \\ 2 & 3 & -4 & 8 \end{vmatrix} = 3 \begin{vmatrix} 1 & 2 & -4 \\ 2 & 4 & 1 \\ 3 & -4 & 8 \end{vmatrix} - 2 \begin{vmatrix} 4 & 2 & -4 \\ -2 & 4 & 1 \\ 2 & -4 & 8 \end{vmatrix} + (-1) \begin{vmatrix} 4 & 1 & -4 \\ -2 & 2 & 1 \\ 2 & 3 & 8 \end{vmatrix} - 2 \begin{vmatrix} 4 & 1 & 2 \\ -2 & 2 & 4 \\ 2 & 3 & -4 \end{vmatrix} \]
\[ = 3 \cdot 90 - 2 \cdot 180 + (-1) \cdot 110 - 2 \cdot (-100) = 0, \]
so the given matrix is not invertible. □

2.B.9. Solve the system from 2.A.2 using the Cramer rule (see 2.2.6).

The Laplace theorem transforms the computation of $|A|$ into the computation of determinants of lower dimension. This method of computation is called the Laplace expansion along the chosen rows (or columns). For instance, the expansion along the $i$-th row or the $j$-th column reads
\[ |A| = \sum_{j=1}^{n} a_{ij} A_{ij} = \sum_{i=1}^{n} a_{ij} A_{ij}, \]
where $A_{ij}$ are the algebraic complements of the elements $a_{ij}$ (that is, of the minors of order one), as deduced in 2.2.4 already.
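The Vandermonde formula of 2.B.7 is easy to confirm numerically (NumPy assumed; `np.vander` with `increasing=True` builds exactly the transposed Vandermonde matrix used in the proof):

```python
import numpy as np
from itertools import combinations

xs = np.array([1.0, 2.0, 4.0, 7.0])

# Rows (1, x_i, x_i^2, ..., x_i^{n-1}) -- the transposed Vandermonde matrix.
V = np.vander(xs, increasing=True)
det_V = np.linalg.det(V)

# The right-hand side of the formula: the product of x_j - x_i over i < j.
prod = 1.0
for i, j in combinations(range(len(xs)), 2):
    prod *= xs[j] - xs[i]
```

For these sample points both sides equal 540, and the determinant is non-zero precisely because the points are mutually distinct.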
In practical computations, it is often efficient to combine the Laplace expansion with the direct method of Gaussian elimination.

2.2.10. Proof of the Cauchy theorem. The proof is based on a clever but elementary application of the Laplace theorem: we use the Laplace expansion twice on a particular arrangement of a well-chosen matrix. Consider the following matrix $H$ of dimension $2n$ (we use the so-called block notation, that is, we write the matrix as if composed of the (sub)matrices $A$, $B$, the unit matrix $E$ and the zero matrix $0$):
\[ H = \begin{pmatrix} A & 0 \\ -E & B \end{pmatrix} = \begin{pmatrix} a_{11} & \cdots & a_{1n} & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} & 0 & \cdots & 0 \\ -1 & & 0 & b_{11} & \cdots & b_{1n} \\ & \ddots & & \vdots & & \vdots \\ 0 & & -1 & b_{n1} & \cdots & b_{nn} \end{pmatrix}. \]
The Laplace expansion along the first $n$ rows gives $|H| = |A| \cdot |B|$. Now, in sequence, we add to the last $n$ columns suitable linear combinations of the first $n$ columns in order to obtain a matrix with zeros in the bottom right corner:
\[ K = \begin{pmatrix} A & C \\ -E & 0 \end{pmatrix}. \]
The entries of the submatrix $C$ in the top right part must satisfy $c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \dots + a_{in} b_{nj}$; that is, they are exactly the entries of the product $A \cdot B$, and $|K| = |H|$. The expansion along the last $n$ columns now gives
\[ |K| = (-1)^n \cdot (-1)^{1 + 2 + \dots + 2n} \, |A \cdot B| = |A \cdot B|, \]
since the total exponent $n + n(2n+1) = 2n(n+1)$ is even. Therefore $|A| \cdot |B| = |H| = |K| = |A \cdot B|$, as claimed.

Proof. We can expand $a \cdot u = (a + 0) \cdot u = a \cdot u + 0 \cdot u$, which, according to the axiom (CG4), implies $0 \cdot u = 0$. Now
\[ u + (-1) \cdot u = (1 + (-1)) \cdot u = 0 \cdot u = 0, \]
and thus $-u = (-1) \cdot u$. Further, using (V2) and (V3),
\[ a \cdot (u + (-1) \cdot v) = a \cdot u + (-a) \cdot v = a \cdot u - a \cdot v, \]
which proves (3). It follows that
\[ (a - b) \cdot u = a \cdot u + (-b) \cdot u = a \cdot u - b \cdot u, \]
which proves (4). Property (5) follows by induction from (V2) and (V1). It remains to prove (1): $a \cdot 0 = a \cdot (u - u) = a \cdot u - a \cdot u = 0$, which together with the first equality derived in this proof proves one implication. For the other implication, we use an axiom for the field of scalars and the axiom (V4) for vector spaces: if $p \cdot u = 0$ and $p \neq 0$, then $u = 1 \cdot u = (p^{-1} \cdot p) \cdot u = p^{-1} \cdot (p \cdot u) = p^{-1} \cdot 0 = 0$. □

2.3.3. Linear (in)dependence.
In paragraph 2.1.11 we worked with linear combinations of rows of a matrix. With general vectors we work analogously:

and $(y_j)_{j=0}^{\infty}$ satisfying the given equation, that is,
\[ a_n x_{n+k} + a_{n-1} x_{n+k-1} + \dots + a_0 x_k = 0, \qquad a_n y_{n+k} + a_{n-1} y_{n+k-1} + \dots + a_0 y_k = 0. \]
By adding these equations, we obtain
\[ a_n (x_{n+k} + y_{n+k}) + a_{n-1} (x_{n+k-1} + y_{n+k-1}) + \dots + a_0 (x_k + y_k) = 0, \]
therefore the sequence $(x_j + y_j)_{j=0}^{\infty}$ also satisfies the given equation. Analogously, if the sequence $(x_j)_{j=0}^{\infty}$ satisfies the given equation, then so does $(u x_j)_{j=0}^{\infty}$ for any $u \in \mathbb{R}$.

vi) No. For two solutions of a non-homogeneous equation,
\[ a_n x_{n+k} + a_{n-1} x_{n+k-1} + \dots + a_0 x_k = c, \qquad a_n y_{n+k} + a_{n-1} y_{n+k-1} + \dots + a_0 y_k = c, \quad c \in \mathbb{R} \setminus \{0\}, \]
the sum satisfies
\[ a_n (x_{n+k} + y_{n+k}) + a_{n-1} (x_{n+k-1} + y_{n+k-1}) + \dots + a_0 (x_k + y_k) = 2c, \]
that is, it does not satisfy the original non-homogeneous equation. But the set of solutions forms an affine space, see 4.1.1.

vii) It is a vector space if and only if $c = 0$. If we take two functions $f$ and $g$ from the given set, then $(f + g)(1) = (f + g)(2) = f(1) + g(1) = 2c$. Thus if $f + g$ is to be a member of the given set, it must be that $(f + g)(1) = c$, therefore $2c = c$, hence $c = 0$. □

2.C.2. Find out whether the set $U_1 = \{(x_1, x_2, x_3) \in \mathbb{R}^3;\ |x_1| = |x_2| = |x_3|\}$ is a subspace of the vector space $\mathbb{R}^3$, and whether the set $U_2 = \{a x^2 + c;\ a, c \in \mathbb{R}\}$ is a subspace of the space of polynomials of degree at most 2.

Solution. The only property we have to check is whether the given subset is closed under linear combinations of its vectors, that is, whether it forms a vector space. The set $U_1$ is not a vector (sub)space; we can see that, for instance, $(1,1,1) + (-1,1,1) = (0,2,2) \notin U_1$.

Linear combination and independence

An expression of the form $a_1 v_1 + \dots + a_k v_k$ is called a linear combination of the vectors $v_1, \dots, v_k \in V$. A finite sequence of vectors $v_1, \dots, v_k$ is called linearly independent if the only zero linear combination of them is the one with all coefficients zero.
That is, for any scalars $a_1, \dots, a_k \in \mathbb{K}$,
\[ a_1 v_1 + \dots + a_k v_k = 0 \quad \text{implies} \quad a_1 = a_2 = \dots = a_k = 0. \]
It is clear that in an independent sequence of vectors, all vectors are mutually distinct and nonzero. The set of vectors $M \subset V$ in a vector space $V$ over $\mathbb{K}$ is called linearly independent if every finite $k$-tuple of vectors $v_1, \dots, v_k \in M$ is linearly independent. The set of vectors $M$ is linearly dependent if it is not linearly independent.

A nonempty subset $M$ of vectors in a vector space over a field of scalars $\mathbb{K}$ is dependent if and only if one of its vectors can be expressed as a finite linear combination of other vectors in $M$. This follows directly from the definition: at least one of the coefficients in the corresponding zero linear combination must be nonzero, and since we are over a field of scalars, we can multiply the whole combination by the inverse of this nonzero coefficient and thus express the corresponding vector as a linear combination of the others.

Every subset of a linearly independent set $M$ is clearly also linearly independent (we require the same conditions on a smaller set of vectors). Similarly, we can see that $M \subset V$ is linearly independent if and only if every finite subset of $M$ is linearly independent.

2.3.4. Generators and subspaces. A subset $M \subset V$ is called a vector subspace if, together with the restricted operations of addition and scalar multiplication, it forms a vector space. That is, we require
\[ \forall a, b \in \mathbb{K},\ \forall v, w \in M :\quad a \cdot v + b \cdot w \in M. \]
Let us investigate a couple of examples. The space of $m$-tuples of scalars $\mathbb{R}^m$ with coordinate-wise addition and multiplication is a vector space over $\mathbb{R}$, but also a vector space over $\mathbb{Q}$. For instance for $m = 2$, the vectors $(1, 0), (0, 1) \in \mathbb{R}^2$ are linearly independent, because $a \cdot (1, 0) + b \cdot (0, 1) = (0, 0)$ implies $a = b = 0$. Further, the vectors $(1, 0), (\sqrt{2}, 0) \in \mathbb{R}^2$ are linearly dependent over $\mathbb{R}$, because $\sqrt{2} \cdot (1, 0) = (\sqrt{2}, 0)$; but over $\mathbb{Q}$ they are linearly independent!
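Linear (in)dependence of concrete vectors over the reals can be tested by computing the rank of the matrix having the vectors as rows (NumPy assumed; note that a floating-point rank computation works over $\mathbb{R}$ and cannot see the distinction between $\mathbb{R}$ and $\mathbb{Q}$ made above):

```python
import numpy as np

def independent(vectors):
    """Vectors (over the reals) are linearly independent iff the matrix with
    the vectors as rows has rank equal to the number of vectors."""
    M = np.array(vectors, dtype=float)
    return int(np.linalg.matrix_rank(M)) == len(vectors)
```

For example, the four vectors of exercise 2.C.3 pass this test, while $(1,0)$ and $(2,0)$ do not.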
Over $\mathbb{R}$ these two vectors "generate" a one-dimensional subspace, while over $\mathbb{Q}$ the subspace is "larger". Polynomials with real coefficients of degree at most $m$ form a vector space $\mathbb{R}_m[x]$. We can consider the polynomials as mappings $f : \mathbb{R} \to \mathbb{R}$ and define the addition and scalar multiplication by $(f + g)(x) = f(x) + g(x)$, $(a \cdot f)(x) = a \cdot f(x)$.

The set $U_2$ is a subspace (there is a clear identification with $\mathbb{R}^2$), because
\[ (a_1 x^2 + c_1) + (a_2 x^2 + c_2) = (a_1 + a_2) x^2 + (c_1 + c_2), \qquad k \cdot (a x^2 + c) = (k a) x^2 + k c \]
for all numbers $a_1, c_1, a_2, c_2, a, c, k \in \mathbb{R}$. □

2.C.3. Determine whether or not the vectors $(1, 2, 3, 1)$, $(1, 0, -1, 1)$, $(2, 1, -1, 3)$ and $(0, 0, 3, 2)$ are linearly independent.

Solution. Because
\[ \begin{vmatrix} 1 & 2 & 3 & 1 \\ 1 & 0 & -1 & 1 \\ 2 & 1 & -1 & 3 \\ 0 & 0 & 3 & 2 \end{vmatrix} = 10 \neq 0, \]
the given vectors are linearly independent. □

2.C.4. Given arbitrary linearly independent vectors $u, v, w, z$ in a vector space $V$, decide whether or not the vectors $u - 2v$, $3u + w - z$, $u - 4v + w + 2z$, $4v + 8w + 4z$ are linearly independent in $V$.

Solution. The considered vectors are linearly independent if and only if the vectors of their coefficients, $(1, -2, 0, 0)$, $(3, 0, 1, -1)$, $(1, -4, 1, 2)$, $(0, 4, 8, 4)$, are linearly independent in $\mathbb{R}^4$. We have
\[ \begin{vmatrix} 1 & -2 & 0 & 0 \\ 3 & 0 & 1 & -1 \\ 1 & -4 & 1 & 2 \\ 0 & 4 & 8 & 4 \end{vmatrix} = -36 \neq 0, \]
thus the vectors are linearly independent. □

2.C.5. The vectors $(1, 2, 1)$, $(-1, 1, 0)$, $(0, 1, 1)$ are linearly independent and therefore together form a basis of $\mathbb{R}^3$ (for a basis, the order of the vectors is important). Every three-dimensional vector is therefore some linear combination of them. What linear combination corresponds to the vector $(1, 1, 1)$, or equivalently, what are the coordinates of the vector $(1, 1, 1)$ in the basis formed by the given vectors?

Solution. We seek $a, b, c \in \mathbb{R}$ such that $a(1, 2, 1) + b(-1, 1, 0) + c(0, 1, 1) = (1, 1, 1)$.
The equation must hold in every coordinate, so we have a system of three linear equations in three variables:
\[ a - b = 1, \qquad 2a + b + c = 1, \qquad a + c = 1. \]

Polynomials of all degrees also form a vector space $\mathbb{R}[x]$ (or $\mathbb{R}_{\infty}[x]$), and $\mathbb{R}_m[x] \subset \mathbb{R}_n[x]$ is a vector subspace for any $m \le n \le \infty$. Further examples of subspaces are given by all even polynomials, or by all odd polynomials, that is, polynomials satisfying $f(-x) = \pm f(x)$. In complete analogy with polynomials, we can define a vector space structure on the set of all mappings $\mathbb{R} \to \mathbb{R}$, or of all mappings $M \to V$ of an arbitrary fixed set $M$ into the vector space $V$.

Because the condition in the definition of a subspace consists only of universal quantifiers, the intersection of subspaces is still a subspace. We can see this also directly: let $W_i$, $i \in I$, be vector subspaces in $V$, $a, b \in \mathbb{K}$ and $u, v \in \cap_{i \in I} W_i$. Then $a \cdot u + b \cdot v \in W_i$ for all $i \in I$, hence $a \cdot u + b \cdot v \in \cap_{i \in I} W_i$.

In particular, the intersection of all subspaces $W \subset V$ that contain some given set of vectors $M \subset V$ is a subspace. It is called the linear span or linear hull of $M$, and we write $\operatorname{span} M$. We say that the set $M$ generates the subspace $\operatorname{span} M$, or that the elements of $M$ are generators of the subspace $\operatorname{span} M$.

We formulate a few simple claims about generating subspaces:

Proposition. For every nonempty set $M \subset V$ we have
(1) $\operatorname{span} M = \{a_1 \cdot u_1 + \dots + a_k \cdot u_k;\ k \in \mathbb{N},\ a_j \in \mathbb{K},\ u_j \in M,\ j = 1, \dots, k\}$;
(2) $M = \operatorname{span} M$ if and only if $M$ is a vector subspace;
(3) if $N \subset M$ then $\operatorname{span} N \subset \operatorname{span} M$ is a vector subspace; the subspace $\operatorname{span} \emptyset$ generated by the empty set is the trivial subspace $\{0\} \subset V$.

Proof. (1) The set of all linear combinations $a_1 u_1 + \dots + a_k u_k$ on the right-hand side of (1) is clearly a vector subspace, and of course it contains $M$. On the other hand, each of the linear combinations must lie in $\operatorname{span} M$, and thus the first claim is proved. Claim (2) follows immediately from claim (1) and from the definition of a vector subspace. Analogously, (1) implies most of the third claim.
Finally, the smallest possible vector subspace is {0}. Notice that the empty set is contained in every subspace and each of them contains the vector 0. This proves the last claim. □

Basis and dimension

A subset M ⊂ V is called a basis of the vector space V if span M = V and M is linearly independent. A vector space with a finite basis is called finitely dimensional. The number of elements of the basis is called the dimension of V. If V does not have a finite basis, we say that V is infinitely dimensional. We write dim V = k, k ∈ N, or k = ∞.

whose solution gives us a = 1/2, b = −1/2, c = 1/2, thus we have
(1,1,1) = (1/2)·(1,2,1) − (1/2)·(−1,1,0) + (1/2)·(0,1,1),
that is, the coordinates of the vector (1,1,1) in the basis ((1,2,1), (−1,1,0), (0,1,1)) are (1/2, −1/2, 1/2). □

2.C.6. Determine all constants a ∈ R such that the polynomials ax² + x + 2, −2x² + ax + 3 and x² + 2x + a are linearly dependent (in the vector space R₃[x] of polynomials of one variable of degree at most three over the real numbers).

Solution. In the basis 1, x, x² the coefficients of the given vectors (polynomials) are (a,1,2), (−2,a,3), (1,2,a). The polynomials are linearly dependent if and only if the matrix whose columns are given by the coordinates of the vectors has rank lower than the number of the vectors. In the case of a square matrix, a rank less than the number of rows means that the determinant is zero. The condition for a thus reads
| a −2  1 |
| 1  a  2 |
| 2  3  a |  = 0,
that is, a is a root of the polynomial a³ − 6a − 5 = (a + 1)(a² − a − 5). Thus there are 3 such constants: a₁ = −1, a₂ = (1 + √21)/2, a₃ = (1 − √21)/2. □

2.C.7. Consider the complex numbers C as a real vector space. Determine the coordinates of the number 2 + i in the basis given by the roots of the polynomial x² + x + 1.

Solution. Because the roots of the given polynomial are −1/2 + i√3/2 and −1/2 − i√3/2, we have to determine the coordinates (a, b) of the vector 2 + i in the basis (−1/2 + i√3/2, −1/2 − i√3/2).
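The computations in 2.C.5 and 2.C.6 can be double-checked in a few lines of code (exact rational arithmetic for the coordinates; the variable names are ours):

```python
from fractions import Fraction as F

# 2.C.5: verify that (1/2, -1/2, 1/2) are the coordinates of (1,1,1)
a, b, c = F(1, 2), F(-1, 2), F(1, 2)
combo = tuple(a * x + b * y + c * z
              for x, y, z in zip((1, 2, 1), (-1, 1, 0), (0, 1, 1)))
print(combo == (1, 1, 1))  # True

# 2.C.6: the dependency condition is a^3 - 6a - 5 = (a+1)(a^2 - a - 5) = 0
def p(t):
    return t ** 3 - 6 * t - 5

r = (1 + 21 ** 0.5) / 2       # one of the roots (1 ± sqrt(21))/2
print(p(-1), abs(p(r)) < 1e-9)  # 0 True
```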
These real numbers a, b are uniquely determined by the condition
a·(−1/2 + i·√3/2) + b·(−1/2 − i·√3/2) = 2 + i.
By equating separately the real and the imaginary parts of the equation, we obtain a system of two linear equations in two variables:
−(1/2)a − (1/2)b = 2,
(√3/2)a − (√3/2)b = 1.
The solution gives us a = −2 + √3/3, b = −2 − √3/3, therefore the coordinates are (−2 + √3/3, −2 − √3/3). □

In order to be satisfied with such a definition of dimension, we must know that different bases of the same space always have the same number of elements. We shall show this below. But we note immediately that the trivial subspace is generated by the empty set, which is an "empty" basis. Thus it has dimension zero.

The linearly independent vectors e_i = (0, ..., 1, ..., 0) ∈ K^n, i = 1, ..., n (all zeros, except for the value 1 at the i-th position) are the most useful example of a basis in the vector space K^n. We call it the standard basis of K^n.

2.3.5. Linear equations again. It is a good time now to recall the properties of systems of linear equations in terms of abstract vector spaces and their bases. As we have already noted in the introduction to this section (cf. 2.3.1), the set of all solutions of the homogeneous system A·x = 0 is a vector space. If A is a matrix with m rows and n columns, and the rank of the matrix is k, then using the row echelon transformation (see 2.1.7) to solve the system, we find that the dimension of the space of all solutions is exactly n − k.

Indeed, the left hand side of the equation can be understood as the linear combination of the columns of A with coefficients given by x, and the rank k of the matrix provides the number of linearly independent columns in A, thus the dimension of the subspace of all possible linear combinations of the given form. Therefore, after transforming the system into row echelon form, exactly m − k zero rows remain. In the next step, we are left with exactly n − k free parameters.
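The dimension count n − k is easy to observe mechanically: row-reduce and count the pivots. A small sketch with exact rational arithmetic (the 3×4 matrix A below is a made-up example, and `rank` is our own helper):

```python
from fractions import Fraction

def rank(rows):
    """Rank via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue                      # no stair in this column
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

A = [[1, 2, 0, 1],
     [2, 4, 1, 3],
     [3, 6, 1, 4]]
k = rank(A)
print(k, 4 - k)  # 2 2 — rank 2, so the solution space of A·x = 0 has dimension 4 − 2 = 2
```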
By setting one of them to have the value one, while all others are zero, we obtain exactly n − k linearly independent solutions. Then all solutions are given by all the linear combinations of these n − k solutions. Every (n − k)-tuple of linearly independent solutions is called a fundamental system of solutions of the given homogeneous system of equations. We have proved:

Proposition. The set of all solutions of the homogeneous system of equations A·x = 0 in n variables with the matrix A of rank k is a vector subspace in K^n of dimension n − k. Every basis of this space forms a fundamental system of solutions of the given homogeneous system.

Next, consider the general system of equations A·x = b. Notice that the columns of the matrix A are actually the images of the vectors of the standard basis in K^n under the mapping assigning the vector A·x to each vector x. If there is to be a solution, b must be in the image under this mapping, and thus it must be a linear combination of the columns in A.

2.C.8. Remark. As a perceptive reader may have spotted, the problem statement is not unambiguous – we are not given the order of the roots of the polynomial, thus we do not have the order of the basis vectors. The result is thus given up to a permutation of the coordinates.

We add a remark about rationalising the denominator, that is, removing the square roots from the denominator. The authors do not have a firm attitude on whether this should always be done or not (is one form of the fraction really prettier than the other?). In some cases the rationalising is undesirable: from the fraction 6/√35 we can immediately spot that its value is a little greater than 1 (because √35 is just a little smaller than 6), while from the rationalised fraction we cannot spot anything. But in general the convention is to normalize.

2.C.9. Consider the complex numbers C as a real vector space.
Determine the coordinates of the number 2 + i in the basis given by the roots of the polynomial x² − x + 1. ○

2.C.10. For what values of the parameters a, b, c ∈ R are the vectors (1,1,a,1), (1,b,1,1), (c,1,1,1) linearly dependent? ○

2.C.11. Let a vector space V be given along with a basis formed by the vectors u, v, w, z. Determine whether or not the vectors u − 3v + z, v − 5w − z, 3w − 7z, u − w + z are linearly independent. ○

2.C.12. Complete the vectors 1 − x² + x³, 1 + x² + x³, 1 − x − x³ to a basis of the space of polynomials of degree at most 3. ○

2.C.13. Do the matrices
(1  0)   (1  4)   (−5 0)   (1 −2)
(1 −2),  (0 −1),  ( 3 0),  (0  3)
form a basis of the vector space of square two-dimensional matrices?

Solution. The four given matrices are, as vectors in the space of 2×2 matrices, linearly independent. This follows from the fact that the matrix
( 1  1 −5  1)
( 0  4  0 −2)
( 1  0  3  0)
(−2 −1  0  3)

If we extend the matrix A by the column b, the number of linearly independent columns, and thus also rows, might increase (but does not have to). If this number increases, then b is not in the image and the system of equations does not have a solution. If, on the other hand, the number of linearly independent rows does not change after adding the column b to the matrix A, it means that b must be a linear combination of the columns of A. The coefficients of such combinations are then exactly the solutions of our system.

Consider now two fixed solutions x and y of our system, and some solution z of the homogeneous system with the same matrix. Then clearly
A·(x − y) = b − b = 0,
A·(x + z) = b + 0 = b.
Thus we can summarise in the form of the so-called Kronecker–Capelli theorem¹:

Kronecker–Capelli Theorem

Theorem. A solution of a non-homogeneous system of linear equations A·x = b exists if and only if adding the column b to the matrix A does not increase the number of linearly independent rows.
In such a case the space of all solutions is given by all sums of one fixed particular solution of the system and all solutions of the homogeneous system with the same matrix.

2.3.6. Sums of subspaces. Since we now have some intuition about generators and the subspaces generated by them, we should understand the possibilities of how some subspaces can generate the whole space V.

Sum of subspaces

Let V_i, i ∈ I, be subspaces of V. Then the subspace generated by their union, that is, span ∪_{i∈I} V_i, is called the sum of the subspaces V_i. We denote it as W = Σ_{i∈I} V_i. Notably, for a finite number of subspaces V₁, ..., V_k ⊂ V we write
W = V₁ + ··· + V_k = span(V₁ ∪ V₂ ∪ ··· ∪ V_k).

We see that every element in the considered sum W can be expressed as a linear combination of vectors from the subspaces V_i. Because vector addition is commutative, we can aggregate summands that belong to the same subspace, and for a finite sum of k subspaces we obtain
V₁ + V₂ + ··· + V_k = {v₁ + ··· + v_k ; v_i ∈ V_i, i = 1, ..., k}.

¹A common formulation of this fact is "a system has a solution if and only if the rank of its matrix equals the rank of its extended matrix". Leopold Kronecker was a very influential German mathematician, who dealt with algebraic equations in general and in particular pushed forward Number Theory in the middle of the 19th century. Alfredo Capelli, an Italian, worked on algebraic identities. This theorem is equally often called by different names, e.g. the Rouché–Frobenius theorem or the Rouché–Capelli theorem, etc. This is a very common feature in Mathematics.
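The Kronecker–Capelli criterion is straightforward to test in code: compare the rank of A with the rank of the extended matrix [A | b]. A sketch with exact rational elimination (the singular example matrix and the helper names are our own):

```python
from fractions import Fraction

def rank(rows):
    """Rank via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def solvable(A, b):
    """Kronecker-Capelli: A·x = b has a solution iff rank(A) == rank([A | b])."""
    aug = [row + [bi] for row, bi in zip(A, b)]
    return rank(A) == rank(aug)

A = [[1, 1], [2, 2]]        # a singular 2x2 example
print(solvable(A, [1, 2]))  # True:  b lies in the span of the columns of A
print(solvable(A, [1, 3]))  # False: appending b increases the rank
```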
is invertible (which is, by the way, equivalent to any of the following claims: its rank equals the number of its rows; it can be transformed into the unit matrix by elementary row transformations; it has an inverse matrix; it has a non-zero determinant (here equal to 116); it stands for a system of homogeneous linear equations with only the zero solution; every non-homogeneous linear system with left-hand side given by this matrix has a unique solution; the range of the linear mapping given by this matrix is a vector space of dimension 4, i.e. this mapping is injective). □

2.C.14. In the vector space R⁴ we are given the three-dimensional subspaces U = span{u₁, u₂, u₃} and V = span{v₁, v₂, v₃}, where
u₁ = (1,1,1,0)ᵀ, u₂ = (1,1,0,1)ᵀ, u₃ = (1,0,1,1)ᵀ,
v₁ = (1,1,−1,−1)ᵀ, v₂ = (1,−1,1,−1)ᵀ, v₃ = (1,−1,−1,1)ᵀ.
Determine the dimension and find a basis of the subspace U ∩ V.

Solution. The subspace U ∩ V contains exactly the vectors that can be obtained as a linear combination of the vectors u_i and also as a linear combination of the vectors v_i. Thus we search for numbers x₁, x₂, x₃, y₁, y₂, y₃ ∈ R such that
x₁u₁ + x₂u₂ + x₃u₃ = y₁v₁ + y₂v₂ + y₃v₃,
that is, we are looking for a solution of the system
x₁ + x₂ + x₃ = y₁ + y₂ + y₃,
x₁ + x₂ = y₁ − y₂ − y₃,
x₁ + x₃ = −y₁ + y₂ − y₃,
x₂ + x₃ = −y₁ − y₂ + y₃.
Using matrix notation for this homogeneous system (and preserving the order of the variables) we have
(1 1 1 −1 −1 −1)
(1 1 0 −1  1  1)
(1 0 1  1 −1  1)
(0 1 1  1  1 −1)

The sum W = V₁ + ··· + V_k ⊂ V is called the direct sum of subspaces if the intersection of any two is trivial, that is, V_i ∩ V_j = {0} for all i ≠ j. We show that in such a case every vector w ∈ W can be written in a unique way as the sum w = v₁ + ··· + v_k, where v_i ∈ V_i. Indeed, if we could simultaneously write w as w = v₁′ + ··· + v_k′, then
0 = w − w = (v₁ − v₁′) + ··· + (v_k − v_k′).
If v_i − v_i′ is the first nonzero term on the right-hand side, then this vector from V_i can be expressed using vectors from the other subspaces. This is a contradiction with the assumption that V_i has zero intersection with all the other subspaces. The only possibility is then that all the vectors on the right-hand side are zero, and thus the expression of w is unique. For direct sums of subspaces we write
W = V₁ ⊕ ··· ⊕ V_k = ⊕ᵏᵢ₌₁ V_i.

2.3.7. Basis. Now we have everything prepared for understanding minimal sets of generators as we understood them in the plane R², and for proving the promised independence of the number of basis elements of any choices. A basis of a k-dimensional space will usually be denoted as a k-tuple v = (v₁, ..., v_k) of basis vectors. This is just a matter of convention: with finitely dimensional vector spaces we shall always consider the bases along with a given order of the elements, even if we have not defined it that way (strictly speaking).

Clearly, if (v₁, ..., v_n) is a basis of V, then the whole space V is the direct sum of the one-dimensional subspaces
V = span{v₁} ⊕ ··· ⊕ span{v_n}.
An immediate corollary of the derived uniqueness of the decomposition of any vector w in V into components in the direct sum gives the unique decomposition
w = x₁v₁ + ··· + x_nv_n.
This allows us, after choosing a basis, to see the abstract vectors again as n-tuples of scalars. We shall return to this idea in paragraph 2.3.11, when we finish the discussion of the existence of bases and sums of subspaces in the general case.

2.3.8. Theorem. From any finite set of generators of a vector space V we can choose a basis. Every basis of a finitely dimensional space V has the same number of elements.

Proof. The first claim is easily proved using induction on the number of generators k. Only the zero subspace does not need a generator, and thus we are able to choose an empty basis.
On the other hand, we are not able to choose the zero vector (the generators would then be linearly dependent), and there is nothing else in the subspace. In order to make the inductive step more natural, we deal with the case k = 1 first. We have V = span{u} with u ≠ 0,

98 CHAPTER 2. ELEMENTARY LINEAR ALGEBRA

(1 1 1 −1 −1 −1)     (1 1 1 −1 −1 −1)     (1 1 1 0 0  0)     (1 0 0 0 0  2)
(0 1 1  1  1 −1)  ∼  (0 1 1  1  1 −1)  ∼  (0 1 1 0 0 −2)  ∼  (0 1 0 0 2  0)
(0 0 1  3  1  1)     (0 0 1  0 −2 −2)     (0 0 1 0 −2 −2)    (0 0 1 0 −2 −2)
(0 0 0  1  1  1)     (0 0 0  1  1  1)     (0 0 0 1 1  1)     (0 0 0 1 1  1)

We obtain the solution
x₁ = −2t, x₂ = −2s, x₃ = 2s + 2t, y₁ = −s − t, y₂ = s, y₃ = t, with t, s ∈ R.
Substituting, we obtain a general vector of the intersection:
(x₁ + x₂ + x₃, x₁ + x₂, x₁ + x₃, x₂ + x₃) = (0, −2t − 2s, 2s, 2t).
We see that dim U ∩ V = 2 and
U ∩ V = span{(0, 1, −1, 0), (0, 1, 0, −1)}. □

2.C.15. Let there be in R³ two vector spaces U and V generated by the vectors (1,1,−3), (1,2,2) and (1,1,−1), (1,2,1), (1,3,3), respectively. Determine the intersection of these two subspaces.

Solution. According to the definition of the intersection, the vectors in the intersection are in both the span of the vectors (1,1,−3), (1,2,2), and the span of the vectors (1,1,−1), (1,2,1), (1,3,3). It helps to consider the geometry first. Firstly, U is spanned by two linearly independent vectors, so U is a plane in R³. Next, V is spanned by three vectors. But these are linearly dependent, since
| 1 1 −1 |
| 1 2  1 |
| 1 3  3 |  = 0.
So V is also a plane.
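Since both subspaces in 2.C.15 are planes through the origin, the solution can be cross-checked with cross products: the normal of each plane is the cross product of two of its spanning vectors, and the intersection line points along the cross product of the two normals. A sketch (the helper `cross` is ours):

```python
def cross(a, b):
    """Cross product in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

n_U = cross((1, 1, -3), (1, 2, 2))   # normal of the plane U
n_V = cross((1, 1, -1), (1, 2, 1))   # normal of V (the third generator is redundant)
direction = cross(n_U, n_V)          # direction of the line U ∩ V
print(n_U, n_V, direction)           # (8, -5, 1) (3, -2, 1) (-3, -5, -1)
```

The normals give the plane equations 8x₁ − 5x₂ + x₃ = 0 and 3x₁ − 2x₂ + x₃ = 0, and the direction (−3, −5, −1) is a scalar multiple of (3, 5, 1).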
Then V = span{i>i,..., t^-i, vi+i,..., vn+i} and we can choose a basis, using the inductive assumption. In remains to show that bases always have the same number of elements. Consider a basis v = (v1,..., vn) of the space V and for an arbitrary nonzero vector u, consider u = aivi + ■ ■ ■ + anvn £ V with di ^ 0 for some i. Then Vi = — (u-(a1v1-\-----\-ai-1vi-1+ai+1vi+1-\-----\-anvn)) a-i and therefore also span{u, vi,..., Vi-i,vi+i,..., vn} = V. We show that this is again a basis. For if adding u to the linearly independent vectors v1,..., vi-1,vi+1,..., vn leads to a set of linearly dependent vectors, then V = span{«i,... ,Vi-Uvi+1,. . .,vn}, which implies a basis of n — 1 vectors chosen from v, which is not possible. Thus we have proved that for any nonzero vector u e V there exists i, 1 < i < n, such that (u, i>i,..., Vi-i,vi+i,..., vn) is again a basis of V. Similarly, instead of one vector u, we can consider a linearly independent set «i,..., We will sequentially add ui, u2,..., always exchanging for some v{ using our previous approach. We have to ensure that there always is such v{ to be replaced (that is, that the vectors u{ will not consequently replace each other). Assume thus that we have already placed «i,..., ui instead of some v/s. Then the vector ui+1 can be expressed as a linear combination of the latter vectors u{ and the remaining Vj's. As we have seen, ui+1 may replace any vector with non-zero coefficient in this linear combination. If only the coefficients at ui,..., ui were nonzero, then it would mean that the vectors «i,..., ui+1 were linearly dependent, which is a contradiction. Summarizing, for every k < n we can arrive after k steps at a basis in which k vectors from the original basis were exchanged for the new Ui's. If k > n, then in the n-th step we would obtain a basis consisting only of new vectors u{, which means that the original set could not be linearly independent. 
In particular, it is not possible for two bases to have a different number of elements. □

In fact, we have proved a much stronger claim, the Steinitz exchange lemma:

If the vector (x₁, x₂, x₃) lies in U, then (x₁, x₂, x₃) = λ(1,1,−3) + μ(1,2,2) for some scalars λ, μ. Similarly, if (x₁, x₂, x₃) lies in V, then (x₁, x₂, x₃) = α(1,1,−1) + β(1,2,1) + γ(1,3,3) for scalars α, β, γ. When written in full, this is a set of six equations in eight unknowns. Solving these is possible, but can be quite cumbersome. Some simplification is obtained as follows: The first three equations, which describe U, are
x₁ = λ + μ,
x₂ = λ + 2μ,
x₃ = −3λ + 2μ.
If we solve these three equations for the two "unknowns" λ and μ (which in any case we do not want), or alternatively if we eliminate λ and μ from these equations, we obtain the single equation
8x₁ − 5x₂ + x₃ = 0
to replace the first three. The second set of three equations, which describe V, are
x₁ = α + β + γ,
x₂ = α + 2β + 3γ,
x₃ = −α + β + 3γ.
If we solve these three equations for the three "unknowns" α, β and γ (which in any case we do not want), or alternatively if we eliminate α, β and γ from these equations, we obtain the single equation
3x₁ − 2x₂ + x₃ = 0
to describe V. Introducing the parameter t, it is straightforward to write the solution of these two equations as the line
(x₁, x₂, x₃) = t(3, 5, 1). □

Now we move on to spans of finite sets of vectors. There is a simple algorithm for choosing a maximal linearly independent subset of a given set of vectors. Write the given vectors as columns in a matrix. Then transform the matrix with row transformations into row echelon form. The vectors which correspond to the columns where the "stairs" begin form a maximal linearly independent set. To justify this, just think of the system of linear equations describing that a linear combination of the given vectors is zero. The matrix of the system is exactly the one described.
If you put into the system only the vectors corresponding to the columns where the stairs are, you get a system which can be transformed into one in row echelon form with non-zero numbers on the diagonal, and thus the only solution of the system is zero; that is, the vectors are linearly independent. Similarly, the system together with any

Steinitz Exchange Lemma

For every finite basis v of a vector space V and every set of linearly independent vectors u_i, i = 1, ..., k, in V we can find a subset of the basis vectors v_i which will complete the set of the u_i's into a new basis.

2.3.9. Corollaries of the Steinitz lemma. Because of the possibility of freely choosing and replacing basis vectors we can immediately derive nice (and intuitively expected) properties of bases of vector spaces:

Proposition. (1) Every two bases of a finite dimensional vector space have the same number of elements, that is, our definition of dimension is basis-independent.
(2) If V has a finite basis, then every linearly independent set can be extended to a basis.
(3) A basis of a finite dimensional vector space is a maximal linearly independent set of vectors.
(4) The bases of a vector space are the minimal sets of generators.

A little more complicated, but now easy to deal with, is the situation of dimensions of subspaces and their sums:

Corollary. Let W, W₁, W₂ ⊂ V be subspaces of a space V of finite dimension. Then
(1) dim W ≤ dim V,
(2) V = W if and only if dim V = dim W,
(3) dim W₁ + dim W₂ = dim(W₁ + W₂) + dim(W₁ ∩ W₂).

Proof. It remains to prove only the last claim. This is evident if the dimension of one of the spaces is zero. Assume dim W₁ = r ≥ 1, dim W₂ = s ≥ 1, and let (w₁, ..., w_t) be a basis of W₁ ∩ W₂ (or the empty set, if the intersection is trivial). According to the Steinitz exchange lemma, this basis of the intersection can be extended to a basis (w₁, ..., w_t, u_{t+1}, ..., u_r) of W₁ and to a basis (w₁, ..., w_t, v_{t+1}, ..., v_s) of W₂.
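The "stairs" algorithm for extracting a maximal linearly independent subset can be sketched in code (exact rational elimination; the function name is ours). Applied to the vectors u₁ = (−1,3,−2,1), u₂ = (2,−1,−1,2), u₃ = (−4,7,−3,0), u₄ = (1,5,−5,4) of exercise 2.C.16, the stairs begin in columns 1, 2 and 4:

```python
from fractions import Fraction

def pivot_columns(vectors):
    """Columns of the echelon form where a 'stair' begins; the corresponding
    input vectors form a maximal linearly independent subset."""
    # write the vectors as the columns of a matrix
    m = [[Fraction(v[i]) for v in vectors] for i in range(len(vectors[0]))]
    pivots, r = [], 0
    for c in range(len(vectors)):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue                      # no stair in this column
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return pivots

u = [(-1, 3, -2, 1), (2, -1, -1, 2), (-4, 7, -3, 0), (1, 5, -5, 4)]
print(pivot_columns(u))  # [0, 1, 3] — i.e. u1, u2 and u4 (0-based column indices)
```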
The vectors w₁, ..., w_t, u_{t+1}, ..., u_r, v_{t+1}, ..., v_s clearly generate W₁ + W₂. We show that they are linearly independent. Let
a₁w₁ + ··· + a_tw_t + b_{t+1}u_{t+1} + ··· + b_ru_r + c_{t+1}v_{t+1} + ··· + c_sv_s = 0.
Then necessarily
−(c_{t+1}·v_{t+1} + ··· + c_s·v_s) = a₁·w₁ + ··· + a_t·w_t + b_{t+1}·u_{t+1} + ··· + b_r·u_r
must belong to W₂ ∩ W₁. This implies that b_{t+1} = ··· = b_r = 0, since this is the way we have defined our bases. Then also
a₁·w₁ + ··· + a_t·w_t + c_{t+1}·v_{t+1} + ··· + c_s·v_s = 0

100 CHAPTER 2. ELEMENTARY LINEAR ALGEBRA

of the vectors which correspond to the "no stair" columns has a lower rank than the number of variables (the coefficients of a linear combination), and thus according to 2.3.5 it has a nontrivial (non-zero) solution.

2.C.16. Determine the vector subspace (of the space R⁴) generated by the vectors u₁ = (−1,3,−2,1), u₂ = (2,−1,−1,2), u₃ = (−4,7,−3,0), u₄ = (1,5,−5,4), by choosing a maximal set of linearly independent vectors u_i (that is, by choosing a basis).

Solution. Write the vectors u_i into the columns of a matrix and transform it using elementary row transformations:

(−1  2 −4  1)     (−1  2 −4  1)     (−1  2 −4  1)
( 3 −1  7  5)  ∼  ( 0  5 −5  8)  ∼  ( 0  5 −5  8)
(−2 −1 −3 −5)     ( 0 −5  5 −7)     ( 0  0  0  1)
( 1  2  0  4)     ( 0  4 −4  5)     ( 0  0  0  0)

According to the algorithm, it follows that the vectors corresponding to the columns where the stairs begin, namely the vectors u₁, u₂ and u₄, form a maximal linearly independent set (indeed, u₃ = 2u₁ − u₂). □

Remark. Note that the maximal set of linearly independent vectors is not unique. Unique is only the number of vectors in it (the dimension of the vector space generated by the given vectors). For example, from the vectors (1,0), (0,1), (1,1) you can pick any two to form a maximal linearly independent set; from the vectors (1,0), (2,0), (0,1) you cannot pick the pair (1,0), (2,0). This fact is also reflected in the algorithm, because it is independent of the order in which you put the given vectors as columns into the matrix.

2.C.17.
Find a basis of the subspace
U = span{ (1 2; 3 4; 5 6), (0 1; 2 3; 4 5), (−1 0; 1 2; 3 4), (−2 −1; 0 1; 2 3) }
of the vector space of real 3×2 matrices (each matrix is written here row by row). Extend this basis to a basis of the whole space.

and because the corresponding vectors form a basis of W₂, all the coefficients are zero. Claim (3) now follows by directly counting the generators. □

2.3.10. Examples. (1) K^n has (as a vector space over K) dimension n. The n-tuple of vectors
((1, 0, ..., 0), (0, 1, ..., 0), ..., (0, ..., 0, 1))
is clearly a basis, called the standard basis of K^n. Note that in the case of a finite field of scalars, say with k elements, k prime, the whole space K^n has only a finite number k^n of elements.
(2) C as a vector space over R has dimension 2. A basis is for instance the pair of numbers 1 and i, or any other two complex numbers which are not a real multiple of each other, e.g. 1 + i and 1 − i.
(3) K_m[x], that is, the space of all polynomials with coefficients in K of degree at most m, has dimension m + 1. A basis is for instance the sequence 1, x, x², ..., x^m. The vector space of all polynomials K[x] has dimension ∞, but we can still find a basis (although infinite in size): 1, x, x², ....
(4) The vector space R over Q has dimension ∞. It does not have a countable basis.
(5) The vector space of all mappings f : R → R also has dimension ∞. It does not have any countable basis.

2.3.11. Vector coordinates. If we fix a basis (v₁, ..., v_n) of a finite dimensional space V, then every vector w ∈ V can be expressed as a linear combination w = a₁v₁ + ··· + a_nv_n in a unique way. Indeed, assume that we can do it in two ways:
w = a₁v₁ + ··· + a_nv_n = b₁v₁ + ··· + b_nv_n.
Then
0 = (a₁ − b₁)·v₁ + ··· + (a_n − b_n)·v_n,
and thus a_i = b_i for all i = 1, ..., n, because the vectors v_i are linearly independent. We have reached the concept of coordinates:

Coordinates of vectors

Definition.
The coefficients of the unique linear combination expressing the given vector w ∈ V in the chosen basis v = (v₁, ..., v_n) are called the coordinates of the vector w in this basis.

Whenever we speak about coordinates (a₁, ..., a_n) of a vector w, which we express as a sequence, we must have a fixed ordering of the basis vectors v = (v₁, ..., v_n). Although we have defined a basis as a minimal set of generators, in reality we work with bases as with sequences (that is, with ordered sets).

Solution. Recall that a basis of a subspace is a set of linearly independent vectors which generate the given subspace. By writing the entries of the matrices in a row, we can consider the matrices as vectors in R⁶. In this way, the four given matrices can be identified with the rows of the matrix
( 1  2  3  4  5  6)
( 0  1  2  3  4  5)
(−1  0  1  2  3  4)
(−2 −1  0  1  2  3)
It is easy to show that this matrix has rank 2, and hence that the subspace U is generated by just the first two matrices, which consequently form a basis of U. In fact, it follows easily that
(−1 0; 1 2; 3 4) = −1·(1 2; 3 4; 5 6) + 2·(0 1; 2 3; 4 5),
(−2 −1; 0 1; 2 3) = −2·(1 2; 3 4; 5 6) + 3·(0 1; 2 3; 4 5).

There are many options for extending this basis to a basis of the whole space. One option is to choose the first two of the given matrices together with the last four (actually, any four would do) of the six linearly independent matrices
(1 0; 0 0; 0 0), (0 1; 0 0; 0 0), (0 0; 1 0; 0 0), (0 0; 0 1; 0 0), (0 0; 0 0; 1 0), (0 0; 0 0; 0 1).
Linear independence of these six matrices is established by computing the corresponding determinant, which equals 1 ≠ 0. Clearly the dimension is 6, so spanning is automatic, and hence we have a basis. □

How can we describe simple mappings analytically? For example, how can we describe a rotation, an axial symmetry, a mirror symmetry, or a projection of a three-dimensional space onto a two-dimensional one, in the plane or in space? How can we describe the scaling of a diagram? What do they have in common? These are all linear mappings.
This means that they preserve a certain structure of the space or a subspace. What structure? The structure of a vector space. Every point in the plane is described by two coordinates, every point in 3-dimensional space is described by three coordinates. If we fix the origin, then it makes sense to say that a point is in some direction twice as far from the origin as some other point.

Assigning coordinates to vectors

The mapping assigning to the vector v = a₁v₁ + ··· + a_nv_n its coordinates in the basis v will be denoted by the same symbol v : V → K^n. It has the following properties:
(1) v(u + w) = v(u) + v(w), for all u, w ∈ V,
(2) v(a·u) = a·v(u), for all a ∈ K and all u ∈ V.

Note that the operations on the two sides of these equations are not identical. Quite the opposite; they are operations on different vector spaces!

Sometimes it is really useful to understand vectors as mappings from a fixed set of independent generators to coordinates (without having the generators ordered). In this way, we may think about bases M of infinite dimensional vector spaces V. Even though the set M may be infinite, there can be only a finite number of non-zero values for any mapping representing a vector. The vector space K_∞[x] of all polynomials is a good example, with the basis M = {1, x, x², ...}.

2.3.12. Linear mappings. The above properties of the assignment of coordinates are typical for what we have called linear mappings in the geometry of the plane R². For any vector space (of finite or infinite dimension) we define "linearity" of a mapping between spaces in a similar way to the case of the plane R².

Linear mappings

Let V and W be vector spaces over the same field of scalars K. The mapping f : V → W is called a linear mapping, or homomorphism, if the following holds:
(1) f(u + v) = f(u) + f(v), for all u, v ∈ V,
(2) f(a·u) = a·f(u), for all a ∈ K and all u ∈ V.

We have seen such mappings already in the case of matrix multiplication: x ↦ A·x, with a fixed matrix A of the type m/n over K.
The image of a linear mapping, Im f = f(V) ⊂ W, is always a vector subspace, since for any set of vectors u_i, a linear combination of the images f(u_i) is the image of the linear combination of the vectors u_i with the same coefficients. Analogously, the set of all vectors Ker f = f⁻¹({0}) ⊂ V is a subspace, since a linear combination of vectors with zero images will always have a zero image. The subspace Ker f is called the kernel of the linear mapping f. A linear mapping which is a bijection is called an isomorphism.

Analogously to the abstract definition of vector spaces, it is again necessary to prove seemingly trivial claims that follow from the axioms:

We also know where we arrive if we translate or shift by some amount in a given direction and then by some other amount in another direction. These properties can be formalized – we speak of vectors in the plane or in space, and we consider their scalar multiplication and addition.

Linear mappings have the property that the image of a sum of vectors is the sum of the images of the vectors, and the image of a multiple of a vector is the same multiple of the image of the vector. These properties are shared by the mappings stated at the beginning of this paragraph. Such a mapping is then uniquely determined by its behaviour on the vectors of a basis. (In the plane, a basis consists of two vectors not on the same line. In space, a basis consists of three vectors not all in the same plane.)

How can we write down a linear mapping f on a vector space V? For simplicity, we start with the plane R². Assume that the image of the point (vector) (1,0) is (a, b) and the image of the point (vector) (0,1) is (c, d). This uniquely determines the image of an arbitrary point with coordinates (u, v):
f((u, v)) = f(u(1,0) + v(0,1)) = u f(1,0) + v f(0,1) = (ua, ub) + (vc, vd) = (au + cv, bu + dv).
This can be written down more efficiently as follows:
(a c) (u)   (au + cv)
(b d) (v) = (bu + dv).
A linear mapping is thus a mapping uniquely determined (in a fixed basis) by a matrix. Furthermore, when we have another linear mapping g given by the matrix (e f; g h), then we can easily compute (an interested reader can fill in the details) that the composition g ∘ f is given by the matrix
(e f) (a c)   (ea + fb  ec + fd)
(g h) (b d) = (ga + hb  gc + hd).
This leads us to the definition of matrix multiplication in exactly this way. That is, the application of a mapping to a vector is given by the multiplication of the matrix of the mapping with the given vector, and the composition of mappings is given by the product of the corresponding matrices. This works analogously in spaces of higher dimension.

Further, this again shows what has already been proven in (2.1.5), namely, that matrix multiplication is associative but not commutative, just as with composition of mappings. That is another motivation to study vector spaces. Recall that already in the first chapter we worked with the matrices of some linear mappings in the plane R², notably with the rotation around a point and with axial symmetry (see 1.5.8 and 1.5.9).

Proposition. Let f : V → W be a linear mapping between two vector spaces over the same field of scalars K. The following holds for all vectors u, u₁, ..., u_k ∈ V and scalars a₁, ..., a_k ∈ K:
(1) f(0) = 0,
(2) f(−u) = −f(u),
(3) f(a₁·u₁ + ··· + a_k·u_k) = a₁·f(u₁) + ··· + a_k·f(u_k),
(4) for every vector subspace V₁ ⊂ V, its image f(V₁) is a vector subspace in W,
(5) for every vector subspace W₁ ⊂ W, the set f⁻¹(W₁) = {v ∈ V; f(v) ∈ W₁} is a vector subspace in V.

Proof. We rely on the axioms, definitions and already proved results (in case you are not sure what has been used, look it up!):
f(0) = f(u − u) = f((1 − 1)·u) = 0·f(u) = 0,
f(−u) = f((−1)·u) = (−1)·f(u) = −f(u).
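That composition of mappings corresponds to matrix multiplication can be checked on the plane rotations from 1.5.8: composing rotations by angles a and b must give the rotation by a + b. A short numerical sketch (helper names ours):

```python
import math

def matmul2(A, B):
    """Product of two 2x2 matrices, row by column."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rot(t):
    """Matrix of the plane rotation by the angle t."""
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

a, b = 0.7, 0.5
C = matmul2(rot(a), rot(b))  # composition of the two rotations
D = rot(a + b)               # rotation by the sum of the angles
same = all(abs(C[i][j] - D[i][j]) < 1e-12 for i in range(2) for j in range(2))
print(same)  # True
```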
Property (3) is derived easily from the definition for two summands, using induction on the number of summands. Next, (3) implies span f(V₁) = f(V₁), thus it is a vector subspace. On the other hand, if f(u) ∈ W₁ and f(v) ∈ W₁, then for any scalars a, b we arrive at f(a·u + b·v) = a·f(u) + b·f(v) ∈ W₁. □

2.3.13. Proposition (Simple corollaries). (1) The composition g ∘ f : V → Z of two linear mappings f : V → W and g : W → Z is again a linear mapping.
(2) The linear mapping f : V → W is an isomorphism if and only if Im f = W and Ker f = {0} ⊆ V. The inverse mapping of an isomorphism is again an isomorphism.
(3) For any two subspaces V₁, V₂ ⊆ V and a linear mapping f : V → W,
f(V₁ + V₂) = f(V₁) + f(V₂),
f(V₁ ∩ V₂) ⊆ f(V₁) ∩ f(V₂).
(4) The "coordinate assignment" mapping u : V → Kⁿ given by an arbitrarily chosen basis u = (u₁, ..., uₙ) of a vector space V is an isomorphism.
(5) Two finite dimensional vector spaces are isomorphic if and only if they have the same dimension.
(6) The composition of two isomorphisms is an isomorphism.

Proof. Proving the first claim is a very easy exercise.

2.3.14. Matrix of a linear mapping. Consider a linear mapping f : V → W. For every choice of bases u = (u₁, ..., uₙ) on V and v = (v₁, ..., vₘ) on W there are the following linear mappings, as shown in the diagram:

  V ────f────> W
  u│            │v
  Kⁿ ──f_{u,v}──> Kᵐ

The bottom arrow f_{u,v} is defined by the remaining three, i.e. as the composition of linear mappings f_{u,v} = v ∘ f ∘ u⁻¹.

Matrix of a linear mapping

Every linear mapping is uniquely determined by its values on an arbitrary set of generators, in particular on the vectors of a basis u. Denote

f(u₁) = a₁₁·v₁ + a₂₁·v₂ + ··· + a_{m1}·vₘ,
f(u₂) = a₁₂·v₁ + a₂₂·v₂ + ··· + a_{m2}·vₘ,
...
f(uₙ) = a_{1n}·v₁ + a_{2n}·v₂ + ··· + a_{mn}·vₘ,

that is, the scalars a_{ij} form a matrix A, whose columns are the coordinates of the values f(u_j) of the mapping f on the basis vectors, expressed in the basis v on the target space W. The matrix A = (a_{ij}) is called the matrix of the mapping f in the bases u, v.
For a general vector u = x₁u₁ + ··· + xₙuₙ ∈ V we calculate (recall that vector addition is commutative and distributive with respect to scalar multiplication)

f(u) = x₁f(u₁) + ··· + xₙf(uₙ)
     = x₁(a₁₁v₁ + ··· + a_{m1}vₘ) + ··· + xₙ(a_{1n}v₁ + ··· + a_{mn}vₘ)
     = (x₁a₁₁ + ··· + xₙa_{1n})v₁ + ··· + (x₁a_{m1} + ··· + xₙa_{mn})vₘ.

Using matrix multiplication we can now very easily and clearly write down the values of the mapping f_{u,v}(w) defined uniquely by the previous diagram. Recall that vectors in Kⁿ are understood as columns, that is, matrices of the type n/1:

f_{u,v}(u(w)) = v(f(w)) = A·u(w).

On the other hand, if we have fixed bases on V and W, then every choice of a matrix A of the type m/n gives a unique linear mapping Kⁿ → Kᵐ and thus also a mapping

2.C.19. Find the matrix of the rotation in the positive sense through the angle π/3 about the line passing through the origin with the oriented directional vector (1,1,0), under the standard basis of R³.

Solution. The given rotation is easily obtained by composing these three mappings:
• rotation through the angle π/4 in the negative sense about the z axis (the axis of the rotation goes over to the x axis);
• rotation through the angle π/3 in the positive sense about the x axis;
• rotation through the angle π/4 in the positive sense about the z axis (the x axis goes over to the axis of the rotation).
The matrix of the resulting rotation is the product of the matrices corresponding to the given three mappings, where the order of the matrices is given by the order of application of the mappings: the first mapping applied is the rightmost one in the product. Thus we obtain the desired matrix A:
      ( √2/2 −√2/2  0 ) ( 1   0     0   ) ( √2/2  √2/2  0 )
  A = ( √2/2  √2/2  0 ) ( 0  1/2  −√3/2 ) (−√2/2  √2/2  0 )
      ( 0     0     1 ) ( 0  √3/2  1/2  ) ( 0     0     1 )

      ( 3/4    1/4   √6/4 )
    = ( 1/4    3/4  −√6/4 )
      (−√6/4   √6/4  1/2  ).

Note that the resulting rotation could also be obtained, for instance, by taking the composition of the three following mappings:
• rotation through the angle π/4 in the positive sense about the z axis (the axis of rotation goes over to the y axis);
• rotation through the angle π/3 in the positive sense about the y axis;
• rotation through the angle π/4 in the negative sense about the z axis (the y axis goes over to the axis of rotation).
Analogously we obtain

  ( √2/2  √2/2  0 ) ( 1/2   0  √3/2 ) ( √2/2 −√2/2  0 )   ( 3/4    1/4   √6/4 )
  (−√2/2  √2/2  0 ) ( 0     1  0    ) ( √2/2  √2/2  0 ) = ( 1/4    3/4  −√6/4 )
  ( 0     0     1 ) (−√3/2  0  1/2  ) ( 0     0     1 )   (−√6/4   √6/4  1/2  ). □

f : V → W. We have found the bijective correspondence between matrices of the fixed types (determined by the dimensions of V and W) and linear mappings V → W.

2.3.15. Coordinate transition matrix. If we choose V = W to be the same space, but with two different bases u, v, and consider the identity mapping for f, then the approach from the previous paragraph expresses the vectors of the basis u in coordinates with respect to the basis v. Let the resulting matrix be T. Thus, we are applying the concept of the matrix of a linear mapping to the special case of the identity mapping id_V:

  V ────id_V────> V
  u│               │v
  Kⁿ ──T=(id_V)_{u,v}──> Kⁿ

The resulting matrix T is called the coordinate transition matrix for changing the basis from u to the basis v. The fact that the matrix T of the identity mapping yields exactly the transformation of coordinates between the two bases is easily seen. Consider the expression of u in the basis u,

u = x₁u₁ + ··· + xₙuₙ,

and replace the vectors uᵢ by their expressions as linear combinations of the vectors vᵢ of the basis v. Collecting the terms properly, we obtain the coordinate expression y = (y₁, ..., yₙ) of the same vector u in the basis v. It is enough just to reorder the summands and collect the scalars at the individual vectors of the basis. But this is exactly what we do when forming the matrix of the identity mapping, thus y = T·x.
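The composition in 2.C.19 can be checked numerically. The sketch below (illustrative code, not part of the book) builds the three rotation matrices with the usual right-hand convention, multiplies them in the order of application, and compares the result with the closed-form matrix for the rotation by π/3 about (1,1,0).

```python
# Sketch: verify 2.C.19 by composing Rz(pi/4) . Rx(pi/3) . Rz(-pi/4).
import math

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def mul(B, A):
    return [[sum(B[i][k]*A[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

A = mul(rot_z(math.pi/4), mul(rot_x(math.pi/3), rot_z(-math.pi/4)))

r6 = math.sqrt(6)/4
expected = [[3/4, 1/4,  r6],
            [1/4, 3/4, -r6],
            [-r6,  r6, 1/2]]

assert all(abs(A[i][j] - expected[i][j]) < 1e-12
           for i in range(3) for j in range(3))

# the axis (1,1,0) is fixed by the rotation
axis = [1, 1, 0]
img = [sum(A[i][k]*axis[k] for k in range(3)) for i in range(3)]
assert all(abs(img[i] - axis[i]) < 1e-12 for i in range(3))
```

The alternative composition through the y axis can be verified the same way by swapping in a `rot_y` factor.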
We have arrived at the following instruction for building the coordinate transition matrix:

Calculating the matrix for changing the basis

Proposition. The matrix T for the transition from the basis u to the basis v is obtained by taking the coordinates of the vectors of the basis u expressed in the basis v and writing them as the columns of the matrix T. The new coordinates y in the basis v of a vector with coordinates x in the original basis u are then y = T·x.

Because the inverse mapping to the identity mapping is again the identity mapping, the coordinate transition matrix is always invertible and its inverse T⁻¹ is the coordinate transition matrix in the opposite direction, that is, from the basis v to the basis u (just have a look at the diagram above and invert all the arrows).

2.3.16. More coordinates. Next, we are interested in the matrix of a composition of linear mappings. Thus, consider another vector space Z over K of dimension k with basis w, a linear mapping g : W → Z, and denote the corresponding matrix by g_{v,w}.

2.C.20. Matrix of a general rotation in R³. Derive the matrix of a general rotation in R³.

Solution. We can proceed as in the previous example, with general values. Consider an arbitrary unit vector (x, y, z). The rotation in the positive sense through the angle φ about this vector can be written as a composition of the following rotations, whose matrices we already know:

i) the rotation R₁ in the negative sense about the z axis through the angle with cosine x/√(x² + y²) = x/√(1 − z²) and sine y/√(1 − z²), under which the line with the directional vector (x, y, z) goes over to the line with the directional vector (√(1 − z²), 0, z).
The matrix of this rotation is

       ( x/√(1−z²)   y/√(1−z²)  0 )
  R₁ = (−y/√(1−z²)   x/√(1−z²)  0 )
       ( 0            0          1 );

ii) the rotation R₂ in the positive sense about the y axis through the angle with cosine √(1 − z²) and sine z, under which the line with the directional vector (√(1 − z²), 0, z) goes over to the line with the directional vector (1, 0, 0). The matrix of this rotation is

       ( √(1−z²)  0  z       )
  R₂ = ( 0        1  0       )
       (−z        0  √(1−z²) );

iii) the rotation R₃ in the positive sense about the x axis through the angle φ, with the matrix

       ( 1  0       0      )
  R₃ = ( 0  cos(φ) −sin(φ) )
       ( 0  sin(φ)  cos(φ) ).

The desired rotation is then the composition R₁⁻¹ ∘ R₂⁻¹ ∘ R₃ ∘ R₂ ∘ R₁; see also 2.C.25 below.

Add the corresponding coordinate arrow to Kᵏ at the bottom of the diagram and calculate directly (we write A for the matrix of f and B for the matrix of g in the chosen bases):

g_{v,w} ∘ f_{u,v}(x) = (w ∘ g ∘ v⁻¹) ∘ (v ∘ f ∘ u⁻¹)(x) = B·(A·x) = (B·A)·x = (g ∘ f)_{u,w}(x)

for every x ∈ Kⁿ. By the associativity of matrix multiplication, the composition of mappings corresponds to the multiplication of the corresponding matrices. Note that the isomorphisms correspond exactly to invertible matrices and that the matrix of the inverse mapping is the inverse matrix. The same approach shows how the matrix of a linear mapping changes if we change the coordinates on both the domain and the codomain:

  V ────f────> W
  T↑            ↑S
  V'───f'───> W'

where T is the coordinate transition matrix from u' to u and S is the coordinate transition matrix from v' to v. If A is the original matrix of the mapping, then the matrix of the new mapping is given by A' = S⁻¹AT. In the special case of a linear mapping f : V → V, that is, when the domain and the codomain are the same space V, we usually express f in terms of a single basis u of the space V. Then the change from the old basis to the new basis u' with the coordinate transition matrix T leads to the new matrix A' = T⁻¹AT.

2.3.17. Linear forms. A simple but very important case of linear mappings on an arbitrary vector space V over the scalars K appears when the codomain is the scalars themselves, i.e. mappings f : V → K. We call them linear forms.
If we are given coordinates on V, the assignment of the single i-th coordinate to a vector is an example of a linear form. More precisely, for every choice of a basis v = (v₁, ..., vₙ) there are the linear forms vᵢ* : V → K such that vᵢ*(vⱼ) = δ_{ij}, that is, vᵢ*(vⱼ) = 1 when i = j and vᵢ*(vⱼ) = 0 when i ≠ j. The vector space of all linear forms on V is denoted by V* and we call it the dual space of the vector space V. Let us now assume that the vector space V has finite dimension n. The basis v* = (v₁*, ..., vₙ*) of V*, composed of the assignments of the individual coordinates as above, is called the dual basis to v. Clearly this is a basis of the space V*, because these forms are evidently linearly independent (prove

We first have to understand what happens with the coordinates of vectors (think of this as the change of the coordinate system of an observer). The key to all this is the transition matrix (see 2.3.15). We further write e for the standard basis, that is, the vectors ((1,0,0), (0,1,0), (0,0,1)). (These could be any three linearly independent vectors in a vector space; by naming them as we did, we identified the vector space with R³.)

2.C.21. A vector has coordinates (1, 2, 3) in the standard basis e. What are its coordinates in the basis u = ((1,1,0), (1,−1,2), (3,1,5))?

Solution. We first write the transition matrix T from u to the standard basis. We just write the coordinates of the vectors which form the basis u into the columns:

      ( 1  1  3 )
  T = ( 1 −1  1 )
      ( 0  2  5 ).

For expressing the sought coordinates, however, we need the transition matrix from the standard basis to u. No problem, it is just T⁻¹ (see 2.3.15 if you have not done so yet). We already know how to compute the inverse matrix (see 2.1.10). Finally, the sought coordinates are

  T⁻¹·(1, 2, 3)ᵀ = (−7/6, −11/6, 4/3)ᵀ. □

Similarly we work with the matrix of a linear mapping.

2.C.22.
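The coordinates in 2.C.21 are the solution of the linear system T·c = (1, 2, 3), where the columns of T are the vectors of the basis u. The following sketch (illustrative code, not part of the book) solves this 3×3 system by Cramer's rule with exact fractions.

```python
# Sketch: coordinates in a new basis via T . c = x, solved by Cramer's rule.
from fractions import Fraction

def det3(M):
    return (M[0][0]*(M[1][1]*M[2][2] - M[1][2]*M[2][1])
          - M[0][1]*(M[1][0]*M[2][2] - M[1][2]*M[2][0])
          + M[0][2]*(M[1][0]*M[2][1] - M[1][1]*M[2][0]))

def solve3(T, b):
    d = det3(T)
    c = []
    for j in range(3):
        # replace the j-th column of T by the right-hand side b
        Mj = [[T[i][k] if k != j else b[i] for k in range(3)] for i in range(3)]
        c.append(Fraction(det3(Mj), d))
    return c

# columns of T are the vectors of the basis u from 2.C.21
T = [[1, 1, 3],
     [1, -1, 1],
     [0, 2, 5]]
c = solve3(T, [1, 2, 3])
assert c == [Fraction(-7, 6), Fraction(-11, 6), Fraction(4, 3)]

# sanity check: T . c reproduces the original coordinates (1, 2, 3)
assert [sum(T[i][j]*c[j] for j in range(3)) for i in range(3)] == [1, 2, 3]
```

For larger systems one would use Gaussian elimination instead, but for a 3×3 transition matrix Cramer's rule keeps the example short.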
We are given a linear mapping R³ → R³ in the standard basis as the following matrix:

      ( …  …  … )
  A = ( …  …  … )
      ( 2  0  0 ).

Write down the matrix of this mapping in the basis

(f₁, f₂, f₃) = ((1,1,0), (−1,1,1), (2,0,1)).

Solution. Again, the transition matrix T for changing the basis from the basis f = (f₁, f₂, f₃) to the standard basis e can be obtained by writing down the coordinates of the vectors f₁, f₂, f₃ in the standard basis as the columns of the matrix T. Thus we have

      ( 1 −1  2 )
  T = ( 1  1  0 )
      ( 0  1  1 ).

it!), and if α ∈ V* is an arbitrary form, then for every vector u = x₁v₁ + ··· + xₙvₙ

α(u) = x₁α(v₁) + ··· + xₙα(vₙ) = α(v₁)v₁*(u) + ··· + α(vₙ)vₙ*(u),

and thus the linear form α is a linear combination of the forms vᵢ*. Taking into account the standard basis {1} on the one-dimensional space of scalars K, any choice of a basis v on V identifies the linear forms α with matrices of the type 1/n, that is, with rows y. The components of these rows are the coordinates of the general linear forms α in the dual basis v*. Evaluating such a form on a vector is then given by multiplying the corresponding row vector y with the column of the coordinates x of the vector u ∈ V in the basis v:

α(u) = y·x = y₁x₁ + ··· + yₙxₙ.

Thus we can see that for every finite dimensional space V, the dual space V* is isomorphic to the space V. The choice of the dual basis provides such an isomorphism. In this context we meet again the scalar product of a row of n scalars with a column of n scalars. We have worked with it already in paragraph 2.1.3 on page 70. The situation is different for infinite dimensional spaces. For instance, the simplest example of the space of all polynomials K[x] in one variable is a vector space with a countable basis with elements vᵢ = xⁱ. As before, we can define the linearly independent forms vᵢ*. Every formal infinite sum Σ_{i=0}^∞ aᵢvᵢ* is now a well-defined linear form on K[x], because it is evaluated only on finite linear combinations of the basis polynomials xⁱ, i = 0, 1, 2, ....
The countable set of all vᵢ* is thus not a basis. Actually, it can be proved that this dual space cannot have a countable basis at all.

2.3.18. The length of vectors and the scalar product. When dealing with the geometry of the plane R² in the first chapter, we also needed the concepts of the length of vectors and of their angles, see 1.5.7. For defining these concepts we used the scalar product of two vectors u = (x, y) and v = (x', y') in the form u·v = xx' + yy'. Indeed, the expression for the length of v = (x, y) is given by

‖v‖ = √(x² + y²) = √(v·v),

while the (oriented) angle φ of two vectors u = (x, y) and v = (x', y') is given in planar geometry by the formula

cos φ = (xx' + yy') / (‖u‖ ‖v‖).

Note that this scalar product is linear in each of its arguments, and we denote it by u·v or by ⟨u, v⟩. The scalar product defined in this way is symmetric in its arguments and of

The transition matrix for changing the basis from the standard basis to the basis f is then the inverse of T:

        ( 1/4  3/4 −1/2 )
  T⁻¹ = (−1/4  1/4  1/2 )
        ( 1/4 −1/4  1/2 ).

The matrix of the mapping in the basis f is then given by T⁻¹AT (see 2.2.11). □

2.C.23. Consider the vector space of polynomials in one variable of degree at most 2 with real coefficients. In this space, consider the basis 1, x, x². Write down the matrix of the derivative mapping in this basis, and also in the basis f = (1 + x², x, x + x²).

Solution. First we have to determine the matrix of the derivative mapping (let us denote the mapping by d and its matrix by D). We chose the basis (1, x, x²) as the standard basis e, so we have the coordinates 1 ~ (1,0,0), x ~ (0,1,0) and x² ~ (0,0,1). We look at the images of the basis vectors: d(1) = 0 ~ (0,0,0), d(x) = 1 ~ (1,0,0) and d(x²) = 2x ~ (0,2,0). Now we write the images as the columns of the matrix D:

      ( 0  1  0 )
  D = ( 0  0  2 )
      ( 0  0  0 ).

Now we write the coordinates of the basis vectors of the basis f into the columns, to get the transition matrix from f to e:

      ( 1  0  0 )
  T = ( 0  1  1 )
      ( 1  0  1 ).
As in the previous example, we get the matrix of d in the basis f as

          ( 0  1  1 )
  T⁻¹DT = ( 2  1  3 )
          ( 0 −1 −1 ),

where we had to compute

        ( 1  0  0 )
  T⁻¹ = ( 1  1 −1 )
        (−1  0  1 ). □

course ‖v‖ = 0 if and only if v = 0. We also see immediately that two vectors in the Euclidean plane are perpendicular whenever their scalar product is zero. Now we shall mimic this approach in higher dimensions. First, observe that the angle between two vectors is always a two-dimensional concept (we want the angle to be the same in the two-dimensional subspace containing the two vectors u and v). In the subsequent paragraphs we shall consider only finite dimensional vector spaces over the real scalars R.

Scalar product and orthogonality

A scalar product on a vector space V over the real numbers is a mapping ⟨ , ⟩ : V × V → R which is symmetric in its arguments, linear in each of them, and such that ⟨v, v⟩ ≥ 0, with ‖v‖² = ⟨v, v⟩ = 0 if and only if v = 0. The number ‖v‖ = √⟨v, v⟩ is called the length of the vector v. Vectors v, w ∈ V are called orthogonal or perpendicular whenever ⟨v, w⟩ = 0; we also write v ⊥ w. The vector v is called normalised whenever ‖v‖ = 1. A basis of the space V composed exclusively of mutually orthogonal vectors is called an orthogonal basis. If the vectors in such a basis are moreover all normalised, we call the basis orthonormal.

A scalar product is very often denoted by the common dot, that is, ⟨u, v⟩ = u·v. It is then necessary to recognize from the context whether the dot means the product of two vectors (the result is a scalar) or something different (e.g. we often denote the product of matrices, and the product of scalars, in the same way). Because the scalar product is linear in each of its arguments, it is completely determined by its values on pairs of basis vectors. Indeed, choose a basis u = (u₁, ..., uₙ) of the space V and denote s_{ij} = ⟨uᵢ, uⱼ⟩.
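The change of basis in 2.C.23 can be verified without inverting T by hand: if A' = T⁻¹DT, then T·A' = D·T. The sketch below (illustrative code, not part of the book; the matrix `A_f` is the result reconstructed here) performs exactly this check.

```python
# Sketch: verify A' = T^{-1} D T for 2.C.23 via the equivalent test T . A' = D . T.
def mul3(B, A):
    return [[sum(B[i][k]*A[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

D = [[0, 1, 0],    # d(1) = 0, d(x) = 1, d(x^2) = 2x in the basis (1, x, x^2)
     [0, 0, 2],
     [0, 0, 0]]
T = [[1, 0, 0],    # columns: coordinates of 1 + x^2, x, x + x^2
     [0, 1, 1],
     [1, 0, 1]]
A_f = [[0, 1, 1],  # matrix of d in the basis f = (1 + x^2, x, x + x^2)
       [2, 1, 3],
       [0, -1, -1]]

assert mul3(T, A_f) == mul3(D, T)
```

The identity T·A' = D·T holds precisely because T is invertible, so it pins down A' uniquely.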
Then from the symmetry of the scalar product we know s_{ij} = s_{ji}, and from the linearity of the product in each of its arguments we get, for u = x₁u₁ + ··· + xₙuₙ and v = y₁u₁ + ··· + yₙuₙ,

⟨u, v⟩ = Σ_{i,j} xᵢyⱼ s_{ij} = yᵀ·S·x,

where S = (s_{ij}). If the basis is orthonormal, the matrix S is the unit matrix. This proves the following useful claim:

Scalar product in coordinates

Proposition. For every orthonormal basis, the scalar product is given by the coordinate expression ⟨x, y⟩ = yᵀ·x. For each basis of the space V there is a symmetric matrix S such that the coordinate expression of the scalar product is ⟨x, y⟩ = yᵀ·S·x.

2.C.24. In the standard basis of R³, determine the matrix of the rotation through the angle 90° in the positive sense about the line (t, t, t), t ∈ R, oriented in the direction of the vector (1,1,1). Further, find the matrix of this rotation in the basis g = ((1,1,0), (1,0,−1), (0,1,1)).

Solution. We can easily determine the matrix of the given rotation in a suitable basis, namely in a basis given by the directional vector of the line and by two mutually perpendicular vectors in the plane x + y + z = 0, that is, in the plane of vectors perpendicular to the vector (1,1,1). Note that the matrix of the rotation in the positive sense through 90° in an orthonormal basis of R² is

  ( 0 −1 )
  ( 1  0 ).

In an orthogonal basis whose vectors have lengths k and l respectively, it is

  ( 0  −l/k )
  ( k/l  0  ).

If we choose the perpendicular vectors (1, −1, 0) and (1, 1, −2) in the plane x + y + z = 0, with lengths √2 and √6, then in the basis f = ((1,1,1), (1,−1,0), (1,1,−2)) the rotation we are looking for has the matrix

  ( 1  0     0  )
  ( 0  0   −√3  )
  ( 0  1/√3  0  ).

In order to obtain the matrix of the rotation in the standard basis, it is enough to change the basis. The transition matrix T for changing the basis from the basis f to the standard basis is obtained by writing the coordinates (in the standard basis) of the vectors of the basis f as the columns of T:

      ( 1  1  1 )
  T = ( 1 −1  1 )
      ( 1  0 −2 ).
Finally, for the desired matrix R we have

          ( 1  0     0  )
  R = T · ( 0  0   −√3  ) · T⁻¹
          ( 0  1/√3  0  )

      ( 1/3         1/3 − √3/3  1/3 + √3/3 )
    = ( 1/3 + √3/3  1/3         1/3 − √3/3 )
      ( 1/3 − √3/3  1/3 + √3/3  1/3        ).

This result can be checked by substituting into the matrix of the general rotation (2.C.20). By normalizing the vector (1,1,1) we obtain (x, y, z) = (1/√3, 1/√3, 1/√3), and cos(φ) = 0, sin(φ) = 1. □

2.C.25. Matrix of a general rotation revisited. We derive the matrix of a general rotation from (2.C.20) through the angle φ.
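The matrix R of 2.C.24 can be sanity-checked numerically: a rotation by 90° about (1,1,1) must be orthogonal, fix its axis, and satisfy R⁴ = E. The sketch below (illustrative code, not part of the book) checks all three properties.

```python
# Sketch: sanity checks for the 90-degree rotation about (1,1,1) from 2.C.24.
import math

s = math.sqrt(3)/3
R = [[1/3,     1/3 - s, 1/3 + s],
     [1/3 + s, 1/3,     1/3 - s],
     [1/3 - s, 1/3 + s, 1/3]]

def mul3(B, A):
    return [[sum(B[i][k]*A[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def close(M, N, eps=1e-12):
    return all(abs(M[i][j] - N[i][j]) < eps for i in range(3) for j in range(3))

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
Rt = [[R[j][i] for j in range(3)] for i in range(3)]   # transpose of R

assert close(mul3(R, Rt), I3)                   # R is orthogonal
assert close(mul3(mul3(R, R), mul3(R, R)), I3)  # 90-degree rotation: R^4 = E
axis = [1, 1, 1]
assert all(abs(sum(R[i][k]*axis[k] for k in range(3)) - axis[i]) < 1e-12
           for i in range(3))                   # the axis is fixed
```

Note also that the trace of R is 1, which agrees with the general identity trace = 1 + 2·cos(φ) for a rotation through φ = 90°.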

2.3.19. Projections. A linear mapping f : V → V on any vector space is called a projection if f ∘ f = f. In such a case, for every vector v ∈ V we can write

v = f(v) + (v − f(v)) ∈ Im(f) + Ker(f) = V,

and if v ∈ Im(f) and f(v) = 0, then also v = 0. Thus the above sum of subspaces is direct. We say that f is the projection onto the subspace W = Im(f) along the subspace U = Ker(f). In words, the projection can be described naturally as follows: we decompose the given vector into a component in W and a component in U, and forget the second one. If V has a scalar product, we say that the projection is orthogonal if its kernel is orthogonal to its image. Every subspace W ⊆ V thus defines an orthogonal projection onto W. It is the projection onto W along W⊥, given by the unique decomposition of every vector u into components u_W ∈ W and u_{W⊥} ∈ W⊥, that is, the linear mapping which sends u_W + u_{W⊥} to u_W.

2.3.20. Existence of orthonormal bases. It is easy to see that on every finite dimensional real vector space there exist scalar products. Just choose any basis and define the lengths so that each basis vector is of unit length; this immediately yields a scalar product for which the chosen basis is orthonormal. In this basis the scalar products of vectors are computed as in the formula in 2.3.18. More often we are given a scalar product on a vector space V, and we want to find an appropriate orthonormal basis for it. We present an algorithm which uses suitable orthogonal projections in order to transform any basis into an orthogonal one. It is called the Gram-Schmidt orthogonalization process.

f = ((x, y, z), (−y, x, 0), (zx, zy, z² − 1)), that is, in the orthogonal basis composed of the directional vector of the axis of rotation and of two mutually perpendicular vectors of length √(1 − z²) lying in the plane perpendicular to the axis of rotation, the matrix corresponding to the rotation is

      ( 1  0       0      )
  A = ( 0  cos(φ) −sin(φ) )
      ( 0  sin(φ)  cos(φ) ).
The conjugation maps 1 ↦ 1 and i ↦ −i; written in coordinates, (1,0) ↦ (1,0) and (0,1) ↦ (0,−1). By writing the images into the columns we obtain the matrix

  ( 1  0 )
  ( 0 −1 ).

In the basis f the conjugation interchanges the basis vectors, that is, (1,0) ↦ (0,1) and (0,1) ↦ (1,0), and the matrix of the conjugation in this basis is

  ( 0  1 )
  ( 1  0 ).

The point of the following procedure is to transform a given sequence of independent generators u₁, ..., u_k of a finite dimensional space V into an orthogonal set of independent generators of V.

Gram-Schmidt orthogonalization

Proposition. Let (u₁, ..., u_k) be a linearly independent k-tuple of vectors of a space V with a scalar product. Then there exists an orthogonal system of vectors (v₁, ..., v_k) such that vᵢ ∈ span{u₁, ..., uᵢ} and span{u₁, ..., uᵢ} = span{v₁, ..., vᵢ} for all i = 1, ..., k. We obtain it by the following procedure:
• The independence of the vectors uᵢ ensures that u₁ ≠ 0; we choose v₁ = u₁.
• If we have already constructed the vectors v₁, ..., v_ℓ with the required properties and if ℓ < k, we choose v_{ℓ+1} = u_{ℓ+1} + a₁v₁ + ··· + a_ℓv_ℓ, where aᵢ = −⟨u_{ℓ+1}, vᵢ⟩ / ‖vᵢ‖².

Proof. We begin with the first (nonzero) vector v₁ = u₁ and in the first step compute v₂ as the orthogonal projection of u₂ to span{v₁}⊥ inside span{u₁, u₂}. The result is nonzero if and only if u₂ is independent of v₁. All other steps are similar: in step ℓ, ℓ > 1, we seek the vector v_{ℓ+1} = u_{ℓ+1} + a₁v₁ + ··· + a_ℓv_ℓ satisfying ⟨v_{ℓ+1}, vᵢ⟩ = 0 for all i = 1, ..., ℓ. This implies

0 = ⟨u_{ℓ+1} + a₁v₁ + ··· + a_ℓv_ℓ, vᵢ⟩ = ⟨u_{ℓ+1}, vᵢ⟩ + aᵢ⟨vᵢ, vᵢ⟩,

and we see that the vectors with the desired properties are determined uniquely up to a scalar multiple. □

Whenever we have an orthogonal basis of a vector space V, we just have to normalise the vectors in order to obtain an orthonormal basis. Thus, starting the Gram-Schmidt orthogonalization with any basis of V, we have proven:

Corollary. On every finite dimensional real vector space with a scalar product there exists an orthonormal basis.
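The procedure of the proposition can be sketched in a few lines. The code below (illustrative, not part of the book; the input vectors are made up) applies the Gram-Schmidt steps with the standard scalar product on R³ and exact fractions, and checks that the output is pairwise orthogonal.

```python
# Sketch: Gram-Schmidt orthogonalization in R^n with the standard scalar product.
from fractions import Fraction

def dot(a, b):
    return sum(x*y for x, y in zip(a, b))

def gram_schmidt(us):
    vs = []
    for u in us:
        v = [Fraction(x) for x in u]
        for w in vs:
            # subtract the orthogonal projection onto w: coefficient <v, w>/||w||^2
            coef = dot(v, w) / dot(w, w)
            v = [vi - coef*wi for vi, wi in zip(v, w)]
        vs.append(v)
    return vs

us = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
vs = gram_schmidt(us)

# the result is pairwise orthogonal and starts with v1 = u1
assert vs[0] == [1, 1, 0]
assert dot(vs[0], vs[1]) == 0
assert dot(vs[0], vs[2]) == 0
assert dot(vs[1], vs[2]) == 0
```

Dividing each vᵢ by its length then yields an orthonormal basis, exactly as in the corollary.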
In an orthonormal basis, coordinates and orthogonal projections are very easy to calculate. Indeed, suppose we have an orthonormal basis (e₁, ..., eₙ) of a space V. Then every vector v = x₁e₁ + ··· + xₙeₙ satisfies

⟨eᵢ, v⟩ = ⟨eᵢ, x₁e₁ + ··· + xₙeₙ⟩ = xᵢ,

and so we can always express

(1) v = ⟨e₁, v⟩e₁ + ··· + ⟨eₙ, v⟩eₙ.

If we are given a subspace W ⊆ V and its orthonormal basis (e₁, ..., e_k), then we can extend it to an orthonormal basis (e₁, ..., eₙ) of V. The orthogonal projection of a general vector v ∈ V onto W is then given by the expression

v ↦ ⟨e₁, v⟩e₁ + ··· + ⟨e_k, v⟩e_k.

b) For the basis (1, i) we obtain 1 ↦ 2 + i and i ↦ −1 + 2i, that is, (1,0) ↦ (2,1) and (0,1) ↦ (−1,2). Thus the matrix of the multiplication by the number 2 + i in the basis (1, i) is

  ( 2 −1 )
  ( 1  2 ).

We determine the matrix in the basis f. The multiplication by (2 + i) gives us: (1 − i) ↦ (1 − i)(2 + i) = 3 − i, (1 + i) ↦ (1 + i)(2 + i) = 1 + 3i. The coordinates (a, b)_f of the vector 3 − i in the basis f are given, as we know, by the equation a·(1 − i) + b·(1 + i) = 3 − i, that is, (3 − i)_f = (2, 1). Analogously, (1 + 3i)_f = (−1, 2). Altogether, we obtain the matrix

  ( 2 −1 )
  ( 1  2 ).

Think about the following: why is the matrix of the multiplication by 2 + i the same in both bases? Would the two matrices be the same for the multiplication by any complex number? □

2.C.27. Determine the matrix A which, in the standard basis of the space R³, gives the orthogonal projection onto the vector subspace generated by the vectors u₁ = (−1, 1, 0) and u₂ = (−1, 0, 1).

Solution. Note first that the given subspace is a plane containing the origin, with normal vector u₃ = (1, 1, 1). The ordered triple (1, 1, 1) is clearly a solution of the system

−x₁ + x₂ = 0,
−x₁ + x₃ = 0,

that is, the vector u₃ is perpendicular to the vectors u₁, u₂. Under the given projection the vectors u₁ and u₂ must map to themselves, and the vector u₃ to the zero vector.
In the basis composed of u₁, u₂, u₃ (in this order), the matrix of this projection is thus

  ( 1  0  0 )
  ( 0  1  0 )
  ( 0  0  0 ).

Using the transition matrix

      (−1 −1  1 )
  T = ( 1  0  1 )
      ( 0  1  1 )

for changing the basis from (u₁, u₂, u₃) to the standard basis, and its inverse T⁻¹ for the transition from the standard basis to (u₁, u₂, u₃), we obtain

          ( 1  0  0 )          (  2/3 −1/3 −1/3 )
  A = T · ( 0  1  0 ) · T⁻¹ =  ( −1/3  2/3 −1/3 )
          ( 0  0  0 )          ( −1/3 −1/3  2/3 ).

In particular, we only need an orthonormal basis of the subspace W in order to write down the orthogonal projection onto W explicitly. Note that, in general, the projection f onto the subspace W along U and the projection g onto U along W are related by g = id_V − f. Thus, when dealing with orthogonal projections onto a given subspace W, it is always more efficient to calculate the orthonormal basis of that one of the spaces W, W⊥ whose dimension is smaller. Note also that the existence of an orthonormal basis guarantees that for every real space V of dimension n with a scalar product there exists a linear mapping which is an isomorphism between V and the space Rⁿ with the standard scalar product (i.e. respecting the scalar products as well). We saw already in Theorem 2.3.18 that the desired isomorphism is exactly the coordinate assignment. In words: in every orthonormal basis the scalar product is computed by the same formula as the standard scalar product in Rⁿ. The constant coefficient is the determinant |A|. We shall see later that this coefficient describes how much the linear mapping scales the volumes. We shall return to the questions of the length of a vector and to projections in the following chapter in a more general context.

2.3.21. Angle between two vectors. As we have already noted, the angle between two linearly independent vectors in space must be the same as when we consider them in the two-dimensional subspace they generate. Basically, this is the reason why the notion of angle is independent of the dimension of the original space.
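The projection matrix of 2.C.27 can also be obtained directly, since projecting onto a plane through the origin means subtracting the component along the normal: A = E − n·nᵀ/‖n‖². The sketch below (illustrative code, not part of the book) rebuilds A this way and checks its characteristic properties.

```python
# Sketch: orthogonal projection onto the plane with normal n = (1,1,1),
# built as A = E - n n^T / ||n||^2, with exact fractions.
from fractions import Fraction

n = [1, 1, 1]
nn = sum(x*x for x in n)                       # ||n||^2 = 3
A = [[Fraction(int(i == j)) - Fraction(n[i]*n[j], nn) for j in range(3)]
     for i in range(3)]

third = Fraction(1, 3)
assert A == [[2*third, -third, -third],
             [-third, 2*third, -third],
             [-third, -third, 2*third]]

def mat_vec(M, x):
    return [sum(M[i][j]*x[j] for j in range(3)) for i in range(3)]

assert mat_vec(A, [-1, 1, 0]) == [-1, 1, 0]    # u1 lies in the plane: fixed
assert mat_vec(A, [-1, 0, 1]) == [-1, 0, 1]    # u2 lies in the plane: fixed
assert mat_vec(A, [1, 1, 1]) == [0, 0, 0]      # the normal is sent to zero
```

A projection must also be idempotent; multiplying A by itself reproduces A, which is a quick extra check.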
If we choose an orthogonal basis such that its first two vectors generate the same subspace as the two given vectors u and v (whose angle we are measuring), we can simply take the definition from planar geometry. Independently of the choice of coordinates, we can formulate the definition as follows:

Angle between two vectors

The angle φ between two vectors v and w in a vector space with a scalar product is given by the relation

cos φ = ⟨v, w⟩ / (‖v‖ ‖w‖).

The angle defined in this way does not depend on the order of the vectors v, w, and it is chosen in the interval 0 ≤ φ ≤ π. We shall return to scalar products and angles between vectors in further chapters.

2.3.22. Multilinear forms. The scalar product was given as a mapping from the product of two copies of a vector space V into the space of scalars which was linear in each of its arguments. Similarly, we shall work with mappings from the product of k copies of a vector space V into the scalars which are linear in each of their k arguments. We speak of k-linear forms.

D. Properties of linear maps

2.D.1. Write down the matrix of the mapping of the orthogonal projection onto the plane passing through the origin and perpendicular to the vector (1, 1, 1).

Solution. The image of an arbitrary point (vector) x = (x₁, x₂, x₃) ∈ R³ under the considered mapping can be obtained by subtracting from the given vector its orthogonal projection onto the normal direction of the considered plane, that is, onto the direction (1, 1, 1). This projection p is given (see (1)) by

p = (⟨x, (1,1,1)⟩ / ‖(1,1,1)‖²)·(1,1,1) = ((x₁+x₂+x₃)/3, (x₁+x₂+x₃)/3, (x₁+x₂+x₃)/3).

The resulting mapping is thus x ↦ x − p, with the matrix

  ( 2/3 −1/3 −1/3 )
  (−1/3  2/3 −1/3 )
  (−1/3 −1/3  2/3 ).

We have (correctly) obtained the same matrix as in the exercise 2.C.27. □

2.D.2. In R³, write down the matrix of the mirror symmetry with respect to the plane containing the origin and having (1, 1, 1) as its normal vector.

Solution.
As in 2.D.1, we get the image of an arbitrary vector x = (x₁, x₂, x₃) ∈ R³ with the help of the orthogonal projection p onto the direction (1, 1, 1). Unlike in the previous example, we need to subtract the projection twice (see the picture). Thus we get

x − 2p = ( (x₁ − 2(x₂ + x₃))/3, (x₂ − 2(x₁ + x₃))/3, (x₃ − 2(x₁ + x₂))/3 ),

with the matrix

  ( 1/3 −2/3 −2/3 )
  (−2/3  1/3 −2/3 )
  (−2/3 −2/3  1/3 ).

Second solution. The normalised normal vector of the mirror plane is n = (1/√3)(1, 1, 1). We can express the mirror image of v under the mirror symmetry Z as follows:

Z(v) = v − 2⟨v, n⟩n = v − 2n·(nᵀ·v) = v − 2(n·nᵀ)·v = (E − 2n·nᵀ)·v

(where we have used ⟨v, n⟩ = nᵀ·v for the standard scalar product, and the associativity of the matrix

Most often we will meet bilinear forms, that is, the case α : V × V → K, where for any four vectors u, v, w, z and scalars a, b, c, d we have

α(au + bv, cw + dz) = ac·α(u, w) + ad·α(u, z) + bc·α(v, w) + bd·α(v, z).

If additionally we always have α(u, w) = α(w, u), then we speak of a symmetric bilinear form. If interchanging the arguments leads to a change of sign, we speak of an antisymmetric bilinear form. Already in planar geometry we defined the determinant as a bilinear antisymmetric form α, that is, α(u, w) = −α(w, u). In general, thanks to Theorem 2.2.5, we know that the determinant in dimension n can be seen as an n-linear antisymmetric form. As with linear mappings, it is clear that every k-linear form is completely determined by its values on all k-tuples of basis elements in a fixed basis. In analogy to linear mappings, we can view these values as k-dimensional analogues of matrices. We show this in the case k = 2, where they correspond to matrices as we have defined them.

Matrix of a bilinear form

If we choose a basis u of V and define, for a given bilinear form α, the scalars a_{ij} = α(uᵢ, uⱼ), then we obtain for vectors v, w with coordinates x, y (as columns of coordinates)

α(v, w) = Σ_{i,j} a_{ij} xᵢ yⱼ = xᵀ·A·y,

where A is the matrix A = (a_{ij}).
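The second solution of 2.D.2 above is a Householder-type reflection, and its defining properties are easy to test. The sketch below (illustrative code, not part of the book) forms Z = E − 2n·nᵀ exactly, using n·nᵀ = (1/3)·(all-ones matrix) to avoid square roots, and checks that Z is an involution which flips the normal and fixes the mirror plane.

```python
# Sketch: the mirror symmetry of 2.D.2 as the Householder matrix Z = E - 2 n n^T.
from fractions import Fraction

# For n = (1,1,1)/sqrt(3) we have n n^T = (1/3) * (matrix of all ones).
Z = [[Fraction(int(i == j)) - Fraction(2, 3) for j in range(3)]
     for i in range(3)]

def mul3(B, A):
    return [[sum(B[i][k]*A[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

I3 = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]

assert mul3(Z, Z) == I3                        # a reflection is an involution
v = [1, 1, 1]
assert [sum(Z[i][j]*v[j] for j in range(3)) for i in range(3)] == [-1, -1, -1]
w = [1, -1, 0]                                 # a vector lying in the mirror plane
assert [sum(Z[i][j]*w[j] for j in range(3)) for i in range(3)] == w
```

The entries of Z are exactly the fractions 1/3 and −2/3 obtained in the first solution.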
Directly from the definition of the matrix of a bilinear form we see that the form is symmetric or antisymmetric if and only if the corresponding matrix has this property. Every bilinear form α on a vector space V defines a mapping V → V*, v ↦ α(v, ·). That is, by placing a fixed vector in the first argument we obtain a linear form, which is the image of this vector. If we choose a fixed basis of a finite dimensional space V and the dual basis of V*, then this is the mapping

x ↦ (y ↦ xᵀ·A·y).

All this is a matter of convention; we could equally fix the second vector and again obtain a linear form.

4. Properties of linear mappings

In order to exploit vector spaces and linear mappings in modelling real processes and systems in other sciences, we need a more detailed analysis of the properties of diverse types of linear mappings. 2.4.1.

multiplication). We get the same matrix:

  E − 2n·nᵀ = ( 1/3 −2/3 −2/3 )
              (−2/3  1/3 −2/3 )
              (−2/3 −2/3  1/3 ). □

2.D.3. Consider R³ with the standard coordinate system. In the plane z = 0 there is a mirror, and at the point [4, 3, 5] there is a candle. An observer at the point [1, 2, 3] is not aware of the mirror, but sees in it the reflection of the candle. Where does he think the candle is?

Solution. Independently of our position, we see the mirror image of the scene in the mirror (that is why it is called a mirror image). The mirror image is given by reflecting the scene (space) in the plane of the mirror, the plane z = 0. The reflection with respect to this plane changes the sign of the z-coordinate. Thus we see the candle at the point [4, 3, −5]. □

By using the inner product we can determine the (angular) deflection of vectors:

2.D.4. Determine the deflection of the roots of the polynomial x² − i, considered as vectors in the complex plane.

Solution. The roots of the given polynomial are the square roots of i. By the de Moivre theorem, the arguments of the two square roots of any complex number differ by π.
Their deflection is thus always π. □

2.D.5. Determine the cosine of the deflection of the lines p, q in R³ given by the equations

p: −2x + y + z = 1, x + 3y − 4z = 5;
q: x − y = −2, z = 6. ○

2.D.6. Using the Gram-Schmidt orthogonalisation, obtain an orthogonal basis of the subspace

U = {(x1, x2, x3, x4)ᵀ ∈ R⁴; x1 + x2 + x3 + x4 = 0}

of the space R⁴.

We begin with four examples in the lowest dimension of interest. With the standard basis of the plane R² and with the standard scalar product we consider the following matrices of mappings f : R² → R²:

A = ( 1 0 ; 0 0 ), B = ( 0 1 ; 0 0 ), C = ( a 0 ; 0 b ), D = ( 0 −1 ; 1 0 ).

The matrix A describes the orthogonal projection along the subspace W = {(0, a); a ∈ R} ⊂ R² onto the subspace V = {(a, 0); a ∈ R} ⊂ R², that is, the projection onto the x-axis along the y-axis. Evidently for this f : R² → R² we have f ∘ f = f, and thus the restriction f|_V of the given mapping to its image is the identity mapping. The kernel of f is exactly the subspace W.

The matrix B has the property B² = 0, therefore the same holds for the corresponding mapping f. We can envision this as the differentiation of polynomials R₁[x] of degree at most one in the basis (1, x) (we shall come to differentiation in chapter five, see 5.1.6).

The matrix C gives a mapping f which rescales the first vector of the basis a-times and the second one b-times. Therefore the whole plane splits into two subspaces which are preserved under the mapping and on which the mapping is only a homothety, that is, scaling by a scalar multiple (the first case was a special case with a = 1, b = 0). For instance the choice a = 1, b = −1 corresponds to the axial symmetry (mirror symmetry) with respect to the x-axis, which is the same as complex conjugation x + iy ↦ x − iy on the two-dimensional real space R² ≃ C in the basis (1, i). This is a linear mapping of the two-dimensional real vector space C, but not of the one-dimensional complex space C.
The matrix D is the matrix of the rotation by 90 degrees (the angle π/2) centered at the origin in the standard basis. We can see at first glance that no one-dimensional subspace is preserved under this mapping. Such a rotation is a bijection of the plane onto itself, therefore we can surely find distinct bases in the domain and codomain in which its matrix will be the unit matrix E: we simply take any basis of the domain and its image in the codomain. But we are not able to do this with the same basis for both the domain and the codomain. Consider the matrix D as the matrix of the mapping g : C² → C² with the standard basis of the complex vector space C². Then we can find vectors u = (i, 1), v = (−i, 1), for which we have

D·u = i·u,  D·v = −i·v.

Solution. The set of solutions of the given homogeneous linear equation is clearly a vector space with the basis

u1 = (−1, 1, 0, 0)ᵀ, u2 = (−1, 0, 1, 0)ᵀ, u3 = (−1, 0, 0, 1)ᵀ.

Denote by v1, v2, v3 the vectors of the orthogonal basis obtained by the Gram-Schmidt orthogonalisation process. First set v1 = u1. Then let

v2 = u2 − (⟨u2, v1⟩/‖v1‖²) v1 = u2 − (1/2) v1 = (−1/2, −1/2, 1, 0)ᵀ,

that is, choose the multiple v2 = (−1, −1, 2, 0)ᵀ. Then let

v3 = u3 − (⟨u3, v1⟩/‖v1‖²) v1 − (⟨u3, v2⟩/‖v2‖²) v2 = (−1/3, −1/3, −1/3, 1)ᵀ.

Altogether we may take the orthogonal basis (−1,1,0,0)ᵀ, (−1,−1,2,0)ᵀ, (−1,−1,−1,3)ᵀ. Due to the simplicity of the exercise we can also immediately give an orthogonal basis consisting of the vectors (1,−1,0,0)ᵀ, (0,0,1,−1)ᵀ, (1,1,−1,−1)ᵀ, or of the vectors (−1,1,1,−1)ᵀ, (1,−1,1,−1)ᵀ, (−1,−1,1,1)ᵀ. □

2.D.7. Write down a basis of the real vector space of the 3×3 matrices over R with zero trace. (The trace of a matrix is the sum of the elements on the diagonal.) Write the coordinates of the matrix

( 1 2 0 ; 0 2 0 ; · −2 −3 )

in this basis.

2.D.8. Find the orthogonal complement U⊥ of the subspace

U = {(x1, x2, x3, x4); x1 = x3, x2 = x3 + 6x4} ⊂ R⁴.

Solution. The orthogonal complement U⊥ consists of just those vectors that are perpendicular to every solution of the system

x1 − x3 = 0, x2 − x3 − 6x4 = 0.
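The Gram-Schmidt computation of 2.D.6 can be replayed numerically; a short NumPy sketch (the function name `gram_schmidt` is ours, not from the text):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthogonalize vectors by subtracting projections onto earlier ones."""
    basis = []
    for u in vectors:
        v = u.astype(float).copy()
        for w in basis:
            v -= (v @ w) / (w @ w) * w   # subtract orthogonal projection onto w
        basis.append(v)
    return basis

u1 = np.array([-1, 1, 0, 0])
u2 = np.array([-1, 0, 1, 0])
u3 = np.array([-1, 0, 0, 1])
v1, v2, v3 = gram_schmidt([u1, u2, u3])

# v2 and v3 are multiples of (-1,-1,2,0) and (-1,-1,-1,3) found in the text.
assert np.allclose(2 * v2, [-1, -1, 2, 0])
assert np.allclose(3 * v3, [-1, -1, -1, 3])
assert abs(v1 @ v2) < 1e-12 and abs(v1 @ v3) < 1e-12 and abs(v2 @ v3) < 1e-12
```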
That means that in the basis (u, v) on C², the mapping g has the matrix

K = ( i 0 ; 0 −i ).

Notice that by extending the scalars to C, we arrive at an analogy to the matrix C, with the diagonal elements a = cos(π/2) + i sin(π/2) and its complex conjugate ā. In other words, the argument of the number a in polar form provides the angle of the rotation. This is easy to understand if we denote the real and imaginary parts of the vector u as follows:

u = x_u + i y_u = Re u + i Im u.

The vector v is the complex conjugate of u. We are interested in the restriction of the mapping g to the real vector subspace V = R² ∩ span_C{u, v} ⊂ C². Evidently,

V = span_R{u + v, i(u − v)} = span_R{x_u, −y_u}

is the whole plane R². The restriction of g to this plane is exactly the original mapping given by the matrix D (notice this matrix is real, thus it preserves this real subspace). It is immediately seen that this is the rotation through the angle π/2 in the positive sense with respect to the chosen basis x_u, −y_u. Work it out yourself by a direct calculation. Note also why exchanging the order of the vectors u and v leads to the same result, although in a different real basis!

2.4.2. Eigenvalues and eigenvectors of mappings. A key to the description of mappings in the previous examples was the answer to the question "what are the vectors satisfying the equation f(u) = a·u for some suitable scalar a?". We consider this question for any linear mapping f : V → V on a vector space of dimension n over scalars K. If we imagine such an equality written in coordinates, i.e. using the matrix of the mapping A in some basis, we obtain a system of linear equations

A·x − a·x = (A − a·E)·x = 0

with an unknown parameter a. We know already that such a system of equations has only the solution x = 0 if the matrix A − aE is invertible. Thus we want to find such values a ∈ K for which A − aE is not invertible, and for that, the necessary and sufficient condition reads (see Theorem 2.2.11)

(1) det(A − a·E) = 0.
If we consider λ = a as a variable in the previous scalar equation, we are actually looking for the roots of a polynomial of degree n. As we have seen in the case of the matrix D, the roots may exist in an extension of our field of scalars, if they are not in K.

A vector is a solution of this system if and only if it is perpendicular to both vectors (1, 0, −1, 0), (0, 1, −1, −6). Thus we have

U⊥ = {a·(1, 0, −1, 0) + b·(0, 1, −1, −6); a, b ∈ R}. □

2.D.9. Find an orthonormal basis of the subspace V ⊂ R⁴, where V = {(x1, x2, x3, x4) ∈ R⁴; x1 + x2 + x3 = 0}.

Solution. The fourth coordinate does not appear in the restriction for the subspace, thus it seems reasonable to select (0,0,0,1) as one of the vectors of the orthonormal basis and reduce the problem to the subspace R³. If we set the second coordinate equal to zero, then in the investigated space there are vectors with opposite first and third coordinates, notably the unit vector (1/√2, 0, −1/√2, 0). This vector is perpendicular to any vector whose first coordinate equals the third. In order to get into the investigated subspace, we choose the second coordinate equal to the negative of the sum of the first and the third coordinate, and then normalise. Thus we choose the vector (1/√6, −2/√6, 1/√6, 0) and we are finished. □

2.D.10. Find the eigenvalues and the associated subspaces of eigenvectors of the matrix

A = ( −1 1 0 ; −1 3 0 ; 2 −2 2 ).

Solution. First we find the characteristic polynomial of the matrix:

| −1−λ 1 0 ; −1 3−λ 0 ; 2 −2 2−λ | = −λ³ + 4λ² − 2λ − 4.

This polynomial has roots 2, 1 + √3, 1 − √3, which are then the eigenvalues of the matrix. Their algebraic multiplicity is one (they are simple roots of the polynomial), thus each has associated only one (up to a non-zero multiple) eigenvector. Otherwise stated, the geometric multiplicity of each eigenvalue is one (see 3.4.10). We determine the eigenvector associated with the eigenvalue 2.
It is a solution of the homogeneous linear system with the matrix A − 2E:

−3x1 + x2 = 0, −x1 + x2 = 0, 2x1 − 2x2 = 0.

Eigenvalues and eigenvectors

Scalars λ ∈ K satisfying the equation f(u) = λ·u for some nonzero vector u ∈ V are called the eigenvalues of the mapping f. The corresponding nonzero vectors u are called the eigenvectors of the mapping f.

If u, v are eigenvectors associated with the same eigenvalue λ, then for every linear combination of u and v,

f(au + bv) = a f(u) + b f(v) = λ(au + bv).

Therefore the eigenvectors associated with the same eigenvalue λ, together with the zero vector, form a nontrivial vector subspace V_λ ⊆ V. We call it the eigenspace associated with λ. For instance, if λ = 0 is an eigenvalue, the kernel Ker f is the eigenspace V₀.

We have seen how to compute the eigenvalues in coordinates. The independence of the eigenvalues from the choice of coordinates is clear from their definition. But let us look explicitly at what happens if we change the basis. As a direct corollary of the transformation properties from the paragraph 2.3.16 and the Cauchy theorem 2.2.7 for the determinant of a product, the matrix A' in the new coordinates will be A' = P⁻¹AP with an invertible matrix P. Thus

|P⁻¹AP − λE| = |P⁻¹AP − P⁻¹λEP| = |P⁻¹(A − λE)P| = |P⁻¹| |A − λE| |P| = |A − λE|,

because the scalar multiplication is commutative and we know that |P⁻¹| = |P|⁻¹. For these reasons we use the same terminology for matrices and mappings:

Characteristic polynomials

For a matrix A of dimension n over K we call the polynomial |A − λE| ∈ K_n[λ] the characteristic polynomial of the matrix A. Roots of this polynomial are the eigenvalues of the matrix A. If A is the matrix of the mapping f : V → V in a certain basis, then |A − λE| is also called the characteristic polynomial of the mapping f.
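The invariance of the characteristic polynomial under a change of basis can be checked numerically; a NumPy sketch using the matrix of exercise 2.D.10 and an arbitrarily chosen invertible P:

```python
import numpy as np

# A' = P^{-1} A P has the same characteristic polynomial as A.
A = np.array([[-1.0, 1.0, 0.0],
              [-1.0, 3.0, 0.0],
              [ 2.0, -2.0, 2.0]])
P = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])   # any invertible matrix works here

A2 = np.linalg.inv(P) @ A @ P
# np.poly returns the coefficients of det(lambda*E - A).
assert np.allclose(np.poly(A), np.poly(A2))

# Its roots are the eigenvalues 2, 1 + sqrt(3), 1 - sqrt(3).
eig = np.sort(np.linalg.eigvals(A).real)
assert np.allclose(eig, [1 - np.sqrt(3), 2.0, 1 + np.sqrt(3)])
```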
Because the characteristic polynomial of a linear mapping f : V → V is independent of the choice of the basis of V, the coefficients of the individual powers of the variable λ are scalars expressing some properties of f. In particular, they cannot depend on the choice of the basis. Suppose dim V = n and A = (a_ij) is the matrix of the mapping in some basis. Then

|A − λ·E| = (−1)ⁿ λⁿ + (−1)ⁿ⁻¹ (a11 + ⋯ + a_nn) λⁿ⁻¹ + ⋯ + |A| λ⁰.

The coefficient at the highest power says whether the dimension of the space V is even or odd.

The system has solution x1 = x2 = 0, x3 ∈ R arbitrary. So the eigenvector associated with the value 2 is the vector (0, 0, 1) (or any multiple of it). Similarly we determine the remaining two eigenvectors, as solutions of the system [A − (1 + √3)E]x = 0. The solution of the system

(−2 − √3)x1 + x2 = 0, −x1 + (2 − √3)x2 = 0, 2x1 − 2x2 + (1 − √3)x3 = 0

is the space {(2 − √3, 1, −2)t; t ∈ R}. That is the space of eigenvectors associated with the eigenvalue 1 + √3. Similarly we obtain that the space of eigenvectors associated with the eigenvalue 1 − √3 is {(2 + √3, 1, −2)t; t ∈ R}. □

2.D.11. Determine the eigenvalues and eigenvectors of the matrix

A = ( 1 1 0 ; 1 2 1 ; 1 2 1 ).

Describe the geometric interpretation of this mapping and write down its matrix in the basis

e1 = (1, −1, 1), e2 = (1, 2, 0), e3 = (0, 1, 1).

The most interesting coefficient is the sum of the diagonal elements of the matrix. We have just proved that it does not depend on the choice of the basis, and we call it the trace of the matrix A and denote it by Tr A. The trace of the mapping f is defined as the trace of its matrix in an arbitrary basis. In fact, this is not so surprising once we notice that the trace is actually the linear approximation of the determinant in the neighbourhood of the unit matrix in the direction A. We shall deal with such concepts only in Chapter 8.
But since the determinant is a polynomial, we may easily see that the only term in det(E + tA) which is linear in the real parameter t is just t·Tr A. We shall see the relation to the matrix exponential later in Chapter 8. The coefficient at λ⁰ is the determinant |A|, and we shall see later that it describes the rescaling of volumes by the mapping.

2.4.3. Basis of eigenvectors. We discuss a few important properties of eigenspaces now.

Theorem. Eigenvectors of linear mappings f : V → V associated to different eigenvalues are linearly independent.

Proof. Let a1, …, ak be distinct eigenvalues of the mapping f and u1, …, uk eigenvectors with these eigenvalues. The proof is by induction on the number of linearly independent vectors among the chosen ones. Assume that u1, …, uℓ are linearly independent and that u_{ℓ+1} = Σᵢ cᵢ·uᵢ is their linear combination. We can choose ℓ ≥ 1, because the eigenvectors are nonzero. But then f(u_{ℓ+1}) = a_{ℓ+1}·u_{ℓ+1} = Σ_{i=1}^{ℓ} a_{ℓ+1}·cᵢ·uᵢ, while at the same time f(u_{ℓ+1}) = Σ_{i=1}^{ℓ} cᵢ·f(uᵢ) = Σ_{i=1}^{ℓ} cᵢ·aᵢ·uᵢ.

Solution. The characteristic polynomial of the matrix A is

| 1−λ 1 0 ; 1 2−λ 1 ; 1 2 1−λ | = −λ³ + 4λ² − 2λ = −λ(λ² − 4λ + 2).

The roots of this polynomial are the eigenvalues, thus the eigenvalues are 0, 2 + √2, 2 − √2. We compute the eigenvectors associated with the particular eigenvalues:

• 0: We solve the system A·x = 0. Its solutions form a one-dimensional vector space of eigenvectors: span{(1, −1, 1)}.

By subtracting the second and the fourth expression in the equalities above we obtain 0 = Σ_{i=1}^{ℓ} (a_{ℓ+1} − aᵢ)·cᵢ·uᵢ. All the differences between the eigenvalues are nonzero and at least one coefficient cᵢ is nonzero. This is a contradiction with the assumed linear independence of u1, …, uℓ, therefore the vector u_{ℓ+1} must also be linearly independent of the others. □

The latter theorem can be seen as a decomposition of a linear mapping f into a sum of much simpler mappings. If there are n = dim V distinct
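The claim that the trace is the linear approximation of the determinant near the unit matrix can be tested numerically; a small NumPy sketch with an arbitrarily chosen matrix:

```python
import numpy as np

# det(E + t*A) = 1 + t*Tr(A) + O(t^2) for small t.
A = np.array([[1.0, 2.0, 0.0],
              [3.0, -1.0, 1.0],
              [0.0, 1.0, 4.0]])
t = 1e-6
approx = (np.linalg.det(np.eye(3) + t * A) - 1) / t

# The difference quotient recovers the trace (here Tr A = 4).
assert np.isclose(approx, np.trace(A), atol=1e-4)
```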
eigenvalues λᵢ, we obtain the entire V as a direct sum of one-dimensional eigenspaces V_{λᵢ}. Each of them then describes a projection onto this invariant one-dimensional subspace, on which the mapping acts just as multiplication by the eigenvalue λᵢ. Furthermore, this decomposition can be easily calculated:

• 2 + √2: We solve the system

( −(1+√2) 1 0 ; 1 −√2 1 ; 1 2 −(1+√2) ) x = 0.

The solutions form a one-dimensional space span{(1, 1 + √2, 1 + √2)}.

• 2 − √2: We solve the system

( √2−1 1 0 ; 1 √2 1 ; 1 2 √2−1 ) x = 0.

Its solutions form a space of eigenvectors span{(1, 1 − √2, 1 − √2)}.

Hence the given matrix has eigenvalues 0, 2 + √2 and 2 − √2, with the associated one-dimensional spaces of eigenvectors span{(1, −1, 1)}, span{(1, 1 + √2, 1 + √2)} and span{(1, 1 − √2, 1 − √2)} respectively. The mapping can thus be interpreted as the projection along the vector (1, −1, 1) onto the plane given by the vectors (1, 1 + √2, 1 + √2) and (1, 1 − √2, 1 − √2), composed with the linear mapping given by "stretching" by the factors corresponding to the eigenvalues in the directions of the associated eigenvectors.

Now we express the mapping in the given basis. For this we need the matrix T for changing the basis from the standard basis to the new one. This can be obtained by writing the coordinates of the vectors of the original basis under the new basis into the columns of the matrix T. But we shall do it in a different way: we first obtain the matrix for changing the basis from the new one to the original one, that is, the matrix T⁻¹. We just write the coordinates of the vectors of the new basis into the columns:

T⁻¹ = ( 1 1 0 ; −1 2 1 ; 1 0 1 ).

Then

T = (T⁻¹)⁻¹ = (1/4) ( 2 −1 1 ; 2 1 −1 ; −2 1 3 ),

and for the matrix B of the mapping under the new basis we have (see 2.3.16)

B = T·A·T⁻¹ = (1/4) ( 0 6 2 ; 0 6 2 ; 0 14 10 ). □

You can find more exercises on computing with eigenvalues and eigenvectors on page 133.

Basis of eigenvectors

Corollary.
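The change-of-basis computation of 2.D.11 can be verified numerically; a NumPy sketch (A is the matrix as reconstructed from the characteristic polynomial and eigenvectors above, and the columns of Tinv are the basis vectors e1, e2, e3):

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [1.0, 2.0, 1.0]])
Tinv = np.array([[ 1.0, 1.0, 0.0],
                 [-1.0, 2.0, 1.0],
                 [ 1.0, 0.0, 1.0]])   # columns: e1, e2, e3
T = np.linalg.inv(Tinv)
B = T @ A @ Tinv

assert np.allclose(4 * T, [[2, -1, 1], [2, 1, -1], [-2, 1, 3]])
assert np.allclose(4 * B, [[0, 6, 2], [0, 6, 2], [0, 14, 10]])
# Similar matrices share the trace (and the whole spectrum):
assert np.isclose(np.trace(B), np.trace(A))
```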
If there exist n mutually distinct roots λᵢ of the characteristic polynomial of the mapping f : V → V on the n-dimensional space V, then there exists a decomposition of V into a direct sum of eigenspaces, each of dimension one. This means that there exists a basis for V consisting only of eigenvectors, and in this basis the matrix of f is the diagonal matrix with the eigenvalues on the diagonal. This basis is uniquely determined up to the order of the elements and the scaling of the vectors. The corresponding basis (expressed in the coordinates in an arbitrary basis of V) is obtained by solving the n systems of homogeneous linear equations in n variables with the matrices (A − λᵢ·E), where A is the matrix of f in the chosen basis.

2.4.4. Invariant subspaces. We have seen that every eigenvector v of the mapping f : V → V generates a subspace span{v} ⊂ V which is preserved by the mapping f. More generally, we say that a vector subspace W ⊆ V is an invariant subspace for a linear mapping f, if f(W) ⊆ W.

If V is a finite-dimensional vector space and we choose some basis (u1, …, uk) of a subspace W, we can always extend it to a basis (u1, …, uk, u_{k+1}, …, un) of the whole space V. For every such basis, the mapping will have a matrix A of the form

(1) A = ( B C ; 0 D ),

where B is a square matrix of dimension k, D is a square matrix of dimension n − k, and C is a matrix of the type k/(n − k). On the other hand, if for some basis (u1, …, un) the matrix of the mapping f is of the form (1), then W = span{u1, …, uk} is invariant under the mapping f. By the same arguments, the mapping with the matrix A as in (1) leaves the subspace span{u_{k+1}, …, un} invariant if and only if the submatrix C is zero. From this point of view the eigenspaces of the mapping are special cases of invariant subspaces. Our next task is to find some conditions under which there are invariant complements of invariant subspaces.

2.4.5.
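The corollary can be illustrated on the matrix of 2.D.10, whose three eigenvalues are distinct; a NumPy sketch with the eigenvectors computed in that exercise as the columns of P:

```python
import numpy as np

# With n distinct eigenvalues, the eigenvectors form a basis in which the
# matrix becomes diagonal (eigenvalues 2, 1 + sqrt(3), 1 - sqrt(3)).
A = np.array([[-1.0, 1.0, 0.0],
              [-1.0, 3.0, 0.0],
              [ 2.0, -2.0, 2.0]])
s = np.sqrt(3)
P = np.array([[0.0, 2 - s, 2 + s],
              [0.0, 1.0,   1.0 ],
              [1.0, -2.0, -2.0]])   # columns: eigenvectors for 2, 1+s, 1-s

D = np.linalg.inv(P) @ A @ P
assert np.allclose(D, np.diag([2.0, 1 + s, 1 - s]))
```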
We illustrate some typical properties of mappings on the spaces R³ and R² in terms of eigenvalues and eigenvectors.

(1) Consider the mapping given in the standard basis of R³ by the matrix

A = ( 0 0 1 ; 0 1 0 ; 1 0 0 ).

In the case of a 3 × 3 matrix, you can use this special formula to find its characteristic polynomial:

2.D.12. For any n × n matrix A its characteristic polynomial |A − λE| is of degree n, that is, it is of the form

|A − λE| = c_n λⁿ + c_{n−1} λⁿ⁻¹ + ⋯ + c1 λ + c0, c_n ≠ 0,

where c_n = (−1)ⁿ, c_{n−1} = (−1)ⁿ⁻¹ tr A and c0 = |A|. If the matrix A is three-dimensional, we obtain

|A − λE| = −λ³ + (tr A) λ² + c1 λ + |A|.

By choosing λ = 1 we obtain |A − E| = −1 + tr A + c1 + |A|. From there we obtain

|A − λE| = −λ³ + (tr A) λ² + (|A − E| + 1 − tr A − |A|) λ + |A|.

Use this expression for determining the characteristic polynomial and the eigenvalues of the matrix

( 32 −67 47 ; 7 −14 13 ; −7 15 −6 ). ○

2.D.13. Find the orthogonal complement of the vector space spanned by the vectors (2, 1, 3), (3, 16, 7), (3, 5, 4), (−7, 7, −10).

Solution. In fact the task consists of solving the system 2.A.3, which we have done already. □

2.D.14. Pauli matrices. In physics, the state of a particle with spin ½ is described by the Pauli matrices. They are the 2 × 2 matrices over the complex numbers

σ1 = ( 0 1 ; 1 0 ), σ2 = ( 0 −i ; i 0 ), σ3 = ( 1 0 ; 0 −1 ).

For square matrices we define their commutator (denoted by square brackets) as [σ1, σ2] := σ1σ2 − σ2σ1. Show that [σ1, σ2] = 2iσ3 and similarly [σ3, σ1] = 2iσ2 and [σ2, σ3] = 2iσ1. Furthermore, show that σ1² = σ2² = σ3² = E and that the eigenvalues of the matrices σ1, σ2, σ3 are ±1. Show that for the matrices describing the state of a particle with spin 1, namely

(1/√2) ( 0 1 0 ; 1 0 1 ; 0 1 0 ), (1/√2) ( 0 −i 0 ; i 0 −i ; 0 i 0 ), ( 1 0 0 ; 0 0 0 ; 0 0 −1 ),

We compute

|A − λE| = | −λ 0 1 ; 0 1−λ 0 ; 1 0 −λ | = −λ³ + λ² + λ − 1,

with roots λ1 = 1, λ2 = 1, λ3 = −1.
The eigenvectors with \ ľ 0 -A ~ 0 0 0 / Vo 0 0 / : 1,A2 = 1,A3 = - 1 can be computed: /-1 or 000 \ 1 0 -i/ with the basis of the space of solutions, that is, of all eigenvectors with this eigenvalue ui = (0,1,0), w2 = (1,0,1). Similarly for A = — 1 we obtain the third independent eigenvector A 0 A A 0 A 0 2 - 0 2 0 ^W3 = (-1,0,1) U 0 1/ Vo 0 0/ Under the basis «i, u2, u3 (note that u3 must be linearly independent of the remaining two because of the previous theorem and ui, u2 were obtained as two independent solutions) / has the diagonal matrix /I 0 0\ A= 0 1 0 . \0 0 -l) The whole space R3 is a direct sum of eigenspaces, R3 = Vi ffi V2, with dim V\ = 2, and dim V2 = 1. This decomposition is uniquely determined and says much about the geometric properties of the mapping /. The eigenspace V\ is furthermore a direct sum of one-dimensional eigenspaces, which can be selected in other ways (thus such a decomposition has no further geometrical meaning). (2) Consider the linear mapping / : r2 [x] —> r2 [x] defined by polynomial differentiation, that is,/(l) = 0, j(x) = 1, j(x2) = 2x. The mapping / thus has in the usual basis (l,x, x2) the matrix The characteristic polynomial is | A—A- E\ = —A3, thus it has only one eigenvalue, A = 0. We compute the eigenvectors: 0 1 °\ ŕ 1 °\ 0 0 2 . - 0 0 1 Vo 0 0 Vo 0 0 ,0 i 0 >0 0 -I, The space of the eigenvectors is thus one-dimensional, generated by the constant polynomial 1. The striking property of this mapping is that is no basis for which the matrix would be diagonal. There is the "chain" of vectors mapping four independent generators as follows: |x2i-^a;i-^li-^0 builds a sequence of subspaces without invariant complements. 118 CHAPTER 2. ELEMENTARY LINEAR ALGEBRA , the commuting relations are the same as in the case of Pauli matrices. 
Equivalently, it can be shown that under the notation

1 := ( 1 0 ; 0 1 ), I := iσ3, J := iσ2, K := iσ1,

the span of (1, I, J, K) forms the vector space with basis (1, I, J, K) of the algebra of quaternions (an algebra is a vector space with a binary bilinear operation of multiplication; in this case the multiplication is given by matrix multiplication). In order for the vector space to be the algebra of quaternions it is necessary and sufficient to show the following properties: I² = J² = K² = −1 and IJ = −JI = K, JK = −KJ = I and KI = −IK = J. ○

2.D.15. Can the matrix

B = ( 5 6 ; 6 5 )

be expressed in the form of the product B = P⁻¹·D·P for some diagonal matrix D and invertible matrix P? If possible, give an example of such matrices D, P, and find out how many such pairs there are.

Solution. The matrix B has two distinct eigenvalues, and thus such an expression exists. For instance it holds that

( 5 6 ; 6 5 ) = (1/2)( √2 −√2 ; √2 √2 ) · ( 11 0 ; 0 −1 ) · (1/2)( √2 √2 ; −√2 √2 ).

There exist exactly two diagonal matrices D:

( 11 0 ; 0 −1 ) and ( −1 0 ; 0 11 ),

but the columns of the matrix P⁻¹ can be substituted by their arbitrary non-zero scalar multiples, thus there are infinitely many pairs D, P. □

As we have already seen in 2.D.11, based on the eigenvalues and eigenvectors of a given 3×3 matrix, we can often interpret geometrically the mapping it induces in R³. In particular, we can do so in the following situations: If the matrix has 0 as an eigenvalue and 1 as an eigenvalue with geometric multiplicity 2, then it is a projection in the direction of the eigenvector associated with the eigenvalue 0 onto the plane given by the eigenspace of the eigenvalue 1. If the eigenvector associated with 0 is perpendicular to that plane, then the mapping is an orthogonal projection. If the matrix has the eigenvalue −1 with the eigenvector perpendicular to the plane of the eigenvectors associated with the eigenvalue 1, then it is a mirror symmetry through the plane of the eigenvectors associated with 1.

2.4.6. Orthogonal mappings.
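The quaternion relations stated above can be checked directly with the Pauli matrices of 2.D.14; a NumPy sketch:

```python
import numpy as np

# Pauli matrices and the quaternion units 1 := E, I := i*sigma3,
# J := i*sigma2, K := i*sigma1.
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
E = np.eye(2, dtype=complex)
I, J, K = 1j * s3, 1j * s2, 1j * s1

for M in (I, J, K):
    assert np.allclose(M @ M, -E)                     # I^2 = J^2 = K^2 = -1
assert np.allclose(I @ J, K) and np.allclose(J @ I, -K)
assert np.allclose(J @ K, I) and np.allclose(K @ J, -I)
assert np.allclose(K @ I, J) and np.allclose(I @ K, -J)
# Commutator of the Pauli matrices themselves: [s1, s2] = 2i*s3.
assert np.allclose(s1 @ s2 - s2 @ s1, 2j * s3)
```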
We consider the special case of mappings f : V → W between spaces with scalar products which preserve the lengths of all vectors u ∈ V.

Orthogonal mappings

A linear mapping f : V → W between spaces with scalar products is called an orthogonal mapping, if for all u ∈ V

⟨f(u), f(u)⟩ = ⟨u, u⟩.

The linearity of f and the symmetry of the scalar product imply that for all pairs of vectors the following equality holds:

⟨f(u + v), f(u + v)⟩ = ⟨f(u), f(u)⟩ + ⟨f(v), f(v)⟩ + 2⟨f(u), f(v)⟩.

Therefore all orthogonal mappings satisfy also the seemingly stronger condition for all vectors u, v ∈ V:

⟨f(u), f(v)⟩ = ⟨u, v⟩,

i.e. the mapping f leaves the scalar product invariant if and only if it leaves invariant the lengths of vectors. (We should notice that this is true for all fields of scalars where 1 + 1 ≠ 0, but it does not hold true for Z₂.)

In the initial discussion about the geometry in the plane we proved in the Theorem 1.5.10 that a linear mapping R² → R² preserves lengths of vectors if and only if its matrix in the standard basis (which is orthonormal with respect to the standard scalar product) satisfies Aᵀ·A = E, that is, A⁻¹ = Aᵀ.

In general, orthogonal mappings f : V → W must always be injective, because the condition ⟨f(u), f(u)⟩ = 0 implies ⟨u, u⟩ = 0 and thus u = 0. In such a case, the dimension of the range is always at least as large as the dimension of the domain of f. But then both dimensions are equal and f : V → Im f is a bijection. If Im f ≠ W, we extend an orthonormal basis of the image of f to an orthonormal basis of the range space, and the matrix of the mapping then contains a square regular submatrix A together with zero rows so that it has the required number of rows. Without loss of generality we can assume that W = V.

Our condition for the matrix of an orthogonal mapping in any orthonormal basis requires that for all vectors x and y in the space Kⁿ:

(A·x)ᵀ·(A·y) = xᵀ·(Aᵀ·A)·y = xᵀ·y.
Special choice of the standard basis vectors for x and y yields directly Aᵀ·A = E, that is, the same result as in dimension two. Thus we have proved the following theorem:

Matrix of orthogonal mappings

Theorem. Let V be a real vector space with scalar product and let f : V → V be a linear mapping. Then f is orthogonal if and only if in some orthonormal basis (and then consequently in all of them) its matrix A satisfies Aᵀ = A⁻¹.

If the matrix has the eigenvalue 1 with an eigenvector perpendicular to the plane of the eigenvectors associated with the eigenvalue −1, then it is an axial symmetry (in space) through the axis given by the eigenvector associated with 1.

2.D.16. Determine which linear mapping on R³ is given by the matrix

A = (1/3) ( −2 −1 −2 ; 4 −7 −8 ; −3 3 3 ).

Solution. The matrix has a double eigenvalue −1, and its associated eigenspace is span{(2, 0, 1), (1, 1, 0)}. Further, the matrix has 0 as an eigenvalue, with the eigenvector (1, 4, −3). The mapping given by this matrix under the standard basis is then the axial symmetry through the line given by the last vector, composed with the projection onto the plane perpendicular to the last vector, that is, the plane given by the equation x + 4y − 3z = 0. □

2.D.17. The theorem 2.4.7 gives us tools for recognising a matrix of a rotation in R³. It is orthogonal (its rows are orthogonal to each other, and equivalently the same holds for the columns). It has three distinct eigenvalues with absolute value 1. One of them is the number 1 (its associated eigenvector is the axis of the rotation). The argument of the remaining two, which are necessarily complex conjugates, gives the angle of the rotation in the positive sense in the plane given by the basis u_λ + ū_λ, i(u_λ − ū_λ).

2.D.18. Determine which linear mapping is given by the matrix

A = ( 3/5 16/25 −12/25 ; −16/25 93/125 24/125 ; 12/25 24/125 107/125 ).

Solution.
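The theorem can be illustrated on a plane rotation; a NumPy sketch with an arbitrarily chosen angle:

```python
import numpy as np

# A rotation matrix is orthogonal: A^T = A^{-1}, and it preserves lengths
# and scalar products.
t = 0.7  # any angle
A = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

assert np.allclose(A.T @ A, np.eye(2))            # A^T = A^{-1}
x, y = np.array([3.0, -1.0]), np.array([0.5, 2.0])
assert np.isclose((A @ x) @ (A @ y), x @ y)       # scalar product preserved
assert np.isclose(np.linalg.norm(A @ x), np.linalg.norm(x))
```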
First we notice that the matrix is orthogonal (the rows are mutually orthogonal, and equivalently the same holds for the columns). The matrix has the following eigenvalues and corresponding eigenvectors: the eigenvalue 1 with v1 = (0, 1, 4/3), and the eigenvalues 3/5 ± (4/5)i with the complex conjugate eigenvectors v2,3 = (1, 0, 0) ± i(0, 4/5, −3/5). All three eigenvalues have absolute value one, which together with the observation of orthogonality tells us that the matrix is a matrix of rotation. Its axis is given by the eigenvector corresponding to the eigenvalue 1, that is, the vector (0, 1, 4/3). The plane of rotation is the real plane in R³ which is given by the intersection of the two-dimensional complex subspace of C³ generated by the remaining eigenvectors with R³. It is the plane

Proof. Indeed, if f preserves lengths, it must have the claimed property in every orthonormal basis. On the other hand, the previous calculations show that this property for the matrix in one such basis ensures length preservation. □

Square matrices which satisfy the equality Aᵀ = A⁻¹ are called orthogonal matrices. The shape of the coordinate transition matrices between orthonormal bases is a direct corollary of the above theorem: each such matrix S must provide a mapping Kⁿ → Kⁿ which preserves lengths and thus satisfies S⁻¹ = Sᵀ. When changing from one orthonormal basis to another, the matrix of any linear mapping changes according to the relation A' = Sᵀ·A·S.

2.4.7. Decomposition of an orthogonal mapping. We take a more detailed look at eigenvectors and eigenvalues of orthogonal mappings on a real vector space V with scalar product. Consider a fixed orthogonal mapping f : V → V with the matrix A in some orthonormal basis. We continue as with the matrix D of the rotation in 2.4.1.

We think first about invariant subspaces of orthogonal mappings and their orthogonal complements. Namely, given any subspace W ⊆ V invariant with respect to an orthogonal mapping f : V → V, then for all v ∈ W⊥ and w ∈ W we immediately see

⟨f(v), w⟩ = ⟨f(v), f(f⁻¹(w))⟩ = ⟨v, f⁻¹(w)⟩ = 0,

since f⁻¹(w) ∈ W, too.
But this means that also f(W⊥) ⊆ W⊥, and we have proved a simple but very important proposition:

Proposition. The orthogonal complement of a subspace invariant with respect to an orthogonal mapping is also invariant.

If all eigenvalues of an orthogonal mapping are real, this claim ensures that there always exists a basis of V composed of eigenvectors. Indeed, the restriction of f to the orthogonal complement of an invariant subspace is again an orthogonal mapping, therefore we can add one eigenvector to the basis after another, until we obtain the whole decomposition of V. However, mostly the eigenvalues of orthogonal mappings are not real. We need to deviate into complex vector spaces. We formulate the result right away:

span{(1, 0, 0), (0, −4, 3)} (the first generator is a real multiple of v2 + v3, the other one a real multiple of i(v2 − v3), see 2.4.7). We can determine the rotation angle in this plane: it is a rotation by the angle arccos(3/5) ≈ 0.295π, which is the argument of the eigenvalue 3/5 + (4/5)i (or minus that number, if we chose the other eigenvalue). It remains to determine the direction of the rotation. First, recall that the meaning of the direction of the rotation changes when we change the orientation of the axis (it has no meaning to speak of the direction of the rotation if we do not have an orientation of the axis). Using the ideas from the proof of the theorem 2.4.7, we see that the given matrix acts by rotating by arccos(3/5) in the positive sense in the plane given by the basis ((1, 0, 0), (0, −4/5, 3/5)). The first vector of the basis is the imaginary part of the eigenvector associated with the eigenvalue 3/5 + (4/5)i, the second is then the (common) real part of the eigenvectors associated with the complex eigenvalues. The order of the vectors in the basis is important (by changing their order the meaning of the direction changes). The axis of rotation is perpendicular to the plane.
If we orient the axis using the right-hand rule (the perpendicular direction is obtained by taking the vector product of the vectors of the basis), then the direction of the rotation agrees with the direction of the rotation in the plane with the given basis. In our case we obtain the vector product (1, 0, 0) × (0, −4/5, 3/5) = (0, −3/5, −4/5). It is thus a rotation through arccos(3/5) in the positive sense about the vector (0, −3, −4), that is, a rotation through arccos(3/5) in the negative sense about the vector (0, 3, 4). □

2.D.19. Determine which linear mapping is given by the matrix

A = (1/5) ( −1 3 −1 ; −8 9 2 ; 8 −4 3 ).

Solution. By the already known method we find out that the matrix has the following eigenvalues and corresponding eigenvectors: 1 with (1, 2, 0); 3/5 + (4/5)i with (1, 1 + i, −1 − i); 3/5 − (4/5)i with (1, 1 − i, −1 + i). Though all three eigenvalues have absolute value 1, the eigenvectors are not orthogonal to each other, thus the matrix is not orthogonal. Consequently it is not a matrix of rotation. Nevertheless, it is a linear mapping which is "close" to a rotation: it is a rotation in the plane given by the two complex eigenvectors (this plane is not orthogonal to the vector (1, 2, 0), but it is preserved by the map). It remains to

Orthogonal mapping decomposition

Theorem. Let f : V → V be an orthogonal mapping on a real vector space V with scalar product. Then all the (in general complex) roots of the characteristic polynomial of f have length one. There exists a decomposition of V into one-dimensional eigenspaces corresponding to the real eigenvalues λ = ±1 and two-dimensional subspaces P_{λ,λ̄} with λ ∈ C∖R, on which f acts by the rotation by the angle equal to the argument of the complex number λ in the positive sense. All these subspaces are mutually orthogonal.

Proof. Without loss of generality we can work with the space V = Rᵐ with the standard scalar product.
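The analysis of 2.D.18 (orthogonality, axis, angle) can be confirmed numerically; a NumPy sketch:

```python
import numpy as np

# The matrix of 2.D.18: orthogonal, axis (0,3,4) fixed, complex eigenvalues
# 3/5 +- (4/5)i whose argument is the rotation angle arccos(3/5).
A = np.array([[ 3/5,   16/25, -12/25],
              [-16/25, 93/125, 24/125],
              [ 12/25, 24/125, 107/125]])

assert np.allclose(A.T @ A, np.eye(3))                              # orthogonal
assert np.allclose(A @ np.array([0.0, 3.0, 4.0]), [0.0, 3.0, 4.0])  # axis fixed

w = np.linalg.eigvals(A)
assert np.allclose(np.abs(w), np.ones(3))          # all eigenvalues on unit circle
angle = np.max(np.abs(np.angle(w)))                # argument of 3/5 + (4/5)i
assert np.isclose(angle, np.arccos(3/5))
```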
The mapping is thus given by an orthogonal matrix A, which can be equally well seen as the matrix of a (complex) linear mapping on the complex space Cm (which just happens to have all of its coefficients real). There exist exactly m (complex) roots of the characteristic polynomial of A, counting their algebraic multiplicities (see the fundamental theorem of algebra, 12.2.8). Furthermore, because the characteristic polynomial of the mapping has only real coefficients, the roots are either real or they come in pairs of complex conjugates λ and λ̄. The associated eigenvectors in Cm for such a pair of complex conjugates are actually solutions of two systems of linear homogeneous equations which are also complex conjugate to each other; the corresponding matrices of the systems have real components, except for the eigenvalues λ. Therefore the solutions of these systems are also complex conjugates (check this!). Next, we exploit the fact that for every invariant subspace its orthogonal complement is also invariant. First we find the eigenspaces V±1 associated with the real eigenvalues, and restrict the mapping to the orthogonal complement of their sum. Without loss of generality we can thus assume that our orthogonal mapping has no real eigenvalues and that dim V = 2n > 0. Now choose an eigenvalue λ and let u_λ be the eigenvector in C2n associated to the eigenvalue λ = α + iβ, β ≠ 0. Analogously to the case of rotation in the plane discussed in paragraph 2.4.1 in terms of the matrix D, we are interested in the real part of the sum of two one-dimensional (complex) subspaces W = span{u_λ} ⊕ span{ū_λ}, where ū_λ is the eigenvector associated to the conjugated eigenvalue λ̄. Now we want the intersection of the 2-dimensional complex subspace W with the real subspace R2n ⊆ C2n, which is clearly generated (over R) by the vectors u_λ + ū_λ and i(u_λ − ū_λ).
We call this real 2-dimensional subspace P_{λ,λ̄} ⊆ R2n and notice that it is generated by the real and imaginary parts of u_λ: x_λ = Re u_λ, y_λ = −Im u_λ. 121 CHAPTER 2. ELEMENTARY LINEAR ALGEBRA determine the direction of the rotation. First, we should recall that the meaning of the direction of the rotation changes when we change the orientation of the axis (it makes no sense to speak of the direction of the rotation if we do not have an orientation of the axis). Using the same ideas as in the previous example, we see that the given matrix acts by rotating by arccos(3/5) in the positive sense in the plane with the basis ((0,1,−1), (1,1,−1)). The first vector of the basis is the imaginary part of the eigenvector associated with the eigenvalue 3/5 + (4/5)i, the second is then the (common) real part of the eigenvectors associated with the complex eigenvalues. The order of the vectors in the basis is important (by changing their order the meaning of the direction changes). The "axis" of rotation is not perpendicular to the plane here, but we can still orient it: if we use the right-hand rule (the perpendicular direction is obtained as the vector product of the vectors in the basis), then the direction of the rotation agrees with the direction of the rotation in the plane with the given basis. In our case the vector product gives (0,1,−1) × (1,1,−1) = (0,−1,−1). It is thus a rotation through arccos(3/5) in the positive sense about the vector (0,−1,−1), that is, a rotation through arccos(3/5) in the negative sense about the vector (0,1,1). □ 2.D.20. Without any written computation determine the spectrum of the linear mapping f : R3 → R3 given by (x1, x2, x3) ↦ (x1 + x3, x2, x1 + x3). O 2.D.21.
Find the dimension of the eigenspaces of the eigenvalues λi of the matrix /4 0 0 0\ 1 5 0 0 3/ Because A·(u_λ + ū_λ) = λu_λ + λ̄ū_λ, and similarly with the second basis vector, P_{λ,λ̄} is clearly an invariant subspace with respect to multiplication by the matrix A, and we obtain A·x_λ = αx_λ + βy_λ, A·y_λ = −βx_λ + αy_λ. Because our mapping preserves lengths, the absolute value of the eigenvalue λ must equal one. But that means that the restriction of our mapping to P_{λ,λ̄} is the rotation by the argument of the eigenvalue λ. Note that the choice of the eigenvalue λ̄ instead of λ leads to the same subspace with the same rotation; we would just have expressed it in the basis x_λ̄, y_λ̄, that is, the same rotation in these coordinates goes by the same angle, but with the opposite sign, as expected. The proof of the whole theorem is completed by restricting the mapping to the orthogonal complement and finding another 2-dimensional subspace, until we get the required decomposition. □ We return to the ideas in this proof once again in chapter three, where we study complex extensions of the Euclidean vector spaces, see 3.4.4. Remark. The previous theorem is very powerful in dimension three. Here at least one eigenvalue must be real, hence equal to ±1, since three is odd. But then the associated eigenspace is an axis of the rotation of the three-dimensional space through the angle given by the argument of the other eigenvalues. Try to think how to detect in which direction the space is rotated. Note also that the eigenvalue −1 means an additional reflection through the plane perpendicular to the axis of the rotation. O We shall return to the discussion of such properties of matrices and linear mappings in more detail at the end of the next chapter, after illustrating the power of the matrix calculus in several practical applications. We close this section with a general, quite widely used definition: 122 CHAPTER 2. ELEMENTARY LINEAR ALGEBRA Spectrum of linear mapping 2.4.8. Definition.
The spectrum of a linear mapping f : V → V, or the spectrum of a square matrix A, is the sequence of roots of the characteristic polynomial of f or A, respectively, along with their multiplicities. The algebraic multiplicity of an eigenvalue means the multiplicity of the root of the characteristic polynomial, while the geometric multiplicity of the eigenvalue is the dimension of the associated subspace of eigenvectors. The spectral radius of a linear mapping (or matrix) is the greatest of the absolute values of the eigenvalues. In this terminology, our results about orthogonal mappings can be formulated as follows: the spectrum of an orthogonal mapping is always a subset of the unit circle in the complex plane. Thus only the values ±1 may appear in the real part of the spectrum, and their algebraic and geometric multiplicities are always the same. Complex values of the spectrum then correspond to rotations in suitable two-dimensional subspaces which are mutually perpendicular. E. Additional exercises for the whole chapter 2.E.1. Kirchhoff's Circuit Laws. We consider an application of linear algebra to the analysis of electric circuits, using Ohm's law and Kirchhoff's voltage and current laws. Consider an electric circuit as in the figure and write down the values of the currents there if you know the values V1 = 20, V2 = 120, V3 = 50, R1 = 10, R2 = 30, R3 = 4, R4 = 5, R5 = 10. Notice that the quantities Ii denote the electric currents, while Rj are resistances and Vk are voltages. Solution. There are two closed loops, namely ABEF and EBCD, and two branching vertices B and E of degree no less than 3. On every segment of the circuit bounded by branching points, the electric current is constant. Set it to be I1 on the segment EFAB, I2 on EB, and I3 on BCDE. Applying Kirchhoff's current law to the branching points B and E we obtain I1 + I2 = I3 and I3 − I1 = I2, which are, of course, the same equation.
In case there are many branching vertices, we write all of Kirchhoff's current law equations into the system, at least one of those equations being redundant. Choose the counter-clockwise orientations of the loops ABEF and EBCD. Applying Kirchhoff's voltage law and Ohm's law to the loop ABEF we obtain the equation V1 + I1R3 − I2R5 + V3 + I1R1 + I1R4 = 0. Similarly, the loop EBCD implies −V2 + I3R2 + R5I2 = 0. By combining all equations, we obtain the system I1 + I2 − I3 = 0, (R3 + R1 + R4)I1 − R5I2 = −V1 − V3, R5I2 + R2I3 = V2. Substituting the prescribed values we obtain the linear system I1 + I2 − I3 = 0, 19I1 − 10I2 = −70, 10I2 + 30I3 = 120. This has the solution I1 = −80/53 ≈ −1.509, I2 = 219/53 ≈ 4.132, I3 = 139/53 ≈ 2.623. □ 2.E.2. The general case. In general, the method for electrical circuit analysis can be formulated along the following steps: i) Identify all branching vertices of the circuit, i.e. vertices of degree no less than 3; ii) Identify all closed loops of the circuit; iii) Introduce variables Ik, denoting oriented currents on each segment of the circuit between two branching vertices; iv) Write down Kirchhoff's current conservation law for each branching vertex. The total incoming current equals the total outgoing current; v) Choose an orientation on every closed loop of the circuit and write down Kirchhoff's voltage conservation law according to the chosen orientation. If you find an electric source of voltage Vj and you go from the short bar to the long bar, the contribution of this source is Vj. It is −Vj if you go from the long bar to the short one. If you go in the positive direction of a current I and find a resistor with resistance Rj, the contribution is −RjI, and it is RjI if the orientation of the loop is opposite to the direction of the current I. The total voltage change along each closed loop must be zero.
vi) Compose the system of linear equations by collecting all equations representing Kirchhoff's current and voltage laws, and solve it with respect to the variables representing currents. Notice that some equations may be redundant; however, the solution should be unique. To illustrate this general approach, consider the circuit example in the diagram. Solution. i) The set of branching vertices is {B, C, F, G, H}. ii) The set of closed loops is {ABHG, FHBC, GHF, CDEF}. iii) Let I1 be the current on the segment GAB, I2 on the segment GH, I3 on the segment HB, I4 on the segment BC, I5 on the segment FC, I6 on the segment FH, I7 on GF, and I8 on CDEF. iv) Write Kirchhoff's current conservation laws for the branching vertices: • vertex B: I1 + I3 = I4 • vertex C: I4 + I5 = I8 • vertex F: I8 = I5 + I6 − I7 • vertex G: −I7 = I1 + I2 • vertex H: I2 + I6 = I3 v) Write Kirchhoff's voltage conservation law for each of the closed loops traversed counter-clockwise: • loop ABHG: −R1I2 + V3 + R2I1 − V2 = 0 • loop FHBC: V4 + R3I4 − V3 = 0 • loop GHF: R1I2 − V1 = 0 • loop CDEF: R4I8 − V4 = 0 Set the parameters R1 = 4, R2 = 7, R3 = 9, R4 = 12, V1 = 10, V2 = 20, V3 = 60, V4 = 120, to obtain the system I1 + I3 − I4 = 0, I4 + I5 − I8 = 0, I5 + I6 − I7 − I8 = 0, I1 + I2 + I7 = 0, I2 − I3 + I6 = 0, 7I1 − 4I2 = −40, 9I4 = −60, 4I2 = 10, 12I8 = 120, with the solution I1 = −30/7, I2 = 5/2, I3 = −50/21, I4 = −20/3, I5 = 50/3, I6 = −205/42, I7 = 25/14, I8 = 10. □ 2.E.3. Solve the system of equations x1 + x2 + x3 + x4 − 2x5 = 3, 2x2 + 2x3 + 2x4 − 4x5 = 5, −x1 − x2 − x3 + x4 + 2x5 = 0, −2x1 + 3x2 + 3x3 − 6x5 = 2. Solution.
The extended matrix of the system is
(1 1 1 1 −2 | 3)
(0 2 2 2 −4 | 5)
(−1 −1 −1 1 2 | 0)
(−2 3 3 0 −6 | 2).
Adding the first row to the third, adding its 2-multiple to the fourth, and then adding the (−5/2)-multiple of the second to the fourth, we obtain
(1 1 1 1 −2 | 3)        (1 1 1 1 −2 | 3)
(0 2 2 2 −4 | 5)   ~    (0 2 2 2 −4 | 5)
(0 0 0 2 0 | 3)         (0 0 0 2 0 | 3)
(0 5 5 2 −10 | 8)       (0 0 0 −3 0 | −9/2).
The last row is clearly a multiple of the previous one, and thus we can omit it. The pivots are located in the first, second and fourth columns, so the free variables are x3 and x5, which we substitute by the real parameters t and s. Thus we consider the system x1 + x2 + t + x4 − 2s = 3, 2x2 + 2t + 2x4 − 4s = 5, 2x4 = 3. We see that x4 = 3/2. The second equation gives 2x2 + 2t + 3 − 4s = 5, that is, x2 = 1 − t + 2s. From the first we have x1 + (1 − t + 2s) + t + 3/2 − 2s = 3, i.e. x1 = 1/2. Altogether, (x1, x2, x3, x4, x5) = (1/2, 1 − t + 2s, t, 3/2, s), t, s ∈ R. Alternatively, we can transform the extended matrix by row transformations into the reduced row echelon form, arranging it so that the first non-zero number in every row is 1 and the remaining numbers in the column containing this 1 are 0. We omit the fourth equation, which is a combination of the first three. Sequentially multiplying the second and the third row by 1/2, subtracting the third row from the second and from the first, and then subtracting the second row from the first, we obtain
(1 1 1 1 −2 | 3)      (1 1 1 1 −2 | 3)        (1 0 0 0 0 | 1/2)
(0 2 2 2 −4 | 5)  ~   (0 1 1 1 −2 | 5/2)  ~   (0 1 1 0 −2 | 1)
(0 0 0 2 0 | 3)       (0 0 0 1 0 | 3/2)       (0 0 0 1 0 | 3/2).
If we choose again x3 = t, x5 = s (t, s ∈ R), we obtain the general solution of 2.E.3 as above. □ 2.E.4. Find the solution of the system of linear equations given by the extended matrix
(3 3 2 1 | 3)
(2 1 1 0 | 4)
(0 5 −4 3 | 1)
(5 3 3 −3 | 5).
Solution. We transform the given extended matrix into the row echelon form.
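Row reduction of this kind is mechanical, so the hand computation can be cross-checked with a tiny solver. The sketch below uses exact fractions; the matrix entries are those recovered from the damaged scan of 2.E.4, so treat them as an assumption:

```python
from fractions import Fraction as F

def solve(aug):
    """Gauss-Jordan elimination on an augmented n x (n+1) matrix;
    returns the unique solution as a list of Fractions."""
    m = [[F(x) for x in row] for row in aug]
    n = len(m)
    for col in range(n):
        # find a pivot row and swap it into place
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                m[r] = [a - m[r][col] * b for a, b in zip(m[r], m[col])]
    return [row[-1] for row in m]

# extended matrix of exercise 2.E.4
aug = [[3, 3, 2, 1, 3],
       [2, 1, 1, 0, 4],
       [0, 5, -4, 3, 1],
       [5, 3, 3, -3, 5]]
print(solve(aug))  # the solution x = (4, -2, -2, 1)
```

The same routine also reproduces the unique pivots found by hand in 2.E.3 once the redundant fourth row is dropped.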
We first copy the first and the third rows; the second row is replaced by the 3-multiple of itself minus the 2-multiple of the first row, and the last row by the 5-multiple of the first row minus the 3-multiple of the fourth. By this we obtain
(3 3 2 1 | 3)
(0 −3 −1 −2 | 6)
(0 5 −4 3 | 1)
(0 6 1 14 | 0).
Copying the first two rows, adding the 5-multiple of the second row to the 3-multiple of the third, and the 2-multiple of the second to the fourth, gives
(3 3 2 1 | 3)
(0 −3 −1 −2 | 6)
(0 0 −17 −1 | 33)
(0 0 −1 10 | 12).
Adding the fourth row to the third yields
(3 3 2 1 | 3)
(0 −3 −1 −2 | 6)
(0 0 −18 9 | 45)
(0 0 −1 10 | 12).
With three more row transformations (dividing the third row by −9, multiplying the fourth by −1 and swapping the two, then subtracting the 2-multiple of the new third row from the fourth), we arrive at
(3 3 2 1 | 3)
(0 −3 −1 −2 | 6)
(0 0 1 −10 | −12)
(0 0 0 19 | 19).
The system has exactly one solution. We determine it by backward elimination: the last row gives x4 = 1; then x3 − 10x4 = −12 yields x3 = −2; next −3x2 − x3 − 2x4 = 6 yields x2 = −2; and finally 3x1 + 3x2 + 2x3 + x4 = 3 yields x1 = 4. The solution is x1 = 4, x2 = −2, x3 = −2, x4 = 1. □ 2.E.5. Find all the solutions of the homogeneous system x + y = 2z + v, z + 4u + v = 0, −3u = 0, z = −v of four linear equations in the 5 variables x, y, z, u, v. Solution. We rewrite the system into a matrix such that in the first column there are the coefficients of x, in the second the coefficients of y, and so on. We put all the variables in the equations to the left side.
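The parametric solution found for this system below, (x, y, z, u, v) = (−t − s, t, −s, 0, s), can be verified directly against the four equations for several parameter values, a quick sketch:

```python
# Check the parametric solution of 2.E.5:
# (x, y, z, u, v) = (-t - s, t, -s, 0, s) for all t, s.
def residuals(t, s):
    x, y, z, u, v = -t - s, t, -s, 0, s
    return [x + y - 2*z - v,   # x + y = 2z + v
            z + 4*u + v,       # z + 4u + v = 0
            -3*u,              # -3u = 0
            z + v]             # z = -v

for t in range(-3, 4):
    for s in range(-3, 4):
        assert residuals(t, s) == [0, 0, 0, 0]
print("all residuals vanish")
```

Since the residuals are linear in t and s, vanishing on this grid already forces them to vanish identically.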
By this, we obtain the matrix
(1 1 −2 0 −1)
(0 0 1 4 1)
(0 0 0 −3 0)
(0 0 1 0 1).
We add the (4/3)-multiple of the third row to the second and then subtract the second row from the fourth to obtain
(1 1 −2 0 −1)      (1 1 −2 0 −1)
(0 0 1 4 1)    ~   (0 0 1 0 1)
(0 0 0 −3 0)       (0 0 0 −3 0)
(0 0 1 0 1)        (0 0 0 0 0).
We multiply the third row by −1/3 and add the 2-multiple of the second row to the first, which gives
(1 1 0 0 1)
(0 0 1 0 1)
(0 0 0 1 0)
(0 0 0 0 0).
From the last matrix we read immediately (from bottom to top) u = 0, z + v = 0, x + y + v = 0. Letting v = s and y = t, the complete solution is (x, y, z, u, v) = (−t − s, t, −s, 0, s), which can be rewritten as (x, y, z, u, v) = t·(−1, 1, 0, 0, 0) + s·(−1, 0, −1, 0, 1), t, s ∈ R. Notice that the free parameters correspond to the second and the fifth columns of the matrix, the columns which do not contain a leading 1. □ 2.E.6. Determine the number of solutions for the systems (a) 12x1 + √5x2 + 11x3 = −9, x1 − 5x3 = −9, x1 + 2x3 = −7; (b) 4x1 + 2x2 − 12x3 = 0, 5x1 + 2x2 − x3 = 0, −2x1 − x2 + 6x3 = 4; (c) 4x1 + 2x2 − 12x3 = 0, 5x1 + 2x2 − x3 = 1, −2x1 − x2 + 6x3 = 0. Solution. The vectors (1, 0, −5), (1, 0, 2) are clearly linearly independent (they are not multiples of each other), and the vector (12, √5, 11) cannot be their linear combination (its second coordinate is non-zero). Therefore the matrix whose rows are these three linearly independent vectors is invertible, and the system in case (a) has exactly one solution. For cases (b) and (c), it is enough to note that (4, 2, −12) = −2·(−2, −1, 6). In case (b), adding the first equation to twice the third gives 0 = 8, hence there is no solution of the system. In case (c) the third equation is a multiple of the first, so the system has infinitely many distinct solutions. □ 2.E.7.
Find a linear system, whose set of solutions is exactly {(t + 1, 2t, 3t, At); t e R}. Solution. Such a system is for instance 2xi — X2 = 2, 2x2 — X4 = 0, 4x3 — 3x4 = 0. These solutions are satisfied for every t e R. The vectors (2,-1,0,0), (0,2,0,-1), (0,0,4,-3) giving the left-hand sides of the equations are linearly independent (the set of solutions contains a single parameter). □ 2.E.8. Solve the system of homogeneous linear equations given by the matrix /0 V2 V3 V& 0 \ 2 2^-2 -y^ 0 2^ 2V3 -V3 ' \3 3 -3 0 / o 2.E.9. Determine all solutions of the system X2 + X4 = 1, 2x2 - 3x3 + 4x4 = -2, x2 X3 + X4 = 2, - £3 = 1. O 2.E.10. Solve 3x - 5y + 2u + Az = 2, 5x + 7y - Au - 6z = 3, 7x - Ay + + 3z = 4, X + 6y - 2u - 5z = 2 o 2.E.11. Determine whether or not the system of linear equations 3xi + 3x2 + x3 = 1, 2xi + 3x2 — x3 = 8, 2xi — 3x2 + x3 = 4, 3xi — 2x2 + 23 = 6 of three variables x1, x2, x3 has a solution. O 129 CHAPTER 2. ELEMENTARY LINEAR ALGEBRA 2.E.12. Determine the number of solutions of the system of 5 linear equations AT-x = (1,2,3,4,5)T, where /3 1 7 5 0\ x = (Xl,x2,x3)T and A= 0 0 0 0 1 . \2 1 4 3 0/ Repeat the question for the system AT-x = (1,1,1,1,1)T 2.E.13. Depending on the parameter aeR, determine the solution of the system of linear equations ax i + 4x2 +2 x3 = 0, 2xi + 3x2 — x3 = 0. 2.E.14. Depending on the parameter aeR, determine the number of solutions of the system (2\ 5 3 V-3/ O o í4 1 4 a\ M 2 3 6 8 X2 3 2 5 4 \6 -1 2 -8/ \xA) o 2.E.15. Decide whether or not there is a system of homogeneous linear equations of three variables whose set of solutions is exactly (a) {(0,0,0)}; (b) {(0,1,0), (0,0,0), (1,1,0)}; (c) {(x, 1,0); x e R}; (d) {(x,y,2y); x,y G R}. O 2.E.16. Solve the system of linear equations, depending on the real parameters a, b. x + 2y + bz = a x — y + 2z = 1 3a; — y = 1. 2.E.17. Using the inverse matrix, compute the solution of the system %i + x2 + x3 + x4 = 2, x\ + x2 — x3 — xA = 3, x\ — x2 + x3 — x4 = 3, X\ — X2 — X3 + X4 = 5. 
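Exercise 2.E.17 asks for a solution via the inverse matrix. The coefficient matrix A of the four equations has the convenient property A·A = 4E (its rows are mutually orthogonal ±1 vectors), so A⁻¹ = A/4 and x = A⁻¹b. A short exact-arithmetic sketch:

```python
from fractions import Fraction as F

# coefficient matrix and right-hand side of 2.E.17
A = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, 1, -1],
     [1, -1, -1, 1]]
b = [2, 3, 3, 5]

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# A * A = 4E, hence the inverse of A is A/4
assert mat_mul(A, A) == [[4 if i == j else 0 for j in range(4)] for i in range(4)]
A_inv = [[F(entry, 4) for entry in row] for row in A]

x = [sum(A_inv[i][j] * b[j] for j in range(4)) for i in range(4)]
print(x)  # x = (13/4, -3/4, -3/4, 1/4)
```

Substituting the result back confirms all four equations, e.g. 13/4 + 3/4 + 3/4 + 1/4 = 5 for the last one.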
O O 2.E.18. For what values of the parameters a, b ∈ R does the system of linear equations x1 − ax2 − 2x3 = b, x1 + (1 − a)x2 = b − 3, x1 + (1 − a)x2 + ax3 = 2b − 1 have (a) exactly one solution; (b) no solution; (c) at least 2 solutions (i.e. infinitely many solutions)? Solution. We rewrite it, as usual, in the extended matrix, and transform: at the first step we subtract the first row from the second and the third; at the second step we subtract the second from the third. We see that the system has a unique solution (determined by backward elimination) if and only if a ≠ 0. If a = 0 and b = −2, we have a zero row in the extended matrix. Choosing x3 ∈ R as a parameter then gives infinitely many distinct solutions. For a = 0 and b ≠ −2 the last equation 0 = b + 2 cannot be satisfied and the system has no solution. Note that for a = 0, b = −2 the solutions are (x1, x2, x3) = (−2 + 2t, −3 − 2t, t), t ∈ R, and for a ≠ 0 the unique solution is the triple ((−3a² − ab − 4a + 2b + 4)/a, −(3a + 2b + 4)/a, (b + 2)/a). □ 2.E.19. Find real numbers b1, b2, b3 such that the system of linear equations A·x = b has: (a) infinitely many solutions; (b) a unique solution; (c) no solution; (d) exactly four solutions. Solution. Since the first row is the sum of the other two, it is enough to choose b1 = b2 + b3 in case (a) and b1 ≠ b2 + b3 in case (c). Variant (b) cannot occur, since the matrix A is not invertible. As long as we work over the reals, there cannot be any finite number of solutions other than zero or one. Thus (d) is impossible. □ 2.E.20. Factor the following permutations into a product of transpositions: i) (1 2 3 4 5 6 7 / 7 6 5 4 3 2 1), ii) (1 2 3 4 5 6 7 8 / 6 4 1 2 5 8 3 7), iii) (1 2 3 4 5 6 7 8 9 10 / 4 6 1 10 2 5 9 8 3 7). O 2.E.21. Determine the parity of the given permutations: i) (1 2 3 4 5 6 7 / 7 5 6 4 1 2 3), ii) (1 2 3 4 5 6 7 8 / 6 7 1 2 3 8 4 5), iii) (1 2 3 4 5 6 7 8 9 10 / 9 7 1 10 2 5 4 8 3 6). O 2.E.22.
Find the algebraically adjoint matrix F* for F with the rows (α, β, 0), (γ, δ, 0), (0, 0, 1), where α, β, γ, δ ∈ R. O 2.E.23. Calculate the algebraically adjoint matrix for the matrices (a) with the rows (3, −2, 0, −1), (0, 2, 2, 1), (1, −2, −3, −2), (0, 1, 2, 1), and (b) with the rows (1 + i, 2i), (3 − 2i, 6), where i denotes the imaginary unit. O 2.E.24. Is the set V = {(1, x); x ∈ R} with the operations ⊕ : V × V → V, (1, y) ⊕ (1, z) = (1, z + y), and ⊙ : R × V → V, k ⊙ (1, y) = (1, ky), for all y, z, k ∈ R, a vector space? O 2.E.25. Express the vector (5, 1, 11) as a linear combination of the vectors (3, 2, 2), (2, 3, 1), (1, 1, 3), that is, find numbers p, q, r ∈ R for which (5, 1, 11) = p(3, 2, 2) + q(2, 3, 1) + r(1, 1, 3). O 2.E.26. In R3, determine the matrix of the rotation through the angle 120° in the positive sense about the vector (1, 0, 1). O 2.E.27. In the vector space R3, determine the matrix of the orthogonal projection onto the plane x + y − 2z = 0. O 2.E.28. In the vector space R3, determine the matrix of the orthogonal projection onto the plane 2x − y + 2z = 0. O 2.E.29. Determine whether the subspaces U = ⟨(2, 1, 2, 2)⟩ and V = ⟨(−1, 0, −1, 2), (−1, 0, 1, 0), (0, 0, 1, −1)⟩ of the space R4 are orthogonal. If they are, is R4 = U ⊕ V, that is, is U⊥ = V? O 2.E.30. Let p be a given line: p : [1, 1] + (4, 1)t, t ∈ R. Determine the parametric expression of all lines q that pass through the origin and have deflection 60° with the line p. O 2.E.31. Depending on the parameter t ∈ R, determine the dimension of the subspace U of the vector space R3, if U is generated by the vectors (a) u1 = (1, 1, 1), u2 = (1, t, 1), u3 = (2, 2, t); (b) u1 = (t, t, t), u2 = (−4t, −4t, 4t), u3 = (−2, −2, −2). O 2.E.32. Construct an orthogonal basis of the subspace ⟨(1, 1, 1, 1), (1, 1, 1, −1), (−1, 1, 1, 1)⟩ of the space R4. O 2.E.33. In the space R4, find an orthogonal basis of the subspace of all linear combinations of the vectors (1, 0, 1, 0), (0, 1, 0, −7), (4, −2, 4, 14). Find an orthogonal basis of the subspace generated by the vectors (1, 2, 2, −1), (1, 1, −5, 3), (3, 2, 8, −7). O 2.E.34.
For what values of the parameters a, b ∈ R are the vectors (1, 1, 2, 0, 0), (1, −1, 0, 1, a), (1, b, 2, 3, −2) in the space R5 pairwise orthogonal? O 2.E.35. In the space R5, consider the subspace generated by the vectors (1, 1, −1, −1, 0), (1, −1, −1, 0, −1), (1, 1, 0, 1, 1), (−1, 0, −1, 1, 1). Find a basis for its orthogonal complement. O 2.E.36. Describe the orthogonal complement of the subspace V of the space R4, if V is generated by the vectors (−1, 2, 0, 1), (3, 1, −2, 4), (−4, 1, 2, −4), (2, 3, −2, 5). O 2.E.37. In the space R5, determine the orthogonal complement W⊥ of the subspace W, if (a) W = {(r + s + t, −r + t, r + s, −t, s + t); r, s, t ∈ R}; (b) W is the set of the solutions of the system of equations x1 − x3 = 0, x1 − x2 + x3 − x4 + x5 = 0. O 2.E.38. In the space R4, let (1, −2, 2, 1), (1, 3, 2, 1) be given vectors. Extend these two vectors into an orthogonal basis of the whole R4. (You can do this in any way you wish, for instance by using the Gram-Schmidt orthogonalization process.) O 2.E.39. Define an inner product on the vector space of the matrices from the previous exercise. Compute the norm of the matrix from the previous exercise, induced by the product you have defined. O 2.E.40. Find a basis for the vector space of all antisymmetric real square matrices of the type 4 × 4. Consider the standard inner product in this basis and, using this inner product, express the size of the matrix with the rows (0, 3, 1, 0), (−3, 0, 1, 2), (−1, −1, 0, 2), (0, −2, −2, 0). O 2.E.41. Find the eigenvalues and the associated eigenspaces of eigenvectors of the matrix: Solution. The characteristic polynomial of the matrix is λ³ − 6λ² + 12λ − 8, which is (λ − 2)³. The number 2 is thus an eigenvalue with algebraic multiplicity three. Its geometric multiplicity is either one, two or three. We determine the vectors associated to this eigenvalue as the solutions of the system (A − 2E)x = 0, that is, −x1 + x2 = 0, −x1 + x2 = 0, 2x1 − 2x2 = 0. Its solutions form the two-dimensional space ⟨(1, 1, 0), (0, 0, 1)⟩.
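The multiplicity claim in 2.E.41 can be checked mechanically. The matrix itself did not survive in the text, so the sketch below reconstructs it from the printed system as A = (A − 2E) + 2E; treat that reconstruction as an assumption:

```python
# Reconstruct A from the system (A - 2E)x = 0 given in the solution.
M = [[-1, 1, 0],
     [-1, 1, 0],
     [2, -2, 0]]          # A - 2E
A = [[M[i][j] + (2 if i == j else 0) for j in range(3)] for i in range(3)]

def char_poly(B):
    # coefficients of det(B - t*E) for a 3x3 matrix B, highest power first
    tr = B[0][0] + B[1][1] + B[2][2]
    # sum of the principal 2x2 minors
    m2 = (B[0][0]*B[1][1] - B[0][1]*B[1][0]
          + B[0][0]*B[2][2] - B[0][2]*B[2][0]
          + B[1][1]*B[2][2] - B[1][2]*B[2][1])
    det = (B[0][0]*(B[1][1]*B[2][2] - B[1][2]*B[2][1])
           - B[0][1]*(B[1][0]*B[2][2] - B[1][2]*B[2][0])
           + B[0][2]*(B[1][0]*B[2][1] - B[1][1]*B[2][0]))
    return [-1, tr, -m2, det]

# det(A - t*E) = -(t - 2)^3 = -t^3 + 6t^2 - 12t + 8
assert char_poly(A) == [-1, 6, -12, 8]

# geometric multiplicity 2: (A - 2E) annihilates both basis vectors
for v in [(1, 1, 0), (0, 0, 1)]:
    assert all(sum(M[i][j] * v[j] for j in range(3)) == 0 for i in range(3))
```

The rank of A − 2E is one, so the kernel is two-dimensional, in agreement with the hand computation.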
Thus the eigenvalue 2 has algebraic multiplicity 3 and geometric multiplicity 2. □ 2.E.42. Determine the eigenvalues of the matrix (-13 0 -30 V-12 5 4 2\_ -10 0 12 9 5 4 1/ 6 O 2.E.43. Given that the numbers 1, —1 are eigenvalues of the matrix /-ll 5 4 A = 1\ 0 -3 0 1 -21 11 8 2 \-9 5 3 1/ find all solutions of the characteristic equation A — A E =0. Hint: if you denote all the roots of the polynomial A — A E\ by Ai,A2,A3,A4,then ^4 | = Ai ■ a2 ■ a3 ■ a4, and trA = Ai + a2 + a3 + a4. o 2.E.44. Find a four-dimensional matrix with eigenvalues Ai = 6 and A2 = 7 such that the multiplicity of A2 as a root of the characteristic polynomial is three, and that (a) the dimension of the subspace of eigenvectors of \2 is 3; (b) the dimension of the subspace of eigenvectors of A2 is 2 (c) the dimension of the subspace of eigenvectors of A2 is 1 O 2E.45. Find the eigenvalues and the eigenvectors of the matrix: 2.E.46. Determine the characteristic polynomial A — A E |, eigenvalues and eigenvectors of the matrix 134 CHAPTER 2. ELEMENTARY LINEAR ALGEBRA 4 -1 6 2 1 6 2 -1 8 respectively. O 135 CHAPTER 2. ELEMENTARY LINEAR ALGEBRA Solutions to the exercises 2 A.ll. There is only one such matrix X, and it is -32 -8 1 10 -4 1 12 -5 0 5 -2 Ab = A-3 = '14 13 -13> 13 14 13 v 0 0 27 y 2.A./5. 2A.16. CT í2 -3 0 0 -5 8 0 0 0 0 0 -1 0 0 0 0 0 -5 2 \o 0 0 3 -V 0 1 -1 0 \1 -1 -1 2A.17. In the first case we have 1 -1 0 yi-1 = in the second 2.C.9. (2 + ^,2-^). 2.C.10. The vectors are dependent whenever at least one of the conditions a = b = 1, a = c = 1, b = c = 1 is satisfied. 2.C.11. Vectors are linearly independent. 2.C.12. It suffices to add for instance the polynomial x. 2D.5. cos = 2.D./2. Je I ^ - A E \ = —A3 + 12A2 - 47A + 60,. Ai = 3, A2 = 4, As = 5. 2D.20. The solution is the sequence 0,1, 2. 2D.21. The dimension is 1 for Ai = 4 and 2 for A2 = 3. 2.E.8. The solutions are all scalar multiples of the vector (l + v^, -V3, 0, 1, 0) 2.E.9. XI = 1 + t, X2 = |, X3 = t, : 2.E.10. 
The system has no solution. 2.E.11. The system has a solution, because the column on the right-hand side is a linear combination of the columns of the matrix of the system. 2.E.12. The system of linear equations 3x1 + 2x3 = 1, x1 + x3 = 2, 7x1 + 4x3 = 3, 5x1 + 3x3 = 4, x2 = 5 has no solution, while the system 3x1 + 2x3 = 1, x1 + x3 = 1, 7x1 + 4x3 = 1, 5x1 + 3x3 = 1, x2 = 1 has the unique solution x1 = −1, x2 = 1, x3 = 2. 2.E.13. The set of all solutions is given by {(−10t, (a + 4)t, (3a − 8)t); t ∈ R}. 2.E.14. For a = 0, the system has no solution. For a ≠ 0 the system has infinitely many solutions. 2.E.15. The correct answers are "yes", "no", "no" and "yes", respectively. 2.E.16. i) If b ≠ −7, then x = z = (2 + a)/(b + 7), y = (3a − b − 1)/(b + 7). ii) If b = −7 and a ≠ −2, then there is no solution. iii) If a = −2 and b = −7, then the solution is x = z = t, y = 3t − 1, for any t. 2.E.17. The inverse of the coefficient matrix is the (1/4)-multiple of the matrix with the rows (1, 1, 1, 1), (1, 1, −1, −1), (1, −1, 1, −1), (1, −1, −1, 1). We can then easily obtain x1 = 13/4, x2 = −3/4, x3 = −3/4, x4 = 1/4. 2.E.20. i) (1,7)(2,6)(5,3), ii) (1,6)(6,8)(8,7)(7,3)(2,4), iii) (1,4)(4,10)(10,7)(7,9)(9,3)(2,6)(6,5). 2.E.21. i) 17 inversions, odd; ii) 12 inversions, even; iii) 25 inversions, odd. 2.E.22. From the knowledge of the inverse matrix F⁻¹ we obtain F* = (αδ − βγ)F⁻¹ for any α, β, γ, δ ∈ R. 2.E.23. The matrices are (a) the matrix with the rows (1, 1, −2, −4), (0, 1, 0, −1), (−1, −1, 3, 6), (2, 1, −6, −10), and (b) the matrix with the rows (6, −2i), (−3 + 2i, 1 + i). 2.E.24. It is easy to check that it is a vector space. The first coordinate does not affect the results of the operations; it is just the vector space (R, +, ·) written in a different way. 2.E.25. There is a unique solution p = 2, q = −2, r = 3. 2.E.26. The matrix with the rows (1/4, −√6/4, 3/4), (√6/4, −1/2, −√6/4), (3/4, √6/4, 1/4). 2.E.27. The matrix with the rows (5/6, −1/6, 1/3), (−1/6, 5/6, 1/3), (1/3, 1/3, 1/3).
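The projection matrix in 2.E.27 follows from the general formula P = E − nnᵀ/(nᵀn), where n = (1, 1, −2) is a normal vector of the plane. A quick exact-fraction check of the answer (sketch):

```python
from fractions import Fraction as F

n = [1, 1, -2]                      # normal vector of the plane x + y - 2z = 0
nn = sum(c * c for c in n)          # |n|^2 = 6

# P = E - n n^T / |n|^2, the orthogonal projection onto the plane
P = [[F(int(i == j)) - F(n[i] * n[j], nn) for j in range(3)] for i in range(3)]

assert P == [[F(5, 6), F(-1, 6), F(1, 3)],
             [F(-1, 6), F(5, 6), F(1, 3)],
             [F(1, 3), F(1, 3), F(1, 3)]]

# sanity checks: P annihilates n and is idempotent (P*P = P)
assert [sum(P[i][j] * n[j] for j in range(3)) for i in range(3)] == [0, 0, 0]
assert [[sum(P[i][k] * P[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)] == P
```

The same formula with n = (2, −1, 2) and |n|² = 9 reproduces the answer of 2.E.28.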
But it is not true that R4 = U ffi V. The subspace V is only two-dimensional, because (-1,0, -1, 2) = (-1,0,1,0) - 2 (0,0,1, -1). 2.E.30. qi : (2 - V3 2 ,2^3+ ±)t, g2: (2 + ^,-273 + 1)*. 2.E.31. In the first case we have dim U = 2 for t e {1, 2}, otherwise we have dim U = 3. In the second case we have dim U = 2 for t # 0 and dim U = 1 for t = 0. 2.E.32. Using the Gram-Schmidt orthogonalization process we can obtain the result ((1,1,1,1), (1,1,1,-3), (-2,1,1,0)). 2.E.33. We have for instance the orthogonal bases ((1,0,1,0), (0,1,0,-7)) for the first part, and ((1, 2, 2, -1), (2, 3, -3, 2), (2, -1, -1, -2)). for the second part. 2.E.34. The solution is a = 9/2, b = —5, because 1 + 6 + 4 + 0 + 0 = 0, 1-6 + 0 + 3- 2a = 0. 2.E.35. The basis must contain a single vector. It is (3,-7,1,-5,9). (or any non-zero scalar multiple thereof) 2.E.36. The orthogonal complement V± is the set of all scalar multiples of the vector (4, 2, 7,0). 2.E.37. (a) W1- = <(1,0, -1,1,0), (1, 3,2,1, -3)); (b) Wx = {(1,0,-1,0,0), (1,-1,1,-1,1)). 2.E.38. There are infinitely many possible extensions, of course. A very simple one is (1,-2,2,1), (1,3,2,1), (1,0,0,-1), (1,0,-1,1). 2.E.39. For instance, one can use the inner product that follows from the isomorphism of the space of all real 3x3 matrices with the space R9. If we use the product from R9, we obtain an inner product that assigns to two matrices the sum of products of two corresponding elements. For the given matrix we obtain = Vl2 + 22 + O2 + O2 + 22 + O2 + l2 + (-2)2 + (-3)2 = V23. 2.E.40. 2.E.42. The matrix has only one eigenvalue, namely —1, since the characteristic polynomial is (A + l)4 2.E.43. The root —1 of the polynomial | A — A E \ has multiplicity three. 2.E.44. Possible examples are, (a) 0 0 °\ /6 0 0 0 7 0 0 (b) 0 7 1 0 0 0 7 0 0 0 7 0 \o 0 0 V 0 0 V 0 0 °\ (c) 0 7 1 0 0 0 0 7 0 1 V 138 _CHAPTER 2. ELEMENTARY LINEAR ALGEBRA_ 2.E.45. There is a triple eigenvalue —1. The corresponding eigenspace is {(1,0,0), (0, 2,1)). 2.E.46. 
The characteristic polynomial is −(λ − 2)²(λ − 9), that is, the eigenvalues are 2 and 9, with associated eigenvectors (1, 2, 0), (−3, 0, 1) and (1, 1, 1), respectively. CHAPTER 3. Linear models and matrix calculus where are the matrices useful? - basically almost everywhere... A. Linear optimization Let us start with an example of a very simple problem: 3.A.1. A company manufactures bolts and nuts. Nuts and bolts are moulded: moulding a box of bolts takes one minute, while a box of nuts is moulded for 2 minutes. Preparing the box itself takes one minute for bolts and 4 minutes for nuts. The company has at its disposal two hours for moulding and three hours for box preparation. Demand says that it is necessary to manufacture at least 90 boxes of bolts more than boxes of nuts. Due to technical reasons it is not possible to manufacture more than 110 boxes of bolts. The profit from one box of bolts is $4 and the profit from one box of nuts is $6. The company has no trouble with selling. How many boxes of nuts and bolts should be manufactured in order to have maximal profit? Solution. Write the given data into a table:

            Bolts (1 box)   Nuts (1 box)   Capacity
  Mould     1 min/box       2 min/box      2 hours
  Box       1 min/box       4 min/box      3 hours
  Profit    $4/box          $6/box

We have already developed a useful package of tools and it is time to show some applications of matrix calculus. The first three parts of this chapter are independent, and the readers more interested in the theory might skip any of them and continue with the fourth part straight ahead. It might seem that the assumption of linearity of relations between quantities is too restrictive. But this is often not so. In real problems, linear relations may appear directly. A problem may be solved as a result of an iteration of many linear steps. If this is not the case, we may still use this approach at least to approximate real non-linear processes. We should also like to compute with matrices (and linear mappings) as easily as we can compute with scalars.
In order to do that, we prepare the necessary tools in the second part of this chapter. We also present a useful application of matrix decompositions to the pseudoinverse matrices, which are needed for numerical mastery of matrix calculus. We try to illustrate all the phenomena with rather easy problems. Still, some parts of this chapter are perhaps difficult on first reading. This in particular concerns the very first part, providing some glimpses of linear optimization (linear programming), and the third part, devoted to iterated processes (the Frobenius–Perron theory). The rest of the chapter comes back to some more advanced parts of matrix calculus (the Jordan canonical form, decompositions, and pseudoinverses of matrices). The reader should feel free to move forward when getting lost.

1. Linear optimization

The simplest linear processes are given by linear mappings ψ : V → W on vector spaces. As we can surely imagine, the vector v ∈ V can represent the state of some system we are observing, while ψ(v) gives the result after some process is realized. If we want to reach a given result b ∈ W of such a process, we solve the problem ψ(x) = b.

Writing x1 for the number of boxes of bolts and x2 for the number of boxes of nuts, the constraints of 3.A.1 read

  x1 + 2x2 ≤ 120
  x1 + 4x2 ≤ 180
  x1 ≥ x2 + 90
  x1 ≤ 110

The objective function (the function that gives the profit for a given number of manufactured nuts and bolts) is 4x1 + 6x2. The previous system of inequalities defines a region in R². Optimisation of the profit means finding in this region the point (or points) in which the objective function has the maximum value, that is, finding the largest k such that the line 4x1 + 6x2 = k has a non-empty intersection with the given region. Graphically, we can find the solution for example by placing the line p into the plane so that it satisfies the equation 4x1 + 6x2 = 0, and then moving it "upwards" as long as it has some intersection with the area. It is clear that the last intersection is either a point or a line parallel to p forming a border of the region.
Thus we obtain (see the figure) the point x1 = 110 and x2 = 5. The maximum possible income is thus 4·110 + 6·5 = $470. □

3.A.2. Minimisation of costs for feeding. A stable in Nišovice u Volyně buys fodder for winter: hay and oats. The

draw more interesting conclusions in the setup of linear optimization models (also called linear programming).

3.1.1. Linear optimization. In the practical column, the previous chapter started with a painting problem, and we shall continue here in a similar way. Imagine that our very specialized painter in a black&white world is willing to paint facades of either small family houses or large public buildings, and that he (of course) uses only black and white colours. He can arbitrarily choose proportions between x1 units of area for the small houses and x2 units for the large buildings. Assume that his maximal workload in a given interval is L units of area, his net income (that is, after subtracting the costs) is c1 per unit of area for small houses and c2 per unit of area for large buildings. Furthermore, he has only W kg of white colour and B kg of black colour at his disposal. Finally, a unit of area for small houses requires w1 kg of white colour and b1 kg of black colour. For large buildings the corresponding values are w2 and b2. If we write all this information as inequalities, we obtain the conditions

(1) x1 + x2 ≤ L
(2) w1x1 + w2x2 ≤ W
(3) b1x1 + b2x2 ≤ B.

The total net income of the painter, which is the linear form h, h(x1, x2) = c1x1 + c2x2, is to be maximized. Each of the given inequalities clearly determines a half-plane in the plane of the variables (x1, x2), bounded by the line given by the corresponding equality, and we must also assume that both x1 and x2 are non-negative real numbers (because the painter cannot paint negative areas).
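Since the box counts in 3.A.1 are integers, the graphical solution above can be cross-checked by a brute-force scan of all candidate pairs; a minimal sketch (the loop bounds come directly from the constraints):

```python
# Brute-force cross-check of 3.A.1: box counts are integers, so we can
# scan every candidate pair (x1, x2) directly against the constraints.
best, arg = None, None
for x1 in range(0, 111):                    # at most 110 boxes of bolts
    for x2 in range(0, 61):                 # 2 h of moulding caps x2 at 60
        if (x1 + 2 * x2 <= 120              # moulding minutes
                and x1 + 4 * x2 <= 180      # box-preparation minutes
                and x1 - x2 >= 90):         # at least 90 more bolt boxes
            profit = 4 * x1 + 6 * x2
            if best is None or profit > best:
                best, arg = profit, (x1, x2)
print(best, arg)   # 470 (110, 5)
```

The scan confirms the graphical answer: 110 boxes of bolts and 5 boxes of nuts, with profit $470.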
Thus we have constraints for the values (x1, x2): either the constraints are unsatisfiable, or they allow points inside a polygon with at most five vertices; the lines of constant value of h then show that the optimum is attained at one of the vertices. See the diagram.

nutritional values of the fodder and required daily portions for one foal are given in the table:

  g/kg                         Hay     Oats    Requirements
  Dry basis                    841     860     ≥ 6300 g
  Digestible nitrogen stuff     53     123     ≥ 1150 g
  Starch                       0.348   0.868   ≤ 5.35 g
  Calcium                        6     1.6     ≥ 30 g
  Phosphate                    2.8     3.5     ≤ 44 g
  Natrium                      0.2     1.4     ~ 7 g
  Cost                         1.80    1.60

Every foal must obtain at least 2 kg of oats in its daily meal. The average cost (counting the payment for transportation) is €1.80 per 1 kg of hay and €1.60 per 1 kg of oats. Compose a daily diet for one foal which has minimum cost. ○

The previous three examples could be solved by drawing the diagram and checking all the vertices on the boundary of the polygonal area M ⊂ R². Moreover, we know that the maximum will be at one of the extremes in the direction of the normal to the defining line of the linear cost function h. But the principle works in higher dimensions as well. If there is a function f : R^n → R,

  f(x1, ..., xn) = c0 + c1x1 + ··· + cnxn

(we call it the objective function), its values at the points A = (a1, ..., an) and B = A + u = (a1 + u1, ..., an + un) differ by

  f(B) - f(A) = c1u1 + ··· + cnun,

which is the scalar product of the vectors (c1, ..., cn) and (u1, ..., un). The relation between the scalar product and the cosine of the angle between vectors ensures that the level sets of the function f are hyperplanes in R^n with normal (c1, ..., cn). Each such hyperplane splits the space R^n into two half-spaces. Clearly the given function grows when moving towards one of these half-spaces and declines in the other one.
This is essentially the same principle as we saw when discussing the visibility of segments in dimension 2 (we checked whether the observer is to the left or to the right of the oriented segment, cf. 1.5.12). This observation leads to an algorithm for finding the extremal values of the linear objective function f on the set M of admissible points defined by linear inequalities.

We shall deal with the standard problem of linear programming. That is, we want to maximize the linear function h = c1x1 + ··· + cnxn on the set M given by A·x ≤ b and x ≥ 0 (here the inequality between vectors means the inequality between all their individual components). As explained in 3.1.6, we may add slack variables xs, one for each inequality.

How to solve such a problem? We seek the maximum value of a linear form h over subsets M of a vector space which are defined by linear inequalities. In the plane, M is given by the intersection of half-planes. Next, note that every linear form h : V → R over a real vector space (that is, an arbitrary linear scalar function) is monotone in every chosen direction. More precisely, if we choose a fixed starting vector u ∈ V and a "directional" vector v ∈ V, then composing our form h with the parametrization yields

  t ↦ h(u + t v) = h(u) + t h(v).

This expression is indeed either increasing, or decreasing, or constant (depending on whether h(v) is positive, negative or zero) as a function of t. Thus, if the set M is bounded as in our picture above, we easily find the solution by testing the value of h at the vertices of the boundary polygon.

In general, we must expect that problems similar to the one with the painter are either unsatisfiable (if the set given by the constraints is empty), or the profit is unbounded (if the constraints allow for unbounded directions in the space and the form h is non-zero in some of the unbounded directions), or they attain a maximal solution in at least one of the "vertices" of the set M.
Normally the maximum is attained at a single point of M, but sometimes it is attained on a part of the boundary of the set M. Try to choose explicit values for the parameters w1, w2, b1, b2, c1, c2, draw the above picture for these parameters, and find the explicit solution to the problem (if it exists)!

3.1.2. Terminology. In general we speak of a linear programming problem whenever we seek either the maximum or the minimum value of a linear form h over R^n on a set bounded by a system of linear inequalities, which we call linear constraints. The vector on the right-hand side is then called the vector of constraints. The linear form h is also called the objective function. In real practice we meet hundreds or thousands of constraints for dozens of variables.

The standard maximization problem is defined by seeking a maximum of the objective function while the restrictive inequalities are ≤ and the variables are non-negative. On the other hand, the standard minimization problem is defined by seeking a minimum of the objective function while the restrictive inequalities are ≥ and the variables are non-negative.

Leonid Kantorovich and Tjalling Koopmans shared the 1975 Nobel prize in economics for their formulation and solution of economic and logistic problems in this way during the second world war. But it was George B. Dantzig who independently developed the general linear programming formulation in the period 1946–49, motivated by planning problems in the US Air Force. Among others, he invented the simplex method algorithm.

Thus we may restrict ourselves to the problem of maximizing h on a vector space of solutions of systems of linear equations with the additional condition that all the values of the coordinates must be non-negative. If there are more general inequalities, we can always change them into our form by multiplying them by -1, and minimization of the value of h corresponds to maximization of -h.
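Following the suggestion above, here is one explicit choice of the painter's parameters (the numbers L = 100, (w1, w2) = (1, 2), W = 150, (b1, b2) = (2, 1), B = 120, (c1, c2) = (3, 4) are our own sample values, not taken from the text) together with a small vertex-enumeration sketch: since the optimum is attained at a vertex of M, it suffices to intersect the boundary lines pairwise and test the feasible intersection points:

```python
from itertools import combinations

# Explicit sample parameters for the painter's problem 3.1.1.
# Constraints a.x <= b; the last two rows encode x1 >= 0, x2 >= 0.
A = [(1, 1), (1, 2), (2, 1), (-1, 0), (0, -1)]
b = [100, 150, 120, 0, 0]
c = (3, 4)                       # net incomes c1, c2

def intersect(line1, line2):
    """Intersection of a1.x = p and a2.x = q, or None if parallel."""
    (a1, a2), p = line1
    (a3, a4), q = line2
    det = a1 * a4 - a2 * a3
    if det == 0:
        return None
    return ((p * a4 - a2 * q) / det, (a1 * q - p * a3) / det)

# the optimum sits at a vertex of M: intersect boundary lines pairwise
# and keep only the feasible intersection points
vertices = [v for pair in combinations(zip(A, b), 2)
            for v in [intersect(*pair)]
            if v is not None
            and all(ai * v[0] + aj * v[1] <= bi + 1e-9
                    for (ai, aj), bi in zip(A, b))]
best = max(vertices, key=lambda v: c[0] * v[0] + c[1] * v[1])
print(best)   # (30.0, 60.0)
```

With these numbers the painter's best split is 30 units of area for small houses and 60 for large buildings, with net income 3·30 + 4·60 = 330; the binding constraints are the white colour and the black colour, while some workload capacity remains unused.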
As explained in more detail in 3.1.1 and 3.1.6, we add the first row of coefficients of h (with minus signs) and use the simplex tableau:

  -c1  ···  -cn   0  ···  0 |  0
  a11  ···  a1n   1  ···  0 |  b1
   ⋮         ⋮         ⋱    |   ⋮
  am1  ···  amn   0  ···  1 |  bm

We start the algorithm if we find m columns (here m is the number of equations in the problem) such that Gaussian elimination for these columns leads to a unit submatrix in A and positive values at the positions of all bi. The coordinates corresponding to these columns with the values 1 in them are called the basic coordinates. We restrict ourselves to the cases where all bi are non-negative in the original problem; then we choose all the slack variables as the basic ones and the initialization of the algorithm is done.

Next we move in the following iterated steps (compare the more theoretical explanation in 3.1.6): We choose the first column from the left having a negative value in the first row. In this column (let it be the j-th column), we pick the positive entry aij in A which provides the minimal ratio bi/aij (we call this entry the pivot). Finally we eliminate the entire chosen column with the help of the chosen aij. This means we achieve, by elementary row transformations, that the j-th column contains the value 1 in its i-th row, with all other values vanishing.

We explain the procedure by an example:

3.A.3. Minimize the function -3x - y - 2z under the conditions x, y, z ≥ 0 and

  x - y + z ≥ -4,
  2x + z ≤ 3,
  x + y + 3z ≤ 8.

Solution. First we multiply the objective function and the first inequality by -1. We get the equivalent task of maximizing

It is easy to see that every general linear programming problem can be transformed into a standard one of either type. Aside from sign changes, we can work with a decomposition of the variables that have no sign restriction into a difference of two non-negative ones. Without loss of generality we will work only with the standard maximization problem.

3.1.3. Formulation using linear equations.
Finding an optimum is not always as simple as in the previous 2-dimensional case. The problem can contain many variables and constraints, and even deciding whether the set M of feasible points is non-empty can be a problem. We do not have the ambition to go into detailed theory here. But we mention at least some ideas which show that the solution can always be found, and then we build an effective algorithm solving the problem in the next paragraphs.

We begin by a comparison with systems of linear equations, because we understand those well. We write the inequalities (1)–(3) in 3.1.1 in the general form:

  A · x ≤ b,

where x is now an n-dimensional vector, b is an m-dimensional vector and A is the corresponding matrix. By an inequality between vectors we mean the individual inequalities between all coordinates. We want to maximize the product c · x for the given row vector c of coefficients of the linear form h and the feasible values of x. If we add new auxiliary variables xs, one for every inequality, and add another variable z for the value of the linear form h, we can rewrite the whole system as a system of linear equations

(1)   ( 1  -c  0   )   ( z  )   ( 0 )
      ( 0   A  Em  ) · ( x  ) = ( b )
                       ( xs )

where the matrix is composed of blocks, with 1 + n + m columns and 1 + m rows, acting on the corresponding individual components of the vectors. We call the new variables xs the slack variables. Moreover, we require non-negativity of all coordinates of x and xs. If the given system of equations has a solution, we seek values for the variables z, x and xs such that all x and xs are non-negative and z is maximized.

In paragraph 4.1.11 on page 239 we will discuss this situation from the viewpoint of affine geometry. Now we just notice that being on the boundary of the set of feasible points M of the problem is equivalent to having some of the slack variables vanish. Our algorithm will try to move from one such position to another while increasing h. But we shall need some conceptual preparation first.
the function 3x + y + 2z under the conditions

  -x + y - z ≤ 4,
  2x + z ≤ 3,
  x + y + 3z ≤ 8.

Introducing the non-negative slack variables u, v, w, we obtain the tableau with the objective function 3x + y + 2z + 0·u + 0·v + 0·w:

  -3  -1  -2   0   0   0 |  0
  -1   1  -1   1   0   0 |  4
   2   0   1   0   1   0 |  3
   1   1   3   0   0   1 |  8

Since the right-hand column is non-negative, setting u = 4, v = 3, w = 8, x = y = z = 0 provides an admissible solution of the system, which corresponds to the choice of the basic variables u, v, w, and the algorithm may begin. The first column already has a negative entry in the first row, so we choose this one. We have circled the pivot, i.e. the value 2 there (we compare the ratios of the entries of the last column to the positive elements of the chosen column, i.e. 3/2 and 8/1, and take the minimal one, since we need to keep the last column non-negative during the elimination). Next we eliminate the first column with the help of the pivot (we multiply the third row by 1/2, and subtract suitable multiples of it from the other rows, not forgetting the first row of the tableau, so that only zero entries remain there):

   0  -1  -1/2   0   3/2   0 |  9/2
   0   1  -1/2   1   1/2   0 | 11/2
   1   0   1/2   0   1/2   0 |  3/2
   0   1   5/2   0  -1/2   1 | 13/2

Now the basic variables are x = 3/2, u = 11/2, w = 13/2, which reflects the fact that we moved as much from the former slack variable v to the new basic variable x as possible. This increased the value of the objective function, which we may read in the right top corner of the tableau. Next, we choose the pivot from the second column, and the above rule yields the first row in A (11/2 < 13/2). We have already circled the 1 in the tableau above. We eliminate:

   0   0  -1     1    2    0 | 10
   0   1  -1/2   1   1/2   0 | 11/2
   1   0   1/2   0   1/2   0 |  3/2
   0   0   3    -1   -1    1 |  1

Specifically, in our problem of the black&white painter from 3.1.1, the system of linear equations looks like this:

  ( 1  -c1  -c2  0  0  0 )   ( z  )   ( 0 )
  ( 0   1    1   1  0  0 )   ( x1 )   ( L )
  ( 0   w1   w2  0  1  0 ) · ( x2 ) = ( W )
  ( 0   b1   b2  0  0  1 )   ( x3 )   ( B )
                             ( x4 )
                             ( x5 )

with x3, x4, x5 denoting the slack variables.

3.1.4. Duality of linear programming.
Consider the real matrix A with m rows and n columns, the vector of constraints b and the row vector c giving the objective function. From this data we can consider two problems of linear programming, for x ∈ R^n and y ∈ R^m.

Dual problems of linear programming

Maximization problem: Maximize c · x under the conditions A · x ≤ b and x ≥ 0.

Minimization problem: Minimize y^T · b under the conditions y^T · A ≥ c and y ≥ 0.

We say that these two problems are dual problems of linear programming.

Before deriving further properties of linear programming we need some terminology. We say that the problem is solvable if there is an admissible vector x (or an admissible vector y^T) which satisfies all the constraints. A solvable maximization (minimization) problem is bounded if the objective function is bounded from above (below) over the set of admissible vectors.

Lemma (Weak duality theorem). If x ∈ R^n is an admissible vector for the standard maximization problem, and if y ∈ R^m is an admissible vector for the dual minimization problem, then

  c · x ≤ y^T · b.

Proof. It is a simple observation. Since x ≥ 0 and c ≤ y^T · A, it follows that c · x ≤ y^T · A · x. But also y ≥ 0 and A · x ≤ b, hence

  c · x ≤ y^T · A · x ≤ y^T · b,

which is what we wanted to prove. □

We see immediately that if both dual problems are solvable, then they must be bounded. Even more interesting is the following corollary, which is directly implied by the inequality in the previous proof.

Corollary. If there exist admissible vectors x and y of the dual linear problems such that the objective functions satisfy c · x = y^T · b, then both are optimal solutions for the corresponding problems.

3.1.5. Theorem (Strong duality theorem). If a standard problem of linear programming is solvable and bounded, then its dual is also solvable and bounded. There exists an optimal solution for each of the problems, and the optimal values of the corresponding objective functions are equal.
Again, we have shifted the new basic variable from u to y and the objective function increased. The next pivot will be the circled number 3 in the third column and fourth row:

   0   0   0   2/3   5/3   1/3 | 31/3
   0   1   0   5/6   1/3   1/6 | 17/3
   1   0   0   1/6   2/3  -1/6 |  4/3
   0   0   1  -1/3  -1/3   1/3 |  1/3

This is the resulting tableau, where the basic variables are x = 4/3, y = 17/3, z = 1/3, and their values are read from the last column. Notice that all the original variables are among the basic ones and their values are non-zero. This is not always the case, see the example 3.A.1 above and its explanation via this algorithm in 3.1.6. The maximal value 31/3 of the objective function is now in the right top corner.

As mentioned in the theoretical explanation, the final tableau also provides the solution of the dual problem, i.e. the minimization of 4u + 3v + 8w under the conditions

  -u + 2v + w ≥ 3,  u + w ≥ 1,  -u + v + 3w ≥ 2.

According to the strong duality theorem (see 3.1.5), the minimal value is again 31/3, while the corresponding values of the variables u, v and w are read off the first row in the corresponding columns: u = 2/3, v = 5/3, w = 1/3. You may check directly that the numbers c4, c5, c6 in the first row of the tableau and the value h in the top right corner satisfy 4c4 + 3c5 + 8c6 = h. Indeed, the numbers tell how many times the appropriate row (the one with the original value 1) has been added. Thus we obtain the right linear combination for h. □

3.A.4. Some game theory. Imagine a game played by two players: a billionaire and fate. The billionaire would like to invest in gold, silver, diamonds or stocks of an important IT software company. The wins and losses of such investments are well known for the last four years (for simplicity, we consider only the last four years and write them into the matrix A = (aij)):

          gold   silver   diamonds   software
  2001     2%      1%        4%         3%
  2002     3%     -1%       -2%         6%
  2003     1%      2%        3%        -4%
  2004    -2%      1%        2%         3%

Proof.
As already proved in the latter corollary, once it is established that the values of the objective functions for the dual problems are equal, we have the required optimal solutions to both problems. It remains to prove the other implication, i.e. the existence of an optimal solution under the assumptions of the theorem, as well as the fact that the objective functions share their values in such a case. This will be verified by delivering an efficient algorithm in the next paragraph. □

We notice yet another corollary of the just formulated duality theorem:

Corollary (Equilibrium theorem). Consider two admissible vectors x and y for the standard maximization problem and its dual problem as defined in 3.1.4. Then both vectors are optimal if and only if yi = 0 for all coordinates with index i for which Σ_{j=1}^n aij·xj < bi, and simultaneously xj = 0 for all coordinates with index j such that Σ_{i=1}^m yi·aij > cj.

Proof. Suppose both relations regarding the zeros among the xj and yi are true. Since the summands with strict inequality have zero coefficients, we have

  Σ_{i=1}^m yi·bi = Σ_{i=1}^m Σ_{j=1}^n yi·aij·xj,

and for the same reason

  Σ_{i=1}^m Σ_{j=1}^n yi·aij·xj = Σ_{j=1}^n cj·xj.

This shows one implication, by the duality theorem. Suppose now that both x and y are optimal vectors. Then

  Σ_{i=1}^m yi·bi ≥ Σ_{i=1}^m Σ_{j=1}^n yi·aij·xj ≥ Σ_{j=1}^n cj·xj.

But the left- and right-hand sides are equal, and hence there is equality everywhere. If we rewrite the first equality as

  Σ_{i=1}^m yi·(bi - Σ_{j=1}^n aij·xj) = 0,

then we see that it can be satisfied only if the relation from the statement holds, because it is a sum of non-negative numbers which equals zero. From the second equality we similarly derive the second part, and the proof is finished. □

The duality theorem and the equilibrium theorem are useful when solving linear programming problems, because they show us relations between zeros among the additional variables and the fulfillment of the constraints.
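The corollaries above can be illustrated on the worked example 3.A.3: exhibiting an admissible primal vector and an admissible dual vector with equal objective values certifies that both are optimal. A sketch in exact rational arithmetic, using the candidate vectors read off the final tableau of 3.A.3:

```python
from fractions import Fraction as F

# Certifying optimality in 3.A.3 via the corollary of weak duality:
# admissible primal/dual vectors with equal objectives are both optimal.
A = [[-1, 1, -1], [2, 0, 1], [1, 1, 3]]
b = [F(4), F(3), F(8)]
c = [F(3), F(1), F(2)]
x = [F(4, 3), F(17, 3), F(1, 3)]   # primal candidate (last column)
y = [F(2, 3), F(5, 3), F(1, 3)]    # dual candidate (first row)

# admissibility: A x <= b with x >= 0, and y^T A >= c with y >= 0
assert all(sum(A[i][j] * x[j] for j in range(3)) <= b[i] for i in range(3))
assert all(sum(y[i] * A[i][j] for i in range(3)) >= c[j] for j in range(3))
assert min(x) >= 0 and min(y) >= 0

cx = sum(ci * xi for ci, xi in zip(c, x))
yb = sum(yi * bi for yi, bi in zip(y, b))
assert cx == yb
print(cx)   # 31/3
```

In this example all three primal constraints are tight and all three dual variables are positive, so the equilibrium conditions hold trivially.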
As usual, it is good to know that the problem is solvable in principle and to have some theory related to that, but we still need some clever ideas to make it all into an efficient algorithmic procedure. The next paragraph will provide some insight into this.

The billionaire would like to invest for one year only. How should he split his investment in order to ensure the maximal win independently of the development on the stock market? We assume that next year will be some (unknown) probabilistic mix of the previous four ones. In terms of our game, fate will play some stochastic vector (x1, x2, x3, x4) fixing the behaviour of the market (as a probabilistic mixture of the previous years), while the billionaire will play another stochastic vector (y1, y2, y3, y4) describing the split of his investment. The win of the billionaire is Σ_{i,j=1}^4 xi·yj·aij.

Solution. The task is to find the stochastic vector (y1, y2, y3, y4) which maximizes the minimum of all the values Σ_{i,j=1}^4 xi·yj·aij for the fixed matrix A and any stochastic vector (x1, x2, x3, x4). A very observant reader could notice that this task is equivalent to the problem of maximizing z1 + z2 + z3 + z4 under the conditions A^T·z ≤ (1, ..., 1)^T, z ≥ 0 (the requested stochastic vector y is then obtained by normalizing the vector z, and the requested optimal value is the inverse of the optimal value obtained).¹

Thus, we have to solve a linear programming problem. We introduce the slack variables w1, w2, w3, w4 and transform the problem into the standard form

  max { z1 + z2 + z3 + z4 | (A^T | E4)·(z, w)^T = (1, 1, 1, 1)^T, z, w ≥ 0 }.

We work with the tableau:

  -1  -1  -1  -1   0   0   0   0 | 0
   2   3   1  -2   1   0   0   0 | 1
   1  -1   2   1   0   1   0   0 | 1
   4  -2   3   2   0   0   1   0 | 1
   3   6  -4   3   0   0   0   1 | 1

Pivoting on the circled 4 in the first column (minimal ratio 1/4), we obtain

   0  -3/2   -1/4  -1/2   0   0   1/4   0 | 1/4
   0   4     -1/2  -3     1   0  -1/2   0 | 1/2
   0  -1/2    5/4   1/2   0   1  -1/4   0 | 3/4
   1  -1/2    3/4   1/2   0   0   1/4   0 | 1/4
   0  15/2  -25/4   3/2   0   0  -3/4   1 | 1/4

and after the next step, with the circled pivot 15/2 in the second column,

   0   0  -3/2   -1/5   0   0   1/10    1/5  |  3/10
   0   0  17/6  -19/5   1   0  -1/10   -8/15 | 11/30
   0   0   5/6    3/5   0   1  -3/10    1/15 | 23/30
   1   0   1/3    3/5   0   0   1/5     1/15 |  4/15
   0   1  -5/6    1/5   0   0  -1/10    2/15 |  1/30

¹ The observation comes from the proof of the von Neumann minimax theorem, 1928.
The theorem claims that any probabilistic extension of a matrix game enjoys an equilibrium state.

3.1.6. The algorithm. As already explained, the linear programming problem of maximizing the linear objective function h = c·x under the conditions A·x ≤ b can be turned into solving the system of equations (1) in 3.1.3, where we added the slack variables xs. If all the entries in b are non-negative, then the choice xs = b and x = 0 provides an admissible solution of the system with the value of the objective function h = 0. This is the choice of the origin x = 0 as one of the vertices of the distinguished region M of admissible points. We can understand this as choosing the slack variables as the basic variables, whose values are given by the right-hand sides of the equations, while all the other variables are set to zero.

In the general case (allowing for negative entries in b), we shall see in 4.1.11 that we can always find an admissible vertex, that is, a choice of the basic variables in the above sense describing an admissible solution. For now, we shall assume that we have such a vertex already.

The idea of the algorithm is to perform equivalent row transformations of the entire system in such a way that we move to other vertices of the region M while the function h increases. In order to move to more interesting vertices of M, we must bring some of the slack variables to zero, while the appropriate column of the unit matrix moves to one of the columns corresponding to the variables x. A simple check reveals that in order to do this, we must choose one of the negative entries in the first line of the matrix 3.1.3(1), pick its column, and choose a row in such a way that, when using Gaussian elimination to push the other entries in this particular column to zero, the right-hand sides of the equations remain non-negative. The latter condition means that we have to choose the index i such that bi/aij is minimal.
This entry of the matrix is called the pivot for the next step of the elimination. Of course, the non-positive coefficients aij are not taken into consideration, since they would not lead to any increase of the objective function. When there are no more negative entries in the first row, we are finished, and the claim is that the optimal value of h appears in the right-hand top corner of the matrix.

The reader should think through all the above claims in detail and check whether the algorithm must terminate. But the most striking point is the following: The slack-variable parts of the matrix are closely linked to the dual linear programming problem, and there is an invariant of the entire procedure: writing (-c, cs, h) for the current first line of the matrix and (x, xs) for the current values of the variables, we obtain c·x = cs·b = h at each step (check this!). In particular, at the moment of termination of the above algorithm, the coefficients y = cs in the first row represent admissible values of the dual problem (while the values c stay for the slack variables in the dual problem), and the right-hand top corner provides the value of the corresponding objective function y·b. Since the two objective functions are equal, we know that the

Continuing with the circled pivot 17/6 in the third column (the ratio 11/85 being minimal), we arrive at

   0   0   0  -188/85    9/17   0    4/85    -7/85  | 42/85
   0   0   1  -114/85    6/17   0   -3/85   -16/85  | 11/85
   0   0   0   146/85   -5/17   1  -23/85    19/85  | 56/85
   1   0   0    89/85   -2/17   0   18/85    11/85  | 19/85
   0   1   0   -78/85    5/17   0  -11/85    -2/85  | 12/85

and the final circled pivot 89/85 in the fourth column yields

   188/89  0  0  0   25/89   0   44/89   17/89 | 86/89
   114/89  0  1  0   18/89   0   21/89   -2/89 | 37/89
  -146/89  0  0  0   -9/89   1  -55/89    1/89 | 26/89
    85/89  0  0  1  -10/89   0   18/89   11/89 | 19/89
    78/89  1  0  0   17/89   0    5/89    8/89 | 30/89

The last tableau is already the optimal one, since there are no negative values in the first row.
We can read off the optimal solution: z2 = 30/89, z3 = 37/89, z4 = 19/89, z1 = 0. The optimal value (upper right corner) is z1 + z2 + z3 + z4 = 86/89. After rescaling to a stochastic vector (multiplying by 89/86) we get the solution of the original problem: y1 = 0, y2 = 30/86, y3 = 37/86, y4 = 19/86, with the optimal value 89/86. □

B. Difference equations

Distinct linear dependences can be an excellent tool for describing various models of growth. We begin with a very popular population model that uses a linear difference equation of second order:

3.B.1. Fibonacci sequence. In the beginning of spring, a stork brought two newborn rabbits, male and female, to a meadow. The female, after being two months old, is able to deliver two newborns, male and female. The newborns can then start delivering after one month and then every month. Every female is pregnant for one month and then she delivers. How many pairs of rabbits

algorithm provides the optimal solution. Great! (But check all the details.)

We show how all this works for the simple problem from 3.A.1. In practice, the very first column of the matrix in question does not change during the procedure at all, so we can omit it completely. Thus we deal with the matrix:

  -4  -6   0   0   0   0 |   0
   1   2   1   0   0   0 | 120
   1   4   0   1   0   0 | 180
  -1   1   0   0   1   0 | -90
   1   0   0   0   0   1 | 110

We cannot find an admissible solution by fixing the basic variables here, since there are negative values in b. We try to initiate the above algorithm by changing the sign in the last but one row and performing the Gaussian elimination on the very first column, aiming to have only the 1 in the last but one row there. We obtain:

   0  -10   0   0  -4   0 | 360
   0    3   1   0   1   0 |  30
   0    5   0   1   1   0 |  90
   1   -1   0   0  -1   0 |  90
   0    1   0   0   1   1 |  20

We choose the boxed entries for the basic variables; this represents the values x1 = 90, x2 = 0, x3 = 30, x4 = 90, x5 = 0, x6 = 20, and h = 360 = 4·90 = -4·(-90), which is an admissible solution. We have also circled the pivot for the next step, i.e.
the element in the second column which we want to replace with 1, eliminating the rest of the column (remember, this is the one yielding the smallest ratio with the corresponding entry of the right-hand column among the positive elements: 30/3 = 10, which is less than 90/5 = 18 and 20/1 = 20). This leads to the next admissible vertex in our region M and, of course, the value of h will increase:

   0   0  10/3   0  -2/3   0 | 460
   0   1   1/3   0   1/3   0 |  10
   0   0  -5/3   1  -2/3   0 |  40
   1   0   1/3   0  -2/3   0 | 100
   0   0  -1/3   0   2/3   1 |  10

with x1 = 100, x2 = 10, x3 = x5 = 0, x4 = 40, x6 = 10, and h = 460 = 4·100 + 6·10 = 10/3 · 120 - 2/3 · (-90). We still have one of the entries in the first line negative. We circled the next pivot, leading to

   0   0   3     0   0   1   | 470
   0   1   1/2   0   0  -1/2 |   5
   0   0  -2     1   0   1   |  50
   1   0   0     0   0   1   | 110
   0   0  -1/2   0   1   3/2 |  15
In searching for the formula we can use the observation that for certain r the function rn is a solution of the difference equation without initial conditions. This r can be obtained by substitution into the recurrent relation: rn+2 _ rn+i rn ^ after djvjdjug by r" we obtain r2 = r + 1. This is the characteristic equation of the given recurrent formula. Thus our equation has roots 1~2V^ and 1+2V^ and the sequences an = i1^)71 and bn = Q^-)11, n > 1 satisfy the given relation. The relation is also satisfied by any linear combination, that is, any sequence cn = san + tbn, s,ieR, The numbers s and t can be chosen so that the resulting combination satisfies the initial conditions, in our case c\ = 1, c2 = 1. For simplicity, it is convenient to define the zero-th member of the sequence as cq = 0 and compute s and t from the equations for c0 and c\. We find that s = and thus (1 + V5)n - (1 - V5)n "v/5' V5 (2) Pn 2™(v/5) Such a sequence satisfies the given recurrent formula and also the initial conditions c0 = 0, c\ = 1. Hence it is the unique sequence given by these requirements. Note that the value of Pn in the formula (2) is an integer for any natural n (all terms with the final values x\ = 110, x2 = 5, x3 = 0, x4 = 50, x5 = 15, xG = 0, and 9 h = 470 = 4 ■ 110 + 6 ■ 5 = - ■ 120 + 1 ■ 110. Let us remind why we can be sure that this is the optimal solution. Thanks to fact that the first line is exclusively non-negative, we have got admissible solution of the dual problem which leads to the same value as the solution of the original one. Thus the equillibrium theorem claims we are done! 3.1.7 Notes about linear models in economy. Our simple scheme of the black&white painter from the paragraph 3.1.1 can be used to illustrate one of the typical economical models, the model of production planning. The model tries to capture the problem completely, that is, to capture both external and internal relations. 
The left-hand sides of the equations (1), (2), (3) in 3.1.1, and the objective function $h(x_1, x_2)$, express various production relations. Depending on the character of the problem, we have on the right-hand sides either exact values (and then we solve equations) or capacity constraints and goal optimization (then we obtain linear programming problems). Thus in general we can solve the problem of source allocation with supplier constraints and either minimize costs or maximize income. We can also interpret duality from this point of view. If our painter wanted to value the total amount of his work at $y_L$ per unit, the work related to the white colour at $y_W$ per unit, and the additional work related to the black colour at $y_B$ per unit, then he minimizes the objective function

$L\,y_L + W\,y_W + B\,y_B$

with the constraints

$y_L + w_1 y_W + b_1 y_B \geq c_1$
$y_L + w_2 y_W + b_2 y_B \geq c_2$.

But that is exactly the dual problem to the original one, and the theorem 3.1.5 says that the optimal state is when the objective functions have the same value. Among economic models we can find many modifications. One of them is the problem of financial planning, which is connected to the optimization of a portfolio. We set up the volume of investment into individual investment opportunities with the goal of meeting the given constraints on risk factors while maximizing the profit, or, dually, of minimizing the risk under a given volume. Another common model is the marketing application, for instance the allocation of costs for advertisement in various media, or the placing of advertisements into time intervals. The restrictions are in this case determined by the budget, the target population, etc. Very common are models of nutrition, that is, setting up how much of different kinds of food should be eaten in order to meet the total volume of specific components, e.g. minerals and vitamins. Problems of linear programming also arise in personnel tasks, where workers with specific qualifications and other
in the Fibonacci sequence are integers), although it might not seem so at first glance. □ We now do some exercises on solving linear difference equations of the second order with constant coefficients. A sequence satisfying a given recurrence equation of the second order is uniquely determined whenever we prescribe any two neighbouring members. Note a further use of complex numbers: to determine an explicit formula for the $n$-th member of a sequence of real numbers we might need calculations with complex numbers. This happens when the characteristic polynomial of the difference equation has complex roots.

3.B.2. Find an explicit formula for the sequence satisfying the following linear difference equation with the given initial conditions:

$x_{n+2} = 2x_n + n$, $x_1 = 2$, $x_2 = 2$.

Solution. The homogeneous equation is $x_{n+2} = 2x_n$. Its characteristic polynomial is $x^2 - 2$, with roots $\pm\sqrt2$. The solution of the homogeneous equation is of the form $a(\sqrt2)^n + b(-\sqrt2)^n$ for any $a, b \in \mathbb{R}$. We look for a particular solution using the method of indeterminate coefficients. The non-homogeneous part of the equation is the linear polynomial $n$, so a particular solution should be a linear polynomial in the variable $n$, that is, $kn + l$ with $k, l \in \mathbb{R}$. Substituting into the original equation we obtain $k(n+2) + l = 2(kn + l) + n$. Comparing the coefficients of $n$ on both sides yields $k = 2k + 1$, that is, $k = -1$. Comparing the absolute terms yields $2k + l = 2l$, that is, $l = 2k = -2$. Thus the particular solution is the sequence $-n - 2$, and the general solution of the non-homogeneous difference equation of the second order, without initial conditions, is of the form

$a(\sqrt2)^n + b(-\sqrt2)^n - n - 2$, $a, b \in \mathbb{R}$.

properties are distributed into working shifts. Common are also problems of merging, problems of splitting, and problems of goods distribution. 2.
Difference equations We have already met difference equations in the first chapter, albeit briefly and of first order only. Now we consider a more general theory for linear equations with constant coefficients. This not only provides very practical tools but also represents a good illustration of the concepts of vector spaces and linear mappings.

Homogeneous linear difference equation of order k

3.2.1. Definition. A homogeneous linear difference equation (or homogeneous linear recurrence) of order $k$ is given by the expression

$a_0 x_n + a_1 x_{n-1} + \cdots + a_k x_{n-k} = 0$, $a_0 \neq 0$, $a_k \neq 0$,

where the coefficients $a_i$ are scalars, which can possibly depend on $n$. We usually denote the sequence in question as a function

$x_n = f(n) = -\dfrac{a_1}{a_0} f(n-1) - \cdots - \dfrac{a_k}{a_0} f(n-k)$.

A solution of this equation is a sequence of scalars $x_i$, for all $i \in \mathbb{N}$ (or $i \in \mathbb{Z}$), which satisfies the equation for all $n$. By prescribing any $k$ consecutive values $x_i$ of the sequence, all other values are determined uniquely. Indeed, we work over a field of scalars, so the values $a_0$ and $a_k$ are invertible, and hence, using the recurrent definition, any $x_n$ can be computed uniquely from the preceding $k$ values, and similarly any $x_{n-k}$ from the subsequent ones. Induction thus immediately proves that all remaining values are determined uniquely. The space of all infinite sequences $x_i$ forms a vector space, where addition and multiplication by scalars work coordinate-wise. The definition immediately implies that the sum of two solutions of a homogeneous linear difference equation, or a scalar multiple of a solution, is again a solution. Analogously as with homogeneous linear systems, we see that the set of all solutions forms a subspace. Initial conditions on the values $x_0, \dots, x_{k-1}$ of a solution represent a $k$-dimensional vector in $\mathbb{K}^k$. The sum of initial conditions determines the sum of the corresponding solutions, and similarly for scalar multiples.
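The linearity just described is easy to observe numerically; a small sketch (the recurrence coefficients are chosen arbitrarily for illustration):

```python
# Solutions of a homogeneous linear recurrence form a vector space:
# the sum of two solutions is again a solution.
def extend(initial, coeffs, length):
    """Extend initial values by x_n = c_1*x_{n-1} + ... + c_k*x_{n-k}."""
    xs = list(initial)
    k = len(coeffs)
    while len(xs) < length:
        xs.append(sum(c * x for c, x in zip(coeffs, xs[-k:][::-1])))
    return xs

coeffs = [1, 1]                    # Fibonacci-type rule, as an example
u = extend([1, 1], coeffs, 10)
v = extend([2, -1], coeffs, 10)
w = extend([3, 0], coeffs, 10)     # initial values are u's plus v's
assert all(wi == ui + vi for wi, ui, vi in zip(w, u, v))
```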
Note also that substituting zeros and ones into the initial $k$ values immediately yields $k$ linearly independent solutions of the difference equation. Thus, although the vectors are infinite sequences, the set of all solutions has finite dimension; the dimension equals the order $k$ of the equation. Moreover, we can easily obtain a basis of all these solutions. Again we speak of the fundamental

Now, by substituting the initial conditions, we determine the indeterminates $a, b \in \mathbb{R}$. To simplify the calculation, we use a little trick: from the initial conditions and the given recurrence we compute the member $x_0$: setting $n = 0$ in the recurrence gives $x_2 = 2x_0 + 0$, so $x_0 = \frac12 x_2 = 1$. The given recurrence formula, together with the conditions $x_0 = 1$ and $x_1 = 2$, is then clearly satisfied by the same sequence that satisfies the original initial conditions. Thus we have the following relations for $a$, $b$:

$x_0$: $a(\sqrt2)^0 + b(-\sqrt2)^0 - 2 = 1$, thus $a + b = 3$,
$x_1$: $\sqrt2\,a - \sqrt2\,b - 3 = 2$, thus $\sqrt2\,a - \sqrt2\,b = 5$,

whose solution gives $a = \frac{6 + 5\sqrt2}{4}$, $b = \frac{6 - 5\sqrt2}{4}$. The solution is thus the sequence

$x_n = \dfrac{6 + 5\sqrt2}{4}(\sqrt2)^n + \dfrac{6 - 5\sqrt2}{4}(-\sqrt2)^n - n - 2$. □

3.B.3. Determine a basis of the space of all solutions of the homogeneous difference equation

$x_{n+4} = x_{n+3} + x_{n+1} - x_n$.

Express your solution in terms of real-valued sequences. Solution. The characteristic polynomial of the given equation is $x^4 - x^3 - x + 1$. Looking for its roots, we solve the equation $x^4 - x^3 - x + 1 = 0$. The left side factors as $(x-1)^2(x^2 + x + 1)$, with the two complex roots $x_1 = -\frac12 + i\frac{\sqrt3}{2} = \cos(2\pi/3) + i\sin(2\pi/3)$ and $x_2 = -\frac12 - i\frac{\sqrt3}{2} = \cos(2\pi/3) - i\sin(2\pi/3)$, and the double root $1$. Thus a basis of the vector space of sequences solving the given difference equation is the following quadruple of sequences: $\{(-\frac12 + i\frac{\sqrt3}{2})^n\}_{n=1}^\infty$, $\{(-\frac12 - i\frac{\sqrt3}{2})^n\}_{n=1}^\infty$, $\{1\}_{n=1}^\infty$ (the constant sequence), and $\{n\}_{n=1}^\infty$. If we are looking for a basis of real-valued sequences, we must replace two of the generators (sequences) from this basis by some sequences that are real only.
As these generators are geometric sequences whose members are complex conjugates of each other, it suffices to take as new generators the sequences given by half of the sum and by $\frac{1}{2i}$ times the difference of these complex generators (that is, their real and imaginary parts). This yields the following real

system of solutions, and all other solutions are its linear combinations. As we have just checked, choosing $k$ indices $i, i+1, \dots, i+k-1$ in sequence, the homogeneous linear difference equation gives a linear mapping $\mathbb{K}^k \to \mathbb{K}^\infty$ of $k$-dimensional vectors of initial values into infinite sequences of the same scalars. The independence of such solutions is equivalent to the independence of the initial values, which can easily be checked by a determinant: if we have a $k$-tuple of solutions $(x_n^{[1]}, \dots, x_n^{[k]})$, it is independent if and only if the following determinant, sometimes called the Casoratian, is non-zero for some $n$:

$C_n = \begin{vmatrix} x_n^{[1]} & \cdots & x_n^{[k]} \\ x_{n+1}^{[1]} & \cdots & x_{n+1}^{[k]} \\ \vdots & & \vdots \\ x_{n+k-1}^{[1]} & \cdots & x_{n+k-1}^{[k]} \end{vmatrix} \neq 0,$

which then implies the non-vanishing of $C_n$ for all $n$.

3.2.2. Recurrences with constant coefficients. It is difficult to find a universal mechanism for finding a solution (that is, a directly computable expression) of general homogeneous linear difference equations. We shall come back to this problem at the end of chapter 13. In practical models one very often meets equations where the coefficients are constant. In this case it is possible to guess a suitable form of the solution and indeed to find $k$ linearly independent solutions. This is then a complete solution of the problem, since all other solutions are linear combinations of these. For simplicity we start with equations of second order. Such recurrences are very often encountered in practical problems where relations are based on two previous values. A linear difference equation (recurrence) of second order with constant coefficients is thus a formula

(1) $f(n+2) = a\,f(n+1) + b\,f(n) + c$,

where $a, b, c$ are known scalar coefficients.
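Equation (1) is straightforward to iterate; a short sketch (the coefficient values are illustrative only), which also previews the geometric solutions $\lambda^n$ discussed below:

```python
def iterate(a, b, c, f0, f1, steps):
    """Iterate f(n+2) = a*f(n+1) + b*f(n) + c from f(0), f(1)."""
    values = [f0, f1]
    for _ in range(steps):
        values.append(a * values[-1] + b * values[-2] + c)
    return values

# With c = 0, a geometric sequence lambda^n solves the equation whenever
# lambda^2 = a*lambda + b; e.g. a = 1, b = 2 has the root lambda = 2.
geom = iterate(1, 2, 0, 1, 2, 10)      # start with 2^0, 2^1
assert geom == [2**n for n in range(12)]
```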
Consider a population model. We assume that the individuals of the population mature and start breeding two seasons later (that is, they add to the value $f(n+2)$ a multiple $b\,f(n)$ with positive $b > 1$), while the immature individuals at the same time weaken and destroy a part of the mature population (that is, the coefficient $a$ at $f(n+1)$ is negative). Furthermore, it might be that somebody destroys (uses, eats) a fixed amount $c$ of individuals every season. A similar situation with $c = 0$ and both other coefficients positive determines the famous Fibonacci sequence of numbers $y_0, y_1, \dots$, where $y_{n+2} = y_{n+1} + y_n$, see 3.B.1. If we have no idea how to solve a mathematical problem, we can always blindly try some known solutions of similar problems. Thus, let us substitute into the equation (1), with coefficient $c = 0$, a solution similar to that of the linear equations

basis of the solution space: $\{1\}_{n=1}^\infty$ (the constant sequence), $\{n\}_{n=1}^\infty$, $\{\cos(2n\pi/3)\}_{n=1}^\infty$, $\{\sin(2n\pi/3)\}_{n=1}^\infty$. □

3.B.4. Solve the following difference equation:

$x_{n+4} = x_{n+3} - x_{n+2} + x_{n+1} - x_n$.

Solution. From the theory we know that the space of solutions of this difference equation is a four-dimensional vector space whose generators can be obtained from the roots of the characteristic polynomial of the given equation. The characteristic polynomial is $x^4 - x^3 + x^2 - x + 1 = 0$. It is a reciprocal equation (which means that the coefficients at the $(n-k)$-th and the $k$-th power of $x$, $k = 1, \dots, n$, are equal). We can use the substitution $u = x + \frac1x$. After dividing the equation by $x^2$ (zero cannot be a root) and substituting (note that $x^2 + \frac{1}{x^2} = u^2 - 2$), we obtain

$x^2 - x + 1 - \dfrac1x + \dfrac{1}{x^2} = u^2 - u - 1 = 0.$

Thus we obtain $u_{1,2} = \frac{1 \pm \sqrt5}{2}$, and from the equation $x^2 - ux + 1 = 0$ we then determine the four roots

$x_{1,2} = \dfrac{1 + \sqrt5 \pm i\sqrt{10 - 2\sqrt5}}{4}$, $x_{3,4} = \dfrac{1 - \sqrt5 \pm i\sqrt{10 + 2\sqrt5}}{4}$.

Note that the roots of the characteristic equation could have been "guessed" right away, since $x^5 + 1 = (x + 1)(x^4 - x^3 + x^2 - x + 1)$.
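The factorization just mentioned is easy to confirm by multiplying out the polynomials (a small sketch with our own helper `poly_mul`):

```python
# Multiply (x + 1)(x^4 - x^3 + x^2 - x + 1) and compare with x^5 + 1.
def poly_mul(p, q):
    """Product of polynomials given as coefficient lists (lowest power first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

factor = poly_mul([1, 1], [1, -1, 1, -1, 1])   # (1 + x)(1 - x + x^2 - x^3 + x^4)
assert factor == [1, 0, 0, 0, 0, 1]            # coefficients of x^5 + 1
```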
Thus the roots of the polynomial $x^4 - x^3 + x^2 - x + 1$ are also roots of the polynomial $x^5 + 1$, that is, fifth roots of $-1$. Hence the solutions of the characteristic polynomial are the numbers $x_{1,2} = \cos(\frac{\pi}{5}) \pm i\sin(\frac{\pi}{5})$ and $x_{3,4} = \cos(\frac{3\pi}{5}) \pm i\sin(\frac{3\pi}{5})$. A real basis of the space of solutions of the given difference equation is therefore, for instance, the quadruple of sequences $\cos(\frac{n\pi}{5})$, $\sin(\frac{n\pi}{5})$, $\cos(\frac{3n\pi}{5})$, and $\sin(\frac{3n\pi}{5})$, which are the cosines and sines of the arguments of the corresponding powers of the roots of the characteristic polynomial. Note that we have incidentally derived the algebraic expressions

$\cos\left(\tfrac{\pi}{5}\right) = \dfrac{1+\sqrt5}{4}$, $\sin\left(\tfrac{\pi}{5}\right) = \dfrac{\sqrt{10-2\sqrt5}}{4}$, $\cos\left(\tfrac{3\pi}{5}\right) = \dfrac{1-\sqrt5}{4}$, $\sin\left(\tfrac{3\pi}{5}\right) = \dfrac{\sqrt{10+2\sqrt5}}{4}$.

This is because all the roots of the equation have absolute value $1$, so their real and imaginary parts are the cosines and sines of their arguments. □

from the first chapter (cf. 1.2.1), that is, we try $f(n) = \lambda^n$ for some scalar $\lambda$. Substituting into the equation we obtain

$\lambda^{n+2} - a\lambda^{n+1} - b\lambda^n = \lambda^n(\lambda^2 - a\lambda - b) = 0.$

This relation holds either for $\lambda = 0$ or for the choice of the values

$\lambda_1 = \tfrac12\left(a + \sqrt{a^2 + 4b}\right)$, $\lambda_2 = \tfrac12\left(a - \sqrt{a^2 + 4b}\right)$.

It is easy to see that such solutions work; we just had to choose the scalar $\lambda$ suitably. But we are not finished, since we want to find a solution for any two initial values $f(0)$ and $f(1)$. So far, we have only found two specific sequences satisfying the given equation (possibly even only one sequence if $\lambda_2 = \lambda_1$). As we have already derived for linear recurrences, the sum of two solutions $f_1(n)$ and $f_2(n)$ of our equation $f(n+2) - a\,f(n+1) - b\,f(n) = 0$ is again a solution of the same equation, and the same holds for scalar multiples of a solution. Our two specific solutions thus generate the more general solutions

$f(n) = C_1\lambda_1^n + C_2\lambda_2^n$

for arbitrary scalars $C_1$ and $C_2$. For a unique solution of the specific problem with given initial values $f(0)$ and $f(1)$, it remains only to find the corresponding scalars $C_1$ and $C_2$. 3.2.3. The choice of scalars.
We show how this can work with an example. Consider the problem

(1) $y_{n+2} = y_{n+1} + \tfrac12 y_n$, $y_0 = 2$, $y_1 = 0$.

Here $\lambda_{1,2} = \frac12(1 \pm \sqrt3)$, and clearly

$y_0 = C_1 + C_2 = 2$
$y_1 = \tfrac12 C_1(1 + \sqrt3) + \tfrac12 C_2(1 - \sqrt3) = 0$

is satisfied for exactly one choice of these constants. Direct calculation yields $C_1 = 1 - \frac{\sqrt3}{3}$, $C_2 = 1 + \frac{\sqrt3}{3}$, and our problem has the unique solution

$y_n = \left(1 - \tfrac{\sqrt3}{3}\right)\left(\tfrac{1+\sqrt3}{2}\right)^n + \left(1 + \tfrac{\sqrt3}{3}\right)\left(\tfrac{1-\sqrt3}{2}\right)^n.$

Note that even though the found solution of our equation with rational coefficients and rational initial values looks complicated and is expressed with irrational numbers, we know a priori that the solution itself is again rational. But without this "step aside" into a larger field of scalars, we would not be able to describe the general solution. We will often meet similar phenomena. Moreover, the general solution often allows us to discuss the qualitative behaviour of the sequence of numbers $f(n)$ without direct enumeration of the constants; for example, whether the values approach some fixed value with increasing $n$, oscillate in some interval, or are unbounded.

3.B.5. Determine an explicit expression for the sequence satisfying the difference equation $x_{n+2} = 2x_{n+1} - 2x_n$ with initial values $x_1 = 2$, $x_2 = 2$. Solution. The roots of the characteristic polynomial $x^2 - 2x + 2$ are $1 + i$ and $1 - i$. A basis of the (complex) vector space of solutions is thus formed by the sequences $y_n = (1+i)^n$ and $z_n = (1-i)^n$. The sequence in question can thus be expressed as a linear combination of these sequences (with complex coefficients): $x_n = a\cdot y_n + b\cdot z_n$, where $a = a_1 + ia_2$, $b = b_1 + ib_2$. From the recurrence we compute $x_0 = \frac12(2x_1 - x_2) = 1$, and by substituting $n = 0$ and $n = 1$ into the expression for $x_n$ we obtain

$1 = x_0 = a_1 + ia_2 + b_1 + ib_2$
$2 = x_1 = (a_1 + ia_2)(1+i) + (b_1 + ib_2)(1-i).$

Comparing the real and the imaginary parts of both equations, we obtain a linear system of four equations in four indeterminates:

$a_1 + b_1 = 1$
$a_2 + b_2 = 0$
$a_1 - a_2 + b_1 + b_2 = 2$
$a_1 + a_2 - b_1 + b_2 = 0.$
These equations imply $a_1 = b_1 = b_2 = \frac12$ and $a_2 = -\frac12$. Thus we can express the sequence in question as

$x_n = \left(\tfrac12 - \tfrac{i}{2}\right)(1+i)^n + \left(\tfrac12 + \tfrac{i}{2}\right)(1-i)^n.$

The sequence can also be expressed using the real basis of the (complex) vector space of solutions, that is, using the sequences $u_n = \frac12(y_n + z_n) = (\sqrt2)^n\cos(\frac{n\pi}{4})$ and $v_n = \frac{i}{2}(z_n - y_n) = (\sqrt2)^n\sin(\frac{n\pi}{4})$. Since $y_n = u_n + i\,v_n$ and $z_n = u_n - i\,v_n$, we get $x_n = (a+b)u_n + i(a-b)v_n$; that is, for the coordinates $(c, d)$ of the sequence $x_n$ under the basis $\{u_n, v_n\}$ we obtain $c = a + b = 1$ and $d = i(a-b) = 1$, and

3.2.4. General homogeneous recurrences. We substitute $x_n = \lambda^n$, for some (yet unknown) scalar $\lambda$, into the general homogeneous equation from the definition 3.2.1 (with constant coefficients). For every $n$ we obtain the condition

$\lambda^{n-k}(a_0\lambda^k + a_1\lambda^{k-1} + \cdots + a_k) = 0.$

This means that either $\lambda = 0$ or $\lambda$ is a root of the so-called characteristic polynomial in the parentheses. The characteristic polynomial is independent of $n$. Assume that the characteristic polynomial has $k$ distinct roots $\lambda_1, \dots, \lambda_k$. For this purpose we may extend the field of scalars we are working in, for instance $\mathbb{Q}$ into $\mathbb{R}$ or $\mathbb{C}$. Of course, if the initial conditions are in the original field, then the solution stays there, since the recurrence equation itself does. Each of the roots gives us a single possible solution $x_n = (\lambda_i)^n$. We need $k$ linearly independent solutions. Thus we should check the independence by substituting the $k$ values $n = 0, \dots, k-1$ for the $k$ choices of $\lambda_i$ into the Casoratian (see 3.2.1). We obtain the Vandermonde matrix. It is a good, but not entirely trivial, exercise to show that for every $k$ and any $k$-tuple of distinct $\lambda_i$ the determinant of such a matrix is non-zero, see 2.B.7 on page 89. It follows that the chosen solutions are linearly independent.
Thus we have found the fundamental system of solutions of the homogeneous difference equation in the case that all the (possibly complex) roots of its characteristic polynomial are distinct. Now suppose $\lambda$ is a multiple root, and ask whether $x_n = n\lambda^n$ could also be a solution. We arrive at the condition

$a_0 n\lambda^n + a_1(n-1)\lambda^{n-1} + \cdots + a_k(n-k)\lambda^{n-k} = 0.$

This condition can be rewritten as

$\lambda\,(a_0\lambda^n + a_1\lambda^{n-1} + \cdots + a_k\lambda^{n-k})' = 0,$

where the dash denotes differentiation with respect to $\lambda$ (cf. the infinitesimal definition in 5.1.6, and 12.2.7 for the purely algebraic treatment). Moreover, a root $c$ of a polynomial $f$ has multiplicity greater than one if and only if it is a root of $f'$, see 12.2.7 for the proof. Our condition is thus satisfied. With greater multiplicity $\ell$ of the root of the characteristic polynomial we can proceed similarly, using the (now obvious) fact that a root with multiplicity $\ell$ is a root of all derivatives of the polynomial up to order $\ell - 1$ (inclusively). The derivatives look like this:

thus we have again an alternative expression of the sequence $x_n$ in which no complex numbers appear (but square roots do):

$x_n = (\sqrt2)^n\cos\left(\tfrac{n\pi}{4}\right) + (\sqrt2)^n\sin\left(\tfrac{n\pi}{4}\right).$

We could also have obtained these coordinates by solving two linear equations in the two variables $c, d$, namely $1 = x_0 = c\,u_0 + d\,v_0 = c$ and $2 = x_1 = c\,u_1 + d\,v_1 = c + d$. □

3.B.6. A simplified model of the behaviour of the gross domestic product. Consider the difference equation

(1) $y_{k+2} - a(1+b)y_{k+1} + ab\,y_k = 1,$

where $y_k$ is the gross domestic product in the year $k$. The constant $a$ is the consumption tendency, a macroeconomic factor giving the fraction of money that people spend (from what they have at their disposal), and the constant $b$ describes the dependence of the measure of investment of the private sector on the consumption tendency. Further, we assume that the size of the domestic product is normalised so that the right-hand side of the equation is 1.
Compute the values $y_n$ for $a = \frac34$, $b = \frac13$, $y_0 = 1$, $y_1 = 1$. Solution. First look for a solution of the homogeneous equation (with zero right-hand side) in the form $r^k$. The number $r$ must then be a solution of the characteristic equation

$x^2 - a(1+b)x + ab = 0$, that is, $x^2 - x + \tfrac14 = 0$,

which has the double root $\frac12$. All solutions of the homogeneous equation are then of the form $a(\frac12)^n + bn(\frac12)^n$. Note also that if we find some solution of the non-homogeneous equation (a particular solution), then we can add to it any solution of the homogeneous equation to obtain another solution of the non-homogeneous equation. It can be shown that all solutions of the non-homogeneous equation can be found in this way. In this problem, it is easy to check that the constant sequence $y_n = c$ is a solution provided $c = 4$. All solutions of the difference equation $y_{k+2} - y_{k+1} + \frac14 y_k = 1$ are thus of the form $4 + a(\frac12)^n + bn(\frac12)^n$. We require $y_0 = y_1 = 1$, and these two equations give $a = b = -3$.

$f(\lambda) = a_0\lambda^n + \cdots + a_k\lambda^{n-k}$
$f'(\lambda) = a_0 n\lambda^{n-1} + \cdots + a_k(n-k)\lambda^{n-k-1}$
$f''(\lambda) = a_0 n(n-1)\lambda^{n-2} + \cdots + a_k(n-k)(n-k-1)\lambda^{n-k-2}$
$\vdots$
$f^{(\ell)}(\lambda) = a_0 n(n-1)\cdots(n-\ell+1)\lambda^{n-\ell} + \cdots + a_k(n-k)(n-k-1)\cdots(n-k-\ell+1)\lambda^{n-k-\ell}.$

We look at the case of a triple root $\lambda$ and try to find a solution in the form $n^2\lambda^n$. Substituting into the definition, we obtain the equation

$a_0 n^2\lambda^n + \cdots + a_k(n-k)^2\lambda^{n-k} = 0.$

Clearly the left side equals the expression $\lambda^2 f''(\lambda) + \lambda f'(\lambda)$, and because $\lambda$ is a root of both derivatives, the condition is satisfied. Using induction, we prove that even in the general case of a solution in the form $x_n = n^\ell\lambda^n$, the condition

$a_0 n^\ell\lambda^n + \cdots + a_k(n-k)^\ell\lambda^{n-k} = 0$

holds: the left-hand side can be obtained as a linear combination of the derivatives of the characteristic polynomial, starting with the expression (check the combinatorics!)

$\lambda^\ell f^{(\ell)}(\lambda) + \binom{\ell}{2}\lambda^{\ell-1} f^{(\ell-1)}(\lambda) + \cdots.$

We have thus come close to the complete proof of the following:

Homogeneous equations with constant coefficients

Theorem.
The solution space of a homogeneous linear difference equation of order $k$ over the field of scalars $\mathbb{K} = \mathbb{C}$ is the $k$-dimensional vector space generated by the sequences $x_n = n^\ell\lambda^n$, where $\lambda$ runs over the (complex) roots of the characteristic polynomial and the powers $\ell$ run over the natural numbers $0, \dots, r_\lambda - 1$, where $r_\lambda$ is the multiplicity of the root $\lambda$.

Proof. The relation between the multiplicity of roots and the derivatives of real polynomials will be proved later (cf. 5.3.7), while the fact that every complex polynomial has exactly as many roots (counting multiplicities) as its degree will appear in 10.2.11. It remains to prove that the $k$-tuple of solutions thus found is linearly independent. Even in this case we can prove inductively that the corresponding Casoratian is non-zero; we have done this already for the Vandermonde determinant before. To illustrate our approach, we show how the calculation looks in the case of a root $\lambda_1$ with multiplicity one and a root $\lambda_2$ with multiplicity two:

Thus the solution of this non-homogeneous equation is

$y_n = 4 - 3\left(\tfrac12\right)^n - 3n\left(\tfrac12\right)^n.$

Again, as we know that the sequence given by this formula satisfies the given difference equation and also the given initial conditions, it is indeed the only sequence characterized by these properties. □

3.B.7. Find a sequence which satisfies the given non-homogeneous difference equation with the initial conditions:

$x_{n+2} = x_{n+1} + 2x_n + 1$, $x_1 = 2$, $x_2 = 2$.

Solution. The general solution of the homogeneous equation is of the form $a(-1)^n + b\,2^n$. A particular solution is the constant $-\frac12$. The general solution of the given non-homogeneous equation without initial conditions is thus

$a(-1)^n + b\,2^n - \tfrac12.$

Substituting in the initial conditions then gives the constants $a = -\frac56$, $b = \frac56$. The given difference equation with initial conditions is thus satisfied by the sequence

$x_n = -\tfrac56(-1)^n + \tfrac53\,2^{n-1} - \tfrac12.$ □

3.B.8.
Determine the sequence of real numbers that satisfies the following non-homogeneous difference equation with initial conditions:

$2x_{n+2} = -x_{n+1} + x_n + 2$, $x_1 = 2$, $x_2 = 3$.

Solution. The general solution of the homogeneous equation is of the form $a(-1)^n + b(\frac12)^n$. A particular solution is the constant $1$. The general solution of the non-homogeneous equation without initial conditions is thus

$a(-1)^n + b\left(\tfrac12\right)^n + 1.$

By substituting the initial conditions, we obtain the constants $a = 1$, $b = 4$. The given equation with initial conditions is thus satisfied by the sequence

$x_n = (-1)^n + 4\left(\tfrac12\right)^n + 1.$ □

$C(\lambda_1^n, \lambda_2^n, n\lambda_2^n) = \begin{vmatrix} \lambda_1^n & \lambda_2^n & n\lambda_2^n \\ \lambda_1^{n+1} & \lambda_2^{n+1} & (n+1)\lambda_2^{n+1} \\ \lambda_1^{n+2} & \lambda_2^{n+2} & (n+2)\lambda_2^{n+2} \end{vmatrix} = \lambda_1^n\lambda_2^{2n}\begin{vmatrix} 1 & 1 & n \\ \lambda_1 & \lambda_2 & (n+1)\lambda_2 \\ \lambda_1^2 & \lambda_2^2 & (n+2)\lambda_2^2 \end{vmatrix}$

$= \lambda_1^n\lambda_2^{2n}\begin{vmatrix} 1 & 1 & 0 \\ \lambda_1 & \lambda_2 & \lambda_2 \\ \lambda_1^2 & \lambda_2^2 & 2\lambda_2^2 \end{vmatrix} = \lambda_1^n\lambda_2^{2n+1}(\lambda_1 - \lambda_2)^2 \neq 0$

(in the middle step we subtracted $n$ times the second column from the third). In the general case the proof can be carried out inductively in a similar way. □

3.2.5. Real basis of the solutions. For equations with real coefficients, real initial conditions always lead to real solutions (and similarly with the scalars $\mathbb{Z}$ or $\mathbb{Q}$). However, the corresponding fundamental solutions derived using the above theorem might exist only in the complex domain. We therefore try to find other generators, which are more convenient. Because the coefficients of the characteristic polynomial are real, each of its roots is either real, or the roots come in complex-conjugate pairs. If we describe such a pair in polar form as

$\lambda^n = |\lambda|^n(\cos n\varphi + i\sin n\varphi)$, $\bar\lambda^n = |\lambda|^n(\cos n\varphi - i\sin n\varphi)$,

we see immediately that their sum and difference lead to the two linearly independent solutions

$x_n = |\lambda|^n\cos n\varphi$, $y_n = |\lambda|^n\sin n\varphi$.

Difference equations very often appear as models of the dynamics of some system. A nice topic to think about is the connection between the absolute values of the individual roots and the stability of the solution. We will not go into details here, because only in the fifth chapter will we speak of convergence of values to some limit value.
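As a first experiment in this direction, one can watch how the modulus of the roots governs the behaviour of the solutions (the coefficients below are illustrative only):

```python
def simulate(a, b, x0, x1, steps):
    """Iterate x_{n+2} = a*x_{n+1} + b*x_n."""
    xs = [x0, x1]
    for _ in range(steps):
        xs.append(a * xs[-1] + b * xs[-2])
    return xs

# roots of t^2 - t + 1/2 are (1 +- i)/2, with modulus 1/sqrt(2) < 1:
decaying = simulate(1.0, -0.5, 1.0, 1.0, 100)
assert abs(decaying[-1]) < 1e-10    # oscillates with decaying amplitude

# roots of t^2 - t - 1/2 are (1 +- sqrt(3))/2, dominant modulus > 1:
growing = simulate(1.0, 0.5, 1.0, 1.0, 100)
assert growing[-1] > 1e10           # grows without bound
```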
There is space here for some interesting numerical experiments: for instance with oscillations of suitable population or economic models.

3.2.6. The non-homogeneous case. As in the case of systems of linear equations, we can obtain all solutions of the non-homogeneous linear difference equation

$a_0(n)x_n + a_1(n)x_{n-1} + \cdots + a_k(n)x_{n-k} = b(n),$

where the coefficients $a_i$ and $b$ are scalars which may depend on $n$, with $a_0(n) \neq 0$, $a_k(n) \neq 0$. Again, we proceed by finding one particular solution and adding to it the complete vector space of dimension $k$ of solutions of the corresponding homogeneous system. Indeed, each such sum yields a solution. Since the

3.B.9. Determine the sequences satisfying

$x_{n+2} - 6x_{n+1} + 5x_n = n\,e^n.$

Solution. First solve the homogeneous part. We get $x_n^{hom} = c_1\cdot 1^n + c_2\cdot 5^n$. To find a particular solution we can use the method of variation of constants. The Casoratian is

$W_{j+1} = \det\begin{pmatrix} 1 & 5^{j+1} \\ 1 & 5^{j+2} \end{pmatrix} = 4\cdot 5^{j+1}.$

Thus

$x_n = c_1 + c_2\cdot 5^n - \frac14\sum_{j=0}^{n-1} j\,e^j + \left(\sum_{j=0}^{n-1}\frac{j\,e^j}{4\cdot 5^{j+1}}\right)5^n$, $c_1, c_2 \in \mathbb{R}$. □

3.B.10. Determine an explicit expression of the sequence satisfying the difference equation $x_{n+2} = 3x_{n+1} + 3x_n$ with members $x_1 = 1$ and $x_2 = 3$. ○

3.B.11. Determine an explicit formula for the $n$-th member of the unique solution $\{x_n\}_{n=1}^\infty$ that satisfies the following conditions: $x_{n+2} = x_{n+1} - x_n$, $x_1 = 1$, $x_2 = 5$. ○

3.B.12. Determine an explicit formula for the $n$-th member of the unique solution $\{x_n\}_{n=1}^\infty$ that satisfies the following conditions: $-x_{n+3} = 2x_{n+2} + 2x_{n+1} + x_n$, $x_1 = 1$, $x_2 = 1$, $x_3 = 1$. ○

3.B.13. Determine an explicit formula for the $n$-th member of the unique solution $\{x_n\}_{n=1}^\infty$ that satisfies the following conditions: $-x_{n+3} = 3x_{n+2} + 3x_{n+1} + x_n$, $x_1 = 1$, $x_2 = 1$, $x_3 = 1$. ○

C. Population models

The population models which we consider now lead to recurrence relations in vector spaces. The unknown in this case is not a sequence of numbers but a sequence of vectors.
The role of the coefficients is played by matrices. We begin with a simple (two-dimensional) case.

difference of two solutions of a non-homogeneous system is a solution of the homogeneous system, we obtain all solutions in this way. When we were working with systems of linear equations, it was possible that there was no solution; this is not possible with difference equations. But it is not always easy to find that one particular solution of a non-homogeneous system, particularly if the behaviour of the scalar coefficients in the equation is complicated. Even for linear recurrences with constant coefficients it may not be easy to find a solution if the right-hand side is complicated, but we can always try to find a solution in a form similar to the right-hand side. Consider the case when the corresponding homogeneous system has constant coefficients and $b(n)$ is a polynomial of degree $s$. A solution can then be sought in the form of the polynomial

$x_n = \alpha_0 + \alpha_1 n + \cdots + \alpha_s n^s$

with unknown coefficients $\alpha_i$, $i = 0, \dots, s$. By substituting into the difference equation and comparing the coefficients of the individual powers of $n$, we obtain a system of $s+1$ equations for the $s+1$ variables $\alpha_i$. If this system has a solution, then we have found a solution of our original problem. If it has no solution, we can try again with an increased degree $s$ of the polynomial in question. For instance, the equation $x_n - x_{n-2} = 2$ cannot have a constant solution, because substituting the potential solution $x_n = \alpha_0$ yields the requirement $\alpha_0 - \alpha_0 = 0 = 2$. But by setting $x_n = \alpha_0 + \alpha_1 n$ we obtain $2\alpha_1 = 2$, and hence the solution $x_n = \alpha_0 + n$, with $\alpha_0$ arbitrary. The general solution of our equation is thus

$x_n = C_1 + C_2(-1)^n + n.$

We use this method, the method of indeterminate coefficients, for example in 3.B.6.

3.2.7. Variation of constants. Another possible way to solve such an equation is the method of variation of constants.
Here we first find the solutions of the homogeneous equation, and then consider the constants $C_i$ as functions $C_i(n)$ of the variable $n$; we look for a particular solution of the given equation in the form $x_n = \sum_i C_i(n)\,x_n^{(i)}$. We illustrate the method on second-order equations. Suppose that the homogeneous part of the second-order non-homogeneous equation

$x_{n+2} + a_n x_{n+1} + b_n x_n = f_n$

has $x^{(1)}$ and $x^{(2)}$ as a basis of solutions. We will be looking for a particular solution of the non-homogeneous equation in the form

$x_n = A_n x_n^{(1)} + B_n x_n^{(2)}$

3.C.1. Savings. A friend and I save for a holiday together by monthly payments in the following way. At the beginning I give 10 € and he gives 20 €. Every subsequent month each of us gives as much as the month before, plus one half of what the other gave the month before. How much will we have after one year? How much money will I pay in the twelfth month? Solution. Let the amount I pay in the $n$-th month be denoted by $x_n$, and the amount my friend pays by $y_n$. In the first month we deposit $x_1 = 10$, $y_1 = 20$. For the following payments we can write down the recurrence

$x_{n+1} = x_n + \tfrac12 y_n$
$y_{n+1} = y_n + \tfrac12 x_n.$

If we denote the common savings by $z_n = x_n + y_n$, then by

with some conditions on $A_n$ and $B_n$ to be imposed. We have

$x_{n+1} = A_{n+1}x_{n+1}^{(1)} + B_{n+1}x_{n+1}^{(2)} = A_n x_{n+1}^{(1)} + B_n x_{n+1}^{(2)} + (A_{n+1} - A_n)x_{n+1}^{(1)} + (B_{n+1} - B_n)x_{n+1}^{(2)} = A_n x_{n+1}^{(1)} + B_n x_{n+1}^{(2)} + \delta A_n\,x_{n+1}^{(1)} + \delta B_n\,x_{n+1}^{(2)},$

where $\delta A_n = A_{n+1} - A_n$ and $\delta B_n = B_{n+1} - B_n$. In order to be able to use the same $A_n$, $B_n$ in the expression for $x_{n+1}$, we impose for all $n$ the condition

$\delta A_n\,x_{n+1}^{(1)} + \delta B_n\,x_{n+1}^{(2)} = 0.$

Thus, for all $n$, $x_{n+1} = A_n x_{n+1}^{(1)} + B_n x_{n+1}^{(2)}$, and

summing the two equations we obtain $z_{n+1} = z_n + \frac12 z_n = \frac32 z_n$. This is a geometric sequence, and, measuring the amounts in units of 10 € (so that $z_1 = 3$), we obtain $z_n = 3\left(\frac32\right)^{n-1}$. In a year we will have $z_1 + z_2 + \cdots + z_{12}$. This partial sum is easy to compute:

$3\cdot\dfrac{\left(\frac32\right)^{12} - 1}{\frac32 - 1} \doteq 772.5,$

that is, about 7725 €. In a year we will have saved over 7700 €.
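The scheme can also be simulated exactly with rational arithmetic, confirming both answers in euros:

```python
from fractions import Fraction

# Exact simulation: I start with 10 EUR, my friend with 20 EUR; each month
# everyone adds half of what the other gave the month before.
x, y = Fraction(10), Fraction(20)
total = x + y
for _ in range(11):             # payments in months 2..12
    x, y = x + y / 2, y + x / 2
    total += x + y

print(float(total))             # total saved after one year, in EUR
print(float(x))                 # my payment in month 12, in EUR
```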
The recurrent system of equations describing the savings can be written in matrix form as

$\begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix} = \begin{pmatrix} 1 & \frac12 \\ \frac12 & 1 \end{pmatrix}\begin{pmatrix} x_n \\ y_n \end{pmatrix}.$

It is thus again a geometric sequence. Its elements are now vectors, and the quotient is not a scalar but a matrix. The solution can be found analogously:

$\begin{pmatrix} x_n \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & \frac12 \\ \frac12 & 1 \end{pmatrix}^{n-1}\begin{pmatrix} x_1 \\ y_1 \end{pmatrix}.$

The power of the matrix acting on the vector $(x_1, y_1)$ can be found by expressing this vector in a basis of eigenvectors. The characteristic polynomial of the matrix is $(1-\lambda)^2 - \frac14$, so the eigenvalues are $\lambda_{1,2} = \frac32, \frac12$. The corresponding eigenvectors are $(1,1)$ and $(1,-1)$. For the initial vector $(x_1, y_1) = (1, 2)$ (again in units of 10 €) we compute

$\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \dfrac32\begin{pmatrix} 1 \\ 1 \end{pmatrix} - \dfrac12\begin{pmatrix} 1 \\ -1 \end{pmatrix},$

and thus

$\begin{pmatrix} x_n \\ y_n \end{pmatrix} = \dfrac32\left(\dfrac32\right)^{n-1}\begin{pmatrix} 1 \\ 1 \end{pmatrix} - \dfrac12\left(\dfrac12\right)^{n-1}\begin{pmatrix} 1 \\ -1 \end{pmatrix}.$

That means that in the twelfth month I pay

$x_{12} = \left(\dfrac32\right)^{12} - \left(\dfrac12\right)^{12} \doteq 129.75,$

that is, about 1297 €. □

$x_{n+2} = A_{n+1}x_{n+2}^{(1)} + B_{n+1}x_{n+2}^{(2)} = A_n x_{n+2}^{(1)} + B_n x_{n+2}^{(2)} + \delta A_n\,x_{n+2}^{(1)} + \delta B_n\,x_{n+2}^{(2)}.$

Now,

$f_n = x_{n+2} + a_n x_{n+1} + b_n x_n = A_n\left(x_{n+2}^{(1)} + a_n x_{n+1}^{(1)} + b_n x_n^{(1)}\right) + B_n\left(x_{n+2}^{(2)} + a_n x_{n+1}^{(2)} + b_n x_n^{(2)}\right) + \delta A_n\,x_{n+2}^{(1)} + \delta B_n\,x_{n+2}^{(2)} = \delta A_n\,x_{n+2}^{(1)} + \delta B_n\,x_{n+2}^{(2)},$

since $x^{(1)}$ and $x^{(2)}$ solve the homogeneous equation. Together with the condition above, this determines $\delta A_n$ and $\delta B_n$ for all $n$.

$\begin{pmatrix} x_1(t+1) \\ x_2(t+1) \\ x_3(t+1) \\ \vdots \\ x_9(t+1) \end{pmatrix} = \begin{pmatrix} 0 & 1 & 1 & \cdots & 1 & 1 \\ 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ \vdots & & \ddots & & & \vdots \\ 0 & 0 & \cdots & & 1 & 0 \end{pmatrix}\begin{pmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \\ \vdots \\ x_9(t) \end{pmatrix}.$

The characteristic polynomial of the given matrix is $\lambda^9 - \lambda^7 - \lambda^6 - \lambda^5 - \lambda^4 - \lambda^3 - \lambda^2 - \lambda - 1$. The roots of this polynomial are hard to express explicitly, but we can estimate the largest one very well: $\lambda_1 \doteq 1.608$ (why must it be smaller than $(\sqrt5 + 1)/2$?). Thus, according to this model, the population grows approximately like the geometric sequence $1.608^t$.

3.C.3. Pond. Suppose we have a simple model of a pond where there lives a population of white fish (roach, bleak, vimba, nase, etc.). Assume that 20 % of babies survive their second year, and from that age on they are able to reproduce. Of these young fish, approximately 60 % survive their third year, and in the following years the mortality can be ignored. Furthermore, we assume that the birth rate is three times the number of fish that can reproduce.
Such a population would clearly fill the pond very quickly. Thus we want to maintain a balance by using a …

Solving the two linear conditions for $\delta A_n$ and $\delta B_n$ (the determinant of this system is $D_{j+1} = x^{(1)}_{j+1} x^{(2)}_{j+2} - x^{(2)}_{j+1} x^{(1)}_{j+2}$) and summing over the indices, we obtain
$$A_n = A_0 - \sum_{j=0}^{n-1} \frac{f_j\, x^{(2)}_{j+1}}{D_{j+1}}, \qquad B_n = B_0 + \sum_{j=0}^{n-1} \frac{f_j\, x^{(1)}_{j+1}}{D_{j+1}},$$
and the acquired general solution of our recurrence equation is
$$x_n = C_1 x^{(1)}_n + C_2 x^{(2)}_n + A_n x^{(1)}_n + B_n x^{(2)}_n.$$

$$T = \begin{pmatrix} 10 & 10 \\ 9 & 3 \end{pmatrix} \begin{pmatrix} 1.05 & 0 \\ 0 & 0.75 \end{pmatrix} \begin{pmatrix} 10 & 10 \\ 9 & 3 \end{pmatrix}^{-1}$$
From there we have, for large $k \in \mathbb{N}$, that
(a)
$$T^k \approx \begin{pmatrix} 5 & 5 \\ 4 & 2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 5 & 5 \\ 4 & 2 \end{pmatrix}^{-1} = \frac{1}{10} \begin{pmatrix} -10 & 25 \\ -8 & 20 \end{pmatrix}$$

the vector $Y_n = (x_n, \dots, x_{n-k+1})$ (filled by the initial condition at the beginning of the process). In the next step we update the state vector to $Y_{n+1} = (x_{n+1}, x_n, \dots, x_{n-k+2})$, where the first entry $x_{n+1} = a_1 x_n + \cdots + a_k x_{n-k+1}$ is computed by means of the homogeneous difference equation, while the other entries are just shifted by one position, with the last one forgotten. The corresponding square matrix of order $k$ satisfying $Y_{n+1} = A \cdot Y_n$ is
$$A = \begin{pmatrix} a_1 & a_2 & \cdots & a_{k-1} & a_k \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \ddots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}.$$
A while ago, we derived an explicit procedure for the complete formula for the solution of such an iterated process with a special type of matrix. In general, this will not be easy even for very similar systems. A typical case is the study of the dynamics of populations in biological systems, which we discuss below. The characteristic polynomial $|A - \lambda E|$ of our matrix is
$$p(\lambda) = (-1)^k\left(\lambda^k - a_1 \lambda^{k-1} - \cdots - a_k\right),$$
as we can check directly by expanding the last column and employing induction on $k$. Thus the eigenvalues are exactly the roots $\lambda$ of the characteristic polynomial of the linear recurrence. We should have expected this: having a nonzero solution $x_n = \lambda^n$ to the linear recurrence means that the matrix $A$ maps $(\lambda^k, \dots, \lambda)^T$ to its $\lambda$-multiple. Thus every such $\lambda$ must be an eigenvalue of the matrix $A$.

3.3.2. Leslie model for population growth. Imagine that we are dealing with a system of individuals (cattle, insects, cell cultures, etc.) divided into $m$ groups (according to their age, evolution stage, etc.). The state $X_n$ is thus given by the vector $X_n = (u_1, \dots$
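The companion-matrix construction can be illustrated on the Fibonacci recurrence $x_{n+1} = x_n + x_{n-1}$ (a sketch; the helper names are ours):

```python
# Build the k×k companion matrix of x_{n+1} = a_1 x_n + ... + a_k x_{n-k+1}:
# coefficients in the first row, ones on the subdiagonal, zeros elsewhere.
def companion(coeffs):
    k = len(coeffs)
    A = [[0.0] * k for _ in range(k)]
    A[0] = [float(c) for c in coeffs]
    for i in range(1, k):
        A[i][i - 1] = 1.0
    return A

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = companion([1, 1])       # Fibonacci: Y_{n+1} = A · Y_n, Y_n = (x_{n+1}, x_n)
Y = [1.0, 1.0]              # (x_2, x_1)
for _ in range(8):
    Y = mat_vec(A, Y)
# Y = (x_10, x_9) = (55, 34)
```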
$\dots, u_m)^T$ depending on the time $t_n$ at which we observe the system. A linear model of the evolution of such a system is then given by a matrix $A$ of dimension $m$, which gives the change of the vector $X_n$ to $X_{n+1} = A \cdot X_n$ when the time changes from $t_n$ to $t_{n+1}$.

(b)
$$T^k = \begin{pmatrix} 10 & 2 \\ 7 & 1 \end{pmatrix} \begin{pmatrix} 0.95^k & 0 \\ 0 & 0.85^k \end{pmatrix} \begin{pmatrix} 10 & 2 \\ 7 & 1 \end{pmatrix}^{-1} \approx \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},$$
(c)
$$T^k = \begin{pmatrix} 10 & 10 \\ 9 & 3 \end{pmatrix} \begin{pmatrix} 1.05^k & 0 \\ 0 & 0.75^k \end{pmatrix} \begin{pmatrix} 10 & 10 \\ 9 & 3 \end{pmatrix}^{-1} \approx \frac{1.05^k}{60} \begin{pmatrix} -30 & 100 \\ -27 & 90 \end{pmatrix},$$
because for large $k \in \mathbb{N}$ we can set
$$\text{(a)} \begin{pmatrix} 1 & 0 \\ 0 & 0.8 \end{pmatrix}^{\!k} \approx \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad \text{(b)} \begin{pmatrix} 0.95 & 0 \\ 0 & 0.85 \end{pmatrix}^{\!k} \approx \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad \text{(c)} \begin{pmatrix} 1.05 & 0 \\ 0 & 0.75 \end{pmatrix}^{\!k} \approx \begin{pmatrix} 1.05^k & 0 \\ 0 & 0 \end{pmatrix}.$$
Note that in variant (b), that is, for $a = 0.175$, it is not necessary to compute the eigenvectors. Thus we have
(a)
$$\begin{pmatrix} D_k \\ K_k \end{pmatrix} \approx \frac{1}{10} \begin{pmatrix} -10 & 25 \\ -8 & 20 \end{pmatrix} \begin{pmatrix} D_0 \\ K_0 \end{pmatrix} = \frac{1}{10} \begin{pmatrix} 5(-2D_0 + 5K_0) \\ 4(-2D_0 + 5K_0) \end{pmatrix},$$
(b)
$$\begin{pmatrix} D_k \\ K_k \end{pmatrix} \approx \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} D_0 \\ K_0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$
(c)
$$\begin{pmatrix} D_k \\ K_k \end{pmatrix} \approx \frac{1.05^k}{60} \begin{pmatrix} -30 & 100 \\ -27 & 90 \end{pmatrix} \begin{pmatrix} D_0 \\ K_0 \end{pmatrix} = \frac{1.05^k}{60} \begin{pmatrix} 10(-3D_0 + 10K_0) \\ 9(-3D_0 + 10K_0) \end{pmatrix}.$$
These results can be interpreted as follows: (a) if $2D_0 < 5K_0$, the sizes of both populations stabilise at non-zero values (we say that they are stable); if $2D_0 > 5K_0$, both populations die out. (b) Both populations die out. (c) For $3D_0 < 10K_0$ a population boom of both kinds begins; for $3D_0 > 10K_0$ both populations die out.

As an example, we consider the Leslie model for population growth. Here the matrix is
$$A = \begin{pmatrix} f_1 & f_2 & f_3 & \cdots & f_{m-1} & f_m \\ \tau_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & \tau_2 & 0 & \cdots & 0 & 0 \\ \vdots & & \ddots & & & \vdots \\ 0 & 0 & 0 & \cdots & \tau_{m-1} & 0 \end{pmatrix},$$
whose parameters are tied to the evolution of a population divided into $m$ age groups: $f_i$ denotes the relative fertility of the corresponding age group (in the observed time shift, $N$ individuals in the $i$-th group give rise to $f_i N$ new ones, which belong to the first group), while $\tau_i$ is the relative survival rate, i.e. the fraction of the $i$-th group which passes into the $(i+1)$-st group in one time interval. Clearly such a model can be used with any number of age groups. All coefficients are thus non-negative real numbers, and the numbers $\tau_i$ are between zero and one.
Note that when all $\tau_i$ equal one, this is actually a linear recurrence with constant coefficients, and thus it has either exponential growth/decay (for real roots $\lambda$ of the characteristic polynomial) or oscillation connected with potential growth/decay (for complex roots). Before we introduce a more general theory, we consider this specific model in more detail. Direct computation with the Laplace expansion of the last column yields the characteristic polynomial $p_m(\lambda)$ of the matrix $A$ for the model with $m$ groups:
$$p_m(\lambda) = -\lambda\, p_{m-1}(\lambda) + (-1)^{m-1} f_m \tau_1 \cdots \tau_{m-1}.$$
By induction we derive that this characteristic polynomial is of the form
$$p_m(\lambda) = (-1)^m\left(\lambda^m - a_1 \lambda^{m-1} - \cdots - a_{m-1}\lambda - a_m\right).$$
The coefficients $a_1, \dots, a_m$ are all positive if all the parameters $\tau_i$ and $f_i$ are positive; in particular, $a_m = f_m \tau_1 \cdots \tau_{m-1}$. Consider the distribution of the roots of the polynomial $p_m$. We write the characteristic polynomial in the form
$$p_m(\lambda) = \pm\lambda^m\left(1 - q(\lambda)\right), \qquad q(\lambda) = a_1\lambda^{-1} + \cdots + a_m\lambda^{-m},$$
where $q$ is a strictly decreasing non-negative function for $\lambda > 0$. For $\lambda$ positive but very small, the value of $q$ is arbitrarily large, while for large $\lambda$ it is arbitrarily close to zero. Thus there evidently exists exactly one positive $\lambda$ for which $q(\lambda) = 1$, and hence also $p_m(\lambda) = 0$. In other words, for every Leslie matrix (with all the parameters $f_i$ and $\tau_i$ positive) there exists exactly one positive real eigenvalue.

For actual Leslie models of populations, a typical situation is that the only positive real eigenvalue $\lambda_1$ is greater than or equal to one, while the absolute values of the other eigenvalues are strictly less than one. If we begin with any state vector $X$ given as a sum of eigenvectors $X = X_1 + \cdots + X_m$

Even a small change in the size of $a$ can lead to a completely different result. This is caused by the constancy of the value of $a$: it does not depend on the sizes of the populations.
Note that this restriction (that is, assuming $a$ to be constant) has no interpretation in reality. But we still obtain an estimate of the sizes of $a$ for stable populations. □

3.C.5. Remark. Another model for the populations of predators and prey is the model of Lotka and Volterra, which describes the relation between the populations by a system of two ordinary differential equations. In that model both populations oscillate, which is in accord with observations. Other interesting and well-described models of growth can be found in the collection of exercises after this chapter (see 3.G.2). In linear models an important role is played by primitive matrices (3.3.3).

3.C.6. Which of the matrices
$$A = \begin{pmatrix} 0 & \tfrac17 \\ 1 & \tfrac67 \end{pmatrix}, \quad B = \begin{pmatrix} \tfrac12 & 0 & \tfrac13 \\ 0 & 1 & \tfrac12 \\ \tfrac12 & 0 & \tfrac16 \end{pmatrix}, \quad C, \quad D = \begin{pmatrix} \tfrac13 & \tfrac12 & 0 & 0 \\ \tfrac12 & \tfrac13 & 0 & 0 \\ 0 & \tfrac16 & \tfrac16 & \tfrac13 \\ \tfrac16 & 0 & \tfrac56 & \tfrac23 \end{pmatrix}, \quad E = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
are primitive?

Solution. We compute
$$A^2 = \begin{pmatrix} \tfrac17 & \tfrac{6}{49} \\ \tfrac67 & \tfrac{43}{49} \end{pmatrix}, \qquad C^3 = \begin{pmatrix} \tfrac38 & \tfrac14 & \tfrac14 \\ \tfrac14 & \tfrac38 & \tfrac14 \\ \tfrac38 & \tfrac38 & \tfrac12 \end{pmatrix}.$$
So the matrices $A$ and $C$ are primitive, since $A^2$ and $C^3$, respectively, are positive matrices. The middle column of the matrix $B^n$ is always (for $n \in \mathbb{N}$) the vector $(0,1,0)^T$, which contains the entry 0; hence the matrix $B$ cannot be primitive. The product
$$\begin{pmatrix} \tfrac16 & \tfrac13 \\ \tfrac56 & \tfrac23 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \tfrac{a}{6} + \tfrac{b}{3} \\ \tfrac{5a}{6} + \tfrac{2b}{3} \end{pmatrix}, \qquad a, b \in \mathbb{R},$$
implies that the matrix $D^2$ has a zero two-dimensional (square) sub-matrix in its right upper corner. By induction, the same property is shared by the matrices $D^3 = D \cdot D^2$,

with eigenvalues $\lambda_i$, then the iterations yield
$$A^k \cdot X = \lambda_1^k X_1 + \cdots + \lambda_m^k X_m.$$
Thus, under the assumption that $|\lambda_i| < 1$ for all $i \ge 2$, all components in the eigensubspaces decrease very fast, except for the component $\lambda_1^k X_1$. The distribution of the population among the age groups thus approaches the ratios of the components of the eigenvector of the dominant eigenvalue $\lambda_1$ very quickly.
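Primitivity checks of this kind can be automated: by Wielandt's theorem, a primitive $n \times n$ matrix already has $A^{(n-1)^2+1}$ positive, so it suffices to inspect finitely many powers. A sketch (the matrix $E$ below is a cyclic permutation used for illustration; exact rational arithmetic avoids rounding):

```python
from fractions import Fraction as F

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def is_primitive(A):
    """Check positivity of A, A^2, ..., A^((n-1)^2 + 1) (Wielandt's bound)."""
    n = len(A)
    P = A
    for _ in range((n - 1) ** 2 + 1):
        if all(x > 0 for row in P for x in row):
            return True
        P = mat_mul(P, A)
    return False

A = [[F(0), F(1, 7)], [F(1), F(6, 7)]]
assert mat_mul(A, A) == [[F(1, 7), F(6, 49)], [F(6, 7), F(43, 49)]]

E = [[0, 0, 0, 1],          # a 4-cycle permutation matrix: its powers are
     [1, 0, 0, 0],          # permutations again, hence never positive
     [0, 1, 0, 0],
     [0, 0, 1, 0]]
```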
As an example, consider the matrix below, where the individual coefficients are taken from a model for sheep breeding; that is, the values $\tau_i$ reflect both natural deaths and the activities of the breeders:
$$A = \begin{pmatrix} 0 & 0.2 & 0.8 & 0.6 & 0 \\ 0.95 & 0 & 0 & 0 & 0 \\ 0 & 0.8 & 0 & 0 & 0 \\ 0 & 0 & 0.7 & 0 & 0 \\ 0 & 0 & 0 & 0.6 & 0 \end{pmatrix}.$$
The eigenvalues are approximately
$$1.03, \quad 0, \quad -0.5, \quad -0.27 + 0.74i, \quad -0.27 - 0.74i,$$
with absolute values $1.03$, $0$, $0.5$, $0.78$, $0.78$, and the eigenvector corresponding to the dominant eigenvalue is approximately $X^T = (30\ 27\ 21\ 14\ 8)$. We have chosen the eigenvector whose coordinates sum to 100, so it directly gives the percentage distribution of the population. Suppose instead that we wish for a constant population, and that one year old sheep are removed for consumption. Then we need to ask how to decrease $\tau_1$ so that the dominant eigenvalue becomes one. A direct check shows that the farmer could then consume about 10 % more of the one year old sheep to keep the population constant.

3.3.3. Matrices with non-negative elements. Real matrices which have no negative elements have very special properties and are very often present in practical models. Thus we introduce the Perron–Frobenius theory, which deals with such matrices. Actually, we show some results of Perron and omit the more general situations due to Frobenius.² We begin with some definitions in order to formulate our

² Oskar Perron and Ferdinand Georg Frobenius were two great German mathematicians at the turn of the 19th and 20th centuries. Even in this textbook we shall meet their names in Analysis, Number Theory, and Algebra; look up the index.

$D^4 = D \cdot D^3, \dots, D^n = D \cdot D^{n-1}$; thus the matrix $D$ is not primitive. The matrix $E$ is a permutation matrix (in every row and every column there is exactly one non-zero element, namely 1). It is not difficult to see that a power of a permutation matrix is again a permutation matrix, so the matrix $E$ is also not primitive. This is easily verified by calculating the powers $E^2$, $E^3$, $E^4$.
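Power iteration recovers both the dominant eigenvalue and the stable age distribution of the sheep matrix; a numerical sketch (the names are ours, and the tolerances reflect that the quoted values are rounded):

```python
# Power iteration on the sheep Leslie matrix from the text.
A = [[0.0, 0.2, 0.8, 0.6, 0.0],
     [0.95, 0.0, 0.0, 0.0, 0.0],
     [0.0, 0.8, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.7, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.6, 0.0]]

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

v = [1.0] * 5
lam = 0.0
for _ in range(300):
    w = mat_vec(A, v)
    lam = sum(w) / sum(v)       # growth ratio, converges to the dominant eigenvalue
    v = [x / sum(w) for x in w]  # renormalise so the coordinates sum to one
percent = [100 * x for x in v]   # ≈ (30, 27, 21, 14, 8), as in the text
```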
The matrix $E^4$ is a unit matrix. □

D. Markov processes

3.D.1. Sweet-toothed gambler. A gambler bets on a coin: whether a flip results in a head or a tail. At the start of the game he has three sweets. On every flip he bets one sweet; if he wins, he gains one additional sweet, and if he loses, he loses the sweet. The game ends when he loses all his sweets or has at least five sweets. What is the probability that the game does not end within four bets?

Solution. Before the $j$-th round we can describe the state of the player by the random vector
$$X_j = \left(p_0(j), p_1(j), p_2(j), p_3(j), p_4(j), p_5(j)\right),$$
where $p_i$ is the probability that the player has $i$ sweets. If the player has $i$ sweets ($i = 1, 2, 3, 4$) before the $j$-th bet, then after the bet he has $(i-1)$ sweets with probability $1/2$, and $(i+1)$ sweets with probability $1/2$. If he attains five sweets or loses them all, the number of sweets no longer changes. The vector $X_{j+1}$ is then obtained from the vector $X_j$ by multiplying it with the matrix
$$A = \begin{pmatrix} 1 & 0.5 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0.5 & 0 & 0 & 0 \\ 0 & 0.5 & 0 & 0.5 & 0 & 0 \\ 0 & 0 & 0.5 & 0 & 0.5 & 0 \\ 0 & 0 & 0 & 0.5 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0.5 & 1 \end{pmatrix}.$$
At the start, $X_1 = (0, 0, 0, 1, 0, 0)^T$. After four bets the situation is described by the vector
$$X_5 = A^4 X_1 = \left(\tfrac18, \tfrac{3}{16}, 0, \tfrac{5}{16}, 0, \tfrac38\right)^T,$$

Positive and primitive matrices

Definition. A positive matrix is a square matrix $A$ all of whose elements $a_{ij}$ are real and strictly positive. A primitive matrix is a square matrix $A$ such that some power $A^k$, $k \in \mathbb{N}$, is positive.

Recall that the spectral radius of a matrix $A$ is the maximum of the absolute values of all (complex) eigenvalues of $A$. The spectral radius of a linear mapping on a (finite dimensional) vector space coincides with the spectral radius of its matrix with respect to any basis. In the sequel, the norm of a matrix $A$ or of a vector $x \in \mathbb{R}^n$ means the sum of the absolute values of all its elements; for a vector $x$ we write $|x|$ for its norm. The following result is very useful and hopefully understandable.
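The four iterations can be carried out exactly; a sketch in exact arithmetic (state $i$ counts the sweets):

```python
from fractions import Fraction as F

A = [[F(0)] * 6 for _ in range(6)]
A[0][0] = A[5][5] = F(1)        # 0 and 5 sweets are absorbing states
for i in range(1, 5):           # with 1..4 sweets he bets one sweet
    A[i - 1][i] = F(1, 2)       # he loses the flip
    A[i + 1][i] = F(1, 2)       # he wins the flip

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

X = [F(0), F(0), F(0), F(1), F(0), F(0)]    # he starts with 3 sweets
for _ in range(4):
    X = mat_vec(A, X)
# X = (1/8, 3/16, 0, 5/16, 0, 3/8); the game is over with probability
# X[0] + X[5] = 1/2, so it continues with probability 1/2
```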
But the difficulty of its proof is rather atypical for this textbook. If you prefer, read just the theorem and skip the proof until later.

Perron Theorem

Theorem. If $A$ is a primitive matrix with spectral radius $\lambda \in \mathbb{R}$, then $\lambda$ is a root of the characteristic polynomial of $A$ with multiplicity one, and $\lambda$ is strictly greater than the absolute value of any other eigenvalue of $A$. Furthermore, there exists an eigenvector $x$ associated with $\lambda$ such that all elements $x_i$ of $x$ are positive.

Proof. We shall present rather a sketch of the proof and rely on intuition from elementary geometry. Notice that the matrices $A$ and $A^k$ share their eigenvectors, while the corresponding eigenvalues are $\lambda$ and $\lambda^k$, respectively. Thus the assertion of the theorem holds for $A$ if and only if it holds for $A^k$. In particular, we may assume without any loss of generality that the matrix $A$ itself is positive. Many of the necessary concepts and properties will be discussed in chapter four and in the subsequent chapters devoted to analytic aspects, so the reader might come back to this proof later.

The first step is to show the existence of an eigenvector which has all elements positive. Consider the standard simplex
$$S = \{x = (x_1, \dots, x_n)^T;\ |x| = 1,\ x_i \ge 0,\ i = 1, \dots, n\}.$$
Since all elements of the matrix $A$ are positive, the image $A \cdot x$ for $x \in S$ has all coordinates positive too. The mapping $x \mapsto |A \cdot x|^{-1}(A \cdot x)$ thus maps $S$ to itself. This mapping $S \to S$ satisfies all the assumptions of the Brouwer fixed point theorem³, and thus there

³ This theorem is a great example of a blend of (homological) Algebra, (differential) Topology and Analysis. We shall discuss it in Chapter 9, cf. ?? on page ??.

that is, the probability that the game ends with the fourth bet or sooner is one half. Note that the matrix $A$ describing the evolution of the probability vector $X$ is itself stochastic, that is, each of its columns sums to one.
But it does not have the property required by the Perron–Frobenius theorem. By a simple computation you can check (or you can see it directly without any computation) that there exist two linearly independent eigenvectors corresponding to the eigenvalue 1. These correspond to the case when the player has no sweets, that is $x = (1,0,0,0,0,0)^T$, and to the case when the player has 5 sweets and the game thus ends with him keeping all the sweets, that is $x = (0,0,0,0,0,1)^T$. All other eigenvalues (approximately $0.8$, $0.3$, $-0.8$, $-0.3$) are strictly smaller than one in absolute value. Thus, as the process is iterated, the components in the corresponding eigensubspaces vanish for an arbitrary initial distribution, and the process approaches the limiting probability vector of the form $(a, 0, 0, 0, 0, 1-a)$, where the value $a$ depends on the initial number of sweets. In our case $a = 0.4$; if there were 4 sweets at the start, it would be $a = 0.2$, and so on. □

3.D.2. Car rental. A company rents cars on a weekly basis and has two branches, one in Prague and one in Brno. A car rented in Brno can be returned in Prague and vice versa. After some time it has been discovered that roughly 80 % of the cars rented in Prague and 90 % of the cars rented in Brno are returned to Prague. How should the cars be distributed among the branches so that at the start of the week both have the same number of cars as the week before? How does the situation look after a long time if the cars are initially distributed in a random way?

Solution. Denote the components of the vector in question, that is, the initial numbers of cars in Brno and in Prague, by $x_B$ and $x_P$ respectively. The distribution of the cars between the branches is then described by the vector $x = \begin{pmatrix} x_B \\ x_P \end{pmatrix}$. If we consider a multiple of the vector $x$ such that the sum of its components is 1, then its components give the percentage distribution of the cars.
According to the statement, the state at the end of the week is described by the vector
$$\begin{pmatrix} 0.1 & 0.2 \\ 0.9 & 0.8 \end{pmatrix} \begin{pmatrix} x_B \\ x_P \end{pmatrix}.$$
The matrix $A = \begin{pmatrix} 0.1 & 0.2 \\ 0.9 & 0.8 \end{pmatrix}$ thus describes our (linear) system of car rental. If at the end of the week the branches should have the same numbers of cars as at the beginning, we are looking for a vector $x$ for

exists a vector $y \in S$ fixed by this mapping; that is, $A \cdot y = \lambda \cdot y$ with $\lambda = |A \cdot y| > 0$ and all coordinates of $y$ strictly positive.

In order to prove the rest of the theorem, we consider the mapping given by the matrix $A$ in a more suitable basis, and moreover we multiply the mapping by the constant $\lambda^{-1}$. Thus we work with the matrix $B = \lambda^{-1}(Y^{-1} \cdot A \cdot Y)$, where $Y$ is the diagonal matrix with the coordinates of the eigenvector $y$ on its diagonal. Evidently $B$ is also a positive matrix. By the construction, the vector $z = (1, \dots, 1)^T$ is its eigenvector with eigenvalue 1, because $Y \cdot z = y$. It remains to prove that $\mu = 1$ is a simple root of the characteristic polynomial of the matrix $B$ and that all other roots have absolute value strictly smaller than one; then the proof of the Perron theorem is finished. In order to do that, we use an auxiliary lemma.

Consider for the moment the matrix $B$ as defining the linear mapping that maps row vectors $u = (u_1, \dots, u_n)$ to $u \cdot B = v$, that is, using multiplication from the right (i.e. $B$ is viewed as the matrix of a linear map on one-forms). Since $z = (1, \dots, 1)^T$ is an eigenvector of the matrix $B$ (with eigenvalue 1), the sum of the coordinates of the row vector $v$ is
$$u \cdot B \cdot (1, \dots, 1)^T = \sum_{i,j=1}^n u_i b_{ij} = \sum_{i=1}^n u_i = 1,$$
whenever $u \in S$. Therefore the simplex $S$ is mapped to itself, and thus it contains a (row) eigenvector $w$ with eigenvalue one (a fixed point, by the Brouwer theorem again). Because some power of $B$ is positive by our assumption, the image of the simplex $S$ under that power lies inside $S$. We continue with the row vectors. Denote by $P$ the shift of the simplex $S$ to the origin by the eigenvector $w$ we have just found; that is, $P = -w + S$.
Evidently $P$ is a set containing the origin and is defined by linear inequalities. Moreover, the vector subspace $V \subset \mathbb{R}^n$ generated by $P$ is invariant with respect to the action of the matrix $B$ through multiplication of row vectors from the right. The restriction of our mapping to $P$, and $P$ itself, satisfy the assumptions of the auxiliary lemma proved below, and thus all its eigenvalues are strictly smaller than one. Now the entire space decomposes as the sum $\mathbb{R}^n = V \oplus \operatorname{span}\{w\}$ of invariant subspaces; $w$ is the eigenvector with eigenvalue 1, while all eigenvalues of the restriction to $V$ are strictly smaller in absolute value. The theorem is nearly proved. We only have to deal with the fact that the mapping in question was given by

which $A x = x$. That means that we are looking for an eigenvector of the matrix $A$ associated with the eigenvalue 1. The characteristic polynomial of the matrix $A$ is
$$(0.1 - \lambda)(0.8 - \lambda) - 0.9 \cdot 0.2 = (\lambda - 1)(\lambda + 0.1),$$
and 1 is indeed an eigenvalue of the matrix $A$. The corresponding eigenvector $x = \begin{pmatrix} x_B \\ x_P \end{pmatrix}$ satisfies the equation
$$\begin{pmatrix} -0.9 & 0.2 \\ 0.9 & -0.2 \end{pmatrix} \begin{pmatrix} x_B \\ x_P \end{pmatrix} = 0.$$
It is thus a multiple of the vector $\begin{pmatrix} 2 \\ 9 \end{pmatrix}$. For determining the percentage distribution we look for the multiple such that $x_B + x_P = 1$, which is satisfied by the vector $\frac{1}{11}\begin{pmatrix} 2 \\ 9 \end{pmatrix} \doteq \begin{pmatrix} 0.18 \\ 0.82 \end{pmatrix}$. The suitable distribution of the cars between Prague and Brno is thus such that roughly 18 % of the cars are in Brno and 82 % of the cars are in Prague. If we choose the initial state $x$ arbitrarily, then the state after $n$ weeks is described by the vector $x_n = A^n x$. It is useful to express the initial vector $x$ in the basis of eigenvectors of $A$. The eigenvector for the eigenvalue 1 has already been found; similarly we find an eigenvector for the eigenvalue $-0.1$.
That is, for instance, the vector $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$. The initial vector can be expressed as a linear combination
$$x = a \begin{pmatrix} 0.18 \\ 0.82 \end{pmatrix} + b \begin{pmatrix} 1 \\ -1 \end{pmatrix}.$$
The state after $n$ weeks is then
$$x_n = A^n x = a \begin{pmatrix} 0.18 \\ 0.82 \end{pmatrix} + b\,(-0.1)^n \begin{pmatrix} 1 \\ -1 \end{pmatrix}.$$
The second summand approaches zero as $n \to \infty$, and the state thus stabilises at $a \begin{pmatrix} 0.18 \\ 0.82 \end{pmatrix}$; that is, at the coordinate of the initial vector in the direction of the first eigenvector. The coefficient can be easily expressed from the initial numbers of cars: $a = x_B + x_P$. □

3.D.3. In a certain game you can choose one of two opponents. The probability that you beat the better one is $1/4$, while the probability that you beat the worse one is $1/2$. But the opponents cannot be distinguished, so you do not know which one is the better one. You await a large number of games, and for each of them you can choose a different opponent. Consider the following two strategies: 1. For the first game, choose the opponent randomly. If you win a game, carry on with the same opponent; if you lose a game, change the opponent.

multiplication of the row vectors from the right by the matrix $B$, while originally we were interested in the mapping given by the matrix $B$ and multiplication of the column vectors from the left. But this is equivalent to the usual multiplication of the transposed column vectors by the transposed matrix $B^T$ from the left. Thus we have proved the claim about the eigenvalues for the transpose of $B$; but transposing does not change the eigenvalues, and so the proof is complete. □

A bounded polyhedron in $\mathbb{R}^n$ is a nonempty subset defined by linear inequalities and contained in some large enough ball. The simplex $S$ from the proof, or any of its translations, are examples.

Lemma. Consider any bounded polyhedron $P \subset \mathbb{R}^n$ containing a ball around the origin $0 \in \mathbb{R}^n$. If some iteration of a linear mapping $\psi: \mathbb{R}^n \to \mathbb{R}^n$ maps $P$ into its interior (that is, $\psi(P) \subset P$ and the image does not intersect the boundary), then the spectral radius of the mapping $\psi$ is strictly less than one.

Proof.
Consider the matrix $A$ of the mapping $\psi$ in the standard basis. Because the eigenvalues of $A^k$ are the $k$-th powers of the eigenvalues of the matrix $A$, we may assume (without loss of generality) that the mapping $\psi$ already maps $P$ into $P$. Clearly $\psi$ cannot have any eigenvalue with absolute value greater than one. We argue by contradiction and assume that there exists an eigenvalue $\lambda$ with $|\lambda| = 1$. Then there are two possibilities: either $\lambda^k = 1$ for a suitable $k$, or there is no such $k$.

The image of $P$ is a closed set (that means that if the points in the image $\psi(P)$ get arbitrarily close to some point $y$ in $\mathbb{R}^n$, then the point $y$ is also in the image; this is a general feature of linear maps on finite dimensional vector spaces). By our assumption, the boundary of $P$ does not intersect the image. Thus $\psi$ cannot have a fixed point on the boundary, and there cannot even be any point on the boundary to which some sequence of points in the image would converge. The first argument excludes that some power of $\lambda$ is one, because such a fixed point of $\psi^k$ on the boundary of $P$ would then exist, and thus it would be in the image. In the remaining case there would be a two-dimensional subspace $W \subset \mathbb{R}^n$ on which the restriction of $\psi$ acts as a rotation by an irrational angle, and thus there exists a point $y$ in the intersection of $W$ with the boundary of $P$. But then the point $y$ could be approached arbitrarily closely by points from the set of iterates $\psi^k(y)$, and thus it would have to be in the image too. This leads to a contradiction, and thus the lemma is proved. □

3.3.4. Simple corollaries. Once we know the Perron theorem, the following very useful claim has a surprisingly simple proof. It shows how strong the primitivity assumption on a matrix is.

2. For the first two games, choose an opponent randomly.
Then for the next two games, change the opponent if you lost both the previous games; otherwise stay with the same opponent. Which of the two strategies is better?

Solution. Both strategies define a Markov chain. For simplicity, denote the worse opponent by A and the better opponent by B. In the first case, for the states "game with A" and "game with B" (in this order), we obtain the probabilistic transition matrix
$$\begin{pmatrix} \tfrac12 & \tfrac34 \\ \tfrac12 & \tfrac14 \end{pmatrix}.$$
This matrix has all of its elements positive, so it suffices to find the probabilistic vector $x_\infty$ associated with the eigenvalue 1. We compute
$$x_\infty = \left(\tfrac35, \tfrac25\right)^T.$$
Its components correspond to the probabilities that after a long sequence of games the opponent is the player A or the player B. Thus we can expect that 60 % of the games will be played against the worse of the two opponents. Because
$$\tfrac25 = \tfrac35 \cdot \tfrac12 + \tfrac25 \cdot \tfrac14,$$
roughly 40 % of the games will be winning ones. For the second strategy, use the states "two games in a row with A" and "two games in a row with B", which lead to the probabilistic transition matrix
$$\begin{pmatrix} \tfrac34 & \tfrac{9}{16} \\ \tfrac14 & \tfrac{7}{16} \end{pmatrix}.$$
It is easily determined that now
$$x_\infty = \left(\tfrac{9}{13}, \tfrac{4}{13}\right)^T.$$
Against the worse opponent one then plays $(9/4)$-times more frequently than against the better one; recall that for the first strategy it was $(3/2)$-times. The second strategy is thus better. Note also that with the second strategy roughly 42.3 % of the games are winning ones, since
$$0.423 \doteq \frac{11}{26} = \frac{9}{13} \cdot \frac12 + \frac{4}{13} \cdot \frac14.$$
□

Corollary. If $A = (a_{ij})$ is a primitive matrix and $x \in \mathbb{R}^n$ is its eigenvector with all coordinates non-negative and eigenvalue $\alpha$, then $\alpha > 0$ is the spectral radius of $A$. Moreover,
$$\min_{j \in \{1,\dots,n\}} \sum_{i=1}^n a_{ij} \le \alpha \le \max_{j \in \{1,\dots,n\}} \sum_{i=1}^n a_{ij}.$$

Proof. Because $A$ is primitive, we can choose $k$ such that $A^k$ has only positive elements. Then $A^k \cdot x = \alpha^k x$ is a vector with all coordinates strictly positive. Obviously $\alpha > 0$.
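Both stationary vectors, and the two winning frequencies, can be recovered by simply iterating the transition matrices; a numerical sketch (the names are ours):

```python
def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def stationary(A, steps=200):
    """Iterate the chain from the uniform distribution until it settles."""
    v = [1.0 / len(A)] * len(A)
    for _ in range(steps):
        v = mat_vec(A, v)
    return v

T1 = [[0.5, 0.75], [0.5, 0.25]]        # strategy 1, states (with A, with B)
T2 = [[0.75, 9 / 16], [0.25, 7 / 16]]  # strategy 2, states (2 games w. A, 2 w. B)

v1, v2 = stationary(T1), stationary(T2)   # → (3/5, 2/5) and (9/13, 4/13)
win1 = v1[0] * 0.5 + v1[1] * 0.25          # → 2/5 = 0.4
win2 = v2[0] * 0.5 + v2[1] * 0.25          # → 11/26 ≈ 0.423
```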
According to the Perron theorem, the spectral radius $\mu$ of $A$ is an eigenvalue, and the associated eigenvectors $y$ have only positive coordinates. Thus we may rescale such an eigenvector so that the difference $x - y$ has only strictly positive coordinates. Then for all positive integer powers $m$ we have
$$0 \le A^m \cdot (x - y) = \alpha^m x - \mu^m y,$$
so also $\alpha \le \mu$. If we had $\mu > \alpha$, then the coordinates of $\alpha^m x - \mu^m y = \mu^m\left((\alpha/\mu)^m x - y\right)$ would be negative for $m$ large enough, which is impossible. Hence $\alpha = \mu$.

It remains to estimate the spectral radius using the minimum and maximum of the sums of the individual columns of the matrix, which we denote by $b_{\min}$ and $b_{\max}$. Choose $x$ to be the eigenvector with the sum of its coordinates equal to one and count:
$$\alpha = \sum_{j=1}^n \alpha x_j = \sum_{j=1}^n \left(\sum_{i=1}^n a_{ij}\right) x_j \le \sum_{j=1}^n b_{\max} x_j = b_{\max},$$
$$\alpha = \sum_{j=1}^n \left(\sum_{i=1}^n a_{ij}\right) x_j \ge \sum_{j=1}^n b_{\min} x_j = b_{\min}. \qquad \square$$

Note that, for instance, all Leslie matrices from 3.3.2 are primitive as soon as all their parameters $f_i$ and $\tau_i$ are strictly positive. Thus we can apply the results just derived to them. (Compare this with the ad hoc analysis of the roots of the characteristic polynomial in 3.3.2.)

3.3.5. Markov chains. A very frequent and interesting case of linear processes with only non-negative elements in the matrix is a mathematical model of a system which can be in one of $m$ states with various probabilities. At a given point of time, the system is in state $i$ with probability $x_i$, and the transition from the state $j$ to the state $i$ happens with probability $t_{ij}$. We can write the process as follows: at time $n$, the system is described by the stochastic vector (we also say probability vector)
$$x_n = (u_1(n), \dots, u_m(n))^T.$$
This means that all components of the vector are real non-negative numbers and their sum equals one. The components

3.D.4. Absent-minded professor. Consider the following situation.
An absent-minded professor carries an umbrella with him, but with probability $1/2$ he forgets it wherever he is leaving from. In the morning, he leaves home for his office. From his office he goes for lunch to a restaurant, and then goes back to his office. After he has finished his work at the office, he leaves for home. Suppose (for simplicity) that he does not go anywhere else. Suppose also that if he leaves the umbrella in the restaurant, it remains there until the next time. Consider this situation as a Markov process and write down its matrix. What is the probability that, after many days, the umbrella is located in the restaurant in the morning? It is convenient to choose one day, from morning to morning, as the time unit.

Solution. With the states ordered (home, office, restaurant), the matrix is
$$A = \begin{pmatrix} \tfrac{11}{16} & \tfrac38 & \tfrac14 \\ \tfrac{3}{16} & \tfrac38 & \tfrac14 \\ \tfrac18 & \tfrac14 & \tfrac12 \end{pmatrix}.$$
Let us compute, for example, the element $a_{11}$, that is, the probability that the umbrella starts its day at home and is there again the next morning. There are three distinct possibilities for the umbrella:

D: the professor forgets it when leaving home in the morning, $p_1 = \tfrac12$;

DPD: the professor takes it to the office, then forgets to take it on to lunch, and in the evening takes it home, $p_2 = \tfrac12 \cdot \tfrac12 \cdot \tfrac12 = \tfrac18$;

DPRPD: the professor takes the umbrella with him all the time and does not forget it anywhere, $p_3 = \tfrac12 \cdot \tfrac12 \cdot \tfrac12 \cdot \tfrac12 = \tfrac{1}{16}$.

In total, $a_{11} = p_1 + p_2 + p_3 = \tfrac{11}{16}$. The eigenvector of this matrix corresponding to the dominant eigenvalue 1 is $(2, 1, 1)^T$, and thus the desired probability is $1/(2+1+1) = 1/4$. □

3.D.5. Algorithm for determining the importance of pages. Internet search engines can find (almost) all pages containing a given word or phrase. But how can the pages be sorted so that the list is ordered according to their relevance? One possibility is the following algorithm: the collection of all found pages is considered to be a system, and each of the found pages is one of its states. We describe a random walk on these pages as a Markov process.
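A quick check of the umbrella chain in exact arithmetic (states ordered home, office, restaurant; the names are ours):

```python
from fractions import Fraction as F

A = [[F(11, 16), F(3, 8), F(1, 4)],
     [F(3, 16), F(3, 8), F(1, 4)],
     [F(1, 8), F(1, 4), F(1, 2)]]

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

pi = [F(1, 2), F(1, 4), F(1, 4)]    # the eigenvector (2, 1, 1) normalised
assert mat_vec(A, pi) == pi         # indeed a fixed point of the chain

x = [F(1), F(0), F(0)]              # the umbrella starts at home
for _ in range(100):
    x = mat_vec(A, x)
# x ≈ (1/2, 1/4, 1/4): in the long run the umbrella spends a quarter
# of the mornings in the restaurant
```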
The probabilities of transitions between pages are given by the hyperlinks: each link, say from page

give the distribution of the probabilities of the individual possible states of the system. The distribution of the probabilities at time $n+1$ is given via multiplication by the transition matrix $T = (t_{ij})$, that is,
$$x_{n+1} = T \cdot x_n.$$
Since we assume that the vector $x_n$ captures all possible states of the system and the system again moves to some of these states with total probability one, all columns of $T$ are also stochastic vectors. We call such matrices stochastic matrices. Note that every stochastic matrix maps every stochastic vector $x$ to a stochastic vector $Tx$ again:
$$\sum_{i=1}^m \left(\sum_{j=1}^m t_{ij} x_j\right) = \sum_{j=1}^m \left(\sum_{i=1}^m t_{ij}\right) x_j = \sum_{j=1}^m x_j = 1.$$
Such a sequence $x_{n+1} = T x_n$ is called a (discrete) Markov process, and the resulting sequence of vectors $x_0, x_1, \dots$ is called a Markov chain.

Now we can exploit the Perron–Frobenius theory in its full power. Because all columns of $T$ sum to one, we have $(1, \dots, 1) \cdot T = (1, \dots, 1)$; hence the matrix $T - E$ is singular and one is an eigenvalue of the matrix $T$. Furthermore, if $T$ is a primitive matrix (for instance, when all its elements are non-zero), we know from the corollary in 3.3.4 that one is a simple root of the characteristic polynomial and all other roots have absolute value strictly smaller than one. This leads to:

Ergodic Theorem

Theorem. Markov processes with primitive matrices $T$ satisfy:
• there exists a unique stochastic eigenvector $x_\infty$ of the matrix $T$ with the eigenvalue 1;
• the iterations $T^k x_0$ approach the vector $x_\infty$ for any initial stochastic vector $x_0$.

Proof. The first claim follows directly from the positivity of the coordinates of the eigenvector derived in the Perron theorem. Next, assume that the algebraic and geometric multiplicities of the eigenvalues of the matrix $T$ are the same. Then every stochastic vector $x_0$ can be written (in the complex extension $\mathbb{C}^n$) as a linear combination
$$x_0 = c_1 x_\infty + c_2 y_2 + \cdots + c_n y_n,$$
where $y_2, \dots$
$\dots, y_n$ extend $x_\infty$ to a basis of eigenvectors. But then the $k$-th iteration gives again a stochastic vector
$$x_k = T^k \cdot x_0 = c_1 x_\infty + \lambda_2^k c_2 y_2 + \cdots + \lambda_n^k c_n y_n.$$
Now all the eigenvalues $\lambda_2, \dots, \lambda_n$ are strictly smaller than one in absolute value, so all components of the vector $x_k$ except the first one approach zero (in norm). But $x_k$ is still stochastic; thus the only possibility is that $c_1 = 1$, and the second claim is proved.

A to page B, determines the probability $1/(\text{total number of links from the page A})$ with which the process moves from page A to page B. If no links lead from some page, we consider it to be a page from which a link leads to every other page. This gives a stochastic matrix $M$ (the element $m_{ij}$ corresponds to the probability with which we move from the $j$-th page to the $i$-th page). Thus if one randomly clicks on links in the found pages (and from a linkless page one just chooses the next page at random), the probability of being located on the $i$-th page at a given time (sufficiently distant from the beginning) corresponds to the $i$-th component of the unit eigenvector of the matrix $M$ corresponding to the eigenvalue 1. Looking at the sizes of these probabilities, we define the importance of the individual pages.

This algorithm can be modified by assuming that users stop clicking from link to link after a certain time and start again on a random page. Suppose that with probability $d$ the user chooses a new page randomly, and with probability $(1-d)$ keeps on clicking. In such a situation, the probability of transition between any two pages $S_i$ and $S_j$ is non-zero: it is $d/n + (1-d)/(\text{total number of links from the page } S_i)$ if there is a link from $S_i$ to $S_j$, and $d/n$ otherwise (if there are no links from $S_i$ at all, it is $1/n$).
According to the Perron-Frobenius theorem, the eigenvalue 1 has multiplicity one and is dominant, and thus the corresponding eigenvector is unique (if we chose the transition probabilities only as described in the previous paragraph, this would not have to be the case). For an illustration, consider pages A, B, C and D. The links lead from A to B and to C, from B to C, and from C to A; from D there lead no links. Suppose that the probability that the user chooses a random new page is 1/5. Then the matrix M looks as follows:

M = ( 1/20  1/20  17/20  1/4 )
    ( 9/20  1/20   1/20  1/4 )
    ( 9/20 17/20   1/20  1/4 )
    ( 1/20  1/20   1/20  1/4 )

The eigenvector corresponding to the eigenvalue 1 is (305/53, 175/53, 315/53, 1). The importance of the pages is thus given by the order of the sizes of the corresponding components, that is, C > A > B > D. Various other applications of Markov chains can be found in the additional exercises after this chapter, see 3.G.3.

In fact, even if the algebraic and geometric multiplicities of the eigenvalues do not coincide, we reach the same conclusion using a more detailed study of the root subspaces of the matrix T. (We meet them when discussing the Jordan matrix decomposition later in this chapter.) Consequently, even in the general case the eigensubspace span{x_∞} comes with a unique invariant (n − 1)-dimensional complement, on which all the eigenvalues are in absolute value smaller than one, and the corresponding components of x_k approach zero as before. See the note 3.4.11 where we finish this argument in detail. □

3.3.6. Iteration of stochastic matrices. We reformulate the previous theorem into a simple but surprising result. By convergence to a limit matrix in the following theorem we mean the following: if we fix a bound ε > 0 on the possible error, then we can find a lower bound on the number of iterations k after which all the components of the matrix differ from the limit ones by less than ε.

Corollary.
Let T be a primitive stochastic matrix from a Markov process and let x_∞ be the stochastic eigenvector for the dominant eigenvalue 1 (as in the ergodic theorem above). Then the iterations T^k converge to the limit matrix T_∞, whose columns all equal x_∞.

Proof. The columns of the matrix T^k are the images of the vectors of the standard basis under the corresponding iterated linear mapping. But these are images of stochastic vectors, and thus they all converge to x_∞. □

3.3.7. Final brief remark. Before leaving Markov processes, we briefly mention their more general versions with matrices which are not primitive. Here we would need the full Perron-Frobenius theory. Without going into technicalities, consider a process with a block-wise diagonal or upper triangular matrix T,

T = ( P  R )
    ( 0  Q )

and imagine first that P, Q are primitive and R = 0. Here we can again apply the above results block-wise. In words, if we start in a state x_0 with all the probability concentrated in the first block of coordinates, the process converges to the value x_∞ which again has all the probability distributed among the first block of coordinates, and the same for the other block. If R > 0, then we can always jump to the states corresponding to the first block from those in the second block with a non-zero probability, and the iterations get more complicated:

T² = ( P²  P·R + R·Q ),   T³ = ( P³  P²·R + P·R·Q + R·Q² ),  ...
     ( 0   Q²        )         ( 0   Q³                   )

E. Unitary spaces

In the previous chapter we defined the scalar product for real vector spaces (2.3.18). In this chapter we extend its definition to the complex spaces (3.4.1).

3.E.1. Groups O(n) and U(n). Consider all linear mappings from R^n to R^n which preserve the given scalar product, that is, with respect to the definitions of the lengths of vectors and the angles between pairs of vectors, all linear mappings that preserve lengths and angles. These mappings form a group (see 1.1.1) with respect to the operation of composition.
The composition of two such mappings is, by definition, also a mapping that preserves lengths and angles; the unit element of the group is the identity mapping, and the inverse element for a given mapping is its inverse mapping, which exists by the condition on the preservation of lengths. The matrices of such mappings thus form a group with the operation of matrix multiplication; it is called the orthogonal group and is denoted by O(n). It is a subgroup of the group of all invertible mappings from R^n to R^n. Moreover, if we require that the matrices have determinant one, then we speak of the special orthogonal group SO(n). In general, the determinant of a matrix in O(n) can be either 1 or −1.

Similarly, we define the unitary group U(n) as the group of all (complex) matrices that correspond to the complex linear mappings from C^n to C^n which preserve a given scalar product in a unitary space. Analogously, SU(n) denotes the subgroup of matrices in U(n) with determinant one. In general, the determinant of a matrix in U(n) can be any complex unit.

3.E.2. Consider the vector space V of functions R → C. Determine whether the mapping φ on the unitary space V is linear when:
i) φ(u) = λu, where λ ∈ C,
ii) φ(u) = u*,
iii) φ(u) = u² (= u·u),
iv) φ(u) = f.
For suitable functions, V is a unitary space of infinite dimension. The scalar product is then defined by the relation f·g = ∫_{−∞}^{∞} f(x) \overline{g(x)} dx.

3.E.3. Show that if H is a Hermitian matrix, then U = exp(iH) = Σ_{n=0}^{∞} (1/n!)(iH)^n is a unitary matrix, and compute its determinant.

An interesting special case is when P = E and R is positive. Then Q − E must be a regular matrix, and a simple computation yields the general iteration (notice that E and Q commute, and thus (E − Q)(E + Q + ... + Q^{k−1}) = E − Q^k):

T^k = ( E  R(E − Q)^{−1}(E − Q^k) )
      ( 0  Q^k                    )

Thus the entire first block of states is formed by eigenvectors with the eigenvalue 1 (so these states stay constant with probability 1), while the behavior on the other block is more complicated.

4.
More matrix calculus

We have seen that understanding the inner structure of matrices is a strong tool for both computation and analysis. This is even more true when considering numerical calculations with matrices. Therefore we now return to the abstract theory. We introduce special types of linear mappings on vector spaces, and we consider general linear mappings whose structure is understood in terms of the Jordan normal form (see 3.4.10). In all these cases, complex scalars are essential, so we extend our discussion of the scalar product (see 2.3.18-2.3.21) to complex vector spaces. Actually, in many areas the complex vector spaces are the essential platform necessary for introducing the mathematical models. For instance, this is the case in the so-called quantum computing, which has become a very active area of theoretical computer science. Many people hope to construct an effective quantum computer soon.

3.4.1. Unitary spaces and mappings. The definitions of the scalar product and orthogonality extend easily to the complex case. But we do not mean the complex bilinear symmetric forms α, since there the quadratic expressions α(v, v) are not real in general, and thus we would not get the right definition of the length of vectors. Instead, we define:

Unitary spaces

A unitary space is a complex vector space V along with a mapping V × V → C, (u, v) ↦ u·v, called the scalar product and satisfying for all vectors u, v, w ∈ V and scalars a ∈ C the following axioms:
(1) u·v = \overline{v·u} (the bar stands for complex conjugation),
(2) (au)·v = a(u·v),
(3) (u + v)·w = u·w + v·w,
(4) if u ≠ 0, then u·u > 0 (notice that u·u is always real).

The real number √(v·v) is called the norm ‖v‖ of the vector v, and a vector is normalized if its norm equals one. Vectors u and v are said to be orthogonal if their scalar product is zero. A basis composed of mutually orthogonal and normalized vectors is called an orthonormal basis of V.

Solution.
From the definition of exp we can show that exp(A + B) = exp(A)·exp(B) for commuting matrices A and B, just as with the exponential mapping in the domain of real numbers. Because (u + v)* = u* + v* and (cv)* = \overline{c} v*, we obtain

U* = (Σ_{n=0}^{∞} (1/n!)(iH)^n)* = Σ_{n=0}^{∞} (1/n!)(−iH*)^n,

and since H* = H,

U* = Σ_{n=0}^{∞} (1/n!)(−iH)^n = exp(−iH).

Thus U*U = exp(iH)·exp(−iH) = exp(0) = E. Moreover, det(U) = e^{trace(iH)}. □

3.E.4. Hermitian matrices A, B, C satisfy [A, C] = [B, C] = 0 and [A, B] ≠ 0, where [ , ] is the commutator of matrices defined by the relation [A, B] = AB − BA. Show that at least one eigensubspace of the matrix C must have dimension > 1.

Solution. We prove it by contradiction. Assume that all the eigensubspaces of the operator C have dimension 1. Then any vector u can be written as u = Σ_k c_k u_k, where the u_k are linearly independent eigenvectors of the operator C associated with the eigenvalues λ_k (and c_k = u·u_k). For these eigenvectors,

0 = [A, C] u_k = A C u_k − C A u_k = λ_k A u_k − C(A u_k).

It follows that A u_k is an eigenvector of the matrix C with the eigenvalue λ_k. But the eigensubspaces are one-dimensional, so A u_k = α_k u_k for some number α_k. Similarly, B u_k = β_k u_k for some number β_k. For the commutator of the matrices A and B we then obtain

[A, B] u_k = A B u_k − B A u_k = α_k β_k u_k − β_k α_k u_k = 0,

so that [A, B] u = [A, B] Σ_k c_k u_k = Σ_k c_k [A, B] u_k = 0. Because u is arbitrary, it follows that [A, B] = 0, which is a contradiction. □

3.E.5. Applications to quantum physics. In quantum physics we do not work with numbers as in classical physics, but with Hermitian operators. Such an operator is nothing but a Hermitian mapping, which can (and often does) lead to a linear transformation between unitary spaces of infinite dimension. We can imagine this as a matrix

At first sight this is an extension of the definition of Euclidean vector spaces into the complex domain. We will continue to use the alternative notation ⟨u, v⟩ for the scalar product of vectors u and v.
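The computation in 3.E.3 is easy to check numerically. For a Hermitian H with spectral decomposition H = V diag(w) V*, we have exp(iH) = V diag(e^{iw}) V*; the matrices below are arbitrary illustrative choices (numpy assumed):

```python
import numpy as np

def exp_iH(H):
    # For Hermitian H = V diag(w) V*, exp(iH) = V diag(e^{iw}) V*.
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(1j * w)) @ V.conj().T

H = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])           # H = H*, an arbitrary Hermitian matrix
U = exp_iH(H)
I = np.eye(2)

print(np.allclose(U.conj().T @ U, I))   # U is unitary
# det U = e^{trace(iH)} = e^{i trace(H)}
print(np.isclose(np.linalg.det(U), np.exp(1j * np.trace(H))))

# Unitary matrices form a group (cf. 3.E.1): products stay unitary.
U2 = exp_iH(np.array([[0.0, 2j], [-2j, 1.0]]))
print(np.allclose((U @ U2).conj().T @ (U @ U2), I))
```

The determinant check confirms det(U) = e^{i trace(H)}, a complex unit, in agreement with the general statement about determinants in U(n).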
As in the real domain, we obtain immediately from the definition the following simple properties of the scalar product, for all vectors in V and scalars in C:

u·u = 0 if and only if u = 0,
u·(av) = \overline{a}(u·v),
u·(v + w) = u·v + u·w,
u·0 = 0·u = 0,
(Σ_i a_i u_i)·(Σ_j b_j v_j) = Σ_{i,j} a_i \overline{b_j} (u_i·v_j),

where the last equality holds for all finite linear combinations. It is a simple exercise to prove everything formally. For instance, the first property follows from (1), since the product u·u has to be the complex conjugate of itself.

A standard example of the scalar product on the complex vector space C^n is

(x_1, ..., x_n)·(y_1, ..., y_n) = x_1 \overline{y_1} + ... + x_n \overline{y_n}.

This expression is also called the standard (positive definite) Hermitian form on C^n. Thanks to the conjugation of the coordinates of the second argument, this mapping satisfies all the required properties. The space C^n with this scalar product is called the standard unitary space of dimension n. In matrix notation we can write this scalar product of vectors x and y as \overline{y}^T · x (the complex conjugation indicated by the bar is performed on all the components of y).

As usual, those mappings which leave the additional structure invariant are of great importance.

Unitary mappings

A linear mapping φ : V → W between unitary spaces is called a unitary mapping if for all vectors u, v ∈ V,

u·v = φ(u)·φ(v).

Theorem. For any vectors u, v in a unitary space V and any orthonormal system of vectors (e_1, ..., e_k):
(1) ‖u + v‖ ≤ ‖u‖ + ‖v‖, with equality if and only if u and v are linearly dependent (the triangle inequality),
(2) |u·v| ≤ ‖u‖ ‖v‖, with equality if and only if u and v are linearly dependent (the Cauchy-Schwarz inequality),
(3) ‖u‖² ≥ |u·e_1|² + ... + |u·e_k|². This property is called the Bessel inequality.
(4) u ∈ span{e_1, ..., e_k} if and only if ‖u‖² = |u·e_1|² + ... + |u·e_k|². This is called the Parseval equality.
(5) The vector w = (u·e_1)e_1 + ... + (u·e_k)e_k is the only vector which minimizes the norm ‖u − v‖ among all v ∈ span{e_1, ..., e_k}.

Proof. The verifications are all based on direct computations. (2): The result is obvious if v = 0.
Otherwise, define the vector w = u − ((u·v)/‖v‖²) v, so that w·v = 0 and u = w + ((u·v)/‖v‖²) v. Then

‖u‖²‖v‖² = (‖w‖² + |u·v|²/‖v‖²) ‖v‖² = ‖w‖²‖v‖² + |u·v|².

These are non-negative real values, and thus ‖u‖²‖v‖² ≥ |u·v|², where the equality holds if and only if w = 0, that is, whenever u and v are linearly dependent.

(1): It suffices to compute

‖u + v‖² = ‖u‖² + ‖v‖² + u·v + v·u = ‖u‖² + ‖v‖² + 2 Re(u·v)
         ≤ ‖u‖² + ‖v‖² + 2|u·v| ≤ ‖u‖² + ‖v‖² + 2‖u‖‖v‖ = (‖u‖ + ‖v‖)².

Since we deal with squares of non-negative real numbers, this means that ‖u + v‖ ≤ ‖u‖ + ‖v‖. Furthermore, equality implies that equality also holds in all the previous inequalities, which is equivalent to the condition that u and v are linearly dependent (using the previous part).

(3), (4): Let (e_1, ..., e_k) be an orthonormal system of vectors. We extend it to an orthonormal basis (e_1, ..., e_n) (that is always possible by the previous theorem). Then, again using the previous theorem, we have for every vector u ∈ V

‖u‖² = Σ_{i=1}^{n} (u·e_i)\overline{(u·e_i)} = Σ_{i=1}^{n} |u·e_i|² ≥ Σ_{i=1}^{k} |u·e_i|².

But that is the Bessel inequality. Furthermore, equality holds if and only if u·e_i = 0 for all i > k, which proves the Parseval equality.

(5): Choose an arbitrary v ∈ span{e_1, ..., e_k} and extend the given orthonormal system to an orthonormal basis (e_1, ..., e_n). Let (u_1, ..., u_n) and (x_1, ..., x_k, 0, ..., 0) be

U, where X is a lower triangular matrix given by the Gaussian reduction, and U is upper triangular. From this equality, A = X^{−1}U, which is the desired decomposition. (Thus we have to compute the inverse of X.) □

3.F.3. Find the LU-decomposition of the matrix

C -!1 !)■ 0

3.F.4. Ray-tracing. In computer 3D graphics the image is very often displayed using the ray-tracing algorithm. The basis of this algorithm is the approximation of the light waves by rays (lines) and the approximation of the displayed objects by polyhedrons.
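Ray reflection computations of this kind reduce to one vector identity: for a unit normal n, a direction v reflects to v_r = v − 2⟨v, n⟩n, since the normal component of v flips sign while the tangential part is unchanged. A minimal numpy check with the plane x + y + z = 1 and the incoming direction v = (1, 2, 3):

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])                  # incoming ray direction
n = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)   # unit normal of x + y + z = 1

# Reflection: flip the normal component of v, keep the tangential part.
v_r = v - 2 * np.dot(v, n) * n
print(v_r)   # -> [-3. -2. -1.]

# The law of reflection <v, n> = -<v_r, n> holds, and lengths are preserved.
print(np.isclose(np.dot(v, n), -np.dot(v_r, n)),
      np.isclose(np.linalg.norm(v), np.linalg.norm(v_r)))
```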
These are bounded by planes, and it is necessary to compute where exactly the light rays are reflected from these planes. From physics we know how the rays are reflected: the angle of incidence equals the angle of reflection. We have already met this topic in the exercise 1.E.10. The ray of light in the direction v = (1, 2, 3) hits the plane given by the equation x + y + z = 1. In what direction is it reflected?

Solution. The unit normal vector to the plane is n = (1/√3)(1, 1, 1). The vector v_r that gives the direction of the reflected ray lies in the plane spanned by the vectors v and n. We can express it as a linear combination of these vectors. Furthermore, the rule for the angle of reflection says that ⟨v, n⟩ = −⟨v_r, n⟩. From there we obtain a quadratic equation for the coefficient of the linear combination. This exercise can be solved in an easier, more geometric way: from the diagram we can derive directly that

v_r = v − 2⟨v, n⟩ n.

In our case, v_r = (−3, −2, −1). □

3.F.5. Singular decomposition, polar decomposition, pseudoinverse. Compute the singular value decomposition of the matrix

A = (  0  0  −1/2 )
    ( −1  0   0   )
    (  0  0   0   )

Then compute its polar decomposition and find its pseudoinverse.

Solution. First compute A^T A:

A^T A = ( 1  0  0   )
        ( 0  0  0   )
        ( 0  0  1/4 )

coordinates of u and v under this basis. Then

‖u − v‖² = |u_1 − x_1|² + ... + |u_k − x_k|² + |u_{k+1}|² + ... + |u_n|²,

and this expression is clearly minimized when choosing the individual components to be x_1 = u_1, ..., x_k = u_k. □

3.4.4. Unitary and orthogonal mappings. The properties of orthogonal mappings have direct analogues in the complex domain. We can easily formulate and prove them together:

Proposition. Consider the linear mapping (endomorphism) φ : V → V on the (real or complex) space with scalar product. Then the following conditions are equivalent.
(1) φ is a unitary or orthogonal transformation,
(2) φ is a linear isomorphism and for every u, v ∈ V, φ(u)·v = u·φ^{−1}(v),
(3) the matrix A of φ in every orthonormal basis satisfies A^{−1} = \overline{A}^T (in the real case A^{−1} = A^T),
(4) the matrix A of φ in some orthonormal basis satisfies A^{−1} = \overline{A}^T,
(5) the rows of the matrix A of φ in some orthonormal basis form an orthonormal basis of the standard space K^n,
(6) the columns of the matrix A of φ in some orthonormal basis form an orthonormal basis of the standard space K^n.

Proof. (1) ⇒ (2): The mapping φ is injective, therefore it must be onto.
Also φ(u)·v = φ(u)·φ(φ^{−1}(v)) = u·φ^{−1}(v).

(2) ⇒ (3): The standard scalar product in K^n is given for columns x, y of scalars by the expression x·y = \overline{y}^T E x = \overline{y}^T x, where E is the unit matrix. Property (2) thus means that the matrix A of the mapping φ is invertible and

\overline{y}^T A x = \overline{(A^{−1} y)}^T x

for all x and y. By substituting the complex conjugate of the expression in the parentheses for x, we find that the equality is possible only when \overline{A}^T = A^{−1}. (We may also rewrite the expression as \overline{y}^T (A − (\overline{A^{−1}})^T) x = 0 and see the conclusion by substituting the basis vectors for x and y.)

(3) ⇒ (4): This is an obvious implication.

(4) ⇒ (5): In the relevant basis, the claim is expressed via the matrix A of the mapping φ as the equation A \overline{A}^T = E, which is ensured by (4).

(5) ⇒ (6): We have |\overline{A}^T A| = |E| = |A \overline{A}^T| = |A| \overline{|A|} = 1, so there exists the inverse matrix A^{−1}. But we also have A \overline{A}^T A = A, therefore also \overline{A}^T A = E, which is expressed exactly by (6).

(6) ⇒ (1): In the chosen orthonormal basis,

φ(u)·φ(v) = \overline{(A y)}^T A x = \overline{y}^T \overline{A}^T A x = \overline{y}^T E x = \overline{y}^T x,

where x and y are the columns of coordinates of the vectors u and v. That ensures that the scalar product is preserved. □

to obtain a diagonal matrix. We need to find an orthonormal basis under which the matrix is diagonal and the zero row is the last one. This can be obtained by rotating about the x-axis through a right angle: the y-coordinate then goes to z, and z goes to −y. This rotation is the orthogonal transformation given by the matrix

V = ( 1  0  0 )
    ( 0  0  1 )
    ( 0 −1  0 )

By this, we have found the decomposition A^T A = V B V^T, where B is diagonal with the eigenvalues (1, 1/4, 0) on the diagonal. Because B = (AV)^T (AV), the columns of the matrix

AV = (  0  0  −1/2 ) ( 1  0  0 )   (  0  1/2  0 )
     ( −1  0   0   ) ( 0  0  1 ) = ( −1   0   0 )
     (  0  0   0   ) ( 0 −1  0 )   (  0   0   0 )

form an orthogonal system of vectors, which we normalise and extend to a basis. That is then of the form (0, −1, 0), (1, 0, 0), (0, 0, 1).
The transition matrix for changing from this basis to the standard one is then

U = (  0  1  0 )
    ( −1  0  0 )
    (  0  0  1 )

Finally, we obtain the decomposition A = U √B V^T:

(  0  0  −1/2 )   (  0  1  0 ) ( 1   0   0 ) ( 1  0   0 )
( −1  0   0   ) = ( −1  0  0 ) ( 0  1/2  0 ) ( 0  0  −1 )
(  0  0   0   )   (  0  0  1 ) ( 0   0   0 ) ( 0  1   0 )

The geometrical interpretation of the decomposition is the following: first, everything is rotated through a right angle about the x-axis; then follows a projection onto the xy-plane such that the unit ball is mapped onto the ellipse with half-axes 1 and 1/2; the result is then rotated through a right angle about the z-axis.

The polar decomposition A = P · W can be obtained from the singular one: P := U √B U^T and W := U V^T, that is,

P = ( 1/2  0  0 )        W = (  0  0  −1 )
    (  0   1  0 )            ( −1  0   0 )
    (  0   0  0 )            (  0  1   0 )

The characterizations from the previous theorem deserve some notes. The matrices A ∈ Mat_n(K) with the property A^{−1} = \overline{A}^T are called unitary matrices in the case of complex scalars (in the case of R we have already used the name orthogonal matrices for them). The definition itself immediately implies that the product of unitary (orthogonal) matrices is again unitary (orthogonal); the same is true for inverses. Unitary matrices thus form a subgroup U(n) ⊂ Gl_n(C) of the group of all invertible complex matrices with the product operation. Orthogonal matrices form a subgroup O(n) ⊂ Gl_n(R) of the group of real invertible matrices. We speak of a unitary group and of an orthogonal group.

The simple calculation

1 = det E = det(A \overline{A}^T) = det A · \overline{det A} = |det A|²

shows that the determinant of a unitary matrix has norm equal to one; for real scalars the determinant is ±1. Furthermore, if A x = λ x for a unitary or orthogonal matrix A, then

(A x)·(A x) = x·x = |λ|² (x·x).

Therefore the real eigenvalues of orthogonal matrices in the real domain are ±1, and the eigenvalues of unitary matrices are always complex units in the complex plane. The same argument as with the orthogonal mappings implies that the orthogonal complements of invariant subspaces with respect to unitary mappings φ : V → V are also invariant.
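The decompositions of 3.F.5 can be cross-checked with numpy. Note that np.linalg.svd may return U and V differing from the hand computation by signs, but the singular values and the pseudoinverse are unique:

```python
import numpy as np

A = np.array([[ 0.0, 0.0, -0.5],
              [-1.0, 0.0,  0.0],
              [ 0.0, 0.0,  0.0]])

U, s, Vt = np.linalg.svd(A)        # A = U diag(s) Vt
print(s)                           # singular values: 1, 1/2, 0

# Polar decomposition A = P W assembled from the SVD factors.
P = U @ np.diag(s) @ U.T           # positive semidefinite part
W = U @ Vt                         # orthogonal part
print(np.allclose(P @ W, A), np.allclose(W.T @ W, np.eye(3)))

# The pseudoinverse inverts the non-zero singular values.
print(np.linalg.pinv(A))           # -> [[0, -1, 0], [0, 0, 0], [-2, 0, 0]]
```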
Indeed, if W ⊂ V is invariant, v ∈ W^⊥ and u ∈ W, then φ(v)·u = φ(v)·φ(φ^{−1}(u)) = v·φ^{−1}(u) = 0, since φ^{−1}(u) lies in W again.

Proposition. Let φ : V → V be a unitary mapping of complex vector spaces. Then V is an orthogonal sum of one-dimensional eigensubspaces.

Proof. There exists at least one eigenvector v ∈ V, since complex eigenvalues always exist. Then the restriction of φ to the invariant subspace ⟨v⟩^⊥ is again unitary and also has an eigenvector. After n such steps we obtain the desired orthogonal basis of eigenvectors. After normalising the vectors we obtain an orthonormal basis. □

Now it is possible to understand the details of the proof of the spectral decomposition of the orthogonal mapping from 2.4.7 at the end of the second chapter. The real matrix of an orthogonal mapping is interpreted as the matrix of a unitary mapping on the complex extension of the Euclidean space. We observe the corollaries of the structure of the roots of the real characteristic polynomial over the complex domain. Automatically we obtain the invariant two-dimensional subspaces given by the pairs of complex conjugated eigenvalues, and hence the corresponding rotations for the restricted original real mapping.

From this it follows that

P · W = (  0  0  −1/2 ) = A.
        ( −1  0   0   )
        (  0  0   0   )

The pseudoinverse matrix is then given by the expression A^{(−1)} := V S U^T, where S = diag(1, 2, 0) inverts the non-zero singular values. Thus,

A^{(−1)} = ( 1  0  0 ) ( 1  0  0 ) ( 0  −1  0 )   (  0  −1  0 )
           ( 0  0  1 ) ( 0  2  0 ) ( 1   0  0 ) = (  0   0  0 )
           ( 0 −1  0 ) ( 0  0  0 ) ( 0   0  1 )   ( −2   0  0 )

□

3.F.6. QR decomposition. The QR decomposition of a matrix A is very useful when we are given a system of linear equations A x = b which has no solution, but an approximation as good as possible is needed. That is, we want to minimize ‖A x − b‖. According to the Pythagorean theorem,

‖A x − b‖² = ‖A x − b_∥‖² + ‖b_⊥‖²,

where b is decomposed into b_∥, which belongs to the range of the linear transformation A, and b_⊥, which is perpendicular to this range. The projection onto the range of A can be written in the form Q Q^T for a suitable matrix Q.
Specifically, we obtain this matrix Q by the Gram-Schmidt orthonormalisation of the columns of the matrix A. Then A x − b_∥ = Q(Q^T A x − Q^T b). The system in the parentheses has a solution, for which ‖A x − b‖ = ‖b_⊥‖, which is the minimal value. Furthermore, the matrix R := Q^T A is upper triangular, and therefore the approximate solution can be found easily.

Find an approximate solution of the system

x + 2y = 1
2x + 4y = 4

Solution. Consider the system A x = b with

A = ( 1  2 ),   b = ( 1 )
    ( 2  4 )        ( 4 )

which evidently has no solution. We orthonormalise the columns of A. We take the first of them and divide it by its norm. This yields the first vector of the orthonormal basis, (1/√5)(1, 2)^T. But the second column is twice the first, and thus it reduces to zero after the orthonormalisation. Therefore Q = (1/√5)(1, 2)^T. The projector onto the range of A is then

Q Q^T = (1/5) ( 1  2 )
              ( 2  4 )

3.4.5. Dual and adjoint mappings. When discussing vector spaces and linear mappings in the second chapter, we briefly mentioned the dual vector space V* of all linear forms on the vector space V, see 2.3.17. This duality extends to mappings:

Dual mappings

For any linear mapping ψ : V → W, the expression

(1) ⟨v, ψ*(α)⟩ = ⟨ψ(v), α⟩,

where ⟨ , ⟩ denotes the evaluation of the linear forms (the second argument) on the vectors (the first argument), while v ∈ V and α ∈ W* are arbitrary, defines the mapping ψ* : W* → V* called the dual mapping to ψ.

Choose bases on V and W, and write A for the matrix of the mapping ψ in these bases. Then we may compute the matrix of the mapping ψ* in the corresponding dual bases of the dual spaces. Indeed, the definition says that if we represent the vectors from W* in coordinates as rows of scalars, then the mapping ψ* is given by the same matrix as ψ, if we multiply the row vectors by it from the right:

⟨ψ(v), α⟩ = (α_1, ..., α_n) · A · x,

where x is the column of coordinates of v. This means that the matrix of the dual mapping ψ* is the transpose A^T, because α · A = (A^T · α^T)^T.

Assume further that we have a vector space with scalar product.
Then we can naturally identify V and V* using the scalar product. Indeed, choosing one fixed vector w ∈ V and substituting it into the second argument of the scalar product, we obtain the identification

V ≃ V* = Hom(V; K),   V ∋ w ↦ (v ↦ ⟨v, w⟩) ∈ V*.

The non-degeneracy condition on the scalar product ensures that this mapping is a bijection. Notice that it is important to use w as the fixed second argument in the case K = C, in order to obtain linear forms. Since factoring complex multiples out of the second argument yields complex conjugated scalars, the identification V ≃ V* is linear over the real scalars only. It is clear that the vectors of an orthonormal basis are mapped to the forms that constitute the dual basis, i.e. orthonormal bases are self-dual under our identification. Moreover, every vector is automatically understood as a linear form, by means of the scalar product.

How does the above dual mapping W* → V* look in terms of our identification? We use the same notation ψ* : W → V for the resulting mapping, which is uniquely given as follows:

The approximate solution then satisfies R x = Q^T b, and here that means 5x + 10y = 9. (The approximate solution is not unique.) The QR decomposition of the matrix A is then

A = Q R = (1/√5) ( 1 ) · ( √5  2√5 )
                 ( 2 )

□

3.F.7. Minimise ‖A x − b‖ for

A = (  2  −1  −1 ),   b = ( 1 )
    ( −1   2  −1 )        ( 1 )
    ( −1  −1   2 )        ( 1 )

Hence write down the QR decomposition of the matrix A.

Solution. The normalised first column of the matrix A is e_1 = (1/√6)(2, −1, −1)^T. From the second column, subtract its component in the direction of e_1. By this we have created an orthogonal vector, which we normalise to obtain e_2 = (1/√2)(0, 1, −1)^T. The third column of the matrix A is already linearly dependent on the first two (verify this by computing the determinant, or otherwise).
The desired column-orthogonal matrix is then

Q = (1/√6) (  2   0  )
           ( −1   √3 )
           ( −1  −√3 )

Next,

R = Q^T A = ( √6  −√6/2  −√6/2 )
            (  0   3/√2  −3/√2 )

Adjoint mapping

For every linear mapping ψ : V → W between spaces with scalar products, there is the adjoint mapping ψ*, uniquely determined by the formula

(2) ⟨ψ(u), v⟩ = ⟨u, ψ*(v)⟩.

The parentheses mean the scalar products on W or V, respectively. Notice that the use of the same parentheses for the evaluation of one-forms and for scalar products (which reflects the identification above) makes the defining formulae of the dual and adjoint mappings look the same.

Equivalently, we can understand the relation (2) to be the definition of the adjoint mapping ψ*. By substituting all pairs of vectors from an orthonormal basis for the vectors u and v, we obtain directly all the values of the matrix of the mapping ψ*. Using the coordinate expression for the scalar product, the formula (2) reveals the coordinate expression of the adjoint mapping:

⟨ψ(v), w⟩ = \overline{w'}^T A x = ⟨v, ψ*(w)⟩,

where x and w' are the columns of coordinates of v and w. It follows that if A is the matrix of the mapping ψ in an orthonormal basis, then the matrix of the adjoint mapping ψ* is the transposed and conjugated matrix; we denote this by A* = \overline{A}^T. The matrix A* is called the adjoint matrix of the matrix A. Note that the adjoint matrix is well defined for any rectangular matrix. We should not confuse adjoint matrices with the algebraic adjoints, which we used for square matrices when working with determinants.

We can summarise. For any linear mapping ψ : V → W between unitary spaces, with matrix A in some bases on V and W, its dual mapping has the matrix A^T in the dual bases. If there are scalar products on V and W, we identify the spaces with their duals via the scalar products. Then the dual mapping coincides with the adjoint mapping ψ* : W → V, which has the matrix A*. The distinction between the matrix of the dual mapping and the matrix of the adjoint mapping is thus in the additional conjugation.
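Both QR-based approximations above can be verified numerically: np.linalg.lstsq minimizes ‖Ax − b‖ directly (returning the minimum-norm minimizer when A is rank-deficient), and np.linalg.qr computes the decomposition, possibly with different signs than the hand computation:

```python
import numpy as np

# The system x + 2y = 1, 2x + 4y = 4 from 3.F.6 (a rank-one matrix).
A1 = np.array([[1.0, 2.0], [2.0, 4.0]])
b1 = np.array([1.0, 4.0])
x1, _, rank1, _ = np.linalg.lstsq(A1, b1, rcond=None)
print(rank1, 5 * x1[0] + 10 * x1[1])   # rank 1; minimizers satisfy 5x + 10y = 9

# The matrix of 3.F.7; its rows sum to zero, so A2 x = 0 exactly
# for the multiples of (1, 1, 1).
A2 = np.array([[ 2.0, -1.0, -1.0],
               [-1.0,  2.0, -1.0],
               [-1.0, -1.0,  2.0]])
Q, R = np.linalg.qr(A2)
print(np.allclose(Q @ R, A2), np.allclose(A2 @ np.ones(3), 0))
```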
This is of course a consequence of the fact that our identification of a unitary space with its dual is not a linear mapping over the complex scalars.

3.4.6. Self-adjoint mappings. Those linear mappings which coincide with their adjoints, ψ* = ψ, are of particular interest. They are called self-adjoint mappings. Equivalently we can say that they are the mappings whose matrix A satisfies A = A* in some (and thus in every) orthonormal basis.

The solution of the equation R x = Q^T b is x = y = z. Thus, the multiples of the vector (1, 1, 1) minimize ‖A x − b‖. The mapping given by the matrix A is the projection onto the plane with the normal vector (1, 1, 1). □

3.F.8. Linear regression. The knowledge obtained in this chapter can be successfully used in practice to solve problems with linear regression, that is, finding the best approximation of some functional dependence by a linear function. Suppose we are given the values y_1, ..., y_k of an unknown function at the points (a_1^1, ..., a_n^1), ..., (a_1^k, ..., a_n^k), that is,

f(a_1^1, ..., a_n^1) = y_1,  ...,  f(a_1^k, ..., a_n^k) = y_k,  k > n

(we have thus more equations than unknowns), and we wish to find the "best possible" approximation of this dependence by a linear function. That is, we want to express the value of the measured property as a linear function

f(x_1, ..., x_n) = b_1 x_1 + b_2 x_2 + ... + b_n x_n + c.

We choose to define "best possible" by the minimisation of

Σ_{i=1}^{k} (y_i − (Σ_{j=1}^{n} b_j a_j^i + c))²

with regard to the real constants b_1, ..., b_n, c. The goal is to find the linear combination of the columns of the matrix A = (a_j^i) (with the coefficients b_1, ..., b_n) that is closest to the vector (y_1, ..., y_k) in R^k. Thus it is about finding the orthogonal projection of the vector (y_1, ..., y_k) onto the subspace generated by the columns of the matrix A. Using the theorem 3.5.7, this projection is the vector (b_1, ..., b_n)^T = A^{(−1)}·(y_1, ..., y_k)^T, where A^{(−1)} is the pseudoinverse of A.

3.F.9. Using the least squares method, solve the system

2x + y + 2z = 1
x + y + 3z = 2
2x + y + z = 0
x + z = −1

Solution.
The system has no solution, since its matrix has rank 3 while the extended matrix has rank 4. The best approximation of the vector b = (1, 2, 0, −1)^T can thus be obtained, using the theorem 3.5.7, by the vector A^{(−1)}b; then A A^{(−1)}b is the best approximation, that is, the perpendicular projection

In the case of Euclidean spaces the self-adjoint mappings are those with symmetric matrices (in an orthonormal basis). They are often called symmetric mappings. In the complex domain, the matrices that satisfy A = A* are called Hermitian matrices or also Hermitian symmetric matrices. Sometimes they are also called self-adjoint matrices. Note that the Hermitian matrices form a real vector subspace of the space of all complex matrices, but they do not form a complex vector subspace.

Remark. The next observation is of special interest. If we multiply a Hermitian matrix A by the imaginary unit, we obtain the matrix B = iA, which has the property B* = \overline{i} \overline{A}^T = −B. Such matrices are called anti-Hermitian or Hermitian skew-symmetric. Every real matrix can be written as the sum of its symmetric part and its anti-symmetric part,

A = ½(A + A^T) + ½(A − A^T).

In the complex domain we have analogously

A = ½(A + A*) + ½(A − A*).

In particular, we may express every complex matrix in a unique way as a sum A = B + iC with Hermitian symmetric matrices B = ½(A + A*) and C = (1/2i)(A − A*). This is an analogy of the decomposition of a complex number into its real and purely imaginary components, and in the literature we often encounter the notation

B = re A = ½(A + A*),   C = im A = (1/2i)(A − A*).

In the language of linear mappings this means that every complex linear automorphism can be uniquely expressed by means of two self-adjoint mappings playing the roles of the real and imaginary parts of the original mapping.

3.4.7. Spectral decomposition. Consider a self-adjoint mapping ψ : V → V with the matrix A in some orthonormal basis. We proceed similarly as in 2.4.7, where we diagonalized the matrix of an orthogonal mapping.
Again, consider arbitrary invariant subspaces of self-adjoint mappings and their orthogonal complements. If a self-adjoint mapping ψ : V → V leaves a subspace W ⊂ V invariant, i.e. ψ(W) ⊂ W, then for every v ∈ W^⊥ and w ∈ W,

⟨ψ(v), w⟩ = ⟨v, ψ(w)⟩ = 0.

Thus also ψ(W^⊥) ⊂ W^⊥.

Next, consider the matrix A of a self-adjoint mapping in an orthonormal basis and an eigenvector x ∈ C^n, i.e. A·x = λx. We obtain

λ⟨x, x⟩ = ⟨Ax, x⟩ = ⟨x, Ax⟩ = ⟨x, λx⟩ = \overline{λ}⟨x, x⟩.

of the vector b onto the space generated by the columns of the matrix A. Because the columns of the matrix A are linearly independent, its pseudoinverse is given by the relation A^{(−1)} = (A^T A)^{−1} A^T. Hence the desired x is A^{(−1)}b = (−6/5, 7/3, 1/3)^T. The projection (the best possible approximation to the right-hand side) is then the vector (3/5, 32/15, 4/15, −13/15)^T. □

The positive real number ⟨x, x⟩ can be cancelled on both sides, thus λ = \overline{λ}, and we see that the eigenvalues of Hermitian matrices are always real. The characteristic polynomial det(A − λE) has as many complex roots as is the dimension of the square matrix A (including multiplicities), and all of them are actually real. Thus we have proved the important general result:

Proposition. The orthogonal complements of invariant subspaces of self-adjoint mappings are also invariant. Furthermore, the eigenvalues of a Hermitian matrix A are always real.

The very definition ensures that the restriction of a self-adjoint mapping to an invariant subspace is again self-adjoint. Thus the latter proposition implies that there always exists an orthonormal basis of V composed of eigenvectors. Indeed, start with any eigenvector v_1, normalize it, consider its linear hull V_1, and restrict the mapping to V_1^⊥. Next consider an eigenvector v_2 of this restriction, take V_2 = span(V_1 ∪ {v_2}), which is again invariant, and continue. This constructs a sequence of invariant subspaces V_1 ⊂ V_2 ⊂ ... ⊂ V_n = V, building the orthonormal basis of eigenvectors, as expected.
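The numbers in 3.F.9 can be confirmed by solving the normal equations A^T A x = A^T b directly (numpy assumed):

```python
import numpy as np

A = np.array([[2.0, 1.0, 2.0],
              [1.0, 1.0, 3.0],
              [2.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
b = np.array([1.0, 2.0, 0.0, -1.0])

# The columns of A are independent, so A^T A is invertible and the
# least-squares solution is x = (A^T A)^{-1} A^T b.
x = np.linalg.solve(A.T @ A, A.T @ b)
projection = A @ x          # the orthogonal projection of b onto range(A)

print(x)                    # -> (-6/5, 7/3, 1/3)
print(projection)           # -> (3/5, 32/15, 4/15, -13/15)
```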
Actually, it is easy to see directly that eigenvectors associated with different eigenvalues are perpendicular to each other. Indeed, if $\psi(u) = \lambda u$ and $\psi(v) = \mu v$, then we obtain
$$\lambda\langle u, v\rangle = \langle\psi(u), v\rangle = \langle u, \psi(v)\rangle = \bar\mu\langle u, v\rangle = \mu\langle u, v\rangle,$$
since $\mu$ is real; thus $\langle u, v\rangle = 0$ whenever $\lambda \neq \mu$.

Usually this result is formulated using projections onto the eigensubspaces. Recall the properties of projections along subspaces, as discussed in 2.3.19. A projection $P : V \to V$ is a linear mapping satisfying $P^2 = P$. This means that the restriction of $P$ to its image is the identity, and the projection is completely determined by the choice of the subspaces $\operatorname{Im} P$ and $\operatorname{Ker} P$. A projection $P : V \to V$ is called orthogonal if $\operatorname{Im} P \perp \operatorname{Ker} P$. Two orthogonal projections $P$, $Q$ are called mutually perpendicular if $\operatorname{Im} P \perp \operatorname{Im} Q$.

Spectral decomposition of self-adjoint mappings

Theorem (Spectral decomposition). For every self-adjoint mapping $\psi : V \to V$ on a vector space with scalar product there exists an orthonormal basis composed of eigenvectors. If $\lambda_1, \dots, \lambda_k$ are all the distinct eigenvalues of $\psi$ and $P_1, \dots, P_k$ are the corresponding orthogonal and mutually perpendicular projections onto the eigenspaces, then
$$\psi = \lambda_1 P_1 + \dots + \lambda_k P_k.$$
The dimensions of the images of the projections $P_i$ equal the algebraic multiplicities of the eigenvalues $\lambda_i$.

3.4.8. Orthogonal diagonalization. Linear mappings which allow orthonormal bases as in the latter theorem on spectral decomposition are called orthogonally diagonalizable. Of course, they are exactly the mappings for which we can find an orthonormal basis in which the matrix of the mapping is diagonal. We ask what they look like. In the Euclidean case this is simple: diagonal matrices are first of all symmetric, thus the orthogonally diagonalizable mappings are exactly the self-adjoint ones.
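The spectral sum $\psi = \lambda_1 P_1 + \dots + \lambda_k P_k$ can be checked numerically. A minimal numpy sketch with a hypothetical symmetric matrix (real symmetric matrices are self-adjoint, so `eigh` applies):

```python
import numpy as np

# Hypothetical real symmetric (hence self-adjoint) matrix.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])

# Orthonormal eigenbasis; eigh is designed for symmetric/Hermitian input.
eigvals, Q = np.linalg.eigh(A)

# Build the orthogonal projections P_i onto the eigenspaces and
# recover A as the spectral sum  sum_i lambda_i P_i.
recon = np.zeros_like(A)
for lam in np.unique(np.round(eigvals, 10)):
    mask = np.isclose(eigvals, lam)
    Q_lam = Q[:, mask]            # orthonormal basis of the eigenspace
    P = Q_lam @ Q_lam.T           # orthogonal projection onto it
    assert np.allclose(P @ P, P)  # P is a projection
    recon = recon + lam * P

assert np.allclose(recon, A)
```

Note that the dimension of each image $\operatorname{Im} P_i$ (the number of columns of `Q_lam`) is exactly the multiplicity of the eigenvalue, as the theorem states.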
As a corollary we note that an orthogonal mapping of a Euclidean space into itself is orthogonally diagonalizable if and only if it is self-adjoint. These are exactly the self-adjoint mappings with eigenvalues $\pm 1$.

The situation is much more interesting on unitary spaces. Consider any linear mapping $\varphi : V \to V$ on a unitary space, and let $\varphi = \psi + i\eta$ be the (unique) decomposition of $\varphi$ into its Hermitian and anti-Hermitian parts. If $\varphi$ has a diagonal matrix $D$ in a suitable orthonormal basis, then $D = \operatorname{Re} D + i \operatorname{Im} D$, where the real and the imaginary parts are exactly the matrices of $\psi$ and $\eta$. This follows from the uniqueness of the decomposition. Knowing this in the particular coordinates, we conclude the following commutation relations at the level of mappings: $\psi \circ \eta = \eta \circ \psi$ (i.e. the real and imaginary parts of $\varphi$ commute), and $\varphi \circ \varphi^* = \varphi^* \circ \varphi$ (since this clearly holds for all diagonal matrices). The mappings $\varphi : V \to V$ with the latter property are called normal mappings. A detailed characterization is given by the following theorem (stated in the notation of this paragraph):

Theorem. The following conditions on a mapping $\varphi : V \to V$ on a unitary space $V$ are equivalent:
(1) $\varphi$ is orthogonally diagonalizable,
(2) $\varphi^* \circ \varphi = \varphi \circ \varphi^*$ ($\varphi$ is a normal mapping),
(3) $\psi \circ \eta = \eta \circ \psi$ (the Hermitian and anti-Hermitian parts commute),
(4) if $A = (a_{ij})$ is the matrix of $\varphi$ in some orthonormal basis, and $\lambda_i$ are the $m = \dim V$ eigenvalues of $A$, then
$$\sum_{i,j=1}^m |a_{ij}|^2 = \sum_{i=1}^m |\lambda_i|^2.$$

Proof. The implication (1) ⇒ (2) was discussed above.

(2) ⇔ (3): it suffices to calculate
$$\varphi \circ \varphi^* = (\psi + i\eta)(\psi - i\eta) = \psi^2 + \eta^2 + i(\eta\psi - \psi\eta),$$
$$\varphi^* \circ \varphi = (\psi - i\eta)(\psi + i\eta) = \psi^2 + \eta^2 + i(\psi\eta - \eta\psi).$$
Subtraction of the two lines yields $\varphi\varphi^* - \varphi^*\varphi = 2i(\eta\psi - \psi\eta)$.

(2) ⇒ (1): If $\varphi$ is normal, then
$$\langle\varphi(u), \varphi(u)\rangle = \langle\varphi^*\varphi(u), u\rangle = \langle\varphi\varphi^*(u), u\rangle = \langle\varphi^*(u), \varphi^*(u)\rangle,$$
thus $|\varphi(u)| = |\varphi^*(u)|$. Next, notice $(\varphi - \lambda\,\mathrm{id}_V)^* = \varphi^* - \bar\lambda\,\mathrm{id}_V$. Thus, if $\varphi$ is normal, then $\varphi - \lambda\,\mathrm{id}_V$ is normal too.
If $\varphi(u) = \lambda u$, then $u$ is in the kernel of $\varphi - \lambda\,\mathrm{id}_V$. Thus the latter equality of the norms of the values of normal mappings and their adjoints ensures that $u$ is also in the kernel of $\varphi^* - \bar\lambda\,\mathrm{id}_V$. It follows that $\varphi^*(u) = \bar\lambda u$. We have proved, under the assumption (2), that $\varphi$ and $\varphi^*$ have the same eigenvectors and that they are associated to conjugated eigenvalues.

Similarly to our procedure with self-adjoint mappings, we now prove orthogonal diagonalizability. The procedure is based on the fact that the orthogonal complements of sums of eigenspaces are invariant subspaces. Consider an eigenvector $u \in V$ with eigenvalue $\lambda$, and any $v \in \langle u\rangle^\perp$. We have
$$\langle\varphi(v), u\rangle = \langle v, \varphi^*(u)\rangle = \langle v, \bar\lambda u\rangle = \lambda\langle v, u\rangle = 0.$$
Thus $\varphi(v) \in \langle u\rangle^\perp$. The same argument applies if $u$ is replaced by a sum of eigenvectors.

(1) ⇒ (4): the expression $\sum_{i,j} |a_{ij}|^2$ is the trace of the matrix $AA^*$, which is the matrix of the mapping $\varphi \circ \varphi^*$. Therefore its value does not depend on the choice of the orthonormal basis. Thus if $\varphi$ is diagonalizable, this expression equals exactly $\sum_i |\lambda_i|^2$.

(4) ⇒ (1): This part of the proof is a direct corollary of the Schur theorem on unitary triangulation of an arbitrary linear mapping $V \to V$, which we prove later in 3.4.15. This theorem says that for every linear mapping $\varphi : V \to V$ there exists an orthonormal basis in which $\varphi$ has an upper triangular matrix. Then all the eigenvalues of $\varphi$ appear on its diagonal. Since we have already shown that the expression $\sum_{i,j} |a_{ij}|^2$ does not depend on the choice of the orthonormal basis, all the elements of the upper triangular matrix off the diagonal must be zero. □

Remark. We can rephrase the main statement of the latter theorem in terms of matrices. A mapping is normal if and only if its matrix $A$ satisfies $AA^* = A^*A$ in some orthonormal basis (and equivalently in any orthonormal basis). Such matrices are called normal.
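Both the matrix condition $AA^* = A^*A$ and condition (4) of the theorem are easy to test numerically. A minimal numpy sketch; the matrix below is a hypothetical normal matrix that is neither Hermitian nor unitary:

```python
import numpy as np

# Hypothetical normal matrix: any polynomial in a rotation matrix is
# normal, since rotations are unitary.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])        # rotation by 90 degrees
A = 2.0 * R + np.eye(2)            # still normal: A A* = A* A

assert np.allclose(A @ A.conj().T, A.conj().T @ A)

# Condition (4) of the theorem: sum |a_ij|^2 equals sum |lambda_i|^2.
lhs = np.sum(np.abs(A) ** 2)
rhs = np.sum(np.abs(np.linalg.eigvals(A)) ** 2)
assert np.isclose(lhs, rhs)
```

Here the eigenvalues are the complex pair $1 \pm 2i$, so both sides equal 10; for a non-normal matrix (e.g. a Jordan block) the left side is strictly larger.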
Moreover, we can consider the last theorem as a generalization of standard calculations with complex numbers. The linear mappings behave similarly to complex numbers in their algebraic form: the role of the real numbers is played by the self-adjoint mappings, and the unitary mappings play the role of the complex units $\cos t + i\sin t \in \mathbb C$. The following consequence of the theorem shows the link to the property $\cos^2 t + \sin^2 t = 1$.

Corollary. The unitary mappings on a unitary space $V$ are exactly those normal mappings $\varphi$ on $V$ for which the unique decomposition $\varphi = \psi + i\eta$ into Hermitian and anti-Hermitian parts satisfies $\psi^2 + \eta^2 = \mathrm{id}_V$.

Proof. If $\varphi$ is unitary, then $\varphi\varphi^* = \mathrm{id}_V = \varphi^*\varphi$, and thus
$$\varphi\varphi^* = (\psi + i\eta)(\psi - i\eta) = \psi^2 + \eta^2 = \mathrm{id}_V,$$
since the commutator term vanishes for normal mappings. On the other hand, if $\varphi$ is normal, we can read the latter computation backwards, which proves the other implication. □

3.4.9. Roots of matrices. The non-negative real numbers are exactly those which are squares of real numbers (and thus we may find their square roots). At the same time, their non-negative square roots are uniquely defined. Now we observe a similar behaviour of matrices of the form $B = A^*A$. Of course, these are the matrices of the compositions of mappings $\varphi$ with their adjoints. By definition,
(1) $\langle Bx, x\rangle = \langle A^*Ax, x\rangle = \langle Ax, Ax\rangle \ge 0$
for all vectors $x$. Furthermore, we clearly have $B^* = (A^*A)^* = A^*A = B$.

Hermitian matrices $B$ with the property $\langle Bx, x\rangle \ge 0$ for all $x$ are called positive semidefinite matrices. If the zero value is attained only for $x = 0$, they are called positive definite. Analogously, we speak of positive definite and positive semidefinite (self-adjoint) mappings $\psi : V \to V$.

For every mapping $\varphi : V \to V$ we can define its square root as a mapping $\psi$ such that $\psi \circ \psi = \varphi$. The next theorem completely describes the situation when we restrict to positive semidefinite mappings.

Positive semidefinite square roots

Theorem.
For each positive semidefinite square matrix $B$, there is a uniquely defined positive semidefinite square root $\sqrt B$. If $P$ is any matrix such that $P^{-1}BP = D$ is diagonal, then $\sqrt B = P\sqrt D P^{-1}$, where $D$ has the (non-negative) eigenvalues of $B$ on its diagonal and $\sqrt D$ is the matrix with the non-negative square roots of these values on its diagonal.

Proof. Since $B$ is the matrix of a self-adjoint mapping $\varphi$, there is even an orthonormal $P$ as in the theorem (cf. Theorem 3.4.7) with all the eigenvalues on the diagonal of $D$ non-negative. Consider $C = \sqrt B$ as defined in the second claim and notice that indeed
$$C^2 = P\sqrt D P^{-1} P\sqrt D P^{-1} = PDP^{-1} = B.$$
Thus the mapping $\psi$ given by $C$ must have the same eigenvectors as $\varphi$, and so these two mappings share the decomposition of $\mathbb K^n$ into mutually orthogonal eigenspaces. In particular, both of them share the bases in which they have diagonal matrices, and thus the definition of $\sqrt B$ is unique in each such basis. This proves that the definition of $\sqrt B$ does not depend on our particular choice of the diagonalization of $\varphi$. □

Notice there could be many different roots if we relax the positivity condition on $\sqrt B$, see ??.

3.4.10. Spectra and nilpotent mappings. We return to the behavior of linear mappings in full generality. We continue to work with real or complex vector spaces, but without necessarily fixing a scalar product there. Recall that the spectrum of a linear mapping $f : V \to V$ is the sequence of roots of the characteristic polynomial of the mapping $f$, counting multiplicities. The algebraic multiplicity of an eigenvalue is its multiplicity as a root of the characteristic polynomial. The geometric multiplicity of an eigenvalue is the dimension of the corresponding subspace of eigenvectors. A linear mapping $f : V \to V$ is called nilpotent if there exists an integer $k \ge 1$ such that the iterated mapping $f^k$ is identically zero.
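The recipe $\sqrt B = P\sqrt D P^{-1}$ from the theorem can be sketched directly in numpy; the matrix $B$ below is a hypothetical positive semidefinite matrix built as $A^*A$:

```python
import numpy as np

# Hypothetical positive semidefinite matrix, built as B = A* A.
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 0.0]])
B = A.T @ A                          # symmetric, positive semidefinite

# Orthogonal diagonalization B = P D P^{-1} with orthonormal P (eigh),
# then sqrt(B) = P sqrt(D) P^{-1} with non-negative roots on the diagonal.
d, P = np.linalg.eigh(B)
d = np.clip(d, 0.0, None)            # guard against tiny negative round-off
C = P @ np.diag(np.sqrt(d)) @ P.T    # P^{-1} = P^T for orthonormal P

assert np.allclose(C @ C, B)         # C is a square root of B ...
assert np.allclose(C, C.T)           # ... symmetric ...
assert np.all(np.linalg.eigvalsh(C) >= -1e-12)  # ... positive semidefinite
```

Relaxing positivity, flipping the sign of any diagonal entry of `np.sqrt(d)` gives another (non-semidefinite) square root, illustrating the closing remark above.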
The smallest $k$ with such a property is called the degree of nilpotency of the mapping $f$. The mapping $f : V \to V$ is called cyclic if there exists a basis $(u_1, \dots, u_n)$ of the space $V$ such that $f(u_1) = 0$ and $f(u_i) = u_{i-1}$ for all $i = 2, \dots, n$. In other words, the matrix of $f$ in this basis is of the form
$$A = \begin{pmatrix} 0 & 1 & 0 & \dots \\ 0 & 0 & 1 & \dots \\ \vdots & & \ddots & \\ 0 & 0 & 0 & \dots \end{pmatrix}.$$
If $f(v) = a\cdot v$, then $f^k(v) = a^k\cdot v$ for every natural $k$. Note that the spectrum of a nilpotent mapping can contain only the zero scalar (and this is always present). By the definition, every cyclic mapping is nilpotent. Moreover, its degree of nilpotency equals the dimension of the space $V$. The derivative operator on polynomials, $D(x^k) = kx^{k-1}$, is an example of a cyclic mapping on the spaces $\mathbb K_n[x]$ of all polynomials of degree at most $n$ over the scalars $\mathbb K$.

Perhaps surprisingly, this is also true the other way round: every nilpotent mapping is a direct sum of cyclic mappings. A proof of this claim takes much work. So we formulate first the results we are aiming at, and only then come back to the technical work.

In the resulting theorem describing the Jordan decomposition, the crucial role is played by vector (sub)spaces and linear mappings with a single eigenvalue $\lambda$, given by the matrix
(1)
$$J = \begin{pmatrix} \lambda & 1 & 0 & \dots & 0 \\ 0 & \lambda & 1 & \dots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \dots & \lambda \end{pmatrix}.$$
These matrices (and the corresponding invariant subspaces) are called Jordan blocks. (Camille Jordan was a famous French mathematician working in analysis and algebra at the end of the 19th and the beginning of the 20th centuries.)

Jordan canonical form

Theorem. Let $V$ be a real or complex vector space of dimension $n$. Let $f : V \to V$ be a linear mapping with $n$ eigenvalues (in the chosen domain of scalars), counting algebraic multiplicities.
Then there exists a unique decomposition of the space $V$ into the direct sum of subspaces $V = V_1 \oplus \dots \oplus V_k$, where not only $f(V_i) \subset V_i$, but the restriction of $f$ to each $V_i$ has a single eigenvalue $\lambda_i$, and the restriction $f - \lambda_i\,\mathrm{id}_{V_i}$ on $V_i$ is either cyclic or the zero mapping. In particular, there is a suitable basis in which $f$ has a block-diagonal matrix $J$ with Jordan blocks along the diagonal.

We say that the matrix $J$ from the theorem is in Jordan canonical form. In the language of matrices, we can rephrase the theorem as follows:

Corollary. For each square matrix $A$ over complex scalars, there is an invertible matrix $P$ such that $A = P^{-1}JP$ and $J$ is in Jordan canonical form.

The matrix $P$ is the transition matrix to the basis from the theorem above. Notice that the total number of ones above the diagonal in $J$ equals the difference between the total algebraic and geometric multiplicities of the eigenvalues. The ordering of the blocks in the matrix corresponds to the chosen ordering of the subspaces $V_i$ in the direct sum. Thus the uniqueness of the matrix $J$ holds up to the ordering of the Jordan blocks, and there is corresponding freedom in the choice of the basis for such a Jordan canonical form.

3.4.11. Remarks. The existence of the Jordan canonical form is clear in the cases when all the eigenvalues are distinct, or when the geometric and algebraic multiplicities of the eigenvalues are the same. In particular, this is the case for all unitary and self-adjoint mappings on unitary vector spaces, while the definition of normal mappings requires exactly this behavior. In particular, the Jordan canonical form of a mapping is diagonal if and only if the mapping is normal.

A consequence of the Jordan canonical form theorem is that for every linear mapping $f$, every eigenvalue of $f$ uniquely determines an invariant subspace that corresponds to all Jordan blocks with this particular eigenvalue.
We shall call this subspace the root subspace corresponding to the given eigenvalue.

We mention one useful corollary of the Jordan theorem (which was already used in the discussion of the behavior of Markov chains). Assume that the eigenvalues of our mapping $f$ are all of absolute value less than one. Then repeated application of the linear mapping to any vector $v \in V$ forces all the coordinates of $f^k(v)$ to decrease towards zero. Indeed, assume $f$ has only one eigenvalue $\lambda$ on all the complex space $V$ and that $f - \lambda\,\mathrm{id}_V$ is cyclic (that is, we consider only one Jordan block separately). Let $v_1, \dots, v_\ell$ be the corresponding basis. Then the theorem says that
$$f(v_2) = \lambda v_2 + v_1,\qquad f^2(v_2) = \lambda^2 v_2 + \lambda v_1 + \lambda v_1 = \lambda^2 v_2 + 2\lambda v_1,$$
and similarly for the other $v_i$'s and higher powers. In any case, the iteration of $f$ results in higher and higher powers of $\lambda$ at all the non-zero components. The smallest of these powers can differ from the largest one only by less than the dimension of $V$, and the coefficients grow at most polynomially in the number of iterations, so the decay of the powers of $\lambda$ dominates. This proves the claim.

The same argument can be used to prove that a mapping with all eigenvalues of absolute value strictly greater than one leads to unbounded growth of all the coordinates of the iterations $f^k(v)$.

The remainder of this part of the third chapter is devoted to the proof of the Jordan theorem and a few necessary lemmas. It is much more difficult than anything so far. The reader may skip it, until the beginning of the fifth part of this chapter, in case of any problems with reading it.

3.4.12. Root spaces. We have already seen by explicit examples that the eigensubspaces completely describe the geometric properties for some linear mappings only. Thus we now introduce a more subtle tool, the root subspaces.

Definition. A non-zero vector $u \in V$ is called a root vector of the linear mapping $\varphi : V \to V$ if there exist a $\lambda \in \mathbb K$ and an integer $k > 0$ such that $(\varphi - \lambda\,\mathrm{id}_V)^k(u) = 0$.
This means that the $k$-th iteration of the given mapping sends $u$ to zero. The set of all root vectors corresponding to a fixed scalar $\lambda$, along with the zero vector, is called the root subspace associated with the scalar $\lambda \in \mathbb K$. We denote it by $\mathcal R_\lambda$.

If $u$ is a root vector and the integer $k$ from the definition is chosen as the smallest possible one for $u$, then $(\varphi - \lambda\,\mathrm{id}_V)^{k-1}(u)$ is an eigenvector with the eigenvalue $\lambda$. Thus we have $\mathcal R_\lambda = \{0\}$ for all scalars $\lambda$ which are not in the spectrum of the mapping $\varphi$.

Proposition. Let $\varphi : V \to V$ be a linear mapping. Then
(1) $\mathcal R_\lambda \subset V$ is a vector subspace for every $\lambda \in \mathbb K$,
(2) for every $\lambda, \mu \in \mathbb K$, the subspace $\mathcal R_\lambda$ is invariant with respect to the linear mapping $\varphi - \mu\,\mathrm{id}_V$; in particular, $\mathcal R_\lambda$ is invariant with respect to $\varphi$,
(3) if $\mu \neq \lambda$, then $(\varphi - \mu\,\mathrm{id}_V)|_{\mathcal R_\lambda}$ is invertible,
(4) the mapping $(\varphi - \lambda\,\mathrm{id}_V)|_{\mathcal R_\lambda}$ is nilpotent.

Proof. (1) Checking the properties of a vector subspace is easy and is left to the reader.
(2) Assume that $(\varphi - \lambda\,\mathrm{id}_V)^k(u) = 0$ and put $v = (\varphi - \mu\,\mathrm{id}_V)(u)$. Then
$$(\varphi - \lambda\,\mathrm{id}_V)^k(v) = (\varphi - \lambda\,\mathrm{id}_V)^k\big((\varphi - \lambda\,\mathrm{id}_V) + (\lambda - \mu)\,\mathrm{id}_V\big)(u) = (\varphi - \lambda\,\mathrm{id}_V)^{k+1}(u) + (\lambda - \mu)\cdot(\varphi - \lambda\,\mathrm{id}_V)^k(u) = 0.$$
(3) If $u \in \operatorname{Ker}(\varphi - \mu\,\mathrm{id}_V)|_{\mathcal R_\lambda}$, then (

Let $\varphi : V \to V$ be a linear mapping whose spectrum contains $n$ elements (that is, all roots of the characteristic polynomial lie in $\mathbb K$ and we count their multiplicities). Then there exists a sequence of invariant subspaces $\{0\} = V_0 \subset V_1 \subset \dots \subset V_n = V$ with dimensions $\dim V_i = i$. Consider a basis $u_1, \dots, u_n$ of the space $V$ such that $V_i = \operatorname{span}\{u_1, \dots, u_i\}$. In this basis, the matrix of the mapping $\varphi$ is an upper triangular matrix with the spectrum $\lambda_1, \dots, \lambda_n$ on the diagonal.

Proof. The subspaces $V_i$ are constructed inductively. Let $\{\lambda_1, \dots, \lambda_n\}$ be the spectrum of the mapping $\varphi$. Thus the characteristic polynomial of the mapping $\varphi$ is of the form $\pm(\lambda - \lambda_1)\cdots(\lambda - \lambda_n)$. We choose $V_0 = \{0\}$ and $V_1 = \operatorname{span}\{u_1\}$, where $u_1$ is an eigenvector with eigenvalue $\lambda_1$. According to the previous theorem, the characteristic polynomial of the induced mapping $\varphi_{V/V_1}$ is of the form $\pm(\lambda - \lambda_2)\cdots(\lambda - \lambda_n)$.

Assume that we have already constructed linearly independent vectors $u_1, \dots, u_k$ and invariant subspaces $V_i = \operatorname{span}\{u_1, \dots, u_i\}$, $i = 1, \dots, k < n$, such that the characteristic polynomial of $\varphi_{V/V_k}$ is of the form $\pm(\lambda - \lambda_{k+1})\cdots(\lambda - \lambda_n)$ and $\varphi(u_i) \in (\lambda_i\cdot u_i + V_{i-1})$ for all $i = 1, \dots, k$.

We want to add one more vector $u_{k+1}$ with analogous properties. There exists an eigenvector $u_{k+1} + V_k \in V/V_k$ of the mapping $\varphi_{V/V_k}$ with the eigenvalue $\lambda_{k+1}$. Consider the space $V_{k+1} = \operatorname{span}\{u_1, \dots, u_{k+1}\}$. If the vector $u_{k+1}$ were a linear combination of the vectors $u_1, \dots, u_k$, then $u_{k+1} + V_k$ would be the zero class in $V/V_k$. But this is not possible. Thus $\dim V_{k+1} = k + 1$. It remains to study the induced mapping $\varphi_{V/V_{k+1}}$. The characteristic polynomial of this mapping is of degree $n - k - 1$ and divides the characteristic polynomial of the mapping $\varphi$. Indeed, completing the vectors $u_1, \dots, u_{k+1}$ to a basis of $V$ yields a block matrix of the mapping $\varphi$ with an upper triangular submatrix $B$ in the upper left corner and zero in the lower left corner.
The diagonal elements are exactly the scalars $\lambda_1, \dots, \lambda_{k+1}$. Therefore the roots of the characteristic polynomial of the induced mapping have the required properties. □

Remark. If $V$ decomposes into the direct sum of eigensubspaces for $\varphi$, the latter results do not say anything new. Their significance consists in the fact that only the existence of $\dim V$ roots of the characteristic polynomial (counting multiplicities) is assumed. This is ensured whenever the field $\mathbb K$ is algebraically closed, for instance the complex numbers $\mathbb C$. As a direct consequence we see that the determinant and the trace of the mapping $\varphi$ are always the product and the sum of the elements in the spectrum, respectively. This can also be used for all real matrices: just consider them to be complex, calculate the determinant or the trace as the product or sum of the eigenvalues, and because both the determinant and the trace are algebraic expressions in terms of the elements of the matrix, the results will be correct.

3.4.15. Orthogonal triangulation. If we are given a scalar product on a vector space $V$ and $U \subset V$ is a subspace, then clearly $V/U \simeq U^\perp$, where $v \in U^\perp$ is identified with $v + U$. Moreover, each class of the quotient space $V/U$ contains exactly one vector from $U^\perp$ (the difference of two such vectors lies in $U \cap U^\perp = \{0\}$). We can exploit this observation in every inductive step of the proof of the theorem above: choose the representative $u_{k+1} \in V_k^\perp$ of the eigenvector of $\varphi_{V/V_k}$. This modification leads to an orthogonal basis with the properties required in the claim about triangulation in the corollary above. Therefore there exists such an orthonormal basis, and we arrive at a very important theorem:

Schur's orthogonal triangulation theorem

Theorem. Let $\varphi : V \to V$ be a linear mapping on a vector space with scalar product. Let there be $m = \dim V$ eigenvalues, counting multiplicities.
Then there exists an orthonormal basis of the space $V$ such that the matrix of $\varphi$ in this basis is upper triangular with the eigenvalues $\lambda_1, \dots, \lambda_m$ on the diagonal.

3.4.16. Theorem. Let $\varphi : V \to V$ be a linear mapping and let $\lambda_1, \dots, \lambda_k$ be all its distinct eigenvalues. Then the sum of the root spaces $\mathcal R_{\lambda_1}, \dots, \mathcal R_{\lambda_k}$ is direct. Furthermore, for every eigenvalue $\lambda$ the dimension of the subspace $\mathcal R_\lambda$ equals the algebraic multiplicity of $\lambda$.

Proof. We prove first the independence of non-zero vectors from different root spaces. We proceed by induction over the number $k$ of root spaces. The claim is obvious if $k = 1$. Assume that the theorem holds for cases with fewer than $k > 1$ root spaces and assume that vectors $u_1 \in \mathcal R_{\lambda_1}, \dots, u_k \in \mathcal R_{\lambda_k}$ satisfy $u_1 + \dots + u_k = 0$. Then $(\varphi - \lambda_k\,\mathrm{id}_V)^j(u_k) = 0$ for a suitable $j$, and moreover all $v_i = (\varphi - \lambda_k\,\mathrm{id}_V)^j(u_i)$ are non-zero vectors in $\mathcal R_{\lambda_i}$, $i = 1, \dots, k-1$, whenever the $u_i$ are non-zero, by Proposition 3.4.12. But at the same time
$$v_1 + \dots + v_{k-1} = (\varphi - \lambda_k\,\mathrm{id}_V)^j\Big(\sum_i u_i\Big) = 0$$
and, according to the inductive assumption, all the $v_i$ are zero. But then also all the $u_i$, $1 \le i < k$, must vanish, and thus $u_k = 0$ too. This proves the first claim.

It remains to consider the dimensions of the root spaces $\mathcal R_\lambda$. Consider an eigenvalue $\lambda$ of $\varphi$, use the same notation $\varphi$ for the restriction $\varphi|_{\mathcal R_\lambda}$, and write $\tilde\varphi : V/\mathcal R_\lambda \to V/\mathcal R_\lambda$ for the mapping induced by $\varphi$ on the quotient space. Assume that the dimension of $\mathcal R_\lambda$ is strictly smaller than the algebraic multiplicity of the root $\lambda$ of the characteristic polynomial. In view of Lemma 3.4.14, $\lambda$ is also an eigenvalue of the mapping $\tilde\varphi$. Let $(v + \mathcal R_\lambda) \in V/\mathcal R_\lambda$ be the corresponding eigenvector, that is, $\tilde\varphi(v + \mathcal R_\lambda) = \lambda(v + \mathcal R_\lambda)$. Then $v \notin \mathcal R_\lambda$ and $\varphi(v) = \lambda v + w$ for a suitable $w \in \mathcal R_\lambda$. Thus $w = (\varphi - \lambda\,\mathrm{id}_V)(v)$ and (

$V$. □

Combining the latter theorem with the triangulation result from Corollary 3.4.14, we can formulate:

Corollary. Consider a linear mapping $\varphi : V \to V$ on a vector space $V$ over scalars $\mathbb K$, whose entire spectrum is in $\mathbb K$. Then $V = \mathcal R_{\lambda_1} \oplus \dots \oplus \mathcal R_{\lambda_n}$ is the direct sum of the root subspaces. If we choose suitable bases for these subspaces, then in this basis $\varphi$ has block-diagonal form with upper triangular matrices in the blocks and the eigenvalues $\lambda_i$ on the diagonal.

3.4.17. Nilpotent and cyclic mappings. Now almost everything is prepared for the discussion of canonical forms of matrices. It only remains to clarify the relation between cyclic and nilpotent mappings and to combine the results proved already.

Theorem. Let $\varphi : V \to V$ be a nilpotent linear mapping. Then there exists a decomposition of $V$ into a direct sum of subspaces $V = V_1 \oplus \dots \oplus V_k$ such that the restriction of $\varphi$ to each summand $V_i$ is cyclic.

Proof. We provide a straightforward construction of a basis of the space $V$ such that the action of the mapping $\varphi$ on the basis vectors directly shows the decomposition into the cyclic mappings.

Let $k$ be the degree of nilpotency of the mapping $\varphi$ and write $P_i = \operatorname{Im}(\varphi^i)$, $i = 0, \dots, k$. Thus
$$\{0\} = P_k \subset P_{k-1} \subset \dots \subset P_1 \subset P_0 = V.$$
Choose a basis $e_1^{k-1}, \dots, e_{p_{k-1}}^{k-1}$ of the space $P_{k-1}$, where $p_{k-1} > 0$ is the dimension of $P_{k-1}$. By definition, $P_{k-1} \subset \operatorname{Ker}\varphi$, i.e. $\varphi(e_j^{k-1}) = 0$ for all $j$. Assume that $P_{k-1} \neq V$. Since $P_{k-1} = \varphi(P_{k-2})$, there necessarily exist vectors $e_j^{k-2}$, $j = 1, \dots, p_{k-1}$, in $P_{k-2}$ such that $\varphi(e_j^{k-2}) = e_j^{k-1}$. Assume
$$a_1 e_1^{k-1} + \dots + a_{p_{k-1}} e_{p_{k-1}}^{k-1} + b_1 e_1^{k-2} + \dots + b_{p_{k-1}} e_{p_{k-1}}^{k-2} = 0.$$
Applying $\varphi$ to this linear combination yields $b_1 e_1^{k-1} + \dots + b_{p_{k-1}} e_{p_{k-1}}^{k-1} = 0$. This is a linear combination of independent vectors, therefore all $b_j = 0$. But then also all $a_j = 0$. Thus the linear independence of all $2p_{k-1}$ chosen vectors is established.
Next, extend them to a basis
(1) $e_1^{k-1}, \dots, e_{p_{k-1}}^{k-1}$; $e_1^{k-2}, \dots, e_{p_{k-1}}^{k-2}, e_{p_{k-1}+1}^{k-2}, \dots, e_{p_{k-2}}^{k-2}$
of the space $P_{k-2}$. The images of the added basis vectors are in $P_{k-1}$. Necessarily they must be linear combinations of the basis elements $e_1^{k-1}, \dots, e_{p_{k-1}}^{k-1}$. We can thus adjust the chosen vectors $e_{p_{k-1}+1}^{k-2}, \dots, e_{p_{k-2}}^{k-2}$ by adding appropriate linear combinations of the vectors $e_1^{k-2}, \dots, e_{p_{k-1}}^{k-2}$, with the result that they lie in the kernel of $\varphi$. Thus we may assume that our choice in the scheme (1) has this property.

Assume that we have already constructed a basis of the subspace $P_{k-\ell}$ which we can arrange into a scheme similar to (1):
$$e_1^{k-1}, \dots, e_{p_{k-1}}^{k-1}$$
$$e_1^{k-2}, \dots, e_{p_{k-1}}^{k-2}, e_{p_{k-1}+1}^{k-2}, \dots, e_{p_{k-2}}^{k-2}$$
$$e_1^{k-3}, \dots, e_{p_{k-1}}^{k-3}, e_{p_{k-1}+1}^{k-3}, \dots, e_{p_{k-2}}^{k-3}, e_{p_{k-2}+1}^{k-3}, \dots, e_{p_{k-3}}^{k-3}$$
$$\vdots$$
$$e_1^{k-\ell}, \dots\dots, e_{p_{k-\ell}}^{k-\ell}$$
where the value of the mapping $\varphi$ on any basis vector is located directly above it; the value is zero if there is nothing above that basis vector. If $P_{k-\ell} \neq V$, then again there must exist vectors $e_1^{k-\ell-1}, \dots, e_{p_{k-\ell}}^{k-\ell-1}$ which map to $e_1^{k-\ell}, \dots, e_{p_{k-\ell}}^{k-\ell}$. We can extend them to a basis of $P_{k-\ell-1}$, say, by the vectors $e_{p_{k-\ell}+1}^{k-\ell-1}, \dots, e_{p_{k-\ell-1}}^{k-\ell-1}$. Again, exactly as when adjusting (1) above, we choose the additional basis vectors from the kernel of $\varphi$, and analogously as before we verify that we indeed obtain a basis of $P_{k-\ell-1}$.

After $k$ steps we obtain a basis for the whole space $V$ which has the properties given for the basis of the subspace $P_{k-\ell}$. The individual columns of the resulting scheme then generate the subspaces $V_i$. Additionally, we have found the bases of these subspaces which show that the corresponding restrictions of $\varphi$ are cyclic mappings. □

3.4.18. Proof of the Jordan theorem. Let $\lambda_1, \dots, \lambda_k$ be all the distinct eigenvalues of the mapping $\varphi$. From the assumptions of the Jordan theorem it follows that $V = \mathcal R_{\lambda_1} \oplus \dots \oplus \mathcal R_{\lambda_k}$.
The mappings $\varphi_i = (\varphi|_{\mathcal R_{\lambda_i}} - \lambda_i\,\mathrm{id}_{\mathcal R_{\lambda_i}})$ are nilpotent, and thus each of the root spaces is a direct sum $\mathcal R_{\lambda_i} = P_{1,\lambda_i} \oplus \dots \oplus P_{j_i,\lambda_i}$ of subspaces on which the restriction of the mapping $\varphi - \lambda_i\,\mathrm{id}_V$ is cyclic. The matrices of these restricted mappings on $P_{j,\lambda_i}$ are Jordan blocks corresponding to the zero eigenvalue; the restricted mapping $\varphi|_{P_{j,\lambda_i}}$ thus has as its matrix the Jordan block with the eigenvalue $\lambda_i$.

For the proof of the Jordan theorem it remains to verify the claim about uniqueness (up to reordering of the blocks). Because the diagonal values $\lambda_i$ are given as roots of the characteristic polynomial, their uniqueness is immediate. The decomposition into root spaces is unique as well. Thus, without loss of generality, we may assume that there is just one eigenvalue $\lambda$, and we are going to express the dimensions of the individual Jordan blocks using the ranks $r_k$ of the mappings $(\varphi - \lambda\,\mathrm{id}_V)^k$. This will show that the blocks are uniquely determined (up to their order). On the other hand, changing the order of the blocks corresponds to renumbering the basis vectors, thus we can obtain them in any order.

If $\psi$ is a cyclic operator on an $m$-dimensional space, then the defect (dimension of the kernel) of the iterated mapping $\psi^k$ is $k$ for $0 \le k \le m$, while the defect is $m$ for all $k \ge m$. This implies that if our matrix $J$ of the mapping $\varphi$ on the $n$-dimensional space $V$ (recall we assume $V = \mathcal R_\lambda$) contains $d_k$ Jordan blocks of the order $k$, then the defect $D_\ell = n - r_\ell$ of the matrix $(J - \lambda E)^\ell$ is
$$D_\ell = d_1 + 2d_2 + \dots + \ell d_\ell + \ell d_{\ell+1} + \dots$$
Now, taking the combination $2D_k - D_{k-1} - D_{k+1}$, we cancel all those terms in the latter expression which coincide for $\ell = k-1, k, k+1$, and we are left with $2D_k - D_{k-1} - D_{k+1} = d_k$. Substituting for the $D_\ell$'s, we finally arrive at
$$d_k = 2n - 2r_k - n + r_{k-1} - n + r_{k+1} = r_{k-1} - 2r_k + r_{k+1}.$$
This is the requested expression for the sizes of the Jordan blocks, and the theorem is proved.

3.4.19. Remarks.
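The block-counting formula $d_k = r_{k-1} - 2r_k + r_{k+1}$ can be checked numerically. A minimal numpy sketch on a hypothetical single-eigenvalue matrix with blocks of prescribed sizes:

```python
import numpy as np

# Hypothetical matrix with a single eigenvalue lam = 2 and Jordan
# blocks of sizes 3, 2 and 1, so d_1 = d_2 = d_3 = 1 and n = 6.
lam = 2.0
J = np.diag([lam] * 6)
J[0, 1] = J[1, 2] = 1.0   # block of size 3
J[3, 4] = 1.0             # block of size 2

n = J.shape[0]
N = J - lam * np.eye(n)
# ranks r_k of (J - lam E)^k; r_0 = n by convention
r = [n] + [np.linalg.matrix_rank(np.linalg.matrix_power(N, k))
           for k in range(1, n + 2)]

# d_k = r_{k-1} - 2 r_k + r_{k+1} counts the Jordan blocks of order k
d = [int(r[k - 1] - 2 * r[k] + r[k + 1]) for k in range(1, n + 1)]

assert d[:3] == [1, 1, 1]                  # one block each of sizes 1, 2, 3
assert sum(k * dk for k, dk in enumerate(d, start=1)) == n
```

Here $r = (6, 3, 1, 0, 0, \dots)$, and the formula recovers the block sizes without computing any basis.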
The proof of the theorem about the existence of the Jordan canonical form was constructive, but it does not give an efficient algorithmic approach for the construction. Now we show how our results can be used for the explicit computation of a basis in which the given mapping $\varphi : V \to V$ has its matrix in the canonical Jordan form.
(1) Find the roots of the characteristic polynomial.
(2) If there are fewer than $n = \dim V$ roots (counting multiplicities), then there is no canonical form.
(3) If there are $n$ linearly independent eigenvectors, there is a basis of $V$ composed of eigenvectors in which $\varphi$ has a diagonal matrix.
(4) Let $\lambda$ be an eigenvalue with geometric multiplicity strictly smaller than the algebraic multiplicity and let $v_1, \dots, v_k$ be the corresponding eigenvectors. They should be the vectors on the upper boundary of the scheme from the proof of Theorem 3.4.17. We need to complete the basis by applying iterations of $\varphi - \lambda\,\mathrm{id}_V$. By doing this we also find in which rows the vectors should be located. Hence we find the linearly independent solutions $w_i$ of the equations $(\varphi - \lambda\,\mathrm{id})(w_i) = v_i$ from the rows below. Repeat the procedure iteratively (that is, for the $w_i$ and so on). In this way, we find the "chains" of basis vectors that give the invariant subspaces on which $\varphi - \lambda\,\mathrm{id}$ is cyclic (the columns of the scheme in the proof).

The procedure is practical for matrices where the multiplicities of the eigenvalues are small, or at least where the degrees of nilpotency are small. For instance, for the matrix
$$A = \begin{pmatrix} 2 & 0 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}$$
we obtain the two-dimensional subspace of eigenvectors $\operatorname{span}\{(1,0,0)^T, (0,1,0)^T\}$, but we do not yet know which of them are the "ends of the chains". We need to solve the equations $(A - 2E)x = (a, b, 0)^T$ for (yet unknown) constants $a$, $b$. This system is solvable if and only if $a = b$, and one of the possible solutions is $x = (0,0,1)^T$, $a = b = 1$. The entire basis is then composed of $(1,1,0)^T$, $(0,0,1)^T$, $(1,0,0)^T$.
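The chain computation of this example can be verified numerically. A minimal numpy sketch, assuming the $3\times 3$ matrix below (reconstructed to be consistent with the computation quoted above, since the original display is damaged):

```python
import numpy as np

# Assumed matrix with eigenvalue 2 of algebraic multiplicity 3 and
# geometric multiplicity 2, consistent with the computation in the text.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 2.0]])

# The chain: (A - 2E) v2 = v1 with v1 = (1,1,0), v2 = (0,0,1);
# the remaining eigenvector v3 = (1,0,0) forms a block of size 1.
P = np.array([[1.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])    # columns: v1, v2, v3
J = np.linalg.inv(P) @ A @ P       # matrix of the mapping in this basis

expected = np.array([[2.0, 1.0, 0.0],   # Jordan block of size 2 ...
                     [0.0, 2.0, 0.0],
                     [0.0, 0.0, 2.0]])  # ... and a block of size 1
assert np.allclose(J, expected)
```

The change of basis with the chain vectors as columns of $P$ indeed produces the Jordan canonical form with one block of size 2 and one of size 1.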
Note that we have free choices along the way, and thus there are many such bases. There is a beautiful purely algebraic approach to compute the Jordan canonical form efficiently, based on polynomial matrices and Weierstrass divisors, but it does not give any direct information about the right basis. We shall not go into details in this textbook.

5. Decompositions of the matrices and pseudoinversions

Previously we concentrated on the geometric description of the structure of a linear mapping. Now we translate our results into the language of matrix decompositions. This is an important topic for numerical methods and matrix calculus in general. Even when computing effectively with real numbers we use decompositions into products. The simplest one is the unique expression of every real number in the form $a = \operatorname{sgn}(a)\cdot|a|$, that is, as a product of the sign and the absolute value. Proceeding in the same way with complex numbers, we obtain their polar form. That is, we write $z = (\cos\varphi + i\sin\varphi)\,|z|$. Here the complex unit plays the role of the sign and the other factor is a non-negative real multiple. In the following paragraphs we briefly list some useful decompositions for distinct types of matrices. Recall that we met suitable decompositions earlier, for instance for positive semidefinite matrices in paragraph 3.4.9 when finding the square roots. We start with similarly simple examples.

3.5.1. LU-decomposition. In paragraphs 2.1.7 and 2.1.8 we transformed matrices over scalars from any field into row echelon form. For this we used elementary row transformations, based on successive multiplication of our matrix by invertible lower triangular matrices $P_i$. In this way we added multiples of the rows above the currently transformed one. Sometimes we also interchanged the rows, which corresponds to multiplication by a permutation matrix. That is a square matrix in which all elements are zero except for exactly one value 1 in each row and each column. To see why this may be necessary, consider a matrix with just one non-zero element in the first column, placed outside the first row.

When we used the backwards elimination to transform the matrix into the blockwise form
$$\begin{pmatrix} E_h & 0 \\ 0 & 0 \end{pmatrix}$$
(recall $E_h$ stands for the unit matrix of rank $h$), we potentially needed to interchange columns as well. This was achieved by multiplication by a permutation matrix from the right-hand side.

For simplicity, assume we have a square matrix $A$ of size $m$ and that the Gaussian elimination does not force any row interchange. Thus all the matrices $P_i$ can be lower triangular with ones on the diagonal. Finally, note that the inverses of such $P_i$ are again lower triangular with ones on the diagonal (either recall the algorithm 2.1.10 or the formula in 2.2.11). We obtain
$$U = P\cdot A = P_k\cdots P_1\cdot A,$$
where $U$ is an upper triangular matrix. Thus
$$A = L\cdot U,$$
where $L = (P_k\cdots P_1)^{-1}$ is a lower triangular matrix with ones on the diagonal and $U$ is upper triangular. This decomposition is called the LU-decomposition of the matrix $A$. We can also absorb the diagonal values of $U$ into a diagonal matrix $D$ and obtain the LDU-decomposition, where both $U$ and $L$ have just ones along the diagonal, $A = L\,D\,U$.

For a general matrix $A$, we need to add the potential permutations of rows during the Gaussian elimination. Then we obtain the general result. (Think about why we can always move the necessary permutation matrices to the leftmost and rightmost positions!)

LU-decomposition

Let $A$ be any square matrix of size $m$ over a field of scalars. Then we can find a lower triangular matrix $L$ with ones on its diagonal, an upper triangular matrix $U$, and permutation matrices $P$ and $Q$, all of size $m$, such that $A = P\cdot L\cdot U\cdot Q$.

3.5.2. Remarks.
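The elimination procedure above can be sketched directly. A minimal Doolittle-style implementation in numpy, assuming (as in the simple case of the text) that no row interchanges are needed:

```python
import numpy as np

def lu_nopivot(A):
    """LU-decomposition A = L U by Gaussian elimination, assuming no
    row interchanges are needed. L is lower triangular with unit
    diagonal, U is upper triangular."""
    m = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(m)
    for j in range(m - 1):
        for i in range(j + 1, m):
            L[i, j] = U[i, j] / U[j, j]   # elimination multiplier
            U[i, :] -= L[i, j] * U[j, :]  # subtract a multiple of row j
    return L, U

# Hypothetical matrix whose elimination needs no row swaps.
A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
L, U = lu_nopivot(A)

assert np.allclose(L @ U, A)
assert np.allclose(np.tril(L), L) and np.allclose(np.diag(L), 1.0)
assert np.allclose(np.triu(U), U)
```

Each inner step is exactly one multiplication by an invertible lower triangular matrix $P_i$; the matrix $L$ collects the inverses of these steps, as in the derivation above.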
As one direct corollary of the Gaussian elimination we can observe that, up to a choice of suitable bases on the domain and codomain, every linear mapping f : V → W is given by a matrix in block-diagonal form, with a unit matrix of size equal to the dimension of the image of f and with zero blocks all around. This can be reformulated as follows: every matrix A of the type m/n over a field of scalars K can be decomposed into the product

A = P · ( E 0 ; 0 0 ) · Q,

where P and Q are suitable invertible matrices.

Previously (in 3.4.10) we discussed the properties of linear mappings f : V → V over complex vector spaces. We showed that every square matrix A of dimension m can be decomposed into the product A = P · J · P⁻¹, where J is block-diagonal with the Jordan blocks associated with the eigenvalues of A on the diagonal. Indeed, this is just a reformulation of the Jordan theorem, because multiplying by the matrix P and by its inverse from the other side corresponds in this case just to a change of basis on the vector space V (with transition matrix P). The quoted theorem says that every mapping acquires its Jordan canonical form in a suitable basis.

Analogously, when discussing self-adjoint mappings we proved that for real symmetric matrices or for complex Hermitian matrices there exists a decomposition into the product A = P · D · P*, where D is the diagonal matrix with all the (always real) eigenvalues on the diagonal, counting multiplicities. Indeed, we proved that there is an orthonormal basis consisting of eigenvectors. Thus the transition matrix P reflecting the appropriate change of basis must be orthogonal (unitary in the complex case). In particular, P⁻¹ = P*.

For real orthogonal mappings we derived an expression analogous to that for the symmetric ones, i.e. A = P · B · P*. But in this case the matrix B is block-diagonal with blocks of size two or one, expressing rotations, mirror symmetries and identities with respect to the corresponding subspaces.

3.5.3.
Singular decomposition theorem. We return to general linear mappings f : V → W between (generally distinct) vector spaces. We assume that scalar products are defined on both spaces, and we restrict ourselves to orthonormal bases only. If we want a decomposition result similar to those above, we must proceed in a more refined way than in the case of arbitrary bases. But the result is surprisingly similar and strong:

Singular decomposition

Theorem. Let A be a matrix of the type m/n over real or complex scalars. Then there exist square unitary matrices U and V of dimensions m and n, and a real diagonal matrix D of dimension r with non-negative elements, r ≤ min{m, n}, such that

A = U S V*,   S = ( D 0 ; 0 0 ),

and r is the rank of the matrix AA*. The matrix S is determined uniquely up to the order of the diagonal elements in D. Moreover, the diagonal elements of D are the square roots of the positive eigenvalues of the matrix AA*. If A is a real matrix, then the matrices U and V are orthogonal.

Proof. Assume first that m ≤ n. Denote by φ : K^n → K^m the mapping between real or complex spaces with standard scalar products, given by the matrix A in the standard bases. We can reformulate the statement of the theorem as follows: there exist orthonormal bases on K^n and K^m in which the mapping φ is given by the matrix S from the statement of the theorem.

As noted before, the matrix A*A is positive semidefinite. Therefore it has only real non-negative eigenvalues, and there exists an orthonormal basis w of K^n in which the corresponding mapping φ* ∘ φ is given by a diagonal matrix with the eigenvalues on the diagonal. In other words, there exists a unitary matrix V such that A*A = V B V* for a real diagonal matrix B with the non-negative eigenvalues (d_1, d_2, …, d_r, 0, …, 0) on the diagonal, d_i ≠ 0 for all i = 1, …, r. Thus

B = (AV)*(AV).
This is equivalent to the claim that the first r columns of the matrix AV are orthogonal, while the remaining columns vanish because they have zero norm. Next, we denote the first r columns of AV by v_1, …, v_r ∈ K^m. Thus ⟨v_i, v_i⟩ = d_i, i = 1, …, r, and the normalized vectors u_i = (1/√d_i) v_i form an orthonormal system of non-zero vectors. Extend them to an orthonormal basis u = u_1, …, u_m of the entire K^m. Expressing the original mapping φ in the bases w of K^n and u of K^m yields the matrix √B, i.e. exactly the matrix S from the statement. The transformations from the standard bases to the newly chosen ones correspond to multiplication from the left by a unitary (orthogonal) matrix U and from the right by V⁻¹ = V*. This is the claim of the theorem.

If m > n, we can apply the previous part of the proof to the matrix A*, which implies the desired result. All the previous steps in the proof are also valid in the real domain with real scalars. □

This proof of the theorem about the singular decomposition is constructive, and we can indeed use it for computing the unitary (orthogonal) matrices U and V and the non-zero diagonal elements of the matrix S. The diagonal values of the matrix D from the previous theorem are called the singular values of the matrix A.

3.5.4. Further comments. When dealing with real scalars, the singular values of a linear mapping φ : R^n → R^m have a simple geometric meaning: if K ⊂ R^n is the unit ball in the standard scalar product, then the image φ(K) is a (possibly degenerate) ellipsoid whose semiaxes have lengths equal to the singular values. Similarly, it follows that for every mapping φ : K^n → K^m with the matrix A in the standard bases, we can choose a new orthonormal basis on K^m for which φ has an upper triangular matrix.

Consider the condition A·B·A = A; in the bases from the singular decomposition, A is replaced by the matrix S. Consequently, B can be written in the block form

B = ( D⁻¹ P ; Q R )

for suitable matrices P, Q and R. Next,

B·A = ( D⁻¹ P ; Q R ) ( D 0 ; 0 0 ) = ( E 0 ; QD 0 )

is Hermitian. Thus QD = 0, which implies Q = 0 (the matrix D is diagonal and invertible). Analogously, the assumption that AB is Hermitian implies that P is zero.
Finally, we compute

B = B·A·B = ( D⁻¹ 0 ; 0 R ) ( D 0 ; 0 0 ) ( D⁻¹ 0 ; 0 R ) = ( D⁻¹ 0 ; 0 0 ).

On the right-hand side there is zero in the lower right corner, and thus also R = 0 and the claim is proved.

(4): Consider the mapping φ : K^n → K^m, x ↦ A·x, and the direct sums K^n = (Ker φ)^⊥ ⊕ Ker φ, K^m = Im φ ⊕ (Im φ)^⊥ given by the orthogonal complements. The restricted mapping φ̃ := φ|_(Ker φ)^⊥ : (Ker φ)^⊥ → Im φ is a linear isomorphism. If we choose suitable orthonormal bases on (Ker φ)^⊥ and Im φ and extend them to orthonormal bases of the whole spaces, the mapping φ has the matrix S and φ̃ the matrix D from the theorem about the singular decomposition.

In the next section, we shall discuss in detail that for any given b ∈ K^m there is a unique vector which minimizes the distance ‖b − z‖ among all z ∈ Im φ (in analytic geometry we shall say that the point z realizes the distance of b from the affine subspace Im φ, see 4.1.16). The properties of the norm proved in theorem 3.4.3 directly imply that this is exactly the component z = b_1 of the decomposition b = b_1 + b_2, b_1 ∈ Im φ, b_2 ∈ (Im φ)^⊥. Now, in our choice of bases, the pseudoinverse mapping φ^(−) is given by the matrix S^(−) from the singular decomposition theorem. In particular, φ^(−)(Im φ) = (Ker φ)^⊥, D⁻¹ is the matrix of the restriction of φ^(−) to Im φ, and the restriction of φ^(−) to (Im φ)^⊥ is zero.

Indeed, if we are given a system A·x = b of m equations in n unknowns with m > n, there is in general no exact solution, and the vector x = A^(−)·b minimizes the error ‖A·x − b‖ instead. For instance, an experiment gives many measured real values b_j, j = 1, …, m, and we want to find a linear combination of only a few fixed functions f_i, i = 1, …, n, which approximates the values b_j as well as possible. The actual values of the fixed functions at the relevant points y_j ∈ R define the matrix a_ij = f_i(y_j); the columns of the matrix are given by the values of the individual functions f_i at the considered points. The goal is to determine the coefficients x_i ∈ R so that the sum of the squares of the deviations from the actual values,

Σ_{j=1}^m ( b_j − Σ_{i=1}^n x_i a_ij )²,

is minimized.
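This minimization can be carried out numerically. The following is a hedged sketch using numpy's least squares solver, which is equivalent to applying the pseudoinverse A^(−) to b; the data are those of the quadratic-fit example discussed nearby (b = (1, 12, 6, 27, 33) at the points y = 1, …, 5).

```python
import numpy as np

# values of f0(y) = 1, f1(y) = y, f2(y) = y^2 at y = 1, ..., 5
y = np.arange(1, 6)
A = np.column_stack([np.ones_like(y), y, y**2]).astype(float)
b = np.array([1., 12., 6., 27., 33.])

# x minimizes ||A @ x - b||; equivalently x = pinv(A) @ b
x, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, np.linalg.pinv(A) @ b)
print(np.round(x, 3))  # approximately [0.6, 0.614, 1.214]
```

The two computations agree because for a matrix of full column rank the least squares minimizer is unique and given by the pseudoinverse.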
By the previous theorem, the optimal coefficients are A^(−)·b. As an example, consider just three functions f_0(y) = 1, f_1(y) = y, f_2(y) = y². Assume that the "measured values" of their unknown combination g(y) = x_0 + x_1 y + x_2 y² at the integer values of y between 1 and 5 are b^T = (1, 12, 6, 27, 33). (This vector b arose by computing the values of 1 + y + y² at the given points, adjusted by random integral values in the range ±10.) This leads in our case to the matrix A = (a_ji) with

A^T = ( 1 1 1 1 1 ; 1 2 3 4 5 ; 1 4 9 16 25 ).

The requested optimal coefficients for the combination are

A^(−) · b ≈ ( 0.600, 0.614, 1.214 )^T.

The resulting approximation can be seen in the picture: the given values b are shown by the diamonds, while the dashed curve stands for the resulting approximation g(y) = x_0 + x_1 y + x_2 y². The computation was produced in Maple; taking 15 values b_i = 1 + i + i², i = 1, …, 15, with a random vector of deviations from the same range added, produced an analogous picture.

G. Additional exercises for the whole chapter

3.G.1. Solve the following LP problem:

minimize {7x − 5y + 3z} subject to 0 ≤ x ≤ 6, −2 ≤ y ≤ 7, −4 ≤ z ≤ 9.

After the substitution y′ = y + 2, z′ = z + 4 and adding the slack variables t, s, r, we obtain an initial LP tableau:

              x    y    z    t    s    r
Objective     7   −5    3    0    0    0 |  0
t             1    0    0    1    0    0 |  6
s             0   (1)   0    0    1    0 |  9
r             0    0    1    0    0    1 | 13

The second and final tableau is

              x    y    z    t    s    r
Objective     7    0    3    0    5    0 | 45
t             1    0    0    1    0    0 |  6
y             0    1    0    0    1    0 |  9
r             0    0    1    0    0    1 | 13

which provides the solution x = 0, y′ = 9, z′ = 0. In the original notation this means x = 0, y = 7, z = −4. This solution can easily be guessed from the beginning. □

3.G.2. Solve the following LP problem: A small firm specializes in making five types of spare automotive parts. Each part is first cast from iron in the casting shop and is then sent to the finishing shop, where holes are drilled, surfaces are turned, and edges are ground.
The required worker-hours (per 100 units), for each type of part in each of the two shops, are shown below:

Part type   1  2  3  4  5
Casting     2  1  3  3  1
Finishing   2  2  1  1  1

The profits from the five parts are $30, $20, $40, $25 and $10 (per 100 units), respectively. The capacities of the casting and finishing shops over the next month are 700 and 1000 worker-hours, respectively. Determine the quantities of each type of spare part to be made during the month so as to maximize the firm's profit. Assume that there is sufficient demand for the firm to sell whatever it is capable of producing.

Solution. Let x_j be the number of produced units (in multiples of ten thousand) of the part of type j = 1, …, 5. Then the LP problem can be formulated as

maximize {30x_1 + 20x_2 + 40x_3 + 25x_4 + 10x_5}

subject to

2x_1 + x_2 + 3x_3 + 3x_4 + x_5 + t = 7,
2x_1 + 2x_2 + x_3 + x_4 + x_5 + s = 10,
x_j ≥ 0, j = 1, …, 5, t, s ≥ 0.

With the objective scaled by 1/10, the first, second, and third LP tableaux are respectively:

            x1    x2    x3    x4    x5    t     s
Objective   −3    −2    −4   −5/2   −1    0     0 |    0
t            2     1    (3)    3     1    1     0 |    7
s            2     2     1     1     1    0     1 |   10

            x1    x2    x3    x4    x5    t     s
Objective  −1/3  −2/3    0    3/2   1/3  4/3    0 | 28/3
x3          2/3   1/3    1     1    1/3  1/3    0 |  7/3
s           4/3  (5/3)   0     0    2/3 −1/3    1 | 23/3

            x1    x2    x3    x4    x5    t     s
Objective   1/5    0     0    3/2   3/5  6/5   2/5 | 62/5
x3          2/5    0     1     1    1/5  2/5  −1/5 |  4/5
x2          4/5    1     0     0    2/5 −1/5   3/5 | 23/5

The final tableau provides the solution

x_2 = 23/5, x_3 = 4/5, x_j = 0 for j = 1, 4, 5.

Thus, the optimal profit ($12,400) is achieved by producing 46,000 parts of type 2 and 8,000 parts of type 3. □

3.G.3. Model of spreading of annual plants. Consider some plants that blossom at the beginning of summer, then produce seeds and die at the peak of summer. Some of the seeds burst into flowers at the end of the autumn; some survive the winter in the ground and burst into flowers at the start of the spring. The flowers that burst out in autumn and survive the winter are usually larger in the spring, and usually produce more seeds.
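The optimum found by the simplex tableaux in 3.G.2 can be cross-checked by brute force: with only two constraints, every vertex of the feasible region is a basic feasible solution, so we may simply enumerate them. This is a hedged sketch in numpy, not the simplex method itself.

```python
import numpy as np
from itertools import combinations

# maximize c @ x  subject to  A @ x <= b, x >= 0   (problem 3.G.2)
c = np.array([30., 20., 40., 25., 10.])
A = np.array([[2., 1., 3., 3., 1.],    # casting hours (per 100 units)
              [2., 2., 1., 1., 1.]])   # finishing hours
b = np.array([7., 10.])

# standard form with slacks: [A | I] z = b, z >= 0;
# enumerate all choices of 2 basic variables among 7
Aeq = np.hstack([A, np.eye(2)])
cz = np.concatenate([c, np.zeros(2)])
best_val, best_z = -np.inf, None
for basis in combinations(range(7), 2):
    B = Aeq[:, basis]
    if abs(np.linalg.det(B)) < 1e-12:
        continue                          # singular basis, no vertex
    zb = np.linalg.solve(B, b)
    if np.all(zb >= -1e-12):              # basic feasible solution
        z = np.zeros(7)
        z[list(basis)] = zb
        if cz @ z > best_val:
            best_val, best_z = cz @ z, z

print(best_val, best_z[:5])  # optimum 124 at x = (0, 4.6, 0.8, 0, 0)
```

The value 124 (in tens of dollars per 100 units, i.e. $12,400) and the point x_2 = 23/5, x_3 = 4/5 agree with the final tableau.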
After this, the whole cycle repeats itself. The year is thus divided into four parts, and in each of these parts we distinguish between some "forms" of the flower:

Part                    Stage
beginning of spring     small and big seedlings
beginning of summer     small, medium and big blossoming flowers
peak of summer          seeds
autumn                  seedlings and seeds

Denote by x_1(t) and x_2(t) the number of small and large seedlings, respectively, at the start of the spring of year t. Denote by y_1(t), y_2(t) and y_3(t) the number of small, medium and large flowers, respectively, in the summer of that year. From the small seedlings, either small or medium flowers grow; from the large seedlings, either medium or big flowers grow. Each of the seedlings can of course die (bad weather, being eaten by a cow, etc.), and then nothing grows out of it. Denote by b_ij the probability that a seedling of the j-th size, j = 1, 2, grows into a flower of the i-th size, i = 1, 2, 3. Then we have

0 < b_11 < 1, b_12 = 0, 0 < b_21 < 1, 0 < b_22 < 1, b_31 = 0, 0 < b_32 < 1, b_11 + b_21 < 1, b_22 + b_32 < 1.
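The spring-to-summer stage of this model is linear, so it can be sketched as a matrix–vector product. In the sketch below the transition probabilities b_ij are illustrative assumptions only (chosen to be consistent with the constraints above), not values from the text.

```python
import numpy as np

# hypothetical transition probabilities: column j = seedling size,
# row i = flower size; column sums are < 1, since seedlings may die
B = np.array([[0.5, 0.0],
              [0.3, 0.4],
              [0.0, 0.5]])

x = np.array([100., 40.])   # small and large seedlings in spring
y = B @ x                   # expected flowers in summer
print(y)                    # [50. 46. 20.]
```

Iterating such stage matrices over all four parts of the year composes them into a single year-on-year transition matrix, which is how models of this type are usually analyzed.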

(4) v_0 = √( g · ( h + √(h² + l²) ) ).

Thus we have shown that for the initial speed from (4), the player is able to score. During the free throw, supposing the player releases the ball at the height of 2 m, we have h = 1.05 m, l = 4.225 m, g = 9.80665 m·s⁻², and so the minimal initial speed of the ball is

v_0 = √( 9.80665 · ( 1.05 + √(1.05² + 4.225²) ) ) m·s⁻¹ = 7.28 m·s⁻¹.

The corresponding angle is then

φ = arctan( v_0² / (g·l) ) = arctan( 7.28² / (9.80665 · 4.225) ) = 0.907 rad ≈ 52°.

Let us think for a while about the obtained value of the angle φ for the initial speed v_0. According to the picture, we have 2β + (π − α) = π and α + γ = π/2, whence it follows that

5.3.7. Derivatives of the elementary functions. Consider the exponential function f(x) = a^x for any fixed real a > 0. If the derivative of a^x exists for all x, then

f′(x) = lim_{δx→0} ( a^{x+δx} − a^x ) / δx = a^x · lim_{δx→0} ( a^{δx} − 1 ) / δx = f′(0) · a^x.

On the other hand, if the derivative at zero exists, then this formula guarantees the existence of the derivative at any point of the domain and also determines its value. At the same time, the validity of this formula for one-sided derivatives is also verified. Unfortunately, it takes some time to verify that the derivatives of exponential functions indeed exist (see 5.4.2 and 6.3.7). There is an especially important base e, sometimes known as Euler's number, for which the derivative at zero equals one. Remember the formula (e^x)′ = e^x for a while and draw on its consequences. For the general exponential function, using the standard rules of differentiation,

(a^x)′ = (e^{x ln a})′ = ln(a) · e^{x ln a} = ln(a) · a^x.

Thus exponential functions are special in that their derivatives are proportional to their values. Next, we determine the derivative (ln(x))′.
The definition of the natural logarithm as the inverse to e^x allows the calculation

(1) (ln)′(y) = 1 / (e^x)′ = 1 / e^x = 1 / y,

where y = e^x. The formula

(2) (x^a)′ = a · x^{a−1}

for differentiating a general power function can also be derived using the derivatives of the exponential and logarithmic functions:

(x^a)′ = (e^{a ln x})′ = e^{a ln x} · (a ln x)′ = a · x^a / x = a · x^{a−1}.

5.3.8. Mean value theorems. Before continuing the journey of finding new interesting functions, we derive several simple statements about derivatives. The meaning of all of them is intuitively clear from the diagrams, and the proofs follow the visual imagination.

319 CHAPTER 5. ESTABLISHING THE ZOO
We assume that it was shot at time t = 0 from the point [0,0] and it will fall on the ground at the point [R,—h] at certain time t = t0, i. e. x(0) = 0, y(0) = 0, x(t0) = R, y(t0) = —h. Similarly to Halley's problem, we will consider the equations x1 (t) = vq cos p, y1 (t) = vq sin p — gt, £ £ (0, to) for the horizontal and vertical speeds of the bullet, where g is the gravity of Earth. We can continue as when solving the previous problem: by integrating these equations (taking x(0) = y(0) =0 into consideration), we get Theorem. Assume that the function f : R —> R is continuous on a closed bounded interval [a, b] and differentiable inside this interval. If f(a) = f(b), then there is a number c £ (a, 6) such that /'(c) = 0. t-0LLt!& TttEDZBM Proof. Since the function / is continuous on the closed interval (i.e. on a compact set), it attains its maximum and its minimum there. Either its maximum value is greater than f(a) = f(b), or the minimum value is less than f(a) = f(b), or / is constant. If the third case applies, the derivative is zero at all points of the interval (a, 6). If the second case applies, then the first case applies to the function —/.If the first case applies, it occurs at an interior point c. If /'(c) ^ 0 then the function / would be either increasing or decreasing at c (see 5.3.2), implying the existence of larger values than /(c) in a neighbourhood of c, contradicting that /(c) is a maximum value. □ 5.3.9. The latter result immediately implies the following corollary. Lagrange's mean value theorem Theorem. Assume the function f : R —> R is continuous on an interval [a, b] and differentiable at all points inside this interval. Then there is a number c G (a, 6) such that fib) - f(a) f(c) = M£AN VMMC- TtteoZEH 1 ^The French mathematician Michel Rolle (1652-1719) proved this theorem only for polynomials. The principle was perhaps known much earlier, but the rigorous proof comes from the 19th century only. 320 CHAPTER 5. 
x(t) = v_0 t cos φ,  y(t) = v_0 t sin φ − ½ g t²,  t ∈ (0, t_0),

and from the conditions lim_{t→t_0−} x(t) = x(t_0) = R, lim_{t→t_0−} y(t) = y(t_0) = −h, we then have

R = v_0 t_0 cos φ,  −h = v_0 t_0 sin φ − ½ g t_0².

From the first equation it follows that t_0 = R / (v_0 cos φ), so we can express the previous two equations by the single equation

(1) −h = R tan φ − g R² / (2 v_0² cos² φ),

where φ ∈ (0, π/2). Unlike in Halley's problem, the value of v_0 is given and R is variable (dependent on φ). So, actually, there is a function R = R(φ) (in the variable φ) which must satisfy (1); it is determined by the equation (1). Thus, this function is given implicitly. The equation (1) can be written as (R substituted by R(φ))

R(φ) tan φ · 2 v_0² cos² φ − g R²(φ) + h · 2 v_0² cos² φ = 0.

Using the relation 2 tan φ cos² φ = sin 2φ, we can transform (1) into the form

(2) R(φ) v_0² sin 2φ − g R²(φ) + 2 h v_0² cos² φ = 0.

Differentiating with respect to φ now gives

R′(φ) v_0² sin 2φ + 2 R(φ) v_0² cos 2φ − 2 g R(φ) R′(φ) − 2 h v_0² (2 cos φ sin φ) = 0,

i.e.

R′(φ) [ v_0² sin 2φ − 2 g R(φ) ] = −2 R(φ) v_0² cos 2φ + 2 h v_0² sin 2φ.

It suffices to verify that v_0² sin 2φ − 2 g R(φ) ≠ 0 for every φ ∈ (0, π/2). Let us suppose the contrary and substitute R = v_0² sin 2φ / (2g) = v_0² sin φ cos φ / g into (1), obtaining

−h = ( v_0² sin φ cos φ / g ) tan φ − ( g / (2 v_0² cos² φ) ) ( v_0² sin φ cos φ / g )² = v_0² sin² φ / g − v_0² sin² φ / (2g).

Simple rearrangements lead to

−h = v_0² sin² φ / (2g),

which cannot happen (the left side is surely negative while the right one is positive).

Proof. The proof is a simple statement of the geometrical meaning of the theorem: there is a tangent line of the graph which is parallel to the secant line between the points [a, f(a)] and [b, f(b)] (have a look at the diagram). The equation of the secant line is

y = g(x) = f(a) + ( (f(b) − f(a)) / (b − a) ) · (x − a).

The difference h(x) = f(x) − g(x) determines the (vertical) distance between the graph and the secant line (in the values of y). Surely h(a) = h(b), and

h′(x) = f′(x) − (f(b) − f(a)) / (b − a).

By the previous theorem, there is a point c at which h′(c) = 0.
□

The mean value theorem can also be written in the form

(1) f(b) = f(a) + f′(c)(b − a).

In the case of a parametrically given curve in the plane, i.e. a pair of functions y = f(t), x = g(t), the same result about the existence of a tangent line parallel to the secant line going through the boundary points is described by Cauchy's mean value theorem:

Cauchy's mean value theorem

Corollary. Let the functions y = f(t) and x = g(t) be continuous on an interval [a, b] and differentiable inside this interval, and further let g′(t) ≠ 0 for all t ∈ (a, b). Then there is a point c ∈ (a, b) such that

( f(b) − f(a) ) / ( g(b) − g(a) ) = f′(c) / g′(c).

Proof. Put

h(t) = (f(b) − f(a)) g(t) − (g(b) − g(a)) f(t).

Then h(a) = f(b)g(a) − f(a)g(b) and h(b) = f(b)g(a) − f(a)g(b), so by Rolle's theorem there is a number c ∈ (a, b) such that h′(c) = 0. Finally, the function g is either strictly increasing or strictly decreasing on [a, b], and thus g(b) ≠ g(a). Moreover, g′(c) ≠ 0, and the desired formula follows. □

5.3.10. A reasoning similar to the one in the above proof leads to a supremely useful tool for calculating limits of quotients of functions.

So we were able to determine R′(φ). Since the function R(φ) is positive and differentiable at every point of the interval (0, π/2), and the value of R decreases both for φ → 0+ and for φ → π/2−, it has its maximum at a point where its derivative is zero. This means that R(φ) can be maximal only if

(3) R(φ) = h tan 2φ.

Let us thus substitute (3) into (2). We obtain

h tan 2φ · v_0² sin 2φ − g h² tan² 2φ + 2 h v_0² cos² φ = 0,

and let us transform this equation:

tan 2φ · v_0² sin 2φ + 2 v_0² cos² φ = g h tan² 2φ,
v_0² sin² 2φ + v_0² (1 + cos 2φ) cos 2φ = g h sin² 2φ / cos 2φ,
v_0² (1 − cos 2φ) + v_0² cos 2φ = g h (1 − cos 2φ) / cos 2φ,
v_0² cos 2φ = g h (1 − cos 2φ),
cos 2φ (v_0² + g h) = g h.

Hence

cos 2φ_0 = g h / (v_0² + g h),  sin 2φ_0 = √(v_0⁴ + 2 g h v_0²) / (v_0² + g h),

and we have

R(φ_0) = h tan 2φ_0 = h · √(v_0⁴ + 2 g h v_0²) / (g h) = (v_0 / g) √(v_0² + 2 g h).

Let, for instance, javelin thrower Barbora Spotakova give a javelin the speed v_0 = 27.778 m/s = 100 km/h at the height h = 1.8 m (with g = 9.80665 m·s⁻²).
Then the javelin can fly up to the distance

R(φ_0) = ( 27.778 / 9.80665 ) · √( 27.778² + 2 · 9.80665 · 1.8 ) m = 80.46 m.

This distance is achieved for

φ_0 = ½ arccos( 9.80665 · 1.8 / (27.778² + 9.80665 · 1.8) ) ≈ 0.7742 rad ≈ 44.36°.

However, the world record of Barbora Spotakova does not even approach 80 m, although the impact of other phenomena (air resistance, for example) can be neglected. Still, we must not forget that from 1 April 1999, the centre of gravity of the women's javelin was moved towards its tip upon the decision of the IAAF (International Association of Athletics Federations). This reduced the flight distance by around 10 %.

Theorem. Suppose f and g are functions differentiable on some neighbourhood of a point x_0 ∈ R, yet not necessarily at x_0 itself. Suppose

lim_{x→x_0} f(x) = 0,  lim_{x→x_0} g(x) = 0.

If the limit

lim_{x→x_0} f′(x) / g′(x)

exists, then the limit

lim_{x→x_0} f(x) / g(x)

also exists, and the two limits are equal.¹²

Proof. Without loss of generality, the functions f and g are zero at the point x_0. The quotient of the values then corresponds to the slope of the secant line between the points [0, 0] and [g(x), f(x)]. At the same time, the quotient of the derivatives corresponds to the slope of the tangent line at the given point. Thus it is necessary to verify that the existence of the limit of the slopes of the tangent lines implies the existence of the limit of the slopes of the secant lines. Technically, we can use the mean value theorem in Cauchy's parametric form. First of all, the existence of the expression f′(x)/g′(x) on some neighbourhood of the point x_0 (excluding x_0 itself) is implicitly assumed. Thus, especially for points c sufficiently close to x_0, g′(c) ≠ 0.¹³ By the mean value theorem,

lim_{x→x_0} f(x)/g(x) = lim_{x→x_0} ( f(x) − f(x_0) ) / ( g(x) − g(x_0) ) = lim_{x→x_0} f′(c_x) / g′(c_x),

¹² Guillaume François Antoine, Marquis de l'Hôpital (1661–1704), became famous for his textbook on calculus. This rule was first published there, perhaps originally proved by one of the famous Bernoulli brothers.
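The closed-form optimum obtained for the javelin throw can be checked numerically. This is a hedged sketch: the helper `R` below recomputes the range by solving y(t) = −h for the flight time, it is not a function from the text.

```python
import math

v0, h, g = 27.778, 1.8, 9.80665      # javelin data from the example

def R(phi):
    # flight time: positive root of v0*t*sin(phi) - g*t^2/2 = -h
    s = v0 * math.sin(phi)
    t = (s + math.sqrt(s * s + 2 * g * h)) / g
    return v0 * t * math.cos(phi)

phi0 = 0.5 * math.acos(g * h / (v0**2 + g * h))   # optimal angle
Rmax = (v0 / g) * math.sqrt(v0**2 + 2 * g * h)    # maximal distance

assert abs(R(phi0) - Rmax) < 1e-9                 # the formulas agree
# a grid search over (0, pi/2) confirms this is the maximum
best = max(R(k * math.pi / 2000) for k in range(1, 1000))
assert abs(best - Rmax) < 1e-3

print(round(Rmax, 2), round(math.degrees(phi0), 2))  # ~80.46 m at ~44.36 deg
```

The grid search is deliberately crude; it only verifies that no angle beats the closed-form optimum by more than the grid resolution allows.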
¹³ This is not always necessary for the existence of the limit in a general sense. Nevertheless, for the statement of l'Hospital's rule, it is. A thorough discussion can be found (googled) in the popular article 'R. P. Boas, Counterexamples to L'Hospital's Rule, The American Mathematical Monthly, October 1986, Volume 93, Number 8, pp. 644–645.'

The original record (with the "correctly balanced" javelin) was 80.00 m. The performed reasoning and the obtained result can be applied to other athletic disciplines and sports. In golf, for instance, h is close to 0, and thus it is just the angle

φ_0 = lim_{h→0+} ½ arccos( g h / (v_0² + g h) ) = ½ arccos 0 = π/4 rad = 45°

at which the ball falls at the greatest distance

R = lim_{h→0+} (v_0 / g) √(v_0² + 2 g h) = v_0² / g.

Let us realize that our calculation cannot be used for h = 0 (φ_0 = π/4) directly, since then we would get the undefined expression tan(π/2) for the distance R. However, we have solved the problem for any h > 0, and therefore we could get a helping hand from the corresponding one-sided limit. □

5.F.9. Regiomontanus' problem, 1471. In the museum, there is a painting on the wall. Its lower edge is a meters above the ground and its upper edge b meters (its height thus equals b − a). A tourist is looking at the painting, her eyes being at height h < a meters above the ground. (The reason for the inequality h < a can, for instance, be to allow several rows of equally tall visitors to view the painting simultaneously.) How far from the wall should the tourist stand if she wants to maximize her angle of view of the painting?

Solution. Let us denote by x the distance (in meters) of the tourist from the wall and by φ her angle of view of the painting. Further, let us set (see the picture) the angles α, β ∈ (0, π/2) by

tan α = (b − h) / x,  tan β = (a − h) / x.

Our task is to maximize φ = α − β. Let us add that for h > b one can proceed analogously, and for h ∈ [a, b] the angle φ increases as x decreases (φ = π for x = 0 and h ∈ (a, b)).
where c_x is a number lying between x_0 and x, dependent on x. From the existence of the limit

lim_{x→x_0} f′(x) / g′(x),

it follows that this value is shared by the limit of any sequence created by substituting values x = x_n approaching x_0 into f′(x)/g′(x) (cf. the convergence test 5.2.15). Especially, we can substitute any sequence c_{x_n} for x_n → x_0, and thus the limit

lim_{x→x_0} f′(c_x) / g′(c_x)

exists, and the last two limits are equal. Hence the desired limit exists and has the same value. □

From the proof of the theorem, it is true for one-sided limits as well.

5.3.11. Corollaries. L'Hospital's rule can easily be extended to limits at the improper points ±∞ and to the case of infinite values of the limits. If, for instance,

lim_{x→∞} f(x) = 0,  lim_{x→∞} g(x) = 0,

then lim_{x→0+} f(1/x) = 0 and lim_{x→0+} g(1/x) = 0. At the same time, from the existence of the limit of the quotient of the derivatives at infinity,

lim_{x→0+} ( f(1/x) )′ / ( g(1/x) )′ = lim_{x→0+} ( f′(1/x)·(−1/x²) ) / ( g′(1/x)·(−1/x²) ) = lim_{x→0+} f′(1/x) / g′(1/x) = lim_{x→∞} f′(x) / g′(x).

Applying the previous theorem, the limit

lim_{x→∞} f(x) / g(x) = lim_{x→0+} f(1/x) / g(1/x) = lim_{x→∞} f′(x) / g′(x)

exists in this case as well.

The limit calculation is even simpler in the case when

lim_{x→x_0} f(x) = ±∞,  lim_{x→x_0} g(x) = ±∞.

Then it suffices to write

lim_{x→x_0} f(x) / g(x) = lim_{x→x_0} ( 1/g(x) ) / ( 1/f(x) ),

which is already a case for the use of l'Hospital's rule from the previous theorem. It can be proved that l'Hospital's rule has the same form for infinite limits as well:

Theorem. Let f and g be functions differentiable on some neighbourhood of a point x_0 ∈ R, not necessarily at x_0 itself. Further, let the limits lim_{x→x_0} f(x) = ±∞ and lim_{x→x_0} g(x) = ±∞ exist. If the limit

lim_{x→x_0} f′(x) / g′(x)

exists, then the limit

lim_{x→x_0} f(x) / g(x)

also exists, and they equal each other.

From the condition h < a it follows that the angle

f′(x) > 0 for x ∈ ( 0, √((b − h)(a − h)) ),  f′(x) < 0 for x ∈ ( √((b − h)(a − h)), +∞ ).

Hence the function f has its global maximum at the point x_0 = √((b − h)(a − h)) (recall the inequalities h < a < b). The point x_0 can, of course, be determined by other means. For instance, instead of looking for the maximum of the positive function f on the interval (0, +∞), we can try to find the global minimum of the function

g(x) = 1 / f(x) = ( x² + (b − h)(a − h) ) / ( x(b − a) ),  x ∈ (0, +∞),

with the help of the so-called AM–GM inequality (between the arithmetic and geometric means)

Proof. Apply the mean value theorem. The key step is to express the quotient in a form where the derivative arises:

f(x)/g(x) = ( f(x) / (f(x) − f(y)) ) · ( (f(x) − f(y)) / (g(x) − g(y)) ) · ( (g(x) − g(y)) / g(x) ),

where y is fixed, from a selected neighbourhood of x_0, and x is approaching x_0. Since the limits of f and g at x_0 are infinite, we can surely assume that the differences of the values of both functions at x and y, having fixed y, are non-zero. Using the mean value theorem, replace the fraction in the middle with the quotient of the derivatives at an appropriate point c between x and y. The expression of the examined limit thus gets the form

f(x)/g(x) = ( (1 − g(y)/g(x)) / (1 − f(y)/f(x)) ) · ( f′(c) / g′(c) ),

where c depends on both x and y. As x approaches x_0, the former fraction converges to one. If y is simultaneously moved towards x_0, the latter fraction becomes arbitrarily close to the limit value of the quotient of the derivatives. □

5.3.12. Example. By making suitable modifications of the examined expressions, one can also apply l'Hospital's rule to forms of the types ∞ − ∞, 1^∞, 0 · ∞, and so on. Often one simply rearranges the expressions or uses some continuous function, for instance the exponential one. For an illustration of such a procedure, we show the connection between the arithmetic and geometric means of n non-negative values x_i.
The arithmetic mean

M(x_1, …, x_n) = ( x_1 + ⋯ + x_n ) / n

is a special case of the power mean with exponent r, also known as the generalized mean:

M_r(x_1, …, x_n) = ( ( x_1^r + ⋯ + x_n^r ) / n )^{1/r}.

The special value M_{−1} is called the harmonic mean. Calculate the limit value of M_r for r approaching zero. For this purpose, determine the limit of ln M_r by l'Hospital's rule (we treat it as an expression of the form 0/0 and differentiate with respect to r, with the x_i as constant parameters). The following calculation uses the chain rule and the knowledge of the derivative of the power function; it must be read in reverse — the existence of the last limit implies the existence of the last-but-one, and so on:

lim_{r→0} ln( M_r(x_1, …, x_n) ) = lim_{r→0} ( ln( (x_1^r + ⋯ + x_n^r)/n ) ) / r
 = lim_{r→0} ( x_1^r ln x_1 + ⋯ + x_n^r ln x_n ) / ( x_1^r + ⋯ + x_n^r )
 = ( ln x_1 + ⋯ + ln x_n ) / n = ln ⁿ√(x_1 ⋯ x_n).

Hence

lim_{r→0} M_r(x_1, …, x_n) = ⁿ√(x_1 ⋯ x_n),

the geometric mean.

( y_1 + y_2 ) / 2 ≥ √( y_1 y_2 ),  y_1, y_2 > 0,

where the equality occurs iff y_1 = y_2. The choice

y_1(x) = x / (b − a),  y_2(x) = (b − h)(a − h) / ( x(b − a) )

then gives

g(x) = y_1(x) + y_2(x) ≥ 2 √( y_1(x) y_2(x) ) = 2 √( (b − h)(a − h) ) / (b − a).

Therefore, if there is a number x > 0 for which y_1(x) = y_2(x), then the function g has its global minimum at x. The equation y_1(x) = y_2(x), i.e.

x / (b − a) = (b − h)(a − h) / ( x(b − a) ),

has the unique positive solution x_0 = √((b − h)(a − h)). We have determined the ideal distance of the tourist from the wall in two different ways. The angle corresponding to x_0 is

φ_0 = arctan( (b − a) / ( 2 √((b − h)(a − h)) ) ).

Lemma (Bernoulli's inequality). For every real number b ≥ −1 and every natural number n ≥ 2,

(1 + b)^n ≥ 1 + n b.

Proof. For n = 2,

(1 + b)² = 1 + 2b + b² ≥ 1 + 2b.

Proceed by induction on n, supposing b > −1. Assume that the proposition holds for some k ≥ 2 and calculate:

(1 + b)^{k+1} = (1 + b)^k (1 + b) ≥ (1 + k b)(1 + b) = 1 + (k + 1) b + k b² ≥ 1 + (k + 1) b.

The statement is, of course, true for b = −1 as well. □

The ray travels from A to B through the point R (given by the value x) where the ray refracts. The distance between the points A and R is √(h_1² + x²); between the points R and B it is √(h_2² + (d − x)²).
The total time of the transmission of energy between the points $A$ and $B$ is thus given by the function
$$T(x) = \frac{\sqrt{h_1^2 + x^2}}{v_1} + \frac{\sqrt{h_2^2 + (d-x)^2}}{v_2}$$
in the variable $x \in [0,d]$. Let us emphasize that we want to find the point $x \in [0,d]$ at which the value $T(x)$ is minimal. The derivative
$$T'(x) = \frac{x}{v_1\sqrt{h_1^2 + x^2}} - \frac{d-x}{v_2\sqrt{h_2^2 + (d-x)^2}}$$
is a continuous function on the interval $[0,d]$, so its sign can easily be described by its zero points. From the equation $T'(x) = 0$ it follows that
$$\frac{x}{v_1\sqrt{h_1^2 + x^2}} = \frac{d-x}{v_2\sqrt{h_2^2 + (d-x)^2}}.$$
This expression is useful for us because (see the picture)
$$\sin\varphi_1 = \frac{x}{\sqrt{h_1^2 + x^2}}, \qquad \sin\varphi_2 = \frac{d-x}{\sqrt{h_2^2 + (d-x)^2}}.$$
Thus there is at most one stationary point; it is determined by
$$(1)\qquad \sin\varphi_1 = \frac{v_1}{v_2}\,\sin\varphi_2.$$
Let us realize that as $\varphi_1 \in [0,\pi/2]$ increases (when $x$ increases), the angle $\varphi_2 \in [0,\pi/2]$ decreases. The sine is non-negative and increasing on the interval $[0,\pi/2]$, so the quotient $(\sin\varphi_1)/(\sin\varphi_2)$ is increasing with respect to $x$. Since $T'(0) < 0$ and $T'(d) > 0$, there is exactly one stationary point $x_0$. From the inequalities $T'(x) < 0$ for $x \in [0,x_0)$ and $T'(x) > 0$ for $x \in (x_0,d]$, it follows that the global minimum is at the stationary point $x_0$.

Let us summarize the preceding: the ray is given by the point $R$ of refraction (i.e. the value $x_0$), and the point $R$ is given by the identity (1), which is called Snell's law in physics. The quotient of $v_1$ and $v_2$ is constant for the given homogeneous media and determines an important quantity which describes the interface of optical media. It is called a refractive index and denoted by $n$. Usually, the first medium is vacuum, i.e. $v_1 = c$ and $v_2 = v$, thus obtaining the (absolute) index of refraction $n = c/v$. For vacuum we get $n = 1$, of course. This value is also used for air, since its refractive index at standard conditions (i.e. pressure of 101 325 Pa, temperature of 293 K and absolute humidity of 0.9 g m$^{-3}$) is

Now, for the sequence $a_n = \left(1 + \frac{1}{n}\right)^n$,
$$\frac{a_n}{a_{n-1}} = \frac{\left(1+\frac{1}{n}\right)^n}{\left(1+\frac{1}{n-1}\right)^{n-1}} = \frac{(n^2-1)^n}{n^{2n}}\cdot\frac{n}{n-1} = \left(1 - \frac{1}{n^2}\right)^n \frac{n}{n-1} > \left(1 - \frac{1}{n}\right)\frac{n}{n-1} = 1,$$
by using Bernoulli's inequality with $b = -\frac{1}{n^2}$.
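The refraction problem above can be checked numerically: minimize the travel time $T$ and compare the ratio of the sines with $v_1/v_2$. The data below are hypothetical sample values, not taken from the text:

```python
import math

h1, h2, d = 1.0, 1.0, 2.0   # hypothetical geometry of the two points
v1, v2 = 3.0, 2.0           # hypothetical speeds in the two media

def T(x):
    # total travel time A -> R -> B when the ray crosses the interface at x
    return math.sqrt(h1*h1 + x*x)/v1 + math.sqrt(h2*h2 + (d - x)**2)/v2

lo, hi = 0.0, d             # ternary search; T is strictly unimodal on [0, d]
for _ in range(200):
    m1, m2 = lo + (hi - lo)/3, hi - (hi - lo)/3
    if T(m1) < T(m2):
        hi = m2
    else:
        lo = m1
x0 = (lo + hi) / 2

sin1 = x0 / math.sqrt(h1*h1 + x0*x0)
sin2 = (d - x0) / math.sqrt(h2*h2 + (d - x0)**2)
print(sin1 / sin2, v1 / v2)  # Snell's law: the two ratios agree
```

Ternary search is applicable here because $T$ is a sum of strictly convex functions, hence strictly unimodal on $[0,d]$.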
So $a_n > a_{n-1}$ for all natural numbers $n$, and it follows that the sequence $a_n$ is indeed increasing. The following similar calculation (also using Bernoulli's inequality) verifies that the sequence of numbers
$$b_n = \left(1 + \frac{1}{n}\right)^{n+1}$$
is decreasing. Notice that $b_n > a_n$. Also
$$\frac{b_n}{b_{n+1}} = \frac{\left(1+\frac{1}{n}\right)^{n+1}}{\left(1+\frac{1}{n+1}\right)^{n+2}} = \left(\frac{n^2+2n+1}{n^2+2n}\right)^{n+1}\frac{n+1}{n+2} = \left(1 + \frac{1}{n(n+2)}\right)^{n+1}\frac{n+1}{n+2} > \left(1 + \frac{n+1}{n(n+2)}\right)\frac{n+1}{n+2} > \left(1 + \frac{1}{n+1}\right)\frac{n+1}{n+2} = 1.$$
Thus the sequence $a_n$ is increasing and bounded from above, so the set of its terms has a supremum, which equals the limit of the sequence. At the same time, this value is also the limit of the decreasing sequence $b_n$, because
$$\lim_{n\to\infty} b_n = \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)a_n = \lim_{n\to\infty} a_n.$$
This limit determines one of the most important numbers in mathematics (besides the numbers 0, 1, and $\pi$), namely Euler's number$^{14}$ e. Thus
$$e = \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^n.$$

5.4.2. Power series for $e^x$. The exponential function has been defined as the only continuous function satisfying $f(1) = e$ and $f(x+y) = f(x)\cdot f(y)$. The base $e$ is now expressed as the limit of the sequence $a_n$, thus necessarily
$$e^x = \lim_{n\to\infty}(a_n)^x.$$
Fix a real number $x \ne 0$. If we replace $n$ with $n/x$ in the numbers $a_n$ from the previous paragraph, we arrive again at the same limit. (Think this out in detail!) Hence
$$e^x = \lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n.$$

$^{14}$ The ingenious Swiss mathematician, physicist, astronomer, logician and engineer Leonhard Euler (1707-1783) was behind extremely many inventions, including original mathematical techniques and tools.

$n = 1.000272$. Other media have $n > 1$ ($n = 1.31$ for ice, $n = 1.33$ for water, $n = 1.5$ for glass). However, the refractive index also depends on the wave length of the electromagnetic radiation in question (for example, for water and light it ranges from $n = 1.331$ to $n = 1.344$), where the index ordinarily decreases as the wave length increases. The speed of light in an optical medium having $n > 1$ depends on its frequency. We talk about the dispersion of light.
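The monotone bracketing of Euler's number by $a_n$ and $b_n$ described above is easy to observe numerically (a sketch only; `math.e` serves as the reference value):

```python
import math

def a(n):  # increasing lower sequence (1 + 1/n)^n
    return (1 + 1/n)**n

def b(n):  # decreasing upper sequence (1 + 1/n)^(n+1)
    return (1 + 1/n)**(n + 1)

for n in (1, 10, 100, 1000):
    print(n, a(n), b(n))      # each pair a_n < e < b_n brackets e
print(math.e)
```

The widths of the brackets, $b_n - a_n = a_n/n$, shrink to zero, which is exactly why both sequences share the limit $e$.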
The dispersion causes rays of different colours to refract at different angles (the violet ray refracts the most and the red ray the least). This is also the origin of a rainbow. We can further recall the well-known Newton's experiment with a glass prism from 1666. Eventually, let us remark that our task always has a solution, because we can choose the point $R$ arbitrarily. If, together with the speeds $v_1$ and $v_2$, the angle $\varphi_1$ were given as well (our task could then be to calculate where the ray going from the point $A$ intersects the line $y = c$ for a certain $c < 0$ when the interface of the optical media is on the $x$-axis), then the angle $\varphi_2 \in (0, \pi/2)$ satisfying (1) might not exist. This corresponds to the total reflection (there is no refracted light at all). $\square$

Further miscellaneous problems concerning extrema of functions of a real variable can be found at ??

Denote the $n$-th term of this sequence by $u_n(x) = \left(1 + \frac{x}{n}\right)^n$ and express it by the binomial theorem:
$$(1)\qquad u_n(x) = 1 + n\,\frac{x}{n} + \frac{n(n-1)}{2!}\,\frac{x^2}{n^2} + \dots = 1 + x + \frac{1}{2!}\left(1 - \frac{1}{n}\right)x^2 + \frac{1}{3!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)x^3 + \dots + \frac{1}{n!}\left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{n-1}{n}\right)x^n.$$
Look at $u_n(x)$ for very large $n$. It seems that many of the first summands of $u_n(x)$ will be fairly close to the values $\frac{1}{k!}x^k$, $k = 0, 1, \dots$. Thus it is plausible that the numbers $u_n(x)$ should be very close to $v_n(x) = \sum_{k=0}^{n}\frac{1}{k!}x^k$, and thus both these sequences should have the same limit. The following theorem is perhaps one of the most important results of Mathematics:

The power series for $e^x$

Theorem. The exponential function $e^x$ equals, for each $x \in \mathbb{R}$, the limit $\lim_{k\to\infty} v_k(x)$ of the partial sums in the expression
$$e^x = 1 + x + \frac{1}{2}x^2 + \dots + \frac{1}{n!}x^n + \dots = \sum_{n=0}^{\infty}\frac{1}{n!}x^n.$$
The function $e^x$ is differentiable at each point $x$ and its derivative is $(e^x)' = e^x$.

5.F.11. Prove that the polynomial $P(x) = x^5 - x^4 + 2x^3 - x^2 + x + 1$ has exactly one real root.

Solution.
Any polynomial of odd degree has at least one real root: for negative $x$ of large absolute value, the values $P(x)$ are negative with large absolute value; for large positive $x$, the values $P(x)$ are large and positive; and since $P(x)$ is a continuous function, it must attain the value zero. One can also argue with the fundamental theorem of algebra (12.2.8). We know that the polynomial $P(x)$ has five roots over the field of complex numbers and that complex roots of a polynomial with real coefficients come in pairs of conjugate numbers. Therefore the polynomial has at least one real root. If there were at least two real roots $a < b$, then according to the mean value theorem (Rolle's version suffices, cf. 5.3.8) there would be a $c \in (a,b)$ such that $P'(c) = 0$. But
$$P'(x) = 5x^4 - 4x^3 + 6x^2 - 2x + 1 = 2x^2(x-1)^2 + 3x^4 + 3x^2 + (x-1)^2 > 0.$$
Thus the polynomial has exactly one real root. $\square$

Proof. The technical proof makes the above idea precise. Fix $x$ and recall that $v_n(x)$ is the sequence defined as the sums of the first $n$ terms of the formal infinite expression
$$\sum_{j=0}^{\infty} c_j, \qquad c_j = \frac{1}{j!}x^j.$$
While $v_n(x)$ is strictly increasing if $x > 0$, for all $m > n$,
$$|v_m(x) - v_n(x)| \le \sum_{k=n+1}^{m}\frac{1}{k!}|x|^k = v_m(|x|) - v_n(|x|).$$
Hence $v_n(x)$ is a Cauchy sequence, and thus convergent, if $v_n(x)$ is always bounded (and thus convergent) for $x > 0$. The quotient of adjacent terms in the series is $c_{j+1}/c_j = x/(j+1)$. Thus for every fixed $x$, there is a number $N \in \mathbb{N}$ such that $|c_{j+1}/c_j| < 1/2$ for all $j \ge N$. However, such large indices $j$ satisfy
$$|c_{j+1}| \le \frac{1}{2}|c_j| \le 2^{-(j-N+1)}|c_N|.$$
Recall that the sum of a geometric series is computed from the equality $(1-q)(1 + q + \dots + q^k) = 1 - q^{k+1}$. This means that the partial sums of the first $n > N$ terms of our formal sum with $x > 0$ can be estimated as follows:
$$\left|v_n(x) - \sum_{j=0}^{N}\frac{1}{j!}x^j\right| \le \sum_{k=1}^{n-N}2^{-k}|c_N| \le |c_N|.$$

CHAPTER 5. ESTABLISHING THE ZOO

5.F.12. Let $f : \mathbb{R} \to (0, \infty)$ be a continuously differentiable function. Prove that there exists $\xi \in (0,1)$ such that
$$e^{f'(\xi)}\,f(0)^{f(\xi)} = f(1)^{f(\xi)}.$$
Solution.
We can equivalently transform the equation:
$$e^{f'(\xi)}\,f(0)^{f(\xi)} = f(1)^{f(\xi)} \iff e^{\frac{f'(\xi)}{f(\xi)}} = \frac{f(1)}{f(0)} \iff \frac{f'(\xi)}{f(\xi)} = \ln f(1) - \ln f(0).$$
The existence of such a $\xi$ is guaranteed by Lagrange's mean value theorem (cf. 5.3.9) for the function $g(x) = \ln(f(x))$ (there, $g'(x) = \frac{f'(x)}{f(x)}$). $\square$

G. L'Hospital's rule

5.G.1. Verify that the limit
(a) $\lim_{x\to 0}\dfrac{\sin(2x) - 2\sin x}{2e^x - x^2 - 2x - 2}$ is of the type $\dfrac{0}{0}$;
(b) $\lim_{x\to 0^+}\dfrac{\ln x}{\cot x}$ is of the type $\dfrac{\infty}{\infty}$;
(c) $\lim_{x\to 1^+}\left(\dfrac{x}{x-1} - \dfrac{1}{\ln x}\right)$ is of the type $\infty - \infty$;
(d) $\lim_{x\to 1^+}\left(\ln(x-1)\cdot\ln x\right)$ is of the type $0\cdot\infty$;
(e) $\lim_{x\to 0^+}(\cot x)^{\frac{1}{\ln x}}$ is of the type $\infty^0$;
(f) $\lim_{x\to 0}\left(\dfrac{\sin x}{x}\right)^{\frac{1}{x^2}}$ is of the type $1^\infty$;
(g) $\lim_{x\to 1^-}\left(\cos\dfrac{\pi x}{2}\right)^{\ln x}$ is of the type $0^0$.
Then calculate it using l'Hospital's rule.

Solution. We can immediately assert that
(a) $\lim_{x\to 0}(\sin(2x) - 2\sin x) = 0 - 0 = 0$, $\quad \lim_{x\to 0}(2e^x - x^2 - 2x - 2) = 2 - 0 - 0 - 2 = 0$;
(b)

In particular, the limit of the expressions on the right-hand side for $n$ approaching infinity surely exists, and so the limit of the increasing sequence $v_n$ also exists. Now examine the sequence of numbers $u_n$, whose limit is $e^x$. Consider $n > N$ for some fixed $N$ (imagine $N$ is already very large) and choose a fixed number $k < N$. Write $u_{n,k}$ for the sum of the first $k$ summands in the expression (1) for $u_n$. Having fixed $x$ and some $\varepsilon > 0$, we can choose $k$ big enough to ensure $|u_{n,k} - u_n| < \varepsilon$ (recall $u_n$ is a convergent and thus Cauchy sequence). Indeed, the absolute values of the omitted terms are less than those in $v_n$. If $N$ is large enough, then $|u_{n,k} - v_k| < \varepsilon$ for all $n > N$. Indeed, there is only a fixed number of brackets in the summands of $u_{n,k}$, and they will all be arbitrarily close to 1 if $n$ is large enough. Summarizing, for each $\varepsilon > 0$ the choices of $k$ and $n$ lead to the estimate
$$|v_k - u_n| \le |v_k - u_{n,k}| + |u_{n,k} - u_n| < 2\varepsilon.$$
Choosing a sequence $\varepsilon_i = 1/2^i$, we find subsequences $v_{k_i}$ and $u_{n_i}$ satisfying $|v_{k_i} - u_{n_i}| < 2\varepsilon_i = \frac{1}{2^{i-1}}$. Thus the two convergent sequences must both have the same limit:
$$\lim_{k\to\infty} v_k = \lim_{n\to\infty} u_n.$$
This is the first claim we had to prove.
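The claim just proved, that $u_n(x) = (1 + x/n)^n$ and the partial sums $v_n(x)$ share the limit $e^x$, can be watched numerically (a sketch; note how much faster $v_n$ converges):

```python
import math

def u(n, x):
    # u_n(x) = (1 + x/n)^n
    return (1 + x/n)**n

def v(n, x):
    # v_n(x) = sum_{k=0}^{n} x^k / k!, built term by term
    s, term = 1.0, 1.0
    for k in range(1, n + 1):
        term *= x / k
        s += term
    return s

x = 1.5
print(u(10**6, x))   # slow convergence in n
print(v(30, x))      # very fast convergence in n
print(math.exp(x))   # the common limit e^x
```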
Recall that we already know that $(e^x)' = e^x$ if and only if the derivative equals 1 at the origin, see 5.3.7. Thus, it remains to compute
$$\lim_{x\to 0}\frac{e^x - 1}{x} = \lim_{x\to 0}\frac{1}{x}\left(x + \frac{1}{2}x^2 + \frac{1}{6}x^3 + \dots\right).$$
This seems to be tricky, since there are two limit processes involved (notice that an $x$ may be cancelled, since this is a constant in the inner limit):
$$\lim_{x\to 0}\left(\lim_{n\to\infty}\sum_{k=1}^{n}\frac{1}{k!}x^{k-1}\right) = 1 + \lim_{x\to 0}\left(\lim_{n\to\infty}\sum_{k=2}^{n}\frac{1}{k!}x^{k-1}\right).$$
Next, for each $\varepsilon > 0$ we can find $N$ such that $\left|\sum_{k=N}^{n}\frac{1}{k!}x^{k-1}\right| \le \sum_{k=N}^{n}\frac{1}{k!} < \varepsilon$ for all $n$ and all $x \in [-1,1]$. Then we restrict the interval for $x$ enough to ensure that the remaining first terms $\sum_{k=2}^{N-1}\frac{1}{k!}x^{k-1}$ are smaller than $\varepsilon$, too. This shows that the limit expression on the right-hand side must be zero. Thus the derivative exists and equals one, as expected. $\square$

Readers who skipped the preceding paragraphs (whether on purpose or out of need) can stay calm: we deduce all the results on the exponential function again later, using more general tools. In particular, we will see that all power series are differentiable and can be differentiated term by term. We will also see that the conditions $f'(x) = f(x)$ and $f(0) = 1$ determine a function uniquely.

$$\lim_{x\to 0^+}\ln x = -\infty, \qquad \lim_{x\to 0^+}\cot x = +\infty;$$
(c) $\lim_{x\to 1^+}\dfrac{x}{x-1} = +\infty$, $\quad \lim_{x\to 1^+}\dfrac{1}{\ln x} = +\infty$;
(d) $\lim_{x\to 1^+}\ln x = 0$, $\quad \lim_{x\to 1^+}\ln(x-1) = -\infty$;
(e) $\lim_{x\to 0^+}\cot x = +\infty$, $\quad \lim_{x\to 0^+}\dfrac{1}{\ln x} = 0$;
(f) $\lim_{x\to 0}\dfrac{\sin x}{x} = 1$, $\quad \lim_{x\to 0}\dfrac{1}{x^2} = +\infty$;
(g) $\lim_{x\to 1^-}\cos\dfrac{\pi x}{2} = 0$, $\quad \lim_{x\to 1^-}\ln x = 0$.

The case (a). Applying l'Hospital's rule transforms the limit into the limit
$$\lim_{x\to 0}\frac{2\cos(2x) - 2\cos x}{2e^x - 2x - 2},$$
which is of the type $0/0$. Two more applications of the rule lead to
$$\lim_{x\to 0}\frac{-4\sin(2x) + 2\sin x}{2e^x - 2}$$
and (the above limit is also of the type $0/0$)
$$\lim_{x\to 0}\frac{-8\cos(2x) + 2\cos x}{2e^x} = \frac{-8 + 2}{2} = -3.$$
Altogether, we have (returning to the original limit)
$$\lim_{x\to 0}\frac{\sin(2x) - 2\sin x}{2e^x - x^2 - 2x - 2} = -3.$$
Let us remark that multiple application of l'Hospital's rule in one exercise is quite common. From now on, we will set the limits of the quotients of derivatives obtained by l'Hospital's rule equal to the limits of the original quotients. We may do this only if the obtained limits on the right-hand sides really exist, i.e. we actually make sure that what we write is meaningful only afterwards.

The case (b). This time, differentiation of the numerator and the denominator gives
$$\lim_{x\to 0^+}\frac{\ln x}{\cot x} = \lim_{x\to 0^+}\frac{\frac{1}{x}}{-\frac{1}{\sin^2 x}} = \lim_{x\to 0^+}\frac{-\sin^2 x}{x}.$$
The last limit can be determined easily (we even know it). From
$$\lim_{x\to 0^+}(-\sin x) = 0, \qquad \lim_{x\to 0^+}\frac{\sin x}{x} = 1,$$

5.4.3. Number series. When deriving the previous theorems about the function $e^x$, we automatically used several extraordinarily useful concepts and tools. Now we come back to them in detail:

Infinite number series

Definition. An infinite series of numbers is an expression
$$\sum_{n=0}^{\infty} a_n = a_0 + a_1 + a_2 + \dots + a_k + \dots,$$
where the $a_n$ are real or complex numbers. The sequence of partial sums is given by the terms $s_k = \sum_{n=0}^{k} a_n$. The series converges and equals $s$ if the limit
$$s = \lim_{k\to\infty} s_k$$
of the partial sums exists and is finite. If the sequence of partial sums has an improper limit, the series diverges to $\infty$ or $-\infty$. If the limit of the partial sums does not exist, the series oscillates.

5.4.4. Properties of series. For the sequence of partial sums $s_n$ to converge, it is necessary and sufficient that it is a Cauchy sequence; that is,
$$|s_m - s_n| = |a_{n+1} + \dots + a_m|$$
must be arbitrarily small for sufficiently large $m > n$. Since
$$|a_{n+1} + \dots + a_m| \le |a_{n+1}| + \dots + |a_m|,$$
the convergence of the series $\sum_{n=0}^{\infty}|a_n|$ implies the convergence of the series $\sum_{n=0}^{\infty}a_n$.

Absolutely convergent series

A series $\sum_{n=0}^{\infty} a_n$ is absolutely convergent if the series $\sum_{n=0}^{\infty} |a_n|$ converges.
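Limits obtained from l'Hospital's rule, such as the value $-3$ in case (a) or $1/2$ in case (c) below, can be sanity-checked by evaluating the original expression close to the limit point (a numerical check only, never a proof):

```python
import math

def f(x):
    # the quotient from case (a); its limit at 0 is -3
    return (math.sin(2*x) - 2*math.sin(x)) / (2*math.exp(x) - x*x - 2*x - 2)

def g(x):
    # the difference from case (c); its limit at 1+ is 1/2
    return x/(x - 1) - 1/math.log(x)

for x in (0.1, 0.01, 0.001):
    print(x, f(x), g(1 + x))
```

For very small $x$ the evaluation of `f` eventually drowns in floating-point cancellation, which is itself a good illustration of why the symbolic computation above is preferable.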
Absolute convergence is introduced because it is often much easier to verify. The following theorem shows that all simple algebraic operations behave "very well" for series that converge absolutely.

CHAPTER 5. ESTABLISHING THE ZOO

the result $0 = 0\cdot 1$ follows. We could also have used l'Hospital's rule again (now for the expression $0/0$), obtaining the result
$$\lim_{x\to 0^+}\frac{-\sin^2 x}{x} = \lim_{x\to 0^+}\frac{-2\sin x\cos x}{1} = -2\cdot 0\cdot 1 = 0.$$
The case (c). By mere transforming to a common denominator,
$$\lim_{x\to 1^+}\left(\frac{x}{x-1} - \frac{1}{\ln x}\right) = \lim_{x\to 1^+}\frac{x\ln x - (x-1)}{(x-1)\ln x},$$
we have obtained the type $0/0$. We have that
$$\lim_{x\to 1^+}\frac{x\ln x - (x-1)}{(x-1)\ln x} = \lim_{x\to 1^+}\frac{\ln x + 1 - 1}{\frac{x-1}{x} + \ln x} = \lim_{x\to 1^+}\frac{\ln x}{1 - \frac{1}{x} + \ln x}.$$
We have the quotient $0/0$, which (again by l'Hospital's rule) satisfies
$$\lim_{x\to 1^+}\frac{\ln x}{1 - \frac{1}{x} + \ln x} = \lim_{x\to 1^+}\frac{\frac{1}{x}}{\frac{1}{x^2} + \frac{1}{x}} = \frac{1}{1 + 1} = \frac{1}{2}.$$
Returning to the original limit, we write the result
$$\lim_{x\to 1^+}\left(\frac{x}{x-1} - \frac{1}{\ln x}\right) = \frac{1}{2}.$$
The case (d). We transform the assigned expression into the type $\infty/\infty$ (to be precise, into the type $-\infty/\infty$) by creating the fraction
$$\lim_{x\to 1^+}\ln(x-1)\cdot\ln x = \lim_{x\to 1^+}\frac{\ln(x-1)}{\frac{1}{\ln x}}.$$
By l'Hospital's rule,
$$\lim_{x\to 1^+}\frac{\ln(x-1)}{\frac{1}{\ln x}} = \lim_{x\to 1^+}\frac{\frac{1}{x-1}}{-\frac{1}{x\ln^2 x}} = \lim_{x\to 1^+}\frac{-x\ln^2 x}{x-1}.$$
This indeterminate form (of the type $0/0$) can once again be determined by l'Hospital's rule:
$$\lim_{x\to 1^+}\frac{-x\ln^2 x}{x-1} = \lim_{x\to 1^+}\frac{-\ln^2 x - 2x\ln x\cdot\frac{1}{x}}{1} = \frac{0 + 0}{1} = 0.$$
The cases (e), (f), (g). Since the expressions are powers, we use $f(x)^{g(x)} = e^{g(x)\ln f(x)}$ for positive $f$ and compute the limits of the exponents.

Properties of series

Theorem. Let $S = \sum_{n=0}^{\infty} a_n$ and $T = \sum_{n=0}^{\infty} b_n$ be two convergent series of real or complex numbers. Then:
(1) the series $\sum_{n=0}^{\infty}(a_n + b_n)$ converges, and its sum is $S + T$;
(2) for every number $c$, the series $\sum_{n=0}^{\infty} c\,a_n$ converges, and its sum is $cS$;
(3) if $S$ and $T$ converge absolutely, then the product series $\sum_{n=0}^{\infty} c_n$ with $c_n = \sum_{i+j=n} a_i b_j$ converges to $S\cdot T$;
(4) if $S$ converges absolutely, then $\sum_{n=0}^{\infty} a_{\sigma(n)} = S$ for every bijection $\sigma: \mathbb{N} \to \mathbb{N}$ of integers.

Proof. Both the first and the second statements are a straightforward consequence of the corresponding properties of limits. The remaining two statements are not so simple.
(3) Write $c_n = \sum_{i+j=n} a_i b_j$. From the assumptions and from the rule for the limit of a product,
$$S\cdot T = \lim_{k\to\infty}\left(\sum_{n=0}^{k} a_n\right)\cdot\lim_{k\to\infty}\left(\sum_{n=0}^{k} b_n\right) = \lim_{k\to\infty}\left(\sum_{n=0}^{k} a_n\right)\left(\sum_{n=0}^{k} b_n\right).$$
Thus it suffices to prove that
$$\lim_{k\to\infty}\left(\left(\sum_{n=0}^{k} a_n\right)\left(\sum_{n=0}^{k} b_n\right) - \sum_{n=0}^{k} c_n\right) = 0.$$
Consider the expressions
$$\left(\sum_{n=0}^{k} a_n\right)\left(\sum_{n=0}^{k} b_n\right) - \sum_{n=0}^{k} c_n = \sum_{\substack{i+j>k\\ i,j\le k}} a_i b_j,$$
hence
$$\left|\left(\sum_{n=0}^{k} a_n\right)\left(\sum_{n=0}^{k} b_n\right) - \sum_{n=0}^{k} c_n\right| \le \sum_{\substack{i+j>k\\ i,j\le k}} |a_i|\,|b_j|.$$
For every $\varepsilon > 0$ we can find a common bound $N$ such that both tails $\sum_{n>N}|a_n|$ and $\sum_{n>N}|b_n|$ are smaller than $\varepsilon$; since $i + j > k > 2N$ forces $i > N$ or $j > N$, the entire expression is bounded by $\varepsilon\left(\sum_n |a_n| + \sum_n |b_n|\right)$ and tends to zero.

The case (e). We have
$$\lim_{x\to 0^+}\frac{\ln\cot x}{\ln x} = \lim_{x\to 0^+}\frac{\frac{1}{\cot x}\cdot\left(-\frac{1}{\sin^2 x}\right)}{\frac{1}{x}} = \lim_{x\to 0^+}\frac{-x}{\sin x\cos x} = \lim_{x\to 0^+}\frac{-1}{\cos^2 x - \sin^2 x} = \frac{-1}{1 - 0} = -1,$$
hence
$$\lim_{x\to 0^+}(\cot x)^{\frac{1}{\ln x}} = e^{-1} = \frac{1}{e}.$$
The case (f). Here
$$\lim_{x\to 0}\frac{\ln\frac{\sin x}{x}}{x^2} \;\left(\text{type }\frac{0}{0}\right) = \lim_{x\to 0}\frac{\frac{x}{\sin x}\cdot\frac{x\cos x - \sin x}{x^2}}{2x} = \lim_{x\to 0}\frac{x\cos x - \sin x}{2x^2\sin x} \;\left(\text{type }\frac{0}{0}\right)$$
$$= \lim_{x\to 0}\frac{\cos x - x\sin x - \cos x}{4x\sin x + 2x^2\cos x} = \lim_{x\to 0}\frac{-\sin x}{4\sin x + 2x\cos x} = \lim_{x\to 0}\frac{-\cos x}{4\cos x + 2\cos x - 2x\sin x} = \frac{-1}{4 + 2 - 0} = -\frac{1}{6},$$
hence
$$\lim_{x\to 0}\left(\frac{\sin x}{x}\right)^{\frac{1}{x^2}} = e^{-\frac{1}{6}} = \frac{1}{\sqrt[6]{e}}.$$
We can proceed similarly when determining the last limit. We have that
$$\lim_{x\to 1^-}(\ln x)\cdot\ln\left(\cos\frac{\pi x}{2}\right) = \lim_{x\to 1^-}\frac{\ln\cos\frac{\pi x}{2}}{\frac{1}{\ln x}} \;\left(\text{type }\frac{-\infty}{\infty}\right) = \lim_{x\to 1^-}\frac{-\frac{\pi}{2}\cdot\frac{\sin\frac{\pi x}{2}}{\cos\frac{\pi x}{2}}}{-\frac{1}{x\ln^2 x}} = \frac{\pi}{2}\,\lim_{x\to 1^-}\frac{x\sin\frac{\pi x}{2}\,\ln^2 x}{\cos\frac{\pi x}{2}}.$$

(4) Now, consider any permutation $\sigma$ of the indices and write $I_\sigma = \{\sigma^{-1}(0), \dots, \sigma^{-1}(\alpha)\}$, where $\alpha$ is chosen so that both $\left|\sum_{n=0}^{\alpha} a_n - S\right| < \varepsilon$ and $\sum_{n=\alpha+1}^{\infty}|a_n| < \varepsilon$. Then, for each $N \ge \max I_\sigma$, clearly
$$\left|\sum_{n=0}^{N} a_{\sigma(n)} - S\right| \le \left|\sum_{n=0}^{\alpha} a_n - S\right| + \sum_{n\le N,\ n\notin I_\sigma} |a_{\sigma(n)}|.$$

Since this form is of the type $0/0$, we could continue by using l'Hospital's rule; instead, we will go over to the product of limits
$$\left(\lim_{x\to 1^-} x\sin\frac{\pi x}{2}\right)\cdot\lim_{x\to 1^-}\frac{\ln^2 x}{\cos\frac{\pi x}{2}} = 1\cdot\lim_{x\to 1^-}\frac{\ln^2 x}{\cos\frac{\pi x}{2}}.$$
Only now do we apply l'Hospital's rule, for the type $0/0$:
$$\lim_{x\to 1^-}\frac{\ln^2 x}{\cos\frac{\pi x}{2}} = \lim_{x\to 1^-}\frac{2\ln x\cdot\frac{1}{x}}{-\frac{\pi}{2}\sin\frac{\pi x}{2}} = 0.$$

Next, notice that $n \notin I_\sigma$ means $\sigma(n) > \alpha$. Thus the latter term is at most equal to $\sum_{n=\alpha+1}^{\infty}|a_n|$, and the entire expression is bounded by $2\varepsilon$. This shows that the rearranged series converges to the same value $S$ again. $\square$

5.4.5. Simple tests. The following theorem collects some useful conditions for deciding on the convergence of series.
□ 5.G.2. As we have implicitly mentioned, using l'Hospital's rule can lead to a non-existing limit even though the original limit exists: Determine the limit x + sin x lim - x—too x Solution. The limit is of the type ^, by l'Hospital's rule, we get that x + sin x 1 + cos x lim - = hm -, x—too x x—too 1 and since the limit lim^oo cos x does not exist, nor does the limit lim^oo 1 + cos a;. However, the original limit exists because x — 1 x + sin x x + 1 x ~ x ~ x ' and by the squeeze theorem, x + sin x x + sin x x + 1 1 = lim - < lim - < lim - = 1. x—too X x—too X x—too X □ 5.G.3. Determine lim -, lim x In —, lim x ex , I-S-+00 x x-tO+ X x-tO+ lim x e 01, lim i-s-0 x—7-0 X^® ' x—7+00 lim (In x — x) , lim lim -r==, lim i-»+oo a; + Iim ■ cos a;' x-t+oo y*x + 3' a-s-+oo y'x2 + 1 Solution. It can easily be shown (for instance, by n-fold use of l'Hospital's rule) that for any n e N, it holds that lim — = 0, i. e. lim — = +oo. x—7+00 qx x—7+00 xn The squeeze theorem implies the following generalization for real numbers a > 0: xa ex lim — = 0, i. e. lim — = +oo. x—7+00 qx x—7+00 xa Taking into account that the graphs of the functions y = ex and y = In x (the inverse function to y = ex) are symmetric with regard to the line y = x, we further see that In a; a; lim -= 0, i. e. lim--= +oo. i-s-+oo x i-s-+oo In a; Theorem. Let S = YLn°=o an be an infinite series of real or complex numbers. Let T = Y^=o be another series with all bn > 0 real. (1) If the series S converges, then limn_j.00 an = 0. (2) (The comparison test) IfT converges and \an\ < bn, then S converges absolutely. If bn < \an\ and T diverges, then S does not converge absolutely. (3) (The limit comparison test). If both an and bn are posi- tive real numbers and the finite limit lim^-^ ■ r > 0 exists, then S converges if and only ifT converges. 
(4) (The ratio test) Suppose that the limit of the quotients of adjacent terms of the series exists and
$$\lim_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right| = q.$$
Then the series $S$ converges absolutely for $q < 1$ and does not converge for $q > 1$. If $q = 1$, the series may or may not converge.
(5) (The root test) If the limit
$$\lim_{n\to\infty}\sqrt[n]{|a_n|} = q$$
exists, then the series converges absolutely for $q < 1$. It does not converge for $q > 1$. If $q = 1$, the series may or may not converge.

Proof. (1) The existence and the potential value of the limit of a sequence of complex numbers $a_k + i\,b_k$ is given by the limits of the real parts and the imaginary parts. Thus it suffices to prove the first proposition for sequences of real numbers. If $\lim_{n\to\infty} a_n$ does not exist or is non-zero, then for a sufficiently small number $\varepsilon > 0$, there are infinitely many terms $a_k$ with $|a_k| > \varepsilon$. There are either infinitely many positive terms or infinitely many negative terms among them. But then, adding any one of them into the partial sum, the difference of the adjacent terms $s_n$ and $s_{n+1}$ is at least $\varepsilon$. Thus the sequence of partial sums cannot be a Cauchy sequence and, therefore, it cannot be convergent.
(2) The result is a straightforward consequence of the squeeze theorem, cf. 5.2.12.
(3) Since the limit $r = \lim_{n\to\infty}\frac{a_n}{b_n}$ exists, for any given $\varepsilon > 0$ and sufficiently big $n > N_\varepsilon$,
$$(r - \varepsilon)\,b_n < a_n < (r + \varepsilon)\,b_n,$$
so the claim follows from the comparison test (2).
(4) Suppose $q < r < 1$ for a real number $r$. From the existence of the limit of the quotients, for every $j$ greater than a sufficiently large $N$,
$$|a_{j+1}| < r\,|a_j| < r^{(j-N+1)}|a_N|,$$

CHAPTER 5. ESTABLISHING THE ZOO
j/-s-+oo y Of course, x —> 0+ gives y = 1/x —> +oo (we write l/ + 0 = +oo). By the substitutions u = —l/x,v = 1/x2 we get that, respectively, lim a; e x-s-0- lim lim-- u—s-+oo u „50 x-s-o a; 100 = lim v—y+oo e = 0, where x —> 0— corresponds to u = — 1/a; —> +oo (we write —1/ — 0 = +oo) and then a;^0tow = l/a;2^ +oo (again 1/ + 0 = +oo). We have also clarified that lim (In a; — x) = lim —x = — oo. x—>-+oo x—>-+oo Potential doubts can be scattered by the limit In x — x ( x lim —■- = Jim I 1 —-- x-s-+oo In a; x-s-+oo V In a; which proves that even when decreasing the absolute value of the considered expression (without changing the sign), the absolute value of the expression remains unbounded. We can equally easily determine lim x x lim — = 1; x-s-+oo x + In X ■ COS X x-s-+oo x Jim —r== = Jim —— = +oo; x-s-+oo y'x + S x-s-+oo yx lim x lim x 1. x^+oo Va;2 + 1 I^-f°° va;z We have seen that the l'Hospital's rule may not be the best method for calculating limits of types 0/0, oo/oo. The three preceding exercises illustrate that it even cannot be applied in all cases (for indeterminate forms). If we had applied it to the first problem, we would have obtained, for x > 0, the quotient 1 x where the last equality follows from the general equality (1 — r)(l + r2 + + rK 1 — rk+1. But this means that the partial sums sn are, for large n > N, bounded from above by the sums N n-N N sn < ^ aj + aN ^ r7 = ^ a,j + aN- j=o 3=0 1-r 1 + — In x ■ sin x x + cos x — x In x ■ sin x' Since 0 < r < 1, the set of all partial sums is an increasing sequence bounded from above, and thus its limits equals to its supremum. In the case q > r > 1, a similar technique can be used. However, this time, from the existence of the limit of the quotients, aj+1 >r-aj> r{]-N+1) aN > 0. This implies that the absolute values of the particular terms of the series do not converge to zero, and thus the series cannot be convergent, by the already proved part (1) of the theorem. 
(5) The proof is similar to the previous case. From the existence of the limit $q < 1$, it follows that for any $r$, $q < r < 1$, there is an $N$ such that $\sqrt[n]{|a_n|} < r$ holds for all $n \ge N$. Exponentiation then gives $|a_n| < r^n$, so there is a comparison with a geometric series. Thus the proof can be finished in the same way as in the case of the ratio test. $\square$

In the proofs of the last two statements of the theorem, a much weaker assumption is used than the existence of the limit. It is only necessary to know that the examined sequences of non-negative terms are, from a given index on, either all larger or all smaller than a given number. For this purpose, however, it suffices to consider, for a given sequence of terms $b_n$, the supremum of the terms with index higher than $n$. These suprema always exist and create a non-increasing sequence. Its infimum is then called the upper limit of the sequence and denoted by
$$\limsup_{n\to\infty} b_n.$$
The advantage is that the upper limit always exists. Therefore, we can reformulate the previous result (without having to change the proof) in a stronger form:

Corollary. Let $S = \sum_{n=0}^{\infty} a_n$ be an infinite series of real or complex numbers.
(1) If
$$q = \limsup_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right| < 1,$$
then the series $S$ converges absolutely; it does not converge if the quotients $|a_{n+1}/a_n|$ are eventually all greater than some $r > 1$. For $q = 1$, it may or may not converge.
(2) If
$$q = \limsup_{n\to\infty}\sqrt[n]{|a_n|},$$
the series converges absolutely for $q < 1$, while it does not converge for $q > 1$. For $q = 1$, it may or may not converge.

CHAPTER 5. ESTABLISHING THE ZOO

which is more complicated than the original one. The limit for $x\to+\infty$ does not even exist, so one of the prerequisites of l'Hospital's rule is not satisfied. In the second case, any number of repeated uses of l'Hospital's rule leads to indeterminate forms.
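The ratio and root tests can be explored numerically by estimating $q$ from a large index (the helper names and the sample series below are ours, chosen for illustration):

```python
def ratio_estimate(a, n=1000):
    # approximates q = lim |a_{n+1} / a_n|
    return abs(a(n + 1) / a(n))

def root_estimate(a, n=1000):
    # approximates q = lim |a_n|^(1/n)
    return abs(a(n)) ** (1.0 / n)

geometric = lambda n: 0.5**n        # q = 1/2 < 1: converges absolutely
blowup    = lambda n: 2.0**n / n    # q = 2 > 1: cannot converge
harmonic  = lambda n: 1.0 / n       # q = 1: the tests give no information

print(ratio_estimate(geometric), root_estimate(geometric))
print(ratio_estimate(blowup))
print(ratio_estimate(harmonic))
```

The harmonic case shows the limitation stated in the theorem: $q = 1$ is obtained both for this divergent series and for the convergent series $\sum 1/n^2$.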
For the last problem, l'Hospital's rule sends us back to the original limit: first it gives the fraction
$$\frac{1}{\frac{x}{\sqrt{x^2+1}}} = \frac{\sqrt{x^2+1}}{x},$$
and then
$$\frac{\frac{x}{\sqrt{x^2+1}}}{1} = \frac{x}{\sqrt{x^2+1}}.$$
From here, we can deduce that the limit equals 1 (we are looking for a non-negative real number $a \in \mathbb{R}$ such that $a = a^{-1}$) only if we have already shown that it exists at all. $\square$

Other examples concerning calculation of limits by l'Hospital's rule can be found at page 354.

H. Infinite series

Infinite series naturally appear in a series (of problems).

5.H.1. Sierpinski carpet. The unit square is divided into nine equal squares and the middle one is removed. Each of the eight remaining squares is again divided into nine equal subsquares and the middle subsquare (of each of the eight squares) is removed again. Having applied this procedure ad infinitum, determine the area of the resulting figure.

Solution. In the first step, a square having the area of $1/9$ is removed. In the second step, eight squares (each having the area of $9^{-2}$, i.e. totaling $8\cdot 9^{-2}$) are removed. Every further iteration removes eight times more squares than the previous one, but the squares are nine times smaller. The sum of the areas of all the removed squares is
$$\frac{1}{9} + \frac{8}{9^2} + \frac{8^2}{9^3} + \dots = \sum_{n=0}^{\infty}\frac{8^n}{9^{n+1}} = \frac{1}{9}\cdot\frac{1}{1 - \frac{8}{9}} = 1.$$
The area of the remaining figure (known as the Sierpinski carpet) thus equals
$$1 - \sum_{n=0}^{\infty}\frac{8^n}{9^{n+1}} = 0. \qquad \square$$

5.H.2. Koch snowflake, 1904. Create a "snowflake" by the following procedure: At the beginning, consider an equilateral triangle with sides of length 1. With each of its three sides, do the following: Cut it into three equally long parts, build another equilateral triangle above (i.e. pointing out

5.4.6. Alternating series. The condition $a_n \to 0$ is a necessary but not sufficient condition for the convergence of the series $\sum_n a_n$. However, there is the Leibniz criterion of convergence.

Leibniz criterion for alternating series

The series $\sum_{n=0}^{\infty}(-1)^n a_n$, where $a_n$ is a non-increasing sequence of non-negative real numbers, is called an alternating series.

Theorem.
An alternating series converges if and only if $\lim_{n\to\infty} a_n = 0$. Its value $a = \sum_{n=0}^{\infty}(-1)^n a_n$ differs from the partial sum $s_{2k}$ by at most $a_{2k+1}$.

Proof. By the definition, the partial sums $s_k$ of an alternating series satisfy
$$s_{2(k+1)+1} = s_{2k+1} + a_{2k+2} - a_{2k+3} \ge s_{2k+1},$$
$$s_{2(k+1)} = s_{2k} - a_{2k+1} + a_{2k+2} \le s_{2k},$$
$$s_{2k+1} - s_{2k} = -a_{2k+1} \to 0,$$
$$s_2 \ge s_{2k} \ge s_{2k+1} \ge s_1.$$
Thus the odd partial sums are a non-decreasing sequence, while the even ones are non-increasing. The last line reveals that the bounded sequence of the odd partial sums converges to its supremum, while the even ones converge to the infimum. The previous line says they coincide if and only if $\lim_{n\to\infty} a_n = 0$, which proves the first claim. At the same time, the limit value $a$ of the series is always at most $s_{2k}$ and at least $s_{2k+1}$. Thus the latter partial sums cannot differ from $a$ by more than $s_{2k} - s_{2k+1} = a_{2k+1}$. $\square$

Remark. As is obvious from the latter theorem, convergent alternating series often do not converge absolutely. This phenomenon is called conditional convergence. Unlike the independence of the order in which we sum up the terms of an absolutely convergent series (cf. (4) of Theorem 5.4.4), there is the famous Riemann series theorem, saying that a conditionally convergent series can be brought to any finite or infinite value by an appropriate rearrangement of the terms in the sum. We shall not go into the proof here.

5.4.7. Convergence rate. The proofs of the tests derived in the previous two paragraphs also allow for straightforward estimates of the speed of the convergence. Indeed, both tests for absolute convergence are based on the comparison with the geometric series, either for $q = \limsup_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right|$ or $q = \limsup_{n\to\infty}\sqrt[n]{|a_n|}$, with $0 \le q < 1$. In the estimate of the error of the approximation of the limit $s$ by the $n$-th partial sum $s_n$,
$$|s - s_n| \le |a_N|\sum_{j=n-N+1}^{\infty} r^j = |a_N|\,\frac{r^{n-N+1}}{1-r} \le C\,r^n,$$
where $N$ and $q < r < 1$ are the two related choices from the proof of the test and $C$ is the resulting constant not depending on $n$. Thus the convergence rate is quite fast, in particular if

CHAPTER 5. ESTABLISHING THE ZOO
Thus the convergence rate is quite fast, in particular if 334 CHAPTER 5. ESTABLISHING THE ZOO from, not into, the original triangle) the middle part and remove the middle part. This transforms the original equilateral triangle into a six-pointed star. Once again, repeat this step ad infinitum, thus obtaining the desired snowflake. Prove that the created figure has infinite perimeter. Then determine its area. Solution. The perimeter of the original triangle is equal to 3. In each step, the perimeter increases by one third since three parts of every line segment are replaced with four equally long ones. Hence it follows that the snowflake's perimeter can be expressed as the limit dn = 3 (4)™ and lim dn = +oo. The figure's area is apparently increasing during the construction. To determine it, it thus suffices to catch the rise between two consecutive steps. The number of the figure's sides is four times higher every step (the line segments are divided into thirds and one of them is doubled) and the new sides are three times shorter. The figure's area thus grows exactly by the equilateral triangles glued to each side (so there is the same number of them as of the sides). In the first iteration (when creating the six-pointed star from the original triangle), the area grows by the three equilateral triangles with sides of length 1/3 (one third of the original sides' length). Let us denote the area of the original equilateral triangle by So. If we realize that shortening an equilateral triangle's sides three times makes its area decrease nine times, we get 5*o + 3 for the area of the six-pointed star. 
Similarly, in the next step we obtain the area of the figure as
$$S_0 + 3\cdot\frac{S_0}{9} + 4\cdot 3\cdot\frac{S_0}{9^2}.$$
Now it is easy to deduce that the area of the resulting snowflake equals the limit
$$\lim_{n\to\infty}\left(S_0 + 3\cdot\frac{S_0}{9} + 4\cdot 3\cdot\frac{S_0}{9^2} + \dots + 4^{n-1}\cdot 3\cdot\frac{S_0}{9^n}\right) = S_0\left(1 + \frac{1}{3}\lim_{n\to\infty}\left(1 + \frac{4}{9} + \dots + \left(\frac{4}{9}\right)^{n-1}\right)\right) = S_0\left(1 + \frac{1}{3}\sum_{k=0}^{\infty}\left(\frac{4}{9}\right)^k\right) = S_0\left(1 + \frac{1}{3}\cdot\frac{9}{5}\right) = \frac{8}{5}\,S_0.$$
The snowflake's area is thus equal to $8/5$ of the area of the original triangle, i.e.

$r$ is much smaller than 1 (and we can get $r$ as close to $q$ as necessary). On the other hand, the proof of the alternating series test shows that the convergence rate there is at least as fast as the convergence of the terms $a_n$.

5.4.8. Power series. If we consider not a sequence of numbers $a_n$, but rather a sequence of functions $f_n(x)$ sharing the same domain $A$, we can use the definition of addition of series "point-wise", thereby obtaining the concept of the series of functions
$$S(x) = \sum_{n=0}^{\infty} f_n(x).$$

Power series

A power series is a series of functions given by the expression
$$S(x) = \sum_{n=0}^{\infty} a_n x^n$$
with coefficients $a_n \in \mathbb{C}$, $n = 0, 1, \dots$. $S(x)$ has the radius of convergence $\rho \ge 0$ if and only if $S(x)$ converges for every $x$ satisfying $|x| < \rho$ and does not converge for $|x| > \rho$.

5.4.9. Properties of power series. Although a significant part of the proof of the following theorem is postponed until the end of the following chapter, the formulation of the basic properties of power series can be considered now. Notice that the upper limit $r = \limsup_{n\to\infty}\sqrt[n]{|a_n|}$ equals the limit $\lim_{n\to\infty}\sqrt[n]{|a_n|}$ whenever this limit exists. Actually, our argument on the convergence of power series works in exactly the same way for complex values of $x$, and even now the reader may enjoy a direct simple proof of all the claimed properties in the complex setting in 9.4.2 on page 676 in Chapter 9.

Convergence and differentiation

Theorem. Let $S(x) = \sum_{n=0}^{\infty} a_n x^n$ be a power series and let
$$r = \limsup_{n\to\infty}\sqrt[n]{|a_n|}.$$
Then the radius of convergence of the series $S$ is $\rho = r^{-1}$. The power series $S(x)$ converges absolutely on the whole interval of convergence and is continuous on it (including the boundary points, supposing it is convergent there). Moreover, the derivative exists on this interval, and
$$S'(x) = \sum_{n=1}^{\infty} n\,a_n x^{n-1}.$$

Proof. To verify the absolute convergence of the series, use the root test from theorem 5.4.5(3), for every value of $x$:
$$\lim_{n\to\infty}\sqrt[n]{|a_n x^n|} = r|x|.$$
Either the series converges absolutely, or it does not converge, if this limit is different from $1$. It follows that it converges for $|x| < \rho$ and diverges for $|x| > \rho$. If the limit does not exist, use the upper limit in the same way. The statements about continuity and the derivatives are proved later in a more general context, see 6.3.7–6.3.9. □

CHAPTER 5. ESTABLISHING THE ZOO

For the unit side length, the snowflake's area equals $\frac85\cdot\frac{\sqrt3}{4} = \frac{2\sqrt3}{5}$. Let us notice that this snowflake is an example of an infinitely long curve which encloses a finite area. □

5.H.3. Show that the so-called harmonic series
$$\sum_{n=1}^{\infty}\frac1n$$
diverges.

Solution. For any natural number $k$, the sum of the first $2^k$ terms of this series is greater than $k/2$:
$$1 + \frac12 + \left(\frac13+\frac14\right) + \left(\frac15+\frac16+\frac17+\frac18\right) + \cdots,$$
as the sum of the terms from $2^l+1$ to $2^{l+1}$ is always greater than $2^l$ times (their number) $\frac{1}{2^{l+1}}$ (the least one of them), which sums to $1/2$. □

5.H.4. Determine whether the following series converge, or diverge:
(i) $\sum_{n=1}^{\infty}\frac{2^n}{n}$; (ii) $\sum_{n=1}^{\infty}\frac{1}{\sqrt n}$; (iii) $\sum_{n=1}^{\infty}\frac{1}{100000\,n}$; (iv) $\sum_{n=1}^{\infty}\left(\frac{1+i}{2}\right)^n$.

Solution. (i) We examine the convergence by the ratio test:
$$\lim_{n\to\infty}\frac{\frac{2^{n+1}}{n+1}}{\frac{2^n}{n}} = \lim_{n\to\infty}\frac{2n}{n+1} = 2 > 1,$$
so the series diverges.
(ii) We bound the series from below: we know that $\frac1n \le \frac{1}{\sqrt n}$ for any natural number $n$. Thus the sequence of partial sums $s_n$ of the examined series and the sequence of partial sums $s_n'$ of the harmonic series satisfy
$$s_n = \sum_{i=1}^{n}\frac{1}{\sqrt i} \ge \sum_{i=1}^{n}\frac1i = s_n'.$$
Since the harmonic series diverges (see the previous exercise), the sequence of its partial sums $\{s_n'\}_{n=1}^{\infty}$ diverges as well. Therefore the sequence $\{s_n\}_{n=1}^{\infty}$ also diverges, and so does the examined series.
(iii) This series is divergent since it is a multiple of the harmonic series.

5.4.10. Remarks. If the coefficients of the series increase rapidly enough (for example $a_n = n^n$), then $r = \infty$. Then the radius of convergence is zero, and the series converges only at $x = 0$. Here are some examples of convergence of power series (including the boundary points of the corresponding interval). Consider
$$S(x) = \sum_{n=0}^{\infty} x^n, \qquad T(x) = \sum_{n=1}^{\infty}\frac{x^n}{n}.$$
The former example is the geometric series, which is already discussed. Its sum is, for every $x$ with $|x| < 1$,
$$S(x) = \frac{1}{1-x},$$
while $|x| > 1$ guarantees that the series diverges. For $x = 1$, we obtain the series $1 + 1 + 1 + \cdots$, which is divergent. For $x = -1$, the series is $1 - 1 + 1 - \cdots$, whose partial sums do not have a limit. The series oscillates.

Theorem 5.4.5(2) shows that the radius of convergence of the series $T(x)$ is $1$, because
$$\lim_{n\to\infty}\frac{\frac{1}{n+1}}{\frac1n} = \lim_{n\to\infty}\frac{n}{n+1} = 1.$$
For $x = 1$, the series $1 + \frac12 + \frac13 + \cdots$ is divergent: by summing up the $2^{k-1}$ adjacent terms $\frac{1}{2^{k-1}}, \dots, \frac{1}{2^k-1}$ and replacing each of them by $2^{-k}$ (thus they total up to at least $1/2$), the partial sums are bounded from below by the sum of these $1/2$'s. Since the bound from below diverges to infinity, so does the original series. On the other hand, the series $T(-1) = -1 + \frac12 - \frac13 + \cdots$ converges, although of course it cannot converge absolutely. This is true since we deal here with an alternating series. Notice that the convergence of a power series is relatively fast near $x = 0$. It is slower near the boundary of the convergence interval.

5.4.11. Trigonometric functions. Another important observation is that a power series is a series of numbers for each fixed $x$, and the individual terms make sense for complex numbers $x \in \mathbb{C}$.
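The relation $\rho = r^{-1}$ and the two examples above can be illustrated numerically. A minimal sketch (not part of the original text; the helper `radius_estimate` is ours, and it crudely replaces the upper limit by a single large $n$):

```python
# Crude numerical estimate of the radius of convergence rho = 1/|a_n|^(1/n)
# for a single large n, applied to the two example series:
#   S(x) = sum x^n      (a_n = 1)    -> rho = 1
#   T(x) = sum x^n / n  (a_n = 1/n)  -> rho = 1, since n**(1/n) -> 1

def radius_estimate(coeff, n=100000):
    """Approximate rho = 1 / |a_n|^(1/n) using one large index n."""
    a = coeff(n)
    return 1.0 / (abs(a) ** (1.0 / n))

rho_S = radius_estimate(lambda n: 1.0)      # exactly 1.0
rho_T = radius_estimate(lambda n: 1.0 / n)  # close to 1.0

print(round(rho_S, 3))  # 1.0
print(round(rho_T, 3))  # 1.0
```

Both estimates agree with the radius $1$ computed in the text; the boundary behaviour (divergence of $S$ and $T$ at $x = 1$, convergence of $T$ at $x = -1$) is of course invisible to the root test.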
Thus the domain of convergence of a power series is always a disc in the complex plane centered at the origin. More generally, we can write power series centered at an arbitrary (complex) point $x_0$,
$$S(x) = \sum_{n=0}^{\infty} a_n (x-x_0)^n,$$
which converge absolutely again on the disc of radius $\rho$, $\rho = 1/\limsup_{n\to\infty}\sqrt[n]{|a_n|}$, but this time centered at $x_0$.

CHAPTER 5. ESTABLISHING THE ZOO

(iv) The examined series is geometric, with common ratio $q = \frac{1+i}{2}$. Such a series is convergent if and only if the absolute value of the common ratio is less than one. We know that
$$\left|\frac{1+i}{2}\right| = \sqrt{\frac14+\frac14} = \frac{1}{\sqrt2} < 1,$$
hence the series converges, and we are even able to calculate its sum:
$$\sum_{n=1}^{\infty}\left(\frac{1+i}{2}\right)^n = \frac{\frac{1+i}{2}}{1-\frac{1+i}{2}} = \frac{1+i}{1-i} = i.$$
□

5.H.5. Calculate the series
(a) $\sum_{n=1}^{\infty}\frac{1}{\sqrt{n}\,\sqrt{n+1}\,(\sqrt{n}+\sqrt{n+1})}$; (b) $\sum_{n=0}^{\infty}\frac{5}{3^n}$; (c) $\sum_{n=1}^{\infty}\left(\frac{1}{4^{2n-1}}+\frac{1}{4^{2n}}\right)$; (d) $\sum_{n=1}^{\infty}\frac{n}{3^n}$; (e) $\sum_{n=0}^{\infty}\frac{1}{(3n+1)(3n+4)}$.

Solution. The case (a). Since
$$\frac{1}{\sqrt{n}\,\sqrt{n+1}\,(\sqrt{n}+\sqrt{n+1})} = \frac{\sqrt{n+1}-\sqrt{n}}{\sqrt{n}\,\sqrt{n+1}} = \frac{1}{\sqrt n} - \frac{1}{\sqrt{n+1}},$$
the partial sums telescope, and the series equals
$$\lim_{n\to\infty}\left(\left(1-\frac{1}{\sqrt2}\right)+\left(\frac{1}{\sqrt2}-\frac{1}{\sqrt3}\right)+\cdots+\left(\frac{1}{\sqrt n}-\frac{1}{\sqrt{n+1}}\right)\right) = \lim_{n\to\infty}\left(1-\frac{1}{\sqrt{n+1}}\right) = 1.$$
The case (b). Apparently, this series is a quintuple of the standard geometric series with the common ratio $q = \frac13$, hence
$$\sum_{n=0}^{\infty}\frac{5}{3^n} = 5\sum_{n=0}^{\infty}\left(\frac13\right)^n = 5\cdot\frac{1}{1-\frac13} = \frac{15}{2}.$$
The case (c). We have (with the substitution $m = n-1$)
$$\sum_{n=1}^{\infty}\left(\frac{1}{4^{2n-1}}+\frac{1}{4^{2n}}\right) = \frac14\sum_{m=0}^{\infty}\frac{1}{16^m} + \frac{1}{16}\sum_{m=0}^{\infty}\frac{1}{16^m} = \left(\frac14+\frac{1}{16}\right)\cdot\frac{16}{15} = \frac13.$$
The series of linear combinations was expressed as a linear combination of series (to be more precise, as a sum of series with the constants factored out), which is a valid modification, supposing the obtained series are absolutely convergent.
The case (d). From the partial sum
$$s_n = \frac13 + \frac{2}{3^2} + \frac{3}{3^3} + \cdots + \frac{n}{3^n}, \quad n \in \mathbb{N},$$
we immediately get that

Earlier we proved explicitly (by a simple application of the ratio test) that the exponential function series converges everywhere. Thus this defines a function for all complex numbers $x$.
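The everywhere-convergence of the exponential series for complex arguments can be observed numerically. A small sketch (not part of the original text; the helper `exp_series` is ours):

```python
import cmath

# Sketch (illustrative, not from the text): the exponential series
#   exp(x) = sum_{n>=0} x^n / n!
# converges for every complex x; its partial sums settle quickly.

def exp_series(x, terms=60):
    s, term = 0.0 + 0.0j, 1.0 + 0.0j
    for n in range(terms):
        s += term
        term *= x / (n + 1)   # next term: x^(n+1) / (n+1)!
    return s

z = 2.0 + 3.0j
print(abs(exp_series(z) - cmath.exp(z)) < 1e-10)  # True

# For purely imaginary arguments the values lie on the unit circle:
print(abs(abs(exp_series(1j * 0.7)) - 1.0) < 1e-12)  # True
```

The second check anticipates the discussion of $e^{it}$ that follows: the series value for $x = it$ has absolute value $1$.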
Its values are the limits of values of (complex) polynomials with real coefficients, and each polynomial is completely determined by finitely many of its values. In particular, the values of the series on the complex domain are completely determined by its values at the real input values $x$. Therefore, the complex exponential must also satisfy the usual formulae which we have already derived for the real values $x$. In particular, for all $x, y \in \mathbb{C}$,
$$e^{x+y} = e^x\cdot e^y,$$
see (2) and theorem 5.4.4(3). Substitute the values $x = it$, where $i \in \mathbb{C}$ is the imaginary unit and $t \in \mathbb{R}$ is arbitrary:
$$e^{it} = 1 + it - \frac{1}{2}t^2 - i\,\frac{1}{3!}t^3 + \frac{1}{4!}t^4 + i\,\frac{1}{5!}t^5 - \cdots.$$
The conjugate number to $z = e^{it}$ is the number $\bar z = e^{-it}$. Hence
$$|z|^2 = z\cdot\bar z = e^{it}\cdot e^{-it} = e^0 = 1.$$
All the values $z = e^{it}$ lie on the unit circle centered at the origin in the complex plane. The real and imaginary parts of the points lying on the unit circle are named as the trigonometric functions $\cos\theta$ and $\sin\theta$, where $\theta$ is the corresponding angle.

Differentiating the parametric description of the points of the circle, $t \mapsto e^{it}$, gives the vectors of "velocities", which are easily computed. Differentiating the real and imaginary parts separately (assuming that the real power series can be differentiated term by term) gives
$$(e^{it})' = \left(1 - \frac{1}{2!}t^2 + \frac{1}{4!}t^4 - \cdots\right)' + i\left(t - \frac{1}{3!}t^3 + \frac{1}{5!}t^5 - \cdots\right)' = \left(-t + \frac{1}{3!}t^3 - \cdots\right) + i\left(1 - \frac{1}{2!}t^2 + \frac{1}{4!}t^4 - \cdots\right),$$
which means $(e^{it})' = i\cdot e^{it}$. So the velocity vectors all have unit length. Hence the entire circle is parametrized as $t$ runs through the interval $[0, 2\pi]$.

CHAPTER 5. ESTABLISHING THE ZOO

$$s_n - \frac{s_n}{3} = \frac13 + \frac{1}{3^2} + \frac{1}{3^3} + \cdots + \frac{1}{3^n} - \frac{n}{3^{n+1}}, \quad n \in \mathbb{N}.$$
Since $\lim_{n\to\infty}\frac{n}{3^{n+1}} = 0$, we get that
$$\frac23\lim_{n\to\infty} s_n = \sum_{k=1}^{\infty}\left(\frac13\right)^k = \frac{\frac13}{1-\frac13} = \frac12, \qquad\text{i.e.}\qquad \sum_{n=1}^{\infty}\frac{n}{3^n} = \lim_{n\to\infty} s_n = \frac34.$$
The case (e). It suffices to use the form (this is the so-called partial fraction decomposition)
$$\frac{1}{(3n+1)(3n+4)} = \frac13\cdot\frac{1}{3n+1} - \frac13\cdot\frac{1}{3n+4}, \quad n \in \mathbb{N}\cup\{0\},$$
which gives
$$\sum_{n=0}^{\infty}\frac{1}{(3n+1)(3n+4)} = \frac13\lim_{n\to\infty}\left(1 - \frac14 + \frac14 - \frac17 + \frac17 - \frac{1}{10} + \cdots + \frac{1}{3n+1} - \frac{1}{3n+4}\right) = \frac13\lim_{n\to\infty}\left(1 - \frac{1}{3n+4}\right) = \frac13.$$
□

5.H.6.
Verify that
$$\sum_{n=1}^{\infty}\frac{1}{n^2} < \sum_{n=0}^{\infty}\frac{1}{2^n}.$$

Solution. We can immediately see that
$$1 \le 1, \qquad \frac{1}{2^2}+\frac{1}{3^2} < \frac{2}{2^2} = \frac12, \qquad \frac{1}{4^2}+\frac{1}{5^2}+\frac{1}{6^2}+\frac{1}{7^2} < \frac{4}{4^2} = \frac14,$$
or, in general,
$$\frac{1}{(2^n)^2} + \cdots + \frac{1}{(2^{n+1}-1)^2} < \frac{2^n}{(2^n)^2} = \frac{1}{2^n}, \quad n \in \mathbb{N}.$$
Hence (by comparing the groups of terms of both of the series) we get the wanted inequality, from which, by the way, it follows that the series $\sum_{n=1}^{\infty}\frac{1}{n^2}$ converges absolutely. Eventually, let us specify that
$$\sum_{n=1}^{\infty}\frac{1}{n^2} = \frac{\pi^2}{6} < 2 = \sum_{n=0}^{\infty}\frac{1}{2^n}.$$
□

5.H.7. Examine the convergence of the series
$$\sum_{n=1}^{\infty}\ln\frac{n+1}{n}.$$

Solution. Let us try to add up the terms of this series. We have that
$$\lim_{n\to\infty}\left(\ln\frac21 + \ln\frac32 + \ln\frac43 + \cdots + \ln\frac{n+1}{n}\right) = \lim_{n\to\infty}\ln\frac{2\cdot3\cdot4\cdots(n+1)}{1\cdot2\cdot3\cdots n} = \lim_{n\to\infty}\ln(n+1) = +\infty.$$
Thus the series diverges to $+\infty$. □

Here $2\pi$ stands for the length of the circle (a thorough definition of the length of a curve needs integral calculus, which we will develop in the next chapter). In particular, this procedure of parameterizing the circle can be used to define the number $\pi$, also called Archimedes' constant or the Ludolphian number, as half the length of the unit circle in the Euclidean plane $\mathbb{R}^2$. It can be found by computing the first positive zero point of the imaginary part of $e^{it}$, or as twice the first positive zero point of its real part. For example, use the 10th order approximation
$$\cos t \approx 1 - \tfrac12 t^2 + \tfrac{1}{24}t^4 - \tfrac{1}{720}t^6 + \tfrac{1}{40320}t^8 - \tfrac{1}{3628800}t^{10}.$$
Ask Maple to find its first positive root and double it. The result is $\pi \approx 3.14159172323226$, for which the first 5 decimal places are correct. Compare the result $3.184900868$ obtained from the approximation of order 4.

The explicit representation of the trigonometric functions in terms of power series is now apparent:
$$\cos t = \operatorname{Re} e^{it} = 1 - \frac{1}{2!}t^2 + \frac{1}{4!}t^4 - \cdots + (-1)^k\frac{1}{(2k)!}t^{2k} + \cdots,$$
$$\sin t = \operatorname{Im} e^{it} = t - \frac{1}{3!}t^3 + \frac{1}{5!}t^5 - \frac{1}{7!}t^7 + \cdots + (-1)^k\frac{1}{(2k+1)!}t^{2k+1} + \cdots.$$
The following diagram illustrates the convergence of the series for the cosine function. It is the graph of the corresponding polynomial of degree 68. Drawing partial sums shows that the approximation near zero is very good.
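The Maple experiment described above is easy to reproduce. A sketch (not part of the original text; the helpers `cos_taylor` and `first_root` are ours, with bisection standing in for Maple's root finder):

```python
import math

# Reproduce the experiment above: double the first positive root of a
# truncated cosine series to approximate pi (illustrative sketch).

def cos_taylor(t, order):
    """Taylor polynomial of cos t up to degree `order` (even)."""
    return sum((-1) ** k * t ** (2 * k) / math.factorial(2 * k)
               for k in range(order // 2 + 1))

def first_root(f, a=0.0, b=2.0, steps=200):
    """Bisection for the sign change of f on [a, b] (f(a) > 0 > f(b))."""
    for _ in range(steps):
        m = (a + b) / 2
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
    return (a + b) / 2

print(2 * first_root(lambda t: cos_taylor(t, 4)))   # ~3.1849, the order-4 value
print(2 * first_root(lambda t: cos_taylor(t, 10)))  # ~3.141592, the order-10 value
```

Both numbers match the values quoted in the text: the order-4 polynomial gives $3.1849\ldots$, while the order-10 polynomial is already correct to five decimal places.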
As the order increases, the approximation is better further away from the origin as well. The well-known formula
$$e^{it}\cdot e^{-it} = \sin^2 t + \cos^2 t = 1$$
is immediate.

(Footnote: This number describes the ratio of the circumference of an (arbitrary) circle to its diameter. It was known to the Babylonians and the Greeks in ancient times. The term Ludolphian number is derived from the name of the German mathematician Ludolph van Ceulen of the 16th century, who produced 35 digits of the decimal expansion of the number, using the method of inscribed and circumscribed regular polygons invented by Archimedes.)

CHAPTER 5. ESTABLISHING THE ZOO

5.H.8. Prove that the series
$$\sum_{n=0}^{\infty}\operatorname{arctg}\frac{n^2+2n+3\sqrt{n}+4}{n+1} \qquad\text{and}\qquad \sum_{n=1}^{\infty}\frac{3^n+1}{n^3+n^2-n}$$
do not converge.

Solution. Since
$$\lim_{n\to\infty}\operatorname{arctg}\frac{n^2+2n+3\sqrt{n}+4}{n+1} = \frac{\pi}{2} \ne 0 \qquad\text{and}\qquad \lim_{n\to\infty}\frac{3^n+1}{n^3+n^2-n} = +\infty,$$
the necessary condition $\lim_{n\to\infty} a_n = 0$ for the series $\sum_{n=n_0}^{\infty} a_n$ to converge does not hold in either case. □

5.H.9. What is the sum of the series
$$\sum_{n=2}^{\infty}\frac{1}{\sqrt[n]{\ln n}}\,?$$

Solution. From the inequalities (consider the graph of the natural logarithm)
$$1 < \ln n < n, \quad n \ge 3,\ n \in \mathbb{N},$$
it follows that
$$\sqrt[n]{1} < \sqrt[n]{\ln n} < \sqrt[n]{n}, \quad n \ge 3,\ n \in \mathbb{N}.$$
By the squeeze theorem, $\lim_{n\to\infty}\sqrt[n]{\ln n} = 1$, i.e. $\lim_{n\to\infty}\frac{1}{\sqrt[n]{\ln n}} = 1$. Thus the series does not converge. As its terms are non-negative, it must diverge to $+\infty$. □

5.H.10. Determine whether the series
(a) $\sum_{n=0}^{\infty}\frac{1}{(n+1)\cdot 3^n}$; (b) $\sum_{n=1}^{\infty}\frac{n^2+1}{n^3}$; (c) $\sum_{n=1}^{\infty}\frac{1}{n-\ln n}$
converge.

Solution. All of the three listed series consist of non-negative terms only, so each of them either converges, or diverges to $+\infty$. We have that
(a) $\sum_{n=0}^{\infty}\frac{1}{(n+1)\cdot 3^n} \le \sum_{n=0}^{\infty}\left(\frac13\right)^n = \frac{1}{1-\frac13} < +\infty$;
(b) $\sum_{n=1}^{\infty}\frac{n^2+1}{n^3} \ge \sum_{n=1}^{\infty}\frac{n^2}{n^3} = \sum_{n=1}^{\infty}\frac1n = +\infty$;
(c) $\sum_{n=1}^{\infty}\frac{1}{n-\ln n} \ge \sum_{n=1}^{\infty}\frac1n = +\infty$.
Hence it follows that the series (a) converges; (b) diverges to $+\infty$; (c) diverges to $+\infty$. □

More interesting exercises concerning series can be found at page 355.
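The identity $\sin^2 t + \cos^2 t = 1$ can also be watched emerging from the truncated power series themselves. An informal check (not part of the original text; `sin_series` and `cos_series` are our helpers):

```python
import math

# Illustration (not from the text): truncated sine and cosine series already
# satisfy sin^2 t + cos^2 t = 1 to high accuracy for moderate values of t.

def sin_series(t, terms=20):
    return sum((-1) ** k * t ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(terms))

def cos_series(t, terms=20):
    return sum((-1) ** k * t ** (2 * k) / math.factorial(2 * k)
               for k in range(terms))

for t in (0.5, 1.0, 3.0):
    print(abs(sin_series(t) ** 2 + cos_series(t) ** 2 - 1.0) < 1e-10)  # True
```

For large $|t|$ more terms would be needed, in line with the remark that convergence is fast near the origin and slower further away.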
From the derivative $(e^{it})' = i\,e^{it}$, it follows that
$$(\sin t)' = \cos t, \qquad (\cos t)' = -\sin t,$$
by considering real and imaginary parts. Let $t_0$ denote the smallest positive number for which $e^{-it_0} = -e^{it_0}$; $t_0$ is the first positive zero point of the function $\cos t$. According to the definition of $\pi$, $t_0 = \frac{\pi}{2}$. Since $\cos t_0 = 0$, the value $e^{it_0}$ is purely imaginary; squaring yields
$$e^{2it_0} = e^{i\pi} = -1.$$
So $\pi$ is a zero point of the function $\sin t$. Of course, for any $t$,
$$e^{i(4kt_0+t)} = \left(e^{it_0}\right)^{4k}\cdot e^{it} = 1\cdot e^{it}.$$
Therefore, both trigonometric functions $\sin$ and $\cos$ are periodic, with period $2\pi$. This is their prime period.

Now the usual formulae connecting the trigonometric functions are easily derived. For illustration, we introduce some of them. First, the definition says that
(1) $\cos t = \frac12\left(e^{it}+e^{-it}\right)$,
(2) $\sin t = \frac{1}{2i}\left(e^{it}-e^{-it}\right)$.
Thus the product of these functions can be expressed as
$$\sin t\cos t = \frac{1}{4i}\left(e^{it}-e^{-it}\right)\left(e^{it}+e^{-it}\right) = \frac{1}{4i}\left(e^{2it}-e^{-2it}\right) = \frac12\sin 2t.$$
Further, by utilizing our knowledge of derivatives,
$$\cos 2t = \left(\tfrac12\sin 2t\right)' = (\sin t\cos t)' = \cos^2 t - \sin^2 t.$$
The properties of the other trigonometric functions,
$$\tan t = \frac{\sin t}{\cos t}, \qquad \cot t = (\tan t)^{-1} = \frac{\cos t}{\sin t},$$
can easily be derived from their definitions and the formulae for derivatives. The graphs of the functions sine, cosine, tangent, and cotangent are displayed on the diagrams (the red one and the green one on the left, and the red one and the green one on the right, respectively).

Cyclometric functions are the functions inverse to the trigonometric functions. Since the trigonometric functions all have period $2\pi$, their inverses can be defined only inside one period, and further, only on a part where the given function is either increasing or decreasing. Two inverse trigonometric functions are $\arcsin = \sin^{-1}$, with domain $[-1,1]$ and range $[-\pi/2, \pi/2]$, and $\arccos = \cos^{-1}$, with domain $[-1,1]$ and range $[0, \pi]$; see the left-hand illustration. The remaining functions (displayed in the diagram on the right) are $\arctan = \tan^{-1}$, with domain $\mathbb{R}$ and range $(-\pi/2, \pi/2)$, and finally $\operatorname{arccot} = \cot^{-1}$, with domain $\mathbb{R}$ and range $(0, \pi)$.

The hyperbolic functions are also of some importance. The two basic ones are
$$\sinh x = \frac12\left(e^x - e^{-x}\right), \qquad \cosh x = \frac12\left(e^x + e^{-x}\right).$$
The name indicates that they should have something in common with a hyperbola. Directly from the definition,
$$\cosh^2 x - \sinh^2 x = \frac14\left(e^x+e^{-x}\right)^2 - \frac14\left(e^x-e^{-x}\right)^2 = 1.$$
The points $[\cosh t, \sinh t] \in \mathbb{R}^2$ thus parametrically describe a hyperbola in the plane. For the hyperbolic functions, one can easily derive identities similar to the ones for the trigonometric functions. By substituting into (1) and (2), one can obtain for example
$$\cosh x = \cos(ix), \qquad i\sinh x = \sin(ix).$$

CHAPTER 5. ESTABLISHING THE ZOO

I. Power series

In the previous chapter, we examined whether it makes sense to assign a value to a sum of infinitely many numbers. Now we turn our attention to the problem of what sense the sum of infinitely many functions may have.

5.I.1. Determine the radius of convergence of the following power series:
(i) $\sum_{n=1}^{\infty}\frac{2^n}{n}\,x^n$; (ii) $\sum_{n=1}^{\infty}\frac{1}{(1+i)^n}\,x^n$.

Solution. (i) From
$$\limsup_{n\to\infty}\sqrt[n]{\frac{2^n}{n}} = 2$$
we get that the radius of convergence is $\frac12$. Thus the power series converges exactly for the real numbers $x \in \left(-\frac12, \frac12\right)$ (alternatively, for the complex numbers $|x| < \frac12$). Let us notice that the series diverges for $x = \frac12$ (there it is the harmonic series), but on the other hand, it converges for $x = -\frac12$ (the alternating harmonic series). To determine the convergence for any $x$ lying in the complex plane on the circle of radius $\frac12$ is a much harder question, which goes beyond our lectures.
(ii) $\limsup_{n\to\infty}\sqrt[n]{\left|\frac{1}{(1+i)^n}\right|} = \frac{1}{|1+i|} = \frac{1}{\sqrt2} = \frac{\sqrt2}{2}$, so the radius of convergence is $\sqrt2$. □

5.I.2. Determine the radius $r$ of convergence of the power series
(a) $\sum_{n=1}^{\infty}\frac{(-1)^n}{8^n}\,x^n$; (b) $\sum_{n=1}^{\infty}(\ln n)^n\,x^n$; (c) $\sum_{n=1}^{\infty}\left(1+\frac1n\right)^{n^2}x^n$; (d) $\sum_{n=1}^{\infty}\frac{2+(-1)^n}{2+(-1)^{n+1}}\,x^n$.

Solution. It holds that
(a) $\lim_{n\to\infty}\sqrt[n]{|a_n|} = \frac18$;
(b) $\lim_{n\to\infty}\sqrt[n]{|a_n|} = \lim_{n\to\infty}\ln n = +\infty$;
(c) $\lim_{n\to\infty}\sqrt[n]{|a_n|} = \lim_{n\to\infty}\left(1+\frac1n\right)^n = e$;

5.4.12. Notes. (1) If a power series $S(x)$ is expressed with the variable $x$ moved by a constant offset $x_0$, we arrive at the function $T(x) = S(x - x_0)$.
If $\rho$ is the radius of convergence of $S$, then $T$ will be well-defined on the interval $(x_0-\rho,\,x_0+\rho)$. We say that $T$ is a power series centered at $x_0$. Such a power series can also be defined directly:
$$T(x) = \sum_{n=0}^{\infty} a_n (x-x_0)^n,$$
where $x_0$ is an arbitrary fixed real number. All of the previous reasonings are still valid. It is only necessary to be aware of the fact that they relate to the point $x_0$. Especially, such a power series converges on the interval $(x_0-\rho,\,x_0+\rho)$, where $\rho$ is its radius of convergence. Further, if a power series $y = T(x)$ has its values in an interval where a power series $S(y)$ is well-defined, then the values of the function $S \circ T$ are also described by a power series, which can be obtained by the formal substitution of $y = T(x)$ for $y$ into $S(y)$.

(2) As soon as a power series with a suitable center is available, the coefficients of the power series for inverse functions can be calculated. We do not introduce a list of formulae here; it is easily obtained in Maple, for instance, by the procedure "series". For illustration, here are two examples. Begin with
$$e^x = 1 + x + \frac12 x^2 + \frac16 x^3 + \frac{1}{24}x^4 + \cdots.$$
Since $e^0 = 1$, we search for a power series centered at $x = 1$ for the inverse function $\ln x$. So assume
$$\ln x = a_0 + a_1(x-1) + a_2(x-1)^2 + a_3(x-1)^3 + a_4(x-1)^4 + \cdots.$$
Apply the equality $x = \ln(e^x)$, substitute the series $e^x - 1 = x + \frac{x^2}{2} + \frac{x^3}{6} + \cdots$ into the series for $\ln$, and regroup the coefficients by the powers of $x$:
$$x = a_0 + a_1\left(x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \cdots\right) + a_2\left(x + \frac{x^2}{2} + \cdots\right)^2 + a_3\left(x + \frac{x^2}{2} + \cdots\right)^3 + \cdots = a_0 + a_1 x + \left(\frac12 a_1 + a_2\right)x^2 + \left(\frac16 a_1 + a_2 + a_3\right)x^3 + \cdots.$$
Comparing the coefficients of the corresponding powers on both sides gives
$$a_0 = 0,\quad a_1 = 1,\quad a_2 = -\frac12,\quad a_3 = \frac13,\quad a_4 = -\frac14,\ \dots,$$
which corresponds to the valid expression (to be verified later)
$$\ln z = \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}(z-1)^n.$$
Similarly, we can begin with the series
$$\sin t = t - \frac{1}{3!}t^3 + \frac{1}{5!}t^5 - \frac{1}{7!}t^7 + \cdots$$
and the (so far unknown) power series for its inverse centered at zero (since $\sin 0 = 0$),
$$\arcsin t = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4 + \cdots.$$
Substitution gives
$$t = a_0 + a_1\left(t - \frac{1}{3!}t^3 + \frac{1}{5!}t^5 - \cdots\right) + a_2\left(t - \frac{1}{3!}t^3 + \cdots\right)^2 + \cdots = a_0 + a_1 t + a_2 t^2 + \left(-\frac{1}{3!}a_1 + a_3\right)t^3 + \left(-\frac{2}{3!}a_2 + a_4\right)t^4 + \left(\frac{1}{5!}a_1 - \frac12 a_3 + a_5\right)t^5 + \cdots,$$
hence
$$\arcsin t = t + \frac16 t^3 + \frac{3}{40}t^5 + \cdots.$$

CHAPTER 5. ESTABLISHING THE ZOO

(d) $\limsup_{n\to\infty}\sqrt[n]{|a_n|} = \limsup_{n\to\infty}\sqrt[n]{\frac{2+(-1)^n}{2+(-1)^{n+1}}} = 1$.
Therefore, the radius of convergence is (a) $r = 8$; (b) $r = 0$; (c) $r = 1/e$; (d) $r = 1$. □

5.I.3. Calculate the radius $r$ of convergence of the power series
$$\sum_{n=1}^{\infty}\frac{(-1)^n\left(\sqrt{n^3}+n\cdot 3^n\right)}{(n^4+2n^3+1)\,\pi^n}\,x^n.$$

Solution. The radius of convergence of any power series does not change if we move its center or alter its coefficients while keeping their absolute values. Therefore, let us determine the radius of convergence of the series
$$\sum_{n=1}^{\infty}\frac{\sqrt{n^3}+n\cdot 3^n}{(n^4+2n^3+1)\,\pi^n}\,x^n.$$
Since
$$\lim_{n\to\infty}\sqrt[n]{n^a} = \left(\lim_{n\to\infty}\sqrt[n]{n}\right)^a = 1 \quad\text{for } a > 0,$$
we can move to the series
$$\sum_{n=1}^{\infty}\left(\frac{3}{\pi}\right)^n x^n$$
with the same radius of convergence, $r = \pi/3$. □

5.I.4. Give an example of a power series centered at the origin which, on the interval $(-3,3)$, determines the function $\frac{1}{x^2-x-12}$.

Solution. As
$$\frac{1}{x^2-x-12} = \frac{1}{(x-4)(x+3)} = \frac17\left(\frac{1}{x-4} - \frac{1}{x+3}\right)$$
and
$$\frac{1}{x-4} = \frac{-\frac14}{1-\frac{x}{4}} = -\frac14\sum_{n=0}^{\infty}\left(\frac{x}{4}\right)^n, \qquad \frac{1}{x+3} = \frac{\frac13}{1-\left(-\frac{x}{3}\right)} = \frac13\sum_{n=0}^{\infty}\left(-\frac{x}{3}\right)^n,$$
we get
$$\frac{1}{x^2-x-12} = \sum_{n=0}^{\infty}\left(\frac{(-1)^{n+1}}{21\cdot 3^n} - \frac{1}{28\cdot 4^n}\right)x^n.$$
□

5.I.5. Approximate the number $\sin 1°$ with error less than $10^{-10}$.

Solution. We know that
$$\sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \frac{1}{7!}x^7 + \cdots = \sum_{n=0}^{\infty}\frac{(-1)^n}{(2n+1)!}\,x^{2n+1}, \quad x \in \mathbb{R}.$$

(3) Notice that if it is assumed that the function $e^x$ can be expressed as a power series centered at zero, and that power series can be differentiated term by term, then the differential equation for the coefficients $a_n$ is easily obtained, since $(x^{n+1})' = (n+1)x^n$. Therefore, from the condition that
It remains to determine at every point, the sufficient number of terms to add up in order to provably an+i = ^pjan, ao = 1 get the error below 10"10. The series and hence it is clear that an = ^. _k___1_ (_7t_\3 I J_ (-e-\5 _ J. (-z—\7 _1_ 180 3! k 180/ + 5! V180/ 7! 1180/ ' '' 2-i (2n+l)! 1180/ 71 = 0 is alternating with the property that the sequence of the absolute values of its terms is decreasing. If we replace any such convergent series with its partial sum, the error we thus make will be less than the absolute value of the first term not included in the partial sum. (We do not give a proof of this theorem.) The error of the approximation sin 1° « T 1803-3! is thus less than < in-10 1805-5! ^ iU S vnw+in (x - 2)™ converges. O □ 5.1.6. Determine the radius r of convergence of the power oo 2rl series E w^' O 77 = 0 ^ '' 5.1.7. Calculate the radius of convergence for J2n°=i 2^ x71- o 5.1.8. Without calculation determine the radius of conver- oo gence of the power series n-3"-1 1:71-1 ■ O 77=1 5.1.9. Find the domain of convergence of the power series oo _ E^1^- O 77=1 5.1.10. Determine for which x e K the power series (-3) 71=1 5.1.11. Is the radius of convergence of the power series oo oo 71 = 0 71=1 common to all sequences {an}^=0 of real numbers? O 5.1.12. Decide whether the following implications hold: (a) If the limit lim 3\/aJ exists and is finite, then the power m-oo series oo J2 an(x - x0)n 77=1 converges absolutely at at least two distinct points x. (b) Conditional convergence of series J2n°=i a«> J2n°=i °n implies that the series J2n°=i(®an ~ converges as well. (c) If a series J2n°=o a™ satisfies 342 CHAPTER 5. ESTABLISHING THE ZOO lim an = 0, 71—>-00 then it is convergent, (d) If a series J2n°=i an converges, then the series y ^ 77 77=1 converges absolutely. o 5.1.13. Approximate cos with error less than 10~5. O 5.1.14. For the convergent series £ n \AT+100' 77 = 0 bound the error of its approximation by the partial sum sg ggg. o 5.1.15. 
Express the function y = ex, denned on the whole real line, as an infinite polynomial whose terms are of the form an(x — \)n. Then express the function y = 2X denned on R as an infinite polynomial with terms anxn. O 5.1.16. Find the function / to which, for 168, the sequence of functions converges. Is this convergence uniform on R? O 5.1.17. Does the series oo E kde 77=1 converge uniformly on the real line? Q 5. 5.1.18. Approximate (a) the cosine of ten degrees with accuracy of at least 10" (b) the definite integral J^2 -^-^ with accuracy of at least 10"3. o 5.1.19. Determine the power series centered at x0 = 0 of the function X f(x) = J e*2 dt, i6R. o O 5.1.20. Using the integral test, find the values a > 0 for which the series oo E 4r 77=1 converges. O 5.1.21. Determine for which i£l the series oo 1 V_-_x3n ^ 2n ■ n ■ ln(n) i=l v ' converges. O 343 CHAPTER 5. ESTABLISHING THE ZOO 5.1.22. Determine all x G R for which the power series oo 2n J2 is convergent. O i=l 5.1.23. For which i£l does the series y~ ln(n!) 77=1 converge? O 5.1.24. Determine whether the series oo J2 (-l)n_1tan -±= 77=1 converges absolutely, converges conditionally, diverges to +oo, diverges to —oo, or none of the above, (such a series is sometimes said to be oscillating). O 5.1.25. Calculate the series oo 77=1 with the help of an appropriate power series. O 5.1.26. For a; G (-1,1), add x - Ax2 + 9x3 - 16x4 H---- 1 ™3n+l 2^ 2"-n! X o o 51.27. Supposing \ x\ < 1, determine the series oo (a) £ n=l oo (b) J2 n2xn-x. 77=1 5.1.28. Calculate oo E277-1 (-2)—1 n=l using the power series oo J2 (-1)™ (2n + l)x2n 77=0 for some x G (—1,1). O 5.1.29. For 168, calculate the series 2"-n! 77=0 o 344 CHAPTER 5. ESTABLISHING THE ZOO J. Additional exercises for the whole chapter 5.J.I. Determine a polynomial P(x) of the least degree possible satisfying the conditions P(l) = 1, P(2) = 28, P(0) = 2, P'(0) = 1,P'(1) = 9. O 5.J.2. 
Determine a polynomial $P(x)$ of the least degree possible satisfying the conditions $P(0) = 0$, $P(1) = 4$, $P(-1) = -2$, $P'(0) = 1$, $P'(1) = 7$. ○

5.J.3. Determine a polynomial $P(x)$ of the least degree possible satisfying the conditions $P(0) = -1$, $P(1) = -1$, $P'(-1) = 10$, $P'(0) = -1$, $P'(1) = 6$. ○

5.J.4. From the definition of a limit, prove that $\lim_{x\to 0}\left(x^3 - 2\right) = -2$. ○

5.J.5. From the definition of a limit, determine $\lim_{x\to -1}\dots$, i.e. write the $\delta(\varepsilon)$-formula as in the previous exercise. ○

5.J.6. From the definition of a limit, show that $\lim_{x\to 2}\frac{3}{(x-2)^4} = +\infty$. ○

5.J.7. Determine both one-sided limits
$$\lim_{x\to 0^+}\arctan\frac1x, \qquad \lim_{x\to 0^-}\arctan\frac1x.$$
Knowing the result, decide the existence of the limit $\lim_{x\to 0}\arctan\frac1x$. ○

5.J.8. Do the following limits exist?
$$\lim_{x\to 0}\frac{\sin x}{x^6}, \qquad \lim_{x\to 0}\frac{5x^4+1}{x}.$$
○

5.J.9. Calculate the limit $\lim_{x\to 0}\frac{\tan x - \sin x}{\dots}$. ○

5.J.10. Determine
$$\lim_{x\to\pi/6}\frac{2\sin^3 x + 7\sin^2 x + 2\sin x - 3}{2\sin^3 x + 3\sin^2 x - 8\sin x + 3}.$$
○

5.J.11. For any $m, n \in \mathbb{N}$, determine
$$\lim_{x\to 1}\frac{x^m - 1}{x^n - 1}.$$
○
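For the last limit above, the standard value $m/n$ already shows up numerically very close to $x = 1$. An informal check (not part of the original text; `ratio` is our helper):

```python
# Informal numeric check (not from the text) of
#   lim_{x->1} (x^m - 1) / (x^n - 1) = m / n.

def ratio(m, n, x):
    return (x ** m - 1.0) / (x ** n - 1.0)

for m, n in ((3, 2), (5, 7)):
    approx = ratio(m, n, 1.0 + 1e-7)
    print(abs(approx - m / n) < 1e-5)  # True
```

A proof, of course, follows from factoring $x^k - 1 = (x-1)(x^{k-1} + \cdots + 1)$ and cancelling $x - 1$.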
Is there a real number a such that the function j(x) = ax + sin x has a global minimum on the interval [0,2ir] ata;0 = 57r/4? O 5 J.24. Find the absolute minimum of the function y = ea: — In 2, x > 0 on its domain. O 5.J.25. Determine the maximum value of the function y=^3xe~x, xeR. o 5 J.26. Find the absolute externa of the polynomial p(x) = x3 — 3a; + 2 on the interval [—3,2]. O 5 J.27. Let a moving object's position in time be given as follows: s(t) = -(i-3)2 + 16, ££[0,7], where t is the time in seconds, and the position s is measured in meters. Determine (a) the initial (i. e. at the time t = 0 s) velocity of the object; (b) the time and position at which its velocity is zero; (c) its velocity and acceleration at time t = 4 s. Note. The object's velocity is the derivative of its position and acceleration is the derivative of its velocity. O 5.J.28. From the definition of a derivative /' of a function / at the point x0, calculate /' for j(x) = ^fx at any point x0 > 0. o 5 J.29. Determine whether the derivative of the function f(x) =xarctan(i), i£8\{0), /(0) = 0 exists at 0. O 5.J.30. Does the derivative of the function y = sin ^arctan ^| 12a;21 + 11 | • e _^^i2 + sin(sin(sin(sina;))), x £ R at the point x0 = it3 + 3*" exist? O 5.J.31. Determine whether the derivative of the function f(x) = (x2-l)sin(^-1), x^-l(xeR), /(-1) = 0 at the point x0 = — 1 exists. O 5 J.32. Give an example of a function / : R —> R which is continuous on the whole real axis, but does not have derivatives at the points x\ = 5, x2 = 9. O 5.J.33. Find functions / and g which are not differentiable anywhere, yet their composition / o g is differentiable everywhere on the real line. O 5 J.34. Using basic formulae, calculate the derivative of the function (a) y = (2 — a;2) cos x + 2x sin x, x £ R; (b) y = sin (sin x), x £ R; (c) y = sin (in (a;3 + 2x)) , x £ (0, +00); (d) y = i±f^, a; £ R. 347 CHAPTER 5. ESTABLISHING THE ZOO 5.J.35. 
Determine the derivative of the function (a) y = \JxsJx y/x, x e (0, +00); (b) y = In I tan § I , \ {nir; n e Z}. 5.J.36. Write down the derivative of the function y = sin (sin (sina;)) , x £ 5.J.37. For the function j(x) = arccos (^-#) + cvTTä? + ex (a;2 - 2x + 2) 5J.40. Calculate/'(l) if f(x) = (x-l)(x-2)2(x-3)3, xe 5.].41. Determine the derivative of the function I a; I ^ 1 (a; e : 5 J.42. Differentiate (with respect to the real variable x) O o o V2 with maximum possible domain, calculate /' on the largest subset of R where this derivative exists. Q 5.J.38. At any point x {nu; n e Z}, determine the first derivative of the function y = \/sJax. O 5.J.39. For x e R, differentiate O o o x In2 (x + Vl + x2) - 2Vl + a;2 In (a; + Vl + a;2) + 2a; at all points where the derivative exists. Simplify the obtained expression. O 5 J.43. Determine /' on a maximal set if j(x) = \ogx e. O 5 J.44. Express the derivative of the product of four functions if(x)g(x)h(x)k(x)]' as a sum of products of their derivatives and themselves, supposing all of these functions are differentiable. O 5.J.45. Determine the derivative of the function y — (x+3)2 for x > 0. O 348 CHAPTER 5. ESTABLISHING THE ZOO 5.J.46. The rainbow. Why is the rainbow circular? Solution. In the exercise called Snell's law we explained what causes a rainbow. It is created by sunlight being refracted while entering a droplet of water. We continue with this problem, examining how the rays behave when going through the droplets. (See the illustration.) The ray dropping onto a droplet's surface at the point A "splits". Some part of the light reflects (at the angle ipi from the normal line) and the other part refracts inside the droplet at the marked angle ipr. The ray, inside the droplet, reflects off the droplet's surface at the point B. Since | OA | = | OB , the angle of reflection equals ipr. Part of the light refracts out of the droplet. 
The reflected ray then meets the droplet's surface again at the point $C$ and refracts towards the observer at the angle $\varphi_i$ from the normal line. We omit the case of the secondary rainbow arc, which occurs when the ray reflects twice inside the droplet before refracting out of it.

Write $\alpha := \angle AIC$, where $I$ is the intersection of the incoming and the outgoing ray. Following the angles of incidence, reflection and refraction along the path through the droplet, the total turning of the ray gives $\alpha = 4\varphi_r - 2\varphi_i$. If $y$ denotes the distance of the incoming ray from the axis of the droplet of radius $R$, then $\sin\varphi_i = y/R$, and by Snell's law, $\sin\varphi_r = y/(nR)$, where $n$ is the refractive index. Thus it makes sense to analyze the function (see (1) and (2))
$$\alpha(y) = 4\arcsin\frac{y}{nR} - 2\arcsin\frac{y}{R}, \quad y \in [0, R].$$
Select an appropriate unit of length so that $R = 1$. Consider the function
$$\alpha(x) = 4\arcsin\frac{x}{n} - 2\arcsin x, \quad x \in [0, 1].$$
The derivative is
$$\alpha'(x) = \frac{4}{\sqrt{n^2-x^2}} - \frac{2}{\sqrt{1-x^2}}, \quad x \in (0, 1).$$
The equation $\alpha'(x) = 0$ has a unique solution
$$x_0 = \sqrt{\frac{4-n^2}{3}} \in (0, 1), \quad\text{if } n^2 \in (1, 4).$$
Set $n = 4/3$ (which is approximately the refractive index of water). Further,
$$\alpha'(x) > 0,\ x \in (0, x_0), \qquad \alpha'(x) < 0,\ x \in (x_0, 1).$$
At the point
$$x_0 = \sqrt{\frac{4-\frac{16}{9}}{3}} = \frac23\sqrt{\frac53} \approx 0.86,$$

CHAPTER 5. ESTABLISHING THE ZOO

the function $\alpha$ has a global maximum
$$\alpha(x_0) = 4\arcsin\frac{3x_0}{4} - 2\arcsin x_0 \approx 0.734\ \text{rad} \approx 42°.$$
It follows that the peak of the rainbow cannot be above the level of approximately $42°$ with regard to the observer. The values
$$\alpha(0.74) \approx 39.4°, \quad \alpha(0.94) \approx 39.2°, \quad \alpha(0.8) \approx 41.2°, \quad \alpha(0.9) \approx 41.5°$$
suggest ($\alpha$ is increasing on the interval $[0, x_0]$ and decreasing on the interval $[x_0, 1]$) that more than $20\,\%$ of the values of $\alpha$ lie in the band from around $39°$ to around $42°$, and that $10\,\%$ lie in a band thinner than $1°$; furthermore,
$$\alpha(0.84) \approx 41.9°, \quad \alpha(0.88) \approx 41.9°.$$
The rays for which $\alpha$ is close to $42°$ have the greatest intensity. This is an instance of the principle of minimum deviation: the highest concentration of diffused light happens at the rays with minimum deviation, since the total angle deviation of the ray equals the angle $\delta = \pi - \alpha$. The droplets from which the rays creating the rainbow for the observer come lie on the surface of a cone with central angle equal to $2\alpha(x_0)$. The part of this cone which is above ground then appears as the rainbow arc to the observer (see the illustration). Thus, when the sun is setting, the rainbow has the shape of a semicircle. The rainbow exists only with regard to its observer; it is not anchored in space. The circular shape of the rainbow was examined as early as 1635–1637 by René Descartes. □

5.J.47. L'Hospital's pulley. A rope of length $r$ is tied at one of its ends to the ceiling at a point $A$. A small pulley is attached to its other end. A point $B$ is also on the ceiling, at distance $d$, $d > r$, from $A$. Another rope, of length $l > \sqrt{d^2+r^2}$, is tied to $B$ at one end, passes over the pulley, and has a weight attached to its other end. Omit the mass and the size of the ropes and the pulley. In what position does the weight stabilize, so that the system is in a stationary position? See the illustration.

Solution. The system is in a stationary position when its potential energy is minimized. This is when the distance between the weight and the ceiling is maximal. Let $x$ be the distance between $A$ and the point $P$ on the ceiling vertically above the weight and the pulley. Then, by the Pythagorean theorem, the distance between the pulley and the ceiling is $\sqrt{r^2-x^2}$. Similarly, the distance between the weight and the pulley is $l - \sqrt{(d-x)^2 + r^2 - x^2}$. Hence, if $f(x)$ is the distance between the weight and the ceiling, then
$$f(x) = \sqrt{r^2-x^2} + l - \sqrt{(d-x)^2+r^2-x^2}.$$
The state of the system is fully determined by the value $x \in [0, r]$ (see the illustration). So it suffices to find the global maximum of the function $f$ on the interval $[0, r]$. The derivative is
$$f'(x) = \frac{-x}{\sqrt{r^2-x^2}} + \frac{d}{\sqrt{(d-x)^2+r^2-x^2}}, \quad x \in (0, r),$$
since $\frac{d}{dx}\left[(d-x)^2 + r^2 - x^2\right] = -2(d-x) - 2x = -2d$. Square the equation $f'(x) = 0$ to obtain
$$\frac{x^2}{r^2-x^2} = \frac{d^2}{(d-x)^2+r^2-x^2},$$
and hence
$$2dx^3 - (2d^2+r^2)x^2 + d^2r^2 = 0, \quad x \in (0, r).$$
One solution to this equation is $x = d$; hence the polynomial on the left side factors into $(x-d)\left(2dx^2 - r^2x - dr^2\right)$, or
$$2d\,(x-d)\left(x - \frac{r^2+r\sqrt{r^2+8d^2}}{4d}\right)\left(x - \frac{r^2-r\sqrt{r^2+8d^2}}{4d}\right) = 0.$$
One solution to this equation is x = d, hence the polynomial on the left side factors into (x - d) (2dx2 - r2x - dr2) , or 2d(x-d) (x - r2+rVrf+W^ (x - r2-rVr^+W^ ^ 350 CHAPTER 5. ESTABLISHING THE ZOO Hence the equation f'(x) = 0 has three solutions. The solution x = d is outside the interval [0, r] since d > r. The solution x = x1 = r ~rV^+ML js outside the interval [0, r] since x1 < 0. The solution r2 + ryV2 + 8d2 X = Xq = — Ad , x0 < only at x0. From the limits is positive, and furthermore, xq < j + ^ = r, since r < d. Since /' is continuous on the interval (0, r), it can change sign it follows that f(x) > 0, x £ (0, x0), /'(x) < 0, 2 £ (x0, r). Thus the function / has its global maximum on the interval [0, r] at x0. □ 5.J.48. A nameless mail company can only transport parcels whose length does not exceed 108 inches, and for which the N\^iv sum of the length and the maximal perimeter is at most 165 inches. Find the largest volume a parcel can have to be ^jVc transported by this company. Solution. Let M denote the length plus perimeter, p the perimeter, and x the parcel's length. Suppose the perimeter p is constant. The wanted parcel has a shape such that for any t £ (0, x), its cross section has constant perimeter p. (the maximal one). The parcel is to have the greatest volume so that the cross section of a given perimeter has the greatest area possible. The largest planar figure of a given perimeter is a disc. Thus the desired parcel has the shape of a cylinder with height equal to x and radius r = p/2n. Its volume is V = 7TT2x = Consequently p + x < M and x < 108. Thus we consider the parcel for which p + x = M. Its volume is V[x) = {M~Afx = -3-2^+M2x where xG (0,108]. Having calculated the derivative V'(x) = 3^-wm2 = 3(*-m(*-¥): XG (0,108), we find that the function V is increasing on the interval (0,55] = (0, M/3] and decreasing on the interval [55,108] = [M/3,min{108,M}]. 
The greatest volume is thus obtained for a; = M/3, where v(¥) = Wt = 0.011 789 M3 «i 0.8678 m3. If the company also required that the parcel have the shape of a rectangular cuboid (or more generally a right prism of a given number of faces), we can repeat the previous reasoning for a given cross section of area S without specifying what the cross section looks like. Necessarily S = kp2 for some k > 0 which is determined by the shape of the cross section. (If we change only the size of the sides of the polygon which is the cross section, then its perimeter will change by the same ratio. However, its area will change by the square of the ratio.) Thus the parcel's volume is the function V(x) =Sx = kp2x = k (M — xf x, xe (0,108]. The constant k does not affect the point of the global maximum of the function V, so the maximum is again at the point x = M/3. For instance, for the largest right prism having a square base, we have p = M — x = 2M/3, i. e. the length of the square's sides is a = Mj6 and the volume is then V = a2x = |^ = 0.009 259 M3 « 0.681 6 m3. For a parcel in the shape of a ball (when x is the diameter), the condition p + x < M can be expressed as irx + x < M, i. e. x < M/(tt + 1) < 108. Thus for x = M/(tt + 1), the maximal volume is V = (f )3 = 6(^+i)a = °-007370M'3 ~ °-5426m3- 351 CHAPTER 5. ESTABLISHING THE ZOO Similarly, for a parcel in the shape of a cube (when 2 is the length of the cube's edges), the condition p + x < M means x < M/5 < 108. Thus for x = M/5 the maximal volume V = x3 = (f-)3 = 0.008M3 « 0.5889 m3. The length of the edges of the cube which has the same volume as the cylinder is a = M= = 0.227 595 M « 0.953 849 m. Its length and perimeter sum to 5a = 1.138 M. This is more than the company's limit by around 14 %. □ 5.J.49. A large military area (denoted by MA) has the shape of a square, and has area of 100 km2. It is bounded along its perimeter by a narrow path. 
From the starting point in one corner of MA, a target point inside MA is situated 5 km along the (boundary) path and then 2 km perpendicularly to it. One can travel along the path at 5 kph, but directly through the MA at 3 kph. At what distance should you travel along the path if you wish to get there as soon as possible? Solution. To travel x km along the path (where x e [0,5]), x/5 hours is needed. One way through MA is then xj22 + (5 - x)2 = sjx2 - 102 + 29 kilometers long. This takes \Jx2 — IO2 + 29/3 hours. Altogether, the journey takes f(x) = \x + \s/x2 - 102 + 29 hours. The only zero point of the function J K > 5^3 \/x2-10x+29 is x = 7/2. Since the derivative /' exists at every point of the interval [0, 5] and since /(I) = f| v2, the optimal strategy is to row straight to the target point, which corresponds to 2 = /. The first derivative is and the second derivative is Solve the equation Squaring and rearranging gives t'(x) = * . - 2G(0,0 v ' v 1 v d -\-x v2 ' \ ' / t"(x) = —. d ie(o,i). t'(x) = 0, or x _ v_L Vd2+x2 ~ v2 ' 352 CHAPTER 5. ESTABLISHING THE ZOO If xq < I, then t(x) has a global minimum at xq on the interval [0, /] since limx^o+ t'(x) < 0, t'(l) > 0, and indeed t"(x) > 0 on the same interval. If x0 > I, then t'(x) < 0 on all of the interval [0, /] and so the global minimum of t(x) occurs at /. In the former case, the fastest journey (in hours) takes t (XQ) = Vd2+X0 + l-X0 = dVV2-Vl + J_ In the latter case, the fastest journey takes t (I) = hours. Note. An alternative simpler approach for doing the calculations is to use the variable 6 instead of the variable x where x = dtaxiO. The fastest journey occurs when sin 6 = v\jv2- This is a limiting case of Snell'slaw. □ 5.J.51. A company is looking for a rectangular patch of land with sides of lengths 5a and b. The company wants to enclose it with a fence and then split it into 5 equal parts (each being a rectangle with sides a, b) by further fences. 
For which values of a, b will the area S = 5ab of the patch be maximal if the total length of the used fences is 2 400 m? Solution. Reformulate the statement of the problem: Maximize 5ab while satisfying the condition (1) 6b + 10a = 2,400, a,b>0. The function , r 2,400-10a a H> 5a ——g- defined for a e [0,240] takes its maximal value at the point a = 120. Hence the result is a = 120 m, b = 200m. The value of b follows immediately from (1). □ 5.J.52. A rectangle is inscribed into an equilateral triangle with sides of length a so that one of its sides lies on one of the triangle's sides and the other two of the rectangle's vertices lie on the remaining sides of the triangle. What is the maximum possible area of the rectangle? 5.J.53. Determine the dimensions of an (open) swimming pool whose volume is 32 m3 and whose bottom has the shape of a square, so that one would use the least amount of paint possible to prime its bottom and its walls. O 5.J.54. Express 28 as a sum of two non-negative numbers such that the sum of the first summand squared and the second summand cubed is as small as possible. O 5.J.55. With the help of the first derivative, find the real number a > 0 for which the sum a + 1/a is minimal. Now solve this problem without using the differential calculus. O 5.J.56. Inscribe a rectangle with the greatest perimeter possible into a semidisc with radius r. Determine the rectangle's perimeter. O 5.J.57. Among the rectangles with perimeter 4c, find the one having the greatest area (if such one exists) and determine the lengths of its sides. O 5.J.58. Find the height h and the radius r of the largest (greatest volume) cone which fits into a ball of radius R. O 5.J.59. From all triangles with given perimeter p, select the one with the greatest area. O 353 CHAPTER 5. ESTABLISHING THE ZOO 5.J.60. A parabola is given by the equation 2x2 — 2y = 9. Find the points of the parabola which are closest to the origin. O 5.J.61. 
Your task is to create a one-litre tin having the "usual" shape of a cylinder so that the minimal amount of material would be used. Determine the proper ratio between its height h and radius r. O 5.J.62. Determine the distance from the point [3, —1] G R2 to the parabola y = x2 — x + I. O 5.J.63. Determine the distance of the point [—4, —2] e R2 from the parabola y = x2 + x + 1. O 5.J.64. At time t = 0, a car left the point A = [5, 0] at the speed of 4 units per second in the direction (—1, 0). At the same time, another car left B = [—2, —1] at the speed of 2 units per second in the direction (0,1). When will the cars be closest to each other, and what will the distance between them be at that moment? O 5.J.65. At the time t = 0, a car left the point A = [0,0] at 2 units per second in the direction (1,0). At the same time, another car left the point B = [1, —1] at 3 units per second in the direction (0,1). When will they be closest to each other and what will the distance be? O 5.J.66. If a cone has a base of radius r and height h, then its surface area (not including the base) is irrh and its volume is V = ^irr2h. Determine the maximum possible volume of a cone with total surface area (including the base 3ir cm2. O 5.J.67. Suppose you own an excess of funds without the possibility of investing outside your own factory. This acts as a regulated market with a nearly unlimited demand and a limited access to some key raw materials, which allows you to produce at most 10 000 products per day. You know that the raw profit p and the expenses e, as functions of a variable x which determines the average number of products per day, satisfy v(x) = 9x, n(x) = x3 — 6x2 + I5x, x e [0,10]. At what production will you profit the most from your factory? O 5.J.68. Determine 1' lim cot a; — x-s-0 V X Solution. Notice that lim cot a; = +oo, lim — = +oo, X-S-0+ X-S-0+ X lim cot a; = —oo, lim — = —oo, x—>0— x—>0— X we can see that both one-sided limits are of the type oo — oo. 
Thus we consider the (two-sided) limit. We write the cotangent function as the ratio of the cosine and the sine and convert the fractions to a common denominator. 1 \ , x cos x — sin x lim cot a;--= lim x-t-o \ x J x-t-o xsmx Thus we obtain an expression of the type 0/0 for which we get (by l'Hospital's rule) x cos x — sin x cos x — x sin x — cos x —x sin x lim-:- = lim-:- = lim x-s-o a; sin a; x-s-o sin x + x cos x x-s-o sin x + x cos x By one more use of l'Hospital's rule for the type 0/0, we get —x sin x — sin x — x cos x 0 — 0 lim- = lim- = - = 0. x-s-o sin x + x cos x x-s-o cos x + cos x — x sin x 1 + 1 — 0 □ 5 J.69. Determine the limit 7TX lim (1 — x) tan —. o 354 CHAPTER 5. ESTABLISHING THE ZOO 5.J.70. Calculate lim ( — — xtanx o 5.J.71. Using l'Hospital's rule, determine lim ((z*-2i\x\. x-s-+oo v v / / o 5 J.72. Calculate lim [---7,- I . x-n \ 2\nx x2 - 1 / O 5.J.73. By l'Hospital's rule, calculate the limit r ( 2V2 lim cos — x-s-+oo y x J o 5.J.74. Determine lim (1 — cos x-s-0 x)slnx O 5.J.75. Determine the following limits lim x1^, lim x1^, x—>0+ x—>-+oo where aeRis arbitrary. O 5.J.76. By any means, verify that lim- = 1. x-s-0 X o 5.J.77. By applying the ratio test (also called D'Alembert's criterion; see 5.4.5), determine whether the infinite series (a) E 2"-(n+l) 3" ' n=l oo (b) E S; 77=1 oo n (c) E Trhn 77=1 converge. Solution. Since (an > 0 for all n) (a) lim £-ti _ ljnl 2"+1.(77+2)3.3" _ 2(77+2)3 _ 27T3 _ 2 < 1 . (aj lim — — um o„+i.2„ , , ,n3 — lim „, , ,s3 — lim ^-3- — ^ <- i, 77—SOO 77—SOO ° ^ V'^-lj 77—SOO 0\'i^L! 77—SOO 0/t ° (b) lim = lim f • gl) = lim = 0 < 1; kj 77^00 a" 77^00 v(n+1)! 6/ 77^00 "+1 (C) Um £»±1 = lim (. ' ^) = lim 7~rW ' 1™ = lim 4 • lim (l + ±)" = 1 • e > 1, 77^00 a" 77^00 V(n+1)2'(™+1)! «" / „^oo 77^00 n 77^00 n 77^00 v nl the series (a) converges; (b) converges; (c) does not converge (it diverges to +00). □ 355 CHAPTER 5. ESTABLISHING THE ZOO 5.J.78. 
By applying the root test (Cauchy's criterion), determine whether or not the infinite series oo (a) E ln"(n+l) ' 77=1 oo (n±±Y (b) E (c) E arcsin™ f£ 71=1 converge. Solution. Consider the series with non-negative terms only, where (a) lim x^ä^ = lim . ,1 - = 0< 1; 77—soo 77—soo "'VT1; (mV lim (1+i)" (b) lim ^/ö~ = lim ^ " 7 = -^ = f < 1; (c) lim = lim aresin = aresin 0 = 0 < 1. 77—soo 77—soo So each of the above series converges. □ 5.J.79. Determine whether or not the series oo (a) £(-!)" ln(l + £); 77=1 (b) E 77=1 (C) E (6+(-l)")" 77=1 converge. Solution. The case (a). By l'Hospital's rule, m(l + -L) 1+j^ + , lim — jj* = lim —2X, 1 - = lim txj~ = 1, X—s+oo 2x x—s+oo x—s+oo l"l"2:c hence 0 77—s-oo an a it follows that the limit 356 CHAPTER 5. ESTABLISHING THE ZOO does not exist. Therefore, the series does not converge since a necessary condition for the convergence is not satisfied. The case (b). When applying the ratio (or root) test, the polynomials in the numerator or in the denominator do not affect the value of the considered limit. Consider the series oo 71=1 for which lim p^. = ± < 1. n^-oo I a" I 4 This means that the original series is also (absolutely) convergent. □ 5.J.81. Does the following series converge? oo £(-l)n+1arctan2j 77=1 Solution. The sequence {2/V3n}neN is decreasing and the function y = arctan x increasing (on the whole real axis). So the sequence {arctan (2/v/3n) }nGN is decreasing. Thus it is an alternating series such that the sequence of the absolute values of its terms is decreasing. Such an alternating series converges if and only if the sequence of its terms converges to zero (the Leibniz criterion), and this is satisfied: lim arctan —2= = arctan 0 = 0, i. e. lim ((—1)n+1 arctan —1= ] = 0. 77-S-OO vj77 77-S-OO v V377 j □ 5. J.82. Determine whether the series oo (a) e 77=1 OO ru\ cosQre) n=l converges absolutely, converges conditionally, or does not converge at all. Solution. The case (a). This series converges absolutely. 
For instance, oo oo oo e I ^ I < e & < e ^ = 2, 77=1 77 = 1 77 = 0 and the second inequality is already proven. The case (b). cos (im) = (—1)™, n G N. So it is an alternating series such that the sequence of the absolute values of its terms is decreasing. Therefore, from the limit lim -4=^ = 0 it follows that the series is convergent. On the other hand, OO i , i oo oo Ecos im v—v l ^ v—v l . 77=1 1 1 77 = 1 77=1 The series converges conditionally. □ 5.J.83. Calculate the series (a) e (A___L_ (b) e 71 = 0 OO (c) e + 4^ 77=1 (d) e 5^; 77=1 357 CHAPTER 5. ESTABLISHING THE ZOO (e) E -1 (3n+l)(3n+4) ' 71 = 0 Solution. The case (a). By the definition, OO / l-k___L_ n=l v lim (f i i Uf4.-4.H-"+^ n—¥oo The case (b). This is a convergent geometric series with the common ratio q= 1/3, hence E^ = 5E(|)B = 5-I^ = ^. n=0 n=0 3 The case (c). By substituting m = n—l, 00 00 00 E (42^-1 + I2™") = 4 E (42^2") + ig E (42^2") = 71=1 71=1 n=l 00 00 (3 , _2_\ 1 _ 14 \p (J-"!"1 — M 1 _ 14 U "T" 16/ ^ 42m 16 1,16/ 16 ' 1--L 15' m=0 m=0 16 The series of linear combinations was expressed as a linear combination of a series (to be more precise, as a sum of series factoring out the constants). This is a valid modification supposing the obtained series are absolutely convergent. The case (d). From the partial sum sn = k + w + w + ••• + £-, neN, we obtain Thus Since lim = 0, n—too ^ + ^ + ■■■ + ^+3^, neK § + ^ + ^ + ■■■+3^-3^, rieN. 12(^ = 1(^-1) = !- The case (e). It suffices to use partial fraction decomposition (3n+l)(3n+4) = 3" ' 3tT+1 ~ I ' 3t7+4' G N U {0}, which gives OO / V l _ l™ 1 i_I i i_l i i _J___L_ (3n+l)(3n+4) „"oo 3^X 4^4 7^7 10 ^ 3n+l 3n+4 = lim i 1 - _ l n^oo 3 V -1- 3n+4 y 3 ' □ 5.J.84. Verify that Solution. or the general bound E n2 < E 2" 71=1 71 = 0 1 — 22 + 32 ^ 22 — 2' 42 + 52 + 62 + 72 ^ ^ 42—4, (2™)2 1 1 (2"+!-l)2 ^ " (2™)2 — 2' < 2™ ■ t^Y2 = 4-, 71 G N. 358 CHAPTER 5. ESTABLISHING THE ZOO By comparing the terms of both series, we get the desired inequality. 
It follows that the series E^Li ^ *s absolutely convergent. Note that 00 i 2 00 1 77 = 0 5.J.85. Examine the convergence of the series Solution. Add the terms of this series. y ln-±I. 77 77=1 Vln2+1= lim (In I + In § + In \ + ■ ■ ■ + In 2±I) n—yoo 71=1 lim In 2-3-4---(n+i) lim ln (n + X) = +00_ „ x 1-2-3---77 „ , \ 1 / 77—>-00 77—>-00 do not converge. Solution. Since and 2^ arctan n+1 , 2^ n3+n2_n 71=0 71=1 lim arctan "2+2"+31v^+4 = lim arctan ^ = f 77—S-OO 77—S-OO 71 Z lim a3^^-^ = nm S = 5.J.87. Determine whether or not the series 00 ^ "Tin" 1 ° Win 7, ' 77^+1 . 3 7 77=1 □ Thus the series diverges to +00. □ 5.J.86. Prove that the series 77^+77^—77 the necessary condition lim an = 0 for the series V]°° an to converge does not hold. □ converges. Solution. From the inequalities (consider the graph of the natural logarithm) 1 < Inn < n, 71 > 3, 71 G N it follows that \/T < Vinn < tfn, 71 > 3, 71 G N. By the squeeze theorem (5.2.12), lim Ä/lnn = 1, i. e. lim -,1 = 1. 77—SOO 77—SOO Vln77 Thus the series is not convergent. Since all its terms are non-negative, it diverges to +00. □ 5.J.88. Determine whether or not the series 00 (a) £ (77+1V3™; 77 = 0 00 2 0» £ ^ (c) _1_ v ' 77 —In 77 77=1 converges. Solution. All of the three enlisted series consist of non-negative terms only. So the series either converge, or diverge to +00. 359 CHAPTER 5. ESTABLISHING THE ZOO (a) E in+k^ < E (.)" = jbi < +00; 77 = 0 77 = 0 (b) E > E £ = E i = +00; n=l n=l n=l 00 00 (c) V > V i = +00. v 7 77 — m 77 — 77 77=1 77=1 It follows that (a) converges; (b) diverges to +00; (c) diverges to +00. □ 5.J.89. Begin with a square with sides of length a > 0. Construct a sequence of squares, each of which has as vertices the midpoints of the preceding square. Determine the sum of the areas and the sum of the perimeters of all these (infinitely many) squares. O 5.J.90. Let a sequence of rows of semidiscs be given, such that for each n e N, the n-th row contains 2n semidiscs, each with radius of 2~n. 
What is the area of an arbitrary figure consisting of all these semidiscs, supposing the semicircles do not overlap? O 5.J.91. Solve the equation 1 - tana; + tan2 x - tan3 x + tan4 x - tan5 x H----= tan^2+1 ■ 5.J.92. Determine EG^t + ä o o 5.J.93. Calculate J2 Vn2 + 2n + l. 77=1 5.J.95. Calculate the series (a) E ^ 71=1 OO (b) E 5.J.96. Sum the series 1-3 ^ 3-5 ^ 5-7 ^ V _i_ 4^ (2t7-1)(2t7+1 5.J.97. Using the partial fraction decomposition, calculate E ^r- 77=2 O 5.J.94. Prove the convergence of the series and find its value. E 3"+2" o o o 360 CHAPTER 5. ESTABLISHING THE ZOO (b) E n=l O 5.J.98. Determine the value of the convergent series E 4n2-l' n=0 o 5.J.99. Calculate the series V 1 n2+3n' n=l O 5.J.100. In terms of * ■ 77 1 2^3 4^5 6^7 8 ^ ' 77=1 express the following two series (!-§-*) + (§-*-§) + ■■■; (! + *-*) + (* + *-*) + ■■■ (both the series contain the same elements as the first one, only in a different order). O 5 J.101. Determine whether the series ~ 2" + (-2)" 77 = 0 converges. O 5.J.102. Prove the following statement: If a series E^Lo a77 converges, then lim sin (3a„ + it) = 0. O 77—SOO oo _ oo oo 5 J.103. For which a G R; /3 G Z; 7 G R \ {0} do the series E ^Tp; E ^r1; E -^converge? 77=120 n 77=240 n 77=360 7 o 5 J.104. Determine whether the series ^ ^~|77 77° — 577°+277 2 77 = 21 converges absolutely, converges conditionally, or does not converge at all. O 5 J.105. Determine whether or not the limit lim (A + A + '-' + ^Sr) n—>-oo is finite. Notice that one cannot use the sums El _ 7T2 71—1 _ _|_~^ n2 — 6 ' 2^ n2 — "h°°-n=l 7i=2 O 5.J.106. Find all real numbers ^4 > 0 for which the series oo X;(-l)nln(l+A2n) 77 = 1 is convergent. O 5.J.107. Recall that the harmonic series diverges. 361 CHAPTER 5. ESTABLISHING THE ZOO Determine whether or not the series 1 ^ ^ 9 ^ 11 ^ T IS T 21 T T2ST ..._|_J__|_..._|_J_j--1--1- ... j--1--1--1--|_ . . . ^ 91 ^ T 99 T 111 T ^ 119 ^ 121 ^ is also divergent. O 5.J.108. 
Give an example of two divergent series J2n°=i a«> Er^ii with positive numbers for which the series J2n°=i (3fln — 2bn) converges absolutely. O 5.J.109. Determine whether the two series 00 I P\2 00 7 4 El -\\n . W -\\n n —n+n V L) (2n)!' L> n8+2ne+n 71=1 71=1 converge absolutely, converge conditionally, or do not converge at all. O 5.J.110. Does the series OO — c — El _ 1 \n+l -yn+^Ti+l n=l converge? O 5.J.111. Find the values of the parameter p G R for which the series OO E (-l)"sin"£ 71=1 converges. O 5 J.112. Determine the maximal subset of R where the function . cos(^-21+cos *)+*-256*3 y = arctg [xZL + smx) ■ $-2+x^- can be defined. O 5.J.113. Write the maximal domain of the function y = MCCTx2_* ■ O _ arccos (In x) 5 J.114. Determine the domain and the range of the function y ~ 2-3x' Then determine the inverse function. O 5.J.115. Is the function (a) y = cos X . X3 ' (b) y = cos X I -1 . x3 + X' (c) y = cos X . X4 ' (d) y = cos X 1 -| . (e) y = sin a; + tan 1; (f) y = In ±±2l; 1—X ' (g) y = sinha; = e ~2e (h) y = coshai = with the maximal domain odd, even, or neither? O 5.J.116. Is the function (a) y = sep; (b) y = ^ + 1; (c) y = (d) y = + 1; 362 CHAPTER 5. ESTABLISHING THE ZOO (e) y ■ (f) y = in £f; (g) y = sinha; = eX~e *; (h) y = coshx = e*+2e with the maximal domain even, odd or neither? O 5.J.117. Determine whether the function (a) y = sin x ■ In | x |; (b) y = arccotga;; (c) y = xs - {/3x6 + 3a;2 - 6; (d) y = cos (tt — x); (e) v = tanx+x ^ ' " 3+7 cos x with the maximal domain is odd or even. O 5 J.118. Is the function (a) y = In (cos x); (b) y = tan (3x) + 2 sin (Qx) with maximal domain periodic? O 5.J.119. Draw the graphs of the functions f(x) = e' x I, a; eE, g(a;) = In | a; |, a; G R \ {0}. O 5 J.120. Draw the graph of the function y = 2~lxl,a;eR. O 5 J.121. The functions sinh.r = -—j—. cosh ./• = e+2 *, tanha; = ||f, x G R; cotha; = ^f, a; G R x {0} are called hyperbolic functions. Determine the derivatives of these functions on their domains. 
O 5 J.122. At any point x G R, calculate the derivative of the area hyperbolic sine (denoted arsinh), the function inverse to the hyperbolic sine y = sinh x on R. O Remark. Note, that the inverse functions to the hyperbolic functions y = cosha;, x G [0,+oo), y = tanha;, x G R and y = cotha;, x G (-co, 0) U (0, +oo) are called area hyperbolic functions (y = arsinh a; belongs to them, too). They are denoted arcosh, artanh, arcoth, respectively and are defined for x G [1, +oo), x G (—1,1), and x G (-co, —1) U (1, +oo), respectively. Let us add that (arcosha;)' = %/xl_1 ? a; > 1, (artanha;)' = jz^z, \x \ < 1, (arcoth x) ' = x | > 1. 5.J.123. Calculate the sum of the series: 2 12 12 2 + 1+2! + 3!+4! + 5! + 6!+'" 363 CHAPTER 5. ESTABLISHING THE ZOO Solutions to the exercises SA.7. P(x) = (-§ - f *)x2 + (2 + 3t)x -I-ft. 5A.15. Sought spline differs from the one in 5.A. 14 only in the values of the derivatives at the points —1 and 1. Similarly to the previous task, we get that the parts Si and S2 of our spline have the forms Si (x) = ax3 + bx2 + 1 and S2 (x) = —ax3 + bx2 + 1, respectively, where a, b, c, d are unknown real parameters. Confronting this with the conditions Si(—1) = 0, Si(—1) = 1, &(1) =0, and S2(l) = 1 yields the system -a + 6+1 = 0, 3a - 2b = -1, having the solution a = —3, b = —4. Hence, the wanted spline is the function „, f-3x3 - 4x2 + 1 pros e [-1,0], [X> ~ { 3x3 - 4x2 + 1 pro x e [0,1]. ■ 2x - 4. 5A.17. (2x2 - 5) /3; eg. (fx2 - |)3. 5A.18. a = 1, b = -2, c = 0, d = 1. 5A.19. x3 + x2 - x + 2. 5A.20. Infinitely many. 5A.21. P(x) = x3 - 2x2 + 5x - 3; Q(x) = x3 - 2x2 + 3x - 3. SA.22. x5 - 2x4 -5x + 2. 5A.23. x2. SA.24. x3 - 2x + 5; x3 - x + 6. 5A.25. Infinitely many. 5A.26. Eg. x2 - 3x + 6. 5A.27. Si(x) = \ (x + 1)3 - f (x + 1) + 1, x e [-l,0];S2(x) = -|x3 + f x2, x e [0,1]. 5^4.2«. Si(x) = i (x + 1)3 - f (x + 1) + 1, x e [-l,0];S2(x) = -ix3 + f x2, x e [0,1]. 5^4.2P. Si (x) = x; S2 (x) = x. 5^4.30. Si (x) = l;S2(x) = 1. 5A.31. 
Si{x) = x + 3, x e [-3 + i - 1, -3 + i\\i e {1, 2}. 5.A.32. Si(x) = 1 - § x + j-0 x3; S2(x) = \ - f (x - 1) + £ (x - l)2 - ^ (x - l)3. 5.B.2. sup+ = 6, inf A = —3; sup B = i, inf B = — 1; supC = 9, infC = -9. 5.B.3. It can easily be shown that 3 sup A = -, inf A = 0. 5.B.4. Clearly infN = l, supAl = 0, inf J = 0, sup J = 5. 5.B.5. We can, for instance, set M := Z \ N; N := N. 5.B.6. Consider any singleton (one-element set) Xcl 5.B.7. The set C must be a singleton. Thus, let us choose C = {0}, for example. Now we can take A = (—1,0), B = (0,1). 5.C.5. We have (1 2 Ti-2 n-l\ ,. /1 + n-l n-l\ 1 lim — + — + • • • +-— +-— = hm ' 1 71^00 V 77 2 772 77 2 772 / n^oo V 772 2 /2 364 CHAPTER 5. ESTABLISHING THE ZOO 5.C.6. It can easily be shown that lim II >'x: y/n3 - lln2 + 2 + \/ri - 2n5 ■ n + sin n f/5n4 + 2n3 + 5 5.C.7. The limit is equal to 1. S.C.8. We can, for instance, set IJn := —n + 1, n e N. 5.C.9. The answer is ±1. 5.C./0. The result is lim sup an = n—>oa 5.C.11. We have liminf ((-!)" fl + lim inf i + sin • 72 2 5D.5. The examined function is continuous on the whole R. 5.D.6. The function is continuous at the points —7r, 0, tt; only right-continuous at the point 2; only left-continuous at the point 3; and continuous from neither side at 1. 5JJ.7. It is necessary to set /(0) := 0. 5JJ.8. The function is continuous iff p = 2. 5D.9. The correct answer is a = 4. 5JJ./0. It holds that lim lim - 0. 5D.13. The only solution is x = — 1. 5.D.14. It has even two roots, because P(-l) > 0 > P(0) < 0 < P(l) and due to 5.2.19 there is one root in (-1,0) and one in (0,1). 5.E.4.f'{x) = 2xlnx-1 -Inx. 5.E.5. (sin x)1+cos x (cot2 x - In (sin x)). 5.E.7. f -^0.003. 5.E.8. a ss f + 0.01; b sa 4.125. j.c.y. d; 2 360 , o) 2 -1- 360 . 5.E.13. (a) 12 ft/s; (b) -59, 5 ft2/s; (c) -1 rad/s. 5.E.14. The slope of the tangent line of the polynomial P is given by the derivative of the polynomial. Consider P(0) and P(2). Yes, there is. ■ ■ - , _ rcn x xo ' = 2x. 5.E.15. 5.E.16. 5.E.17. 5.E.18. 
y-lJf-5.E.19.[^2\\. 5.E.20. t:y=f + |;n:j/ = -6x + 15 Vl{x + l);y = -^(x + l). ¥ = (if-¥)(--i);j/- 2 j(x-l). 3 |+ 11 5.E.2/. tt/4. 5.E.22. y = 2 — x;y = x. 5.E.23. The inequalities follow, for instance, from the mean value theorem (attributed to Lagrange) applied to the function y = In (1 + t), t e [o,x]. 5.1.6. r = +00. 5.1.7. 1. 365 CHAPTER 5. ESTABLISHING THE ZOO 5.1.8. 3. 5.1.9. [-1,1]. 5.1.10. x e [2-i,2 + i]. 5.1.11. It is. 5.1.12. (a) True. (b) False. (c) False. (d) True. 5.1.14. The error lies in the interval (0,1/200). 5.1.15. yj°° n ± (x - 1)"; r°° . isl^ x". 5.1.16. f{x) = x, x e R;itis. 5.7.77. It does not. 5././S. (a) 1 - + (b) i - ^rj. 5 7/9 r™ _1_x2n+1 o.i.±y. l^n=0 (2n+i) „! x 5.7.20. a > 1. 5.7.27. [--^2, v^). 5.7.22. For x e [-1,1]. 5.7.23. x > 2. 5.1.24. The series is absolutely convergent. 5.7.25. In (3/2). 5.1.26. f^. 5.7.27. (a)ilni±f;(b)^. 5.7.2S. 2/9. 5.7.29. xe^. 5J.7. x4 +2x3 - x2 + x - 2. 5J.2. x4 + 2x3 - 2x2 + x + 2. 5.J.3. x4 + 3x3 - 3x2 - x - 1. 5J.4. For every e > 0, it suffices to assign to the e-neighborhood of the point —2 the ^-neighborhood of the point 0 given by s n- 8, 8 = s, and without loss of generality, we can assume that e < 1. Since if e > 1, we can set 5 = 1. 5J.5. Existence of the limit and the equality (l + x)2-3 3 lim 2--- =-- a;->-l 2 2 follows from the choice 8 := e for e e (0,1). 5.J.6. Since - (x - 2)4 < x for x < 0, we get 3 (x - 2)4/2 > -x for x < 0. 5 J. 7. As ,. 1 7T 1 IT lim arctan — = —, lim arctan — =--, 3:^0+ X 2 a;^0- X 2 the considered limit does not exist. 5J.8. The former limit equals +00, the latter does not exist. 366 CHAPTER 5. ESTABLISHING THE ZOO 5 J.9. The limit can be determined in many ways. For instance: ,. tan x — sin x (tan x — sin x cot x lim-=- = urn -=-•- a=^o sin i i^o\ sin i cotx ,. 1 — cos x 1 — cos x nm -~— = lim cos x ■ sin2 x cos x (1 — cos2 x) lim no cos i (1 + cos i) 2 5J./0. We have 2 sin3 x + 7 sin2 x + 2 sin x — 3 sin x + 1 urn -=-=- = lim - = —3. 
2 Sin X + 3 Sin X — 8 Sin X + 3 3j^7r/6 sin x — 1 5J.//. We have xm - 1 m lim - = —. x" — 1 n 5 J.12. After multiplying by the fraction \/x2 + x + : \/x2 + x + : it follows that lim (\J x2 + x — x] = —. 5J./3. We have lim (x \J\ + x2 — x2") = —. 3:—> + oo V / 2 We have ,. 72 - VI + cosx ^2 lim -=- = -. no sin x 8 5J.15. By extending the given fraction, we obtain sin (Ax) lim v ; = 8. no - 1 5J./6. We have .. VI + tan x — \J1 — tan x lim - = 1. ai-x)— sinx 5J./7. 2a: + Vl + x2 - x9 - 7x5 + 44x2 7 lim - —- = —. 3* + V6x6 + x2 - 18x5 - 592x4 18 5J.18. The statement is false. For example, consider f(x):=—, x£ (-oo,0); g(x) := x, lEl. / \ 2n-l lim (-- I = e~10. moo yn + 5/ 5J.20.-i. 5.J.21. f (x) < 0, x > e. 5J.22. The function has a local maximum at the point xi = e~2. It has a local minimum at the point x2 = 1. 5 J.23. No: if a = V2~/2, there is only a local extrémům at the point. 5 J.2* 2 = e 1 - In 1. 5J.25. ^. 5J.26. 4 = p (-1) = p (2), -16 = p (-3). 5J.27. (a)u(O) = 6m/s; (b) í = 3 s, s(3) = 16 m; (c) v (A) = -2m/s,a(4) = -2m/s2. 367 CHAPTER 5. ESTABLISHING THE ZOO 5.J.28. f (xo) = 5-i=. 5J.29. It does not because the one-sided derivatives differ (specifically: -k/2 from the right and —7r/2 from the left). 5J.30. It does. 5J.31. It does not. 5J.32. f(x) :=|x-5| + |x-9|. 5J.33. Let / = g = 1 at the rational numbers and / = g = — 1 at the irrational numbers. 5.J.34. (a) x2 sinx; (b) cos (sins) • cosx; (c) ff^f cos (in (x3 + 2s)); (d) . 5J.35. (a) | x~ s; (b) cosecx = . 5.J.36. cos x • cos (sin x) • cos (sin (sin x)). 5J.37. f (x) = Vl+l_x, H,i6 (1-V2.1 + 4 5J.3S. 3v sin'1 a: 5J.40. -8. In2 (x + \/l + s2), x e R. 5J.4J. / (x) = -I (log, e)2, x > 0, x # 1. [/(x)g(x)/i(x)fc(x)]' = f (x)g(x)h(x)k(x) + f(x)g'(x)h(x)k(x) + f(x)g(x)ti(x)k(x) + f(x)g(x)h(x)k'(x). r J Ac x3 (x+lf /" 3 , _J_ , _^___2_\ J-J-*J- (x+3)2 x + 1 3(x+2) x+3 j- 5J.52. The inscribed rectangle has sides of lengths x, \/3/2(a — x), thus its area is \/3/2(a — x)x. 
The maximum occurs for x = a/2, hence the greatest possible area is (\/3/8)a2. 5J.53. 4m x 4m x 2m. 5J.54. 28 = 24 + 4. 5J.55. a = l. 5J.56.2-Sr. 5 J.57. It is the square with sides of length c). 5J.58.h= ffl, r = 2^1 R 5 J.59. It is the equilateral triangle (with area 5J.60. [2,-1/2], [-2,-1/2]. 5J.61. h = 2r. 5J.62. The closest point is [1,1], the distance then 2\/2. 5J.63. The closest point is [—1,1], distance 3\/2. 5J.64. t = 1, 5s, the distance between will be \/5 units. 5J.65. It will happen at the time t = ^ s, the distance being -^p units. 5.J.66. P = irrv + 7rr2 =>■ u = P~^.r ==>■ V = |r (P — irr2). The extremum is at r = ^/^", the substitution gives V = cm3. 5 J.67. At about 3 414 products per day. 5J.68. Triple use of l'Hospital's rule gives sin x — x 1 lim 5J.69. 2/tt. 5 J.70. x^o- x3 6 lim i — — x | tan x = 1. =->?■ — V 2 368 CHAPTER 5. ESTABLISHING THE ZOO 5.J.71. lim x—s- + oo (0 3- - 2* ) 5J.72.1/2. 5 J. 73. We have lim x—s- + oo cos — X 2 ) = e -2 5.J.74. By applying l'Hospital's rule twice, one obtains i ■ /-i \sin x O-i lim (1 — cosx =e =1. x^O 5 J. 75. In both cases, the result is e°. 5J.76. The limit is easily calculated by l'Hospital's rule, for instance. 5J.89.2a2;4a (2 + v/2). 5J.90. tt/2. 5.J.91. x = f + kir, x = ^ + kir, k e Z. 5J.P2. 5. 5J.P3. +oo. 3/2. 5J.P5. (a) 3; (b) 9/4. 5J.P6.1/2. 5J.97. (a) 3/4; (b) 1/4. 5J.98. -1/2. 5.J.99.11/18. 5J.100.s/2;3s/2 (s = ln2). 5J./0/. It does. 5 J.102. It suffices to consider the necessary condition for convergence, namely limn-> an = 0. 5J.103. a > 0; /3 e {-2, -1, 0,1, 2}; 7 e (-00, -1) U (1, +00). 5 J.104. It is absolutely convergent. 5 J.105. The limit is 1/2. 5J.106.Ae [0,1). 5J.107. The value of the given series is finite - the series converges. 5J.108. For example: an = n/3, fen = n/2, n e N. 5J.109. The former series converges absolutely; the latter converges conditionally. 5.J.U0. It does. 5J.111. p e R. 5.J.112. R. 5J.113. (l,e]. 5.J.115. 
(a) yes; (b) no; (c) no; (d) no; (e) yes; (f) yes; (g) yes; (h) no. 5J.116. (a) no; (b) no; (c) yes; (d) yes; (e) no; (f) no; (g) no; (h) yes. 5J.117. The functions (a), (e) are odd; the functions (c), (d) are even. 5J.118. It is periodic. The prime period is (a) 2ir; (b) tt/3. 5J.119. The functions / and g are even, so it suffices to consider the graphs of the functions y = ex, x e [0, +00) and y = lnx, x e (0, +00). 5 J.120. The given function is even, so to draw its graph, it suffices to know the graph of the function y = 2X, x e (—00,0]. 5J.121. (sinhx)' = coshx; (coshx)' = sinhx; (tanhx)' = C0S^2X(cothx)' = — sil^2x- 5.J.114. (-00, |) U (f ,+00' |) U (-|,+oo); y — + l „ _z _ 1 3x + l< 1 3' 5.J.122. Vi+x? ■ 369 CHAPTER 5. ESTABLISHING THE ZOO 5.J.123. Consider the series with the expansions of the functions sinh and cosh into power series. The result is sinh(l) + 2cosh(l). 370 CHAPTER 6 Differential and integral calculus we already have the menagerie, but what shall we do with it? - we'll learn to control it... A. Derivatives of higher orders First we'll introduce a convention for denoting the derivatives of higher orders: we'll denote the second derivative of function / of one variable by /" or f(2\ derivatives of third or higher order only by J^3-1, f^,... . For remembrance, we'll start with a slightly cunning problem using "only" first derivatives. 6.A.I. Determine the following derivatives: i) (x2 ■ sin a)", ii) (xx)'' iü) (if*-; • iv) (xn)(n\ v) (sinx)^). Solution, (a) (x2 ■ sinx)" = (2x sinx + x2 cos x)' = 2 sin a; + 4x cos x — x2 sin a;. (b) (xx)" = [(1 + lax)xx]' = xx~x + xx(l + lna;)2. m (^\^ — 1___6 w Unij — i2(lni)2 i2(lni)4 1 (d) (xn)^ = [(xn)']{n~1) = {nx™-1)^-1) = ...=n\. (e) (sina;)^ = Re(in sina;) + Im(in cos a;). 
□

In the previous chapter, we worked either with an extremely large class of functions (for example, all continuous or all differentiable ones), or with only particular functions (for example exponential, trigonometric, polynomial). However, we had very few tools. We indicated how to discuss the local behaviour of functions near a given point by linear approximation. We learned how to measure instantaneous changes by differentiation. Now we derive several results that allow us to work with functions more easily when modeling real problems. We also deal with the task of summing infinitely many "infinitely small" changes, in particular, how to "integrate". In the last part of the chapter we come back to series of functions and complete several missing steps in the argumentation so far. We also add useful techniques for dealing with extra parameters in functions, and we briefly introduce some further integration concepts.

1. Differentiation

6.1.1. Higher order derivatives. If the first derivative f'(x) of a function of one real variable has a derivative (f')'(x₀) at the point x₀, we say that the second derivative (or second order derivative) of the function f exists. Then we write f''(x₀) = (f')'(x₀) or f^(2)(x₀). A function f is twice differentiable on some interval if it has a second derivative at each of its points. Derivatives of higher orders are defined inductively:

k times differentiable functions
A function f of one real variable is differentiable k times at the point x₀ for some natural number k > 1, if it is differentiable (k − 1) times on some neighbourhood of the point x₀ and its (k − 1)-st derivative has a derivative at the point x₀. We write f^(k)(x) for the k-th derivative of the function f(x). If derivatives of all orders exist on an interval A, we say the function f is smooth or infinitely differentiable on A.
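The inductive definition can be mirrored numerically: a hypothetical helper d below returns an approximate derivative function, so the k-th derivative is just k applications of d (a sketch, not the book's construction).

```python
import math

def d(f, h=1e-5):
    # approximate derivative of f, returned as a new function (central difference)
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)

# the second derivative is, by definition, the derivative of the first one
f = math.exp                  # every derivative of exp is exp again
f1, f2 = d(f), d(d(f))
assert abs(f1(1.0) - math.e) < 1e-6
assert abs(f2(1.0) - math.e) < 1e-4
```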
We use the notation C^k(A) for the class of functions with continuous k-th derivative on the interval A, where k can attain the values 0, 1, ..., ∞. Often we write only C^k if the domain is known from the context. For k = 0, C⁰ means the continuous functions.

6.A.2. Differentiate the expression
√(x−1) · (x+2)³ / (eˣ (x+13)²)
of the variable x > 1.

Solution. We solve this problem using so called logarithmic differentiation. Let f be an arbitrary positive function. We know that
f'(x) = f(x) · [ln f(x)]',
if the derivative f'(x) exists. The usefulness of this formula comes from the fact that for some functions it is easier to differentiate their logarithm than the functions themselves. Such is the expression in our problem. We obtain
(√(x−1)·(x+2)³ / (eˣ(x+13)²))'
= √(x−1)·(x+2)³ / (eˣ(x+13)²) · [½ ln(x−1) + 3 ln(x+2) − x − 2 ln(x+13)]'
= √(x−1)·(x+2)³ / (eˣ(x+13)²) · [1/(2(x−1)) + 3/(x+2) − 1 − 2/(x+13)].

In this new sum, the sum of the orders of the derivatives in each summand of the product is k + 1, and the coefficients of f^(j)(x₀)g^(k+1−j)(x₀) are the sums of binomial coefficients (k choose j−1) + (k choose j) = (k+1 choose j). □

6.1.2. The meaning of the second derivative. We have already seen that the first derivative of a function is its linear approximation in the neighbourhood of a given point. The sign of a nonzero derivative determines whether the function is increasing or decreasing at the point x₀. The points where the first derivative is zero are called the critical points or stationary points of the given function. If x₀ is a critical point of the function f, there are several possibilities for the behaviour of f in the neighbourhood of x₀. Consider the behaviour of the function f(x) = xⁿ in the neighbourhood of zero for different

[(−1)^{k−1}(k−1)! aᵏ/(ax+1)ᵏ]' = (−1)^{k−1}(k−1)! aᵏ · (−k)a/(ax+1)^{k+1} = (−1)ᵏ k! a^{k+1}/(ax+1)^{k+1},
hence (1) holds for all n ∈ N.
Then
(ln(1−x))^(n) = −(n−1)!/(1−x)ⁿ, x ∈ (−1, 1).
From here we obtain the result
(ln((1+x)/(1−x)))^(n) = (n−1)! · ((−1)^{n−1}/(1+x)ⁿ + 1/(1−x)ⁿ) for x ∈ (−1, 1) and n ∈ N. □

6.A.4. Determine the second derivative of the function y = tg x on its whole domain, i.e. for cos x ≠ 0. ○

6.A.5. Determine the fifth and the sixth derivative of the polynomial p(x) = (3x² + 2x + 1)·(2x − 6)·(2x² − 5x + 9), x ∈ R. ○

6.A.6. With shortcuts, determine the 12th derivative of the function y = e^{2x} + cos x + x¹⁰ − 5x⁷ + 6x³ − 7x + 3, x ∈ R. ○

6.A.7. Write the 26th derivative of the function f(x) = sin x + x²³ − x¹⁸ + 5x¹¹ − 3x⁸ + e^{2x}, x ∈ R. ○

6.A.8. Let f be a given function and let z be a point such that
f(z) = 0, f'(z) = 0, f''(z) = 0, f^(3)(z) = 1.
Which of the following statements:
(a) the tangent line to the graph of the function f at the point [z, f(z)] is the x axis;
(b) the function f is not a polynomial of degree two;
(c) the function f is increasing at the point z;
(d) the function f does not have a strict local minimum at the point z;
(e) the point z is an inflection point of the function f
are necessarily true? ○

6.A.9. Let the movement of a solid (a trajectory of a mass point) be described by the function
s(t) = −(t − 3)² + 16, t ∈ [0, 7],
in units m, s. Determine
(a) the initial (i.e. at time t = 0 s) velocity of the solid;
(b) the time and location at which the solid has zero velocity;
(c) the velocity and the acceleration of the solid at time t = 4 s.
Recall that velocity is the derivative of the trajectory and acceleration is the derivative of velocity. ○

exponents n. For odd n > 0, f(x) is increasing on all of R, while for even n it is decreasing for x < 0 and increasing for x > 0. In the latter case, the function attains its minimal value among all points of a (sufficiently small) neighbourhood of x₀ = 0 at this point. The same argument applies to the function f'. If the second derivative is nonzero, its sign determines the behaviour of the first derivative.
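The closed formula for the higher derivatives of ln(ax + 1) obtained above (with a = 1 it reads (ln(1+x))^(n) = (−1)^{n−1}(n−1)!/(1+x)ⁿ) can be checked against n-th central difference quotients; the helper nth_diff is an illustrative sketch, not part of the original text.

```python
import math

def nth_diff(f, x, n, h=1e-3):
    # n-th central difference quotient of f at x
    return sum((-1)**i * math.comb(n, i) * f(x + (n/2 - i)*h)
               for i in range(n + 1)) / h**n

f = lambda x: math.log(1 + x)
closed = lambda x, n: (-1)**(n - 1) * math.factorial(n - 1) / (1 + x)**n

for n in (1, 2, 3):
    assert abs(nth_diff(f, 0.5, n) - closed(0.5, n)) < 1e-3
```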
At the critical point x₀, the first derivative f'(x) is increasing if the second derivative is positive, and decreasing if the second derivative is negative. If increasing, the first derivative is necessarily negative to the left of the critical point and positive to the right of it. In that case, f is decreasing to the left of the critical point and increasing to the right of it. So f attains its minimal value among all points from a (sufficiently small) neighbourhood of x₀ at x₀. On the other hand, if the second derivative is negative at x₀, the first derivative is decreasing; thus it is positive to the left of x₀ and negative to the right of it. f then attains its maximal value at x₀ among all values in a neighbourhood of x₀.

A function which is differentiable on (a, b) and continuous on [a, b] has an absolute maximum and minimum on this interval. These can be attained only at the boundary points of the interval or at points where the derivative is zero. Thus the critical points may suffice for finding extremes. Second derivatives, if nonzero, help to determine the type of an extreme. For a more precise discussion of these phenomena we consider higher order polynomial approximations of functions. We return to the qualitative study of the behaviour of functions later on.

6.1.3. Taylor expansion. As a surprisingly easy application of Rolle's theorem, we derive an extremely important result, called the Taylor expansion with remainder¹. Consider a power series centered at a,
S(x) = Σ_{n=0}^∞ aₙ(x − a)ⁿ.
Differentiating it repeatedly gives the power series
S^(k)(x) = Σ_{n=k}^∞ n(n−1)...(n−k+1) aₙ(x − a)^{n−k}.
Putting x = a yields S^(k)(a) = k! aₖ. We can read the last statement as an equation for aₖ and rewrite the original series as
S(x) = Σ_{k=0}^∞ (1/k!) S^(k)(a)(x − a)ᵏ.
(NB. A power series can be differentiated term by term. This is proved later.)
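The identity S^(k)(a) = k! aₖ can be illustrated on a concrete polynomial: differentiating the coefficient list term by term and evaluating at x = a recovers k! times the k-th coefficient. The helper names below are illustrative, not the book's.

```python
import math

def poly_eval(coeffs, a, x):
    # s(x) = sum_n coeffs[n] * (x - a)^n
    return sum(c * (x - a)**n for n, c in enumerate(coeffs))

def poly_diff(coeffs):
    # term-by-term derivative of sum_n a_n (x - a)^n
    return [n * c for n, c in enumerate(coeffs)][1:]

a = 2.0
coeffs = [5.0, -1.0, 0.5, 3.0]       # a_0, ..., a_3
d = coeffs
for k in range(len(coeffs)):
    # k-fold differentiation, then evaluation at x = a, gives k! * a_k
    assert math.isclose(poly_eval(d, a, a), math.factorial(k) * coeffs[k])
    d = poly_diff(d)
```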
¹Brook Taylor (1685-1731) was an English mathematician best known for his formalization of the polynomial approximations of functions, recognized by Lagrange as the "main foundation of differential calculus".

Taylor expansions. We necessarily need the derivatives of higher orders to determine the Taylor expansion of a given function.

6.A.10. Determine the Taylor expansions T^k (of k-th order at the given point) of the following functions:
i) T³ of the function sin x at the point 0,
ii) T³ of the function eˣ/x at the point 1.

Solution. (i) We compute the values of the first, second and third derivative of the function f = sin at the point 0: f'(0) = cos(0) = 1, f^(2)(0) = −sin(0) = 0, f^(3)(0) = −cos(0) = −1; also f(0) = 0. Thus the Taylor expansion of the third order of the function sin x at the point 0 is
T³(sin x) = x − (1/6)x³.
(ii) Again f(1) = e, and
f'(x) = eˣ/x − eˣ/x², so f'(1) = 0,
f^(2)(x) = eˣ/x − 2eˣ/x² + 2eˣ/x³, so f^(2)(1) = e,
f^(3)(x) = eˣ/x − 3eˣ/x² + 6eˣ/x³ − 6eˣ/x⁴, so f^(3)(1) = −2e.
Thus we get the Taylor expansion of third order of the function eˣ/x at the point 1:
T³(eˣ/x) = e + (e/2)(x−1)² − (e/3)(x−1)³ = (e/6)(−2x³ + 9x² − 12x + 11). □

Suppose f is a smooth function, not necessarily given as a power series. We search for a good approximation by polynomials in the neighbourhood of a given point a.

Taylor polynomial of a function f
For a k times differentiable (real or complex valued) function f of one real variable, define its Taylor polynomial of degree k centered at a by the formula
T_{k,a}f(x) = f(a) + f'(a)(x−a) + (1/2!) f''(a)(x−a)² + (1/3!) f^(3)(a)(x−a)³ + ... + (1/k!) f^(k)(a)(x−a)ᵏ.

The mean value theorem is used to show how good an approximation to f this is.

Taylor expansion with a remainder
Theorem. Let f(x) be a function that is k times differentiable on the interval (a, b) and continuous on [a, b]. Then for every x ∈ (a, b) there exists a number c ∈ (a, x) such that
f(x) = f(a) + f'(a)(x−a) + ... + (1/(k−1)!) f^(k−1)(a)(x−a)^{k−1} + (1/k!) f^(k)(c)(x−a)ᵏ
= T_{k−1,a}f(x) + (1/k!) f^(k)(c)(x−a)ᵏ.

Proof. For fixed x, define the remainder R by f(x) = T_{k−1,a}f(x) + R. Then R = (1/k!) r(x − a)ᵏ for a suitable number r (dependent on x). Consider the function of ξ defined by
F(ξ) = Σ_{j=0}^{k−1} (1/j!) f^(j)(ξ)(x−ξ)ʲ + (1/k!) r(x−ξ)ᵏ.

6.A.11.
Determine the Taylor polynomial T⁵ (centered at 0) of the function sin and, using the theorem in 6.1.3, estimate the error of this polynomial at the point π/4.

Solution. Analogously to the previous example, we compute
T⁵(sin x) = x − (1/6)x³ + (1/120)x⁵.
Using the theorem in 6.1.3, we then estimate the size of the remainder (error) R. According to the theorem, there exists c ∈ (0, π/4) such that
|R(π/4)| = (|cos(c)|/7!) · (π/4)⁷ < 1/7! ≈ 0.0002.

By the Leibniz rule, the derivative of F (here x is considered as a constant parameter) is
F'(ξ) = Σ_{j=0}^{k−1} (1/j!) f^(j+1)(ξ)(x−ξ)ʲ − Σ_{j=1}^{k−1} (1/(j−1)!) f^(j)(ξ)(x−ξ)^{j−1} − (1/(k−1)!) r(x−ξ)^{k−1}
= (1/(k−1)!)(x−ξ)^{k−1}(f^(k)(ξ) − r),
because the expressions in the sum cancel each other out sequentially. Now it suffices to notice that F(a) = F(x) = f(x) (recall that x is an arbitrarily chosen but fixed number from the interval (a, b)). According to Rolle's theorem, there

6.A.12. Find the Taylor polynomial of third order at the point x₀ = 1 of the function y = arctg x, x ∈ R. ○

6.A.13. Determine the Taylor expansion of third order at the point x₀ = 0 of the functions
(a) y = 1/cos x;
(b) y = e^{−…};
(c) y = sin(sin x);
(d) y = tg x;
(e) y = eˣ sin x,
defined in a certain neighbourhood of the point x₀. ○

6.A.14. Determine the Taylor expansion of fourth order of the function y = ln x², x ∈ (0, 2), at the point x₀ = 1. ○

6.A.15. Find an estimate of the error of the approximation ln(1 + x) ≈ x − x²/2 for x ∈ (−1, 0). ○

6.A.16. Write the Taylor polynomial of fourth degree of the function y = sin x, x ∈ R, centered at the origin. Using this polynomial, approximately compute sin 1° and determine the limit
lim_{x→0+} (sin x − x)/x³. ○

6.A.17. Determine the Taylor polynomial centered at the origin of degree at least 8 of the function y = e^{2x}, x ∈ R. ○

6.A.18. Express the polynomial x³ − 2x + 5 as a polynomial in the variable u = x − 1. ○

We show some more interesting examples of the use of differential calculus. First though, we mention the Jensen inequality, which discusses convex and concave functions and which we use later.
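The estimate in 6.A.11 can be confirmed directly: the actual error of the fifth-order polynomial at π/4 stays below the bound 1/7! ≈ 0.0002 (a numeric sketch).

```python
import math

x = math.pi / 4
T5 = x - x**3/6 + x**5/120           # fifth-order Taylor polynomial of sin at 0
err = abs(math.sin(x) - T5)
bound = (math.pi/4)**7 / math.factorial(7)
assert err <= bound < 1/math.factorial(7) < 0.0002
```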
exists a number c, a < c < x, such that F'(c) = 0. That is the desired relation. □

A special case of the last theorem is the mean value theorem, obtained as the approximation by the Taylor polynomial of degree zero. See (1).

6.1.4. Estimates for Taylor expansions. A simple case of a Taylor expansion occurs when f is a polynomial:
f(x) = aₙxⁿ + a_{n−1}x^{n−1} + ... + a₁x + a₀, aₙ ≠ 0.
Because the (n+1)-st derivative of f is identically zero, the Taylor polynomial of degree n has zero remainder; therefore for each x₀ ∈ R,
f(x) = f(x₀) + f'(x₀)(x−x₀) + ... + (1/n!) f^(n)(x₀)(x−x₀)ⁿ.
We can compute all the derivatives easily (for example, the last term is always of the form aₙ(x−x₀)ⁿ). This result is a very special case of the error estimate in the Taylor expansion with remainder. We know in advance that the remainder can be estimated by the size of the k-th derivative, and for a polynomial this is identically zero from some order onwards. More generally, an estimate of the size of the k-th derivative on some interval can be used to estimate the error on the same interval. Good examples of an expansion of an arbitrary degree are provided by the trigonometric functions sin and cos. Iterated differentiation of the function sin x always yields either sine or cosine with some signs, and the absolute values of these do not exceed one. Thus we obtain a direct estimate of the speed of convergence of the power series:
|sin x − (T_{k,0} sin)(x)| ≤ |x|^{k+1}/(k+1)!.

6.A.19. Jensen inequality. For a strictly convex function f on an interval I, arbitrary points x₁, ..., xₙ ∈ I, and real numbers c₁, ..., cₙ > 0 such that c₁ + ... + cₙ = 1, the inequality
f(Σ_{i=1}^n cᵢxᵢ) ≤ Σ_{i=1}^n cᵢ f(xᵢ)
holds, with equality occurring if and only if x₁ = ... = xₙ.

Solution. This can be proven easily by induction: for n = 2 it is just the definition of a convex function; for the induction

This shows that for x much smaller than k the error is small, but for x comparable with k or bigger it may be large.
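For a polynomial the remainder vanishes identically, so the degree-n Taylor polynomial reproduces it exactly. The sketch below checks this for p(x) = x³ − 2x + 5 expanded about x₀ = 1 (which is also an answer of the 6.A.18 type); the helper is illustrative.

```python
import math

def taylor_coeffs_cubic(a0, a1, a2, a3, x0):
    # derivatives of p(x) = a0 + a1 x + a2 x^2 + a3 x^3 at x0, divided by k!
    p   = lambda x: a0 + a1*x + a2*x**2 + a3*x**3
    dp  = lambda x: a1 + 2*a2*x + 3*a3*x**2
    d2p = lambda x: 2*a2 + 6*a3*x
    d3p = lambda x: 6*a3
    return [p(x0), dp(x0), d2p(x0)/2, d3p(x0)/6]

x0 = 1.0
c = taylor_coeffs_cubic(5, -2, 0, 1, x0)     # p(x) = x^3 - 2x + 5 about x0 = 1
for x in (-2.0, 0.5, 3.0):
    taylor = sum(ck * (x - x0)**k for k, ck in enumerate(c))
    assert math.isclose(taylor, x**3 - 2*x + 5)
```

The computed coefficients are [4, 1, 3, 1], i.e. p(x) = 4 + u + 3u² + u³ with u = x − 1.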
In the figure, compare the approximation of the function cos x by a Taylor polynomial of degree 68 in paragraph 5.4.11 on page 336. As mentioned in the introduction of the discussion of Taylor expansions of functions, if we start with a power series f(x) centered at a, then its partial sums coincide with the Taylor polynomials T_{k,a}f(x). The next statement is one of the simple formulations of the converse implication, i.e. of the claim that the given function f(x) is actually the sum of a power series on some neighbourhood of the given point a.

step, write
f(Σ_{i=1}^{n+1} cᵢxᵢ) = f(c₁x₁ + (1 − c₁) Σ_{i=2}^{n+1} (cᵢ/(1−c₁)) xᵢ)
and use the definition of convexity together with the induction hypothesis. □
The inequality can also be understood in an intuitive way: the centroid of mass points placed upon the graph of a strictly convex function lies above this graph.

6.A.20. Prove that among all (convex) n-gons inscribed in a circle, the regular n-gon has the largest area (for arbitrary n ≥ 3).

Solution. Clearly it suffices to consider the n-gons containing the center of the circle. We divide each such n-gon inscribed in a circle of radius r into n triangles with areas Sᵢ, i ∈ {1, ..., n}, according to the figure. Denoting by φᵢ the corresponding central angles, we have
Sᵢ = ½ r² sin φᵢ, i ∈ {1, ..., n}.
This implies that the area of the whole n-gon is
S = Σ_{i=1}^n Sᵢ = ½ r² Σ_{i=1}^n sin φᵢ.
Thus we want to maximize the sum Σ_{i=1}^n sin φᵢ, while for the values φᵢ ∈ (0, π) we clearly have
(1) φ₁ + ... + φₙ = 2π.

Taylor's theorem
Theorem. Let f be a smooth function on the interval (a − b, a + b) with all its derivatives bounded uniformly by a constant M > 0:
|f^(k)(x)| ≤ M, k = 0, 1, ..., x ∈ (a − b, a + b).
Then the power series S(x) = Σₙ (1/n!) f^(n)(a)(x − a)ⁿ converges on the interval (a − b, a + b) to f(x).

Proof. The proof is identical to the special case of the function sin x above, except that the universal bound 1 is replaced by M; thus the estimate of the remainders is
|f(x) − (T_{k,a}f)(x)| ≤ M |x − a|^{k+1}/(k+1)!. □

6.1.5. Analytic and smooth functions.
If f is smooth at a, the formal power series
S(x) = Σ_{k=0}^∞ (1/k!) f^(k)(a)(x − a)ᵏ
can be written. If this power series has a nonzero radius of convergence and simultaneously S(x) = f(x) on the respective interval, we say that f is an analytic function at a. A function is analytic on an interval if it is analytic at each of its points.

Not all smooth functions are analytic. It can be proven that for every sequence of numbers aₖ there is a smooth function whose derivatives of order k at a given point are exactly these numbers aₖ.² To show the essence of the problem, we introduce a function which has all its derivatives vanishing at zero, but is nonzero at every other point. We see later how useful this function is. Consider the function defined by
f(x) = e^{−1/x²}.
It is a well defined smooth function at all points x ≠ 0. Its limit at x = 0 exists, and lim_{x→0} f(x) = 0. By defining f(0) = 0, f becomes a continuous function for all real x. By a direct computation based on l'Hospital's rule we compute the derivatives of f (the first three are shown in the

²This is a special case of the Whitney extension theorem, which says that there is a smooth function on a Euclidean space with prescribed derivatives at all points of a closed set A if and only if the Taylor theorem estimates are true for the prescription. In the case of one single point A, the condition is empty. This is relevant for the Taylor theorem for functions of more than one real variable, as in Chapter 8. Hassler Whitney (1907-1989) was a very influential American mathematician.

Moreover, we know the equality occurs exactly for φ₁ = ... = φₙ = 2π/n, i.e. for the regular n-gon. □

If a region is scaled by a factor a (> 0), the perimeter also gets a times bigger and the area a² times bigger (it is a square measure). Hence the isoperimetric quotient IQ doesn't depend on the size of the region, but only on its shape. Thus we can consider a regular n-gon inscribed into a unit circle. According to the figure, each of the n isosceles triangles has height h = cos(π/n) and half-base x = sin(π/n), which yields
oₙ = n · 2x = 2n sin(π/n) and Sₙ = n · hx = n cos(π/n) sin(π/n).
Thus for a regular n-gon, we have
IQₙ = 4πSₙ/oₙ² = 4πn cos(π/n) sin(π/n) / (4n² sin²(π/n)) = (π/n) cotg(π/n),
which we can verify for example for a square (n = 4) with a side of length a, where IQ = 4πa²/(4a)² = π/4 = (π/4) cotg(π/4).

picture; guess which line is which!). It suffices to consider only the right derivative, since the function is even:
lim_{x→0+} f(x)/x = lim_{x→0+} (1/x)/e^{1/x²} = 0.
By differentiating f(x) at an arbitrary point x ≠ 0,
f'(x) = e^{−1/x²} · 2x⁻³.
By repeated differentiation of the results, there is always a sum of finitely many terms of the form C · e^{−1/x²} · x⁻ʲ, where C is an integer and j is a natural number. Next, assume it is already proven that the derivative of order k of f(x) exists and vanishes at zero. Compute the limit of the expression f^(k)(x)/x for x → 0+. This is a finite sum of limits of the expressions x⁻ʲ/e^{1/x²}. All these expressions are of the type ∞/∞, so l'Hospital's rule can be applied to them repeatedly. After several differentiations of both the numerator and the denominator (and a similar adjustment as above), the same expression remains in the denominator, while in the numerator the power is non-negative. Thus the expression necessarily has zero limit at zero, just as in the case of the first derivative above. The same holds for a finite sum of such expressions. So each derivative f^(k)(x) at zero exists, with value zero. In summary, f(x) is smooth on the whole of R. It is strictly positive everywhere except at x = 0. All its derivatives at this point are zero. Hence it cannot be analytic at x₀ = 0. The limit of the function at the improper points ±∞ is 1, while all its derivatives converge quickly to zero there.

One of the most important functions in Mathematics and Physics is the Gaussian function
g(x) = f(1/x) = e^{−x²}.
We see why in Chapter 10 when dealing with probability and statistics. Replace x with −x² in the power series for the exponential. It follows that g is an analytic function on the entire R. Its derivatives are easily computed.
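The key point of the computation above, that f(x) = e^{−1/x²} tends to zero faster than any power of x, is easy to observe numerically (a sketch, not part of the original text):

```python
import math

def f(x):
    # smooth, but non-analytic at 0: all derivatives vanish there
    return math.exp(-1.0 / x**2) if x != 0 else 0.0

# f(x)/x^k is still tiny for small x, for every fixed power k
for k in (1, 5, 10):
    assert abs(f(0.1) / 0.1**k) < 1e-30
assert f(0) == 0.0 and f(2.0) > 0.0
```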
In particular, g'(0) = 0 gives the only critical point, and g''(0) = −2. Since the limits at ±∞ are zero, the number g(0) = 1 is the global maximum of the Gaussian function. Both functions are depicted in the diagram: f(x) is the solid line, while the Gaussian function is dashed.

Using the limit transition for n → ∞ and the limit lim_{x→0} (sin x)/x = 1, we get the isoperimetric quotient for a circle:
IQ = lim_{n→∞} (π/n) cotg(π/n) = lim_{n→∞} cos(π/n) · (π/n)/sin(π/n) = 1.
Of course, for a circle with radius r, we could have also directly computed
IQ = 4πS/o² = 4π(πr²)/(2πr)² = 1.
For the boundary of a sector of a circle with radius r and central angle φ ∈ (0, 2π), we have
IQ = 4πS/o² = 4π(φr²/2)/(2r + rφ)² = 2πφ/(2 + φ)².
Hence we are looking for a maximum of the function
f(φ) = 2πφ/(2 + φ)², φ ∈ (0, 2π);
f'(φ) = 2π(2 − φ)/(2 + φ)³ = 0 exactly for φ = 2, which gives the maximum.

By a small modification of f we obtain the function g(x) = e^{−1/x²} for x > 0 and g(x) = 0 for x ≤ 0. Again it is a smooth function on all of R. By another modification there is another function h, which is nonzero at all inner points of the interval [−a, a], a > 0, and zero elsewhere:
h(x) = 0 if |x| ≥ a, h(x) = e^{1/(x²−a²)} if |x| < a.
This function is again smooth on all of R. The last two functions are in the two figures. On the right, the parameter a = 1/2 is used.

Finally we show how to get smooth analogies of the Heaviside functions. For two fixed real numbers a < b, define the function f(x) exploiting the above function g as follows:
f(x) = g(x − a) / (g(x − a) + g(b − x)).
For all x ∈ R the denominator of the fraction is positive (because g is non-negative, and for each of the three intervals determined by the numbers a and b at least one of the summands of the denominator is nonzero). Thus the definition yields a smooth function f(x) on all of R. For x ≤ a the numerator of the fraction is zero according to the definition of g. For x ≥ b the numerator and denominator are equal. In the next two figures there are functions f(x) with parameters a = 1 − α, b = 1 + α.
On the left α = 0.8, and on the right α = 0.4.

Recall that the isoperimetric quotient is given only by the shape of the figure and doesn't depend on its size. In particular, the values λᵢ are constants (determined by the shapes of the given figures). Our task is to minimize the sum Σ_{i=1}^n Sᵢ with Σ_{i=1}^n oᵢ = l. Because Sᵢ = oᵢ²/(4πλᵢ), we need to minimize the expression
S := (1/4π) Σ_{i=1}^n oᵢ²/λᵢ.
Using Jensen's inequality for the strictly convex function y = x² (on the whole real axis), we obtain
(Σ_{i=1}^n cᵢxᵢ)² ≤ Σ_{i=1}^n cᵢxᵢ²
for xᵢ ∈ R and cᵢ > 0 with the property c₁ + ... + cₙ = 1. Moreover, we know that the equality occurs if and only if x₁ = ... = xₙ. By choosing cᵢ = λᵢ/A and xᵢ = oᵢ/λᵢ, i ∈ {1, ..., n}, where A = λ₁ + ... + λₙ, we then get
(l/A)² = (Σ_{i=1}^n cᵢxᵢ)² ≤ Σ_{i=1}^n cᵢxᵢ² = (1/A) Σ_{i=1}^n oᵢ²/λᵢ,
and then (notice that Σ_{i=1}^n oᵢ = l)
Σ_{i=1}^n oᵢ²/λᵢ ≥ l²/A,
with equality again occurring for
(1) x₁ = ... = xₙ, i.e. o₁/λ₁ = ... = oₙ/λₙ.
This implies that S is the smallest if and only if (1) holds. This smallest value of S is l²/(4πA). Now we only need to determine the lengths of the cut parts oᵢ. If (1) holds, then clearly oᵢ = kλᵢ for all i ∈ {1, ..., n} and a certain constant k > 0. From
Σ_{i=1}^n oᵢ = l and simultaneously Σ_{i=1}^n oᵢ = k Σ_{i=1}^n λᵢ = kA,
we can immediately see that k = l/A, i.e. oᵢ = (λᵢ/A) l, i ∈ {1, ..., n}.
Let's take a look at a specific situation, where we are to cut a string of length 1 m into two smaller ones and then create a square and a circle from them so that the sum of their areas is the smallest possible. For a square and a circle (in this order), we have (see the example called Isoperimetric quotient)

Finally, we can create a smooth analogue of the characteristic function of any interval [c, d]. Write f_ε(x) for the latter function f(x) with parameters a = −ε, b = +ε. For the interval [c, d] with length d − c > 2ε, define the function
h_ε(x) = f_ε(x − c) · f_ε(d − x).
This function is identically zero on the intervals (−∞, c − ε] and [d + ε, ∞).
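The constructions above (the one-sided function g, the smooth step between a and b, and the smooth characteristic function h_ε of [c, d]) can be sketched as follows; the function names are illustrative, not the book's.

```python
import math

def g(x):
    # one-sided flat function: zero for x <= 0, e^{-1/x^2} for x > 0
    return math.exp(-1.0 / x**2) if x > 0 else 0.0

def step(x, a, b):
    # smooth transition: identically 0 for x <= a, identically 1 for x >= b
    return g(x - a) / (g(x - a) + g(b - x))

def bump(x, c, d, eps):
    # smooth characteristic function of [c, d]: h_eps(x) = f_eps(x-c) * f_eps(d-x)
    f_eps = lambda t: step(t, -eps, eps)
    return f_eps(x - c) * f_eps(d - x)

assert step(0.9, 1, 2) == 0.0 and step(2.1, 1, 2) == 1.0
assert 0.0 < step(1.5, 1, 2) < 1.0
assert bump(0.3, 1, 2, 0.3) == 0.0 and bump(1.5, 1, 2, 0.3) == 1.0
```

Note that the denominator of step is never zero: g(x − a) vanishes only for x ≤ a and g(b − x) only for x ≥ b, and a < b.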
It is identically one on the interval [c + ε, d − ε]. Moreover, it is smooth everywhere, and locally it is either constant or monotone (you should verify this last claim yourself). The smaller the ε > 0, the faster h_ε(x) jumps from zero to one around the beginning of the interval, and back at its end. The diagram shows the choices [c, d] = [1, 2] and ε = 0.6, ε = 0.3.

6.1.7. Local behaviour of functions. It is time to return to the behaviour of real functions of one real variable. We have seen that the sign of the first derivative of a differentiable function determines whether it is increasing or decreasing on some neighbourhood of the given point. If the derivative is zero, it does not of itself say much about the behaviour of the function. We encountered the importance of the second derivative when describing critical points. Now we generalize the discussion of critical points to all orders. First we deal with the local extremes of functions. In the following, we consider real functions with a sufficiently high number of continuous derivatives, without specifically stating this assumption.

The point a in the domain of f is a critical point of order k if and only if
f'(a) = ... = f^(k)(a) = 0, f^(k+1)(a) ≠ 0.

The expansion eˣ = Σ_{n=0}^∞ xⁿ/n! implies
e^{x²} = Σ_{n=0}^∞ x^{2n}/n!,
and
x² e^{−2x} = x² Σ_{n=0}^∞ (−2x)ⁿ/n! = Σ_{n=0}^∞ ((−2)ⁿ/n!) x^{n+2}, x ∈ R.
Hence
e^{x²} + x² e^{−2x} = Σ_{n=0}^∞ (x^{2n}/n! + ((−2)ⁿ/n!) x^{n+2}), x ∈ R.

Suppose f^(k+1)(a) > 0. Then this continuous derivative is positive on a certain neighbourhood O(a) of the point a as well. In that case, the Taylor expansion with the remainder gives
f(x) = f(a) + (1/(k+1)!) f^(k+1)(c)(x − a)^{k+1}
for all x in O(a). Because of that, the change of the values of f(x) in a neighbourhood of a is given by the behaviour of (x − a)^{k+1}. Moreover, if k + 1 is an even number, then the values of f(x) in such a neighbourhood are necessarily larger than the value f(a). So a is a local minimum. But if k is even, then the values on the left are smaller, while those on the right are larger than f(a).
So an extreme does not occur even locally. On the other hand, the graph of the function f(x) intersects its tangent y = f(a) at the point [a, f(a)] in the latter case. Similarly, if f^(k+1)(a) < 0, then a is a local maximum for odd k, and there is no extreme for even k.

6.1.8. Convex and concave functions. The differentiable function f is concave at the point a if its graph lies completely below the tangent at the point [a, f(a)] in some neighbourhood of a; that is,
f(x) ≤ f(a) + f'(a)(x − a).
Similarly, f is convex at a if its graph lies above the tangent there:
f(x) ≥ f(a) + f'(a)(x − a).
A function is convex or concave on an interval if it has this property at all its points.

Suppose f has continuous second derivatives in a neighbourhood of a. The Taylor expansion of second order with the remainder implies
f(x) = f(a) + f'(a)(x − a) + ½ f''(c)(x − a)².
Then the function is convex whenever f''(a) > 0, and concave whenever f''(a) < 0. If the second derivative is zero, we can use derivatives of higher orders. But we can only make the same conclusion if the first nonzero derivative after the first derivative is of even order. If this first nonzero derivative is of odd order, the points of the graph of the function on opposite sides of some small neighbourhood of the studied point lie on opposite sides of the tangent at this point.

6.A.24. Determine the Taylor series centered at the origin of the functions
(a) y = 1/(1+x)², x ∈ (−1, 1);
(b) y = arctg x, x ∈ (−1, 1).

Solution. Case (a). We use the formula
1/(1+x) = Σ_{n=0}^∞ (−x)ⁿ = Σ_{n=0}^∞ (−1)ⁿ xⁿ, x ∈ (−1, 1),
for the sum of a geometric series. By differentiating it, we obtain for x ∈ (−1, 1)
−1/(1+x)² = (Σ_{n=0}^∞ (−1)ⁿ xⁿ)' = Σ_{n=1}^∞ (−1)ⁿ n x^{n−1},
with (x⁰)' = 0, thus the lower index is n = 1. We can see that
1/(1+x)² = Σ_{n=1}^∞ (−1)^{n+1} n x^{n−1}, x ∈ (−1, 1).
Case (b).
We can express the derivative of the function y = arctg t for t ∈ (−1, 1) as
(arctg t)' = 1/(1+t²) = Σ_{n=0}^∞ (−t²)ⁿ = Σ_{n=0}^∞ (−1)ⁿ t^{2n}.
Because for x ∈ (−1, 1) we have
∫₀ˣ (arctg t)' dt = arctg x − arctg 0 = arctg x
and
∫₀ˣ (Σ_{n=0}^∞ (−1)ⁿ t^{2n}) dt = Σ_{n=0}^∞ (−1)ⁿ ∫₀ˣ t^{2n} dt = Σ_{n=0}^∞ ((−1)ⁿ/(2n+1)) x^{2n+1},
we already have the result
arctg x = Σ_{n=0}^∞ ((−1)ⁿ/(2n+1)) x^{2n+1}, x ∈ (−1, 1). □

6.A.25. Find the Taylor series centered at x₀ = 0 of the function
F(x) = ∫₀ˣ u cos(u²) du, x ∈ R.
Solution. The equality cos u = Σ_{n=0}^∞ ((−1)ⁿ/(2n)!) u^{2n}, u ∈ R, implies
u cos(u²) = Σ_{n=0}^∞ ((−1)ⁿ/(2n)!) u^{4n+1}, u ∈ R,
and then (for x ∈ R)
F(x) = Σ_{n=0}^∞ ((−1)ⁿ/(2n)!) ∫₀ˣ u^{4n+1} du = Σ_{n=0}^∞ ((−1)ⁿ/((2n)!(4n+2))) x^{4n+2}.

6.1.9. Inflection points. A point a is called an inflection point of a differentiable function f if the graph of f crosses from one side of the tangent at the point a to the other. The latter discussion of concave and convex functions shows that inflections can appear only at points with vanishing second derivative. Suppose f has continuous third derivatives, and write the Taylor expansion of third order with the remainder:
f(x) = f(a) + f'(a)(x − a) + ½ f''(a)(x − a)² + (1/6) f'''(c)(x − a)³.
If a is a zero point of the second derivative such that f'''(a) ≠ 0, then the third derivative is nonzero on some neighbourhood, and a is an inflection point, since the second derivative changes its sign at a and thus the tangent crosses the graph. In that case, the sign of the third derivative determines whether the graph of the function crosses the tangent from the top to the bottom or vice versa. Moreover, if a is an isolated zero point of the second derivative and simultaneously an inflection point, then on some small neighbourhood of a the function is concave on one side and convex on the other. Thus the inflection points are the points of change between concave and convex behaviour of the graph of the function.

6.1.10. Asymptotes of the graph of a function. We introduce one more useful tool for understanding and sketching the graph of a function. We consider the asymptotes.
These are lines in R² whose distance from the graph of f(x) converges to zero for x → x₀. Thus, an asymptote at the improper point ∞ is a line y = ax + b which satisfies
lim_{x→∞} (f(x) − ax − b) = 0.
This is an asymptote with a slope. If such an asymptote exists, it satisfies
lim_{x→∞} (f(x) − ax) = b.
Consequently the limit
a = lim_{x→∞} f(x)/x
also exists. Conversely, if the last two limits exist, the limit from the definition of the asymptote exists as well, thus these are sufficient conditions too. The asymptote at the improper point −∞ is defined similarly. In this way we find all the lines satisfying the properties of asymptotes with slope. It remains to consider lines perpendicular to the x axis: the asymptotes at points a ∈ R are lines x = a such that the function f has at least one one-sided limit at a with infinite value. They are called asymptotes without slope. The rational functions have such asymptotes at all zero points of the denominator which are not zero points of the numerator.

□

6.A.26. Approximately compute cos … with an error less than 10⁻⁵. ○

6.A.27. Without computing the derivatives, determine the Taylor polynomial of degree 4 centered at x₀ = 0 of the function
f(x) = cos x − 2 sin x − ln(1 + x), x ∈ (−1, 1).
Then decide whether the graph of the function f in a neighbourhood of the point [0, 1] is above or below the tangent line. ○

Now we state several "classical" problems, in which we determine the course of various functions.
By determining the course we mean
(a) the domain (it is given) and the range;
(b) possible parity and periodicity;
(c) discontinuities and their kinds (including the corresponding one-sided limits);
(d) the points of intersection with the axes x, y;
(e) the intervals where the function is positive and where it is negative;
(f) the limits lim_{x→−∞} f(x), lim_{x→+∞} f(x);
(g) the first and the second derivative;
(h) the critical and the so called stationary points, at which the first derivative is zero (possibly also the points at which the first or the second derivative does not exist);
(i) the intervals of monotonicity;
(j) strict and nonstrict local and absolute extremes;
(k) the intervals where the function is convex and where it is concave;
(l) the points of inflection;
(m) the horizontal and inclined asymptotes;
(n) the values of the function f and its derivative f' at "significant" points;
(o) the graph.

6.A.28. Determine the range of the function
f(x) = (eˣ − 1)/(eˣ + 1), x ∈ R.
Solution. The line y = 1 is clearly an asymptote of the function f at +∞ and the line y = −1 is an asymptote at −∞, because
lim_{x→+∞} (eˣ − 1)/(eˣ + 1) = 1, lim_{x→−∞} (eˣ − 1)/(eˣ + 1) = (0 − 1)/(0 + 1) = −1.
The inequality
f'(x) = 2eˣ/(eˣ + 1)² > 0, x ∈ R,

We consider a simple illustrative example. Let
f(x) = x + 1/x.
Then f has two asymptotes, y = x and x = 0. Indeed, the one-sided limits from the right and left at zero are clearly ±∞, while the limit of f(x)/x = 1 + 1/x² is 1 at the improper points. Finally, the limit of f(x) − x = 1/x is zero at the improper points. By differentiating,
f'(x) = 1 − x⁻², f''(x) = 2x⁻³.
The function f'(x) has two zero points, ±1. At x = 1, f has a local minimum. At x = −1, f has a local maximum. The second derivative has no zero points in the whole domain (−∞, 0) ∪ (0, ∞), so f has no inflection points.

6.1.11. Differential of a function. In practical use of differential calculus, we often work with dependencies between several variables, say y and x. The choice of dependent and independent variable is not fixed.
The explicit relation $y = f(x)$ with some function $f$ is then only one of the possible options. Differentiation then expresses that the immediate change of $y = f(x)$ is proportional to the immediate change of $x$, with the proportion $f'(x) = \frac{dy}{dx}(x)$. This relation is often written as

$$df(x) = f'(x)\,dx,$$

where we interpret $df(x)$ as a linear map $\mathbb{R}\to\mathbb{R}$ defined on increments of $x$ at the given point, $df(x)(v) = f'(x)\cdot v$, while $dx(x)(v) = v$. We talk about the differential of the function $f$ if the following approximation property is true:

picture!!

$$\lim_{v\to 0}\frac{f(x+v) - f(x) - df(x)(v)}{v} = 0.$$

Clearly, the differential is unique if it exists and, indeed, $df(x) = f'(x)\,dx$. Taylor's theorem then implies that the differential $df$ exists if the derivative $f'$ is bounded. In particular, this happens at the point $x$ if the first derivative $f'(x)$ exists and is continuous at $x$. If the quantity $x$ is expressed by another quantity $t$, e.g. $x = g(t)$, and, moreover, $g$ has continuous first derivatives,

then implies that $f$ is continuous and increasing on $\mathbb{R}$. Hence the range is the interval $(-1,1)$. □

6.A.29. Find all intervals on which the function $y = e^{-x^2}$, $x \in \mathbb{R}$, is concave. ○

6.A.30. Consider the function $y = \operatorname{arctg}\frac{1}{x}$, $x \neq 0$ ($x \in \mathbb{R}$). Determine the intervals on which this function is convex and concave, and also all its asymptotes. ○

6.A.31. Find all asymptotes of the functions (a) $y = x\,e^x$; (b) $y = \frac{\cdots}{(x-2)^2}$, with maximal domain. ○

6.A.32. Find the asymptotes of the function $y = 2\operatorname{arctg}\frac{\cdots}{\cdots}$, $x \neq \pm 1$ ($x \in \mathbb{R}$). ○

6.A.33. Consider the function $y = \ln\frac{\cdots}{e^x+1}$, defined for all real $x$. Find its asymptotes. ○

6.A.34. Determine the course of the function

$$f(x) = \sqrt[3]{|x|^3 + 1}.$$

Solution. The domain is the whole real axis and $f$ has no discontinuities. For example, it suffices to consider that the function $y = \sqrt[3]{x}$ is continuous at every point $x \in \mathbb{R}$ (unlike the even roots, which are defined only on the nonnegative half-axis). We can also immediately see that $f(x) \geq 1$ and $f(-x) = f(x)$ for all $x \in \mathbb{R}$, i.e. the function $f$ is positive and even.
Thus, by substituting $x = 0$, we obtain the point $[0,1]$ as the only intersection of the graph of $f$ with the axes. The limit behaviour of the function has to be determined only at $\pm\infty$ (there are no discontinuities), where we can easily compute

$$(1)\quad \lim_{x\to\pm\infty}\sqrt[3]{|x|^3+1} = \lim_{x\to\pm\infty}\sqrt[3]{|x|^3} = \lim_{x\to\pm\infty}|x| = +\infty.$$

Now we step up to determining the course of the function by using its derivatives. For $x > 0$, we have $f(x) = \sqrt[3]{x^3+1} = (x^3+1)^{1/3}$, hence

$$(2)\quad f'(x) = \frac{1}{3}(x^3+1)^{-2/3}\cdot 3x^2 = \frac{x^2}{\sqrt[3]{(x^3+1)^2}} > 0, \quad x > 0.$$

Again, the chain rule for differentiating composite functions says that $f\circ g$ has the differential too, and

$$df(t) = d(f\circ g)(t) = \frac{df}{dx}(x)\,\frac{dg}{dt}(t)\,dt.$$

Therefore $df$ can be seen as a linear approximation of the given quantity in terms of the increments of the chosen independent variable, no matter how this dependence is given.

6.1.12. The curvature of the graph of a function. We shall conclude this section with two straightforward applications of differentials. First we discuss curves in the plane and in space, starting with the graphs of functions. Then we provide a brief introduction to the numerical procedures for differentiation (jump to 6.1.16 if getting tired of the curvatures). Imagine the graph of a function as a movement in the plane parametrized by the independent variable $x$. The vector $(1, f'(x)) \in \mathbb{R}^2$ represents the velocity at $x$ of such a movement. The tangent line through $[x, f(x)]$ parametrized by this directional vector then represents a linear approximation of the curve. The goal is to discuss how "curved" the graph is at $x$. This is a straightforward exercise working with differentials in the setup of elementary plane geometry. It might need some effort to keep the overview. If $f''(x) = 0$ and simultaneously $f'''(x) \neq 0$, the graph of the function $f$ intersects its tangent line. In such a case, the tangent line is the best approximation of the curve at the point $x$ up to the second order as well. We describe this by saying that the graph of $f$ has zero curvature at the point $x$.
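The approximation property defining the differential in 6.1.11 can be observed numerically. A minimal sketch, in which $f = \sin$ and the point $x_0 = 0.7$ are arbitrary illustrative choices:

```python
import math

def f(x):
    return math.sin(x)

def df(x, v):
    # the differential of f at x applied to an increment v: df(x)(v) = f'(x) * v
    return math.cos(x) * v

x0 = 0.7  # arbitrary point
# the quotient (f(x+v) - f(x) - df(x)(v)) / v must tend to 0 as v -> 0
errors = [abs(f(x0 + v) - f(x0) - df(x0, v)) / v for v in (1e-1, 1e-2, 1e-3)]
```

The quotients shrink roughly proportionally to $v$, as the Taylor expansion predicts.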
The nonzero values of the first derivative describe the speed of the growth. Intuitively, we expect the second derivative to describe the acceleration, including how "curved" the graph is. As a matter of convention, we want the curvature to be positive if the graph of the function is above its tangent. The tangent at a fixed point $P = [x, f(x)]$ is the limit of the secants, that is, of the lines passing through the points $P$ and $Q = [x+\Delta x, f(x+\Delta x)]$. To approximate the second derivative, interpolate the points $P$ and $Q \neq P$ by the circle $C_Q$ whose center is at the intersection of the perpendiculars to the tangents at $P$ and $Q$.

This implies that $f$ is increasing on the interval $(0,+\infty)$. With respect to its continuity at the origin, it must be increasing on $[0,+\infty)$. Because it is an even function, we know that it must be decreasing on $(-\infty,0]$. Thus it has only one local minimum, at the point $x_0 = 0$, which is also a (strict) global minimum. Because a nonconstant continuous function maps an interval onto an interval, the range of $f$ is exactly $[1,+\infty)$ (consider $f(x_0) = 1$ and (1)). Notice that thanks to the even parity of the function, we did not have to compute the derivative $f'$ on the negative half-axis; it can nevertheless be determined easily by substituting $|x|^3 = (-x)^3 = -x^3$ for $x < 0$, yielding

$$f'(x) = \frac{1}{3}(-x^3+1)^{-2/3}\cdot(-3x^2) = \frac{-x^2}{\sqrt[3]{(-x^3+1)^2}} < 0, \quad x < 0.$$

When computing $f'(0)$, we can proceed according to the definition, or we can use the limits

$$\lim_{x\to 0^+}\frac{x^2}{\sqrt[3]{(x^3+1)^2}} = 0 = \lim_{x\to 0^-}\frac{-x^2}{\sqrt[3]{(-x^3+1)^2}}$$

to determine the one-sided derivatives, and then $f'(0) = 0$. In fact, we did not even have to compute the first derivative on the positive half-axis either. To obtain that $f$ is increasing on $(0,+\infty)$, we only needed to realize that both functions $y = \sqrt[3]{x}$ and $y = x^3+1$ are increasing on $\mathbb{R}$, and a composition of increasing functions is again an increasing function. For $x > 0$, we can easily compute the second derivative using (2):

$$f''(x) = \frac{2x\,\sqrt[3]{(x^3+1)^2} - x^2\cdot\frac{2}{3}(x^3+1)^{-1/3}\cdot 3x^2}{\sqrt[3]{(x^3+1)^4}},$$

i.e.
after a simplification we have

$$(3)\quad f''(x) = \frac{2x}{\sqrt[3]{(x^3+1)^5}} > 0, \quad x > 0.$$

Similarly, for $x < 0$ we can compute

$$f''(x) = \frac{-2x\,\sqrt[3]{(-x^3+1)^2} + x^2\cdot\frac{2}{3}(-x^3+1)^{-1/3}\,(-3x^2)}{\sqrt[3]{(-x^3+1)^4}} = \frac{-2x}{\sqrt[3]{(-x^3+1)^5}} > 0, \quad x < 0,$$

and then $f''(0) = 0$. Next, we can use a limit transition:

$$\lim_{x\to 0^+}\frac{2x}{\sqrt[3]{(x^3+1)^5}} = 0 = \lim_{x\to 0^-}\frac{-2x}{\sqrt[3]{(-x^3+1)^5}}.$$

According to the inequality (3), $f$ is strictly convex on the interval $(0,+\infty)$. Also $f$ must be strictly convex on $(-\infty,0)$.

It can be seen from the figure that if the angle between the tangent at the fixed point $P$ and the $x$ axis is $\alpha$, and the angle between the tangent at the chosen point $Q$ and the $x$ axis is $\alpha+\Delta\alpha$, then the angle between the corresponding perpendicular lines is $\Delta\alpha$ as well. Denote the radius of the circle $C_Q$ by $\rho$. Then the length of the measured arc between $P$ and $Q$ is approximately $\rho\,\Delta\alpha$, while $\operatorname{tg}\alpha = f'(x)$,

A line $y = ax + b$ is an asymptote of $f$ at $+\infty$ if and only if both (proper) limits

$$\lim_{x\to\infty}\frac{f(x)}{x} = a, \qquad \lim_{x\to\infty}\big(f(x) - ax\big) = b$$

exist. An analogous statement holds for $x \to -\infty$. Hence the limits

$$\lim_{x\to\infty}\frac{f(x)}{x} = \lim_{x\to\infty}\frac{\sqrt[3]{x^3+1}}{x} = \lim_{x\to\infty}\sqrt[3]{1+\frac{1}{x^3}} = 1,$$

$$\lim_{x\to\infty}\big(f(x) - 1\cdot x\big) = \lim_{x\to\infty}\left(\sqrt[3]{x^3+1} - x\right) = \lim_{x\to\infty}\frac{x^3+1-x^3}{\sqrt[3]{(x^3+1)^2} + x\sqrt[3]{x^3+1} + x^2} = \lim_{x\to\infty}\frac{1}{\sqrt[3]{(x^3+1)^2} + x\sqrt[3]{x^3+1} + x^2} = 0$$

imply that the line $y = x$ is an asymptote at $+\infty$. If we again consider the fact that $f$ is even, we immediately obtain the line $y = -x$ as an asymptote at $-\infty$. □

6.A.35. Determine the course of the function

$$f(x) = \frac{\cos x}{\cos 2x}.$$

Solution. The domain consists of exactly those $x \in \mathbb{R}$ for which $\cos 2x \neq 0$. The equality $\cos 2x = 0$ is satisfied exactly for $2x = \frac{\pi}{2} + k\pi$, $k \in \mathbb{Z}$, i.e. $x = \frac{\pi}{4} + \frac{k\pi}{2}$, $k \in \mathbb{Z}$. Hence the domain is

$$\mathbb{R}\setminus\left\{\tfrac{\pi}{4} + \tfrac{k\pi}{2};\ k \in \mathbb{Z}\right\}.$$

Clearly we have

$$f(-x) = \frac{\cos(-x)}{\cos(-2x)} = \frac{\cos x}{\cos 2x} = f(x)$$

for all $x$ in the domain, thus $f$ (with its domain symmetric with respect to the origin) is an even function; this is implied by the even parity of the function $y = \cos x$. Moreover, because cosine is periodic with a period of $2\pi$ (i.e.
$y = \cos 2x$ has a period of $\pi$), it suffices to consider the function $f$ for

$$x \in V := [0,\pi]\setminus\left\{\tfrac{\pi}{4}, \tfrac{3\pi}{4}\right\} = \left[0,\tfrac{\pi}{4}\right)\cup\left(\tfrac{\pi}{4},\tfrac{3\pi}{4}\right)\cup\left(\tfrac{3\pi}{4},\pi\right],$$

which implies (see the rule for differentiating inverse functions)

$$\frac{dx}{d\alpha} = \frac{1+(\operatorname{tg}\alpha)^2}{f''} = \frac{1+(f')^2}{f''}.$$

Now we are almost finished, because the increment of the length of the arc $s$ dependent on $x$ is given by the formula

$$\frac{ds}{dx} = \sqrt{1+(f')^2}.$$

Thus, by the chain rule,

$$\rho = \frac{ds}{d\alpha} = \frac{ds}{dx}\cdot\frac{dx}{d\alpha} = \frac{\big(1+(f')^2\big)^{3/2}}{f''}.$$

The result explains the relation between the curvature and the second derivative. The numerator of the fraction is always positive. It equals the third power of the length of the tangent vector of the given curve. The sign of the curvature is therefore given only by the sign of the second derivative, which confirms the ideas about concave and convex points of functions. If the second derivative is zero, the curvature $1/\rho$ is also zero. If $f''$ is large, then the radius $\rho$ is small, and thus the curvature is large as well. The circle by which the curvature is defined is called the osculating circle. Compute the curvature of simple functions yourself, and use osculating circles while sketching their graphs. The computation at the critical points of the function $f$ is easiest: there, the radius of the osculating circle is the reciprocal value of the second derivative, with the corresponding sign.

6.1.13. Vector differential calculus. As mentioned already in the introduction to chapter five, most considerations related to differentiation are based on the fact that the functions are defined on real numbers and that their values can be added and multiplied by real numbers. That is why we can equally well differentiate functions $f : \mathbb{R} \to V$ with values in a vector space $V$. We call them vector functions of one real variable, or, more briefly, vector functions. To end this section, we digress to consider functions with values in the plane or in space, i.e. $f : \mathbb{R} \to \mathbb{R}^2$ and $f : \mathbb{R} \to \mathbb{R}^3$. We consider (parametrized) curves in the plane and in space. We could work with values in $\mathbb{R}^n$ for any finite dimension $n$.
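A vector function of one real variable, differentiated component by component, can be sketched as follows; the circle of radius $2$ is an arbitrary illustrative choice, not an example from the text:

```python
import math

a = 2.0  # radius of the illustrative circle (arbitrary choice)

def r(t):
    # plane-valued vector function r(t) = x(t) e1 + y(t) e2
    return (a * math.cos(t), a * math.sin(t))

def r_prime(t):
    # the derivative is taken componentwise: r'(t) = x'(t) e1 + y'(t) e2
    return (-a * math.sin(t), a * math.cos(t))

# for the circle, the velocity vector has constant length a
speed = math.hypot(*r_prime(0.9))
```

The constant speed reflects the fact that this parametrization traverses the circle uniformly.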
For simplicity, we work with the fixed standard bases $e_i$ in $\mathbb{R}^2$ and $\mathbb{R}^3$. So the curves are given by pairs or triples of real functions of one real variable, respectively. The vector function $r$ in the plane or in space, respectively, is given by

$$r(t) = x(t)e_1 + y(t)e_2, \qquad r(t) = x(t)e_1 + y(t)e_2 + z(t)e_3.$$

The derivative of such a vector function is a vector which approximates the map $r$ by a linear map of the real line into the plane or the space. In the plane it is

$$\frac{dr}{dt}(t) = r'(t) = x'(t)e_1 + y'(t)e_2,$$

and similarly in space.

since the course of the function on its whole domain can be derived using its even parity and its periodicity with period $2\pi$. Hence we are only concerned with the discontinuities $x_1 = \pi/4$ and $x_2 = 3\pi/4$. We determine the corresponding one-sided limits:

$$\lim_{x\to\frac{\pi}{4}^-}\frac{\cos x}{\cos 2x} = +\infty, \qquad \lim_{x\to\frac{\pi}{4}^+}\frac{\cos x}{\cos 2x} = -\infty,$$

$$\lim_{x\to\frac{3\pi}{4}^-}\frac{\cos x}{\cos 2x} = +\infty, \qquad \lim_{x\to\frac{3\pi}{4}^+}\frac{\cos x}{\cos 2x} = -\infty.$$

With respect to the continuity of $f$ on the interval $(\pi/4, 3\pi/4)$, we can see that $f$ attains all real values on this interval. Hence the range of $f$ is the whole of $\mathbb{R}$. We also found out that the discontinuities are of the second kind, where at least one of the one-sided limits is improper (or does not exist). At the same time, we proved that the lines $x = \pi/4$ and $x = 3\pi/4$ are asymptotes without slope. If we want to formulate the previous results without the restriction to $[0,\pi]$, we can say that at all points

$$x_k = \frac{\pi}{4} + \frac{k\pi}{2}, \quad k \in \mathbb{Z},$$

$f$ has a discontinuity of the second kind, and every line $x = \frac{\pi}{4} + \frac{k\pi}{2}$, $k \in \mathbb{Z}$, is an asymptote without slope. The periodicity of $f$ also implies that no other asymptotes exist. In particular, it cannot have any inclined asymptotes, nor can the (improper) limits $\lim_{x\to+\infty} f(x)$, $\lim_{x\to-\infty} f(x)$ exist. Now we find the points of intersection with the axes. The point of intersection $[0,1]$ with the $y$ axis can be found by computing $f(0) = 1$.
When looking for the points of intersection with the $x$ axis, we consider the equation $\cos x = 0$, $x \in V$, with the only solution being $x = \pi/2$. Then we can easily obtain the intervals $[0,\pi/4)$ and $(\pi/2, 3\pi/4)$, on which $f$ is positive, and the intervals $(\pi/4,\pi/2)$ and $(3\pi/4,\pi]$, where it is negative. Now we step up to computing the derivative:

$$f'(x) = \frac{-\sin x\cos 2x - \cos x\,(-2\sin 2x)}{\cos^2 2x} = \frac{-\sin x\,(\cos^2 x - \sin^2 x) + 2\cos x\,(2\sin x\cos x)}{\cos^2 2x} = \frac{\sin^3 x + 3\cos^2 x\,\sin x}{\cos^2 2x} = \frac{(\sin^2 x + \cos^2 x + 2\cos^2 x)\sin x}{\cos^2 2x} = \frac{(2\cos^2 x + 1)\sin x}{\cos^2 2x}.$$

The differential of a vector function in this context is

$$dr = \left(\frac{dx}{dt}\,e_1 + \frac{dy}{dt}\,e_2 + \frac{dz}{dt}\,e_3\right)dt,$$

where the expression on the right hand side is understood as "selecting" an increment of the scalar independent variable $t$ and mapping it linearly by multiplying the vector of the three derivative components. Thus the corresponding increment of the vector quantity $r$ is obtained (of course, with only two components in the plane). The notation $r(t)$ is a convenient way to describe curves in space. For example, $r(t) = (a\cos t, a\sin t, bt)$, or $r(t) = a\cos t\,e_1 + a\sin t\,e_2 + bt\,e_3$, for fixed constants $a$, $b$, describes a circular helix. Here the parameter $t$ is related to a suitable angle measured around the $z$-axis. The derivative of $r(t)$ at $t = t_0$ determines the direction of the tangent line at $r(t_0)$. In Newtonian mechanics, the parameter $t$ can stand for time, measured in suitable units. In this case, the derivative of $r(t)$ at time $t = t_0$ gives the velocity vector at that time. The second derivative then represents the acceleration vector at the same time.

6.1.14. Differentiating composite maps. In linear algebra and geometry, there are very useful special maps called forms. They have one or more vectors as their arguments and they are linear in each of their arguments.
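The velocity and acceleration of the circular helix just described can be written out explicitly; the parameter values $a = 2$, $b = 0.5$ below are arbitrary illustrative choices:

```python
import math

a, b = 2.0, 0.5  # helix parameters, arbitrary illustrative values

def velocity(t):
    # r'(t) = (-a sin t, a cos t, b) for r(t) = (a cos t, a sin t, b t)
    return (-a * math.sin(t), a * math.cos(t), b)

def acceleration(t):
    # r''(t) = (-a cos t, -a sin t, 0) points horizontally towards the z-axis
    return (-a * math.cos(t), -a * math.sin(t), 0.0)

# the speed |r'(t)| = sqrt(a^2 + b^2) is the same for every t
speed = math.sqrt(sum(c * c for c in velocity(1.3)))
```

The constant speed means the helix is traversed uniformly, while the acceleration is purely centripetal.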
In this way we defined the length of vectors (the dot product is a symmetric bilinear form) or the volume of a parallelepiped (this is an $n$-linear antisymmetric form, where $n$ is the dimension of the space); see for example the paragraphs 2.3.22 and 4.1.22. Of course, we insert vectors $r(t)$ dependent on a parameter as the arguments of these operations. By a straightforward usage of the Leibniz rule for the differentiation of a product of functions, the following is verified:

Theorem. (1) If $r(t) : \mathbb{R} \to \mathbb{R}^n$ is a differentiable vector function and $\varphi : \mathbb{R}^n \to \mathbb{R}^m$ is a linear map, then the derivative of the map $\varphi \circ r$ satisfies

$$\frac{d(\varphi\circ r)}{dt}(t) = \varphi\left(\frac{dr}{dt}(t)\right).$$

(2) Consider differentiable vector functions $r_1, \ldots, r_k : \mathbb{R} \to \mathbb{R}^n$ and a $k$-linear form $\Phi : \mathbb{R}^n \times \cdots \times \mathbb{R}^n \to \mathbb{R}$ on the space $\mathbb{R}^n$. Then the derivative of the composed map $t \mapsto \Phi(r_1(t), \ldots, r_k(t))$ satisfies

$$\frac{d}{dt}\,\Phi(r_1,\ldots,r_k) = \Phi\left(\frac{dr_1}{dt}, r_2, \ldots, r_k\right) + \cdots + \Phi\left(r_1, \ldots, r_{k-1}, \frac{dr_k}{dt}\right).$$

(3) The previous statement remains valid even if $\Phi$ also has values in a vector space, and is linear in all its $k$ arguments.

Proof. (1) The linear maps are given by a constant matrix of scalars $A = (a_{ij})$, so that

$$\varphi\circ r(t) = \left(\sum_{i} a_{1i}r_i(t), \ldots, \sum_{i} a_{mi}r_i(t)\right).$$

The points at which $f'(x) = 0$ are clearly the solutions of the equation $\sin x = 0$, $x \in V$, i.e. the derivative is zero at the points $x_3 = 0$, $x_4 = \pi$. The inequalities

$$2\cos^2 x + 1 \geq \cos^2 2x > 0, \qquad \sin x > 0, \qquad x \in V\cap(0,\pi),$$

imply that $f$ is increasing at every inner point of the set $V$, thus $f$ is increasing on every subinterval of $V$. The even parity of $f$ then implies that it is decreasing at every point $x \in (-\pi, 0)$, $x \neq -3\pi/4$, $x \neq -\pi/4$. Hence the function has strict local extremes exactly at the points $x_k = k\pi$, $k \in \mathbb{Z}$. With respect to the periodicity of $f$, we describe these extremes uniquely by stating that at $x_3 = 0$ there is a local minimum (recall the value of the function, $f(0) = 1$), and at $x_4 = \pi$ a local maximum with the value $f(\pi) = -1$. Let us compute the second derivative:

$$f''(x) = \frac{\big[4\cos x\,(-\sin x)\sin x + (2\cos^2 x + 1)\cos x\big]\cos^2 2x + 4\sin 2x\cos 2x\,(2\cos^2 x + 1)\sin x}{\cos^4 2x} = \frac{\big(10\sin^2 x\cos^2 x + 2\cos^4 x + \cos^2 x + 4\sin^4 x + 7\sin^2 x\big)\cos x}{\cos^3 2x}, \quad x \in V.$$

Note that after a few simplifications, we can also express

$$f''(x) = \frac{\big(3 + 4\cos^2 x\sin^2 x + 8\sin^2 x\big)\cos x}{\cos^3 2x}, \quad x \in V,$$

or

$$f''(x) = \frac{\big(11 - 4\cos^4 x - 4\cos^2 x\big)\cos x}{\cos^3 2x}, \quad x \in V,$$

respectively. Since

$$10\sin^2 x\cos^2 x + 2\cos^4 x + \cos^2 x + 4\sin^4 x + 7\sin^2 x > 0, \quad x \in \mathbb{R},$$

or

$$3 + 4\cos^2 x\sin^2 x + 8\sin^2 x = 11 - 4\cos^4 x - 4\cos^2 x \geq 3, \quad x \in \mathbb{R},$$

respectively, we have $f''(x) = 0$ for some $x \in V$ if and only if $\cos x = 0$. But that is satisfied only by $x_5 = \pi/2 \in V$.
It is clear that $f''$ changes its sign at this point, i.e. it is a point of inflection. No other points of inflection exist (the second derivative $f''$ is continuous on $V$). The other changes of the sign of $f''$ occur at the zero points of the denominator, which we have already determined to be the discontinuities $x_1 = \pi/4$ and $x_2 = 3\pi/4$. Hence the sign changes exactly at the points $x_1$, $x_2$, $x_5$; thus the inequality

Carry out the differentiation separately for the individual coordinates of the result. However, the derivative acts linearly with respect to scalar linear combinations, see Theorem 5.3.4. That is why the derivative is obtained simply by evaluating the original linear map $\varphi$ at the derivative $r'(t)$, which proves the first claim.

$f''(x) > 0$ for $x \to 0^+$ implies that $f$ is convex on the interval $[0,\pi/4)$, concave on $(\pi/4,\pi/2]$, convex on $[\pi/2, 3\pi/4)$ and concave on $(3\pi/4,\pi]$. The convexity and concavity of $f$ on the other subintervals is given by its periodicity and a simple observation: if a function is even and convex on an interval $(a,b)$, where $0 \leq a < b$, then it is also convex on $(-b,-a)$. All that is left is computing the derivative (to estimate the speed of the growth of the function) at the point of inflection, yielding $f'(\pi/2) = 1$. Based on all the previous results, it is now easy to plot the graph of the function $f$. □

6.A.36. Determine the course of the function

$$f(x) = \frac{x}{\ln x}$$

and plot its graph.

Solution. i) First we determine the domain of the function: $\mathbb{R}^+\setminus\{1\}$. ii) We find the intervals of monotonicity of the function; first we find the zero points of the derivative:

$$f'(x) = \frac{\ln x - 1}{\ln^2 x} = 0.$$

The root of this equation is $e$. Next, we can see that $f'(x)$ is negative on both intervals $(0,1)$ and $(1,e)$, hence $f(x)$ is decreasing on both intervals $(0,1)$ and $(1,e)$. Additionally, $f'(x)$ is positive on the interval $(e,\infty)$, thus $f(x)$ is increasing there. That means the function $f$ has its only extreme at the point $e$, being a minimum (we can also decide this using the sign of the second derivative of the function $f$ at the point $e$, because $f^{(2)}(e) > 0$). iii) We find the points of inflection:

$$f^{(2)}(x) = \frac{2 - \ln x}{x\ln^3 x} = 0.$$

The root of this equation is $e^2$, so it must be a point of inflection (it cannot be an extreme with regard to the previous point). iv) The asymptotes. The line $x = 1$ is an asymptote of the function. Next, let us look for asymptotes with a finite slope $k$:

$$k = \lim_{x\to\infty}\frac{f(x)}{x} = \lim_{x\to\infty}\frac{1}{\ln x} = 0.$$

This corresponds to the idea that after the choice of a parametrization with a derivative of constant length, the second derivative in the direction of the movement vanishes. The second derivative lies in the plane orthogonal to the tangent vector. If the second derivative is nonzero, the normed vector
\\r"(s)\\ is the (principal) normal of the curve r(s). The scalar function k(s) satisfying (at the points where r"(s) ^ 0) r"(s) = k(s)tj(s) is called the curvature of the curve r(s). At the zero points of the second derivative k(s) is defined as 0. At the nonzero points of the curvature, the unit vector b(s) = r' (s) x n(s) is well defined and is called the binormal of the curve r(s). By direct computation 0 = ±(b(s),r'(s)) = (bf(s),r>(s)) + (b(s),r"(s)) = (b'(s,r'(s)) + K(S)(b(s),n(s)) = {b'(s), r'(s)), which shows that the derivative of the binormal is orthogonal to r'(s). b'(s) is also orthogonal to b(s) (for the same reason as with r' above). Therefore it is a multiple of the principal normal n(s). We write b'(s) = -t(s)ti(s). The scalar function r (s) is called the torsion of the curve r(s). In the case of plane curves, the definitions of binormal and torsion do not make sense. We have not yet computed the rate of change of the principal normal, which can be written as n(s) = b(s) x r'(s): n'(s) = b'(s) x r'(s) + n(s)b(s) x n(s) = —t(s)n(s) x r'(s) + k(s)(—r'(s)) = r(s)&(s) — «(s)r'(s). Successively, for all points with nonzero second derivative of the curve r(s) parametrized by the arc length, there is derived the important basis (r'(s),n(s), b(s)), called the Frenet frame in the classical literature. At the same time, this 388 CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS If the asymptote exists, its slope must be 0. Let's continue the computation x-s-oo \n(x) 0 ■ x = lim \n(x) = oo, and because the limit isn't finite, an asymptote with a finite slope doesn't exist. The course of the function: 10 - y 5- 10 15 20 x -10 - □ Now move from determining the course of functions onto other subjects connected to derivatives of functions. First we'll demonstrate the concept of curvature and the osculating circle on an ellipse 6.A.37. Determine the curvature of the ellipse x2 +2y2 = 2 at its vertices (4.C.9). 
Also determine the equations of the osculating circles at these vertices.

Solution. Because the equation of the ellipse is already in the basic form in the given coordinates (there are no mixed or linear terms), the given basis is already a polar basis. Its axes are the coordinate axes $x$ and $y$, and its vertices are the points $[\sqrt2, 0]$, $[-\sqrt2, 0]$, $[0,1]$ and $[0,-1]$. Let us first compute the curvature at the vertex $[0,1]$. If we consider the coordinate $y$ as a function of the coordinate $x$ (determined uniquely in a neighbourhood of $[0,1]$), then differentiating the equation of the ellipse with respect to the variable $x$ yields

$$2x + 4yy' = 0, \quad\text{hence}\quad y' = -\frac{x}{2y}$$

($y'$ denotes the derivative of the function $y(x)$ with respect to the variable $x$; in fact, it is nothing else than the derivative of a function given implicitly, see ??). Differentiating this

basis is used in order to express the derivatives of its components in the form of the Frenet-Serret formulas

$$\frac{dr'}{ds}(s) = \kappa(s)\,n(s), \qquad \frac{dn}{ds}(s) = \tau(s)\,b(s) - \kappa(s)\,r'(s), \qquad \frac{db}{ds}(s) = -\tau(s)\,n(s).$$

The following theorem tells how crucial the curvature and the torsion are. Notice that if the curve $r(s)$ lies in one plane, then its torsion is identically zero. In fact, the converse is true as well. We shall not provide the proofs here.

Theorem. Two curves in space parametrized by the length of their arc can be mapped to each other by a Euclidean transformation if and only if their curvature functions and torsion functions coincide, except for a constant shift of the parameter. Moreover, for every choice of smooth functions $\kappa$ and $\tau$ there exists a smooth curve with these parameters.

By a straightforward computation we can check that the curvature of the graph of a function $y = f(x)$ in the plane and the curvature $\kappa$ of this curve as defined in this paragraph coincide.
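The formula $\rho = (1+(f')^2)^{3/2}/f''$ derived in 6.1.12 turns the curvature of a graph into a one-line computation. A small sketch; the parabola $y = x^2$ used as a test case is an arbitrary choice:

```python
def graph_curvature(fx, fxx):
    # curvature of a graph at a point with f' = fx and f'' = fxx:
    # kappa = f'' / (1 + f'^2)^(3/2), the reciprocal of the osculating radius
    return fxx / (1.0 + fx * fx) ** 1.5

# parabola y = x^2 at the origin: f' = 0, f'' = 2, hence kappa = 2
kappa = graph_curvature(0.0, 2.0)
```

Away from the vertex the slope term grows, so the same parabola is less curved there, in line with the formula.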
Indeed, comparing the differentials of the length of the arc for the graph of a function (as a curve with coordinates $x(t)$, $y(t) = f(x(t))$):

$$ds = \big(1+(f_x)^2\big)^{1/2}dx, \qquad dx = \big(1+(f_x)^2\big)^{-1/2}ds$$

(here we write $f_x = \frac{df}{dx}$), we obtain the following equality for the unit tangent vector of the graph of the curve:

$$r'(s) = (x'(s), y'(s)) = \Big(\big(1+(f_x)^2\big)^{-1/2},\ f_x\big(1+(f_x)^2\big)^{-1/2}\Big).$$

A messy, but very similar computation for the second derivative and its length leads to

$$\kappa^2 = \|r''\|^2 = (f_{xx})^2\big(1+(f_x)^2\big)^{-3},$$

as expected. If we write $r = (x,y)$, $y' = f_x x'$, $x' = (1+f_x^2)^{-1/2}$, then

$$x'' = -\tfrac12\big(1+f_x^2\big)^{-3/2}\cdot 2f_xf_{xx}\cdot x' = -f_xf_{xx}\big(1+f_x^2\big)^{-2},$$

$$y'' = f_{xx}(x')^2 + f_x x'' = f_{xx}(x')^2 - f_x^2f_{xx}\big(1+f_x^2\big)^{-2}.$$

Hence

$$(x'')^2 + (y'')^2 = f_{xx}^2\big(1+f_x^2\big)^{-4}\Big(f_x^2 + (1+f_x^2)^2 + f_x^4 - 2f_x^2(1+f_x^2)\Big) = f_{xx}^2\big(1+f_x^2\big)^{-4}\big(f_x^2 + 1\big) = f_{xx}^2\big(1+f_x^2\big)^{-3}.$$

6.1.16. The numerical derivatives. In the beginning of this textbook, we discussed how to describe the values in a sequence if its immediate differences are known (cf. paragraphs 1.1.5, 1.2.1). Before proceeding the same way with the derivatives, we clarify the connections between derivatives and differences. The key to this is the Taylor expansion with the remainder.

equation with respect to $x$ then yields

$$y'' = -\frac12\left(\frac{1}{y} - \frac{xy'}{y^2}\right).$$

At the point $[0,1]$, we obtain $y' = 0$ and $y'' = -\frac12$ (we would receive the same results if we explicitly expressed $y = \frac{1}{\sqrt2}\sqrt{2-x^2}$ from the equation of the ellipse and performed the differentiation; the computation would only be a little more complicated, as the reader can surely verify). According to 6.1.12, the radius of the osculating circle will be

$$\frac{\big(1+(y')^2\big)^{3/2}}{y''} = -2,$$

or $2$ in absolute value, respectively, and the sign tells us the circle will be "below" the graph of the function. The ideas in 6.1.12 and 6.1.15 imply that its center will be in the direction opposite to the normal line of this curve, i.e.
on the $y$ axis (the function $y$, as a function of the variable $x$, has zero derivative at the point $[0,1]$, thus the tangent line to its graph at this point is parallel to the $x$ axis, and because the normal is perpendicular to the tangent, it must be the $y$ axis at this point). The radius is $2$, so the center will be at the point $[0, 1-2] = [0,-1]$. In total, the equation of the osculating circle of the ellipse $x^2+2y^2 = 2$ at the point $[0,1]$ is

$$x^2 + (y+1)^2 = 4.$$

Analogously, we can determine the equation of the osculating circle at the point $[0,-1]$: $x^2 + (y-1)^2 = 4$. The curvatures of the ellipse (as a curve) at these points then equal $\frac12$ (the absolute value of the curvature of the graph of the function). For determining the osculating circle at the point $[\sqrt2, 0]$, we consider the equation of the ellipse as a formula for the variable $x$ depending on the variable $y$, i.e. $x$ as a function of $y$ (in a neighbourhood of the point $[\sqrt2, 0]$, the variable $y$ as a function of $x$ is not determined uniquely, so we cannot use the previous procedure; technically, it would end up with division by zero). Successively, we obtain

$$2xx' + 4y = 0, \quad\text{thus}\quad x' = -\frac{2y}{x}, \quad\text{and}\quad x'' = -2\left(\frac{1}{x} - \frac{yx'}{x^2}\right).$$

Hence at the point $[\sqrt2, 0]$, we have $x' = 0$ and $x'' = -\sqrt2$, and the radius of the osculating circle is

$$\rho = \frac{\big(1+(x')^2\big)^{3/2}}{|x''|} = \frac{1}{\sqrt2} = \frac{\sqrt2}{2}$$

according to 6.1.12. The normal line is heading to $-\infty$ along the $x$ axis at the point $[\sqrt2, 0]$, thus the center of the osculating circle will be on the $x$ axis, on the other side, at the distance $\frac{\sqrt2}{2}$, hence at the point $[\sqrt2 - \frac{\sqrt2}{2}, 0] = [\frac{\sqrt2}{2}, 0]$. In total, the equation of the osculating circle at the vertex $[\sqrt2, 0]$ is

$$\left(x - \frac{\sqrt2}{2}\right)^2 + y^2 = \frac12.$$

The curvature at both of these vertices equals $\sqrt2$.

Suppose that for some (sufficiently) differentiable function $f(x)$ defined on the interval $[a,b]$, the values $f_i = f(x_i)$ at the points $x_0 = a, x_1, x_2, \ldots, x_n = b$ are given, while $x_i - x_{i-1} = h$ for some constant $h > 0$ and all indices $i = 1, \ldots, n$.
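The implicit differentiation carried out in 6.A.37 can be packaged into a short numerical check. The formulas for $y'$ and $y''$ below are the ones derived above for the ellipse $x^2 + 2y^2 = 2$ (valid at points with $y \neq 0$); this is only a verification sketch:

```python
def osculating_radius(x, y):
    # ellipse x^2 + 2 y^2 = 2 with y taken as a function of x (y != 0):
    # 2x + 4 y y' = 0 gives y' = -x/(2y); differentiating once more gives
    # y'' = -(1 + 2 y'^2) / (2 y), and rho = (1 + y'^2)^(3/2) / |y''|
    yp = -x / (2.0 * y)
    ypp = -(1.0 + 2.0 * yp * yp) / (2.0 * y)
    return (1.0 + yp * yp) ** 1.5 / abs(ypp)

# at the vertex [0, 1] the computed radius of the osculating circle is 2
rho = osculating_radius(0.0, 1.0)
```

This reproduces the radius $2$ found above, i.e. curvature $\frac12$ at the vertex $[0,1]$.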
Write the Taylor expansion of the function $f$ in the form

$$f(x_i \pm h) = f_i \pm hf'(x_i) + \frac{h^2}{2}f''(x_i) \pm \frac{h^3}{3!}f^{(3)}(x_i) + \cdots$$

Suppose the expansion is terminated at the term containing $h^k$, which is of order $k$ in $h$. Then the actual error is bounded by

$$\frac{h^{k+1}}{(k+1)!}\,\max\big|f^{(k+1)}\big|$$

on the interval $[x_i - h, x_i + h]$. If the $(k+1)$st derivative of $f$ is continuous, it can be approximated by a constant. Then for small $h$, the error of the approximation by the Taylor polynomial of order $k$ acts like $h^{k+1}$, except for a constant multiple. Such an estimation is called an asymptotic estimation.

Asymptotic estimates

Definition. The expression $G(h)$ is asymptotically equal to $F(h)$ for $h \to 0$, written $G(h) = O(F(h))$, if the finite limit

$$\lim_{h\to 0}\frac{G(h)}{F(h)} = a \in \mathbb{R}$$

exists. Similarly, we compare the expressions for $h \to \infty$ and use the same notation.

Denote the values of the derivatives of $f(x)$ at the points $x_i$ by $f_i'$, $f_i''$, and so on. Write the Taylor expansion as

$$f_{i\pm 1} = f_i \pm f_i'h + \frac{f_i''}{2!}h^2 \pm \frac{f_i'''}{3!}h^3 + \cdots$$

Considering combinations of the two expansions and $f_i$ itself, we can express the derivative as follows:

$$\frac{f_{i+1} - f_{i-1}}{2h} = f_i' + \frac{h^2}{3!}f_i^{(3)} + \cdots$$

$$\frac{f_{i+1} - f_i}{h} = f_i' + \frac{h}{2!}f_i'' + \cdots$$

$$\frac{f_i - f_{i-1}}{h} = f_i' - \frac{h}{2!}f_i'' + \cdots$$

This suggests a basic numerical approximation for derivatives:

Central, forward, and backward differences

The central difference is defined as $f_i' \approx \frac{f_{i+1} - f_{i-1}}{2h}$, the forward difference is $f_i' \approx \frac{f_{i+1} - f_i}{h}$, and the backward difference is $f_i' \approx \frac{f_i - f_{i-1}}{h}$.

If we use the Taylor expansions with the remainder of the appropriate order, we obtain an expression for the error of the approximation by the central difference in the form

□

6.A.38. Remark. The vertices of an ellipse (more generally, the vertices of a closed smooth curve in the plane) can be defined as the points at which the function of the curvature has an extreme. The ellipse having four vertices is not a coincidence. The so called "four vertex theorem" states that a closed curve of the class $C^3$ has at least four vertices.
(A curve of the class $C^3$ is locally given parametrically by points $[f(t), g(t)] \in \mathbb{R}^2$, $t \in (a,b) \subseteq \mathbb{R}$, where $f$ and $g$ are functions of the class $C^3(\mathbb{R})$.) Thus the curvature of the ellipse at any of its points lies between its curvatures at its vertices, i.e. between $\frac12$ and $\sqrt2$.

B. Integration

We start with an example testing the understanding of the concept of Riemann integration.

6.B.1. Let $y = |x|$ on the interval $I = [-1,1]$, and let $-1, \ldots, 0, \ldots, 1$ be a partition of the interval $I$ for arbitrary $n \in \mathbb{N}$. Determine the upper and lower Riemann sums corresponding to the given partition. Based on this result, decide whether the function $y = |x|$ is integrable (in the Riemann sense) on $[-1,1]$. ○

And now some easy examples that everyone should handle.

6.B.2. Using integration "by heart", express

(a) $\int e^{-x}\,dx$, $x \in \mathbb{R}$; (b) $\int \frac{dx}{\sqrt{4-x^2}}$, $x \in (-2,2)$; (c) $\int \frac{dx}{4+x^2}$, $x \in \mathbb{R}$; (d) $\int \frac{3x^2+3}{x^3+3x+2}\,dx$ (on an interval where the denominator is nonzero).

Solution. We can easily obtain

$$\frac{f_{i+1} - f_{i-1}}{2h} - f_i' = \frac{h^2}{2\cdot 3!}\Big(f^{(3)}(x_i + \xi h) + f^{(3)}(x_i - \eta h)\Big).$$

Here, $0 < \xi, \eta < 1$ are the values from the remainder expressions of $f_{i+1}$ and $f_{i-1}$, respectively. The errors in the other two cases are obtained similarly; there, the remainders involve the second derivative. Thus, under the assumption of bounded derivatives of the third or the second order, the asymptotic estimates are computed:

Theorem. The asymptotic estimate of the error of the central difference is $O(h^2)$. The errors of the backward and forward differences are $O(h)$.

Surprisingly, the central difference is one order better than the other two. But of course, the constants in the asymptotic estimates are important, too. In the case of the central difference, the bound on the third derivative appears, while in the two other cases second derivatives show up instead. We proceed the same way when approximating the second derivative. To compute $f''(x_i)$ from a suitable combination of the Taylor polynomials, we cancel both the first derivative and the value at $x_i$. The simplest combination cancels all the odd derivatives as well:

$$\frac{f_{i+1} - 2f_i + f_{i-1}}{h^2} = f_i^{(2)} + \frac{h^2}{12}f^{(4)}(x_i) + \cdots$$
This is called the second order difference. Just as for the central first order difference, the asymptotic estimate of the error is

$$f_i^{(2)} = \frac{f_{i+1} - 2f_i + f_{i-1}}{h^2} + O(h^2).$$

Notice that the actual bound depends on the fourth derivative of $f$.

2. Integration

6.2.1. Indefinite integral. Now, we reverse the procedure of differentiation. We want to reconstruct the actual values of a function using its immediate changes. If we consider the given function $f(x)$ as the (say continuous) derivative of an unknown function $F(x)$, then at the level of differentials we can write

$$dF = f(x)\,dx.$$

We call the function $F$ the primitive function or the indefinite integral of the function $f$. Traditionally we write

$$F(x) = \int f(x)\,dx.$$

Lemma. The primitive function $F(x)$ to the function $f(x)$ is determined uniquely on each interval $[a,b]$, up to an additive constant.

Proof. The statement follows immediately from Lagrange's mean value theorem, see 5.3.9. Indeed, if $F'(x) = G'(x) = f(x)$ on the whole interval $[a,b]$, then the derivative of the function $(F-G)(x)$ vanishes at all points $c$ of the interval $[a,b]$. The mean value theorem implies that for all points $x$ in this interval,

$$F(x) - G(x) = F(a) - G(a) + 0\cdot(x-a).$$
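The differences from 6.1.16 and their orders of accuracy can be sketched directly; the test function $f = \sin$, the point $x_0 = 1$ and the step sizes are arbitrary illustrative choices:

```python
import math

def forward(f, x, h):
    return (f(x + h) - f(x)) / h            # error O(h)

def central(f, x, h):
    return (f(x + h) - f(x - h)) / (2 * h)  # error O(h^2)

def second(f, x, h):
    # second order difference approximating f''(x), error O(h^2)
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

x0 = 1.0
err_fwd = [abs(forward(math.sin, x0, h) - math.cos(x0)) for h in (1e-2, 1e-3)]
err_cen = [abs(central(math.sin, x0, h) - math.cos(x0)) for h in (1e-2, 1e-3)]
err_sec = abs(second(math.sin, x0, 1e-3) + math.sin(x0))  # (sin)'' = -sin
```

Shrinking $h$ tenfold cuts the forward-difference error roughly tenfold, but the central-difference error roughly a hundredfold, in line with the $O(h)$ versus $O(h^2)$ estimates.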
We consider the value of a real function j(x) as an immediate increment of the region bounded by the graph of the function / and the x axis and try to find the area of this region between boundary values a a & of some interval. We relate this idea with the indefinite integral. Suppose we are given a real function / and its indefinite integral F(x), i.e. F'(x) = j(x) on the interval [a, b]. Divide the interval [a, b] into n parts by choosing the points ln7- + 6e31 + 2^Tn2 | cos 5x + 4 sin | — 3 tg x - X0 < X\ < <^ xn ln | 3 - x | + C. □ For expressing the following integrals, we'll use the method of integration by parts (see 6.2.3). 6.B.4. Compute J a; cos a; dx, x G Rand Jin a; da;, x > 0; Solution. u = lax u' = - X v' = 1 V = X Approximate the values of the derivatives at the points Xi by the forward differences. That is, by the expressions f(xi)=F'(xi) ■F(Xi) Xi-\-\ Xi Finally the sum over all the intervals of our partition yields the approximation of the area: ln x dx ■ i=0 = a;lna; — / 1 da; = a; ln a; — x + C. x cos x dx = u = x u1 = 1 v1 = cos a; v = sin a; sin x dx = x sin x + cos x + C. □ -£f(Xi)(Xi+1-Xi) ^E^^'H^+i-. i=0 n-1 ^(F(xi+1)-F(Xi)) = F(b) - F(a). Therefore we expect that for "nice enough" functions f(x), the area of the region bounded by the graph of the function and the x axis (including the signs) can be calculated as a difference of the values of the primitive function at the boundary points of the interval. This procedure is called the Newton integration? 6.B.5. Using integration by parts, compute (a) / (x2 + 1) e~x dx, x G R, (b) J(2x - l)lnx dx, x > 0, (c) Jarctga; dx, x G R, (d) / ex sm(fJx) dx, x,/3eR, Isaac Newton (1642-1726) was a phenomenal English physicist and mathematician. The principles of integration and differentiation were formulated independently by him and Gottfried Leibniz in the late 17th century. 
It took nearly another two centuries before Bernhard Riemann introduced the completely rigorous modern version of the integration process.)

Solution. First let us emphasise that by integration by parts we can compute every integral of the forms
$$\int P(x)\,a^{bx}\,dx,\quad \int P(x)\sin(bx)\,dx,\quad \int P(x)\cos(bx)\,dx,\quad \int P(x)\log_a^n x\,dx,$$
$$\int P(x)\arcsin(bx)\,dx,\quad \int P(x)\arccos(bx)\,dx,\quad \int P(x)\operatorname{arctg}(bx)\,dx,\quad \int P(x)\operatorname{arccotg}(bx)\,dx,\quad \int x^b\log_a^n(kx)\,dx,$$
and the like, where $P$ is an arbitrary polynomial and $a\in(0,1)\cup(1,+\infty)$, $b,c\in\mathbb{R}\setminus\{0\}$, $n\in\mathbb{N}$, $k>0$. Thus we know that

(a)
$$\int (x^2+1)\,e^{-x}\,dx = \Bigl[\,F=x^2+1,\ F'=2x;\ G'=e^{-x},\ G=-e^{-x}\,\Bigr] = -(x^2+1)\,e^{-x} + \int 2x\,e^{-x}\,dx$$
$$= \Bigl[\,F=2x,\ F'=2;\ G'=e^{-x},\ G=-e^{-x}\,\Bigr] = -(x^2+1)\,e^{-x} - 2x\,e^{-x} + \int 2\,e^{-x}\,dx = -(x^2+1)\,e^{-x} - 2x\,e^{-x} - 2\,e^{-x} + C = -e^{-x}\bigl(x^2+2x+3\bigr)+C;$$

(b)
$$\int(2x-1)\ln x\,dx = \Bigl[\,F=\ln x,\ F'=\tfrac1x;\ G'=2x-1,\ G=x^2-x\,\Bigr] = (x^2-x)\ln x - \int\frac{x^2-x}{x}\,dx = (x^2-x)\ln x - \int(x-1)\,dx = (x^2-x)\ln x + x - \frac{x^2}{2} + C;$$

(c)
$$\int\operatorname{arctg}x\,dx = \Bigl[\,F=\operatorname{arctg}x,\ F'=\tfrac{1}{1+x^2};\ G'=1,\ G=x\,\Bigr] = x\operatorname{arctg}x - \int\frac{x}{1+x^2}\,dx = x\operatorname{arctg}x - \frac12\ln\bigl(1+x^2\bigr)+C;$$

(d)
$$\int e^x\sin(\beta x)\,dx = \Bigl[\,F=e^x,\ F'=e^x;\ G'=\sin(\beta x),\ G=-\tfrac1\beta\cos(\beta x)\,\Bigr] = -\frac1\beta\,e^x\cos(\beta x) + \frac1\beta\int e^x\cos(\beta x)\,dx$$
$$= \Bigl[\,F=e^x,\ F'=e^x;\ G'=\cos(\beta x),\ G=\tfrac1\beta\sin(\beta x)\,\Bigr] = -\frac1\beta\,e^x\cos(\beta x) + \frac{1}{\beta^2}\,e^x\sin(\beta x) - \frac{1}{\beta^2}\int e^x\sin(\beta x)\,dx,$$
which implies
$$\int e^x\sin(\beta x)\,dx = \frac{e^x}{1+\beta^2}\bigl(\sin(\beta x) - \beta\cos(\beta x)\bigr) + C.$$ □

Newton integral

If $F$ is a primitive function to the function $f$ on the interval $[a,b]$, then we write
$$\int_a^b f(x)\,dx = \bigl[F(x)\bigr]_a^b = F(b) - F(a)$$
and call it the Newton (definite) integral with the bounds $a$ and $b$.

We prove later that for all continuous functions $f\in C^0(a,b)$ the Newton integral exists and computes the area as expected. This is one of the fascinating theorems of elementary calculus. Before going into this, we discuss how to compute these integrals.
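The Newton integral can be sanity-checked numerically: a Riemann-type sum of $f$ over a fine partition should approach $F(b)-F(a)$. A minimal Python sketch (the function and bounds are chosen here only for illustration), using the by-parts result $\int x\cos x\,dx = x\sin x + \cos x + C$ from 6.B.4:

```python
import math

def riemann_sum(f, a, b, n):
    # midpoint rule: sum of f(midpoint) * subinterval length over a uniform partition
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) * h for i in range(n))

# primitive of f(x) = x cos x found by parts: F(x) = x sin x + cos x
F = lambda x: x * math.sin(x) + math.cos(x)
a, b = 0.0, math.pi / 2

newton = F(b) - F(a)                                    # exactly pi/2 - 1
approx = riemann_sum(lambda x: x * math.cos(x), a, b, 10000)
```

With 10000 subintervals the two values agree to many decimal places, as the telescoping argument above predicts.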
The primitive functions are well defined for complex valued functions $f$ as well: the real and the imaginary parts of the indefinite integral are real primitive functions to the real and the imaginary parts of $f$. Thus, with no loss of generality, we work only with real functions in the sequel.

6.2.3. Integration "by heart". We show several procedures for computing the Newton integral. We exploit the knowledge of differentiation and look for primitive functions. The easiest case is the one where the given function is known as a derivative. To learn such cases, it suffices to read the tables of derivatives of the functions in the menagerie the other way round. Hence:

For expressing the following integrals, it is convenient to use the substitution method (see 6.2.5).

6.B.6. Using a suitable substitution, determine
(a) $\int\sqrt{2x-5}\,dx$, $x>\frac52$;
(b) $\int\frac{(7+\ln x)^7}{x}\,dx$, $x>0$;
(c) $\int\frac{\cos x}{(1+\sin x)^2}\,dx$;
(d) $\int\frac{\cos x}{\sqrt{1+\sin^2 x}}\,dx$.

Solution. We have
(a) $\int\sqrt{2x-5}\,dx = \bigl[\,t=2x-5,\ dt=2\,dx\,\bigr] = \frac12\int\sqrt t\,dt = \frac{t^{3/2}}{3}+C = \frac13\sqrt{(2x-5)^3}+C$;
(b) $\int\frac{(7+\ln x)^7}{x}\,dx = \bigl[\,t=7+\ln x,\ dt=\frac{dx}{x}\,\bigr] = \int t^7\,dt = \frac{t^8}{8}+C = \frac{(7+\ln x)^8}{8}+C$;
(c) $\int\frac{\cos x}{(1+\sin x)^2}\,dx = \bigl[\,t=1+\sin x,\ dt=\cos x\,dx\,\bigr] = \int\frac{dt}{t^2} = -\frac1t+C = -\frac{1}{1+\sin x}+C$;
(d) $\int\frac{\cos x}{\sqrt{1+\sin^2 x}}\,dx = \bigl[\,t=\sin x,\ dt=\cos x\,dx\,\bigr] = \int\frac{dt}{\sqrt{1+t^2}} = \Bigl[\,u=t+\sqrt{1+t^2}>0,\ du=\Bigl(1+\frac{t}{\sqrt{1+t^2}}\Bigr)dt=\frac{u}{\sqrt{1+t^2}}\,dt\,\Bigr] = \int\frac{du}{u} = \ln u + C = \ln\bigl(t+\sqrt{1+t^2}\bigr)+C = \ln\bigl(\sin x+\sqrt{1+\sin^2 x}\bigr)+C$. □

Integration table

For arbitrary nonzero $a,b\in\mathbb{R}$ and $n\in\mathbb{Z}$, $n\ne-1$:
$$\int a\,dx = ax+C \qquad \int a\,x^n\,dx = \frac{a}{n+1}\,x^{n+1}+C$$
$$\int a\,e^{bx}\,dx = \frac{a}{b}\,e^{bx}+C \qquad \int\frac{a}{x}\,dx = a\ln|x|+C$$
$$\int a\cos(bx)\,dx = \frac{a}{b}\sin(bx)+C \qquad \int a\sin(bx)\,dx = -\frac{a}{b}\cos(bx)+C$$
$$\int a\cos(bx)\sin^n(bx)\,dx = \frac{a}{b(n+1)}\sin^{n+1}(bx)+C \qquad \int a\sin(bx)\cos^n(bx)\,dx = -\frac{a}{b(n+1)}\cos^{n+1}(bx)+C$$
$$\int a\,\mathrm{tg}(bx)\,dx = -\frac{a}{b}\ln\bigl(\cos(bx)\bigr)+C \qquad \int\frac{a}{a^2+x^2}\,dx = \operatorname{arctg}\Bigl(\frac{x}{a}\Bigr)+C$$
$$\int\frac{-dx}{\sqrt{a^2-x^2}} = \arccos\Bigl(\frac{x}{a}\Bigr)+C \qquad \int\frac{dx}{\sqrt{a^2-x^2}} = \arcsin\Bigl(\frac{x}{a}\Bigr)+C.$$

In all the above formulae, it is necessary to specify the domain on which the indefinite integral is well defined. We leave this to the reader. Further rules can be added by observing suitable structure in the given functions.
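Any entry of such a table can be spot-checked by differentiating the right-hand side. A small numerical check in Python for the arctangent entry (the values $a=3$ and $x=1.2$ are arbitrary choices for the test):

```python
import math

# spot-check one table entry: the derivative of (1/a) arctg(x/a) should be a/(a^2+x^2) / a,
# i.e. differentiating arctg(x/a)/a reproduces the integrand 1/(a^2 + x^2)
a = 3.0
F = lambda x: math.atan(x / a) / a          # candidate primitive
f = lambda x: 1.0 / (a * a + x * x)         # integrand from the table

x, h = 1.2, 1e-6
gap = abs((F(x + h) - F(x - h)) / (2 * h) - f(x))   # central difference vs. integrand
```

The gap is at the level of rounding error, confirming the table entry on this sample point.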
For example,
$$\int\frac{f'(x)}{f(x)}\,dx = \ln|f(x)| + C$$
for all continuously differentiable functions $f$ on intervals where they are nonzero. Of course, the rules for differentiating a sum of differentiable functions and constant multiples of differentiable functions yield analogous rules for the indefinite integral. So the sum of two indefinite integrals is the indefinite integral of the sum of the integrated functions, up to the freedom in the chosen constants, etc.

6.B.7. Determine the integrals
a) $\int\frac{dx}{\sin^2 x\,\cos^2 x}$;
b) $\int x^2\sqrt{2x+1}\,dx$.

6.2.4. Integration by parts. The Leibniz rule for the derivative of a product can be interpreted in the realm of primitive functions. This observation leads to the following very useful practical procedure, and it also has theoretical consequences.

Integration by parts
$$\int F(x)\,G'(x)\,dx = F(x)\,G(x) - \int F'(x)\,G(x)\,dx.$$

The formula for computing the integral on the left-hand side is called integration by parts. It is useful if we can compute $G$ and at the same time compute the integral on the right-hand side. The principle is best shown on an example. Compute
$$I = \int x\sin x\,dx.$$
In this case the choice $F(x)=x$, $G'(x)=\sin x$ helps. Then $G(x)=-\cos x$, and therefore
$$I = x(-\cos x) - \int(-\cos x)\,dx = -x\cos x + \sin x + C.$$
Some integrals can be dealt with by inserting the factor $1$, so that $G'(x)=1$:
$$\int\ln x\,dx = \int 1\cdot\ln x\,dx = x\ln x - \int\frac1x\,x\,dx = x\ln x - x + C.$$
If the original integral reappears on the right-hand side, the resulting equality can be treated as an equation with the unknown integral as the variable, and solved by putting the integral on one side.

Solution. For computing the first integral, we choose the substitution $t=\mathrm{tg}\,x$, which can often be used with advantage:
$$\int\frac{dx}{\sin^2 x\,\cos^2 x} = \Bigl[\,t=\mathrm{tg}\,x,\ dt=\frac{dx}{\cos^2 x}=(1+t^2)\,dx,\ \sin^2 x=\frac{t^2}{1+t^2},\ \cos^2 x=\frac{1}{1+t^2}\,\Bigr] = \int\frac{(1+t^2)^2}{t^2}\,\frac{dt}{1+t^2} = \int\frac{1+t^2}{t^2}\,dt = -\frac1t + t + C = \mathrm{tg}\,x - \mathrm{cotg}\,x + C.$$
Now we compute the second integral:
$$\int x^2\sqrt{2x+1}\,dx = \Bigl[\,u=x^2,\ u'=2x;\ v'=\sqrt{2x+1},\ v=\tfrac13(2x+1)^{3/2}\,\Bigr] = \frac{x^2}{3}(2x+1)^{3/2} - \frac23\int x\,(2x+1)^{3/2}\,dx$$
$$= \Bigl[\,u=x,\ u'=1;\ v'=(2x+1)^{3/2},\ v=\tfrac15(2x+1)^{5/2}\,\Bigr] = \frac{x^2}{3}(2x+1)^{3/2} - \frac23\Bigl(\frac{x}{5}(2x+1)^{5/2} - \frac15\int(2x+1)^{5/2}\,dx\Bigr)$$
$$= \frac{x^2}{3}(2x+1)^{3/2} - \frac{2x}{15}(2x+1)^{5/2} + \frac{2}{105}(2x+1)^{7/2} + C.$$

6.2.5. Integration by substitution. Another useful procedure is derived from the chain rule for differentiating composite functions. If $F'(y)=f(y)$ and $y=\varphi(x)$, where $\varphi$ is a differentiable function with nonzero derivative, then
$$\frac{dF(\varphi(x))}{dx} = F'(\varphi(x))\,\varphi'(x) = f(\varphi(x))\,\varphi'(x),$$
and thus $F(y)+C = \int f(y)\,dy$ can be computed as
$$F(\varphi(x)) + C = \int f(\varphi(x))\,\varphi'(x)\,dx,$$
reversing the substitution at the end. As an illustration, let us verify the arcsine integral from the table in 6.2.3 using this method.

Since the domains may split into several intervals, we should in principle consider separate constants $C_1$ and $C_2$. For the sake of simplicity, though, we use the notation without indices while stating the corresponding intervals. Furthermore, we help ourselves by letting $aC=C$ for $a\in\mathbb{R}\setminus\{0\}$ and $C+b=C$ for $b\in\mathbb{R}$, based on the fact that
$$\{C;\ C\in\mathbb{R}\} = \{aC;\ C\in\mathbb{R}\} = \{C+b;\ C\in\mathbb{R}\} = \mathbb{R}.$$
We could obtain an entirely correct expression, for example, by the substitutions $\bar C = aC$, $\bar C = C+b$. These simplifications prove their usefulness in more complicated problems, because they make the procedures more lucid.

Case (b). Sequential simplifications of the integrated function lead to
$$\int\mathrm{tg}^2 x\,dx = \int\frac{dx}{\cos^2 x} - \int 1\,dx = \mathrm{tg}\,x - x + C,$$
where we used the knowledge of the derivative $(\mathrm{tg}\,x)' = \frac{1}{\cos^2 x}$, $x\ne\frac\pi2+k\pi$, $k\in\mathbb{Z}$.

Case (c). It suffices to realize that this is a special case of the formula $\int\frac{f'(x)}{f(x)}\,dx = \ln|f(x)|+C$, which can be verified directly by differentiation,
$$\bigl(\ln|f(x)|+C\bigr)' = \bigl(\ln[\pm f(x)]\bigr)' = \frac{\pm f'(x)}{\pm f(x)} = \frac{f'(x)}{f(x)}.$$
Hence $\int\frac{\cos x}{1+\sin x}\,dx = \ln\bigl(1+\sin x\bigr)+C$.

Case (d). Because the integral of a sum is the sum of the integrals (if the separate integrals make sense), and a nonzero constant can be factored out of an integral at any time, we have
$$\int\Bigl(6\sin 5x + \cos\frac{x}{2} + e^{x/3}\Bigr)dx = -\frac65\cos 5x + 2\sin\frac{x}{2} + 3\,e^{x/3} + C.$$

6.B.9. Determine
(a) $\int\frac{x}{\cos^2 x}\,dx$, $x\ne\frac\pi2+k\pi$, $k\in\mathbb{Z}$;
(b) $\int x^2\,e^{-3x}\,dx$, $x\in\mathbb{R}$;
(c) $\int\cos^2 x\,dx$, $x\in\mathbb{R}$.

Solution. Case (a). Using integration by parts, we obtain
$$\int\frac{x}{\cos^2 x}\,dx = \Bigl[\,F(x)=x,\ F'(x)=1;\ G'(x)=\frac{1}{\cos^2 x},\ G(x)=\mathrm{tg}\,x\,\Bigr] = x\,\mathrm{tg}\,x - \int\mathrm{tg}\,x\,dx = x\,\mathrm{tg}\,x + \int\frac{-\sin x}{\cos x}\,dx = x\,\mathrm{tg}\,x + \ln|\cos x| + C.$$

Case (b). This time we are clearly integrating a product of two functions. By applying integration by parts, we reduce the integral to another one in which one factor is differentiated and the other integrated. Here we can integrate both of them (and we can differentiate all elementary functions), so we must decide which of the two variants of the method to use (whether to integrate $y=x^2$ or $y=e^{-3x}$). Notice that integration by parts can be used repeatedly, and that the $n$-th derivative of a polynomial of degree $n\in\mathbb{N}$ is a constant polynomial. That gives us a way to compute
$$\int x^2 e^{-3x}\,dx = \Bigl[\,F=x^2,\ F'=2x;\ G'=e^{-3x},\ G=-\tfrac13 e^{-3x}\,\Bigr] = -\frac{x^2}{3}\,e^{-3x} + \frac23\int x\,e^{-3x}\,dx,$$
and furthermore
$$\int x\,e^{-3x}\,dx = \Bigl[\,F=x,\ F'=1;\ G'=e^{-3x},\ G=-\tfrac13 e^{-3x}\,\Bigr] = -\frac{x}{3}\,e^{-3x} + \frac13\int e^{-3x}\,dx = -\frac{x}{3}\,e^{-3x} - \frac19\,e^{-3x} + C.$$
In total, we have
$$\int x^2 e^{-3x}\,dx = -\frac{x^2}{3}\,e^{-3x} - \frac{2x}{9}\,e^{-3x} - \frac{2}{27}\,e^{-3x} + C = -\frac{e^{-3x}}{27}\bigl(9x^2+6x+2\bigr) + C.$$
Note that a repeated use of integration by parts within one integral is common (just like the repeated use of l'Hospital's rule when computing limits).

Case (c). Again we apply integration by parts:
$$\int\cos^2 x\,dx = \int\cos x\cdot\cos x\,dx = \Bigl[\,F=\cos x,\ F'=-\sin x;\ G'=\cos x,\ G=\sin x\,\Bigr] = \cos x\sin x + \int\sin^2 x\,dx = \cos x\sin x + \int\bigl(1-\cos^2 x\bigr)\,dx = \cos x\sin x + x - \int\cos^2 x\,dx.$$
Although the return to the given integral might raise some doubts, the equality
$$\int\cos^2 x\,dx = \cos x\sin x + x - \int\cos^2 x\,dx$$
can be read as an equation for the unknown integral; it implies $2\int\cos^2 x\,dx = \cos x\sin x + x + C$, i.e.
(1) $$\int\cos^2 x\,dx = \frac12\bigl(x + \sin x\cos x\bigr) + C.$$
It suffices to remember that we put $C/2 = C$ and that the indefinite integral (as an infinite set) can be represented by one specific function and its translations.

To compute
$$I = \int\frac{dx}{\sqrt{1-x^2}},$$
choose the substitution $x=\sin t$. Then $dx=\cos t\,dt$, and so
$$I = \int\frac{\cos t\,dt}{\sqrt{1-\sin^2 t}} = \int\frac{\cos t\,dt}{\sqrt{\cos^2 t}} = \int dt = t + C.$$
By substituting $t=\arcsin x$ into the result, $I=\arcsin x + C$.

While substituting, the actual existence of the inverse function to $y=\varphi(x)$ is required. To evaluate a definite Newton integral, the bounds of integration must be recalculated correctly. Problems with the domains of the inverse functions can sometimes be avoided by dividing the integration into several intervals. We return to this point later.

6.2.6. Integration by reduction to recurrences. Often the use of substitutions and integration by parts leads to recurrence relations, from which the desired integrals can be evaluated. We illustrate this with an example. Integrating by parts, to evaluate $I_m = \int\cos^m x\,dx$:
$$\int\cos^{m-1}x\,\cos x\,dx = \cos^{m-1}x\,\sin x - (m-1)\int\cos^{m-2}x\,(-\sin x)\,\sin x\,dx = \cos^{m-1}x\,\sin x + (m-1)\int\cos^{m-2}x\,\sin^2 x\,dx.$$
Using the formula $\sin^2 x = 1-\cos^2 x$,
$$m\,I_m = \cos^{m-1}x\,\sin x + (m-1)\,I_{m-2}.$$
The initial values are $I_0 = x$, $I_1 = \sin x$.

Integrals in which the integrated function depends on expressions of the form $(x^2+1)^k$ can be reduced to these types of integrals using the substitution $x=\mathrm{tg}\,t$. For example, to compute
$$J_k = \int\frac{dx}{(x^2+1)^k},$$
the latter substitution yields (notice that $dx = \cos^{-2}t\,dt$)
$$J_k = \int\frac{\cos^{-2}t\,dt}{\bigl(\frac{\sin^2 t}{\cos^2 t}+1\bigr)^k} = \int\cos^{2k-2}t\,dt.$$
For $k=2$, the result is
$$J_2 = \frac12\bigl(\cos t\sin t + t\bigr) = \frac12\Bigl(\frac{x}{1+x^2} + \operatorname{arctg}x\Bigr) + C$$
after the reverse substitution $t=\operatorname{arctg}x$.

When evaluating definite integrals, we can run the whole recurrence directly on the values with the given bounds. For example, while integrating over the interval $[0,2\pi]$, the integrals have these values:
$$I_0 = \int_0^{2\pi}dx = [x]_0^{2\pi} = 2\pi,\qquad I_1 = \int_0^{2\pi}\cos x\,dx = [\sin x]_0^{2\pi} = 0,\qquad I_m = \frac{m-1}{m}\,I_{m-2}.$$
Thus for even $m=2n$, the result is
$$I_m = \int_0^{2\pi}\cos^m x\,dx = \frac{(2n-1)(2n-3)\cdots 3\cdot 1}{2n(2n-2)\cdots 2}\,2\pi.$$
For odd $m$ it is zero (as could be guessed from the graph of the function $\cos x$).

6.2.7. Integration of rational functions. The next goal is the integration of quotients of two polynomials $f(x)/g(x)$. There are several simplifications to start with. If the degree of the polynomial $f$ in the numerator is greater than or equal to the degree of the polynomial $g$ in the denominator, carry out the division with remainder (see paragraph 5.1.2). This reduces the integration to a sum of two integrals: the division provides
$$f = q\cdot g + h,\qquad \frac{f}{g} = q + \frac{h}{g}.$$
Thus $\int f(x)/g(x)\,dx = \int q\,dx + \int h(x)/g(x)\,dx$, where the first integral is easy and the second is again a quotient $h(x)/g(x)$, but with the degree of $g(x)$ strictly larger than the degree of $h(x)$ (such functions are called proper rational functions). Thus we can assume that the degree of $g$ is strictly larger than the degree of $f$.

We introduce the procedure for integrating proper rational functions with a simple example. Observe that we can integrate $(a+x)^{-n}$, $n>1$, and
$$\int\frac{dx}{a+x} = \ln|a+x| + C.$$
Summing such simple fractions yields more complicated ones:
$$\frac{-2}{x+1} + \frac{6}{x+2} = \frac{4x+2}{x^2+3x+2},$$
which can be integrated directly:
$$\int\frac{4x+2}{x^2+3x+2}\,dx = -2\ln|x+1| + 6\ln|x+2| + C.$$
This suggests looking for a procedure to express proper rational functions as sums of simple ones. In the example, it is straightforward to compute the unknown coefficients $A$ and $B$, once the roots of the denominator are known:
$$\frac{4x+2}{x^2+3x+2} = \frac{4x+2}{(x+1)(x+2)} = \frac{A}{x+1} + \frac{B}{x+2}.$$

We emphasise that often a suitable simplification or substitution leads to the result faster than integration by parts. For example, using the identity $\cos^2 x = \frac12\bigl(1+\cos 2x\bigr)$, $x\in\mathbb{R}$, we easily obtain
$$\int\cos^2 x\,dx = \int\frac12\,dx + \int\frac12\cos 2x\,dx = \frac{x}{2} + \frac{\sin 2x}{4} + C = \frac{x}{2} + \frac{2\sin x\cos x}{4} + C = \frac12\bigl(x+\sin x\cos x\bigr)+C.$$ □

6.B.10.
Integrate
(a) $\int\cos^5 x\,\sin x\,dx$;
(b) $\int\cos^5 x\,\sin^2 x\,dx$;
(c) $\int\dots\,dx$, $x\in\bigl(-\frac\pi2,\frac\pi2\bigr)$;
(d) $\int\dots\,dx$, $x>0$.

Solution. Case (a). This is a simple problem for the so-called first substitution method, whose essence is writing the integral in the form
(1) $$\int f(\varphi(x))\,\varphi'(x)\,dx.$$
With $\varphi(x)=\cos x$, $\varphi'(x)=-\sin x$, we get $\int\cos^5 x\,\sin x\,dx = -\int t^5\,dt\big|_{t=\cos x} = -\frac16\cos^6 x + C$.

Compute $\int e^{\sqrt x}\,dx$, $x>0$. Solution. This problem illustrates the possibility of combining the substitution method and integration by parts within the scope of one problem. First we use the substitution $y=\sqrt x$ to get rid of the root in the argument of the exponential function. That leads to the integral
$$\int e^{\sqrt x}\,dx = \bigl[\,y^2=x,\ 2y\,dy=dx\,\bigr] = \int 2y\,e^y\,dy.$$
Now, using integration by parts, we compute
$$\int y\,e^y\,dy = \bigl[\,F=y,\ F'=1;\ G'=e^y,\ G=e^y\,\bigr] = y\,e^y - \int e^y\,dy = y\,e^y - e^y + C.$$
Thus, in total, we have
$$\int e^{\sqrt x}\,dx = 2y\,e^y - 2\,e^y + C = 2\,e^{\sqrt x}\bigl(\sqrt x - 1\bigr) + C.$$ □

Riemann integral

Definition. The Riemann integral of the function $f$ on the interval $[a,b]$ exists if, for every sequence of partitions with representatives $(\Xi_k,\xi_k)_{k=0}^\infty$ with norms of the partitions $\delta_k$ approaching zero, the limit
$$S = \lim_{k\to\infty} S_{\Xi_k,\xi_k}$$
exists and its value does not depend on the choice of the sequence of partitions and their representatives. Then we write
$$S = \int_a^b f(x)\,dx.$$

This definition does not look very practical, but nonetheless it allows us to formulate and prove several simple properties of the Riemann integral:

Theorem. (1) Suppose $f$ is a bounded real function defined on the interval $[a,b]$, and $c\in[a,b]$ is an inner point of this interval. Then the integral $\int_a^b f(x)\,dx$ exists if and only if both of the integrals $\int_a^c f(x)\,dx$ and $\int_c^b f(x)\,dx$ exist. In that case,
$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.$$
(2) Suppose $f$ and $g$ are two real functions defined on the interval $[a,b]$, and both of the integrals $\int_a^b f(x)\,dx$ and $\int_a^b g(x)\,dx$ exist. Then the integral of their sum also exists, and
$$\int_a^b\bigl(f(x)+g(x)\bigr)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx.$$
(3) Suppose $f$ is a real function defined on the interval $[a,b]$, $C\in\mathbb{R}$ is a constant, and the integral $\int_a^b f(x)\,dx$ exists. Then the integral $\int_a^b C\cdot f(x)\,dx$ also exists, and
$$\int_a^b C\cdot f(x)\,dx = C\int_a^b f(x)\,dx.$$

(Footnote: Bernhard Riemann (1826–1866) was an extremely influential German mathematician with many contributions to infinitesimal analysis, differential geometry, and in particular complex analysis and analytic number theory.)

6.B.14. Prove that
$$\frac12\sin^4 x = -\frac14\cos(2x) + \frac{1}{16}\cos(4x) + \frac{3}{16}.$$
Solution. Easier than comparing the given expressions directly is to show that the functions on the left-hand and the right-hand side have the same derivatives. We have
$$L' = 2\cos x\,\sin^3 x = \sin(2x)\sin^2 x,\qquad P' = \frac12\sin(2x) - \frac14\sin(4x) = \sin(2x)\Bigl(\frac12 - \frac12\cos(2x)\Bigr) = \sin(2x)\sin^2 x.$$
Hence the left-hand and the right-hand side differ by a constant. This constant can be determined by comparing the values at one point, for example $0$. Both functions are zero at zero, thus they are equal. □

Integration of rational functions. The key to the integration of rational functions lies in the decomposition of a rational function into a sum of simple rational functions which we know how to integrate. Let us decompose some rational functions:

6.B.15. Carry out the suggested division of polynomials $\frac{2x^5-x^4+3x^2-x+1}{x^2-2x+4}$ for $x\in\mathbb{R}$.

6.B.16. Express the function $y = \frac{3x^4+2x^3-x^2+1}{3x+2}$ as a sum of a polynomial and a rational function.

6.B.17. Decompose the rational expressions
(a) $\frac{4x^2+13x-2}{x^3+3x^2-4x-12}$;
(b) $\frac{2x^5+5x^3-x^2+2x-1}{x^6+2x^4+x^2}$
into partial fractions.

6.B.18. Express the function $y = \frac{2x^3+6x^2+3x-6}{x^4-2x^3}$ in the form of partial fractions.

6.B.19. Decompose the expression $\frac{7x^2-10x+37}{x^3-3x^2+9x+13}$ into partial fractions.

Proof. (1) First suppose that the integral over the whole interval exists. When computing it, we can restrict ourselves to the limits of Riemann sums whose partitions have the point $c$ among their partitioning points. Each such sum can be obtained as a sum of two partial Riemann sums. If these two partial sums depended, in the limit, on the chosen partitions and representatives, then the total sums could not be independent of the choices in the limit (it suffices to keep the sequence of partitions of one subinterval fixed and change the other so that the limit changes).

Conversely, if both Riemann integrals on the two subintervals exist, they can be approximated with arbitrary precision by Riemann sums, independently of their choice. If a partitioning point $c$ is added to any sequence of Riemann sums over the whole interval $[a,b]$, the value of the whole sum changes; but the values of the partial sums over the intervals belonging to $[a,c]$ and $[c,b]$ change at most by a multiple of the norm of the partition and the possible differences of the bounded function $f$ on all of $[a,b]$. This is a number arbitrarily close to zero for a decreasing norm of the partition. Necessarily, the partial Riemann sums of the function over the two parts of the interval also converge to limits whose sum is the Riemann integral over $[a,b]$.

(2) In every Riemann sum, the sum of the functions manifests itself as the sum of the values at the chosen representatives. Because multiplication of real numbers is distributive, each Riemann sum becomes the sum of two Riemann sums with the same representatives for the two functions. The statement follows from the elementary properties of limits.

(3) Each of the Riemann sums is multiplied by the constant $C$, so the claim again follows from the elementary properties of limits. □

6.2.9. The fundamental theorem. The following result is crucial for understanding the relation between the integral and the derivative. The complete proof of this theorem is somewhat longer, so it is broken into several subsections.

Fundamental theorem of integral calculus

6.B.20. Express the rational function $y = \frac{-5x+2}{x^4-x^3+2x^2}$ in the form of a sum of partial fractions.

6.B.21.
Decompose the function $y = \frac{\dots}{x^3(x+1)}$ into partial fractions.

6.B.22. Determine the form of the decomposition of the rational function $y = \frac{2x^2-114}{(x-2)\,x^2\,(3x^2+x+4)^2}$ into partial fractions. Do not compute the undetermined coefficients!

6.B.23. Express the function $y = \frac{x^4+6x^2+x-2}{x^4-2x^3}$ as a sum of a polynomial and a proper rational function $Q$. Then express the obtained function $Q$ in the form of a sum of partial fractions.

6.B.24. Write down the primitive functions of the rational functions
(a) $y = \frac{6}{x-2}$, $x\ne 2$;
(b) $y = \frac{6}{(x+4)^3}$, $x\ne -4$.

Solution. Cases (a), (b). We have
$$\int\frac{6}{x-2}\,dx = \bigl[\,y=x-2,\ dy=dx\,\bigr] = \int\frac{6}{y}\,dy = 6\ln|y|+C = 6\ln|x-2|+C,$$
and similarly
$$\int\frac{6}{(x+4)^3}\,dx = \bigl[\,y=x+4,\ dy=dx\,\bigr] = \int\frac{6}{y^3}\,dy = \frac{6}{-2y^2}+C = -\frac{3}{(x+4)^2}+C.$$
We can see that integrating partial fractions which correspond to real roots of the denominator of a rational function is very easy. In general we obtain
$$\int\frac{A}{x-x_0}\,dx = \bigl[\,y=x-x_0\,\bigr] = A\ln|x-x_0|+C \qquad\text{and}\qquad \int\frac{A}{(x-x_0)^n}\,dx = \frac{A}{(1-n)\,(x-x_0)^{n-1}}+C$$
for all $A,x_0\in\mathbb{R}$ and $n\ge 2$, $n\in\mathbb{N}$.

Case (c). Now we are to integrate a partial fraction corresponding to a pair of complex conjugate roots. In the denominator there is then a polynomial of degree 2, and in the numerator a polynomial of degree at most 1. If the numerator is of degree 1, we split the partial fraction so that its first part contains a multiple of the derivative of the denominator in the numerator, and add to it a fraction whose numerator is a constant. This way we obtain
$$\int\frac{3x+7}{x^2-4x+15}\,dx = \frac32\int\frac{2x-4}{x^2-4x+15}\,dx + 13\int\frac{dx}{x^2-4x+15} = \frac32\ln\bigl(x^2-4x+15\bigr) + 13\int\frac{dx}{(x-2)^2+11}$$
$$= \Bigl[\,y=\frac{x-2}{\sqrt{11}},\ dy=\frac{dx}{\sqrt{11}}\,\Bigr] = \frac32\ln\bigl(x^2-4x+15\bigr) + \frac{13}{\sqrt{11}}\int\frac{dy}{y^2+1} = \frac32\ln\bigl(x^2-4x+15\bigr) + \frac{13}{\sqrt{11}}\operatorname{arctg}\frac{x-2}{\sqrt{11}}+C.$$
Again, in general we can express
$$\int\frac{Ax+B}{(x-x_0)^2+a^2}\,dx = \frac{A}{2}\int\frac{2(x-x_0)}{(x-x_0)^2+a^2}\,dx + (B+Ax_0)\int\frac{dx}{(x-x_0)^2+a^2}$$
and compute
$$\int\frac{2(x-x_0)}{(x-x_0)^2+a^2}\,dx = \bigl[\,y=(x-x_0)^2+a^2,\ dy=2(x-x_0)\,dx\,\bigr] = \ln|y|+C = \ln\bigl((x-x_0)^2+a^2\bigr)+C,$$
$$\int\frac{dx}{(x-x_0)^2+a^2} = \Bigl[\,z=\frac{x-x_0}{a},\ dz=\frac{dx}{a}\,\Bigr] = \frac1a\int\frac{dz}{z^2+1} = \frac1a\operatorname{arctg}z+C = \frac1a\operatorname{arctg}\frac{x-x_0}{a}+C,$$
i.e.
$$\int\frac{Ax+B}{(x-x_0)^2+a^2}\,dx = \frac{A}{2}\ln\bigl((x-x_0)^2+a^2\bigr) + \frac{B+Ax_0}{a}\operatorname{arctg}\frac{x-x_0}{a}+C,$$
where the values $A,B,x_0\in\mathbb{R}$, $a>0$ are arbitrary.

Case (d). All that is left are the partial fractions for multiple complex roots, of the form
$$\frac{Ax+B}{\bigl[(x-x_0)^2+a^2\bigr]^n},\qquad A,B,x_0\in\mathbb{R},\ a>0,\ n\in\mathbb{N}\setminus\{1\},$$
which can analogously be split into
$$\frac{A}{2}\,\frac{2(x-x_0)}{\bigl[(x-x_0)^2+a^2\bigr]^n} + (B+Ax_0)\,\frac{1}{\bigl[(x-x_0)^2+a^2\bigr]^n}.$$
Then we determine
$$\int\frac{2(x-x_0)}{\bigl[(x-x_0)^2+a^2\bigr]^n}\,dx = \bigl[\,y=(x-x_0)^2+a^2\,\bigr] = \int\frac{dy}{y^n} = \frac{1}{(1-n)\,y^{n-1}}+C = \frac{1}{(1-n)\bigl[(x-x_0)^2+a^2\bigr]^{n-1}}+C,$$
and, writing $K_n(x_0,a) := \int\frac{dx}{[(x-x_0)^2+a^2]^n}$, integration by parts with $F(x)=\frac{1}{[(x-x_0)^2+a^2]^n}$, $F'(x)=\frac{-2n(x-x_0)}{[(x-x_0)^2+a^2]^{n+1}}$, $G'(x)=1$, $G(x)=x-x_0$, gives
$$K_n(x_0,a) = \frac{x-x_0}{\bigl[(x-x_0)^2+a^2\bigr]^{n}} + 2n\int\frac{(x-x_0)^2}{\bigl[(x-x_0)^2+a^2\bigr]^{n+1}}\,dx = \frac{x-x_0}{\bigl[(x-x_0)^2+a^2\bigr]^{n}} + 2n\bigl(K_n(x_0,a) - a^2\,K_{n+1}(x_0,a)\bigr),$$

Theorem. For every continuous function $f$ on a finite interval $[a,b]$ there exists its Riemann integral $\int_a^b f(x)\,dx$. Moreover, the function $F(x)$ given on the interval $[a,b]$ by the Riemann integral
$$F(x) = \int_a^x f(t)\,dt$$
is a primitive function to $f$ on this interval.

6.2.10. Upper and lower Riemann integral. In the first step of the proof of the existence of the integral, we use an alternative definition, in which the choice of representatives and the corresponding values $f(\xi_i)$ is replaced by the suprema $M_i$ of the values $f(x)$ in the corresponding subintervals $[x_{i-1},x_i]$, or by the infima $m_i$ of the function $f(x)$ in the same subintervals, respectively. We speak of upper and lower Riemann sums, respectively (in the literature, this approach is also called the Darboux integral). Because the function is continuous, it is bounded on the closed interval, hence all the suprema and infima considered above exist and are finite. The upper Riemann sum corresponding to the partition $\Xi=(x_0,\dots,x_n)$ is given by the expression
$$S_{\Xi,\sup} = \sum_{i=1}^{n}\Bigl(\sup_{x_{i-1}\le\xi\le x_i} f(\xi)\Bigr)(x_i-x_{i-1}),$$
while the lower Riemann sum is
$$S_{\Xi,\inf} = \sum_{i=1}^{n}\Bigl(\inf_{x_{i-1}\le\xi\le x_i} f(\xi)\Bigr)(x_i-x_{i-1}).$$
For each partition with representatives $\Xi=(x_0,\dots,x_n;\xi_1,\dots,\xi_n)$, there are the inequalities
(1) $$S_{\Xi,\inf} \le S_{\Xi,\xi} \le S_{\Xi,\sup}.$$
Moreover, the infima and suprema can be approximated with arbitrary precision by actual values of the function. Thus we might suspect that the Riemann integral exists if and only if, for all sequences of partitions with norms approaching zero, the limits of both the upper and lower sums exist and are equal. This is indeed true for all bounded functions:

Theorem. Let the function $f$ be bounded on a closed interval $[a,b]$. Then
$$S_{\sup} = \inf_{\Xi} S_{\Xi,\sup},\qquad S_{\inf} = \sup_{\Xi} S_{\Xi,\inf}$$
are the limits of all sequences of upper and lower sums with norms approaching zero, respectively. The Riemann integral of the function $f$ exists if and only if $S_{\sup} = S_{\inf}$.

Proof. First notice that $S_{\sup}$ is well defined, since it is the infimum of a set of real values bounded from below by any of the $S_{\Xi,\inf}$; similarly for the value $S_{\inf}$, which is bounded from above by any of the $S_{\Xi,\sup}$. If a partition $\Xi_2$ refines a partition $\Xi_1$ by adding new points, then
$$S_{\Xi_2,\sup} \le S_{\Xi_1,\sup},\qquad S_{\Xi_1,\inf} \le S_{\Xi_2,\inf}.$$
By the definition of the infimum, there are sequences of partitions $\Xi_k$ for which $S_{\sup}$ is the limit of the sums $S_{\Xi_k,\sup}$. Moreover, every two partitions have a common refinement, so it may be assumed that each $\Xi_k$ in the sequence is obtained by refining the previous one. Hence the sums $S_{\Xi_k,\sup}$ form a non-increasing sequence of real numbers converging to $S_{\sup}$. A similar argument applies to $S_{\inf}$. Hence the values
$$S_{\inf} = \sup_{\Xi} S_{\Xi,\inf},\qquad S_{\sup} = \inf_{\Xi} S_{\Xi,\sup}$$
are good candidates for the limits of upper and lower sums. Next, consider a fixed partition $\Xi$ with $n$ inner partitioning points of the interval $[a,b]$, and another partition $\Xi_1$ whose norm is a small number $\delta$.
In the common refinement $\Xi_2$, there will be only $n$ intervals contributing to the sum $S_{\Xi_2,\sup}$ by possibly smaller contributions than in the case of $\Xi_1$. Now $f$ is a bounded function on $[a,b]$, and thus each of these contributions is bounded by a universal constant multiplied by the norm $\delta$ of the partition. Hence, choosing $\delta$ sufficiently small, the distance of $S_{\Xi_1,\sup}$ from $S_{\sup}$ will not be larger than twice the distance of $S_{\Xi,\sup}$ from $S_{\sup}$.

Finally, return to the sequence of partitions $\Xi_k$ chosen above, and choose an $\varepsilon>0$. Then there is some $m$ such that the distance of $S_{\Xi_k,\sup}$ from $S_{\sup}$ is less than $\varepsilon$ for all $k\ge m$. Hence for an arbitrary partition $\Xi$ with appropriately small norm $\delta>0$, the distance of $S_{\Xi,\sup}$ from $S_{\sup}$ does not exceed $2\varepsilon$. In summary: for an arbitrary $2\varepsilon>0$ there is $\delta>0$ such that for all partitions with norm at most $\delta$ the inequality $|S_{\Xi,\sup} - S_{\sup}| < 2\varepsilon$ holds. This is exactly the statement that the number $S_{\sup}$ is the limit of all sequences of upper sums with norms of the partitions approaching zero. The statement for lower sums is proved in exactly the same way.

It remains to deal with the existence of the Riemann integral $\int_a^b f(x)\,dx$. If $S_{\sup} = S_{\inf}$, then all Riemann sums of sequences of partitions have the same limit because of the inequalities (1). If the Riemann integral does not exist, then there exist two sequences of partitions and representatives with different limits of Riemann sums. Suppose the first limit is larger than the other one. Then the upper Riemann sums can be selected for the first sequence and the lower Riemann sums for the second sequence; their difference is then at least as large. In view of the previous part of the proof, this implies $S_{\sup} > S_{\inf}$. □

6.2.11. Uniform continuity. Until now, we have only used the continuity of the function $f$ to conclude that all such functions are bounded on a closed finite interval. It remains to show that for continuous functions $S_{\sup} = S_{\inf}$. From the definition of continuity, for every fixed point $x\in[a,b]$ and every neighbourhood $O_\varepsilon(f(x))$ there exists a neighbourhood $O_\delta(x)$ such that $f(O_\delta(x)) \subset O_\varepsilon(f(x))$. This statement can be rewritten in this way: for $y,z\in O_\delta(x)$, i.e. $|y-z|<2\delta$, it is true that $f(y),f(z)\in O_\varepsilon(f(x))$, i.e. $|f(y)-f(z)|<2\varepsilon$. A global variant of this property is needed; it is called the uniform continuity of the function $f$:

which implies
$$K_{n+1}(x_0,a) = \frac{1}{2n\,a^2}\Bigl(\frac{x-x_0}{\bigl[(x-x_0)^2+a^2\bigr]^{n}} + (2n-1)\,K_n(x_0,a)\Bigr),$$
and this clearly also holds for $n=1$. The last recurrence can be seeded with the integral (derived in case (c))
$$K_1(x_0,a) = \frac1a\operatorname{arctg}\frac{x-x_0}{a} + C.$$
In the given problem, we have
$$\int\frac{30x-77}{(x^2-6x+13)^2}\,dx = 15\int\frac{2x-6}{(x^2-6x+13)^2}\,dx + 13\int\frac{dx}{(x^2-6x+13)^2},$$
and furthermore
$$\int\frac{2x-6}{(x^2-6x+13)^2}\,dx = \bigl[\,y=x^2-6x+13,\ dy=(2x-6)\,dx\,\bigr] = \int\frac{dy}{y^2} = -\frac1y+C = -\frac{1}{x^2-6x+13}+C,$$
$$\int\frac{dx}{(x^2-6x+13)^2} = \int\frac{dx}{\bigl[(x-3)^2+2^2\bigr]^2} = K_2(3,2) = \frac{1}{2\cdot 2^2}\Bigl(\frac{x-3}{x^2-6x+13} + K_1(3,2)\Bigr) = \frac18\,\frac{x-3}{x^2-6x+13} + \frac{1}{16}\operatorname{arctg}\frac{x-3}{2}+C.$$
In total, we have
$$\int\frac{30x-77}{(x^2-6x+13)^2}\,dx = -\frac{15}{x^2-6x+13} + \frac{13}{8}\,\frac{x-3}{x^2-6x+13} + \frac{13}{16}\operatorname{arctg}\frac{x-3}{2}+C = \frac{13x-159}{8\,(x^2-6x+13)} + \frac{13}{16}\operatorname{arctg}\frac{x-3}{2}+C.$$ □

6.B.26. Integrate the rational functions
(c) $\int\frac{dx}{(x-4)(x-2)(x^2+2x+2)}$, $x\ne 2$, $x\ne 4$;
(d) $\int\frac{x}{x^4-x^3-x+1}\,dx$, $x\ne 1$;
(e) $\int\frac{2x+1}{(x^2+4x+13)^2}\,dx$, $x\in\mathbb{R}$;
(f) $\int\frac{5x^2-12}{x^4-12x^3+62x^2-156x+169}\,dx$, $x\in\mathbb{R}$.

Solution. We compute all the given integrals in the way that can always be used when integrating rational functions, without any specific simplifications or substitutions; even the recurrence for $K_{n+1}(x_0,a)$, which we derived in a general form, will mostly appear only in simple instances. Using the aforementioned procedures, we obtain

(a)
$$\int\frac{x^2+x-1}{x\,(x-1)^2}\,dx = \int\frac{2}{x-1}\,dx + \int\frac{dx}{(x-1)^2} - \int\frac{dx}{x} = 2\ln|x-1| - \frac{1}{x-1} - \ln|x| + C;$$

Uniform continuity

Definition. Let $f$ be a function defined on a closed finite interval $[a,b]$.
The function $f$ is uniformly continuous on $[a,b]$ if for every $\varepsilon>0$ there exists $\delta>0$ such that for all $y,z\in[a,b]$ satisfying $|y-z|<\delta$, the inequality $|f(y)-f(z)|<\varepsilon$ holds.

Theorem. Each continuous function on a finite closed interval $[a,b]$ is uniformly continuous.

Proof. Fix some $\varepsilon>0$. The definition of continuity of $f$ provides, for each $x\in[a,b]$, a value $\delta(x)>0$ such that $f(y)\in O_\varepsilon(f(x))$ for all $y\in O_{2\delta(x)}(x)$. Since every finite closed interval is compact, it is covered by finitely many such neighbourhoods $O_{\delta(x_i)}(x_i)$, determined by points $x_1,\dots,x_k$. Choose $\delta$ as the minimum of all the (finitely many) $\delta(x_i)$. Any two points $y,z\in[a,b]$ with $|y-z|<\delta$ both belong to one of the neighbourhoods $O_{2\delta(x_i)}(x_i)$. Thus
$$|f(y)-f(z)| \le |f(y)-f(x_i)| + |f(x_i)-f(z)| < 2\varepsilon,$$
and $f$ has the desired property. □

6.2.12. Finishing the proof of Theorem 6.2.9. Now we complete the proof of the existence of the Riemann integral. Choose $\varepsilon$ and $\delta$ as in the definition of the uniform continuity of $f$, and consider any partition $\Xi$ with $n$ intervals and norm at most $\delta$. Then, writing $J_i=[x_{i-1},x_i]$,
$$S_{\Xi,\sup} - S_{\Xi,\inf} = \sum_{i=1}^n\Bigl(\sup_{\xi\in J_i}f(\xi) - \inf_{\xi\in J_i}f(\xi)\Bigr)(x_i-x_{i-1}) \le \varepsilon\,(b-a).$$
For a decreasing norm of the partition, the upper and lower sums are arbitrarily close to each other. In particular, the upper Riemann integral and the lower Riemann integral coincide.

To complete the proof of the fundamental theorem of integral calculus, it remains to verify the statement about the existence of a primitive function. For a continuous function $f$ on the interval $[a,b]$, the Riemann integral $F(x)=\int_a^x f(t)\,dt$ exists for every $x\in[a,b]$. As in the statement about uniform continuity, there is $\delta>0$, dependent on a fixed small $\varepsilon>0$, such that $|f(x+\Delta x)-f(x)|<\varepsilon$ for all $|\Delta x|<\delta$. Then
$$\Bigl|\frac{1}{\Delta x}\int_x^{x+\Delta x} f(t)\,dt - f(x)\Bigr| = \Bigl|\frac{1}{\Delta x}\int_x^{x+\Delta x}\bigl(f(t)-f(x)\bigr)\,dt\Bigr| \le \varepsilon$$
for all $0<|\Delta x|<\delta$, which verifies that $F'(x)=f(x)$.

(b)
$$\int\frac{x-4}{5x^2+6x+3}\,dx = \frac{1}{10}\int\frac{10x+6}{5x^2+6x+3}\,dx - \frac{23}{5}\int\frac{dx}{5x^2+6x+3} = \frac{1}{10}\ln\bigl(5x^2+6x+3\bigr) - \frac{23}{5}\int\frac{5\,dx}{(5x+3)^2+6}$$
$$= \Bigl[\,t=\frac{5x+3}{\sqrt6},\ dt=\frac{5}{\sqrt6}\,dx\,\Bigr] = \frac{1}{10}\ln\bigl(5x^2+6x+3\bigr) - \frac{23\sqrt6}{30}\int\frac{dt}{t^2+1} = \frac{1}{10}\ln\bigl(5x^2+6x+3\bigr) - \frac{23\sqrt6}{30}\operatorname{arctg}\frac{5x+3}{\sqrt6}+C;$$

(c)
$$\int\frac{dx}{(x-4)(x-2)(x^2+2x+2)} = \frac{1}{52}\int\frac{dx}{x-4} - \frac{1}{20}\int\frac{dx}{x-2} + \frac{1}{130}\int\frac{4x+11}{x^2+2x+2}\,dx$$
$$= \frac{1}{52}\ln|x-4| - \frac{1}{20}\ln|x-2| + \frac{1}{130}\Bigl(2\int\frac{2x+2}{x^2+2x+2}\,dx + 7\int\frac{dx}{(x+1)^2+1}\Bigr) = \frac{1}{260}\ln\Bigl|\frac{(x-4)^5\,(x^2+2x+2)^4}{(x-2)^{13}}\Bigr| + \frac{7}{130}\operatorname{arctg}(x+1)+C;$$

(d) here $x^4-x^3-x+1 = (x-1)^2(x^2+x+1)$, and the decomposition happens to have no term $\frac{A}{x-1}$:
$$\int\frac{x}{x^4-x^3-x+1}\,dx = \frac13\int\frac{dx}{(x-1)^2} - \frac13\int\frac{dx}{x^2+x+1} = -\frac{1}{3(x-1)} - \frac13\int\frac{dx}{\bigl(x+\frac12\bigr)^2+\frac34}$$
$$= \Bigl[\,t=\frac{2x+1}{\sqrt3},\ dt=\frac{2}{\sqrt3}\,dx\,\Bigr] = -\frac{1}{3(x-1)} - \frac{2\sqrt3}{9}\operatorname{arctg}\frac{2x+1}{\sqrt3}+C;$$

(e)
$$\int\frac{2x+1}{(x^2+4x+13)^2}\,dx = \int\frac{2x+4}{(x^2+4x+13)^2}\,dx - 3\int\frac{dx}{\bigl[(x+2)^2+9\bigr]^2} = \bigl[\,t=x^2+4x+13\,\bigr] = -\frac{1}{x^2+4x+13} - 3\,K_2(-2,3),$$
and by the recurrence $K_2(-2,3) = \frac{1}{18}\bigl(\frac{x+2}{x^2+4x+13} + \frac13\operatorname{arctg}\frac{x+2}{3}\bigr)$, so
$$\int\frac{2x+1}{(x^2+4x+13)^2}\,dx = -\frac{1}{x^2+4x+13} - \frac{x+2}{6\,(x^2+4x+13)} - \frac{1}{18}\operatorname{arctg}\frac{x+2}{3}+C = -\frac{x+8}{6\,(x^2+4x+13)} - \frac{1}{18}\operatorname{arctg}\frac{x+2}{3}+C;$$

(f) since $x^4-12x^3+62x^2-156x+169 = (x^2-6x+13)^2$ and $5x^2-12 = 5(x^2-6x+13)+30x-77$, we have
$$\int\frac{5x^2-12}{x^4-12x^3+62x^2-156x+169}\,dx = 5\int\frac{dx}{(x-3)^2+4} + \int\frac{30x-77}{(x^2-6x+13)^2}\,dx = \frac{53}{16}\operatorname{arctg}\frac{x-3}{2} + \frac{13x-159}{8\,(x^2-6x+13)}+C,$$
where the last integral was computed above.

In the same way,
$$\int\dots\,dx = \frac{1}{25}\int\frac{dx}{x-1} + \frac15\int\frac{dx}{(x-1)^2} - \frac{1}{25}\int\frac{x+8}{x^2+2x+2}\,dx = \frac{1}{25}\ln|x-1| - \frac{1}{5(x-1)} - \frac{1}{50}\ln\bigl(x^2+2x+2\bigr) - \frac{7}{25}\operatorname{arctg}(x+1)+C,$$
where we used
$$\int\frac{x+8}{x^2+2x+2}\,dx = \frac12\int\frac{2x+2}{x^2+2x+2}\,dx + 7\int\frac{dx}{(x+1)^2+1} = \frac12\ln\bigl(x^2+2x+2\bigr) + 7\operatorname{arctg}(x+1)+C.$$
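The undetermined coefficients in such decompositions are quick to obtain by the evaluation-at-roots shortcut. A small Python check on the model example $\frac{4x+2}{(x+1)(x+2)} = \frac{A}{x+1}+\frac{B}{x+2}$ from 6.2.7 (the test point $x=3$ is arbitrary):

```python
# clearing denominators gives 4x + 2 = A(x+2) + B(x+1); evaluate at the roots
A = (4 * (-1) + 2) / (-1 + 2)    # x = -1 kills the B term, so A = -2
B = (4 * (-2) + 2) / (-2 + 1)    # x = -2 kills the A term, so B = 6

# sanity check of the decomposition at an unrelated point
x = 3.0
lhs = (4 * x + 2) / ((x + 1) * (x + 2))
rhs = A / (x + 1) + B / (x + 2)
```

The same evaluation trick gives the coefficients $\frac{1}{52}$ and $-\frac{1}{20}$ of the simple real-root fractions in (c) above.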
□ 6.B.28. Determine (a) $\int\frac{x^3+2x^2+x-1}{x^2-x+1}\,dx$, $x \in \mathbb{R}$; (b) $\int\frac{dx}{x^8-1}$, $x \ne \pm1$.

Solution. Case (a). First we must do the division of polynomials $(x^3+2x^2+x-1) : (x^2-x+1) = x+3+\frac{3x-4}{x^2-x+1}$, to consider a proper rational function (with the degree of the numerator lower than the degree of the denominator). Now we compute

$\int\frac{x^3+2x^2+x-1}{x^2-x+1}\,dx = \int (x+3)\,dx + \int\frac{3x-4}{x^2-x+1}\,dx = \frac{x^2}{2} + 3x + \frac32\int\frac{2x-1}{x^2-x+1}\,dx - \frac52\int\frac{dx}{(x-\frac12)^2+\frac34} = \frac{x^2}{2} + 3x + \frac32\ln(x^2-x+1) - \frac{5\sqrt3}{3}\operatorname{arctg}\frac{2x-1}{\sqrt3} + C.$

Case (b). Decomposing over the factors $(x-1)(x+1)(x^2+1)(x^2-\sqrt2 x+1)(x^2+\sqrt2 x+1)$ of $x^8-1$, we have

$\int\frac{dx}{x^8-1} = \frac18\ln|x-1| - \frac18\ln|x+1| - \frac14\operatorname{arctg} x + \frac{\sqrt2}{16}\ln(x^2-\sqrt2 x+1) - \frac{\sqrt2}{16}\ln(x^2+\sqrt2 x+1) - \frac{\sqrt2}{8}\operatorname{arctg}(\sqrt2 x-1) - \frac{\sqrt2}{8}\operatorname{arctg}(\sqrt2 x+1) + C.$

and add a few observations concerning analytic solutions. Rewrite the equation in terms of the differentials, cf. 6.1.11,

$\frac{1}{g(y)}\,dy = f(x)\,dx.$

Find the primitive functions on both sides to determine the unknown function $y = y(x)$ implicitly. Indeed, if $G(y)$ and $F(x)$ are the primitive functions with $G'(y) = \frac{1}{g(y)}$ and $F'(x) = f(x)$, and $y(x)$ satisfies $G(y(x)) = F(x)$, then differentiating both sides with respect to $x$ yields

$y'(x)\,\frac{1}{g(y(x))} = G'(y(x))\,y'(x) = F'(x) = f(x),$

as expected. Of course, it is necessary to be careful with the values $y$ for which $g(y) = 0$, which need to be discussed separately. For example, the equation $y' = y$ leads to the implicit definition $\ln|y| = x + C$, which for positive $y$ provides $y(x) = D\,\mathrm{e}^x$ with positive constant $D$; the constant solution $y = 0$ corresponds to $D = 0$. Negative values of $y$ correspond to negative constants $D$ in the same expression. If $y(0) = 1$, we recover the exponential $y(x) = \mathrm{e}^x$.

6.2.15. Analytic solutions. In the next part of this chapter, we shall prove that power series are differentiated and integrated term by term. Thus the solution $y(x)$ to the equation $y' = f(x)$ with a known analytic function $f(x) = \sum_{n=0}^\infty a_n x^n$ is

$y(x) = \sum_{n=0}^\infty \frac{a_n}{n+1}\,x^{n+1} + y(0),$

where $y(0)$ is the free integration constant.
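The term-by-term integration in 6.2.15 is easy to try out. A minimal sketch (our own helper names, not the book's): given the coefficients of $f$, produce the coefficients of $y$ with $y' = f$, and check against the exponential, whose antiderivative with $y(0)=1$ is itself:

```python
# Term-by-term integration of a power series: y(x) = sum a_n/(n+1) x^(n+1) + y(0).
import math

def integrate_series(coeffs, y0=0.0):
    """Coefficients of y with y' = sum coeffs[n] x^n and y(0) = y0."""
    return [y0] + [a / (n + 1) for n, a in enumerate(coeffs)]

def eval_series(coeffs, x):
    return sum(a * x**n for n, a in enumerate(coeffs))

# f(x) = e^x has a_n = 1/n!; its antiderivative with y(0) = 1 is again e^x.
f_coeffs = [1 / math.factorial(n) for n in range(20)]
y_coeffs = integrate_series(f_coeffs, y0=1.0)
print(eval_series(y_coeffs, 1.0))   # close to e
```

The truncation error at $x=1$ is of the order of the first omitted term, here about $1/21!$.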
The solution is defined on the convergence domain of the power series. Of course we might use series centered at other points $x_0$ if prescribing the initial value $y(x_0)$. (We shall prove much later, in Chapter 8, that actually there is always the unique solution with the given initial prescribed value.) The latter equation $y' = y$ had the analytic solution $\mathrm{e}^x$, too. Let us consider the general case of this type, i.e. equations of the form

(1) $y' = f(y)$

with an analytic right-hand side $f(y)$. Given the initial condition $y(x_0) = y_0$, straightforward differentiation with the help of the chain rule and the equation (1) shows

$y'(x_0) = f(y_0)$
$y''(x_0) = f'(y)\,y'\big|_{x=x_0} = f'(y_0)f(y_0)$
$y'''(x_0) = \bigl(f''(y)\,y'\,f(y) + f'(y)f'(y)\,y'\bigr)\big|_{x=x_0} = f''(y_0)(f(y_0))^2 + (f'(y_0))^2 f(y_0)$

□ 6.B.29. Compute $\int\frac{2x^4+2x^2-5x+1}{x(x^2-x+1)^2}\,dx$, $x \ne 0$.

Solution. We have

$\int\frac{2x^4+2x^2-5x+1}{x(x^2-x+1)^2}\,dx = \int\frac{dx}{x} + \int\frac{x+3}{x^2-x+1}\,dx + \int\frac{x-6}{(x^2-x+1)^2}\,dx = \ln|x| + \frac12\int\frac{2x-1}{x^2-x+1}\,dx + \frac{14}{3}\int\frac{dx}{\bigl(\frac{2x-1}{\sqrt3}\bigr)^2+1} + \frac12\int\frac{2x-1}{(x^2-x+1)^2}\,dx - \frac{88}{9}\int\frac{dx}{\bigl[\bigl(\frac{2x-1}{\sqrt3}\bigr)^2+1\bigr]^2}.$

With $t = x^2-x+1$, $dt = (2x-1)\,dx$ and $u = \frac{2x-1}{\sqrt3}$, $du = \frac{2}{\sqrt3}\,dx$, this evaluates to

$\ln|x\sqrt{x^2-x+1}| + \frac{7\sqrt3}{3}\operatorname{arctg}\frac{2x-1}{\sqrt3} - \frac{1}{2(x^2-x+1)} - \frac{22\sqrt3}{9}\operatorname{arctg}\frac{2x-1}{\sqrt3} - \frac{22\sqrt3}{9}\cdot\frac{\frac{2x-1}{\sqrt3}}{\bigl(\frac{2x-1}{\sqrt3}\bigr)^2+1} + C = \ln|x\sqrt{x^2-x+1}| - \frac{\sqrt3}{9}\operatorname{arctg}\frac{2x-1}{\sqrt3} - \frac{11x-4}{3(x^2-x+1)} + C.$ □

6.B.30. Integrate (a) $\int\frac{x}{1+x^4}\,dx$; (b) $\int\frac{5\ln x}{x\ln^3 x + x\ln^2 x - 2x}\,dx$, $x > 0$, $x \ne \mathrm{e}$.

Solution. Case (a). The advantage of the method of integrating rational functions described above is its universality (using it, we can find primitive functions of every rational function). Sometimes though, using the substitution method or integrating by parts is more convenient. For example,

$\int\frac{x}{1+x^4}\,dx = \Bigl[\,y = x^2,\ dy = 2x\,dx\,\Bigr] = \frac12\int\frac{dy}{1+y^2} = \frac12\operatorname{arctg} y + C = \frac12\operatorname{arctg} x^2 + C.$

Case (b).
Using substitution, we obtain an integral of a rational function

$\int\frac{5\ln x}{x\ln^3 x + x\ln^2 x - 2x}\,dx = \Bigl[\,y = \ln x,\ dy = \frac{dx}{x}\,\Bigr] = \int\frac{5y}{y^3+y^2-2}\,dy = \int\frac{dy}{y-1} + \int\frac{-y+2}{y^2+2y+2}\,dy = \ln|y-1| - \frac12\int\frac{2y+2}{y^2+2y+2}\,dy + 3\int\frac{dy}{(y+1)^2+1} = \ln|y-1|$

Two crucial observations are due here. First, given the initial condition $y(x_0) = y_0$, all derivatives $y^{(k)}(x_0)$ are given at this point by the equation. Thus, if an analytic solution exists, we know it explicitly. So we have to focus on the convergence of the known formal expression of the series $y(x) = \sum_{n=0}^\infty \frac{1}{n!}\,y^{(n)}(x_0)(x-x_0)^n$ and we arrive at the theorem below. In its proof, the second observation will be most helpful: the expressions for the derivatives $y^{(n)}$ are universal polynomials $P_n$,

$y^{(n)}(x) = P_n\bigl(f(y(x)), f'(y(x)), \dots, f^{(n-1)}(y(x))\bigr),$

in the derivatives of the function $f$, all with non-negative coefficients and independent of the particular equation.⁵

Cauchy-Kovalevskaya Theorem in dimension one

Theorem. Assume $f(y)$ is a real analytic function convergent on the interval $(y_0-a, y_0+a) \subset \mathbb{R}$ and consider the differential equation (1) with the condition $y(x_0) = y_0$. Then the formal power series $y(x) = \sum_{n=0}^\infty \frac{1}{n!}\,y^{(n)}(x_0)(x-x_0)^n$ converges on a neighborhood of $x_0$ and provides the solution to (1) satisfying the initial condition.

Proof. The second observation above suggests how to prove the convergence of the "candidate series" similarly as we proved the convergence of power series in general, i.e. by finding another converging series whose partial sums bound ours from above. This was the original Cauchy's approach to this theorem and we talk about the method of majorants. Without loss of generality we shall fix $x_0 = 0$ and $y(0) = 0$ (we may always use the shifted quantities $z = y - y_0$ and $t = x - x_0$ to transform the general case). Assume we can find another analytic function $g(x) = \sum_{n=0}^\infty \frac{1}{n!}\,b_n x^n$ with all $b_n = g^{(n)}(0) \ge 0$, i.e. $g$ has got all derivatives non-negative at the origin, such that $g^{(n)}(0) \ge |f^{(n)}(0)|$ for all $n$.
Now, replace $f$ in the equation (1) by $g$ and write the formal power series $z(x) = \sum_{n=0}^\infty \frac{1}{n!}\,z^{(n)}(0)\,x^n$ for the potential solution of this equation as above. In particular, we deduce (recall the universal polynomials $P_n$ have got non-negative coefficients)

$z^{(n)}(0) = P_n\bigl(g(z(0)), \dots, g^{(n-1)}(z(0))\bigr) \ge P_n\bigl(|f(y(0))|, \dots, |f^{(n-1)}(y(0))|\bigr) \ge |y^{(n)}(0)|$

and, consequently, convergence of $z(x)$ will imply absolute convergence of $y(x)$, i.e. the claim of the Theorem. We try to find a majorant in the form of a geometric series. Let us pick $r > 0$, smaller than the radius of convergence of $f$. Then obviously, there is a constant $C > 0$ such that the derivatives $a_n = f^{(n)}(0)$ satisfy $\bigl|\frac{1}{n!}a_n r^n\bigr| \le C$ for all $n$, i.e. $|a_n| \le C\,\frac{n!}{r^n}$ (the series would certainly not converge

⁵ Although we shall not need the explicit formulae for these polynomials, they are well known under the name of Faà di Bruno's formula. In principle, they are a direct generalization of the Leibniz rule to higher order derivatives.

$- \frac12\ln(y^2+2y+2) + 3\operatorname{arctg}(y+1) + C = \ln|\ln x - 1| - \frac12\ln(\ln^2 x + 2\ln x + 2) + 3\operatorname{arctg}(\ln x + 1) + C.$ □

For an arbitrary function $f$ that is continuous and bounded on a bounded interval $(a,b)$, the so-called Newton-Leibniz formula

(1) $\int_a^b f(x)\,dx = [F(x)]_a^b := \lim_{x\to b-} F(x) - \lim_{x\to a+} F(x)$

holds, where $F'(x) = f(x)$, $x \in (a,b)$. Emphasise that under the given conditions, the primitive function $F$ always exists and so do both proper limits in (1). Hence to compute the definite integral, we only need to find the antiderivative and determine the respective one-sided limits (or only the values of the function, if the primitive function is continuous at the boundary points of the interval).

6.B.31. Determine (a) $\int\frac{dx}{x(\sqrt x + \sqrt[5]{x^2})}$, $x > 0$; (b) $\int\frac{x+1}{\sqrt[3]{3x+1}}\,dx$, $x \ne -\frac13$; (c) $\int\frac1x\sqrt{\frac{x+1}{x-1}}\,dx$, $x \in \mathbb{R}\smallsetminus[-1,1]$; (d) $\int\frac{dx}{(x+4)\sqrt{x^2+3x-4}}$, $x \in (-\infty,-4)\cup(1,+\infty)$; (e) $\int\frac{dx}{1+\sqrt{-x^2+x+2}}$, $x \in (-1,2)$; (f) $\int\frac{dx}{(x-1)\sqrt{x^2+x+1}}$, $x \ne 1$.

Solution.
In this problem, we'll illustrate the use of the substitution method while integrating expressions containing roots. Case (a). If the integral is of the form $\int f\bigl(\sqrt[p(1)]{x}, \sqrt[p(2)]{x}, \dots, \sqrt[p(j)]{x}\bigr)\,dx$ for certain numbers $p(1), p(2), \dots, p(j) \in \mathbb{N}$ and a rational function $f$ (of more variables), the substitution $t^n = x$ is suggested, where $n$ is the (least) common multiple of the numbers $p(1), \dots, p(j)$. Using this substitution, we can always reduce the integrand (the integrated function) to a rational function, which we can always integrate. We get

$\int\frac{dx}{x(\sqrt x + \sqrt[5]{x^2})} = \Bigl[\,t^{10} = x,\ dx = 10t^9\,dt\,\Bigr] = 10\int\frac{t^9\,dt}{t^{14}(t+1)} = 10\int\frac{dt}{t^6+t^5} = 10\int\Bigl(\frac1{t^5} - \frac1{t^4} + \frac1{t^3} - \frac1{t^2} + \frac1t - \frac1{1+t}\Bigr)dt = 10\Bigl[\ln t + \frac1t - \frac1{2t^2} + \frac1{3t^3} - \frac1{4t^4} - \ln(1+t)\Bigr] + C = \ln\frac{x}{(1+\sqrt[10]{x})^{10}} + \frac{10}{\sqrt[10]{x}} - \frac{5}{\sqrt[5]{x}} + \frac{10}{3\sqrt[10]{x^3}} - \frac{5}{2\sqrt[5]{x^2}} + C.$

Case (b). For integrals $\int f\bigl(x, \sqrt[p(1)]{ax+b}, \sqrt[p(2)]{ax+b}, \dots, \sqrt[p(j)]{ax+b}\bigr)\,dx$,

otherwise). We may recognize the derivatives of a geometric series and write

(2) $g(z) = C\sum_{n=0}^\infty \frac{z^n}{r^n} = C\,\frac{r}{r-z}$

with derivatives $g^{(n)}(0) = C\,\frac{n!}{r^n}$. Finally, we have to prove that the solution of the equation $z' = g(z)$ is analytic. We can easily integrate this equation with separated variables directly. Written with the help of differentials, $(r-z)\,dz = Cr\,dx$. Thus, the implicit equation reads $\frac12(r-z)^2 = -Crx + D$, where the constant $D$ is determined by $z(0) = 0$. Consequently $D = \frac12 r^2$ and a simple computation reveals the solution of the implicit equation

$z(x) = r\Bigl(1 \pm \sqrt{1 - \frac{2Cx}{r}}\Bigr).$

The option with the minus sign satisfies our initial condition. This clearly is an analytic function, $g$ provides the requested majorant, and the proof is finished. □

6.2.16. Improper integrals. When discussing the integration of rational functions $f$, there is a need to consider definite integrals over intervals where $f(x)$ has improper (one-sided) limits. Here $f$ is neither continuous nor bounded. Thus earlier definitions and results may not apply. We speak of "improper" integrals.
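The closed form of the majorant solution is easy to verify numerically. A small sketch of our own (the symbols $r$, $C$ are the ones from the proof; the values chosen here are arbitrary): check that $z(x) = r\bigl(1 - \sqrt{1-2Cx/r}\bigr)$ indeed satisfies $z' = Cr/(r-z)$ with $z(0)=0$, by comparing a central difference quotient with the right-hand side.

```python
# Verify the majorant solution z(x) = r(1 - sqrt(1 - 2Cx/r)) of z' = C r/(r - z).
import math

r, C = 2.0, 0.5   # arbitrary test values with 2Cx/r < 1 on the domain used

def z(x):
    return r * (1 - math.sqrt(1 - 2 * C * x / r))

def rhs(x):
    return C * r / (r - z(x))

x, h = 0.3, 1e-6
numeric_derivative = (z(x + h) - z(x - h)) / (2 * h)
print(numeric_derivative, rhs(x))   # the two values agree
```

This is exactly the separation-of-variables computation of the proof run backwards.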
A simple solution is to discuss the definite integral on a smaller sub-interval, and determine whether the limit value of such a definite integral exists when the boundary approaches the problematic point. If it does, the corresponding improper integral exists and equals this limit. We illustrate this procedure by an example:

$\int_0^2 \frac{dx}{(2-x)^{1/4}}.$

This is an improper integral, because the integrand $f(x) = (2-x)^{-1/4}$ has its left-sided limit $\infty$ at the point $b = 2$. The integrand is continuous at all other points. Thus, for $0 < \delta < 2$, consider the integrals (substituting $y = 2-x$)

$I_\delta = \int_0^{2-\delta} \frac{dx}{(2-x)^{1/4}} = \int_\delta^2 y^{-1/4}\,dy = \Bigl[\frac43\,y^{3/4}\Bigr]_\delta^2 = \frac43\bigl(2^{3/4} - \delta^{3/4}\bigr).$

Notice that $dy = -dx$; when $x = 2-\delta$, $y = \delta$, and when $x = 0$, $y = 2$. The limit as $\delta \to 0$ from the right clearly exists, so the improper integral is evaluated:

$\int_0^2 \frac{dx}{(2-x)^{1/4}} = \frac43\,\sqrt[4]{8}.$

We proceed in the same way to integrate over an unbounded interval. In this case, we speak of improper Riemann

where again $p(1), \dots, p(j) \in \mathbb{N}$, $f$ is a rational expression and $a, b \in \mathbb{R}$, we choose the substitution $t^n = ax+b$ while preserving the meaning of $n$. In this way, we get

$\int\frac{x+1}{\sqrt[3]{3x+1}}\,dx = \Bigl[\,t^3 = 3x+1,\ x = \tfrac{t^3-1}{3},\ dx = t^2\,dt\,\Bigr] = \int\frac{\frac{t^3-1}{3}+1}{t}\,t^2\,dt = \frac13\int (t^4 + 2t)\,dt = \frac13\Bigl(\frac{t^5}{5} + t^2\Bigr) + C = \frac{t^2}{15}\,(t^3+5) + C = \sqrt[3]{(3x+1)^2}\,\frac{x+2}{5} + C.$

Case (c). Another generalization are the integrals of the type $\int f\Bigl(x, \sqrt[p(1)]{\frac{ax+b}{cx+d}}, \sqrt[p(2)]{\frac{ax+b}{cx+d}}, \dots, \sqrt[p(j)]{\frac{ax+b}{cx+d}}\Bigr)dx$ with the only additional condition on the values $a, b, c, d \in \mathbb{R}$ being $ad - bc \ne 0$. Preserving the meaning of the aforementioned symbols, we now put $t^n = \frac{ax+b}{cx+d}$. Specifically,

$\int\frac1x\sqrt{\frac{x+1}{x-1}}\,dx = \Bigl[\,t^2 = \frac{x+1}{x-1},\ x = \frac{t^2+1}{t^2-1},\ dx = \frac{-4t}{(t^2-1)^2}\,dt\,\Bigr] = \int\frac{t^2-1}{t^2+1}\,t\,\frac{-4t}{(t^2-1)^2}\,dt = \int\frac{-4t^2}{(t^2+1)(t^2-1)}\,dt = \int\Bigl(\frac{-2}{t^2-1} - \frac{2}{t^2+1}\Bigr)dt = \ln\Bigl|\frac{t+1}{t-1}\Bigr| - 2\operatorname{arctg} t + C = \ln\left|\frac{\sqrt{\frac{x+1}{x-1}}+1}{\sqrt{\frac{x+1}{x-1}}-1}\right| - 2\operatorname{arctg}\sqrt{\frac{x+1}{x-1}} + C.$

The simplifications
$\ln\left|\frac{\sqrt{|x+1|}+\sqrt{|x-1|}}{\sqrt{|x+1|}-\sqrt{|x-1|}}\right| = \ln\frac{\bigl(\sqrt{|x+1|}+\sqrt{|x-1|}\bigr)^2}{\bigl||x+1|-|x-1|\bigr|} = 2\ln\bigl(\sqrt{|x+1|}+\sqrt{|x-1|}\bigr) - \ln 2$

for $x \in (-\infty,-1)\cup(1,\infty)$ then allow to write

$\int\frac1x\sqrt{\frac{x+1}{x-1}}\,dx = 2\ln\bigl(\sqrt{|x+1|}+\sqrt{|x-1|}\bigr) - 2\operatorname{arctg}\sqrt{\frac{x+1}{x-1}} + C.$

Cases (d), (e), (f). Now we'll focus on the integrals $\int f\bigl(x, \sqrt{ax^2+bx+c}\bigr)\,dx$, where we expect $a \ne 0$ and $b^2-4ac \ne 0$ for otherwise arbitrary numbers $a, b, c \in \mathbb{R}$. Recall that $f$ is a rational expression. We'll distinguish two cases: when the quadratic polynomial $ax^2+bx+c$ has real roots and when it doesn't. If $a > 0$ and the polynomial $ax^2+bx+c$ has real roots $x_1, x_2$, we'll use the representation

$\sqrt{ax^2+bx+c} = \sqrt a\,\sqrt{(x-x_1)^2\,\frac{x-x_2}{x-x_1}} = \sqrt a\,|x-x_1|\,\sqrt{\frac{x-x_2}{x-x_1}}$

integrals of the first kind. The integrals of unbounded functions on finite intervals are improper Riemann integrals of the second kind. More explicitly, for $a \in \mathbb{R}$

$I = \int_a^\infty f(x)\,dx = \lim_{b\to\infty}\int_a^b f(x)\,dx,$

if the integrals and the limit on the right-hand side exist. Similarly we can have a finite upper bound and an infinite lower bound. If both bounds are infinite, we can evaluate the integral as a sum of two integrals with a chosen fixed bound in the middle, as in

$\int_{-\infty}^\infty f(x)\,dx = \int_{-\infty}^a f(x)\,dx + \int_a^\infty f(x)\,dx.$

Its existence and its value do not depend on the choice of such bound, because by changing it, we only change both summands by the same finite value, but with opposite signs. At the same time, a limit for which the upper and lower bound would approach $\pm\infty$ at the same speed can lead to different results! For example

$\lim_{a\to\infty}\int_{-a}^a x\,dx = \lim_{a\to\infty}\Bigl[\frac{x^2}{2}\Bigr]_{-a}^a = 0,$

even though the values of the integrals $\int_a^b x\,dx$ with $a$ fixed and $b \to \infty$ diverge to infinity. The integrated functions may have more discontinuities with infinite one-sided limits. The interval of integration may be unbounded. Then the integration intervals must be split in such a way that the individual intervals of integration include only one of the above phenomena.
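An improper integral of the first kind is literally a limit of proper integrals, which a short numeric experiment makes visible. The following sketch (our own, with a plain composite Simpson rule) watches $\int_1^b dx/x^2$ approach its limit $1$ as $b \to \infty$:

```python
# ∫_1^∞ dx/x² = lim_{b→∞} ∫_1^b dx/x² = 1, approximated for growing b.
def simpson(f, a, b, n=1000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

f = lambda x: 1 / x**2
for b in (10, 100, 1000):
    print(b, simpson(f, 1, b, n=10 * b))   # approaches 1 as b grows
```

The exact proper integrals are $1 - 1/b$, so the printed values trace the limit process directly.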
Hence when evaluating the improper integral of a rational function, divide the given interval according to the discontinuities of the integrated function. Then compute all the improper integrals separately.

6.2.17. New acquisitions to the ZOO. It might seem that indefinite integrals can be described in terms of known elementary functions. But this is false. On the contrary, nearly all continuous functions lead to integrals which we cannot express in this way. Functions obtained by integration often appear in applications. Many of them have names and there are efficient methods how to approximate them numerically (we shall come to this point briefly in 6.3.11 below). In the methods of signal processing, the function

$\operatorname{sinc}(x) = \frac{\sin(x)}{x}$

is important (cf. the discussion of the Fourier transform in 7.2.6). Check yourself that it is a smooth function with the values $f(0) = 1$, $f'(0) = 0$, $f''(0) = -\frac13$. This even function has its absolute maximum at the point $x = 0$. It oscillates with a fast decreasing amplitude as $x$ approaches infinity.

and let $t^2 = \frac{x-x_2}{x-x_1}$. If $a < 0$ and the polynomial $ax^2+bx+c$ has real roots $x_1 < x_2$, we'll use the representation

$\sqrt{ax^2+bx+c} = \sqrt{-a}\,(x-x_1)\sqrt{\frac{x_2-x}{x-x_1}}$

The sine integral function is defined by

and let $t^2 = \frac{x_2-x}{x-x_1}$. If the polynomial $ax^2+bx+c$ doesn't have real roots (necessarily $a > 0$ then), we choose the substitution $\sqrt{ax^2+bx+c} = \pm\sqrt a\,x \pm t$ with any choice of the signs. Note that we of course choose the signs so that we get as easy an expression to integrate as possible. In all these cases, these substitutions again lead to rational functions. Hence (d)

$\operatorname{Si}(x) = \int_0^x \operatorname{sinc}(t)\,dt.$

Other important functions are Fresnel's sine and cosine integrals

$\operatorname{FresnelS}(x) = \int_0^x \sin\bigl(\tfrac12\pi t^2\bigr)\,dt, \qquad \operatorname{FresnelC}(x) = \int_0^x \cos\bigl(\tfrac12\pi t^2\bigr)\,dt.$

The function $\operatorname{Si}(x)$ is shown in the left figure. Both Fresnel's functions are shown on the right.
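Since $\operatorname{Si}$ has no elementary closed form, numerical quadrature is the natural way to tabulate it, as the text remarks. A sketch of our own (plain Simpson rule; the smooth continuation $\operatorname{sinc}(0)=1$ is essential):

```python
# Evaluate Si(x) = ∫_0^x sinc(t) dt with a composite Simpson rule.
import math

def sinc(t):
    return math.sin(t) / t if t != 0.0 else 1.0   # smooth value at 0

def Si(x, n=2000):
    h = x / n
    s = sinc(0.0) + sinc(x)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * sinc(i * h)
    return s * h / 3

print(Si(math.pi))   # ≈ 1.8519, the first (and absolute) maximum of Si
```

The value $\operatorname{Si}(\pi) \approx 1.8519$ reappears in Chapter 7 in the discussion of the Gibbs phenomenon for Fourier series.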
$\int\frac{dx}{(x+4)\sqrt{x^2+3x-4}} = \int\frac{dx}{(x+4)\sqrt{(x-1)(x+4)}} = \Bigl[\,t^2 = \frac{x-1}{x+4},\ x = \frac{4t^2+1}{1-t^2},\ x+4 = \frac{5}{1-t^2},\ dx = \frac{10t}{(1-t^2)^2}\,dt\,\Bigr] = \frac25\operatorname{sgn}(1-t^2)\int 1\,dt = \frac25\operatorname{sgn}(1-t^2)\,t + C = \frac25\operatorname{sgn}(x+4)\sqrt{\frac{x-1}{x+4}} + C;$

(e)

$\int\frac{dx}{1+\sqrt{-x^2+x+2}} = \int\frac{dx}{1+\sqrt{-(x-2)(x+1)}} = \Bigl[\,t^2 = \frac{2-x}{x+1},\ x = \frac{2-t^2}{1+t^2},\ x+1 = \frac{3}{1+t^2},\ dx = \frac{-6t}{(1+t^2)^2}\,dt\,\Bigr] = \int\frac{-6t\,dt}{(t^2+1)(t^2+3t+1)} = \int\Bigl(\frac{2}{t^2+3t+1} - \frac{2}{t^2+1}\Bigr)dt = \frac{2\sqrt5}{5}\ln\Bigl|\frac{2t+3-\sqrt5}{2t+3+\sqrt5}\Bigr| - 2\operatorname{arctg} t + C =$

An even more important way how to get new functions is to add some free parameter in the integral. One of the most important mathematical functions ever is the Gamma function. It is defined for all positive real numbers $z$ by

$\Gamma(z) = \int_0^\infty \mathrm{e}^{-t}\,t^{z-1}\,dt.$

It can be proved that this function is analytic at all points $0 < z \in \mathbb{R}$. For small $z \in \mathbb{N}$, we can evaluate:

$\Gamma(1) = \int_0^\infty \mathrm{e}^{-t}\,dt = [-\mathrm{e}^{-t}]_0^\infty = 1$
$\Gamma(2) = \int_0^\infty \mathrm{e}^{-t}\,t\,dt = [-\mathrm{e}^{-t}t]_0^\infty + \int_0^\infty \mathrm{e}^{-t}\,dt = 1$
$\Gamma(3) = \int_0^\infty \mathrm{e}^{-t}\,t^2\,dt = 0 + 2\int_0^\infty \mathrm{e}^{-t}\,t\,dt = 2.$

$= \frac{2\sqrt5}{5}\ln\left|\frac{2\sqrt{\frac{2-x}{x+1}}+3-\sqrt5}{2\sqrt{\frac{2-x}{x+1}}+3+\sqrt5}\right| - 2\operatorname{arctg}\sqrt{\frac{2-x}{x+1}} + C;$

(f)

$\int\frac{dx}{(x-1)\sqrt{x^2+x+1}} = \Bigl[\,\sqrt{x^2+x+1} = x+t,\ x^2+x+1 = x^2+2xt+t^2,\ x = \frac{t^2-1}{1-2t},\ x-1 = \frac{t^2+2t-2}{1-2t},\ dx = \frac{-2(t^2-t+1)}{(1-2t)^2}\,dt\,\Bigr] = \int\frac{2\,dt}{t^2+2t-2}$

Integration by parts reveals immediately $\Gamma(z+1) = z\,\Gamma(z)$. Hence for all positive integers $n$ this function yields the value of the factorial: $\Gamma(n) = (n-1)!$. The following figure shows the behaviour of the function $f(x) = \ln(\Gamma(x+1))$.

$= \frac{\sqrt3}{3}\ln|t+1-\sqrt3| - \frac{\sqrt3}{3}\ln|t+1+\sqrt3| + C = \frac{\sqrt3}{3}\ln\left|\frac{\sqrt{x^2+x+1}-x+1-\sqrt3}{\sqrt{x^2+x+1}-x+1+\sqrt3}\right| + C.$ □

6.B.32. Using a suitable substitution, compute $\int\frac{dx}{x+\sqrt{x^2+x-1}}$, $x \in \bigl(-\infty, \frac{-\sqrt5-1}{2}\bigr)\cup\bigl(\frac{\sqrt5-1}{2}, +\infty\bigr)$.

Solution. Even though the quadratic polynomial under the root has real roots $x_1, x_2$, we won't solve this problem by the substitution $t^2 = \frac{x-x_2}{x-x_1}$. We could proceed that way, but we'll rather use the method we introduced for the complex-roots case.
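The recurrence $\Gamma(z+1) = z\,\Gamma(z)$ and the factorial values $\Gamma(n) = (n-1)!$ can be checked against the defining integral. A sketch of our own (midpoint rule on a truncated range; the truncation point 60 is an arbitrary choice making the tail negligible):

```python
# Numeric check of Γ(n) = (n-1)! from Γ(z) = ∫_0^∞ e^(-t) t^(z-1) dt.
import math

def gamma_num(z, upper=60.0, n=200000):
    """Midpoint-rule approximation of the truncated Gamma integral."""
    h = upper / n
    return sum(math.exp(-(i + 0.5) * h) * ((i + 0.5) * h) ** (z - 1) * h
               for i in range(n))

print(gamma_num(5))   # ≈ 24 = 4!
```

The same routine evaluated at non-integer arguments gives the interpolation of the factorial that the figure of $\ln(\Gamma(x+1))$ illustrates.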
That's because this method yields a very simple integral of a rational function, as can be seen from the calculation

$\int\frac{dx}{x+\sqrt{x^2+x-1}} = \Bigl[\,\sqrt{x^2+x-1} = x+t,\ x^2+x-1 = x^2+2xt+t^2,\ x = \frac{t^2+1}{1-2t},\ dx = \frac{-2t^2+2t+2}{(1-2t)^2}\,dt\,\Bigr] = \int\frac{-2t^2+2t+2}{(t+2)(1-2t)}\,dt = \int\Bigl(1 - \frac{2}{t+2} + \frac{1}{1-2t}\Bigr)dt = t - 2\ln|t+2| - \frac12\ln\Bigl|t-\frac12\Bigr| + C = \sqrt{x^2+x-1} - x - 2\ln\bigl(\sqrt{x^2+x-1} - x + 2\bigr) - \frac12\ln\Bigl|\sqrt{x^2+x-1} - x - \frac12\Bigr| + C.$

Note that each recommended substitution (see the above problems) can in most specific problems usually be replaced by another substitution, which allows one to obtain the result in a much easier way. An undeniable advantage of the recommended substitutions is their universality though: by using them, one can compute all integrals of the respective type. □

6.B.33. For $x > 0$ determine (2+5x)' \fxr^ dx; (a) / ml^dx; (0 J^dx.

Solution. All three given integrals are binomial, i.e. they can be written as $\int x^m(a+bx^n)^p\,dx$ for some $a, b \in \mathbb{R}$, $m, n, p \in \mathbb{Q}$. The binomial integrals are usually solved by applying the substitution method. If $p \in \mathbb{Z}$ (not necessarily $p < 0$), we choose the substitution $x = t^s$, where $s$ is the common denominator of the numbers $m$ and $n$; if $\frac{m+1}{n} \in \mathbb{Z}$ and $p \notin \mathbb{Z}$, we choose $a + bx^n = t^s$, where $s$ is the denominator of the number $p$; and if $\frac{m+1}{n} + p \in \mathbb{Z}$ ($p \notin \mathbb{Z}$, $\frac{m+1}{n} \notin \mathbb{Z}$), we choose $ax^{-n} + b = t^s$,

If we draw the function $x\ln x - x$ instead, there is not much difference to be seen. Hence it seems that the factorial $n!$ grows similarly to $\mathrm{e}^{n\ln n - n} = n^n\,\mathrm{e}^{-n}$. This is the essence of the famous Stirling approximation formula. More precisely, one can verify

$n! \approx \sqrt{2\pi n}\;n^n\,\mathrm{e}^{-n}.$

$\mathbb{R}^2$,

hold, the substitution $t = \operatorname{tg}\frac{x}{2}$ is used. We'll show it on the given integrals. Case (a). In the denominator, we have $1 + 4\cos^2 x + 3\sin^2 x = 4 + \cos^2 x$ and in the numerator only the sine function to an odd power, i.e.
the substitution $t = \cos x$, where $dt = -\sin x\,dx$, allows to replace all the sines and cosines and thus obtain

$\int\frac{\sin^3 x}{1+4\cos^2 x+3\sin^2 x}\,dx = \int\frac{\sin x\,(1-\cos^2 x)}{4+\cos^2 x}\,dx = \int\frac{t^2-1}{4+t^2}\,dt = \int\Bigl(1 - \frac{5}{4+t^2}\Bigr)dt = t - \frac52\operatorname{arctg}\frac{t}{2} + C = \cos x - \frac52\operatorname{arctg}\frac{\cos x}{2} + C.$

Case (b). Because both the sine and cosine appear here to an even power, the substitution $t = \operatorname{tg} x$ leads to $\sin^2 x = \frac{t^2}{1+t^2}$, $dx = \frac{dt}{1+t^2}$, by which we obtain

$\int\frac{dx}{1+\sin^2 x} = \int\frac{dt}{1+2t^2} = \frac{\sqrt2}{2}\operatorname{arctg}\bigl(\sqrt2\,t\bigr) + C = \frac{\sqrt2}{2}\operatorname{arctg}\bigl(\sqrt2\operatorname{tg} x\bigr) + C.$

Case (c). Now we'll use the universal substitution $t = \operatorname{tg}\frac{x}{2}$, where $\sin x = \frac{2t}{1+t^2}$, $\cos x = \frac{1-t^2}{1+t^2}$, $dx = \frac{2\,dt}{1+t^2}$. Then we can determine

$\int\frac{dx}{2-\cos x} = \int\frac{2\,dt}{1+3t^2} = \frac{2\sqrt3}{3}\operatorname{arctg}\bigl(\sqrt3\,t\bigr) + C = \frac{2\sqrt3}{3}\operatorname{arctg}\Bigl(\sqrt3\operatorname{tg}\frac{x}{2}\Bigr) + C.$ □

Definite integrals. 6.B.35. Compute the definite integrals

$\int_{\pi/6}^{\pi/3}\operatorname{tg}^2 x\,dx, \qquad \int_0^{\pi/4}\frac{x}{\cos^2 x}\,dx.$

Solution. For $x \ne \frac{\pi}{2} + k\pi$, where $k \in \mathbb{Z}$, we have $\int\operatorname{tg}^2 x\,dx = \operatorname{tg} x - x + C$, as we have computed earlier. This implies that

$\int_{\pi/6}^{\pi/3}\operatorname{tg}^2 x\,dx = \bigl[\operatorname{tg} x - x\bigr]_{\pi/6}^{\pi/3} = \sqrt3 - \frac{\pi}{3} - \Bigl(\frac{\sqrt3}{3} - \frac{\pi}{6}\Bigr) = \frac{2\sqrt3}{3} - \frac{\pi}{6}.$

Of course, definite integrals can be also computed directly. For example, the substitution $y = \operatorname{tg} x$ yields

$F(t) = [f(t), g(t)]$ is given. Look at it as a trajectory of a movement. Assume that $f(t)$ and $g(t)$ have piecewise continuous derivatives. By differentiating the map $F(t)$ we obtain vectors corresponding to the speed of the movement along this trajectory. Hence the total length of the curve (i.e. the distance traveled over time between the values $t = a$, $t = b$) is given by the integral over the interval $[a,b]$, with the integrated function $h(t)$ being the length of the vectors $F'(t)$. Therefore the length $s$ is given by the formula

$s = \int_a^b h(t)\,dt = \int_a^b \sqrt{(f'(t))^2 + (g'(t))^2}\,dt.$

The result can be seen intuitively as a corollary of Pythagoras' theorem: the linear increment $\Delta s$ of the length of the curve corresponding to the increment $\Delta t$ of the variable $t$ is given by the proportion in the orthogonal triangle, and thus at the level of differentials $ds = \sqrt{(f'(t))^2 + (g'(t))^2}\,dt$.
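An antiderivative obtained through the universal substitution is easy to cross-check numerically. A sketch of our own for case (c): compare a Simpson approximation of the definite integral with the difference of the closed-form primitive (on a subinterval of $(0,\pi)$, where $\operatorname{tg}\frac{x}{2}$ is well defined):

```python
# Cross-check of ∫ dx/(2 - cos x) = (2/√3) arctg(√3 tg(x/2)) + C.
import math

def F(x):
    return 2 / math.sqrt(3) * math.atan(math.sqrt(3) * math.tan(x / 2))

def simpson(f, a, b, n=2000):
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

g = lambda x: 1 / (2 - math.cos(x))
a, b = 0.2, 1.5
print(simpson(g, a, b), F(b) - F(a))   # the two values agree
```

This kind of spot check catches sign and coefficient slips in hand computations quickly.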
In the special case when the curve is the graph of a function $y = f(x)$ between points $a < b$, we obtain

$s = \int_a^b \sqrt{1 + (f'(x))^2}\,dx$

and at the level of differentials, $ds = \sqrt{1 + (y'(x))^2}\,dx$, just as expected. As an example, we calculate the circumference of the unit circle as twice the integral of the function $y = \sqrt{1-x^2}$ over $[-1,1]$. We know that the result is $2\pi$, because it is defined in this way.

$s = 2\int_{-1}^1 \sqrt{1 + (y')^2}\,dx = 2\int_{-1}^1 \sqrt{1 + \frac{x^2}{1-x^2}}\,dx = 2\int_{-1}^1 \frac{dx}{\sqrt{1-x^2}} = 2\bigl[\arcsin x\bigr]_{-1}^1 = 2\pi.$

If we instead use $y = \sqrt{r^2-x^2} = r\sqrt{1-(x/r)^2}$ and bounds $[-r,r]$ in the previous calculation, by substituting $x = rt$ we obtain the circumference of the circle with radius $r$:

$s(r) = 2\int_{-r}^r \frac{dx}{\sqrt{1-(x/r)^2}} = 2r\int_{-1}^1 \frac{dt}{\sqrt{1-t^2}} = 2r\bigl[\arcsin t\bigr]_{-1}^1 = 2\pi r.$

The result is of course well known from elementary geometry. Nevertheless, by using integral calculus, we derive the important fact that the length of a circle is linearly dependent on its diameter $2r$. The number $\pi$ is exactly the ratio appearing in this dependency.

$\int_{\pi/6}^{\pi/3}\operatorname{tg}^2 x\,dx = \int_{\pi/6}^{\pi/3}\Bigl(\frac{1}{\cos^2 x} - 1\Bigr)dx = \Bigl[\,y = \operatorname{tg} x,\ dy = \frac{dx}{\cos^2 x} = (1+\operatorname{tg}^2 x)\,dx\,\Bigr] = \int_{1/\sqrt3}^{\sqrt3}\frac{y^2}{1+y^2}\,dy = \int_{1/\sqrt3}^{\sqrt3}\Bigl(1 - \frac{1}{1+y^2}\Bigr)dy = \bigl[y - \operatorname{arctg} y\bigr]_{1/\sqrt3}^{\sqrt3} = \frac{2\sqrt3}{3} - \frac{\pi}{6}.$

When doing the substitution, we only need to not forget to change the limits of the integral to the values $\sqrt3 = \operatorname{tg}(\pi/3)$, $1/\sqrt3 = \operatorname{tg}(\pi/6)$. We'll compute the second integral by integration by parts for the definite integral. (Note that we also found the primitive function of $y = x\cos^{-2} x$ earlier.) We have

$\int_0^{\pi/4}\frac{x}{\cos^2 x}\,dx = \Bigl[\,F(x) = x,\ G'(x) = \frac{1}{\cos^2 x};\ F'(x) = 1,\ G(x) = \operatorname{tg} x\,\Bigr] = \bigl[x\operatorname{tg} x\bigr]_0^{\pi/4} - \int_0^{\pi/4}\operatorname{tg} x\,dx = \bigl[x\operatorname{tg} x\bigr]_0^{\pi/4} + \bigl[\ln(\cos x)\bigr]_0^{\pi/4} = \frac{\pi}{4} + \ln\frac{\sqrt2}{2} = \frac{\pi}{4} - \frac{\ln 2}{2}.$ □

6.B.36. Compute the definite integrals (a) $\int_0^1 \frac{x}{\sqrt{1-x^2}}\,dx$; (b) $\int_1^2 \frac{dx}{\sqrt{x^2-1}}$; (c) $\int_0^1 \Bigl(\frac{\mathrm{e}^x}{\mathrm{e}^{2x}+3} + \frac{1}{\cos^2 x}\Bigr)dx$.

Solution.
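The arc-length formula for parametrized curves can be exercised on the same example. Our own sketch: the parametrization $F(t) = (\cos t, \sin t)$ of the unit circle has speed $\sqrt{(f')^2+(g')^2} = 1$, so the formula must return $2\pi$:

```python
# s = ∫_a^b sqrt(f'(t)^2 + g'(t)^2) dt on the parametrized unit circle.
import math

def arc_length(fp, gp, a, b, n=100000):
    """Midpoint rule for the arc-length integral."""
    h = (b - a) / n
    return sum(math.hypot(fp(a + (i + 0.5) * h), gp(a + (i + 0.5) * h)) * h
               for i in range(n))

s = arc_length(lambda t: -math.sin(t), lambda t: math.cos(t), 0.0, 2 * math.pi)
print(s)   # ≈ 2π
```

The same routine with other parametrizations approximates lengths of curves, such as the tractrix of 6.B.51, that have no such obvious answer.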
We have

(a) $\int_0^1 \frac{x}{\sqrt{1-x^2}}\,dx = \Bigl[\,y = 1-x^2,\ dy = -2x\,dx\,\Bigr] = \frac12\int_0^1 \frac{dy}{\sqrt y} = \bigl[\sqrt y\bigr]_0^1 = 1;$

(b) $\int_1^2 \frac{dx}{\sqrt{x^2-1}} = \Bigl[\,z = x + \sqrt{x^2-1},\ \frac{dz}{z} = \frac{dx}{\sqrt{x^2-1}}\,\Bigr] = \int_1^{2+\sqrt3}\frac{dz}{z} = \bigl[\ln z\bigr]_1^{2+\sqrt3} = \ln\bigl(2+\sqrt3\bigr);$

(c) $\int_0^1 \Bigl(\frac{\mathrm{e}^x}{\mathrm{e}^{2x}+3} + \frac{1}{\cos^2 x}\Bigr)dx = \Bigl[\,p = \mathrm{e}^x,\ dp = \mathrm{e}^x\,dx\,\Bigr] = \int_1^{\mathrm{e}}\frac{dp}{p^2+3} + \bigl[\operatorname{tg} x\bigr]_0^1 = \Bigl[\,p = \sqrt3\,s,\ dp = \sqrt3\,ds\,\Bigr] = \frac{\sqrt3}{3}\bigl[\operatorname{arctg} s\bigr]_{1/\sqrt3}^{\mathrm{e}/\sqrt3} + \operatorname{tg} 1 = \frac{\sqrt3}{3}\Bigl(\operatorname{arctg}\frac{\mathrm{e}}{\sqrt3} - \frac{\pi}{6}\Bigr) + \operatorname{tg} 1;$

6.2.21. Areas and volumes. The Riemann integral can be used to compute areas or volumes of shapes defined by a graph of a function. As an example, calculate the area of a circle with radius $r$. The quarter-circle bounded by the function $\sqrt{r^2-x^2}$ for $0 \le x \le r$ determines one quarter of the area. Use the substitution $x = r\sin t$, $dx = r\cos t\,dt$ (using the corollary for $h$ in the paragraph 6.2.6) to obtain by symmetry

$a(r) = 4\int_0^r \sqrt{r^2-x^2}\,dx = 4r^2\int_0^{\pi/2}\cos^2 t\,dt = 2r^2\int_0^{\pi/2}(1+\cos 2t)\,dt = \pi r^2.$

It is worth noticing that this well-known formula is derived from the principles of integral calculus. The area of a circle is not only proportional to the square of the radius, but this proportion is again given by the constant $\pi$. Notice the ratio of the area to the perimeter of a circle:

$\frac{\pi r^2}{2\pi r} = \frac{r}{2}.$

The square with the same area has the side of length $\sqrt\pi\,r$ and therefore its perimeter is $4\sqrt\pi\,r$. Hence the perimeter of a square with the area of the unit circle is $4\sqrt\pi$, compared to the perimeter $2\pi$ of the unit circle, which is about 0.8 less. It can be shown that in fact the circle is the shape with the smallest perimeter among all shapes with the same area. We derive such results in the comments about the calculus of variations in chapter 9.

Another analogy of this approach is the computation of the volume or the surface area of a solid of revolution. Such a set in $\mathbb{R}^3$ is defined by plotting the graph of a function $y = f(x)$ (for $x$ in an interval $[a,b]$) in the plane $xy$ and rotating this plane around the $x$ axis. This is exactly what happens when producing pottery on a jigger - the hands shape the clay in the form of $y = f(x)$.
When computing the area of the surface, an increment $\Delta x$ causes the area to increase by the multiple of the length $\Delta s$ of the curve given by the graph of the function $y = f(x)$ and the size of the circle with radius $f(x)$. Hence the surface area $A(f)$ is computed by the formula

(add appropriate picture here!)

$A(f) = 2\pi\int f(x)\,ds = 2\pi\int_a^b f(x)\sqrt{1+(f'(x))^2}\,dx,$

where the differential $ds$ is given by the increment of the length of the curve $y = f(x)$, see above. If instead we determine the solid of revolution by its boundary parametrized in the $xy$ plane by a pair of functions $[x(t), y(t)]$, then the corresponding differential of the length $s$ has the form $ds = \sqrt{(x'(t))^2 + (y'(t))^2}\,dt$. Thus we obtain

$A = 2\pi\int y(t)\sqrt{(y'(t))^2 + (x'(t))^2}\,dt.$

□ 6.B.37. Prove that

$\frac{1}{20} \le \int_0^1 \frac{x^9}{1+x}\,dx \le \frac{1}{10}.$

Solution. Because the geometric meaning of the definite integral implies

$\frac{1}{20} = \int_0^1 \frac{x^9}{2}\,dx \le \int_0^1 \frac{x^9}{1+x}\,dx \le \int_0^1 x^9\,dx = \frac{1}{10}.$ □

6.B.38. Without symbols of differentiation and integration, express

$\Bigl(\int_x^0 t^5\ln(t+1)\,dt\Bigr)', \quad x \in (-1,1),$

if the differentiation is done with respect to $x$. Solution. Integration is often thought of as the inverse operation to differentiation. In this problem, we'll use this "inverseness". The function

$F(x) := \int_0^x t^5\ln(t+1)\,dt, \quad x \in (-1,1)$

is clearly the antiderivative of the function $f(x) := x^5\ln(x+1)$ on the interval $(-1,1)$, i.e. by differentiating it, we'll get exactly $f$. Hence

$\Bigl(\int_x^0 t^5\ln(t+1)\,dt\Bigr)' = -\Bigl(\int_0^x t^5\ln(t+1)\,dt\Bigr)' = -x^5\ln(x+1).$ □

Improper integrals. 6.B.39. Decide if

$\int_1^{+\infty}\frac{\operatorname{arctg} x}{x\sqrt x}\,dx \in \mathbb{R}.$

Solution. The improper integral represents the area of the figure between the graph of the positive function $\frac{\operatorname{arctg} x}{x\sqrt x}$, $x \ge 1$, and the $x$ axis (from the left, the figure is bounded by the line $x = 1$). Hence the integral is a positive real number, or equals $+\infty$. We know that $f: \mathbb{R} \to \mathbb{R}$ is positive and nonincreasing on the interval $(1, \infty)$. Then this series converges if and only if the integral $\int_1^\infty f(x)\,dx$ converges. Proof.
If the integral is interpreted as the area of a region under the curve, the criterion is clear. Indeed, notice that the given series diverges or converges if and only if the same is true for the same series without the first summand. Moreover, by the monotonicity of $f(x)$, there are the following estimates:

$\sum_{n=2}^\infty f(n) \le \int_1^\infty f(x)\,dx \le \sum_{n=1}^\infty f(n).$

(d) $\int_{-\infty}^0 \dots\,dx$; (e) $\int_1 \frac{dx}{x\ln x}\,\dots$

Solution. We have

(a) $\int_0^\infty \frac{dx}{(x+2)^5} = \Bigl[-\frac14(x+2)^{-4}\Bigr]_0^\infty = -\frac14\Bigl(\lim_{a\to\infty}(a+2)^{-4} - 2^{-4}\Bigr) = \frac{1}{64};$

(b) (c) (d)

6.B.42. Compute the improper integrals

$\int_0^\infty x^2\,\mathrm{e}^{-x}\,dx; \qquad \int_{-\infty}^{+\infty}\frac{dx}{\mathrm{e}^x+\mathrm{e}^{-x}}.$

Solution. Because the improper integral is a special case of a definite integral, we have at our disposal the basic methods to compute them. By integration by parts, we obtain

$\int_0^\infty x^2\,\mathrm{e}^{-x}\,dx = \bigl[-x^2\,\mathrm{e}^{-x}\bigr]_0^\infty + 2\int_0^\infty x\,\mathrm{e}^{-x}\,dx = 0 + 2\Bigl(\bigl[-x\,\mathrm{e}^{-x}\bigr]_0^\infty + \int_0^\infty \mathrm{e}^{-x}\,dx\Bigr) = 2\bigl[-\mathrm{e}^{-x}\bigr]_0^\infty = 2.$

The substitution method then yields

$\int_{-\infty}^{+\infty}\frac{dx}{\mathrm{e}^x+\mathrm{e}^{-x}} = \Bigl[\,y = \mathrm{e}^x,\ dy = \mathrm{e}^x\,dx\,\Bigr] = \int_0^\infty \frac{dy}{1+y^2} = \bigl[\operatorname{arctg} y\bigr]_0^\infty = \frac{\pi}{2},$

where the new limits of the integral are derived from the limits $\lim_{x\to-\infty}\mathrm{e}^x = 0$, $\lim_{x\to+\infty}\mathrm{e}^x = +\infty$. □

6.B.43. Compute $\int_0^\infty x^{2n+1}\,\mathrm{e}^{-x^2}\,dx$, $n \in \mathbb{N}$.

Solution. We'll first transform this integral by the substitution method and then repeatedly apply integration by parts, yielding

$\int_0^\infty x^{2n+1}\,\mathrm{e}^{-x^2}\,dx = \Bigl[\,y = x^2,\ dy = 2x\,dx\,\Bigr] = \frac12\int_0^\infty y^n\,\mathrm{e}^{-y}\,dy = \frac12\Bigl(\bigl[-y^n\,\mathrm{e}^{-y}\bigr]_0^\infty + n\int_0^\infty y^{n-1}\,\mathrm{e}^{-y}\,dy\Bigr) = \frac{n}{2}\int_0^\infty y^{n-1}\,\mathrm{e}^{-y}\,dy = \dots = \frac{n!}{2}.$

sequence of the functions $x(1-x^2)^n$ used above. They integrate to

$\int_0^1 x(1-x^2)^n\,dx = \frac{1}{2(n+1)}.$

Thus, we consider the functions

$f_n(x) = 2(n+1)\,x(1-x^2)^n.$

These functions with $n = m^2$, $m = 1, \dots, 10$ are on the next diagram. We verify that the values of these functions converge to zero for every $x \in [0,1]$ (for example $\ln(f_n(x)) \to -\infty$). But for all $n$

$\int_0^1 f_n(x)\,dx = 1 \ne 0.$

6.3.3. Uniform convergence. A reason for the failure in all three previous examples is the fact that the speed of pointwise convergence of the values $f_n(x) \to f(x)$ varies dramatically from point to point.
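The book's sequence $f_n(x) = 2(n+1)x(1-x^2)^n$ can be probed directly. Our own numeric sketch: the pointwise values die off, yet the integral stays pinned at 1, which is exactly the failure of swapping limit and integral that motivates uniform convergence:

```python
# Pointwise f_n(x) → 0 on [0,1], while ∫_0^1 f_n dx = 1 for every n.
def f(n, x):
    return 2 * (n + 1) * x * (1 - x * x) ** n

def simpson(g, a, b, m=20000):
    h = (b - a) / m
    s = g(a) + g(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

print(f(10, 0.5), f(100, 0.5), f(1000, 0.5))     # → 0 pointwise
print(simpson(lambda x: f(1000, x), 0.0, 1.0))   # stays ≈ 1
```

The mass of $f_n$ escapes into an ever narrower spike near $x = 0$, so no uniform bound $|f_n - 0| < \varepsilon$ can hold on all of $[0,1]$.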
Hence a natural idea is to confine the problem to cases where the convergence will have roughly the same speed over all the interval:

Uniform convergence

Definition. We say that the sequence of functions $f_n(x)$ converges uniformly on the interval $[a,b]$ to the limit $f(x)$, if for every positive number $\varepsilon$, there exists a natural number $N \in \mathbb{N}$ such that for all $n \ge N$ and all $x \in [a,b]$ the inequality $|f_n(x) - f(x)| < \varepsilon$ holds.

the functions $f_n(x_0)$ converge to $f(x_0)$ at some point $x_0 \in [a,b]$. Moreover, assume all derivatives $g_n(x) = f_n'(x)$ are continuous and converge uniformly to the function $g(x)$ on the same interval. Then the function $f(x) = f(x_0) + \int_{x_0}^x g(t)\,dt$ is differentiable on the interval $[a,b]$, the functions $f_n(x)$ converge to $f(x)$ and $f'(x) = g(x)$. In other words,

$\frac{d}{dx}f(x) = \frac{d}{dx}\Bigl(\lim_{n\to\infty} f_n(x)\Bigr) = \lim_{n\to\infty}\Bigl(\frac{d}{dx}f_n(x)\Bigr).$

Proof of the first claim. Fix an arbitrary point $x_0 \in [a,b]$ and let $\varepsilon > 0$ be given. It is required to show that $|f(x) - f(x_0)|$ is small for all $x$ close enough to $x_0$. From the definition of uniform convergence, $|f_n(x) - f(x)| < \varepsilon$ for all $x \in [a,b]$ and all sufficiently large $n$. Choose some $n$ with this property and consider $\delta > 0$ such that $|f_n(x) - f_n(x_0)| < \varepsilon$ for all $x$ in the $\delta$-neighbourhood of $x_0$. That is possible because $f_n(x)$ are continuous for all $n$. Then

$|f(x) - f(x_0)| \le |f(x) - f_n(x)| + |f_n(x) - f_n(x_0)| + |f_n(x_0) - f(x_0)| < 3\varepsilon$

for all $x$ in the $\delta$-neighbourhood of $x_0$. This is the desired inequality with the bound $3\varepsilon$. □

Remark. In fact, the arguments in the proof show a more general claim. Indeed, if the functions $f_n(x)$ converge uniformly to $f(x)$ on $[a,b]$, and the individual functions $f_n(x)$ have the limits (or one-sided limits) $\lim_{x\to x_0} f_n(x) = a_n$, then the limit $\lim_{x\to x_0} f(x)$ exists if and only if the limit $\lim_{n\to\infty} a_n = a$ exists. Then they are equal, that is,

$a = \lim_{n\to\infty}\lim_{x\to x_0} f_n(x) = \lim_{x\to x_0}\lim_{n\to\infty} f_n(x).$

The reader should be able to modify the above proof for this situation.

6.3.5. Proof of the second claim.
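The extra hypotheses in the derivative statement are not cosmetic. Our own counterexample sketch: $f_n(x) = \sin(nx)/n$ converges uniformly to $0$ (the sup-norm tends to $0$), yet the derivatives $f_n'(x) = \cos(nx)$ do not converge at all, so uniform convergence of the functions alone says nothing about the derivatives:

```python
# sin(nx)/n → 0 uniformly, but its derivatives cos(nx) keep oscillating.
import math

def sup_norm(h, a, b, m=10000):
    """Maximum of |h| over a dense grid on [a, b]."""
    return max(abs(h(a + i * (b - a) / m)) for i in range(m + 1))

for n in (10, 100, 1000):
    fn = lambda x, n=n: math.sin(n * x) / n
    print(n, sup_norm(fn, 0.0, math.pi))          # → 0: uniform convergence

print(math.cos(100 * 1.0), math.cos(1000 * 1.0))  # derivatives do not settle
```

This is why the theorem demands uniform convergence of the derivatives $g_n$, not of the $f_n$ themselves.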
The proof of this part of the theorem is based upon a generalization of the properties of Cauchy sequences of numbers to uniform convergence of functions. In this way we can work with the existence of the limit of a sequence of integrals without needing to know the limit.

In total, for the improper integral we can write:

$\int_3^\infty \frac{dx}{x^3-1} = \lim_{\delta\to\infty}\Bigl[\frac13\ln|x-1| - \frac16\ln(x^2+x+1) - \frac{1}{\sqrt3}\arctan\frac{2x+1}{\sqrt3}\Bigr]_3^\delta = \lim_{\delta\to\infty}\Bigl(\frac13\ln(\delta-1) - \frac16\ln(\delta^2+\delta+1) - \frac{1}{\sqrt3}\arctan\frac{2\delta+1}{\sqrt3}\Bigr) - \frac13\ln 2 + \frac16\ln 13 + \frac{1}{\sqrt3}\arctan\frac{7}{\sqrt3} = \frac16\ln 13 + \frac{1}{\sqrt3}\arctan\frac{7}{\sqrt3} - \frac13\ln 2 - \frac{\sqrt3}{6}\,\pi.$ □

6.B.48. Determine the surface and volume of a circular paraboloid created by rotating a part of the parabola $y = 2x^2$ for $x \in [0,1]$ around the $y$ axis.

Solution. The formulas stated in the text hold for rotating curves around the $x$ axis! Hence it's necessary either to integrate the given curve with respect to the variable $y$, or to transform. Writing $x = \sqrt{y/2}$ for $y \in [0,2]$, we get

$V = \pi\int_0^2 \frac{y}{2}\,dy = \pi, \qquad S = 2\pi\int_0^2 \sqrt{\frac{y}{2}}\sqrt{1+\frac{1}{8y}}\,dy = 2\pi\int_0^2 \sqrt{\frac{y}{2}+\frac{1}{16}}\,dy = \frac{17\sqrt{17}-1}{24}\,\pi.$ □

6.B.49. Compute the area $S$ of a figure composed of two parts of the plane bounded by the lines $x = 0$, $x = 1$, $x = 4$, the $x$ axis and the graph of the function $f(x) = \frac{1}{\sqrt[3]{x-1}}$.

Solution. First realize that

$\frac{1}{\sqrt[3]{x-1}} < 0,\ x \in [0,1), \qquad \frac{1}{\sqrt[3]{x-1}} > 0,\ x \in (1,4]$

and

$\lim_{x\to 1-}\frac{1}{\sqrt[3]{x-1}} = -\infty, \qquad \lim_{x\to 1+}\frac{1}{\sqrt[3]{x-1}} = +\infty.$

The first part of the figure (below the $x$ axis) is thus bounded by the curves $y = 0$, $x = 0$, $x = 1$, $y = \frac{1}{\sqrt[3]{x-1}}$, with an area given by an improper integral.

Uniformly Cauchy sequences

Definition. The sequence of functions $f_n(x)$ on the interval $[a,b]$ is uniformly Cauchy, if for every (small) positive number $\varepsilon$, there exists a (large) natural number $N$ such that for all $x \in [a,b]$ and all $n, m \ge N$, $|f_n(x) - f_m(x)| < \varepsilon$.

Every uniformly convergent sequence of functions on the interval $[a,b]$ is also uniformly Cauchy on the same interval. To see this, it suffices to notice the usual bound

$|f_n(x) - f_m(x)| \le |f_n(x) - f(x)| + |f(x) - f_m(x)|$

based on the triangle inequality.
Before coming to the proof 6.3.4(2), we mention the following: Proposition. Every uniformly Cauchy sequence of functions fn(x) on the interval [a, b] uniformly converges to some function f on this interval. Proof. Of course, the condition for a sequence of functions to be uniform Cauchy implies that also for all x e [a,b], the sequence of values /„ (a;) is a Cauchy sequence of real (or complex) numbers. Hence the sequence of functions fn(x) converges pointwise to some function f(x). Choose 7Y large enough so that \fn(x) - fm(x) \ < E for some small positive e chosen beforehand and all n > N, x e [a, b]. Now choose one such n and fix it, then \fn(x) - f(x)\= lim m—yoo fn(x) - fm(x)\ < E for all x e [a, b]. Hence the sequence fn(x) converges to its limit uniformly. □ Proof of the second claim in 6.3.4. Recall that every uniformly convergent sequence of functions is also uniformly Cauchy and that the Riemann sums of all single terms fn(x) of the sequence converge to fn (x) dx independently of the choice of the partition and the representatives. Hence, if \fn(x) - fm(x) \ < B for all x e [a, b], then also fn(x) dx - fm(x) dx < e\b — a\ Therefore the sequence of numbers J fn(x) dx is Cauchy, and hence convergent. The Riemann sums of the limit function f(x) can be made arbitrarily close to those of fn(x) for large n, by the same argument as above. So f(x) is integrable. Moreover, fn(x) dx- f(x) dx J a so the limit value is as expected. < e\b — a\, □ 422 CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS Si = -J dx; o while the second part (above the x axis), which is bounded by the curves y = o, has an area of Since 1, 4, y S2 = J^dx. i^tfdx^ivi^r+c, the sum 5*1 + 5*2 can be gotten as s = -Jim_(§\/(^T^-§) + Jm (§^9-1^(^1)2) = | (i + ^9) . We have shown among other things, that the given figure has a finite area, even though it's unbounded (both from the top and the bottom). (If we approach x = 1 from the right, eventually from the left, its altitude grows beyond measure.) 
Recall here the indefinite expression of type 0 ■ oo. Namely, the figure is bounded if we limit ourselves to x e [0,1 — S] U [1 + S, 4] for an arbitrarily small S > 0. □ 6.B.50. Determine the avarage velocity vp of a solid in the time interval [1,2], if its velocity is v(t) = te\i,2] Omit the units. /i+t2 Solution. To solve the problem, it suffices to realize that the sought avarage velocity is the mean value of function v on interval [1,2]. Hence with 1 + t2 = x, t dt = dx/2. □ 6.B.51. Compute the length s of a part of the curve calles tractrix given by the parametric description f(t) = r cost + r In (tg |) , g(t)=rsint, te[Tr/2,a], where r > 0, a e (it/2, it). Solution. Since /'(f) = -rsini + 2tg § -cos2 = — r sin t + 6.3.6. Proof of the third claim. For the corresponding result about derivatives, extra care is needed regarding the assumptions: If the functions jn(x) = jn(x) — fn(x0) are considered instead of fn(x), the derivatives do not change. Hence without loss of generality it can be assumed that all functions satisfy fn(x0) = 0. Then one of the assumptions of the theorem is satisfied automatically. For all x e [a, b], we can write fn(x) = / gn(t)dt. ■J xo Because the functions gn converge uniformly to g on all of [a, 6], the functions fn(x) converge to /(*) git) dt. g is a uniform limit of continuous functions, thus g is again continuous. By 6.2.8, for the relations between the Riemann integral and the primitive function, the proof is finished. 6.3.7. Uniform convergence of series. For infinite series, the corresponding results follow as a corollary in this way: Consequences for uniform convergence of series Theorem. Consider a sequence of functions fn{x) on interval [a,b\. (1) If all the functions fn{x) are continuous on [a,b] and the series converges uniformly to the function S(x), then S(x) is continuous on [a,b\. 
(2) If all the functions fn{x) are Riemann integrable on [a, &1 and the series uniformly converges to S(x) on [a, b], then S(x) is integrable on [a, b] and j (x) dx. // oo x oo pb ('£fn(x))dx = 52 / /n( -- Vi=l / n=lJa (3) If all the functions fn (x) are continuously differentiable on the interval [a, b], if the series S(x) = Y^n°=i fn(x) converges for some x0 G [a,b], and if the series T(x) = Y^=i fn(x) converges uniformly on [a, b], then the series S(x) converges. S(x) is continuously differentiable on [a, b] and S' (x) = T(x). That is: g'(t) = r cos t on interval [7r/2, a], for the length s we get d dx dx 423 CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS tt/2 r2 cos4« + r2 cos2 tdt=] tt/2 r2 cos2 t sin2 í -r / ^dí = -rpn(sinC/2 = -rIn(sina). tt/2 □ 6.B.52. Compute the volume of a solid created by rotation of a bounded surface, whose boundary is the curve x4 — 9x2 + y4 = 0, around the x axis. Solution. If [x, y] is a point on the x4 — 9x2 + y4 = 0, clearly this curve also intersects points [—x,y], [x,—y], [—x,—y]. Thus is symmetric with respect to both axes x, y. For y = 0, we have x2 (x — 3) (x + 3) = 0, i.e. the x axis is intersected by the boundary curve at points [—3,0], [0,0], [3,0]. In the first quadrant, it can then be expressed as a graph of the function f(x) = ^9x2 -x4, x G [0, 3]. The sought volume is thus a double (here we consider x > 0) of the integral 3 3 / irf2(x) dx = tt J V9x2 — x4 dx. o o Using the substitution t = ^9 — x2 (xdx = —tdt), we can easily compute / \/9x2 - x4 dx = fx ■ \/9 - x2 dx = - ft2dt = 9, 0 0 3 and receive the result 187r. □ 6.B.53. Torricelli's trumpet, 1641. Let a part of a branch of the hyperbola xy = 1 for a; > a, where a > 0, rotate around the x axis. Show that the solid of revolution created in this manner has a finite volume V and simultaneously an infinite surface S. Solution. We know that + 00 + 00 V = n f (±ydx = n f ±dx. lim x—>-+oo and a a +°° ( \ > 2ir f - dx = 2ir I lim In x — In a } = +oo. 
a X \x^+oo ) The fact the the given solid (the so called Torricelli's trumper) cannot be painted with a finite amount of color, but can be filled with a finite amount of fluid, is called Torric-celli's paradox. But realize that a real color painting has a 6.3.8. Test of uniform convergence. A simple way to test that a sequence of functions converges uniformly is to use a comparison with m, k \sk(x) - Sm(x)\ = ^ fn(x) n=m+l k k n=m+l n=m+l If the series of the (nonnegative) constants J2n°=i a« ^s con" vergent, then the sequence of its partial sums is a Cauchy sequence. But then the sequence of partial sums sn(x) is uniformly Cauchy. By 6.3.5 the following is verified: The Weierstrass test Theorem. Let fn{x) be a sequence of functions defined on interval [a, b] with \fn(x) \ < an G K. If the series of numbers a« convergent, then the series S(x) = JZr^=i fn(x) converges uniformly. 6.3.9. Consequences for power series. The Weierstrass test has important results for power series 71 = 0 Hit m centered at a point xn. We saw earlier in 5.4.8, that each power series converges on an entire interval (xq — S, xq + S). The radius of convergence S > 0 can be zero or oo. (see 5.4.12). In the proof of theorem 5.4.8, a comparison with a suitable geometric series is used to verify the convergence of the series S(x). By the Weierstrass test, the series S(x) converges uniformly on every compact (i.e. bounded closed) interval [a, b] contained in the interval (xq — S, xq + S). Thus the following crucial result is proved: 424 CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS nonzero width, which the computation doesn't take into account. For example, if we would paint it from the inside, a single drop of color would undoubtedly "block" the trumpet of infinite length. □ C. Power series 6.C.I. Expand the function ln(l + x) into a power series at point 0 and 1 and determine all x e K for which these series converge. Solution. First we'll determine the expansion at point 0. 
To expand a function into a power series at a given point is the same as to determine its Taylor expansion at that point. We can easily see that [ln(x + l)]<") = (-l)"+1/(n~\)!, K ' (x+ !)■"■' Differentiation and integration of power series Theorem. Every power series S(x) is continuous and is continuously differentiable at all points inside its interval of convergence. The function S(x) is Riemann integrable and can be differentiated or integrated done term by term. Abel's theorem states that power series are continuous even at the boundary points of their domain when they converge there (including eventual infinite limits). We do not prove it here. The pleasant properties of power series also reveal limitations on the use in practical modelling. In particular, it is not possible to approximate piece-wise continuous or non-differentiable functions very well by using power series. Of course, it should be possible to find better sets of functions fn[x) than just the values fn(x) = xn, up to constants. The best known examples are Fourier series and wavelets discussed in the next chapter. 1) = In 1 + anXn, where 77=1 (-!)"+>-1)! _ (-1) 77+1 so after computing the derivatives at zero, we have ln(x + 6.3.10. Laurent series. We return to the smooth function f(x) = e_1/x from paragraph 6.1.5 in the context of Taylor series expansions. It is not analytic at the origin, because all its derivatives are zero there and the function is strictly positive at all other points. At all points x0 ^ 0 this function is given by its convergent Taylor series with radius of convergence r = \x0\. At the origin the Taylor series converges only at the one point 0. Replace x with the expression — 1/x2 into the power series for ex. The result is the series of functions Thus we can write ln(x + l) = x - ^x2 + ^x3 - ^x4 + ■ = y(-^>— E For the radius of convergence, we can then use the limit of the quotient of the following coefficients of terms of the power series r=-* =-1=1. 
77=0 Hence the series converges for arbitrary a; e (—1,1). For a; = —1 we get the harmonic series (with a negative sign), for x = 1 we get the alternating harmonic series, which converges by the Leibniz criterion. Thus the given series converges exactly for a; G (-1,1]. Analogously, for the expansion at point 1, by computing the above derivatives from 6.C.1, we get ln(a; + l)=ln(2) + ^(a;-l)-^(a;-l)2 The series converges at all points x ^ 0. It gives a good idea about the behaviour near the exceptional point x = 0. Thus we consider the following series similar to power series but more general: Laurent6 series A series of functions of the form oo s(x) = E an(x - x°)n 77= —OO is called a Laurent series centered at x0. The series is convergent if both its parts with positive and negative exponents converge separately. The importance of Laurent series can be seen with rational functions. Consider such a function S(x) = f(x)/g(x) with coprime polynomials / and g and consider a root x0 of + 1 ,(x-iy 3-23 OO ln(2) + E (-1)77+1 n ■ 2r> 6Pierre Alphonse Laurent (1813-1854) was a French engineer and military officer. He submitted his generalization of the Taylor series into the Grand Prix competition of the French Academie des Sciences. For formal reasons it was not considered. It was published much later, after the author's death. 425 CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS and for the radius of convergence of this series, we get 1 1 1. The first series converges for —1 < x < 1, the second for-l-oo 1+7r+\/1+-!+-+v'l+l polynomial g (x). If the multiplicity of this root is s, then after multiplication we obtain the function S(x) = S(x) (x — xo)s, which is analytic on some neighbourhood of a;0. Therefore we can write S(x) = + V / (x-x0)T + + a0 + ai(x — x0) + . n=—s Consider the two parts of the Laurent series separately: —l 00 S(x) = S-+S+ = an(x-x0)n + ^2an(x-x0)n. 
77=—00 77=0 For the series S+, Theorem 5.4.8 implies thatits radius of convergence R is given by R-1 = limsup^^ \]/\an\. Apply the same idea to the series S- with l/x substituted for x. It is then apparent that the series S- (x) converges for a;—a;01 > r, where r = limsup^^ \]/\a-n\. Notice that the conclusions about convergence remain true even for complex values of x substituted into the expression. Laurent series can be considered as functions defined on a domain in the complex plain. We return to this in chapter 9. The following theorem is proved already. Convergence of the Laurent series on the annulus Theorem. The Laurent series S(x) centered at x0 converges for all x £ C satisfying r < \x — x0\ < R and diverges for all x satisfying \x — x0\ < r or \x — x0\ > R, where -1 lim sup \/\a-n |, R lim sup The Laurent series need not converge at any point, because possibly R < r. If we look for an example of the above case of rational functions expanded to Laurent series at some of the roots of the denominator, then clearly r = 0 and therefore, as expected, it converges in the punctured neighbourhood of this point x0. i? is given by the distance to the closest root of the denominator. In the case of the first exam-pie, the function e_1/x , r = 0 and R = 00. 6.3.11. Numerical approximation of integration. Just as in paragraph 6.1.16, we use the Taylor expansion to propose simple approximations of integration. We deal with an integral I = fb f(x) dx of an analytic function f(x) and a uniform partition of the interval [a, 6] using points a = x0, xi,..., xn = b with distances Xi — a^_i = h > 0. Denote the points in the middle of the intervals in the partitions by xi+1/2 and the values of the function at the points of the partition by f(xi) = ji. Compute the contribution of one segment of the partition to the integral by the Taylor expansion and the previous theorem. Integrate symmetrically around the middle values so 426 CHAPTER 6. 
DIFFERENTIAL AND INTEGRAL CALCULUS o 6.C.10. Apllications of the integral criterion of convergence. Now let's get back to (number) series. Thanks to the integral criterion of convergence (see 6.2.19), we can decide the question of convergence for a wider class of series: Decide, whether the following sums converge of diverge: oo a) E ' nlnn' n=l oo b) £ 71=1 Solution. First notice, that we cannot decide the convergence of none of these series by using the ratio or root test (all lim-and lim ^/a^ equal 1). Using the integral its lim |^2±i 77—SOO a" criterion for convergence of series, we obtain: a) 1 ■ da; / - dí = lim [ln(i); ! x\n(x) Jo hence the given series diverges. oo, b) ; da; : lim 6—too 1, hence the given series converges. □ 6.C.11. Using the integral criterion, decide the convergence of series oo E (77+1) In2 (77+1) ' 77=1 Solution. The function f(X) = (x+l) In^+l)' X G I1' +°°) is clearly positive and nonincreasing on its whole domain, thus the given series converges if and only if the integral f^~°° f(x) dx converges. By using the substitution y = In (x + 1) (where dy = dx/(x + 1)), we can compute +oo +oo S 17+Jjl^n7+i) dx= I dy = hT5- 1 In 2 Hence the series converges. □ Uniform convergence. 6.C.12. Does the sequence of functions yn = e^, a; e R, n e N converge uniformly on R? Solution. The sequence {y^JnGN converges pointwise to the constant function y = 1 on R, since that the derivatives of odd orders cancel each other out while integrating: rh/2 [h/2 / oo x f(xi+1/2+t)dt= / [T^f{n)(xt+1/2)tn )dt h/2 J-h/2\^n\ ) h/2 y fc=0\J~h/2 h2k+l fc=0 klf{k)(^+1/2)tkdt 22fe(2fc + l) A simple numerical approximation of integration on one segment of the partition is the trapezoidal rule. This uses the area of a trapezoid given by the points [x{, 0], [x{, f], [0,a;j+i], [xi+i,fi+i] for approximation. This area is Pi = \(fi+fi+i)h . 
In total, the integral I is approximated by n — l , /trap = P* = 2 (/° + 2/1 + ' ' ' + 2/™"1 + In)- i=0 Compare /trap to the exact value of I computed by contributions over individually segments of the partition. Express the values f by the middle values fi+i/2 and the derivatives fi+1/2 m tne following way: h h2 /j+l/2±l/2 = fi+l/2 ± 2 fi+1/2 + 2!22^"(* + 1//2) ±3&/(3)(i + 1/2) + --" Thus, the contribution Pi to the approximation is Pi = \{h+h+i)h = h(f+1/2+^f(i+l/2))+0(h5). Estimate the error Ai = I — /trap over one segment of the partition: a = h(ft+i/2 + g/;;1/2 - h+i/2 - ^fUi,2 + o(h4)) h3 = -^f"+i/2 + 0(h5)-The total error is thus estimated as |J-Jtrap| = ^n/i3|/"|+nO(/i5) 12 (b-a)h2\f"\ + 0(hA) where |/"| represents an upper estimate for \f"(x) | of / over the integral of integration. If the linear approximation of the function over the individual segments does not suffice, we can try approximations by quadratic polynomials. To do so, three values are always needed, so work with segments of the partition in pairs. Suppose n = 2m and consider x{ with odd indices. We choose fi+i = f(xi + h) = fi+ atih + fiih2 fi-i = f{xi -h) = fi- atih + fiih2 427 CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS lim 77 —>-00 x e. But the computation yn (V2n) = e > 2 for all n G N implies that it's not a uniform convergence. (In the definition of uniform convergence, it suffices to consider e G (0,1).) □ 6.C.13. Decide whether the series oo _ E^/x-n 774+X2 77 = 1 converges uniformly on the interval (0, +oo). Solution. Using the denotation = a:>0, neN, we have From now on, let n G N be arbitrary. The inequalities f'n{x) > 0 for a; G (0,n2/V3) and f'n{x) < 0 for x G (n2/ V3, +oo) imply that the maximum of function /„ is attained exactly at the point x = n2/ \/3. Since Jn \ \/3 / _ 4™2 4n2 — 4 n2 ^ +txJ' v 7 77=1 77 = 1 according to the Weierstrass test, the series E^Li fn(x) converges uniformly on the interval (0, +oo). □ 6.C.14. For x G [—1,1], add Z-< 77(77+1) 77+1 Solution. 
First notice that by the symbol for an indefinite integral, we'll denote one specific primitive function (while preserving the variable), which should be understood as a so called function of the upper limit, while the lower limit is zero. Using the theorem about integration of a power series for x G (—1,1), we'll obtain „00 (-1)"+ Z^77=l 77(77+1 77+1 E (-1) 77+1 dx = /E~i((-i)n+1/^"-1^)^ HJi:Zi(-zr-1dx)dx = f(fi-x+: dx) dx = J (J dx) dx = Since E (-1) 77+1 xn \dx = / In (1 + x) + d dx, which implies The approximation of the integral over two segments of the partition between Xi-i and Xi+i is now estimated by the expression (notice we integrate the quadratic polynomial with the requested values fi-i, f, f+i in the points Xi-i, Xi, xi+1, respectively. It is not necessary to know the constant P = fi + att + Pitz dt = 2hf + -p,thA h 3 2/i/i + ^(/i+i + /i-i-2/i) h 6 '-(f+1 + f-1+4f). I ^ -^S imp I This procedure is called Simpson's rule1. The entire integral is now approximated by ^ 71—1 71—1 777+i + 2 Y^ hm + f2n)- 777=0 777=1 As with the trapezoidal rule above, the total error is estimated by = ^(&-a)/i4|/(4)l + 0(/i5), where f^ represents the upper bound for f^ (x) over the interval of integration. 6.3.12. Integrals dependent on parameters. When inte- 'St* gratmg a function f(x,y1,... ,yn) of 1 real variable x depending on further real parame-ters yi,...,yn with respect to the single variable x, the result is a function F(yi,..., yn) depending on all the parameters. Such a function F often occurs in practice. For instance, we can look for the volume or area of a body which depends on parameters, and determine the minimal and maximal values (with additional constrains as well). Often it is desirable to interchange the operations of differentiation and integration. That this can be done is proved below. We begin with an examination of continuous dependency on the parameters. 
For sake of simplicity, we shall deal with functions f(x,y) depending on two variables, x G [a,b], y G [c,d\. We say / is continuous on / [a,b] x [c, d] C K2 = C if for each z = (x,y) from the domain of / and e > 0 there is someS > such \f(w) — f(z)\ < eiiw G 0$(z). (Noticethe definition is the same as with the univariate functions, just we use the distance in the plane.) The function f(x, y) is called uniformly continuous if for each e > 0, there is S > 0 such that for any two points z, w in I C K2 = C, \z-w\ < S implies \f(z)-f(w) < e. Exactly the same argument as with univariate functions, based on the fact that every open cover of a compact set in the complex we know from the continuity of the given functions that This way of approximating the integral is attributed to the English mathematician and inventor Thomas Simpson (1710-1761). 428 CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS (-1) E 77=1 The choice x Next, ■ xn = \n(l+x) + C\, = 0 then yields 0 = In 1 + Cu i.e. d = 0. f]n(l + x) dx = | per partes u = In (1 + x) v' = 1 l+x V = X = x In (1 + x) — J dx = x In (1 + x) — J 1 — j1^ dx = x In (1 + x) - x + In (1 + x) + C2 = (x + 1) In (x + 1) -x + C2. Since the given series converges at the point x = 0 with a sum of 0, analogously as for C\ , 0 = l-lnl-0 + C2 implies that C2 = 0. In total, we have for x e (—1,1): 2—< 77(77+1 „77+1 — [x + 1) In (x + 1) - : Moreover, according to Abel's theorem (see 6.3.9), the sum of the given series equals the (potentially improper) limit of the function (x + 1) In (x + 1) — x at points —1 and 1. In our case, both limits are proper (at point 1, the function is even continuous and the value of the limit at point 1 then equals the value of the function 2 In 2 — 1.) For computing the value of the limit at point —1, we'll use L'Hospital's rule: lim (x + 1) In (a; + 1) x—y — 1+ ,• mt -, = lim —|--h 1 = lim - t—S-0+ j t—S-0+ - = lim -t + 1 = 1. 
t—S-0+ lim tha.t + 1 t—S-0+ + 1 Of course, the convergence of the series at points ±1 can be verified directly. It's even possible to directly deduce that 00 E ^T+TJ = 1 (by writing out nin+T) = k ~ Tr+T■ D 77=1 6.C.15. Sum of a series. Using theorem 6.3.5 "about the interchange of a limit and an integral of a sequence of uniformly convergent functions", we'll now add the number series 00 1 77=1 dx We'll use the fact that / ^fr 1 772" Solution. On interval (2,00), the series of functions J2n°=i x^tt converges uniformly. That is implied for example by the Weierstrass test: each of the function —k+x is decreasing on interval (2, 00), thus their values are at most 2^tt; the series X^i 2^tt ^s convergent though (it's a geometric series with quotient 5). Hence according plane contains a finite subcover, cf. Theorem 5.2.8(5), provides the following lemma (cf. the proof of Theorem 6.2.11). Lemma. Each continuous function f(x,y) on I = [a,b] x [c, d] is uniformly continuous. Now we are ready for the following important claim: Theorem. Assume f(x, y) is a function defined for all x lying in a bounded interval [a, b] and all y in a bounded interval [c, d], continuous on I = [a, b] x [c, d}. Consider the (Riemann) integral F(y)= / f(x,y)dx. J a Then the function F(y) is continuous on [c, d}. Proof. Fix a point y G [c, d], small e > 0, and choose ! .ji * „ a neighbourhood W of y such that for all y e W C [c, d] and all x e [a, b] (remember / is uniformly con- tinuous) \f(x,y) - f(x,y)\ < e. The Riemann integral of continuous functions is evaluated by approximations of finite sums (equivalently: upper, lower, or Riemann sums with arbitrary representatives see paragraph 6.2.9). The goal is to establish that the Riemann sums for the integrals with parameters y and y cannot differ much. 
In the following estimate for any partition with k intervals and representatives £j, first use the standard properties of the absolute value and then exploit the choice of W: k-l k-1 Xi+l k-l Xi+l i=0 < e(b - a). It follows that the limit values for any sequences of the partitions and representatives F(y) and F(y) cannot differ by morethane(b—a) either, so the function F is continuous. □ 6.3.13. Integrating twice. The fact that the integral F(y) = fa f(x> V)dx of a continuous function / : [a, b] x [c, d] —> R in the plane is again a continuous function F : [c, d] —> R allows us to repeat the integration and write fd ph fd fb (1) 1=1 I f(x,y)dxdy= (/ f{x,y)dx)dy. J c J a J c J a The next theorem is the simplest version of the claim known as Fubini theorem. 429 CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS to the Weierstrass test, the series of functions J2n°=i J^tt Fubini theorem converges uniformly. We can even write the resulting Theorem. Consider a continuous function f : [a, b] x function explicitly. Its value at any a e (2, oo) is the value ^ d] ^ R ^ plam r2 ^ muMple integmtion (1) is of the geometric series with quotient ^, so if we denote the weU defined and does not depend on the order of integration, limit by / (x), we have i-e., 1 - 1 1 - 1 / = / (/ f(x,y)dx)dy= / (/ f(x,y)dy)dx.. J a;™4"1 a2 1 — — x(x — 1) c a a c 77=1 x V ) By using (6.3.7) (3), we get oo ^ 00 r00 da r°° I °° i \ Proof. We know / is uniformly continuous on the prod- E^ = E / ~nTT = \ IE ~nTT ) dx uct of intervals M x Ic> d]in the Plane- Thus.for each s > 0 n=i 77=1-'2 -'2 \t7=i / there is (5 > 0 such that |/(ai, yi)—/(a2,1/2)! < e whenever 1 , , r5 1 1 , |ai - a2| < 5 and |yi - y2| < $■ ■ da = lim /---da 2 a(a — 1) iS-^oo J2 x — 1 x We kn°w both Rieman integrals in (1) exist, thus we lim [(ln(<5 - 1) - ln(<5) - ln(l) + In 2] may fix a sequence Sk of partitions of the interval [a, b] into <5^oo k subinterval [aj_i,aj] of equal size 1/fc and with repre- lim 8—too In 5-1 5 § _ l\ j = 1,... 
,k. Then we may write +ln(2) sentatives i = 1,... ,k, and similarly for the interval [c, d] with the subintervals [yi-i, yp] and representatives 77^, In ( lim ) + In 2 = In 2 1 s^t-og □ 1 = J* ^(£mi,k,y)Ub-")yy 6.C.16. Consider function/(a) = J2n°=ine nx ■ Deter- if I < 5, then mine /•In 3 rb / f(x)dx. Jin 2 Solution. Similarly as in the previous case, the Weierstrass test for uniform convergence implies that the series of functions Y^n°=i ne~nx converges uniformly on interval (In 2, In 3), since each of the functions ne~nx is lesser than tJIt on (In 2, In 3) and the series J2n°=i 3^ converges, which can be seen for example from the ratio test for convergence of series: Thus, the convergence of f(x,y)dy- -J2f(^y)i(b-a')\ i=l k ^E i=l f J Xi- /(a, y) dx - /(Ci,fc, y)\{b- a) - 1 k y)fr - ^ 71=1 71=1 ^ ' ^ — 1 □ =^o^2ifdm,k,y)dy)Ub-a) „-_i J c 6.C.17. Determine the following limit (give reasons for the _ ^m y~^y~^j^ g^Uffo _ a)(,j procedure of computation): fc/^oo ^.=i fc £ i|m /" cos (77 J ^ Clearly, the same result will appear if we swap the order of (! + -)" the integration. □ 77—>-00 430 CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS Solution. First we'll determine lim n—too )" ■ The sequence of these functions converges pointwise and we have \ n > lim n—¥oo i \ _|_ — J l lim 1 + -) (77) J_ ex It can be shown that the given sequence converges uniformly. Then according to (6.3.5), lim 71—>-00 1 + 5) ■ da; •--^ n—¥oo (1 _|_ ^ J '1 = 1 ex da; We leave the verification of uniform convergence to the reader (we only point out that the discussion is more complicated than in the previous cases). □ 6.C.18. By using differentiation, obtain the Taylor expansion of function y = cos x from the Taylor expansion of function y = sin x centered at the origin. O 6.C.19. Find the analytic function whose Taylor series is x - i a;3 + i x5 - j xr + ■ for a; G [-1,1]. O 6.C.20. 
From the knowledge of the sum of a geometric series, derive the Taylor series of function V ~ 5+2x centered at the origin. Then determine its radius of convergence, o 6.C.21. Expand the function y ~ 3^2^' x g (—§' i) to a Taylor series centered at the origin. O 6.C.22. Expand the function cos2 (a;) to a power series at the point 7r/4 and determine for which x G K this series converges, o 6.C.23. Express the function y = ex denned on the whole real axis as an infinite polynomial with terms of the form an(x — \)n and express the function y = 2X denned on R as an infinite polynomial with terms anxn. O 6.C.24. Find a function / such that for x G K, the sequence of functions n2x2-\-l ' n G N 6.3.14. Differentiation in the integrals. We are ready to discuss the differentiation of integrals with respect to parameters. The following result is ex-p?T~> tremely useful. For instance we shall use it in the next chapter when examining integral transforms. Differentiation with respect to parameters Theorem. Consider a continuous function f(x,y) defined for all xfrom a finite interval [a, b] and for all y in another finite interval [c, d], a point c G [c, d], and the integral F(y)= f f(x,y)dx. ■J a If there exists the continuous derivative -£^f on a neighbourhood of the point c, then -jj^F(c) exists as well and d /*k d dy-F{c) = Ja d-yf(x>y)\y=cdx- Proof. By the assumed continuity of all functions and the already known continuous dependence of integrals on parameters, some knowledge about univariate antiderivatives can be used. The result is then a simple consequence of the Fubini theorem. Denote G(y)= I ^f{x,y)dx, F(y) Ja ay f(x,y)dx and compute, invoking Fubini theorem, the antiderivative dz H(y) = / G(z) dz = rb / ry d \ — fix, z) dz ] dx dzjy ' I f(x, z) dx I dz = / (f(x,y) - f(x,y0)) dx J a = F(y)-F(y0). Finally, differentiating with respect to y yields as desired. □ 6.3.15. The Riemann-Stieltjes integral. 
To end this chapter, we mention briefly some other concepts of integration. Mostly we confine ourselves to remarks and comments. Readers interested in a thorough explanation can find another source. First, a modification of the Riemann integral, which is useful when discussing probability and statistics. In the discussion of integration, we summed infinitely many linearized (infinitely) small increments of the area given by a function j(x). We omitted the possibility that for different values of x we could take the increments with different weights. This 431 CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS to it. Is this convergence uniform on r? O 6.C.25. Does the series oo E ^fe, kde l£I, converge uniformly on the whole real axis? O 6.C.26. By using differentiation, obtain the Taylor expansion of function y = cos x from the Taylor expansion of function y = sin x centered at the origin. O 6.C.27. Approximate (a) cosine of ten degress with a precision of at least 10~5; (b) the definite integral J^2 x42^ with a precision of at least 10"3. o 6.C.28. Determine the power expansion centered at x0 = 0 of function X f(x) = J e*2 dt, i6R. o o 6.C.29. Find the analytic function whose Taylor series is x - ±X3 + ±X5 - j X7 + ■ for a; G [-1,1] O 6.C.30. From the knowledge of the sum of a geometric series, derive the Taylor series of function y ~ 5+2x centered at the origin. Then determine its radius of convergence, o 6.C.31. Use the derivatives of functions y = Igx and y = cotg x to find the indefinite integrals of functions (a) y = cotg2a;, x G (0,tt); (b) v= ^x^x. ze(o,f). O 6.C.32. By repeated use of integration by parts, for all x G r determine (a) fx2 sin a; dx; (b) J x2 ex dx. o 6.C.33. For example by using integration by parts, determine can be arranged at the infinitesimal level by exchanging the differential dx for ip(x)dx for some suitable function p. 
Imagine that at some point x0, the increment of the integrated quantity is given by af(x0) independently of the size of the increment of x. For example, we may observe the probability that the amount of alcohol per mille in the blood of a driver at a test will be at most x. We might like to integrate over the possible values in the interval [0, x\. With quite a large probability the value is 0. Thus for any integral sum, the segment containing zero contributes by a constant nonzero contribution, independent of the norm of the partition. We cannot simulate such behaviour by multiplying the differential dx by some real function. Instead we generalize the Riemann integral in the following way: Riemann-Stieltjes integral Choose a real nondecreasing function g on a finite interval [a,b]. For every partition E with representative & and points of the partition a = x0,x1,... ,xn = b the Riemann-Stieltjes integral sum of function f(x) as n i=l The Riemann-Stieltjes integral 1= f f(x)dg(x) exists and its value is 7, if for every real e > 0 there exists a norm of the partition S > 0 such that for all partitions E with norm smaller than S, \SS-I\ < e. For example, choose g(x) on interval [0,1] as a piece-wise constant function with finitely many discontinuities ci,..., cj; and "jumps" a, = lim g(x) — lim g(x), then the Riemann-Stieltjes integral exists for every continuous j(x) and equals f(x)dg(x) = ^2aif(a) By the same technique as used for the Riemann integral, we define upper and lower sums and upper and lower Riemann-Stieltjes integral. For bounded functions they always exist, and their values coincide if and only if the Riemann-Stieltjes integral in the above sense exists. We have already encountered problems with the Riemann integration of functions that are "too jumpy". For a 432 CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS f a In2 x dx function g(x) on a finite interval [a, b] define its variation by for x > 0. 6.C.34. 
Using integration by parts, determine ∫ (2 − x²) eˣ dx on the whole real line. ○
6.C.35. By using the substitution method, integrate
(a) ∫ (2x+5)¹⁰ dx, x ∈ ℝ;
(b) ∫ … dx, x > 0;
(c) ∫ 3x² e^{−x³} dx, x ∈ ℝ;
(d) ∫ … dx, x ∈ (−1, 1);
(e) ∫ … dx, x > 0;
(f) ∫ … dx;
(g) ∫ eˣ/(e^{2x} + 3) dx, x ∈ ℝ;
(h) ∫ sin √x dx, x > 0. ○
6.C.36. For x ∈ (0, 1), by using suitable substitutions, reduce the integrals
∫ √(x/(1−x)) dx,  ∫ dx/((x−1)√(x² + x + 1))
to integrals of rational functions. ○
6.C.37. For x ∈ (−π/2, π/2), compute ∫ dx/(1 + sin² x) using the substitution t = tg x. ○
6.C.38. How many distinct primitive functions to the function y = cos(ln x) exist on the interval (0, 10)? ○
6.C.39. Give an example of a function f on the interval I = [0, 1] that does not have a primitive function on I. ○
6.C.40. Using the Newton integral, compute
(a) ∫₀^π sin x dx;
(b) ∫₀¹ arctg x dx;
(c) ∫_{−π/4}^{3π/4} dx/(1 + sin x);
(d) ∫_{1/e}^{e} |ln x| dx. ○
6.C.41. Compute ∫ … dx. ○

= sup_Ξ Σᵢ |g(xᵢ) − g(x_{i−1})|,
where the supremum is taken over all partitions Ξ of the interval [a, b]. If the supremum is infinite, we say that g(x) has unbounded variation on [a, b]. Otherwise we say that g is a function with bounded variation on [a, b]. A function is of bounded variation if and only if it can be written as the difference of two monotonic functions.

As in the discussion of the Riemann integral, we derive the following theorem. We invite the reader to add the details of its proof. The main tools are the mean value theorem and the uniform continuity of continuous functions on closed bounded intervals. The variation of g over the interval [a, b] plays the role of the length of the interval in the earlier proofs dealing with Riemann integration.

Properties of the Riemann–Stieltjes integral
Theorem. Let f(x) and g(x) be real functions on a finite interval [a, b].
(1) Suppose g(x) is non-decreasing and continuously differentiable. Then the Riemann integral on the left-hand side and the Riemann–Stieltjes integral on the right-hand side either both exist or both do not exist.
In the former case, their values are equal:
∫_a^b f(x) g′(x) dx = ∫_a^b f(x) dg(x).
(2) If f(x) is continuous and g(x) is a function with finite variation, then the integral ∫_a^b f(x) dg(x) exists.

6.3.16. Kurzweil–Henstock integral. The last topic in this chapter is a modification of the Riemann integral which fixes the unfortunate behaviour noted in the third point of paragraph 6.3.1. That is, the limits of non-decreasing sequences of integrable functions are again integrable. Then we can interchange the order of the limit process and integration in these cases, just as with uniform convergence.

Notice the essence of the problem. Intuitively, we assume that very small sets must have zero size, so the changes of the values of functions on such sets should not change the integral. Moreover, a countable union of such sets which are "negligible for the purpose of integration" should also have zero size. We would expect, for example, that the set of rational numbers inside a finite interval has this property; hence its characteristic function should be integrable, and the value of such an integral should be zero.

6.C.42. For arbitrary real numbers a < b, determine ∫_a^b sgn x dx. Recall that sgn x = 1 for x > 0; sgn x = −1 for x < 0; and sgn 0 = 0. ○
6.C.43. Compute the definite integral ∫₀¹ x³/(x⁴ + 1) dx. ○
6.C.44. For example by repeated integration by parts, compute ∫₀^{π/2} e^{2x} cos x dx. ○
6.C.45. Determine ∫₀¹ x² e^{−x} dx. ○
6.C.46. Compute the integral ∫ … dx using the substitution method. ○
6.C.47. Compute (a) ∫ … dx; (b) ∫ … dx. ○
6.C.48. Which of the positive numbers
p := ∫₀^{π/2} cos⁷ x dx,  q := ∫₀^π cos² x dx
is bigger? ○
6.C.49. Determine the signs of these three numbers (values of integrals):
a := ∫_{−2}^{2} x³ 2ˣ dx;  b := ∫₀^π cos x dx;  c := ∫₀^{2π} … dx. ○

We say that a set A ⊂ ℝ has zero measure if for every ε > 0 there is a covering of the set A by a countable system of open intervals Jᵢ, i = 1, 2, …, such that
Σ_{i=1}^{∞} m(Jᵢ) < ε.
Here m(Jᵢ) denotes the length of the interval Jᵢ. In the sequel, the statement "a function f has the given property on a set B almost everywhere" means that f has this property at all points of B except for a subset A ⊂ B of zero measure. For example, the characteristic function of the rational numbers is zero almost everywhere. A piecewise continuous function is continuous almost everywhere.

Now we modify the definition of the Riemann integral so that restrictions on the Riemann sums are permitted, eliminating the effect of the values of the integrated function on sets of measure zero. This is achieved by a finer control of the size of the segments of the partition in the vicinity of problematic points.

A positive real function δ on a finite interval [a, b] is called a gauge. A partition Ξ of the interval [a, b] with representatives ξᵢ is δ-gauged if every segment of the partition satisfies
[x_{i−1}, xᵢ] ⊂ (ξᵢ − δ(ξᵢ), ξᵢ + δ(ξᵢ)).
In order to restrict the Riemann sums to δ-gauged partitions with representatives in the definition of the integral, it is necessary to know that for every gauge δ, a δ-gauged partition with representatives exists. Otherwise the condition in the definition could be satisfied vacuously. This statement is called Cousin's lemma. It is proved by exploiting the standard properties of suprema: for a given gauge δ on [a, b], denote by M the set of all points x ∈ [a, b] such that a δ-gauged partition with representatives can be found on [a, x]. M is nonempty and bounded, thus it has a supremum s. If s ≠ b, then there is a gauged partition with representatives whose last segment contains s in its interior, which leads to a contradiction. Thus the supremum is b; but then the gauge satisfies δ(b) > 0, and so b itself belongs to the set M.

Now we can state the following generalization of the Riemann integral. Call it the K-integral.

○ 6.C.50. Order the numbers

There are many equivalent definitions, and thus also names, for this K-integral. A complicated approach was coined by Arnaud Denjoy around 1912.
Thus the space of real functions integrable on an interval [a, b] in this sense is often called the Denjoy space. Other people involved were Nikolai Luzin and Oskar Perron, and the integral can also be found under their names. The simple and beautiful definition below was introduced in 1957 by the Czech mathematician Jaroslav Kurzweil. Much of the theory was developed by Ralph Henstock (1923–2007), an English mathematician.

A := ∫₀^{π/2} cos x sin² x dx,  B := ∫₀^{π/2} sin² x dx,  C := ∫_{−1}^{1} −x⁵ 5ˣ dx,  D := ∫ … dx + ∫ … dx + ∫ … dx
by size. ○
6.C.51. By considering the geometric meaning of the definite integral, determine
(a) ∫_{−2}^{2} |x − 1| dx;
(b) ∫_{−0.1}^{0.1} tg x dx;
(c) ∫₀^{2π} sin x dx. ○
6.C.52. Compute ∫ … x dx. ○
6.C.53. Determine ∫_{−1}^{1} x⁵ sin² x dx. ○
6.C.54. Without using the symbols of differentiation and integration, express
(d/dx) ∫_{x²}^{a} 3t² cos t dt
with variable x ∈ ℝ and a real constant a, if we differentiate with respect to x. ○

Kurzweil–Henstock integral
Definition. A function f defined on a finite interval [a, b] has a Kurzweil–Henstock integral I = ∫_a^b f(x) dx, if for every ε > 0 there exists a gauge δ such that for every δ-gauged partition with representatives Ξ, the inequality |S_Ξ − I| < ε holds for the corresponding Riemann sum S_Ξ.

6.3.17. Basic properties. When defining the K-integral, we only restricted the set of partitions whose Riemann sums are taken into account. Hence if a function is Riemann integrable, then it is K-integrable, and the two integrals are equal. For the same reason, the argumentation of Theorem 6.2.8 about the simple properties of the Riemann integral applies, verifying that the K-integral behaves in the same way. In particular, a linear combination of integrable functions c f(x) + d g(x) is again integrable, and its integral is c ∫_a^b f(x) dx + d ∫_a^b g(x) dx, etc.
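Cousin's lemma only asserts existence, but for concrete gauges a δ-fine partition can often be built by repeated bisection: keep splitting an interval in half until it fits inside the gauge interval around its left endpoint, which then serves as the tag. The sketch below (our own construction, not from the book) does this for a gauge that shrinks towards a single "problematic" point 0, the typical shape used in the proofs of the next paragraph.

```python
def delta_fine_partition(a, b, delta):
    """Build a delta-fine tagged partition of [a, b] by bisection:
    accept [c, d] with tag c once [c, d] fits inside
    (c - delta(c), c + delta(c)); otherwise split it in half.
    Terminates for the gauge below, though Cousin's lemma itself
    is proved non-constructively via suprema."""
    pieces = []
    stack = [(a, b)]
    while stack:
        c, d = stack.pop()
        if d - c < delta(c):                 # fits the gauge interval at tag c
            pieces.append((c, d, c))         # (left end, right end, tag)
        else:
            m = 0.5 * (c + d)
            stack.extend([(m, d), (c, m)])
    return pieces

# gauge shrinking towards the problematic point 0
delta = lambda x: 0.01 if x == 0 else x / 2
parts = delta_fine_partition(0.0, 1.0, delta)
print(len(parts))   # many tiny segments near 0, few large ones near 1
```

Near 0 the gauge forces very short segments, while away from 0 the segments may stay large; this is exactly the "finer control in the vicinity of problematic points" mentioned above.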
To prove this, it suffices only to think through some modifications when discussing the refined partitions, which moreover should be δ-gauged. The Kurzweil integral behaves as anticipated with respect to sets of zero measure:

Theorem. Let the function f, which is zero almost everywhere, be defined on the interval [a, b]. Then the K-integral ∫_a^b f(x) dx exists and is zero.

Proof. The proof is an illustration of the idea that the influence of values on a "small" set can be removed by a suitable choice of gauge. Denote by M the corresponding set of zero measure, outside of which f(x) = 0, and write M_k ⊂ [a, b], k = 1, 2, …, for the subset of points for which k − 1 ≤ |f(x)| < k. Because all the sets M_k have zero measure, each of them can be covered by a countable system of pairwise disjoint open intervals J_{k,i} such that the sum of their lengths is arbitrarily small, say Σᵢ m(J_{k,i}) < ε 2^{−k}/k. Define the gauge δ(x) for x ∈ J_{k,i} so that the intervals (x − δ(x), x + δ(x)) are still contained in J_{k,i}. Outside of M, δ is defined arbitrarily. For any δ-gauged partition Ξ of the interval [a, b], only the representatives lying in M contribute to the Riemann sum, so
|Σᵢ f(ξᵢ)(xᵢ − x_{i−1})| = |Σ_{ξᵢ∈M} f(ξᵢ)(xᵢ − x_{i−1})| ≤ Σ_{k=1}^{∞} k · ε 2^{−k}/k = ε,
since the segments with representatives in the intervals J_{k,i} are contained in them and thus have total length at most Σᵢ m(J_{k,i}).

The assignment m(M) = ∫_a^b χ_M(x) dx, for all sets M ⊂ [a, b] for which the characteristic function χ_M is K-integrable, has the properties of a measure. The set of such measurable sets M is closed under finite intersections and countable unions. The measure m is additive with respect to unions of at most countable systems of pairwise disjoint sets. This measure coincides with the Lebesgue measure. It is used in another concept of integration, which is extremely useful in higher-dimensional applications: the Lebesgue integral. We do not go into more details here. We remark that a real function f is Lebesgue integrable if and only if its absolute value is K-integrable.

A big advantage of the K-integral compared to other concepts is the possibility of integrating many functions which are not integrable in absolute value. Compare the concepts of convergence and absolute convergence of series.
A typical example is the sine integral over all reals. The K-integral
∫₀^∞ (sin x)/x dx
exists, while the absolute value g(x) = |sinc(x)| is not Lebesgue integrable. Such integrals are important in models for signal processing, where it is necessary to aggregate potentially infinitely many interferences cancelling each other by different signs.

6.3.20. The convergence theorems. We have dealt with uniform convergence and Riemann integrability. With the K-integral, a much nicer and stronger theorem is available. A special case is the monotone convergence theorem for uniformly bounded functions f₀(x) ≤ f₁(x) ≤ ⋯.

Dominated convergence theorem
Theorem. Suppose f₀, f₁, f₂, … are all K-integrable functions on an interval [a, b], converging pointwise to the limit function f. If there are two K-integrable functions g and h satisfying
g(x) ≤ f_n(x) ≤ h(x)
for all n ∈ ℕ and x ∈ [a, b], then f is K-integrable too, and
∫_a^b f(t) dt = lim_{n→∞} ∫_a^b f_n(t) dt.

For monotone convergence, there is a stronger result saying that a sufficient and necessary condition for the K-integrability of the pointwise limit is sup_n ∫_a^b f_n(x) dx < ∞.

This theorem could not be applied to our third example in 6.3.2. There the functions f_n have a "bump" which gets larger but narrower close to the origin; the functions cannot be dominated by an integrable function. With the Riemann integral, a similar dominated convergence theorem can be proved, except that we have to guarantee the integrability of the pointwise limit f in advance.

D. Extra examples for the whole chapter

6.D.1. Determine the significant properties of the function f(x) = −x²/(x+1), x ∈ ℝ∖{−1}. ○
By "significant properties" is meant items such as the domain, range, zeros, extrema, stationary and inflection points, points of discontinuity, intervals of increase and decrease, convexity and concavity, and asymptotes if applicable.
Briefly, all the relevant information required to sketch a graph. 6.D.2. Determine the significant properties of the function fi* 6.D.3. Determine the significant properties of the function fix) = a3-3^_+te+1. 6.D.4. Determine the significant properties of the function fix) = yrxe~x. 6.D.5. Determine the significant properties of the function fix) =arctg 6.D.6. Determine the significant properties of the function In x 6.D.10. Determine the significant properties of the function x'-2 x-1 ■ 439 o o o o Especially, find the extremes, the points of inflection and the asymptotes and sketch its graph. O 6.D.7. Determine the significant properties of the function mix2 - ix + 2) + x. In particular, find the extremes, the points of inflection and the asymptotes: O 6.D.8. Determine the significant properties of the function ix2-2)ex2-\ In particular, find the extremes, the points of inflection and the asymptotes: O 6.D.9. Determine the significant properties of the function ln(2a;2-a;-l). Among other things, find the extremes, the points of inflection and the asymptotes. O CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS Among other things, find the extremes, the points of inflection and the asymptotes. 6.D.11. Using any basic formulas, determine a primitive function for the function (a) y = \JxsJx yfx, x G (0, +00); (b) y={2x+ 3X)2 , 16I; (c) y = 77f=3P' xe (-M); COS x 1+sin x ' 6.D.13. Determine J i+x4 dx, x G . 6.D.14. Determine S x2-2x+3 dx, X G 6.D.15. For x G (0,1), compute f I f 2~\\ H—3, „ + 4 sin 2; — 5cosa:) dx. 6.D.17. Determine Sz^ldx, x>0. 6.D.18. Compute (a) Jxn lax dx, x > 0, n — 1; (b) / jf^z dx, 6.D.19. Determine o o 6.D.12. Find a primitive function for the function y = *x + vth? on the interval (-2,2). O o o o 6.D.16. Determine the indefinite integrals (a) Jarctga; dx, 168; (b) J^dx, x>0 using integration by parts. O o o 440 CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS x2+4x+8 U"L' X fc o 6.D.20. Compute the indefinite integral of the function o 6.D.21. 
Determine 6.D.22. Integrate 6.D.23. Compute the integral o / przr. dx, x=£l. o O 6.D.24. For a; G (0, f), compute (a) J sin3 a; cos4 a; dx; (b) f |+cos29x dx; v y J l+cos 2x ' (c) J 2 sin2 | da;; (d) J cos2 a; da;; (e) J cos5 a; \/sina; da;; (f) f _dx_. ^ ' J sin2 x cos4 x ' (p-) f dx . Vo/ J sin3 x ' (h) /ife- o 441 CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS 6.D.25. Compute the indefinite integral 1 x4 + 3x3 + 5x2 + Ax + 2 áx. 6.D.26. Compute the integral "í siní 1 — cos2 t dt. 6.D.27. Compute the integral ln2 dx 3ex 6.D.28. Compute: (i) Jq2 sin x sin 2x dx, (ii) J sin2 x sin 2x dx. 6.D.29. Compute the improper integral +oo (a) / T^; — oo +oo (b) / f; 0 (c) J^^dx; o 1 (d) J ln I a; I dx. -l 6.D.30. Determine 6.D.31. Compute the improper integrals 6.D.32. Compute 371-/2 J 1+sm x 0 +oo / mjdx. +oo 6.D.33. By using the substitution method, compute O o o o o o o o 442 CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS 0 _ 2 °° -(1/x) J x e~x dx; J —^— dx. -oo 0 6.D.34. Compute the integrals / "vf dx> Ie-vfdx' I e~vf dx- 6.D.35. Find the values of a G R, for which +oo (a) / ff G R; i i dx (b) / £ G R; o +oo (c) / sin(aa;) G 6.D.36. For which p, q G R is the integral o o o +oo r dx J xP(lnx)i 2 finite? O 6.D.37. Decide if the following is true: +oo (a) / ^GR; — OO +oo (b) / j% G R; — OO +oo (c) / ^0^t dx e K-i O 443 CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS Solutions of the exercises 6A.4. 6A.5.p(5){x) = 12-5!;p(6)(x) = 0. 6A.6. 212 e2x + cos x. 6AJ.fi2S> (x) = -sinx + 226 e2a:. 6A.8. All of them. 6A.9. (a)u(0) = 6m/s; (b)t = 3s, s(3) = 16m; (c)v(A) = -2m/s,a(4) = -2m/s2 6A.12. f + i (x - 1) - i (x - l)2 + i (x - l)3. 6.A./3. (a) 1 + ^; (b) 1 - ^; (c) x - ^; (d) x + ^; (e) x + x2 + ^. 6.A./4. 2 (x - 1) - (x - l)2 + f (x - l)3 - i (x - l)4. 6.A./5. 3(l+s)3 ' . „;„ 1 o ^___- 180 6-1803 ' 6A.16.x-lr;sml sa — - 3; lrma;-x)+--j- 6.A./7. ELo fr z*\ n > 8, n e N. 6.A./S. (x - l)3 + 3 (x - l)2 + (x - 1) + 4. 6.A.26. 
1 102.2 + 104.4!-6.A.27.1 — 3x + ^jx4; above the tangent line. 6A.29.(-^,^ 6A.30. It's convex on intervals (—oo, 0) and (0,1/2); concave on interval (1/2, +oo). It has only one asymptote, the line y = 7r/4 (v ±oo). 6A.31. (a) y = 0 at — oo; (b) x = 2 - horizontal, y = 1 v ±oo. 6A.32. y = 0 for x —> ±oo. 6^4.33. j/ = In 10, y = x + In 3. 6.B.I. S~„ , Sup = ^, Ss„, w = ^; yes, it is. 6.B.15. 2x3 + 3x2 - 2x - 13 + ^gff. 6.B./6.X3-ix + § + 5(3^. 6.B./7. (a) ^5 + ^5 - ^3; (b) § - 4j + -J^ + ^f^. 6.B./S. + 4 - 3J+1 1 x2—4x+13' 2 . 2o;-3 6.B.20. 4 - i + x^—x+2' 6.5.21. - - 4 + 4 - -4t. 6 11 00 A -I- B -i- £. -I- -Da:+E 1 FaJ+G 0.«.^. :c_2 -1- pj- -1- x -r (3x2+x+4yA -r 3x2+x+, 6.B.23.1 + 4 - 3 + -H- xyi x x — 2 6.B.24.(a)31n|x-2|;(b)^F. 6.B.44. y—^ for a e (0,1), 00 else. 6.C.2. s=0 converges for all real x. 2n-l 2n E(-!)K n = l converges for all real x. (2n)! 444 CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS 6.C.4. 3(-l)"+1 /(-) = £■ S converges for x e (—1,1]. 6.C.5. It's good to realize we're expanding \ ln(s). /(x) = ^(-l)8+1^(x-l)8, Converges on interval (0, 2]. 6.C.7. The error belongs to the interval (0,1/200). 6.C.8. 0 < If Ins dx < ^, s Ins dx = In4 - f. 6.C.P. Vs"ds = | (2v^2 - 1). 6C18 V°° s2" 6.C.19. y = arctg s. 6.C.20. Exactly for s e 6.C.2/. | £~ 6.C.22. (-l)i+122i / 7T\2i+l f{x) = 1/2 + i=o (2J + 1)! V 4 The series converges for all s e _R. 6.C.23. E~ o £ (* - i)"; E~ o ^r2 6.C.24. /(s) = s, s e R; yes. 6.C.25. No. "•l—'<»• z^„=0 (2n)! X ' 6.C.27. (a) 1 - Yp-^r + ist^t ; (b) § - -^r. O.C.-iO. 2^„=0 (2n + l) n! X 6.C.29. y = arctg s. 6.C.30. Exactly for s e (- §, §), we have oo _j_ = i V f-2Ts" 5+2s 5 /-^ V 5/ 71 = 0 6.C.31. (a) -cotg s - s + C; (b) tg s - cotg x + C. 6.C.32. (a) -s2 cos s + 2s sin s + 2 cos s + C; (b) e31 (s2 -2x + 2) +C. 6.C.33. ^ (2 In2 s - 2 Ins + l) + C. 6.C.34. (2x-x2)ex +C. 6.C.35. (a) (2a:+5)11 + C; (b) + C; (c) -f e^3 + C; (d) 5 arcsin3 s + C; (e) ^ + C; (f) arctg2 + C; (g) ^ arctg ^ e* j + C; (h)2 sin ^ - 2^ cos + C. 6.C.36. 
For example 1 — s = t2x gives J" ^_^2 ^4 dt; and \/s2 + s + 1 = s + y leads to J" g2^_2. 6.C.37. ^2 ^tg s) + C. 6.C.38. Infinitely many. 6.C.39. For example, / can attain a value of 1 at rational points of the interval and be zero at irrational points. 6.C.40. (a) 2; (b) f - (c) 2In (l + v% (d) 2 - f. 6.C.4/. VE-V2. 445 CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS 6.C.42. \b\ - \a\. 6.C.43. i In 2. 6.C.44. \ (e* - 2). 6.C.45.e- 5e_1. 6.C.46. §. 6.C.47. (a) 4; (b) 6.C.4S. p < g. 6.C.49. a > 0; 6 = 0; c > 0. 6.C.50. C — 1+ x—> — 1 — The function intersects the x axis only at the origin. It is positive for x < —1 and not positive for x > —1. It can be shown easily that lim f(x) = +oo, lim f(x) = —oo; x—>—OO x—S- + 00 /(*) = -f^. f"(x) = -jxTW> zeRx{-l}. This implies that / is increasing on the intervals [—2, —1), (—1,0] and decreasing on the intervals (—oo, —2], [0, +oo). At the stationary point xi = 0 it reaches a strict local maximum and at the stationary point xi = —2 it has a local minimum yi = 4. It is convex on the interval (—oo, —1) and concave on the interval (—1, +oo). It does not have a point of inflection. Thelinex = —lis a horizontal asymptote, the inclined asymptote at ±oo is the line y = -x + l. For example, /(-3) = 9/2, / (-3) = -3/4, /(l) = -1/2, / (1) = -3/4. 6D.1. The function is defined and continuous ont\ {0}. It is not odd, even, nor periodic. It is negative on the interval (1, +oo). The only point of intersection of the graph with the axes is the point [1,0]. At the origin, / has a discontinuity of the second kind and its range is is R, because lim f(x) = +oo, lim f(x) = —oo, lim f(x) = +oo. x—5-0 x—S- + 00 x—>—OO Moreover, /"(*) = £. zeBi\{0}. The only stationary point is xi = —$2. The function / is increasing on the interval [xi,0), decreasing on the intervals (—oo,xi], (0, +oo). Hence at xi it has a local minimum yi = 3/\/ 4. It has no points of inflection. It is convex on its whole domain. 
The line x = 0 is a horizontal asymptote and the line y = —x is an inclined asymptote at ±oo. 6D.3. The function is defined and continuous ont\ {1}. It is not odd, even nor periodic. The points of intersection of the graph of / with the axes are the points [l — \/2,0] and [0, —1]. At xo = 1, the function has a discontinuity of the second kind and its range is R, which follows from the limits lim f(x) = —oo, lim f(x) = +oo, lim f(x) = +oo. x—5-1— x—5-1+ x—5-ioo After the arrangement f(x) = (x-l)2 + ^-, xeM\{l}, it is not difficult to compute /(x) = 2<2^i, xeMx{l}, The only stationary point is xi = 2. The function / is increasing on the interval [2, +oo), decreasing on the intervals (—oo, 1), (1, 2]. Hence at the point xi it attains the local minimum yi = 3. It is convex on the intervals (—oo, 1 — v^2), (1, +oo) and concave on the intervals (l — y/2, l). The point x2 = 1 — \/2 is a point of inflection. The line x = 1 is a horizontal asymptote. The function does not have any inclined asymptotes. 446 CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS 6.D.4. The function is defined and continuous on R. It is not odd, even nor periodic. It attains positive values on the positive half-axis, negative values on the negative half-axis. The point of intersection of the graph of / with the axes is only at the point [0,0]. The derivative is: f(x) = -^-yZe-x, xeR\{0}, /(0)=+oo, f'(x)= i61n{0}. jv x^ yv xD The only zero point of the first derivative is the point xo = 1/3. The function / is increasing on the interval (—oo, 1/3] and decreasing on the interval [1/3, +oo). Hence at the point xo, it has an absolute maximum yo = l/V3e. Since lima;->- f(x) = —oo, its range is (—oo, j/o]. The points of inflection are xi = X2=0, X3 = It is convex on the intervals (xi, X2) and (x3, +oo), concave on the intervals (—oo, xi), (x2, X3). The only asymptote is the line y = 0 at +oo, i.e. lima;->+ f(x) = 0. 6D.5. The function is defined and continuous ont\ {2}. It is not odd, even nor periodic. 
It's positive exactly on the interval (0, 2). The only point of intersection of the graph of / with the axes is the point [0,0]. At xo = 2, the jump of size it is observed, as follows from the limits lim f(x) = f, lim f(x) = -f. We have fV) = l?^rv xeRx{2}, = x6Rx{2}. The first derivative does not have a zero point. The function / is therefore increasing at every point of its domain. Since lim /(x) = -f, lim /(x) = -f, x—S-—OO x—S- + 00 its range is the set (—it/2, it/2) \ {—7r/4}. The function / is convex on the interval (—oo, 1), concave on the intervals (1,2), (2, +oo). Thus the point xi = 1 is a point of inflection with /(l) = 7r/4. The only asymptote is the line y = —7r/4 at ±oo. 6D.6. Domain R+, global maximum x = e, point of inflection x = \/e^ , increasing on (0, e), decreasing on (e, oo), concave on (0, Ve?, convex on (Ve?, oo), asymptotes x = 0 and y = 0, lim^o f(x) = —oo, lim^oo f(x) = 0. 6D.7. Domain R \ [1,2]. Local maximum x = 1~2v/j', concave on the whole domain, asymptotes x = 1, x = 2. 6D.8. Domain R. Local minimas —1,1, maximum at 0. Even function. Points of inflection ±-75 • no asymptotes. 6D.9. Domain R \ [—5,1]. No global extremes. No inflection points, asymptotes x = — |, x = 1. 6D.10. Domain R \ {1}. No extremes. No points of inflection, convex on (—00,1), concave on (1, 00). Horizontal asymptote x = 1. Inclined asymptote y = x + 1. 6D.U. (a) £ xVx^; (b) £j + 2^ + ^; (c) =f£; (d) In (1 + sinx). &D./2.ea: + 3arcsin§. 6.D.13. \ In (1 + x4) + C. 6.D.14. 2v/2arctg ^ + C. 6D.15. In J x ~x J + I arcsinx — 4cosx — 5 sinx + C. 6.D.16. (a) xarctgx - ln(1+3;2) + C; (b) ^ + C. 6.D.17. x - 2^/x~ + 2 in (1 + ^/xj + C. 6.D.18. (a) ^ lnx - + C; (b) ^f- + C. 6.D.19. f In (x2 + 4x + 8) - i arctg s±l + C. 6^^arctg2^ + ,IJ^ly + G 6.D.21. i In £±£ + ^ arctg 2^1 + C. 6.D.22. I In |x - 1| - § In (x2 + x + 1) - ^3 arctg ^ + C. 447 CHAPTER 6. DIFFERENTIAL AND INTEGRAL CALCULUS 6D.23. In (| x - 1 | (x - 2)4) - + x + C. 6.D.24. 
(a) - + C; (b) lJf + § + C; (c) x - sinx + C; (d) f + + C; (e) f sini x - f sini x + ^ sin^ x + C; (f) 4^ + 2tgx - + C; (g) i In |tg f | - ^ + C; (h) In |tg f | + C. 6.D.25. \ ln(x2 + 2x + 2) - \ ln(x2 + x + 1) + iy^arctan (^(23:+1)V3) + C. 6.D.26. iln(|±^f). 6.D.27. -| - fin 2. 6.D.2S. CO I. (ll) 2 Sm x- 6.D.29. (a) tt; (b) +oo; (c) 20; (d) -2. 6.D.30. -oo. 6.D.32. 2^1 jr. - |; 1. 6V.34.2-h1 - \\\- e ' e e'1 ' 6.D.35. (a) a > 1; (b) a < 1; (c) a = 0. For p > 1, q e R and for p = 1, q > 1. 6.D.37. (a) true; (b) false; (c) true. 448 CHAPTER 7 Continuous tools for modelling How do we manage non-linear objects? - mainly by linear tools again... A. Orthogonal systems of functions If we want to understand three-dimensional objects, we often use (one or more) two-dimensional plane projections of them. The orthogonal projections are special in providing the closest images of the points of the objects in the chosen plane. Similarly, we can understand complicated functions in terms of simpler ones. We consider their projections into the (real) vector space generated by those chosen functions. Perhaps we recall from Chapter 2 that the orthogonal projections In this chapter, we mainly deal with applications of the tools of differential and integral calculus. We consider a variety of problems related to functions of one real variable. The tools and procedures are similar to the ones shown in Chapter 3, i.e. we consider linear combinations of selected generators and linear transformations. This chapter serves also as a useful consolidation of background material before considering functions of several variables, differential equations, and the calculus of variations. We begin by asking how to approximate a given function by linear combinations from a given set of generators. Approximation considerations lead to the general concept of distance. We illustrate the concepts on rudiments of the Fourier series. 
Our intuition from the Euclidean spaces of low dimensions is extended to infinite-dimensional spaces, particularly the concept of orthogonal projections. The next part of this chapter focuses on integral operators. These are linear mappings on functions which are defined in terms of integrals. In particular, we pay attention to convolutions and Fourier analysis. Throughout all these considerations, we work with real- or complex-valued functions of one variable. Only then do we introduce the elements of the theory of metric spaces. This should enlighten the concepts of convergence and approximation on infinite-dimensional spaces of functions. It will also cover our needs in analysis on the Euclidean spaces ℝⁿ in the next chapter.

1. Fourier series

7.1.1. Spaces of functions. As usual, we begin by choosing appropriate sets of functions to use. We want enough functions so that our models can conveniently be applied in practice. At the same time, the functions must be sufficiently "smooth" so that we can integrate and differentiate as needed. All functions are defined on an interval I = [a, b] ⊂ ℝ, where a < b. The interval may be bounded (i.e., both a and b are finite) or unbounded (i.e., either a = −∞, or b = +∞, or both).

were easily computed in terms of inner products. Now we do the same for the infinite-dimensional spaces of functions. The inner product mimics the product of scalars. Actually, the simplest way is to introduce the inner product on suitable vector spaces of functions on a given interval I = [a, b] ⊂ ℝ in this way:
⟨f, g⟩ = ∫_a^b f(x) g(x) dx.
We refer to this inner product as L₂; see 7.1.3 for more information, including complex-valued functions. Such a scalar product allows us to calculate the projections onto finite-dimensional subspaces in the same way as we did in finite-dimensional vector spaces.

7.A.1.
Given the vector subspace ⟨x², 1/x⟩ of the space of real-valued functions on the interval [1, 2], with the L₂ product, complete the function 1/x to an orthogonal basis of the subspace. Determine the orthogonal projection of the function x onto it, and compute the distance of the function x from this subspace.

Solution. First, consider the basis. It is required that the function 1/x be one of the vectors of the basis. The vector space in question is generated by two linearly independent functions, so its dimension is 2. All of its vectors are of the form a·(1/x) + b·x² for some a, b ∈ ℝ. It remains to find one more vector of the basis which is orthogonal to the function f₁ = 1/x. According to the Gram–Schmidt process, we seek it in the form f₂ = x² + k·(1/x), k ∈ ℝ. The real constant k can be determined from the condition of orthogonality:
0 = ⟨x² + k·(1/x), 1/x⟩ = ⟨x², 1/x⟩ + k ⟨1/x, 1/x⟩.
Therefore,
k = −⟨x², 1/x⟩ / ⟨1/x, 1/x⟩ = −(∫₁² x dx) / (∫₁² x⁻² dx) = −(3/2)/(1/2) = −3.
Thus, the requested orthogonal basis is (1/x, x² − 3/x). Next, we calculate the projection p_x of the function x onto this subspace (see (1) on page 110). We find
p_x = (⟨x, 1/x⟩ / ⟨1/x, 1/x⟩)·(1/x) + (⟨x, x² − 3/x⟩ / ⟨x² − 3/x, x² − 3/x⟩)·(x² − 3/x)
    = 2/x + (15/34)(x² − 3/x).

Spaces of piecewise smooth functions
We denote by S⁰ = S⁰[a, b] the set of all piecewise continuous functions on I = [a, b] with real or complex values. Otherwise put, all functions f in S⁰ = S⁰[a, b] have only finitely many points of discontinuity on bounded intervals. Moreover, f has finite one-sided left and right limits at every point of [a, b]. In particular, f is bounded on all bounded subintervals. For every natural number k ≥ 1, we consider the set of all piecewise continuous functions f such that all their derivatives up to order k (inclusive) lie in S⁰. We denote this set by S^k[a, b], or briefly S^k. Note that the derivatives of functions in S^k need not exist at all points, but their one-sided limits must exist.
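The computation in 7.A.1 is easy to verify numerically: implement the L₂ inner product by quadrature, run the Gram–Schmidt step, and project. A minimal sketch (the quadrature scheme and function names are our own choices):

```python
import numpy as np

def l2(f, g, a=1.0, b=2.0, n=20_001):
    """L2 inner product <f, g> = int_a^b f(x) g(x) dx via Simpson's rule."""
    x = np.linspace(a, b, n)
    w = np.ones(n)
    w[1:-1:2] = 4.0
    w[2:-1:2] = 2.0
    return float((b - a) / (n - 1) / 3.0 * np.sum(w * f(x) * g(x)))

f1 = lambda x: 1.0 / x
k = -l2(lambda x: x**2, f1) / l2(f1, f1)   # Gram-Schmidt coefficient, -3
f2 = lambda x: x**2 + k / x                # second basis vector, x^2 - 3/x
target = lambda x: x
c1 = l2(target, f1) / l2(f1, f1)           # coefficient on 1/x, expected 2
c2 = l2(target, f2) / l2(f2, f2)           # coefficient on x^2 - 3/x, 15/34
resid = lambda x: target(x) - c1 * f1(x) - c2 * f2(x)
dist = np.sqrt(l2(resid, resid))           # distance of x from the subspace
print(k, c1, c2, dist)
```

The distance comes out as √(1/408) ≈ 0.0495, matching the value obtained analytically in the exercise.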
If the interval I is unbounded, we often consider only those functions with compact support. A function with compact support means that it is identically zero outside some bounded interval of the real line. For unbounded intervals, we denote by Sk the subset of those functions in Sk which have compact support. Functions in 5° are always Riemann integrable on the bounded interval I = [a, b], with both \f(x) dx < oo, and |/(a;)| da; < oo. Both integrals are finite for unbounded intervals if the function / has compact support. 7.1.2. Distance between functions. The properties of limits and derivatives ensure that Sk and Sk are vector spaces. In finite-dimensional spaces, the distance between vectors can be expressed by _ means of the differences of the coordinate components. In spaces of functions, we proceed analogously and utilize the absolute value of real or complex numbers and the Euclidean distance in the following way: The Li distance of functions The Li-distance between functions / and g in 5^ is denned by 11/-slli = / \f(x) -g(x)\ dx. ■J a If g = 0, then the distance from / to the zero function, namely is called the Li-norm (ie. length, or size) of The Li-distance between functions / and g (when both are real valued) expresses the area enclosed by the graphs of these functions, regardless of which function takes greater values. We observe that ||/ — g\\ 1 > 0. Since / and g are both piecewise continuous functions, 11/ ~ 5II i = 0 only if / and 9 differ in their values at most at the points of discontinuity, and hence at only finitely many points on any bounded interval. Recall that we can change 450 CHAPTER 7. CONTINUOUS TOOLS FOR MODELLING Finally, the distance of a vector from the subspace is given by the norm of the difference between this vector and its projection. In this case: -Pxh /408 2 x = 0.0495. 15 34 da; □ 7.A.2. Consider the real vector space of functions on the interval [1,2] generated by the functions \,-^x,-^s with the L2 product. 
Complete the function \ to an orthogonal basis of this space. Determine also the projection of the functions ^ and x onto the above vector space. Then find their distances from this vector space. Solution. As in the previous exercise, use the Gram-Schmidt orthogonalization process (with the given scalar product). Successively, h{x) = -, 1 1 X 1 3 X2 ~ 4a? 1 3 a;3 ~ 2a? + is 22H = °-00167- The projection of x is 2/i + 96(—| 5760(-| ln(2) + ||)/3, while the distance is 0.035. ile the distance + ln(2))/2 + approximately The illustrations show the functions x and 1 /xA and their approximations. We can see that the function ^, which is the one whose shape is similar to that of one or more generators, is approximated much better by the projection. □ But there are plenty of inner products on functions. We mention two of them in the following exercises. 7.A.3. Verify that for each interval I = [a,b] c K and positive continuous function cu on 7, the formula the value of any function at a finite number of points, without changing the value of the integral. If in particular, / and g are both continuous on [a,b], then 11/ — 111 =0 implies j(x) = g(x) for all x G [a, b]. Indeed, if f(x0) ^ g(xo) at a point x0, a < x0 < b, and if / and g are both continuous at x0, then / and g also differ on some small neighbourhood of x0, and this neighbourhood, in turn, contributes a non-zero value into the integral, so that then 11/>o. If we have three functions /, g, and h, then, of course, b pb \f(x) — g(x) \ dx = \f(x) — h(x) + h(x) — g{x) \ dx J a pb pb < I \f(x) — h(x)\dx+ I \h(x) — g(x)| dx, J a J a so the usual triangle inequality II/-5II1 < + holds. To derive this inequality, we used only the triangle inequality for the scalars; thus it is valid for functions /,j£ cfP with complex values as well. 11 / — g 111 is not the only way to measure distance between two functions / and g. 
For another way:

The $L_2$-distance

The $L_2$-distance between functions $f$ and $g$ in $S^0$ is defined by
\[ \|f-g\|_2 = \Big( \int_a^b |f(x)-g(x)|^2 \,dx \Big)^{1/2}. \]
If $g = 0$, then $\|f\|_2$, the distance from $f$ to the zero function, is called the $L_2$-norm of $f$.

Clearly $\|f\|_2 \ge 0$. Moreover, $\|f\|_2 = 0$ implies that $f(x) = 0$ for all $x$ except for a finite set of points in any bounded interval. As above for the $L_1$-norm, $\|f-g\|_2 = 0$ only if $f$ and $g$ differ in their values at most at the points of discontinuity, and hence at only finitely many points on any bounded interval. In particular, if $f$ and $g$ are both continuous for all $x$, then $\|f-g\|_2 = 0$ implies $f(x) = g(x)$ for all $x$.

The square of $\|f\|_2$ for a function $f$ is
\[ \|f\|_2^2 = \int_a^b |f(x)|^2 \,dx, \]
and it is related to the well-defined symmetric bilinear mapping of real or complex functions to scalars
\[ \langle f, g\rangle = \int_a^b f(x)\,\overline{g(x)} \,dx, \]
since
\[ \langle f, f\rangle = \int_a^b f(x)\,\overline{f(x)}\,dx = \int_a^b |f(x)|^2 \,dx = \|f\|_2^2. \]
We can therefore use all the properties of inner products in unitary spaces as described in Chapter 3. In particular, the

\[ \langle f, g\rangle_\omega = \int_a^b f(x)g(x)\,\omega(x)\,dx \]
defines an inner product on the continuous functions on $I$. Verify that choosing $I = (-1,1)$ and $\omega(x) = (1-x^2)^{-1/2}$, the functions $T_k(x) = \cos(k\arccos(x))$, $k \in \mathbb{N}$, form an orthogonal system of polynomials with respect to this inner product. These polynomials are called Chebyshev polynomials.

Solution. We compare the defining formula with the $L_2$ inner product above: consider the substitution $x = \cos t$.

To illustrate, we apply this procedure to the three polynomials $1, x, x^2$ on the interval $[-1,1]$. Put $g_1 = 1$, and generate the sequence
\[ g_1 = 1, \qquad g_2 = x - \Big( \frac{1}{\|g_1\|^2}\int_{-1}^{1} x\cdot 1\,dx \Big) g_1 = x - 0 = x, \]
\[ g_3 = x^2 - \Big( \frac{1}{\|g_1\|^2}\int_{-1}^{1} x^2\cdot 1\,dx \Big) g_1 - \Big( \frac{1}{\|g_2\|^2}\int_{-1}^{1} x^2\cdot x\,dx \Big) g_2 = x^2 - \frac13. \]
The corresponding orthogonal basis of the space $\mathbb{R}_2[x]$ of all polynomials of degree less than three on the interval $[-1,1]$ is $1, x, x^2 - \frac13$. Rescaling by appropriate numbers so that
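The weighted orthogonality claimed in 7.A.3 can be sanity-checked numerically. This is a sketch of ours, not part of the text: under the substitution $x = \cos t$ the weighted product of $T_m$ and $T_n$ becomes $\int_0^\pi \cos(mt)\cos(nt)\,dt$, which conveniently avoids the endpoint singularity of the weight $(1-x^2)^{-1/2}$.

```python
import numpy as np

def cheb_ip(m, n, pts=200001):
    """Weighted product <T_m, T_n>_w, w(x) = (1 - x^2)^(-1/2) on (-1, 1).

    After substituting x = cos(t) the integral becomes
    integral_0^pi cos(m t) cos(n t) dt, evaluated by the trapezoidal rule.
    """
    t = np.linspace(0.0, np.pi, pts)
    y = np.cos(m * t) * np.cos(n * t)
    return float(np.sum((y[1:] + y[:-1]) / 2) * (t[1] - t[0]))

# Gram matrix of the first few Chebyshev polynomials in this product.
orth = [[cheb_ip(m, n) for n in range(4)] for m in range(4)]
# Off-diagonal entries vanish; the diagonal is pi for m = 0 and pi/2 otherwise.
```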
the basis elements all have length 1 yields the orthonormal basis

Moreover, the identity $\cos((k+1)t) + \cos((k-1)t) = 2\cos t\,\cos(kt)$ translates to $T_{k+1}(x) = 2x\,T_k(x) - T_{k-1}(x)$. This is the recurrent definition of Chebyshev polynomials. That all $T_k(x)$ are polynomials now follows by induction. □

7.A.4. Show that the choice of the weight function $\omega(x) = e^{-x}$ and the interval $I = [0,\infty)$ in the previous example leads to an inner product for which the Laguerre polynomials
\[ L_n(x) = \sum_{k=0}^{n} \binom{n}{k} \frac{(-1)^k}{k!}\, x^k \]
form an orthonormal system. O

7.A.5. Check that the orthonormal systems obtained in the previous two examples coincide with the result of the corresponding Gram-Schmidt orthogonalisation procedure applied to the system $1, x, x^2, \dots, x^n, \dots$, using the inner products $\langle\,,\,\rangle_\omega$, possibly only up to signs. O

Given a finite-dimensional vector (sub)space of functions, calculate first the orthogonal (or orthonormal) basis of this subspace by the Gram-Schmidt orthogonalization process (see 2.3.20). Then determine the orthogonal projection as before. See the formula (1) at page 110.

7.A.6. Given the vector subspace $\langle \sin(x), x\rangle$ of the space of real-valued functions on the interval $[0,\pi]$, complete the function $x$ to an orthogonal basis of the subspace and determine the orthogonal projection of the function $\sin(x)$ onto it. O

7.A.7. Given the vector subspace $\langle \cos(x), x\rangle$ of the space of real-valued functions on the interval $[0,\pi]$, complete the function $\cos(x)$ to an orthogonal basis of the subspace and determine the orthogonal projection of the function $\sin(x)$ onto it. O

B. Fourier series

Having a countable system of orthogonal functions $T_k$, $k = 0, 1, \dots$, as in the examples above, we may sequentially project a given function $f$ to the subspaces $V_k = \langle T_0, \dots, T_k\rangle$. If the limit of these projections exists, this determines a series built on linear combinations of $T_k$. Under additional conditions, this should allow us to differentiate or integrate the function $f$ in a similar way, as we did with the power series.
We consider one particular orthogonal system of periodic functions, namely that of J. B. J. Fourier. The periodic functions are those describing periodic processes, i.e. $f(t+T) = f(t)$ for some positive constant $T \in \mathbb{R}$,

$h_1, h_2, h_3$. For example,
\[ h_1 = \frac{1}{\sqrt{2}}, \qquad h_2 = \sqrt{\frac{3}{2}}\; x, \qquad h_3 = \sqrt{\frac{5}{8}}\,(3x^2 - 1). \]
We could easily continue this procedure in order to find orthonormal generators of $\mathbb{R}_4[x]$. The resulting polynomials are called Legendre polynomials. Considering all Legendre polynomials $h_i$, $i = 0, \dots$, we have an infinite orthonormal set of generators such that polynomials of all degrees are uniquely expressed as their finite linear combinations.

7.1.4. Orthogonal systems of functions. Generalizing the latter example, suppose we have three polynomials $h_1, h_2, h_3$ forming an orthonormal set. For any polynomial $h$, we can put
\[ H = \langle h, h_1\rangle h_1 + \langle h, h_2\rangle h_2 + \langle h, h_3\rangle h_3. \]
We claim that $H$ is the (unique) polynomial which minimizes the $L_2$-distance $\|h - H\|_2$. See 3.4.3. The coefficients for the best approximation of a given function by a function from a selected subspace are obtained by the integration introduced in the definition of the inner product.

This example of computing the best approximation of $h$ by a linear combination of the given orthonormal generators suggests the following generalization:

Orthogonal systems of functions

Every (at most) countable system of linearly independent functions in $S^0[a,b]$ such that the inner product of each pair of distinct functions is zero is called an orthogonal system of functions. If all the functions $f_n$ in the sequence are pairwise orthogonal, and if for all $n$ the norm $\|f_n\|_2 = 1$, we talk about an orthonormal system of functions.

Consider an orthogonal system of functions $f_n \in S^0[a,b]$ and suppose that for (real or complex) constants $c_n$, the series
\[ F(x) = \sum_{n=0}^{\infty} c_n f_n(x) \]
converges uniformly on a finite interval $[a,b]$. Notice that the limit function $F(x)$ does not need to belong to $S^0[a,b]$, but this is not our concern now.
By uniform convergence, the inner product $\langle F, f_n\rangle$ can be expressed in terms of the particular summands (see the corollary 6.3.7), obtaining
\[ \langle F, f_n\rangle = \sum_{m=0}^{\infty} c_m \int_a^b f_m(x)\,\overline{f_n(x)}\,dx = c_n \|f_n\|_2^2, \]

called the period of $f$, and all $t \in \mathbb{R}$. One of the fundamental periodic processes which occur in applications is a general simple harmonic oscillation in mechanics. The function $f(t)$ which describes the position of the point mass on the line in the time $t$ is of the form

(1) $\quad f(t) = a\sin(\omega t + b)$

for certain constants $a$, $\omega > 0$, $b \in \mathbb{R}$. In the diagram on the left, $f(t) = \sin(t+1)$ and on the right, $f(t) = \sin(4t+4)$.

Applying the standard trigonometric formula $\sin(\alpha+\beta) = \cos\alpha\sin\beta + \sin\alpha\cos\beta$, with $\alpha, \beta \in \mathbb{R}$, we write the function $f(t)$ alternatively as

(2) $\quad f(t) = c\cos(\omega t) + d\sin(\omega t)$, where $c = a\sin b$, $d = a\cos b$.

7.B.1. Show that the system of functions $1, \sin(x), \cos(x), \dots, \sin(nx), \cos(nx), \dots$ is orthogonal with respect to the $L_2$ inner product on the interval $I = [-\pi,\pi]$. O

Building an orthogonal system of periodic functions $\sin(nx)$ and $\sin(nx + \pi/2) = \cos(nx)$ leads to the classical Fourier series.¹ In application problems, we often meet the superposition of different harmonic oscillations. The superposition of finitely many harmonic oscillations is expressed by sums of functions of the form
\[ f_n(x) = a_n\cos(n\omega x) + b_n\sin(n\omega x) \]
for $n \in \{0, 1, \dots, m\}$. These particular functions have prime period $2\pi/(n\omega)$. Therefore, their sum

(3) $\quad \dfrac{a_0}{2} + \displaystyle\sum_{n=1}^{m} \big(a_n\cos(n\omega x) + b_n\sin(n\omega x)\big)$

is a periodic function with period $T = 2\pi/\omega$.

¹The Fourier series are named in honour of the French mathematician and physicist Jean B. J. Fourier, who was the first to apply the Fourier series in practice in his work from 1822 devoted to the issue of heat conduction (he began to deal with this issue in 1804-1811). He introduced mathematical methods which even nowadays lie at the core of theoretical physics.
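The orthogonality asked for in 7.B.1 is easy to confirm numerically. The sketch below is ours (the helper `ip` is an assumption, not the book's notation); it builds the Gram matrix of the first few functions of the trigonometric system on $[-\pi,\pi]$ and checks that it is diagonal, with $\|1\|_2^2 = 2\pi$ and $\|\sin(nx)\|_2^2 = \|\cos(nx)\|_2^2 = \pi$.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 400001)
dx = x[1] - x[0]

def ip(u, v):
    # Trapezoidal approximation of the L2 product on [-pi, pi].
    y = u * v
    return float(np.sum((y[1:] + y[:-1]) / 2) * dx)

# The system 1, sin x, cos x, ..., sin nx, cos nx for a few n.
funcs = [np.ones_like(x)]
for n in range(1, 4):
    funcs += [np.sin(n * x), np.cos(n * x)]

gram = np.array([[ip(f, g) for g in funcs] for f in funcs])
# Off-diagonal entries vanish; the diagonal carries 2*pi, pi, pi, ...
```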
He himself did not pay much attention to rigorous proofs.

since each term in the sum is 0 except when $m = n$. Exactly as in the example above, each finite sum $\sum_{n=0}^{k} c_n f_n(x)$ is the best approximation of the function $F(x)$ among the linear combinations of the first $k+1$ functions $f_n$ in the orthogonal system.

Actually, we can generalize the definition further to any vector space of functions with an inner product. See the exercise 7.A.3 for such an example. For the sake of simplicity we confine ourselves to the $L_2$ distance, but the reader can check that the proofs work in general.

We extend our results from finite-dimensional spaces to infinite-dimensional ones. Instead of finite linear combinations of base vectors, we have infinite series of pairwise orthogonal functions. The following theorem gives us a transparent and very general answer to the question as to how well the partial sums of such a series can approximate a given function:

7.1.5. Theorem. Let $f_n$, $n = 1, 2, \dots$, be an orthogonal sequence of (real or complex) functions in $S^0[a,b]$ and let $g \in S^0[a,b]$ be an arbitrary function. Put
\[ c_n = \|f_n\|_2^{-2} \int_a^b g(x)\,\overline{f_n(x)}\,dx. \]
Then
(1) For any fixed $n \in \mathbb{N}$, the expression which has the least $L_2$-distance from $g$ among all linear combinations of the functions $f_1, \dots, f_n$ is
\[ h_n = \sum_{j=1}^{n} c_j f_j(x). \]
(2) The series $\sum |c_n|^2 \|f_n\|_2^2$ always converges, and moreover
\[ \sum_{n=1}^{\infty} |c_n|^2 \|f_n\|_2^2 \le \|g\|_2^2. \]
(3) The equality $\sum_{n=1}^{\infty} |c_n|^2 \|f_n\|_2^2 = \|g\|_2^2$ holds if and only if $\lim_{n\to\infty} \|g - h_n\|_2 = 0$.

Before presenting the proof, we consider the meaning of the individual statements of this theorem. Since we are working with an arbitrarily chosen orthogonal system of functions, we cannot expect that all functions can be approximated by linear combinations of the functions $f_i$. For instance, if we consider the case of Legendre orthogonal polynomials on the interval $[-1,1]$ and restrict ourselves to even degrees only, surely we can approximate only even functions in a reasonable way.
Nevertheless, the first statement of the theorem says that the best approximation possible (in the $L_2$-distance) is by the partial sums as described. The second and third statements can be perceived as an analogy to the orthogonal projections onto subspaces in terms of Cartesian coordinates. Indeed, if for a given function $g$ the series $F(x) = \sum_{n=1}^{\infty} c_n f_n(x)$ converges pointwise, then the function $F(x)$ is, in a certain sense, the orthogonal projection of $g$ into the vector subspace of all such series.

454 CHAPTER 7. CONTINUOUS TOOLS FOR MODELLING

7.B.2. Show that the system of functions $1, \sin(n\omega x), \cos(n\omega x)$, for all positive integers $n$, is orthogonal with respect to the $L_2$ inner product on the interval $[-\pi/\omega, \pi/\omega]$. O

When projecting a given function orthogonally to the subspace of functions (3), the key concept is the set of Fourier coefficients $a_n$ and $b_n$, $n \in \mathbb{N}$.

7.B.3. Find the Fourier series for the periodic extension of the function
(a) $g(x) = 0$, $x \in [-\pi, 0)$, $g(x) = \sin x$, $x \in [0, \pi)$;
(b) $g(x) = |x|$, $x \in [-\pi, \pi)$;
(c) $g(x) = 0$, $x \in [-1, 0)$, $g(x) = x + 1$, $x \in [0, 1)$.

Solution. Before starting the computations, consider the illustrations of the resulting approximations of the given functions. The first two display the finite approximation in the cases (a) and (b) up to $n = 5$, while the third illustration for the case (c) goes up to $n = 20$. Clearly the approximation of the discontinuous function is much slower, and it also demonstrates the Gibbs phenomenon. This is the overshooting in the jumps, which is proportional to the magnitudes of the jumps.

The case (a). Direct calculation gives (using formulae from 7.1.6)
\[ a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} g(x)\,dx = \frac{1}{\pi}\int_{-\pi}^{0} 0\,dx + \frac{1}{\pi}\int_{0}^{\pi} \sin x\,dx = \frac{1}{\pi}\big[-\cos x\big]_0^{\pi} = \frac{2}{\pi}. \]

The second statement is called Bessel's inequality, and it is an analogy of the finite-dimensional proposition that the size of the orthogonal projection of a vector cannot be larger than the original vector.
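The coefficients in 7.B.3(a) can be cross-checked numerically. The following sketch (ours; the helper `coef` is an assumption) samples $g$ on $[-\pi,\pi]$ and evaluates the coefficient integrals with the trapezoidal rule; the values should match $a_0 = 2/\pi$, $b_1 = 1/2$, and $a_n = ((-1)^n+1)/(\pi(1-n^2))$ derived in the text.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 400001)
g = np.where(x >= 0, np.sin(x), 0.0)   # g = 0 on [-pi, 0), sin x on [0, pi)

def coef(w):
    # (1/pi) * integral of g(x) * w(x) over [-pi, pi], trapezoidal rule.
    y = g * w
    return float(np.sum((y[1:] + y[:-1]) / 2) * (x[1] - x[0])) / np.pi

a = [coef(np.cos(n * x)) for n in range(6)]
b = [coef(np.sin(n * x)) for n in range(6)]
# Expected: a[0] = 2/pi, b[1] = 1/2, a[2] = -2/(3*pi), odd a_n and n != 1 b_n vanish.
```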
The equality from the third statement is called Parseval's theorem, and it says that if a given vector does not decrease in length by the orthogonal projection onto a given subspace, then it belongs to this subspace.

On the other hand, the theorem does not claim that the partial sums of the considered series need to converge pointwise to some function. There is no analogy to this phenomenon in the finite-dimensional world. In general, the series $F(x)$ need not be convergent for any given $x$, even under the assumption of the equality in (3). However, if the series $\sum_{n=1}^{\infty} |c_n|$ converges to a finite value, and if all the functions $f_n$ are bounded uniformly on $I$, then the series $F(x) = \sum_{n=1}^{\infty} c_n f_n(x)$ converges at every point $x$. Yet it need not converge to the function $g$ everywhere. We return to this problem later.

The proof of all of the three statements of the theorem is similar to the case of finite-dimensional Euclidean spaces. That is to be expected, since the bounds for the distances of $g$ from the partial sums are constructed in the finite-dimensional linear hull of the functions concerned:

Proof of theorem 7.1.5. Choose any linear combination $f = \sum_{n=1}^{k} a_n f_n$ and calculate its distance from $g$. We obtain
\[ \Big\|g - \sum_{n=1}^{k} a_n f_n\Big\|_2^2 = \int_a^b \Big|g(x) - \sum_{n=1}^{k} a_n f_n(x)\Big|^2 dx = \|g\|_2^2 - \sum_{n=1}^{k}\big(a_n\overline{c_n} + \overline{a_n}c_n\big)\|f_n\|_2^2 + \sum_{n=1}^{k} |a_n|^2\|f_n\|_2^2 \]
\[ = \|g\|_2^2 + \sum_{n=1}^{k} \|f_n\|_2^2\big(|c_n - a_n|^2 - |c_n|^2\big). \]
Since we are free to choose $a_n$ as we please, we minimize the last expression by choosing $a_n = c_n$ for each $n$. This completes the proof of the first statement. With this choice of $a_n$, we obtain Bessel's identity
\[ \Big\|g - \sum_{n=1}^{k} c_n f_n\Big\|_2^2 = \|g\|_2^2 - \sum_{n=1}^{k} |c_n|^2\|f_n\|_2^2. \]
Since the left-hand side is non-negative, it follows that
\[ \sum_{n=1}^{k} |c_n|^2\|f_n\|_2^2 \le \|g\|_2^2. \]
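Bessel's identity just derived can be illustrated numerically. This sketch is ours, not the book's: it uses the trigonometric system on $[-\pi,\pi]$ and $g(x) = |x|$, whose cosine coefficients are $a_0 = \pi$ and $a_n = 2((-1)^n - 1)/(n^2\pi)$, and checks that the squared $L_2$-error of a partial sum equals $\|g\|_2^2$ minus the Bessel sum (here $\pi a_0^2/2 + \pi\sum a_n^2$, since $\|1\|_2^2 = 2\pi$ and $\|\cos(nx)\|_2^2 = \pi$).

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 200001)
dx = x[1] - x[0]
g = np.abs(x)

def integ(y):
    return float(np.sum((y[1:] + y[:-1]) / 2) * dx)

norm_g2 = integ(g * g)                              # ||g||_2^2 = 2*pi^3/3
a0 = integ(g) / np.pi
an = [integ(g * np.cos(n * x)) / np.pi for n in range(1, 8)]

S = a0 / 2 * np.ones_like(x)                        # partial Fourier sum
for n, a_n in enumerate(an, start=1):
    S += a_n * np.cos(n * x)

err2 = integ((g - S) ** 2)                          # ||g - S||_2^2
bessel_rhs = norm_g2 - np.pi * (a0**2 / 2 + sum(a_n**2 for a_n in an))
```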
For $n \in \mathbb{N}\setminus\{1\}$,
\[ a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} g(x)\cos(nx)\,dx = \frac{1}{\pi}\int_{0}^{\pi} \sin x\cos(nx)\,dx = \frac{1}{2\pi}\int_{0}^{\pi}\big(\sin((1+n)x) + \sin((1-n)x)\big)\,dx \]
\[ = \frac{1}{2\pi}\Big[-\frac{\cos((1+n)x)}{1+n} - \frac{\cos((1-n)x)}{1-n}\Big]_0^{\pi} = \frac{1}{2\pi}\Big(\frac{1-\cos((1+n)\pi)}{1+n} + \frac{1-\cos((1-n)\pi)}{1-n}\Big) = \frac{(-1)^n + 1}{\pi\,(1-n^2)}, \]
while
\[ a_1 = \frac{1}{2\pi}\int_0^{\pi}\sin(2x)\,dx = 0, \qquad b_1 = \frac{1}{\pi}\int_0^{\pi}\sin^2 x\,dx = \frac{1}{2\pi}\int_0^{\pi}\big(1-\cos(2x)\big)\,dx = \frac12, \]
and
\[ b_n = \frac{1}{\pi}\int_0^{\pi}\sin x\sin(nx)\,dx = \frac{1}{2\pi}\int_0^{\pi}\big(\cos((1-n)x) - \cos((1+n)x)\big)\,dx = \frac{1}{2\pi}\Big[\frac{\sin((1-n)x)}{1-n} - \frac{\sin((1+n)x)}{1+n}\Big]_0^{\pi} = 0 \]
for $n \in \mathbb{N}\setminus\{1\}$. Thus, we arrive at the Fourier series
\[ \frac{1}{\pi} + \frac{\sin x}{2} + \frac{1}{\pi}\sum_{n=2}^{\infty} \frac{(-1)^n + 1}{1-n^2}\cos(nx). \]
Since $(-1)^n + 1 = 0$ when $n$ is odd, and $= 2$ when $n$ is even, we can put $n = 2m$ to obtain the series
\[ \frac{1}{\pi} + \frac{\sin x}{2} - \frac{2}{\pi}\sum_{m=1}^{\infty} \frac{\cos(2mx)}{4m^2 - 1}. \]

The case (b). The given function is of a sawtooth-shaped oscillation. Its expression as a Fourier series is very important in practice. Since the function $g$ is even on $(-\pi,\pi)$, it is immediate that $b_n = 0$ for all $n \in \mathbb{N}$. Therefore, it suffices to determine $a_n$ for $n \in \mathbb{N}$:
\[ a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} g(x)\,dx = \frac{2}{\pi}\int_0^{\pi} x\,dx = \frac{2}{\pi}\Big[\frac{x^2}{2}\Big]_0^{\pi} = \pi. \]
For other $n \in \mathbb{N}$, use integration by parts to get
\[ a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} g(x)\cos(nx)\,dx = \frac{2}{\pi}\int_0^{\pi} x\cos(nx)\,dx = \frac{2}{\pi}\Big[\frac{x}{n}\sin(nx)\Big]_0^{\pi} - \frac{2}{n\pi}\int_0^{\pi}\sin(nx)\,dx = \frac{2}{n^2\pi}\big[\cos(nx)\big]_0^{\pi} = \frac{2}{n^2\pi}\big((-1)^n - 1\big). \]
So $a_n = -\frac{4}{n^2\pi}$ for $n$ odd, $a_n = 0$ for $n$ even.

Let $k \to \infty$. Since every non-decreasing sequence of real numbers which is bounded from above has a limit, it follows that
\[ \sum_{n=1}^{\infty} |c_n|^2\|f_n\|_2^2 \le \|g\|_2^2, \]
which is Bessel's inequality. If equality occurs in Bessel's inequality, then statement (3) follows straight from the definitions and the Bessel's identity proved above. □

An orthogonal system of functions is called a complete orthogonal system on an interval $I = [a,b]$ for some space of functions on $I$ if and only if Parseval's equality holds for every function $g$ in this space.

7.1.6. Fourier series.
The coefficients $c_n$ from the previous theorem are called the Fourier coefficients of a given function in the (abstract) Fourier series. The previous theorem indicates that we are able to work with countable orthogonal systems of functions $f_n$ in much the same way as with finite orthogonal bases of vector spaces. There are, however, essential differences:

• It is not easy to decide what the set of convergent or uniformly convergent series $F(x) = \sum c_n f_n$ looks like.
• For a given integrable function, we can find only the "best approximation possible" by such a series $F(x)$ in the sense of the $L_2$-distance.

In the case when we have an orthonormal system of functions $f_n$, the formulae mentioned in the theorem are simpler, but still there is no further improvement in the approximations. The choice of an orthogonal system of functions for use in practice should address the purpose for which the approximations are needed. The name "Fourier series" itself refers to the following choice of a system of real-valued functions:

Fourier's orthogonal system

\[ 1, \sin x, \cos x, \sin 2x, \cos 2x, \dots, \sin nx, \cos nx, \dots \]

An elementary exercise on integration by parts shows that this is an orthogonal system of functions on the interval $[-\pi, \pi]$. These functions are periodic with common period $2\pi$ (see the definition below).

"Fourier analysis", which builds upon this orthogonal system, allows us to work with all piecewise continuous periodic functions with extraordinary efficiency. Since many physical, chemical, and biological data are perceived, received, or measured, in fact, by frequencies of the signals (the measured quantities), it is really an essential mathematical tool. Biologists and engineers often use the word "signal" in the sense of "function".

This determines the Fourier series of a function of sawtooth-shaped oscillation as
\[ \frac{\pi}{2} + \frac{2}{\pi}\sum_{n=1}^{\infty} \frac{(-1)^n - 1}{n^2}\cos(nx) = \frac{\pi}{2} - \frac{4}{\pi}\sum_{n=1}^{\infty} \frac{\cos((2n-1)x)}{(2n-1)^2} \]

Periodic functions
\[ = \frac{\pi}{2} - \frac{4}{\pi}\Big(\cos x + \frac{\cos(3x)}{3^2} + \frac{\cos(5x)}{5^2} + \cdots\Big). \]
This series could have been found by an easier means, namely by integrating the Fourier series of Heaviside's function (see the "square wave function" in 7.1.9).

The case (c). The period for this function is $T = 2$, and so $\omega = 2\pi/T = \pi$. Use the more general formulae from 7.1.6, namely
\[ a_0 = \frac{2}{T}\int_{x_0}^{x_0+T} g(x)\,dx = \int_{-1}^{0} 0\,dx + \int_{0}^{1}(x+1)\,dx = \frac{3}{2}, \]
\[ a_n = \frac{2}{T}\int_{x_0}^{x_0+T} g(x)\cos(n\omega x)\,dx = \int_{0}^{1}(x+1)\cos(n\pi x)\,dx = \frac{(-1)^n - 1}{n^2\pi^2}, \]
\[ b_n = \frac{2}{T}\int_{x_0}^{x_0+T} g(x)\sin(n\omega x)\,dx = \int_{0}^{1}(x+1)\sin(n\pi x)\,dx = \frac{1 - 2(-1)^n}{n\pi}. \]
The calculation of $a_0$ was simple and needs no further comment. As for determining the integrals at $a_n$ and $b_n$, it sufficed to use integration by parts once. Thus, the desired Fourier series is
\[ \frac{3}{4} + \sum_{n=1}^{\infty}\Big(\frac{(-1)^n - 1}{n^2\pi^2}\cos(n\pi x) + \frac{1 - 2(-1)^n}{n\pi}\sin(n\pi x)\Big). \]
Some refinements of the expression are available. For instance, for $n \in \mathbb{N}$, $a_n = -\frac{2}{n^2\pi^2}$ for $n$ odd, $a_n = 0$ for $n$ even, and, similarly, $b_n = \frac{3}{n\pi}$ for $n$ odd, $b_n = -\frac{1}{n\pi}$ for $n$ even. □

A real or complex valued function $f$ defined on $\mathbb{R}$ is called a periodic function with period $T > 0$ if $f(x+T) = f(x)$ for every $x \in \mathbb{R}$. It is evident that sums and products of periodic functions with the same period are again periodic functions with the same period.

We note that the integral $\int_{x_0}^{x_0+T} f(x)\,dx$ of a periodic function $f$ on an interval whose length equals the period $T$ is independent of the choice of $x_0 \in \mathbb{R}$. To prove it, it is enough to suppose $0 \le x_0 < T$, using a translation by a suitable multiple of $T$. Then
\[ \int_{x_0}^{x_0+T} f(x)\,dx = \int_{x_0}^{T} f(x)\,dx + \int_{T}^{x_0+T} f(x)\,dx = \int_{x_0}^{T} f(x)\,dx + \int_{0}^{x_0} f(x)\,dx = \int_{0}^{T} f(x)\,dx. \]

Fourier series

The series of functions
\[ F(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\big(a_n\cos(nx) + b_n\sin(nx)\big) \]
from the theorem 7.1.5, with coefficients
\[ a_n = \frac{1}{\pi}\int_{x_0}^{x_0+2\pi} g(x)\cos(nx)\,dx, \qquad b_n = \frac{1}{\pi}\int_{x_0}^{x_0+2\pi} g(x)\sin(nx)\,dx, \]

In the next exercise we show that the calculation of the Fourier series does not always require an integration.
Especially in the case when the function $g$ is a sum of products (powers) of the functions $y = \sin(mx)$, $y = \cos(nx)$ for $m, n \in \mathbb{N}$, one can rewrite $g$ as a finite linear combination of the basic functions.

This series is called the Fourier series of a function $g$ on the interval $[x_0, x_0 + 2\pi]$. The coefficients $a_n$ and $b_n$ are called the Fourier coefficients of the function $g$.

If $T$ is the time taken for one revolution of an object moving round the unit circle at constant speed, then that constant speed is $\omega = 2\pi/T$. In practice, we often want to work with Fourier series with an arbitrary primary period $T$ of the functions, not just $2\pi$. Then we should employ the functions $\cos(\omega n x)$, $\sin(\omega n x)$, where $\omega = \frac{2\pi}{T}$. By the substitution $t = \omega x$, we can verify the orthogonality of the new system of functions and recalculate the coefficients in the Fourier series $F(x)$ of a function $g$ on the interval $[x_0, x_0+T]$:
\[ F(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\big(a_n\cos(n\omega x) + b_n\sin(n\omega x)\big), \]
\[ a_n = \frac{2}{T}\int_{x_0}^{x_0+T} g(x)\cos(n\omega x)\,dx, \qquad b_n = \frac{2}{T}\int_{x_0}^{x_0+T} g(x)\sin(n\omega x)\,dx. \]

7.B.4. Determine the Fourier coefficients of the function
(a) $g(x) = \sin(2x)\cos(3x)$, $x \in [-\pi,\pi]$;
(b) $g(x) = \cos^4 x$, $x \in [-\pi,\pi]$.

Solution. Case (a). Using suitable trigonometric identities,
\[ \sin(2x)\cos(3x) = \tfrac12\big(\sin(2x+3x) + \sin(2x-3x)\big) = \tfrac12\sin(5x) - \tfrac12\sin x. \]
It follows that the Fourier coefficients are all zero except for $b_1 = -1/2$, $b_5 = 1/2$.

Case (b). Similarly, from
\[ \cos^4 x = \big(\cos^2 x\big)^2 = \Big(\frac{1+\cos(2x)}{2}\Big)^2 = \tfrac14\big(1 + 2\cos(2x) + \cos^2(2x)\big) = \tfrac14\Big(1 + 2\cos(2x) + \frac{1+\cos(4x)}{2}\Big) = \tfrac38 + \tfrac12\cos(2x) + \tfrac18\cos(4x), \quad x \in \mathbb{R}. \]
Hence $a_0 = 3/4$, $a_2 = 1/2$, $a_4 = 1/8$, and the other coefficients are all zero. □

7.1.7. The complex Fourier coefficients. Parametrize the unit circle in the form
\[ e^{i\omega t} = \cos\omega t + i\sin\omega t. \]
For all integers $m, n$ with $m \ne n$,
\[ \int_{-\pi}^{\pi} e^{imx}\,e^{-inx}\,dx = \int_{-\pi}^{\pi} e^{i(m-n)x}\,dx \]

7.B.5.
Given the Fourier series of a function $f$ on the interval $[-\pi, \pi]$ with coefficients $a_m$, $m \in \mathbb{N}$, and $b_n$, $n \in \mathbb{Z}^+$, prove the following statements:
(a) If $f(x) = f(x+\pi)$, $x \in [-\pi, 0]$, then $a_{2k-1} = b_{2k-1} = 0$ for every $k \in \mathbb{N}$.
(b) If $f(x) = -f(x+\pi)$, $x \in [-\pi, 0]$, then $a_0 = a_{2k} = b_{2k} = 0$ for every $k \in \mathbb{N}$.

Solution. The case (a). For any $k \in \mathbb{N}$, the statement can be proved directly by calculations, but we provide here a conceptual explanation. The definition of the function $f$ ensures it is periodic with period $\pi$. Thus we may write its Fourier series on the shorter interval $[-\frac{\pi}{2}, \frac{\pi}{2}]$ as follows:
\[ f(t) = \frac{\tilde a_0}{2} + \sum_{n=1}^{\infty}\big(\tilde a_n\cos(2nt) + \tilde b_n\sin(2nt)\big). \]
Clearly this must be also the Fourier series of the same function $f$ over the interval $[-\pi, \pi]$, and so the claim is proved. Alternatively, if
\[ f(x) = a_0/2 + \sum_{n=1}^{\infty}\big(a_n\cos(nx) + b_n\sin(nx)\big), \]
then
\[ f(x+\pi) = a_0/2 + \sum_{n=1}^{\infty}\big(a_n\cos(nx+n\pi) + b_n\sin(nx+n\pi)\big) = a_0/2 + \sum_{n=1}^{\infty}(-1)^n\big(a_n\cos(nx) + b_n\sin(nx)\big). \]
The two series are the same only when the odd coefficients are zero.

\[ = \frac{1}{i(m-n)}\Big[e^{i(m-n)x}\Big]_{-\pi}^{\pi} = 0. \]
Thus for $m \ne n$, the integral $\langle e^{imx}, e^{inx}\rangle = 0$.

Fourier's complex orthogonal system

\[ \dots, e^{-inx}, \dots, e^{-2ix}, e^{-ix}, 1, e^{ix}, e^{2ix}, \dots, e^{inx}, \dots \]

Note that if $m = n$, then
\[ \int_{-\pi}^{\pi} e^{imx}\,e^{-imx}\,dx = \int_{-\pi}^{\pi} 1\,dx = 2\pi. \]
The orthogonality of this system can be easily used to recover the orthogonality of the real Fourier system. Rewrite the above result as
\[ \int_{-\pi}^{\pi}(\cos mx + i\sin mx)(\cos nx - i\sin nx)\,dx = 0. \]
By expanding and separating into real and imaginary parts we get both
\[ \int_{-\pi}^{\pi}(\cos mx\cos nx + \sin mx\sin nx)\,dx = 0, \qquad \int_{-\pi}^{\pi}(\sin mx\cos nx - \cos mx\sin nx)\,dx = 0. \]
By replacing $n$ with $-n$, we have also
\[ \int_{-\pi}^{\pi}(\cos mx\cos nx - \sin mx\sin nx)\,dx = 0, \qquad \int_{-\pi}^{\pi}(\sin mx\cos nx + \cos mx\sin nx)\,dx = 0, \]
and hence, with $m \ne n$,
\[ \int_{-\pi}^{\pi}\cos mx\cos nx\,dx = 0, \qquad \int_{-\pi}^{\pi}\sin mx\sin nx\,dx = 0, \qquad \int_{-\pi}^{\pi}\sin mx\cos nx\,dx = 0, \]
which proves again the orthogonality of the real-valued Fourier system. Note the case $m = n > 0$, when
\[ \int_{-\pi}^{\pi}\cos^2 nx\,dx = \|\cos(nx)\|_2^2 = \pi, \qquad \int_{-\pi}^{\pi}\sin^2 nx\,dx = \|\sin(nx)\|_2^2 = \pi; \]
if $n = 0$, then $\|1\|_2^2 = 2\pi$.
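The relations $\langle e^{imx}, e^{inx}\rangle = 0$ for $m \ne n$ and $= 2\pi$ for $m = n$ are easy to confirm numerically; the sketch below is ours (the helper `ip` is an assumption) and evaluates the complex integrals on $[-\pi,\pi]$ with the trapezoidal rule.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 200001)
dx = x[1] - x[0]

def ip(m, n):
    # <e^{imx}, e^{inx}> = integral of e^{imx} * conj(e^{inx}) over [-pi, pi].
    y = np.exp(1j * m * x) * np.exp(-1j * n * x)
    return complex(np.sum((y[1:] + y[:-1]) / 2) * dx)

# Pairwise products for a small range of frequencies.
vals = {(m, n): ip(m, n) for m in range(-2, 3) for n in range(-2, 3)}
# Expected: 2*pi on the diagonal m = n, and 0 otherwise.
```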
The case (b). Similarly, if
\[ f(x) = a_0/2 + \sum_{n=1}^{\infty}\big(a_n\cos(nx) + b_n\sin(nx)\big), \]
then
\[ -f(x+\pi) = -a_0/2 - \sum_{n=1}^{\infty}\big(a_n\cos(nx+n\pi) + b_n\sin(nx+n\pi)\big) = -a_0/2 + \sum_{n=1}^{\infty}(-1)^{n+1}\big(a_n\cos(nx) + b_n\sin(nx)\big). \]
The two series are the same only when the even coefficients are zero. □

Complex Fourier series. It is sometimes convenient (and often easier) to express the Fourier series using the complex coefficients $c_n$ instead of the real coefficients $a_n$ and $b_n$. This is a straightforward consequence of the facts
\[ e^{in\omega x} = \cos(n\omega x) + i\sin(n\omega x), \]
or, vice versa,
\[ \cos(n\omega x) = \tfrac12\big(e^{in\omega x} + e^{-in\omega x}\big), \qquad \sin(n\omega x) = -\tfrac{i}{2}\big(e^{in\omega x} - e^{-in\omega x}\big). \]
The resulting series for a real or complex valued function $g$ on the interval $[-\pi, \pi]$ is
\[ F(x) = \sum_{n=-\infty}^{\infty} c_n\,e^{inx} \quad\text{with}\quad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-inx} g(x)\,dx. \]
See the explanation in 7.1.7. We need just one formula for $c_n$, rather than one for $a_n$ and another one for $b_n$.

7.B.6. Compute the complex version of the Fourier series $F(x)$ of the $2\pi$-periodic function $g(x)$ defined by $g(x) = 0$ if $-\pi < x \le 0$, while $g(x) = 1$ if $0 < x \le \pi$.

Solution. We have
\[ c_n = \frac{1}{2\pi}\int_0^{\pi} e^{-inx}\,dx = \frac{1 - (-1)^n}{2\pi i n} \]
for $n \ne 0$, while $c_0 = \frac{1}{2\pi}\int_0^{\pi} 1\,dx = 1/2$. So

For a (real or complex) function $f(t)$ with $-T/2 \le t \le T/2$, and all integers $n$, we can define, in this context, its complex Fourier coefficients by the complex numbers
\[ c_n = \frac{1}{T}\int_{-T/2}^{T/2} f(t)\,e^{-in\omega t}\,dt. \]
The relations between the coefficients $a_n$ and $b_n$ of the Fourier series (after recalculating the formulae for these coefficients for functions with a general period of length $T$) and these complex coefficients $c_n$ follow from the definitions: $c_0 = a_0/2$, and for natural numbers $n$, we have
\[ c_n = \tfrac12(a_n - i b_n), \qquad c_{-n} = \tfrac12(a_n + i b_n). \]
If the function $f$ is real valued, $c_n$ and $c_{-n}$ are complex conjugates of each other. We note here that for real valued functions with period $2\pi$, the Bessel inequality in this notation becomes
\[ \tfrac12|a_0|^2 + \sum_{n=1}^{\infty}\big(|a_n|^2 + |b_n|^2\big) \le \frac{1}{\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx. \]

In the second of the sums, replace $n$ with $-n$.
The second sum is then $\sum_{n=1}^{\infty} c_{-n}\,e^{-inx}$, so that
\[ F(x) = \frac12 + \frac{1}{2\pi i}\sum_{n=1}^{\infty}\frac{1-(-1)^n}{n}\big(e^{inx} - e^{-inx}\big) = \frac12 + \frac{1}{\pi}\sum_{n=1}^{\infty}\frac{1-(-1)^n}{n}\sin(nx), \]
where we used $e^{inx} - e^{-inx} = 2i\sin(nx)$. □

7.1.9. Periodic extension of functions. The Fourier series converges, of course, outside the original interval $[-T/2, T/2]$, since it is a periodic function on $\mathbb{R}$. The Heaviside function $g$ is defined by
\[ g(x) = \begin{cases} -1 & \text{if } -\pi < x < 0, \\ \phantom{-}1 & \text{if } \phantom{-}0 \le x < \pi. \end{cases} \]
If the interval $[-T/2, T/2]$ is chosen for the prime period $T$ of such a square wave function, the resulting Fourier series is
\[ \frac{4}{\pi}\Big(\sin(\omega x) + \frac13\sin(3\omega x) + \frac15\sin(5\omega x) + \cdots\Big). \]
The number $\omega = \frac{2\pi}{T}$ is also called the phase frequency of the wave. As the number of terms of the series increases, the approximation gets much better except in a (still shrinking) neighbourhood of the discontinuity point. There, the maximum of the deviation remains roughly the same. This is a general property of Fourier series, and it is called the Gibbs phenomenon. In accordance with 7.1.8(1), the Fourier series of the Heaviside function converges to the mean of the two one-sided limits at the points of discontinuity.

Examine the convergence of the Fourier series for the function
\[ g(x) = e^{-x}, \qquad x \in [-1, 1). \]
Solution. It is not necessary to calculate the corresponding Fourier series if we wish only to check convergence (for the similar calculation in the case of the function $e^x$, see 7.B.13). Define the function $s(x)$ on $\mathbb{R}$ with period $T = 2$ as follows:
\[ s(x) = g(x) = e^{-x}, \quad x \in (-1,1), \qquad s(1) = \frac{g(-1) + \lim_{x\to 1^-} g(x)}{2} = \frac{e + e^{-1}}{2}. \]
This function is the sum of the Fourier series in question, cf. Theorem 7.1.8. In other words, the Fourier series converges to the periodic function $s(x)$. Moreover, this convergence is uniform on every closed interval which contains none of the points $2k+1$, $k \in \mathbb{Z}$. This follows from the continuity of the functions $g$ and $g'$ on $(-1,1)$. On the other hand, the convergence cannot be uniform on any interval $(c,d)$ such that $[c,d]$ contains an odd integer.
This is because the uniform limit of continuous functions is always a continuous function, and the periodic extension of $s$ is not continuous at the odd integers. Thus, the series converges to the function $g$ on $(-1,1)$, yet this convergence is uniform only on the subintervals $[c,d]$ which satisfy the restriction $-1 < c < d < 1$. □

7.B.10. Determine the cosine Fourier series (see the definitions at the end of the paragraph 7.1.10) for a periodic extension of the function $g(x) = 1$, $x \in [0,1)$, $g(x) = 0$, $x \in [1,4)$. Determine also the sine Fourier series for $f(x) = x-1$, $x \in (0,2)$, $f(x) = 3-x$, $x \in [2,4)$.

Solution. We have already encountered the construction of a cosine Fourier series in 7.B.4(b) and also 7.B.3(b). It is the case of the Fourier series of an even function. Therefore, the first thing to do is to extend the definition of the function $g$ to the interval $(-4,0)$ so that it is even. This means $g(x) := 1$ for $x \in (-1,0)$, $g(x) := 0$ for $x \in (-4,-1]$. Now we can consider its periodic extension onto the whole $\mathbb{R}$ with $x_0 = -4$, period $T = 8$ and $\omega = \pi/4$. Necessarily $b_n = 0$ for all $n \in \mathbb{N}$ in a cosine Fourier series. We determine the Fourier coefficients $a_n$, $n \in \mathbb{N}$:
\[ a_0 = \frac{2}{T}\int_{x_0}^{x_0+T} g(x)\,dx = \frac14\int_{-1}^{1} 1\,dx = \frac12, \]
\[ a_n = \frac{2}{T}\int_{x_0}^{x_0+T} g(x)\cos(n\omega x)\,dx = \frac14\int_{-1}^{1}\cos\frac{n\pi x}{4}\,dx = \frac{2}{n\pi}\sin\frac{n\pi}{4}. \]
We use

(1) $\quad \displaystyle\int_{-a}^{a} f(x)\,dx = 2\int_0^a f(x)\,dx,$

which is valid for every even function $f$ integrable on the interval $[0,a]$. The expression $\sin(n\pi/4)$ is conveniently left as is, rather than the alternative of sorting out when it attains which

Of course, we cannot expect that the convergence of Fourier series for functions $g$ with discontinuity points be uniform (then the function $g$ would have to be continuous itself, being a uniform limit of continuous functions). However, the convergence is uniform on any subinterval where the original function is continuous.

7.1.10. Utilizing symmetry of functions. We consider the problem of finding the Fourier series of the function $f(x) = x^2$ on the interval $[0,1]$.
If we just periodically extend this function from the given interval $[0,1]$, the resulting function would not be continuous, and so the convergence at integers would be as slow as in the case of a square wave function. However, we can easily extend the domain of $f$ to the interval $[-1,1]$, so that $f(x) = x^2$ is an even function for $-1 \le x \le 1$. If we then extend periodically, the result is continuous. The resulting Fourier series then converges uniformly, and since $f$ is even, only the coefficients $a_n$ are non-zero. For $n > 0$, iterated application of integration by parts yields
\[ a_n = 2\int_0^1 x^2\cos(\pi n x)\,dx = \frac{4(-1)^n}{\pi^2 n^2}. \]
The remaining coefficient is
\[ a_0 = 2\int_0^1 x^2\,dx = \frac{2}{3}. \]
The entire series giving the periodic extension of $x^2$ from the interval $[-1,1]$ thus equals
\[ \frac13 + \frac{4}{\pi^2}\sum_{n=1}^{\infty}\frac{(-1)^n}{n^2}\cos(\pi n x). \]
By the Weierstrass criterion, this series converges uniformly. Therefore, its sum $g(x)$ is continuous. By theorem 7.1.8, $g(x) = f(x) = x^2$ on the interval $[-1,1]$. Thus our series approximates the function $x^2$ on the interval $[0,1]$ better (i.e. faster) than we could achieve with the periodic extension of the function from the interval $[0,1]$ only.

If we put $x = 1$ and rearrange, we obtain the remarkable result
\[ \frac{\pi^2}{6} = \sum_{n=1}^{\infty}\frac{1}{n^2}. \]

We proceed with a further illustration. Because of the uniform convergence, we can differentiate term by term and calculate the Fourier series on the interval $-1 < x < 1$ for the function
\[ x = \frac{2}{\pi}\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\sin(\pi n x). \]
This series cannot converge uniformly, since the periodic extension of the function $x$ from the interval $[-1,1]$ is not a continuous function. However, it does converge pointwise

461 CHAPTER 7. CONTINUOUS TOOLS FOR MODELLING

of its five different values. Thus we write the cosine Fourier series in the form
\[ \frac14 + \sum_{n=1}^{\infty}\frac{2}{n\pi}\sin\frac{n\pi}{4}\cos\frac{n\pi x}{4}. \]
The sine Fourier series of the function can be determined analogously from the odd extension of the given segment. Again, $T = 8$ and $\omega = \pi/4$ for the function $f$.
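Both conclusions above can be checked numerically. The short sketch below (ours, not part of the text) computes a partial sum of $\sum 1/n^2$ and evaluates the Fourier series of $x^2$ at the interior point $x = 1/2$, where it should reproduce $(1/2)^2 = 1/4$.

```python
import numpy as np

# Partial sum of sum 1/n^2, the series obtained at x = 1 above.
# The tail behaves like 1/N, so N = 200000 leaves a gap of about 5e-6.
N = 200000
zeta2 = sum(1.0 / n**2 for n in range(1, N + 1))

# Evaluate the Fourier series of x^2 at x = 1/2.
x = 0.5
s = 1/3 + (4 / np.pi**2) * sum((-1)**n * np.cos(np.pi * n * x) / n**2
                               for n in range(1, 4001))
```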
This time it is the coefficients $a_n$, $n \in \mathbb{N}\cup\{0\}$, which are zero. To find the remaining coefficients, use integration by parts and (1) (the product of two odd functions is an even function). For all $n \in \mathbb{N}$,
\[ b_n = \frac{2}{T}\int_{x_0}^{x_0+T} f(x)\sin(n\omega x)\,dx = \frac12\Big(\int_0^2 (x-1)\sin\frac{n\pi x}{4}\,dx + \int_2^4 (3-x)\sin\frac{n\pi x}{4}\,dx\Big) \]
\[ = \frac12\Big[(1-x)\frac{4}{n\pi}\cos\frac{n\pi x}{4} + \frac{16}{n^2\pi^2}\sin\frac{n\pi x}{4}\Big]_0^2 - \frac12\Big[(3-x)\frac{4}{n\pi}\cos\frac{n\pi x}{4} + \frac{16}{n^2\pi^2}\sin\frac{n\pi x}{4}\Big]_2^4 = \frac{2}{n\pi}\big((-1)^n - 1\big) + \frac{16}{n^2\pi^2}\sin\frac{n\pi}{2}. \]
Immediately, $b_n = 0$ when $n$ is even. So the sine Fourier series can be written
\[ \sum_{n=1}^{\infty}\Big(\frac{-4}{(2n-1)\pi} + \frac{(-1)^{n-1}\,16}{(2n-1)^2\pi^2}\Big)\sin\frac{(2n-1)\pi x}{4}. \] □

for $-1 < x < 1$ (see our reasonings about alternating series in 5.4.6); thus we have verified the above equality. Similarly, we can integrate the Fourier series of $x^2$ term by term, obtaining
\[ \frac13 x^3 = \frac13 x + \frac{4}{\pi^3}\sum_{n=1}^{\infty}\frac{(-1)^n}{n^3}\sin(\pi n x). \]
This is valid for $-1 \le x \le 1$. It is not valid for other values of $x$, since the series is periodic, but the other two terms are not. Of course, we may substitute the above Fourier series of the function $x$ and thus obtain the Fourier series for $x^3$ on the interval $[-1,1]$ this way.

In this context, we use the following terminology:

The sine and cosine Fourier series

7.B.11. Express the function $g(x) = \cos x$, $x \in (0,\pi)$, as the cosine Fourier series and the sine Fourier series.

Solution. Start with the sine series. This is the odd extension of the cosine function. Necessarily, $a_n = 0$, $n \in \mathbb{N}\cup\{0\}$, for the sine series. Further,
\[ b_1 = \frac{2}{\pi}\int_0^{\pi}\cos x\sin x\,dx = \frac{1}{\pi}\int_0^{\pi}\sin(2x)\,dx = 0, \]
and for all $n \in \mathbb{N}\setminus\{1\}$,
\[ b_n = \frac{2}{\pi}\int_0^{\pi}\cos x\sin(nx)\,dx = \frac{1}{\pi}\int_0^{\pi}\big(\sin((n+1)x) + \sin((n-1)x)\big)\,dx \]
\[ = -\frac{1}{\pi}\Big[\frac{\cos((n+1)x)}{n+1} + \frac{\cos((n-1)x)}{n-1}\Big]_0^{\pi} = \frac{2n\big((-1)^n + 1\big)}{(n^2-1)\pi}. \]
So $b_n = 0$ for odd $n \in \mathbb{N}$, and $b_n = \frac{4n}{(n^2-1)\pi}$ for even $n$. We conclude that
\[ \cos x = \sum_{n=1}^{\infty}\frac{8n}{(4n^2-1)\pi}\sin(2nx), \qquad x \in (0,\pi). \]
Of course, for the even extension of the function $g$ in question, $g(x) = \cos x$, $x \in (-\pi,\pi)$.
Thus the right hand side is the uniquely given cosine Fourier series of $f$.

For a given (real or complex) valued function $f$ on an interval $[0,T)$ of length $T>0$, the Fourier series of its even periodic extension (with period $2T$) is called the cosine Fourier series of $f$, while the Fourier series of the odd periodic extension of $f$ is called the sine Fourier series of the function $f$.

7.1.11. General Fourier series and wavelets. In the case of a general orthogonal system of functions $f_n$ and the series generated from it, we often talk about the general Fourier series with respect to the orthogonal system of functions $f_n$.

Fourier series and further tools built upon them are used for processing various signals, illustrations, and other data. In fact, these mathematical tools also underpin many fundamental models in science, including, for example, modeling of the function of the brain, as well as much of theoretical physics.

The periodic nature of the sine and cosine functions used in classical Fourier series, and their simple scaling by increasing the frequency by unit steps, limit their usability. In many applications, the nature of the data may suggest more convenient and possibly more efficient orthogonal systems of functions. Requirements for fast numerical processing usually include quick scalability and the possibility of easy translations by constant values. In other words, we want to be able to zoom quickly in and out with respect to the frequencies and, at the same time, to localize in time.

Fast scalability can be achieved by having just one wavelet mother function¹ $\psi$, if possible with compact support, from which we create countably many functions. □

¹The roots of wavelets go back to various attempts to localize the basic signals in both time and frequency, with diverse motivations from engineering and other applications. The name wavelet seems to be related to the idea of having a wave-like signal which begins and ends with zero amplitude.
Since the late 1970s, these attempts have been associated with many names (e.g. Morlet, Meyer), and wavelet theory has become a principal tool in signal analysis. Of course, the first examples of wavelets are much older; Haar's construction goes back to 1909. Actually, many of the wavelet types do not represent orthogonal systems of functions; rather, they share the idea of a

7.B.12. Write the Fourier series of the $\pi$-periodic function which equals cosine on the interval $(-\pi/2,\pi/2)$. Then write the cosine Fourier series of the $2\pi$-periodic function $y = |\cos x|$.

Solution. We are looking for one Fourier series only, since the second part of the problem is just a reformulation of the first part. Therefore, we construct the Fourier series for the function $g(x) = \cos x$, $x \in [-\pi/2,\pi/2]$. Since $g$ is even, $b_n = 0$, $n \in \mathbb{N}$. We compute
$$a_n = \frac{4}{\pi}\int_0^{\pi/2}\cos x\cos(2nx)\,dx = \frac{2}{\pi}\int_{-\pi/2}^{\pi/2}\tfrac12\bigl(\cos((2n+1)x) + \cos((2n-1)x)\bigr)\,dx$$
$$= \frac{1}{\pi}\left[\frac{\sin((2n+1)x)}{2n+1} + \frac{\sin((2n-1)x)}{2n-1}\right]_{-\pi/2}^{\pi/2} = \frac{4}{\pi}\cdot\frac{(-1)^{n+1}}{4n^2-1}$$
for every $n \in \mathbb{N}$. The calculation is also valid for $n = 0$, thus $a_0 = 4/\pi$. The desired Fourier series is
$$\frac{2}{\pi} + \frac{4}{\pi}\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{4n^2-1}\cos(2nx). \qquad\square$$

7.B.13. Expand the function $g(x) = e^x$ into (a) a Fourier series on the interval $[0,1)$; (b) a cosine Fourier series on the interval $[0,1]$; (c) a sine Fourier series on the interval $(0,1]$.

Solution. Note the differences between the three cases as shown in the diagrams. The approximation differs in relation to the continuity. The first diagram uses $n = 5$, the second uses $n = 3$, while the last diagram uses $n = 10$.

…$\psi_{j,k}$, $j,k \in \mathbb{Z}$, by integer translations and dyadic dilations:
$$\psi_{j,k}(x) = 2^{j/2}\,\psi(2^j x - k).$$
It is wise to rescale and choose $\psi$ with $L^2$-norm equal to one (as a function on $\mathbb{R}$). Then the coefficients $2^{j/2}$ ensure that the same is true for all $\psi_{j,k}$. Of course, the shape of the mother wavelet $\psi$ should match and cover all typical local behaviour of the data to be processed.
We say that $\psi$ is an orthogonal mother wavelet if the resulting countable system of functions $\psi_{j,k}$ is orthogonal and, at the same time, it is "reasonably dense" in the space of functions with integrable squares. We come to this concept in more detail later on. The effectivity of the wavelet analysis is another issue which needs further ideas and concepts to be built in. We do not have space here to go into details, but the readers may find many excellent specialized books in this fascinating area of applied mathematics. Here we consider one simple example.

7.1.12. The Haar wavelets. Perhaps the first question to start with is how to approximate effectively any given function with piecewise constant ones. For various reasons, it is good if our mother wavelet $\psi$ has zero mean, too. Thus we want to consider an analogue of the Heaviside function,
$$\psi(x) = \begin{cases} 1 & x \in [0,1/2) \\ -1 & x \in [1/2,1) \\ 0 & \text{otherwise.} \end{cases}$$
As a straightforward exercise, we may check that, indeed, the resulting system of functions $\psi_{j,k}$ is orthonormal. Another exercise shows, using finite linear combinations of these functions, that we may approximate any constant function with given precision over a bounded interval. In an exercise ?? we shall see that this already verifies the density properties required for the orthogonal mother wavelet functions.

Now we consider the question of effective treatment. Notice that we can also use the characteristic function $\varphi$ of the interval $[0,1)$ and write
$$\psi(x) = \varphi(2x) - \varphi(2x-1) = \tfrac{1}{\sqrt2}\,\varphi_{1,0}(x) - \tfrac{1}{\sqrt2}\,\varphi_{1,1}(x).$$
The function $\varphi$ plays the role of the father wavelet function, and it itself satisfies
$$\varphi(x) = \varphi(2x) + \varphi(2x-1).$$
The function $\varphi$ has another useful feature. Namely, we can obtain the unit constant function by adding all its integer translations, $\sum_{k\in\mathbb{Z}}\varphi(x-k) = 1$.

Let $p > 1$ and $1/p + 1/q = 1$. Then
$$\int_a^b |f(x)|\,dx \le |b-a|^{1/q}\left(\int_a^b |f(x)|^p\,dx\right)^{1/p} = |b-a|^{1/q}\,\|f\|_p.$$
Replace $f$ with $f_n - f$. It is then clear from the above bound that $L_p$-convergence $f_n \to f$ implies, for any $p > 1$, $L_1$-convergence.
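The orthonormality of the Haar system can also be verified numerically. The sketch below is ours (NumPy); it evaluates the $\psi_{j,k}$ at midpoints of a fine dyadic grid, where the midpoint rule is exact for these piecewise constant functions, and checks that the Gram matrix is the identity.

```python
import numpy as np

# Gram matrix of the Haar functions psi_{j,k}(x) = 2^{j/2} psi(2^j x - k)
# for 0 <= j < 4, 0 <= k < 2^j (all supported inside [0, 1]).
def psi(x):
    return np.where((x >= 0) & (x < 0.5), 1.0,
                    np.where((x >= 0.5) & (x < 1.0), -1.0, 0.0))

def psi_jk(j, k, x):
    return 2.0**(j/2.0) * psi(2.0**j * x - k)

h = 2.0**-12                # dyadic step, finer than any cell used below
x = np.arange(h/2, 1.0, h)  # midpoints of the dyadic cells

pairs = [(j, k) for j in range(4) for k in range(2**j)]
gram = np.array([[np.sum(psi_jk(j1, k1, x) * psi_jk(j2, k2, x)) * h
                  for (j2, k2) in pairs] for (j1, k1) in pairs])
gram_err = float(np.max(np.abs(gram - np.eye(len(pairs)))))
```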
(The terminology "$L_p$-convergence is stronger than $L_1$-convergence" is sometimes used.) With a modified bound, we can derive an even stronger proposition, namely that on a bounded interval, $L_q$-convergence is stronger than $L_p$-convergence whenever $q > p$; try this by yourselves.

If uniform boundedness of the sequence of functions $f_n$ is given, then there is a constant $C$ independent of $n$ such that $\|f_n\|_\infty \le C$. Then we can assert that $|f_n(x) - f(x)| \le 2C$, and it then follows that $L_1$-convergence implies $L_p$-convergence, since then
$$\int_a^b |f(x)-f_n(x)|^p\,dx = \int_a^b |f(x)-f_n(x)|^{p-1}\,|f(x)-f_n(x)|\,dx \le (2C)^{p-1}\int_a^b |f(x)-f_n(x)|\,dx,$$
which can be written
$$\|f-f_n\|_p \le (2C)^{1-1/p}\,\|f-f_n\|_1^{1/p}.$$
It follows that the $L_p$-norms on the space $S^0[a,b]$ are equivalent with respect to the convergence of uniformly bounded sequences of functions.

The most difficult (and most interesting) problem is to prove the first statement of theorem 7.1.8, which in the literature is often referred to as the Dirichlet condition, and which seems to have been derived as early as 1824. We begin by proving how this property of pointwise convergence implies the statements (2) and (3) of the theorem. Without loss of generality, we assume that we are working on the interval $[-\pi,\pi]$, i.e. with period $T = 2\pi$.

…so
$$\sum_{n=1}^{\infty}\frac{1}{(2n-1)^4} = \frac{\pi^4}{96}. \qquad\square$$
There are other ways of obtaining this result; see for example (3) at page 501. We recommend comparing the solution of this exercise to the previous one.

We started our discussion of Fourier series with the simplest of periodic functions, $f(t) = a\sin(\omega t + b)$ for certain constants $a, \omega > 0$, $b \in \mathbb{R}$. They appear as the general solution to the homogeneous linear differential equation
$$\text{(1)}\qquad y'' + \omega^2 y = 0,$$
which arises in mechanics from Newton's law of force for a moving particle. Recall the brief introduction to the simplest differential equations in 6.2.14 on page 406. Much more follows in Chapter 8.
We mention that the function $f$ has period $T = 2\pi/\omega$. In mechanics, one often talks about the frequency $1/T$. The positive value $a$ expresses the maximum displacement of the oscillating point from the equilibrium position and is called the amplitude. The value $b$ determines the position of the point at the initial time $t = 0$ and is called the initial phase, while $\omega$ is the angular frequency of the oscillation.

Similarly, the function $z = g(t)$ describing the dependence of voltage upon time $t$ in an electrical circuit with inductance $L$ and capacity $C$ is the solution of the differential equation
$$\text{(2)}\qquad z'' + \omega^2 z = 0.$$
The only difference between the equations (1) and (2) (besides the dissimilar physical interpretation) is the constant $\omega$. In the equation (1), $\omega^2 = k/m$, where $k$ is the proportionality constant and $m$ is the mass of the point, while in the equation (2), $\omega^2 = (LC)^{-1}$.

We illustrate how Fourier series can be applied in the theory of differential equations. Consider the non-homogeneous (compare to (1)) differential equation
$$\text{(3)}\qquad y'' + a^2 y = f(x)$$
with $y$ an unknown function in the variable $x$, and with a periodic, continuously differentiable function $f : \mathbb{R} \to \mathbb{R}$ on the right-hand side.

As the first step, we prepare simple bounds for the coefficients of the Fourier series. One bound is of course
$$|a_n| \le \frac{1}{\pi}\int_{-\pi}^{\pi} |f(x)|\,dx,$$
and similarly for all the coefficients $b_n$. This is because both $\cos(x)$ and $\sin(x)$ are bounded by 1 in absolute value. However, if $f$ is a continuous function in $S^1[-\pi,\pi]$, we can integrate by parts, thus obtaining
$$a_n(f) = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(nx)\,dx = -\frac{1}{n\pi}\int_{-\pi}^{\pi} f'(x)\sin(nx)\,dx = -\frac{1}{n}\,b_n(f').$$
We write $a_n(f)$ for the corresponding coefficient of the function $f$, and so on. Iterating this procedure, we obtain a bound for functions $f$ in $S^{k+1}[-\pi,\pi]$ with continuous derivatives up to order $k$ inclusive,
$$|a_n(f)| \le \frac{c}{n^{k+1}},$$
with a constant $c$ independent of $n$ (and similarly for $b_n(f)$).

Let $a > 0$, let $T > 0$ be the prime period of the function $f$, and let its Fourier series on $[-T/2, T/2]$ be known, i.e.
we assume
$$\text{(4)}\qquad f(x) = \frac{A_0}{2} + \sum_{n=1}^{\infty}\left(A_n\cos\frac{2\pi nx}{T} + B_n\sin\frac{2\pi nx}{T}\right).$$

7.B.18. (1) Prove that if the equation (3) has a periodic solution on $\mathbb{R}$, then the period of this solution is also a period of the function $f$. Further, prove that the equation (3) has a unique periodic solution with period $T$ if and only if
$$a \ne \frac{2\pi n}{T}$$
for every $n \in \mathbb{N}$.

Solution. Let a function $y = g(x)$, $x \in \mathbb{R}$, be a solution of the equation (3) with $f(x) \not\equiv 0$ and with period $p > 0$. In order to substitute the function $g$ into a second-order differential equation, its second derivative $g''$ must exist. Since the functions $g, g', g'', \dots$ share the same period, the function $g''(x) + a^2g(x) = f(x)$ is also periodic with period $p$. In other words, the function $f$ is periodic as a linear combination of functions with period $p$. Thus, we have proved the first statement, claiming that $p = lT$ for a certain $l \in \mathbb{N}$.

Suppose that the function $y = g(x)$, $x \in \mathbb{R}$, is a periodic solution of the equation (3) with period $T$ and that it is expressed by a Fourier series as follows:
$$\text{(2)}\qquad g(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\bigl(a_n\cos(n\omega x) + b_n\sin(n\omega x)\bigr),$$
where $\omega = 2\pi/T$. If $g$ satisfies the equation (3), it has a continuous second derivative on $\mathbb{R}$. Therefore, for $x \in \mathbb{R}$,
$$g'(x) = \sum_{n=1}^{\infty}\bigl(n\omega\,b_n\cos(n\omega x) - n\omega\,a_n\sin(n\omega x)\bigr),$$
$$\text{(3)}\qquad g''(x) = \sum_{n=1}^{\infty}\bigl(-\omega^2n^2a_n\cos(n\omega x) - \omega^2n^2b_n\sin(n\omega x)\bigr).$$
Substituting (4), (2) and (3) into (3) yields
$$\frac{a^2a_0}{2} + \sum_{n=1}^{\infty}\Bigl((-\omega^2n^2 + a^2)a_n\cos(n\omega x) + (-\omega^2n^2 + a^2)b_n\sin(n\omega x)\Bigr) = \frac{A_0}{2} + \sum_{n=1}^{\infty}\bigl(A_n\cos(n\omega x) + B_n\sin(n\omega x)\bigr).$$
It follows that $a^2a_0 = A_0$ and, comparing the remaining coefficients, for $n \in \mathbb{N}$,

Thus we have obtained not only a proof of the uniform convergence of our series to the anticipated value, but also a bound for the speed of the convergence:
$$\sup_x |s_N(x) - f(x)| \le \frac{\sqrt{2}}{\sqrt{N}}\,\|f'\|_2.$$
This proves the statement 7.1.8.(2), supposing the Dirichlet condition 7.1.8.(1) holds.

7.1.16. $L^2$-convergence. In the next step of our proof, we derive $L^2$-convergence of Fourier series under the condition of uniform convergence.
The proof utilizes the common technique of approximating objects which are not continuous by ones which are. We describe it without further details. Interested readers should be able to fill in the gaps by themselves without any difficulties. First, we formulate the statement we need.

Lemma. The subset of continuous functions $f$ in $S^0[a,b]$ on a finite interval $[a,b]$ is a dense subset of this space with respect to the $L^2$-norm.

Here "dense" means that for any $g$ in $S^0[a,b]$ and any $\varepsilon > 0$, there is some continuous $f$ satisfying $\|f - g\|_2 < \varepsilon$. We deal with abstract topological concepts like this in the last part of this chapter.

The idea of the proof can be seen easily via the example of approximation of Heaviside's function $h$ on the interval $[-\pi,\pi]$. We recall that $h(x) = -1$ for $x < 0$, and $h(x) = 1$ for $x > 0$. For every $\delta$ satisfying $\pi > \delta > 0$, we define the function $f_\delta$ by $f_\delta(x) = x/\delta$ for $|x| \le \delta$ and $f_\delta(x) = h(x)$ otherwise. All the functions $f_\delta$ are continuous, in fact piecewise linear. It can be calculated easily that $\|h - f_\delta\|_2 \to 0$, so that $h$ can be approximated in the $L^2$-norm by a sequence of continuous functions. All discontinuity points of a general function $f$ can be catered for in exactly the same way. There are only finitely many of them, and so all of the considered functions are limit points of sequences of continuous functions.

Now, our proof is already simple, because for the given function $f$, the distance between the partial sums of its Fourier series can be bounded by using a sequence of continuous functions $f_\varepsilon$ in this way (all norms in this paragraph are the $L^2$-norms):
$$\|f - s_N(f)\| \le \|f - f_\varepsilon\| + \|f_\varepsilon - s_N(f_\varepsilon)\| + \|s_N(f_\varepsilon) - s_N(f)\|,$$
and the particular summands on the right-hand side can be controlled. The first of them is at most $\varepsilon$, and according to the assumption of uniform convergence for continuous functions, the second summand can also be bounded by $\varepsilon$. Notice that the third term has the value of the partial sum of the Fourier series for $f - f_\varepsilon$.
Thus,
$$\|f - f_\varepsilon - s_N(f - f_\varepsilon)\| \le \|f - f_\varepsilon\|.$$
Therefore, by the triangle inequality,
$$\|s_N(f - f_\varepsilon)\| \le \|f - f_\varepsilon - s_N(f - f_\varepsilon)\| + \|f - f_\varepsilon\| \le 2\,\|f - f_\varepsilon\|.$$
Altogether, $\|f - s_N(f)\| \le 4\varepsilon$. This verifies the $L^2$-convergence of the continuous functions $s_N(f)$ to $f$, which is what we wanted to prove.

…and for $n \in \mathbb{N}$,
$$\text{(5)}\qquad (-\omega^2n^2 + a^2)\,a_n = A_n, \qquad (-\omega^2n^2 + a^2)\,b_n = B_n.$$
There is exactly one pair of sequences $\{a_n\}_{n\in\mathbb{N}\cup\{0\}}$, $\{b_n\}_{n\in\mathbb{N}}$ satisfying these conditions if and only if
$$-\omega^2n^2 + a^2 \ne 0 \quad\text{for every } n \in \mathbb{N},$$
i.e., if (1) holds. In this case, the only solution of (3) with period $T$ is determined by the only solution
$$\text{(6)}\qquad a_n = \frac{A_n}{a^2 - \omega^2n^2}, \qquad b_n = \frac{B_n}{a^2 - \omega^2n^2}, \qquad n \in \mathbb{N},$$
of the system (5). We emphasize that we utilized the uniform convergence of the series in (3). □

7.1.17. Dirichlet kernel. Finally, we arrive at the proof of the first statement of theorem 7.1.8. It follows from the definition of the Fourier series $F(t)$ for a function $f(t)$, using its expression with the complex exponential in 7.1.7, that the partial sums $s_N(t)$ can be written as
$$s_N(t) = \sum_{k=-N}^{N}\frac{1}{T}\int_{-T/2}^{T/2} f(x)\,e^{-i\omega kx}e^{i\omega kt}\,dx,$$
where $T$ is the period we are working with and $\omega = 2\pi/T$. This expression can be rewritten as
$$\text{(1)}\qquad s_N(t) = \int_{-T/2}^{T/2} K_N(t-x)\,f(x)\,dx,$$
where the function
$$K_N(y) = \frac{1}{T}\sum_{k=-N}^{N} e^{i\omega ky}$$
is called the Dirichlet kernel. The sum is a (finite) geometric series with common ratio $e^{i\omega y}$. By multiplying by $e^{i\omega y}$ and then subtracting, we obtain

7.B.19. Using the solution of the previous problem, find all $2\pi$-periodic solutions of the differential equation
$$y'' + 2y = \sum_{n=1}^{\infty}\dots, \qquad x \in \mathbb{R}.$$

Solution. The equation is in the form of (3) for $a = \sqrt{2}$ and the continuously differentiable function $f(x) = \sum_{n=1}^{\infty}\dots$ with prime period $T = 2\pi$. According to problem 7.B.18, the condition $\sqrt{2} \notin \mathbb{N}$ implies that there is exactly one $2\pi$-periodic solution.
If we look for it as the value of the series
$$\frac{a_0}{2} + \sum_{n=1}^{\infty}\bigl(a_n\cos(nx) + b_n\sin(nx)\bigr),$$
we know also the values of the coefficients $a_n$ and $b_n$ (see (4) and (6)).

…the condition
$$(1 - e^{i\omega y})\,K_N(y) = \frac{1}{T}\bigl(e^{-i\omega Ny} - e^{i\omega(N+1)y}\bigr),$$
and hence
$$K_N(y) = \frac{1}{T}\,\frac{\sin\bigl((N+\tfrac12)\omega y\bigr)}{\sin(\omega y/2)}$$
away from the zeros of the denominator, with the values at these points given by the limits of the end points. Hence, changing the coordinates, we can also use the expression
$$s_N(x) = \int_{-T/2}^{T/2} K_N(y)\,f(x+y)\,dy$$
for the partial sums.

…, $a \in \mathbb{R}$, $a > 0$.

Solution. The function $g$ is chosen to provide the mean of $f$ over (small) intervals of length $2\varepsilon$, and it is normalized so that the integral of $g$ over all of $\mathbb{R}$ is one. We should expect some smoothing of the oscillations of the function $f(x)$. Drawing the resulting functions by Maple shows:

Finally, we are fully prepared. First, we consider the case when the function $f$ is continuous and differentiable at the point $x$. We want to prove that in this case, the Fourier series $F(x)$ of the function $f$ converges to the value $f(x)$ at the point $x$. We have
$$s_N(x) - f(x) = \int_{-T/2}^{T/2}\bigl(f(x+y) - f(x)\bigr)\,K_N(y)\,dy.$$
The integrand can be rewritten as
$$\frac{f(x+y) - f(x)}{T\sin(\omega y/2)}\,\sin\bigl((N+\tfrac12)\omega y\bigr) = \psi_1(y)\sin(N\omega y) + \psi_2(y)\cos(N\omega y)$$
with the continuous and bounded functions
$$\psi_1(y) = \varphi_x(y)\cos(\omega y/2), \qquad \psi_2(y) = \varphi_x(y)\sin(\omega y/2),$$
where $\varphi_x(y) = \frac{f(x+y)-f(x)}{T\sin(\omega y/2)}$.

Linear mappings $L$ from the space of functions to $\mathbb{R}$ are called (real) linear functionals. Examples of such functionals can be given in two different ways: by evaluating the function's values (or its derivatives' values) at some fixed points, or in terms of integration. We can, for instance, consider the functional $L$ given by evaluating the function at a single fixed point $x_0 \in I$, i.e., $L(f) = f(x_0)$.

This last integral is improper iff $t - 1 < 0 < t + 1$. For $t$ outside that interval, the integration gives
$$(f_1 * f_2)(t) = t\,\ln\left|\frac{t+1}{t-1}\right| - 2.$$
Or, we can have the functional given by integration of the product with a fixed function $g(x)$, i.e.,
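The closed form of the Dirichlet kernel is easy to test numerically. The sketch below is ours (NumPy); it compares the finite geometric sum with the sine quotient for $T = 2\pi$, $\omega = 1$.

```python
import numpy as np

# Dirichlet kernel check for T = 2*pi (omega = 1):
#   (1/(2 pi)) * sum_{k=-N}^{N} e^{i k y} = sin((N + 1/2) y) / (2 pi sin(y/2)).
N = 9
y = np.linspace(0.05, np.pi, 500)  # keep away from y = 0, where both sides tend to (2N+1)/(2 pi)

series = sum(np.exp(1j*k*y) for k in range(-N, N + 1)).real / (2*np.pi)
closed = np.sin((N + 0.5)*y) / (2*np.pi*np.sin(y/2))
kernel_err = float(np.max(np.abs(series - closed)))
```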
$$L(f) = \int f(x)\,g(x)\,dx.$$
The function $g(x)$ in the previous example is a function which weighs the particular values representing the function $f(x)$ in the definition of the Riemann integral. The simplest case of such a functional is, of course, the Riemann integral itself, i.e. the case of $g(x) = 1$ for all points $x$. A good example is given by
$$g(x) = \begin{cases} 0 & \text{if } |x| > \varepsilon \\ \frac{1}{2\varepsilon} & \text{if } |x| \le \varepsilon \end{cases}$$
for any $\varepsilon > 0$. The integral of the function $g$ over $\mathbb{R}$ equals one, and our linear functional can be perceived as a (uniform) averaging of the values of the function $f$ over the $\varepsilon$-neighbourhood of the origin. Similarly, we can work with the function
$$g(x) = \begin{cases} 0 & \text{if } |x| \ge \varepsilon \\ e^{1/(x^2-\varepsilon^2)} & \text{if } |x| < \varepsilon, \end{cases}$$
which we used in paragraph 6.1.5. This function is smooth on $\mathbb{R}$ with compact support on the interval $(-\varepsilon,\varepsilon)$. Our functional has the meaning of a weighted combination of the values, but this time, the weights of the input values decrease rapidly as their distance from the origin increases. The integral of $g$ over $\mathbb{R}$ is finite, but it is not equal to one. Dividing $g$ by this integral would lead to a functional which would have the meaning of a non-uniform averaging of a given function $f$. Another example is the Gaussian function
$$g(x) = \frac{1}{\sqrt{\pi}}\,e^{-x^2},$$

If instead $-1 < t < 1$, we can, for small $\varepsilon > 0$, replace $\int_{t-1}^{t+1}$ with $\int_{t-1}^{-\varepsilon} + \int_{\varepsilon}^{t+1}$, which computes to
$$t\bigl(\ln|t+1| - \ln|\varepsilon| + \ln|{-\varepsilon}| - \ln|t-1|\bigr) - 2.$$
The terms in $\varepsilon$ cancel, so when we take the limit $\varepsilon \to 0$, we obtain the same answer for the integral as before. Thus
$$(f_1 * f_2)(t) = t\,\ln\left|\frac{t+1}{t-1}\right| - 2$$
for all values of $t$ except $t = 1$ and $t = -1$. □

We calculate the convolution of two functions both of which have bounded support.

7.C.3. Determine the convolution $f_1 * f_2$, where
$$f_1(x) = \begin{cases} 1 - x^2 & \text{for } x \in [-1,1] \\ 0 & \text{otherwise,} \end{cases} \qquad f_2(x) = \begin{cases} x & \text{for } x \in [0,1] \\ 0 & \text{otherwise.} \end{cases}$$

Solution. Since the integrand is zero when $f_1(x) = 0$,
$$(f_1 * f_2)(t) = \int_{-\infty}^{\infty} f_1(x)\,f_2(t-x)\,dx = \int_{-1}^{1}(1-x^2)\,f_2(t-x)\,dx.$$
But the integrand is also zero when $f_2(t-x) = 0$, so we need $0 \le t - x \le 1$, i.e. $t-1 \le x \le t$, for the integrand to be non-zero.
So for a non-zero value of $(f_1 * f_2)(t)$, we integrate over the intersection of the intervals $[t-1, t]$ and $[-1, 1]$. Consequently,
$$(f_1*f_2)(t) = \begin{cases} 0 & t \ge 2, \\[2pt] \int_{t-1}^{1}(1-x^2)(t-x)\,dx = \frac{t^4}{12} - t^2 + \frac{4t}{3} & 1 \le t \le 2, \\[2pt] \int_{t-1}^{t}(1-x^2)(t-x)\,dx = -\frac{t^2}{2} + \frac{2t}{3} + \frac14 & 0 \le t \le 1, \\[2pt] \int_{-1}^{t}(1-x^2)(t-x)\,dx = -\frac{t^4}{12} + \frac{t^2}{2} + \frac{2t}{3} + \frac14 & -1 \le t \le 0, \\[2pt] 0 & t \le -1. \end{cases} \qquad\square$$

7.C.4. Determine the convolution $f_1 * f_2$ of the functions
$$f_1(x) = \begin{cases} 1 - x & \text{for } x \in [-2,1] \\ 0 & \text{otherwise,} \end{cases} \qquad f_2(x) = \begin{cases} 1 & \text{for } x \in [0,1] \\ 0 & \text{otherwise.} \end{cases} \qquad\bigcirc$$

The next topic is the Fourier transform, which is another example of an integral operator. This time the kernel $e^{-i\omega t}$ is complex (see 7.2.5 for the terminology). Thus the values on real functions are complex functions in general, see 7.2.5. This is a basic operation in mathematics, allowing the time and frequency analysis of signals and also the transitions between local and global behaviour.

7.C.5. Fix $\Omega > 0$. Recall that $\operatorname{sgn} t = 1$ if $t > 0$, $\operatorname{sgn} t = -1$ if $t < 0$, and $\operatorname{sgn} 0 = 0$. Find the Fourier transform $\mathcal F(f)$ and the inverse Fourier transform $\mathcal F^{-1}(f)$ of the functions:
(a) $f(t) = \operatorname{sgn} t$ if $t \in (-\Omega,\Omega)$, and zero otherwise;
(b) $f(t) = 1$ if $t \in (-\Omega,\Omega)$, and zero otherwise.

Solution. Case (a).

…which also has its integral over $\mathbb{R}$ equal to one (we verify this later). This time, all the input values $x$ in the corresponding "average" have a non-zero weight, yet this weight becomes insignificantly small as the distance from the origin increases.

7.2.2. Function convolution. Integral functionals from the previous paragraph can easily be modified to obtain a "smeared averaging" of the values of a given function $f$ near a given point $y \in \mathbb{R}$:
$$L_y(f) = \int_{-\infty}^{\infty} f(x)\,g(y-x)\,dx.$$

Convolution of functions of a real variable

The free parameter $y$ in the definition of the functional $L_y(f)$ can be perceived as a new independent variable, and our operation $L_y$ actually maps functions to functions again, $f \mapsto \tilde f$:
$$\tilde f(y) = (f * g)(y) = \int_{-\infty}^{\infty} f(x)\,g(y-x)\,dx.$$
This operation is called the convolution of the functions $f$ and $g$, denoted $f * g$.

The convolution is usually defined for real or complex valued functions on $\mathbb{R}$ with compact support. By the transformation $t = z - x$, we can easily calculate that
$$(f*g)(z) = \int_{-\infty}^{\infty} f(x)\,g(z-x)\,dx = \int_{-\infty}^{\infty} f(z-t)\,g(t)\,dt = (g*f)(z).$$
Thus the convolution, considered as a binary operation on pairs of functions having compact support, is commutative.

Similarly, convolutions can be considered with integration over a finite interval; we only have to guarantee that the functions participating in them are well-defined. In particular, this can be done for periodic functions, integrating over an interval whose length equals the period.

Convolution is an extraordinarily useful tool for modeling the way in which we observe the data of an experiment, or the influence of a medium through which information is transferred — for instance, an analog audio or video signal affected by noise. The input value $f$ is the transferred information. The function $g$ is chosen so that it expresses the influence of the medium or the technical procedure used for the signal processing or the processing of any other data.

7.2.3. Gibbs phenomenon. Actually, we have already seen a useful case of convolution. In paragraph 7.1.17, we interpreted the partial sum of the Fourier series for a function $f$ as a convolution with the Dirichlet kernel
$$K_N(y) = \frac{1}{T}\sum_{k=-N}^{N} e^{i\omega ky}.$$
The figure shows this convolution kernel with $N = 5$ and $N = 15$.
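The piecewise answer of 7.C.3 and the commutativity of the convolution can both be tested numerically. The sketch below is our code (NumPy; `conv` is a Riemann-sum approximation we define, not a library routine), and it assumes, as in the text, $f_1(x) = 1-x^2$ on $[-1,1]$ and $f_2(x) = x$ on $[0,1]$.

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 6001)
dx = x[1] - x[0]
f1 = np.where(np.abs(x) <= 1.0, 1.0 - x**2, 0.0)   # supported on [-1, 1]
f2 = np.where((x >= 0.0) & (x <= 1.0), x, 0.0)     # supported on [0, 1]

def conv(u, v, z):
    # Riemann-sum approximation of int u(x) v(z - x) dx on the grid
    vz = np.interp(z - x, x, v, left=0.0, right=0.0)
    return float(np.sum(u * vz) * dx)

def exact(t):
    # the piecewise formula computed in 7.C.3
    if t >= 2.0 or t <= -1.0:
        return 0.0
    if t >= 1.0:
        return t**4/12 - t**2 + 4*t/3
    if t >= 0.0:
        return -t**2/2 + 2*t/3 + 0.25
    return -t**4/12 + t**2/2 + 2*t/3 + 0.25

zs = np.linspace(-2.0, 2.5, 46)
closed_err = max(abs(conv(f1, f2, z) - exact(z)) for z in zs)
comm_err = max(abs(conv(f1, f2, z) - conv(f2, f1, z)) for z in zs)
```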
The Fourier transform of the given function is
$$\mathcal F(f)(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)\,e^{-i\omega t}\,dt = \frac{1}{\sqrt{2\pi}}\int_{-\Omega}^{\Omega}\operatorname{sgn} t\,\bigl(\cos(\omega t) - i\sin(\omega t)\bigr)\,dt.$$
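Before finishing the computation, the final formulas of 7.C.5 can be confirmed by direct numerical integration. This is our sketch (NumPy; the grid and tolerances are our choices):

```python
import numpy as np

# Quadrature check of the images computed in 7.C.5 (1/sqrt(2 pi) convention):
#   (a) sgn on (-Omega, Omega):       F(f)(w) = i*sqrt(2/pi)*(cos(w*Omega) - 1)/w
#   (b) indicator of (-Omega, Omega): F(f)(w) = 2*Omega/sqrt(2*pi) * sinc(w*Omega)
Omega = 2.0
t = np.linspace(-Omega, Omega, 400001)

def ft(fvals, w):
    return np.trapz(fvals * np.exp(-1j*w*t), t) / np.sqrt(2*np.pi)

w = 1.7
sgn_err = abs(ft(np.sign(t), w) - 1j*np.sqrt(2/np.pi)*(np.cos(w*Omega) - 1)/w)
ind_err = abs(ft(np.ones_like(t), w)
              - 2*Omega/np.sqrt(2*np.pi) * np.sin(w*Omega)/(w*Omega))
```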
Since cos and sin are respectively even and odd functions,
$$\mathcal F(f)(\omega) = -\frac{2i}{\sqrt{2\pi}}\int_0^{\Omega}\sin(\omega t)\,dt = i\,\sqrt{\frac{2}{\pi}}\;\frac{\cos(\omega\Omega) - 1}{\omega}.$$
The inverse Fourier transform is given by almost the same integral, with the kernel $e^{i\omega x}$ instead of $e^{-i\omega x}$; the integration is in the frequency domain with variable $\omega$. Thus, the only difference in the result is the sign:
$$\mathcal F^{-1}(f)(t) = i\,\sqrt{\frac{2}{\pi}}\;\frac{1 - \cos(t\,\Omega)}{t}.$$
Case (b) is computed similarly:
$$\mathcal F(f)(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\Omega}^{\Omega}\bigl(\cos(\omega t) - i\sin(\omega t)\bigr)\,dt = \frac{2}{\sqrt{2\pi}}\int_0^{\Omega}\cos(\omega t)\,dt = \frac{1}{\sqrt{2\pi}}\left[\frac{2\sin(\omega t)}{\omega}\right]_0^{\Omega} = \frac{2\sin(\omega\Omega)}{\sqrt{2\pi}\,\omega}.$$
The latter expression is often written by means of the function $\operatorname{sinc}(t) = \sin(t)/t$ as
$$\mathcal F(f)(\omega) = \frac{2\Omega}{\sqrt{2\pi}}\,\operatorname{sinc}(\omega\Omega).$$
Here, the inverse Fourier transform has exactly the same form, because the sign change in the kernel does not affect the real part. Thus we only need to interchange the time and frequency variables:
$$\mathcal F^{-1}(f)(t) = \frac{2\Omega}{\sqrt{2\pi}}\,\operatorname{sinc}(t\,\Omega). \qquad\square$$
The results are:

Notice that instead of integrals over the entire real line, we employ integration over the basic period $T$ of the periodic functions in question. This interpretation allows us to explain the Gibbs phenomenon mentioned in paragraph 7.1.9. The point is that we know well the behaviour of the Dirichlet kernel near the origin, and thus, taking into account that the function $f$ is bounded over the whole period and has all one-sided limits of values and derivatives at each point of discontinuity, the effect of the convolution must be quite local. Consequently, this leads to the verification that the convolution with the Dirichlet kernel at a point $x$ where $f$ jumps behaves in the same way as we computed explicitly for the Heaviside function at $x = 0$. There, the overshooting by the Fourier sums can be computed explicitly, and this explains the Gibbs effect in general. We do not provide more details here. Readers may either work them out themselves (as a nontrivial exercise) or look them up in the literature.

7.2.4. Integral operators.
In general, integral operators can depend on any number of values and derivatives of the function in their argument. For example, considering a function $F$ depending on $k + 2$ free arguments,
$$L(f)(y) = \int F\bigl(y, x, f(x), f'(x), \dots, f^{(k)}(x)\bigr)\,dx.$$
Convolution is one of many examples of a special class of such operators on spaces of functions,
$$L(f)(y) = \int f(x)\,k(y,x)\,dx.$$
The function $k(y,x)$, dependent on two variables, $k : \mathbb{R}^2 \to \mathbb{R}$, is called the kernel of the integral operator $L$. The theory of integral operators is very useful and interesting. We focus only on an extraordinarily important special case, namely the Fourier transform $\mathcal F$, which has deep connections with Fourier series.

7.2.5. Fourier transform. Recall that a function $f(t)$, given by its converging Fourier series, equals
$$f(t) = \sum_{n=-\infty}^{\infty} c_n\,e^{i\omega_n t},$$
where the numbers $c_n$ are the complex Fourier coefficients and $\omega_n = 2\pi n/T$ with period $T$, see paragraph 7.1.7. After fixing $T$, the expression $\Delta\omega = 2\pi/T$ describes the change of the frequency caused by $n$ being increased by one. Thus it is just the discrete step by which we change the frequencies when calculating the coefficients of the Fourier series. The coefficient $1/T$ in the formula
$$c_n = \frac{1}{T}\int_{-T/2}^{T/2} f(t)\,e^{-i\omega_n t}\,dt$$

The first two diagrams below show the imaginary values of the Fourier image of the signum function from 7.C.5(a), with $\Omega = 20$ and $\Omega = 50$. The next two diagrams do the same for the characteristic function of the interval $|x| < \Omega$ from 7.C.5(b). The longer the interval with the constant values is, the more the image is concentrated around the origin.

We can always use the simpler version of the transform directly for odd and even functions. If the argument $f$ is odd, then only the sine part of the formula contributes, and its Fourier transform is
$$\mathcal F(f)(\omega) = \frac{-i}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)\sin(\omega t)\,dt.$$
Similarly, for even functions $f$,
$$\mathcal F(f)(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)\cos(\omega t)\,dt.$$
In particular, odd functions have purely imaginary images, while the images of even functions are real.
More generally, every real function $f$ decomposes into its odd and even parts, $f = f_{\mathrm{even}} + f_{\mathrm{odd}}$, and the real and imaginary components of the Fourier image $\tilde f$ are the images of these two parts, respectively.

7.C.6. Discover how the Fourier transform and its inverse behave under the translation $\tau_a$ in the variable, $\tau_a f(x) = f(x+a)$, and the phase shift $\varphi_a$, defined as $\varphi_a f(t) = e^{iat}f(t)$.

Solution. Directly from the definition,
$$\mathcal F(\varphi_a f)(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)\,e^{iat}e^{-i\omega t}\,dt = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)\,e^{-i(\omega-a)t}\,dt = \mathcal F f(\omega - a).$$

…then equals $\Delta\omega/2\pi$, so the series for $f(t)$ can be rewritten as
$$f(t) = \sum_{n=-\infty}^{\infty}\frac{\Delta\omega}{2\pi}\left(\int_{-T/2}^{T/2} f(x)\,e^{-i\omega_n x}\,dx\right)e^{i\omega_n t}.$$
Now imagine the values $\omega_n$ for all $n \in \mathbb{Z}$ as the chosen representatives for small intervals $[\omega_n, \omega_{n+1}]$ of length $\Delta\omega$. Then our expression in the big inner parentheses in the previous formula for $f(t)$ describes the summands of the Riemann sums for the improper integral
$$\frac{1}{2\pi}\int_{-\infty}^{\infty} g(\omega)\,e^{i\omega t}\,d\omega,$$
where $g(\omega)$ is a function which takes, at the points $\omega_n$, the values
$$g(\omega_n) = \int_{-T/2}^{T/2} f(x)\,e^{-i\omega_n x}\,dx.$$
We are working with piecewise continuous functions with a compact support, thus our function $f$ is integrable in absolute value over $\mathbb{R}$. Letting $T \to \infty$, the norm $\Delta\omega$ of our subintervals in the Riemann sum decreases to zero. We obtain the integral
$$g(\omega) = \int_{-\infty}^{\infty} f(x)\,e^{-i\omega x}\,dx.$$
The previous reasonings show that there is a large set of Riemann integrable functions $f$ on $\mathbb{R}$ for which we can define a pair of mutually inverse integral operators:

Fourier transform

For every piecewise continuous real or complex function $f$ on $\mathbb{R}$ with compact support, we define
$$\tilde f(\omega) = \mathcal F(f)(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)\,e^{-i\omega t}\,dt.$$
This function $\tilde f$ is called the Fourier transform of the function $f$. The previous ideas show that
$$f(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\tilde f(\omega)\,e^{i\omega t}\,d\omega.$$
This says that the Fourier transform $\mathcal F$ just defined has an inverse operation $\mathcal F^{-1}$, which is called the inverse Fourier transform.

Notice that both the Fourier transform and its inverse are integral operators with almost identical kernels, $e^{\mp i\omega t}$. Of course, these transforms are meaningful for a much larger class of functions.
Interested readers are referred to the specialized literature.

Similarly, $\mathcal F(\tau_a f)(\omega) = e^{ia\omega}\,\mathcal F f(\omega)$. Thus are proved the formulae
$$\mathcal F\circ\tau_a = \varphi_a\circ\mathcal F, \qquad \mathcal F\circ\varphi_a = \tau_{-a}\circ\mathcal F.$$
Similarly,
$$\mathcal F^{-1}\circ\tau_a = \varphi_{-a}\circ\mathcal F^{-1}, \qquad \mathcal F^{-1}\circ\varphi_a = \tau_a\circ\mathcal F^{-1}. \qquad\square$$

The next problem displays the behaviour of the Fourier transform on the Gaussian function. This is a rare example where the time and frequency forms are very similar. Again, we see the feature of exchanging the local and global properties in the time and frequency domains.

7.C.7. Compute the Fourier transform $\mathcal F(f)$ of the function
$$f(t) = e^{-at^2}, \qquad t \in \mathbb{R},$$
where $a > 0$ is a fixed parameter.

Solution. The task is to calculate
$$\mathcal F(f)(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-at^2}e^{-i\omega t}\,dt.$$
A standard trick is to transform the problem into one of solving a (simple) differential equation. Differentiating with respect to $\omega$ and then integrating by parts gives
$$\frac{d}{d\omega}\,\mathcal F(f)(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}(-it)\,e^{-at^2}e^{-i\omega t}\,dt = \frac{i}{2a\sqrt{2\pi}}\int_{-\infty}^{\infty}\bigl(e^{-at^2}\bigr)'\,e^{-i\omega t}\,dt = -\frac{\omega}{2a}\,\mathcal F(f)(\omega),$$
since the boundary terms $\lim_{t\to\pm\infty} e^{-at^2}e^{-i\omega t}$ vanish. Therefore $y(\omega) = \mathcal F(f)(\omega)$ satisfies the differential equation
$$\frac{dy}{d\omega} = -\frac{\omega}{2a}\,y, \qquad\text{i.e.}\qquad \frac{y'}{y} = -\frac{\omega}{2a},$$
unless $y$ equals zero ($y = 0$ is a solution of the equation). Integration yields $\ln|y| = -\frac{\omega^2}{4a} + C$, where $C$ is a constant, i.e. $y = K\,e^{-\omega^2/(4a)}$ with a constant $K$. All solutions (including the zero solution) of the differential equation are given by this function.

7.2.6. Simple properties. The Fourier transform changes the local and global behaviour of functions in an interesting way. We begin with a simple example in which there is a function $f(t)$ which is transformed to the indicator function of the interval $[-\Omega,\Omega]$, i.e., $\tilde f(\omega) = 0$ for $|\omega| > \Omega$, and $\tilde f(\omega) = 1$ for $|\omega| \le \Omega$. The inverse transform $\mathcal F^{-1}$ gives
$$f(t) = \frac{1}{\sqrt{2\pi}}\int_{-\Omega}^{\Omega} e^{i\omega t}\,d\omega = \frac{1}{\sqrt{2\pi}}\,\frac{2\sin(\Omega t)}{t} = \frac{2\Omega}{\sqrt{2\pi}}\,\operatorname{sinc}(\Omega t).$$
Thus, except for a multiplicative constant and the scaling of the input variable, it is the very important function $\operatorname{sinc}(x) = \sin(x)/x$. Calculation of the limit at zero, by l'Hospital's rule or otherwise, gives $f(0) = 2\Omega(2\pi)^{-1/2}$.
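The ODE argument of 7.C.7 can be cross-checked by quadrature. The sketch below is ours (NumPy); it compares the numerically integrated transform of $e^{-at^2}$ with $K\,e^{-\omega^2/(4a)}$, $K = 1/\sqrt{2a}$.

```python
import numpy as np

# F(e^{-a t^2})(w) = e^{-w^2/(4a)} / sqrt(2a), in the 1/sqrt(2 pi) convention.
a = 1.5
t = np.linspace(-20.0, 20.0, 400001)
f = np.exp(-a*t**2)

def ft(w):
    return np.trapz(f * np.exp(-1j*w*t), t) / np.sqrt(2*np.pi)

gauss_err = max(abs(ft(w) - np.exp(-w**2/(4*a))/np.sqrt(2*a))
                for w in (0.0, 0.5, 1.0, 2.0))
```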
The closest zero points are at $t = \pm\pi/\Omega$, and the function drops in value to zero quite rapidly away from the origin $t = 0$. This function is shown in the diagram by a wavy curve for $\Omega = 20$. Simultaneously, the area where our function $f(t)$ keeps waving more rapidly as $\Omega$ increases is also depicted by a curve. The indicator function of the interval $[-\Omega,\Omega]$ is Fourier-transformed to the function $f$, which takes significant positive values near zero, and the value taken at zero is a fixed multiple of $\Omega$. Therefore, as $\Omega$ increases, $f$ concentrates more and more near the origin.

Now we derive the Fourier transform of the derivative $f'(t)$ for a function $f$. We continue to suppose that $f$ has compact support, so that both $\mathcal F(f')$ and $\mathcal F(f)$ exist. By integration by parts,
$$\mathcal F(f')(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f'(t)\,e^{-i\omega t}\,dt = \frac{1}{\sqrt{2\pi}}\Bigl[f(t)\,e^{-i\omega t}\Bigr]_{-\infty}^{\infty} + \frac{i\omega}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)\,e^{-i\omega t}\,dt = i\omega\,\mathcal F(f)(\omega).$$
Thus the Fourier transform converts the (limit) operation of differentiation to the (algebraic) operation of multiplication

$$y(\omega) = K\,e^{-\frac{\omega^2}{4a}}, \qquad K \in \mathbb{R}.$$
To find $K$, begin with the well known fact (proved in ??)
$$\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi},$$
to obtain
$$\int_{-\infty}^{\infty} e^{-at^2}\,dt = \frac{1}{\sqrt{a}}\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\frac{\pi}{a}}.$$
Therefore, $\mathcal F(f)(0) = K\,e^0 = K$. So
$$K = \frac{1}{\sqrt{2\pi}}\sqrt{\frac{\pi}{a}} = \frac{1}{\sqrt{2a}}, \qquad\text{and}\qquad \mathcal F(f)(\omega) = \frac{1}{\sqrt{2a}}\,e^{-\frac{\omega^2}{4a}}. \qquad\square$$

7.C.8. Determine the Fourier transform image of the Gaussian function
$$f(t) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{t^2}{2\sigma^2}}.$$

Solution. Use the result of the previous problem with $a = \frac{1}{2\sigma^2}$, together with the composition with the variable shift $\tau_a$ from the last but one problem. It follows that
$$\mathcal F(f)(\omega) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{\sigma^2\omega^2}{2}}. \qquad\square$$

As mentioned, the most typical use of the Fourier transform is to analyse the frequencies in a signal. The next problem reveals the reason. For technical reasons, we cut the signal off by multiplication with the characteristic function $h_\Omega$ of the interval $(-\Omega,\Omega)$.

7.C.9. Find the Fourier transform of the functions $f(t) = h_\Omega(t)\cos(nt)$ and $g(t) = h_\Omega(t)\sin(nt)$.

Solution. By definition,
$$\mathcal F(f)(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\Omega}^{\Omega}\cos(nt)\,e^{-i\omega t}\,dt = \frac{1}{2\sqrt{2\pi}}\int_{-\Omega}^{\Omega}\bigl(\cos((n-\omega)t) + \cos((n+\omega)t)\bigr)\,dt$$
$$= \frac{1}{2\sqrt{2\pi}}\left[\frac{\sin((n-\omega)t)}{n-\omega} + \frac{\sin((n+\omega)t)}{n+\omega}\right]_{-\Omega}^{\Omega} = \frac{\Omega}{\sqrt{2\pi}}\bigl(\operatorname{sinc}((n-\omega)\Omega) + \operatorname{sinc}((n+\omega)\Omega)\bigr).$$
Of course, this procedure can be iterated, to obtain ℱ(f^{(n)})(ω) = (iω)ⁿ ℱ(f)(ω).

7.2.7. The relation to convolutions. There is another extremely important property to consider, namely the relation between convolutions and Fourier transforms. Calculate the transform of the convolution h = f ∗ g, where, as usual, the functions are assumed to have compact support. Recall that we may change the order of integration, see 6.3.13. Then we change the variable by the substitution t − x = u. The result is

  ℱ(h)(ω) = (1/√(2π)) ∫_{−∞}^{∞} (∫_{−∞}^{∞} f(x) g(t−x) dx) e^{−iωt} dt
    = (1/√(2π)) ∫_{−∞}^{∞} f(x) (∫_{−∞}^{∞} g(t−x) e^{−iωt} dt) dx
    = (1/√(2π)) ∫_{−∞}^{∞} f(x) (∫_{−∞}^{∞} g(u) e^{−iω(u+x)} du) dx
    = (1/√(2π)) ∫_{−∞}^{∞} f(x) e^{−iωx} dx ∫_{−∞}^{∞} g(u) e^{−iωu} du
    = √(2π) ℱ(f)(ω) ℱ(g)(ω).

A similar calculation shows that the Fourier transform of a product is the convolution of the transforms, up to a multiplicative constant. In fact,

  ℱ(f · g)(ω) = (1/√(2π)) (ℱ(f) ∗ ℱ(g))(ω).

As we mentioned above, the convolution f ∗ g often models the process of observation of some quantity f. Using the Fourier transform and its inverse, the original values of this quantity are easily recognised if the convolution kernel g is known. We just calculate ℱ(f ∗ g) and divide it by the image ℱ(g). This yields the Fourier transform of the original function f, which can then be obtained explicitly using the inverse Fourier transform. This procedure is sometimes called deconvolution.

In real applications, the procedure often cannot be that straightforward, since the Fourier image of the known convolution kernel might have zero values, and then we can hardly divide by it as above. For example, take the convolution kernel sinc(t), whose image is an indicator function of a finite interval. So more cunning techniques are needed, and there is a vast literature on them.

7.2.8. The L²-norm. As an illustration of the power of our simple results, look at the behaviour of the Fourier transform with respect to the L²-norm. We write g̃ for the function g̃(t) = g(−t) (with complex conjugation added for complex-valued g) and notice that

  (f ∗ g̃)(t) = ∫_{−∞}^{∞} f(x) g̃(t−x) dx = ∫_{−∞}^{∞} f(x) g(x−t) dx.

In particular, the scalar product is given by the formula

  ⟨f, g⟩ = (f ∗ g̃)(0).
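The convolution theorem has an exact discrete counterpart: the DFT of a circular convolution is the pointwise product of the DFTs. A minimal numpy check of this discrete analogue (note that numpy's DFT convention carries no 1/√(2π) factor, so no √(2π) constant appears here):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 128
f = rng.standard_normal(N)
g = rng.standard_normal(N)

# Circular convolution computed directly from the definition.
h = np.array([sum(f[x] * g[(t - x) % N] for x in range(N)) for t in range(N)])

# The DFT turns circular convolution into a pointwise product.
lhs = np.fft.fft(h)
rhs = np.fft.fft(f) * np.fft.fft(g)
assert np.allclose(lhs, rhs)
```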
The same computation leads to the image of the sine signal; the only differences are one minus sign and an additional i in the formula:

  ℱ(g)(ω) = i (Ω/√(2π)) (−sinc((n−ω)Ω) + sinc((n+ω)Ω)). □

7.C.10. Find the Fourier transform of the superposition of cosine signals over the interval (−Ω, Ω),

  f(t) = h_Ω(t)(3 cos(5t) + cos(15t)).

What happens if Ω → ∞?

Solution. The Fourier transform is linear over scalars, thus we simply add the corresponding images from the previous problem with n = 5 and n = 15, multiplied by the proper coefficients. The illustration shows the image for Ω = 20. Each of the peaks behaves like the Fourier image of the characteristic function h_Ω, shifted to the corresponding frequency. If Ω increases to infinity, the image keeps its four peaks at the same positions, corresponding to the frequencies ±5 and ±15, but they become narrower and sharper. In the limit, this is no longer a function, since the width of the peaks becomes zero. □

In the next problem we solve the equation

  ∫_0^∞ f(x) sin(xt) dx = e^{−t},  t > 0,

for an unknown function f.

Solution. Multiply both sides of the equation by √(2/π), to obtain the sine Fourier transform on the left-hand side. Apply the inverse transform to both sides of the equation to get

  f(t) = (2/π) ∫_0^∞ e^{−x} sin(xt) dx,  t > 0.

The limit behaviour observed in 7.C.10 is usually written

  ℱ(cos(nt))(ω) = √(π/2) (δ(ω−n) + δ(ω+n)),

where δ denotes the Dirac delta function. This can be seen from the calculation of the Fourier transform of the function h_Ω(x) cos(nx) and then letting Ω approach ∞, see the solution to problem 7.C.10. We can obtain the Fourier transform of the sine function in a similar way. We can take advantage of the fact that the transforms of a function and of its derivative differ only by a multiple of the imaginary unit and the new variable. Alternatively, we can use the fact that the sine function is obtained from the cosine function by the phase shift of π/2.
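The sinc formula of 7.C.9 for the truncated cosine is easy to confirm by quadrature. In this sketch the helper names and the grid are ad hoc choices; note that numpy's own np.sinc uses the normalized convention sin(πx)/(πx), so an unnormalized sinc is defined by hand:

```python
import numpy as np

def ft_truncated_cos(n, omega, Omega=20.0, npts=400001):
    # (1/sqrt(2*pi)) * ∫_{-Omega}^{Omega} cos(n t) e^{-i omega t} dt, trapezoid rule.
    t = np.linspace(-Omega, Omega, npts)
    dt = t[1] - t[0]
    vals = np.cos(n * t) * np.exp(-1j * omega * t)
    return ((vals.sum() - 0.5 * (vals[0] + vals[-1])) * dt) / np.sqrt(2 * np.pi)

def sinc(x):
    # Unnormalized sinc, sin(x)/x with sinc(0) = 1.
    return 1.0 if x == 0 else np.sin(x) / x

n, Omega = 5.0, 20.0
for omega in [0.0, 4.5, 5.0, 15.0]:
    predicted = Omega / np.sqrt(2 * np.pi) * (sinc((n - omega) * Omega) + sinc((n + omega) * Omega))
    assert abs(ft_truncated_cos(n, omega) - predicted) < 1e-5
```

Sampling ω near n = 5 reproduces the strong central peak discussed in the text.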
These transforms are a basis for the Fourier analysis of signals (see also problem 7.C.9): if a signal is a pure sinusoid of a given frequency, then this is recognized in the Fourier transform as two single-point impulses, exactly at the positive and negative values of the frequency. If the signal is a linear combination of several such pure signals, then we obtain the same linear combination of single-point impulses. However, since we always process a signal over a finite time interval only, we get not single-point impulses but rather a wavy curve similar to the function sinc, with a strong maximum at the value of the corresponding frequency. The size of this maximum also yields information about the amplitude of the original signal.

Another good way to approximate the Dirac delta function is to exploit the Gaussian functions. As seen in the solution to problem 7.C.7, the Fourier image of the Gaussian function

  f(t) = (1/(a√(2π))) e^{−t²/(2a²)}

is again a Gaussian, corresponding to the reciprocal value of a. In the limit a → 0, the image converges quickly to a multiple of the constant function; see the illustrations, with a = 3 and a = 1/10. Notice that the rather large a in the first illustration corresponds to a wide Gaussian, while its image is the slim one. The second illustration provides the opposite case: the preimage is the narrow Gaussian, and the image is already reasonably close to the constant function. The Gaussians are chosen with L¹-norm equal to one, but the Fourier transform preserves the L²-norm of the functions.

7.2.10. Fourier sine and cosine transforms. If we apply the Fourier transform to an odd function f(t), where f(−t) = −f(t), the contribution in the integration of the product of f(t) with cos(ωt) cancels between the positive and negative values of t.

Integrating by parts twice shows that

  ∫ e^{−x} sin(xt) dx = −(e^{−x}/(1+t²)) (sin(xt) + t cos(xt)) + C.

Hence

  ∫_0^∞ e^{−x} sin(xt) dx = t/(1+t²).

So

  f(t) = (2/π) · t/(1+t²),  t > 0. □
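The value of the improper integral, and hence the solution of the integral equation just obtained, can be double-checked numerically (a sketch with an ad hoc truncation of the integral at x = 60):

```python
import numpy as np

def sine_integral(t, x_max=60.0, npts=1200001):
    # Trapezoid approximation of  ∫_0^∞ e^{-x} sin(x t) dx, truncated at x_max.
    x = np.linspace(0.0, x_max, npts)
    dx = x[1] - x[0]
    vals = np.exp(-x) * np.sin(x * t)
    return (vals.sum() - 0.5 * (vals[0] + vals[-1])) * dx

for t in [0.5, 1.0, 3.0]:
    # closed form from the integration by parts: t / (1 + t^2)
    assert abs(sine_integral(t) - t / (1 + t**2)) < 1e-6

# ...and the solution of the integral equation, f(t) = (2/pi) * t/(1+t^2):
f_val = (2 / np.pi) * sine_integral(2.0)
assert abs(f_val - (2 / np.pi) * 2 / 5) < 1e-6
```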
7.C.13. Use the Fourier transform to find solutions of the non-homogeneous linear differential equation

  (1) y′ = ay + f,

where a ∈ ℝ is a non-zero constant and f is a known function. Can all solutions be obtained in this way?

Solution. The key observation for this problem is the relation between the Fourier transform and the derivative, see 7.2.6. Thus, if the Fourier transform is applied to the equation (1), we get the algebraic equation for ŷ = ℱ(y):

  iω ŷ = a ŷ + f̂.

If it is assumed that f̂ = ℱ(f) exists and there is a solution y with the Fourier image ŷ, then

  ŷ = f̂ / (iω − a),

and using the general relation ℱ⁻¹(g · h) = (1/√(2π)) ℱ⁻¹(g) ∗ ℱ⁻¹(h) between products and convolutions from 7.2.7, we arrive at the final formula

  y = (1/√(2π)) ℱ⁻¹(1/(iω − a)) ∗ f.

So it is necessary to compute the inverse Fourier transform of the simple rational function 1/(iω − a), in two steps. Assume first a < 0. Guessing the preimage and evaluating its transform verifies that

  ℱ⁻¹(1/(iω − a))(t) = √(2π) e^{at} for t > 0, and 0 for t < 0.

Similarly, for a > 0,

  ℱ⁻¹(1/(iω − a))(t) = −√(2π) e^{at} for t < 0, and 0 for t > 0.

This provides the two desired results. Indeed, if the equation (1) comes with a > 0, we rewrite our rational function as −(a − iω)⁻¹. Next, the function equal to −√(2π) e^{at} for negative t

Thus if f is odd, then

  ℱ(f)(ω) = (−2i/√(2π)) ∫_0^∞ f(t) sin(ωt) dt.

The resulting function is odd again, hence for the same reason the inverse transform can be determined similarly:

  f(t) = (2i/√(2π)) ∫_0^∞ ℱ(f)(ω) sin(ωt) dω.

Omitting the imaginary unit i gives the mutually inverse transforms which are called the Fourier sine transform, for odd functions:

  f_s(ω) = √(2/π) ∫_0^∞ f(t) sin(ωt) dt,
  f(t) = √(2/π) ∫_0^∞ f_s(ω) sin(ωt) dω.

Similarly, we can define the Fourier cosine transform for even functions:

  f_c(ω) = √(2/π) ∫_0^∞ f(t) cos(ωt) dt,
  f(t) = √(2/π) ∫_0^∞ f_c(ω) cos(ωt) dω.

7.2.11. Laplace transform. The Fourier transform cannot be applied to functions which are not integrable in absolute value over ℝ (at least, we do not obtain true functions). The Laplace transform is similar to the Fourier transform, but with a real exponential kernel:

  𝓛(f)(s) = f̃(s) = ∫_0^∞ f(t) e^{−st} dt.
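The inverse images above lead to the explicit particular solutions discussed below. As a symbolic sanity check (a sketch, not part of the original text), sympy can evaluate the a > 0 convolution formula for the concrete equation y′ = y + t:

```python
import sympy as sp

t, x = sp.symbols('t x', real=True)
# Particular solution of y' = y + t given (for a = 1 > 0) by the convolution
# formula  y(t) = -∫_{-∞}^0 e^{a x} f(t - x) dx  with f(t) = t:
y = -sp.integrate(sp.exp(x) * (t - x), (x, -sp.oo, 0))
assert sp.simplify(y - (-t - 1)) == 0              # matches the text's  -t - 1
assert sp.simplify(sp.diff(y, t) - (y + t)) == 0   # indeed solves  y' = y + t
```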
The integral operator 𝓛 has a rapidly decreasing kernel when s is a positive real number. Therefore, the Laplace transform is usually perceived as a mapping of suitable functions on the interval [0, ∞) to functions on the same or a shorter interval. The image 𝓛(p) exists, for example, for every polynomial p(t) and all positive numbers s.

Analogously to the Fourier transform, we obtain the formula for the Laplace transform of a differentiated function for s > 0 by using integration by parts:

  𝓛(f′)(s) = ∫_0^∞ f′(t) e^{−st} dt = [f(t) e^{−st}]_0^∞ + s ∫_0^∞ f(t) e^{−st} dt = −f(0) + s 𝓛(f)(s).

The properties of the Laplace transform and of many other transforms used in technical practice can be found in the specialized literature. We provide a few examples in the other column, starting with 7.D.1.

provides the requested Fourier image. Immediately it is seen that the convolution

  y(t) = −∫_{−∞}^0 e^{ax} f(t−x) dx

is a solution. (The multiples √(2π) in the expression with the convolution cancel.) Similarly, if a < 0, then

  y(t) = ∫_0^∞ e^{ax} f(t−x) dx

is a solution.

Not all solutions can be obtained in this way. For example, y′ = y leads to y(t) = C eᵗ with an arbitrary constant C, but this is not a function with a Fourier image. With f(t) = 0, our procedure produces the zero function, which is just one of the solutions. Similarly, if we deal with the equation y′ = y + t, then the particular solution suggested above is

  y(t) = −∫_{−∞}^0 eˣ (t − x) dx = −t − 1. □

7.C.14. Check directly that the two functions y(t) found above are indeed solutions to the equation y′ = ay + f. ○

7.C.15. As in the previous problem, solve the second order equation y″ = ay + f.

Solution. Use the fact that ℱ(y″)(ω) = −ω² ℱ(y)(ω) and deduce the algebraic relation

  −ω² ŷ = a ŷ + f̂

for the Fourier images ŷ and f̂. Hence

  ŷ = −f̂ / (ω² + a).

In order to guess the correct preimage of the rational function in question, first assume a > 0 and compute

  1/(√a − iω) + 1/(√a + iω) = 2√a/(a + ω²).

Thus it is verified that
  ℱ(e^{−√a|t|})(ω) = (1/√(2π)) · 2√a/(a + ω²).

Immediately (the factors √(2π) cancel),

  y(t) = −(1/(2√a)) (e^{−√a|·|} ∗ f)(t) = −(1/(2√a)) ∫_{−∞}^{∞} e^{−√a|x|} f(t−x) dx.

7.2.12. Discrete transforms. The Fourier analysis of signals mentioned in the previous paragraphs is realized by special analog circuits in, for example, radio technology. Nowadays, when processing signals by computer circuits, we work with discrete data only. Assume that there is a fixed (small) sampling interval τ given in a (discrete) time variable and that, for a large natural number N, the signal repeats with period Nτ, which is the maximal period that can be represented in our discrete model. We should not be surprised that our continuous models allow for a discrete analogy.

Consider an N-dimensional vector f, which can be imagined as the function r ↦ f(r) ∈ ℂ, for r = 0, 1, …, N − 1. Denote Δω = 2π/(Nτ) and ω_k = kΔω. The simplest discrete approximation of the Fourier transform integral suggests that

  f̂(k) = Σ_{r=0}^{N−1} f(r) e^{−2πikr/N}

should be a promising transformation f ↦ f̂, whose inverse should not be far from

  f̃(k) = (1/N) Σ_{r=0}^{N−1} f̂(r) e^{2πikr/N}.

Actually, these are already the mutually inverse transformations:

Theorem. The transformations above satisfy f̃(k) = f(k) for all k = 0, 1, …, N − 1.

Proof. Let

  T = Σ_{r=0}^{N−1} e^{2πikr/N}.

Then

  e^{2πik/N} T = Σ_{r=1}^{N} e^{2πikr/N},

and so, by subtraction, (1 − e^{2πik/N}) T = 1 − e^{2πik}. The right-hand side is 0 for all integers k. On the left-hand side, the coefficient of T is not zero unless k is a multiple of N. Hence

  T = N if k is a multiple of N, and T = 0 otherwise.

With k and s both confined to the range {0, 1, 2, …, N − 1}, the difference k − s can only be a multiple of N when k = s. It follows that for such k and s,

  Σ_{r=0}^{N−1} e^{2πi(k−s)r/N} = δ_{ks} N,

where the Kronecker delta δ_{ks} = 0 for k ≠ s and δ_{ks} = 1 for k = s. Finally, we compute:

The case a < 0 is a little more complicated. But we may ask Maple, or find in the literature, that the function g(t) = sin(b|t|) has the Fourier image

  ℱ(g)(ω) = (1/√(2π)) · 2b/(b² − ω²).

We are nearly finished.
The required preimage is

  h(t) = (√(2π)/(2√(−a))) sin(√(−a) |t|),

and the resulting convolution is

  y(t) = (1/(2√(−a))) ∫_{−∞}^{∞} sin(√(−a) |x|) f(t−x) dx.

If we rewrite the equation as y″ + b²y = f with b > 0, the result says

  y(t) = (1/(2b)) ∫_{−∞}^{∞} sin(b|x|) f(t−x) dx. □

7.C.16. Check directly that the two functions y(t) found above are indeed solutions to the equation y″ = ay + f. ○

D. Laplace transform

The Laplace transform is another integral transform which interchanges differentiation and algebraic multiplication. As with the Fourier transform, it is based on the properties of the exponential function, but this time we take the real exponential; see 7.2.11 for the formula. One advantage is that every polynomial has a Laplace image.

7.D.1. Determine the Laplace transform 𝓛(f)(s) for each of the functions
(a) f(t) = e^{at};
(b) f(t) = c₁ e^{a₁t} + c₂ e^{a₂t};
(c) f(t) = cos(bt);
(d) f(t) = sin(bt);
(e) f(t) = cosh(bt);
(f) f(t) = sinh(bt);
(g) f(t) = t^k, k ∈ ℕ,
where the constants b ∈ ℝ and a, a₁, a₂, c₁, c₂ ∈ ℂ are arbitrary. It is assumed that the positive number s ∈ ℝ is greater than the real parts of the numbers a, a₁, a₂ ∈ ℂ, and greater than |b| in the problems (e) and (f).

  f̃(k) = (1/N) Σ_{r=0}^{N−1} (Σ_{s=0}^{N−1} f(s) e^{−2πisr/N}) e^{2πikr/N}
    = (1/N) Σ_{s=0}^{N−1} f(s) Σ_{r=0}^{N−1} e^{2πi(k−s)r/N}
    = (1/N) Σ_{s=0}^{N−1} f(s) δ_{ks} N = f(k). □

The computations in the proof also verify that the Fourier image of a periodic complex-valued function with a period among the chosen sampling periods is just its amplitude at this particular frequency. Thus, if the signal has been created as a superposition of periodic signals with the sampling frequencies only, we obtain the absolutely optimal result. However, if the transformed signal contains a frequency not exactly available among the sampling frequencies, there are nonzero amplitudes at all the sampling frequencies in the Fourier image. This is called frequency leaking in the technical literature. There is a vast amount of literature devoted to fast implementations and exploitation of the discrete Fourier transform, as well as of other similar discrete tools.
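The inversion theorem for the discrete transform is short enough to be tested directly against its definition (a naive O(N²) sketch; real applications would use a fast Fourier transform):

```python
import numpy as np

def dft(f):
    # f_hat(k) = sum_r f(r) e^{-2*pi*i*k*r/N}
    N = len(f)
    r = np.arange(N)
    return np.array([np.sum(f * np.exp(-2j * np.pi * k * r / N)) for k in range(N)])

def idft(fhat):
    # f(k) = (1/N) sum_r f_hat(r) e^{2*pi*i*k*r/N}
    N = len(fhat)
    r = np.arange(N)
    return np.array([np.sum(fhat * np.exp(2j * np.pi * k * r / N)) / N for k in range(N)])

rng = np.random.default_rng(1)
f = rng.standard_normal(16) + 1j * rng.standard_normal(16)
assert np.allclose(idft(dft(f)), f)   # the round trip reproduces f
```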
This is an extremely active area of current research.

3. Metric spaces

At the end of the chapter, we focus on the concepts of distance and convergence in a more abstract way. This also provides the conceptual background for some of the already derived properties of Fourier series and the Fourier transform. We shall need these concepts in miscellaneous contexts later. It is hoped that the subsequent pages are a useful (and hopefully manageable) trip into the world of mathematics for the competent or courageous!

7.3.1. Metrics and norms. When we discussed Fourier series, the distance between functions on a space of functions was commonly referred to. Now we examine the concept of distance more thoroughly.

Solution. The case (a). It follows directly from the definition of the Laplace transform that

  𝓛(f)(s) = ∫_0^∞ e^{at} e^{−st} dt = lim_{T→∞} [−e^{−(s−a)t}/(s−a)]_0^T = 1/(s−a).

The case (b). Using the result of the above case and the linearity of improper integrals, we obtain

  𝓛(f)(s) = c₁ ∫_0^∞ e^{a₁t} e^{−st} dt + c₂ ∫_0^∞ e^{a₂t} e^{−st} dt = c₁/(s−a₁) + c₂/(s−a₂).

The case (c). Since cos(bt) = (e^{ibt} + e^{−ibt})/2, the choice c₁ = 1/2 = c₂, a₁ = ib, a₂ = −ib in the previous case gives

  𝓛(f)(s) = ∫_0^∞ (½ e^{ibt} + ½ e^{−ibt}) e^{−st} dt = 1/(2(s−ib)) + 1/(2(s+ib)) = s/(s²+b²).

The cases (d), (e), (f). Analogously, the choices
(d) c₁ = −i/2, c₂ = i/2, a₁ = ib, a₂ = −ib;
(e) c₁ = 1/2 = c₂, a₁ = b, a₂ = −b;
(f) c₁ = 1/2, c₂ = −1/2, a₁ = b, a₂ = −b
lead to
(d) 𝓛(f)(s) = b/(s²+b²);
(e) 𝓛(f)(s) = s/(s²−b²);
(f) 𝓛(f)(s) = b/(s²−b²).
Finally, the last one is obtained by a straightforward repetition of integration by parts:

  𝓛(t^k)(s) = ∫_0^∞ t^k e^{−st} dt = (k/s) ∫_0^∞ t^{k−1} e^{−st} dt = ⋯ = k!/s^{k+1}. □

Axioms of a metric and a norm

A set X together with a mapping d : X × X → ℝ such that, for all x, y, z ∈ X, the following conditions are satisfied:
(1) d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y;
(2) d(x, y) = d(y, x);
(3) d(x, z) ≤ d(x, y) + d(y, z);
is called a metric space; the mapping d is a metric, and the condition (3) is the triangle inequality.

If X is a vector space and ‖·‖ : X → ℝ is a function satisfying
(4) ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0;
(5) ‖λx‖ = |λ| ‖x‖ for all scalars λ;
(6) ‖x + y‖ ≤ ‖x‖ + ‖y‖;
then the function ‖·‖ is called a norm on X, and the space X is then a normed vector space.

The L¹-norm ‖·‖₁ and the L²-norm ‖·‖₂ on functions, as well as the Euclidean norm on ℝⁿ, satisfy these properties. A norm always determines a metric by d(x, y) = ‖x − y‖, which is also the case with the Euclidean distance. But not every metric can be defined by a norm in this way.

At the beginning of this chapter, we defined the distance between functions using the L¹-norm. In Euclidean vector spaces, it is the norm ‖x‖ which is induced by the bilinear inner product via the relation ‖x‖² = ⟨x, x⟩. Similarly, we work with the norm on unitary spaces. We obtained the L²-norm on continuous functions in the same way. Metrics given by a norm have very specific properties, since their behaviour on the whole space X can be derived from the properties in an arbitrarily small neighbourhood of the zero element x = 0 ∈ X.

7.D.2. Use the definition of the Gamma function Γ(t) in Chapter 6 in order to prove

  𝓛(t^a)(s) = Γ(a+1) · 1/s^{a+1}

for general a > 0. Compare the result to that of 7.D.1(g). ○

7.D.3. For s > −1, calculate the Laplace transform 𝓛(g)(s) of the function g(t) = t e^{−t}.
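The closed forms of 7.D.1 can be confirmed by numerical quadrature of the defining integral (a sketch; the truncation point and grid size are ad hoc choices):

```python
import numpy as np

def laplace(f, s, t_max=80.0, npts=800001):
    # Trapezoid approximation of  ∫_0^∞ f(t) e^{-s t} dt, truncated at t_max.
    t = np.linspace(0.0, t_max, npts)
    dt = t[1] - t[0]
    vals = f(t) * np.exp(-s * t)
    return (vals.sum() - 0.5 * (vals[0] + vals[-1])) * dt

s, a, b = 2.0, 0.5, 3.0
assert abs(laplace(lambda t: np.exp(a * t), s) - 1 / (s - a)) < 1e-5     # case (a)
assert abs(laplace(lambda t: np.cos(b * t), s) - s / (s**2 + b**2)) < 1e-5  # case (c)
assert abs(laplace(lambda t: np.sin(b * t), s) - b / (s**2 + b**2)) < 1e-5  # case (d)
assert abs(laplace(lambda t: t**4, s) - 24 / s**5) < 1e-5                # case (g), k = 4
```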
7.3.2. Convergence. The concepts of (close) neighbourhoods of particular elements, of convergence of sequences of elements, and the corresponding "topological" concepts can be defined on abstract metric spaces in much the same way as in the case of the real and complex numbers and their sequences. See the beginning of the fifth chapter, 5.2.3–5.2.8. We can almost copy those paragraphs, although the proof of theorem 5.2.8 is much harder. We begin with the concept of convergent sequences in a metric space X with metric d:

Further, for s > 1, calculate the Laplace transform 𝓛(h)(s) of the function h(t) = t sinh t. ○

The basic Laplace transforms are enumerated in the following table:

  f(t)                                𝓛(f)(s)
  t^k                                 k!/s^{k+1}
  e^{at}                              1/(s−a)
  t e^{at}                            1/(s−a)²
  t^n e^{at}                          n!/(s−a)^{n+1}
  sin(ωt)                             ω/(s²+ω²)
  cos(ωt)                             s/(s²+ω²)
  e^{at} sin(ωt)                      ω/((s−a)²+ω²)
  e^{at}(cos(ωt) + (a/ω) sin(ωt))     s/((s−a)²+ω²)
  t sin(ωt)                           2ωs/(s²+ω²)²
  sin(ωt) − ωt cos(ωt)                2ω³/(s²+ω²)²

7.D.4. Establish the 5th and 6th rows of the table above using Euler's formula e^{iωt} = cos(ωt) + i sin(ωt). ○

As expected, the features of the Laplace transform allow us to find explicit solutions to some differential equations. By 7.H.8, it is straightforward to incorporate the initial conditions into the solution. We present just two such examples in the problems at the conclusion of this chapter, see 7.H.11. We return to this topic in Chapter 8.

E. Metric spaces

The concept of a metric is an abstract version of what we understand as the distance in Euclidean geometry. It is always based on the triangle inequality. The axioms in Definition 7.3.1 follow the Euclidean experience, saying that our "distance" between two elements has to be strictly positive (except when the two elements coincide), should be symmetric in the arguments, and should satisfy the triangle inequality.
Other concepts available in the literature are more abstract and might lead to more general objects (the most important ones being pseudometrics, ultrametrics, and semimetrics).²

² The first axiomatic definition of a "traditional" metric was given by Maurice Fréchet in 1906. However, the name of the metric comes from Felix Hausdorff, who used this word in his work from 1914.

Cauchy sequences

Consider an arbitrary sequence of elements x₀, x₁, … in X. Suppose that for any fixed positive real number ε,

  d(xᵢ, xⱼ) < ε

for all but finitely many pairs of terms xᵢ, xⱼ of the sequence. In other words, for any given ε > 0, there is an index N such that the above inequality holds for all i, j > N. Loosely put, the elements of the sequence are eventually arbitrarily close to each other. Such a sequence is called a Cauchy sequence.

Just as in the case of the real or complex numbers, we would like every Cauchy sequence of terms xᵢ ∈ X to converge to some x in the following sense:

Convergent sequences

Let x₀, x₁, … be a sequence in a metric space X and let x be an element of X. We say that the sequence {xᵢ} converges to the element x if, for every positive real number ε, there is an integer N > 0 such that i > N implies d(xᵢ, x) < ε.

By the triangle inequality, it follows that for each pair of terms xᵢ, xⱼ from a convergent sequence with sufficiently large indices,

  d(xᵢ, xⱼ) ≤ d(xᵢ, x) + d(x, xⱼ) < 2ε.

Therefore, every convergent sequence is a Cauchy sequence. Conversely, however, not every Cauchy sequence is convergent. Metric spaces where every Cauchy sequence is convergent are called complete metric spaces.

7.3.3. Topology, convergence, and continuity. Just as in the case of the real numbers, we can formulate convergence in terms of "open neighbourhoods".

Open and closed sets

Definition. The open ε-neighbourhood of an element x in a metric space X (or just ε-neighbourhood for short) is the set

  O_ε(x) = {y ∈ X; d(x, y) < ε}.
A subset U ⊆ X is open if and only if, for each x ∈ U, U contains some ε-neighbourhood of x. We define a subset W ⊆ X to be closed if and only if its complement X ∖ W is an open set.

Instead of an ε-neighbourhood, we also talk about the (open) ε-ball centered at x. In the case of a normed space, we can consider the ε-balls centered at zero: together with x, the ε-balls determine an ε-neighbourhood of x.

The limit points of a subset A ⊆ X are defined as those elements x ∈ X such that there is a sequence of points in A other than x converging to x. We prove that a set is closed if and only if it contains all of its limit points:

7.E.1. The discrete metric space X is defined as the set X with the function d : X × X → ℝ,

  d(x, y) = 1 for x ≠ y,  d(x, y) = 0 for x = y.

Show that this is a metric space according to Definition 7.3.1. Show how to introduce a metric on Cartesian products of metric spaces so that the product of two discrete metric spaces is again discrete.

Solution. All three axioms of a metric from 7.3.1 are obviously satisfied by the definition of the discrete metric space.

Consider two metric spaces X and Y with metrics d_X and d_Y. The first obvious idea is to add the distances of the components, i.e.

  d((x₁, y₁), (x₂, y₂)) = d_X(x₁, x₂) + d_Y(y₁, y₂).

Clearly this is a metric (verify in detail!), but if the metric spaces X and Y are discrete, then considering points u = (x₁, y₁) and w = (x₂, y₂) such that x₁ ≠ x₂ and y₁ ≠ y₂, we arrive at d(u, w) = 2. Thus, this is not a discrete metric space.

But there is another simple possibility of introducing a metric on X × Y, using the maximum of the distances:

  d((x₁, y₁), (x₂, y₂)) = max{d_X(x₁, x₂), d_Y(y₁, y₂)}.

We call this the product of the metric spaces X and Y. The triangle inequality as well as the other axioms are easily checked (write down the explicit arguments!). Moreover, if both X and Y are discrete, then d is again a discrete metric. □
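The maximum construction from 7.E.1 can be spelled out in a few lines of code, checking both the discreteness of the product and the triangle inequality on a small example (the helper names are ad hoc):

```python
from itertools import product

def discrete(x, y):
    # The discrete metric: 1 for distinct points, 0 otherwise.
    return 0 if x == y else 1

def product_metric(d1, d2):
    # The max-metric on the Cartesian product of two metric spaces.
    return lambda u, v: max(d1(u[0], v[0]), d2(u[1], v[1]))

d = product_metric(discrete, discrete)
pts = list(product(['a', 'b'], [0, 1]))
for u in pts:
    for v in pts:
        # The product of two discrete spaces is again discrete...
        assert d(u, v) == (0 if u == v else 1)
        # ...and the triangle inequality holds.
        for w in pts:
            assert d(u, v) <= d(u, w) + d(w, v)
```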
7.E.2. Decide whether or not the following sets and mappings form metric spaces:
i) ℕ with d(m, n) = gcd(m, n);
ii) ℕ with d(m, n) = lcm(m, n)/max(m, n) for m ≠ n, and d(m, m) = 0;
iii) the world population with d(P₁, P₂) = n, where P₁ = X₀, X₁, …, X_{n+1} = P₂ is the shortest sequence of people such that Xᵢ knows X_{i+1} for i = 0, …, n.

Solution.
i) No. The "distance" d does not satisfy d(m, m) = 0.
ii) No. The first and second conditions in the definition 7.3.1 are fulfilled, but the triangle inequality (property (3)) is not: the distance between 8 and 9 is 8, the distance between 8 and 6 is 3, and the distance between 6 and 9 is 2, thus d(8, 9) > d(8, 6) + d(6, 9).

Suppose A is closed and x is a limit point of A not belonging to A. Then x ∈ X ∖ A, which is open, so there is an ε-neighbourhood of x not intersecting A. But every ε-neighbourhood of x contains infinitely many points of the set A, since x is a limit point. This is a contradiction.

Conversely, suppose A contains all of its limit points, and suppose x ∈ X ∖ A. If every ε-neighbourhood of the point x contained a point x_ε ∈ A, then the choices ε = 1/n would provide a sequence of points xₙ ∈ A converging to x. But then the point x would have to be a limit point, thus lying in A, which again leads to a contradiction.

For every subset A in a metric space X, we define its interior as the set of those points x ∈ A for which some neighbourhood of x belongs to A. We define the closure Ā of a set A as the union of the original set A with the set of all limit points of A. Just as in the case of the real numbers, we can verify that the intersection of any system of closed sets, as well as the union of any finite system of closed sets, is again closed. On the other hand, any union of open sets is again an open set, and a finite intersection of open sets is again an open set. Prove these propositions by yourselves in detail!
We also advise the reader to verify that the interior of a set A equals the union of all open sets contained in A (alternatively put, the interior of A is the largest open subset of A), and that the closure of A is the intersection of all closed sets which contain A (alternatively put, the closure of A is the smallest closed superset of A).

The closed and open sets are the essential concepts of the mathematical discipline called topology. Without pursuing these ideas further, we have just familiarised ourselves with the topology of metric spaces. The concept of convergence can now be reformulated as follows: a sequence of elements xᵢ, i = 0, 1, …, in a metric space X converges to x ∈ X if and only if, for every open set U containing x, all but finitely many points of the sequence lie in U.

Just as in the case of the real numbers, we can define continuous mappings between metric spaces: let W and Z be metric spaces; a mapping f : W → Z is continuous if and only if the inverse image f⁻¹(V) of every open set V ⊆ Z is an open set in W. This is equivalent to the statement that f is continuous if and only if, for every z = f(x) ∈ Z and every positive real number ε, there is a positive real number δ such that for all elements y ∈ W with distance d_W(x, y) < δ, we have d_Z(z, f(y)) < ε. Again, as in the case of real-valued functions, a mapping f from one metric space to another is continuous if and only if it preserves the convergence of sequences (check this yourselves!).

7.3.4. Lp-norms. Now we have at our disposal the general tools with which we can look at examples of metric spaces created by finite-dimensional vectors or by functions. We restrict ourselves to an extraordinarily useful class of norms.

iii) No. The "distance" is not symmetric. It would be a metric space if the word "knows" in the definition were changed to mean "know each other". □

7.E.3. Consider the set of binary words of length n.
Define the distance between two words as the number of bits in which they differ. This is called the Hamming distance (see 12.4.2). Show that it defines a metric.

Solution. The first two axioms of a metric are clearly satisfied. For the third one, let the words x and z differ in k bits, and let y be another word. Consider just those k bits in which x and z differ. In each of these bits, y differs from exactly one of x and z. Thus, restricting the words x, y, z to these k bits gives parts x_p, y_p, z_p with d(x_p, y_p) + d(y_p, z_p) = d(x_p, z_p). In the other bits, the words x and z are the same, while x and y, or y and z, may differ. Thus d(x, y) + d(y, z) ≥ d(x, z), and the third axiom is satisfied as well. □

7.E.4. Consider any connected subset S ⊆ ℝⁿ (any two points in S can be connected by a path lying in S). Define the distance between two points as the length of the shortest path between them. Is it a metric on S?

Solution. It is a metric; all the axioms are easily verified. But this metric has a special significance. The principle of the "shortest way" is often met in reality. Recall for example Fermat's principle of least time (see 5.F.10), where we measure the length of a path by the time light takes to travel it. Generally, shortest paths in a metric space are called geodesics. □

7.E.5. Consider the space of integrable functions on the interval [a, b]. Define the (L¹) distance between functions f, g as

  ∫_a^b |f(x) − g(x)| dx.

Why is it not a metric space?

Solution. The first axiom of a metric space in 7.3.1 is not satisfied: any function which is nonzero only on a set of measure zero has distance 0 from the null function. But if we consider the equivalence where two functions are equivalent if they differ by a function vanishing outside a set of measure zero, then we get the space S⁰[a, b]. The given distance, considered on the equivalence classes of this equivalence, is the L¹ metric. □
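The Hamming-distance argument of 7.E.3 can be verified exhaustively for short words:

```python
from itertools import product

def hamming(x, y):
    # Number of positions in which the two words differ.
    return sum(a != b for a, b in zip(x, y))

words = list(product([0, 1], repeat=4))
for x in words:
    for y in words:
        assert hamming(x, y) == hamming(y, x)               # symmetry
        assert (hamming(x, y) == 0) == (x == y)             # positivity
        for z in words:
            assert hamming(x, z) <= hamming(x, y) + hamming(y, z)  # triangle
```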
We begin with the real or complex finite-dimensional vector spaces ℝⁿ and ℂⁿ; for a fixed real number p ≥ 1 and any vector z = (z₁, …, zₙ), we define

  ‖z‖_p = (Σ_{i=1}^n |zᵢ|^p)^{1/p}.

We prove that this indeed defines a norm. The first two properties from the definition are clear. It remains to prove the triangle inequality, and for that purpose we use Hölder's inequality:

Hölder's inequality

Lemma. For a fixed real number p > 1 and every pair of n-tuples of non-negative real numbers xᵢ and yᵢ,

  Σ_{i=1}^n xᵢ yᵢ ≤ (Σ_{i=1}^n xᵢ^p)^{1/p} · (Σ_{i=1}^n yᵢ^q)^{1/q},

where 1/q = 1 − 1/p.

Proof. Denote by X and Y the two factors in the product on the right-hand side of the inequality to be proved. If all of the numbers xᵢ, or all of the numbers yᵢ, are zero, then the statement clearly holds. Therefore, we can assume that X ≠ 0 and Y ≠ 0. We need the fact that the exponential function is convex, that is, its graph lies below any of its chords. Hence, for any a and b, with p and q as above,

  e^{(1/p)a + (1/q)b} ≤ (1/p) eᵃ + (1/q) eᵇ

(in fact, this is a Jensen inequality). Define the numbers v_k and w_k so that

  x_k = X e^{v_k/p},  y_k = Y e^{w_k/q}.

Then

  e^{v_k/p + w_k/q} ≤ (1/p) e^{v_k} + (1/q) e^{w_k}.

By substitution, it follows immediately that

  (x_k y_k)/(XY) ≤ (1/p) (x_k/X)^p + (1/q) (y_k/Y)^q.

Summing over k = 1, …, n gives

  (1/(XY)) Σ_k x_k y_k ≤ (1/(p X^p)) Σ_k x_k^p + (1/(q Y^q)) Σ_k y_k^q = 1/p + 1/q = 1.

Multiplying this inequality by XY finishes the proof. □

Now we can prove that ‖·‖_p is indeed a norm:

7.E.6. Let r be a rational number and p a prime number. Then r can be written uniquely in the form r = p^k · u/v, where u ∈ ℤ and v ∈ ℕ are coprime and p divides neither the numerator u nor the denominator v. Consider the map ‖·‖_p : ℚ → ℝ, r ↦ p^{−k} (with ‖0‖_p = 0). Show that it is a norm on ℚ as a vector space over ℚ. It is called the p-adic norm. ○

Solution. It is an exercise in elementary number theory. □

7.E.7. Consider the power set (the set of all subsets) of a given finite set.
Determine whether the functions d₁ and d₂, defined for all subsets X, Y by
(a) d₁(X, Y) := |(X ∪ Y) ∖ (X ∩ Y)|,
(b) d₂(X, Y) := |(X ∪ Y) ∖ (X ∩ Y)| / |X ∪ Y| for X ∪ Y ≠ ∅, and d₂(∅, ∅) := 0,
are metrics. (By |X| is meant the number of elements of the set X; thus the metric d₁ measures the size of the symmetric difference of the sets, while d₂ measures the relative symmetric difference.)

Solution. We omit the verification of the first and second conditions from the definition of a metric in exercises on deciding whether a particular mapping is a metric; we analyze the triangle inequality only.

The case (a). For any sets X, Y, Z,

  (1) (X ∪ Z) ∖ (X ∩ Z) ⊆ ((X ∪ Y) ∖ (X ∩ Y)) ∪ ((Y ∪ Z) ∖ (Y ∩ Z)).

To show this, suppose first that x is an element satisfying x ∈ X and x ∉ Z. Then either x ∈ Y, in which case x ∈ (Y ∪ Z) ∖ (Y ∩ Z), or x ∉ Y, in which case x ∈ (X ∪ Y) ∖ (X ∩ Y). It follows that x belongs to the union on the right-hand side of (1). By symmetry, the same holds if x is an element satisfying x ∉ X and x ∈ Z. Since this accounts for all the possibilities when x belongs to the left-hand side of (1), the inclusion (1) is established. But then,

  d₁(X, Z) = |(X ∪ Z) ∖ (X ∩ Z)|
    ≤ |((X ∪ Y) ∖ (X ∩ Y)) ∪ ((Y ∪ Z) ∖ (Y ∩ Z))|
    ≤ |(X ∪ Y) ∖ (X ∩ Y)| + |(Y ∪ Z) ∖ (Y ∩ Z)|
    = d₁(X, Y) + d₁(Y, Z).

The case (b). Proceed similarly to the case of d₁. Denote by X′ the complement of a set X. The equalities

  (X ∪ Y) ∖ (X ∩ Y) = (X∩Y′∩Z) ∪ (X∩Y′∩Z′) ∪ (X′∩Y∩Z) ∪ (X′∩Y∩Z′),

Minkowski inequality

For every p > 1 and all n-tuples of non-negative real numbers (x₁, …, xₙ) and (y₁, …, yₙ),

  (Σᵢ (xᵢ+yᵢ)^p)^{1/p} ≤ (Σᵢ xᵢ^p)^{1/p} + (Σᵢ yᵢ^p)^{1/p}.

To verify this inequality, we can use the following trick. By Hölder's inequality (recall p > 1),

  Σᵢ xᵢ (xᵢ+yᵢ)^{p−1} ≤ (Σᵢ xᵢ^p)^{1/p} (Σᵢ (xᵢ+yᵢ)^{(p−1)q})^{1/q}

and

  Σᵢ yᵢ (xᵢ+yᵢ)^{p−1} ≤ (Σᵢ yᵢ^p)^{1/p} (Σᵢ (xᵢ+yᵢ)^{(p−1)q})^{1/q}.

Adding the last two inequalities, and taking into account that p + q = pq, so that (p−1)q = pq − q = p, we arrive at
  Σᵢ (xᵢ+yᵢ)^p ≤ ((Σᵢ xᵢ^p)^{1/p} + (Σᵢ yᵢ^p)^{1/p}) · (Σᵢ (xᵢ+yᵢ)^p)^{1/q},

that is,

  (Σᵢ (xᵢ+yᵢ)^p)^{1−1/q} ≤ (Σᵢ xᵢ^p)^{1/p} + (Σᵢ yᵢ^p)^{1/p},

or

  (Σᵢ (xᵢ+yᵢ)^p)^{1/p} ≤ (Σᵢ xᵢ^p)^{1/p} + (Σᵢ yᵢ^p)^{1/p},

since 1 − 1/q = 1/p. This is the Minkowski inequality which we wanted to prove.

Thus we have verified that on every finite-dimensional real or complex vector space there is a class of norms ‖·‖_p for all p > 1. The case p = 1 was considered earlier. We can also consider p = ∞ by setting

  ‖z‖_∞ = max{|zᵢ|, i = 1, …, n}.

This is a norm. We notice that Hölder's inequality can, in the context of these norms, be written for all x = (x₁, …, xₙ) and y = (y₁, …, yₙ) as

  |Σᵢ xᵢ yᵢ| ≤ ‖x‖_p ‖y‖_q

for all p ≥ 1 and q satisfying 1/p + 1/q = 1, where for p = 1 we set q = ∞.

7.3.5. Lp-norms for sequences and functions. Now we can easily define norms on suitable infinite-dimensional vector spaces as well. We begin with sequences. The vector space ℓ_p, p ≥ 1, is the set of all sequences of real or complex numbers x₀, x₁, … such that

  (Y ∪ Z) ∖ (Y ∩ Z) = (X∩Y∩Z′) ∪ (X∩Y′∩Z) ∪ (X′∩Y∩Z′) ∪ (X′∩Y′∩Z),
  ((X ∪ Z) ∖ (X ∩ Z)) ∪ (Y ∖ (X ∪ Z)) = (X∩Y∩Z′) ∪ (X∩Y′∩Z′) ∪ (X′∩Y∩Z) ∪ (X′∩Y′∩Z) ∪ (X′∩Y∩Z′),

which, again, can be proved by listing the several possibilities, imply a stronger form of (1), namely

  ((X ∪ Z) ∖ (X ∩ Z)) ∪ (Y ∖ (X ∪ Z)) ⊆ ((X ∪ Y) ∖ (X ∩ Y)) ∪ ((Y ∪ Z) ∖ (Y ∩ Z)).

Further, we invoke the inequality

  |(X∪Z) ∖ (X∩Z)| / |X ∪ Z| ≤ |((X∪Z) ∖ (X∩Z)) ∪ (Y ∖ (X∪Z))| / |X ∪ Z ∪ (Y ∖ (X∪Z))|.

This is based on calculations with non-negative numbers only, since in general

  x/z ≤ (x + y)/(z + y),  y ≥ 0, z > 0, x ∈ [0, z].

Since X ∪ Z ∪ (Y ∖ (X ∪ Z)) = X ∪ Y ∪ Z, we obtain

  d₂(X, Z) = |(X∪Z) ∖ (X∩Z)| / |X ∪ Z|
    ≤ |((X∪Z) ∖ (X∩Z)) ∪ (Y ∖ (X∪Z))| / |X ∪ Y ∪ Z|
    ≤ (|(X∪Y) ∖ (X∩Y)| + |(Y∪Z) ∖ (Y∩Z)|) / |X ∪ Y ∪ Z|
    ≤ |(X∪Y) ∖ (X∩Y)| / |X ∪ Y| + |(Y∪Z) ∖ (Y∩Z)| / |Y ∪ Z|
    = d₂(X, Y) + d₂(Y, Z),

if X ∪ Z ≠ ∅ and Y ≠ ∅. However, for X = Z = ∅ or Y = ∅, the triangle inequality clearly still holds. Therefore, both mappings are metrics.
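Both triangle inequalities of 7.E.7 can be stress-tested on random subsets of a small universe (Python's frozenset operators ^ and | are exactly the symmetric difference and the union):

```python
import random

def d1(X, Y):
    # Size of the symmetric difference.
    return len(X ^ Y)

def d2(X, Y):
    # Relative symmetric difference (the Jaccard metric).
    return len(X ^ Y) / len(X | Y) if (X | Y) else 0.0

random.seed(0)
sets = [frozenset(k for k in range(8) if random.random() < 0.5) for _ in range(30)]
for X in sets:
    for Y in sets:
        for Z in sets:
            assert d1(X, Z) <= d1(X, Y) + d1(Y, Z)
            assert d2(X, Z) <= d2(X, Y) + d2(Y, Z) + 1e-12
```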
The metric d_1 is quite elementary, but the other metric, d_2, has wider applications. In the literature, it is also known as Jaccard's metric.³ □

7.E.8. Let
d(x,y) := |x − y| / (1 + |x − y|), x, y ∈ ℝ.
Prove that d is a metric on ℝ.

Solution. We prove the triangle inequality only (the rest is clear). Introduce an auxiliary function
(1) f(t) = t / (1 + t), t ≥ 0.

³It is named after the biologist Paul Jaccard, who described the measure of similarity of insect populations using the function 1 − d_2 in 1908.

Σ_{i=0}^∞ |x_i|^p < ∞.
If x = (x_0, x_1, ...) ∈ ℓ_p, p ≥ 1, then the norm is given by
‖x‖_p = (Σ_{i=0}^∞ |x_i|^p)^{1/p}.
That ‖x‖_p is a norm follows immediately from the Minkowski inequality by letting n → ∞. The vector space ℓ_∞ is the set of all bounded sequences of real or complex numbers x_0, x_1, .... If x = (x_0, x_1, ...) ∈ ℓ_∞, then its norm is given by
‖x‖_∞ = sup{|x_i|, i = 0, 1, 2, 3, ...}.
It is easily checked that this is indeed a norm.

Eventually, we return to the space of functions S⁰[a,b] on a finite interval [a,b], or S⁰_c on an unbounded interval. We have already met the L_1-norm ‖ ‖_1. However, for every p ≥ 1 and for all functions in such a space of functions, the Riemann integrals
∫_a^b |f(x)|^p dx
surely exist, so we can define
‖f‖_p = (∫_a^b |f(x)|^p dx)^{1/p}.
The Riemann integral was defined in terms of limits, using the Riemann sums which correspond to partitions Ξ with representatives ξ_i. In our case, those are the finite sums
Σ_i |f(ξ_i)|^p (x_i − x_{i−1}).
Hölder's inequality applied to the Riemann sums of a product of two functions f(x) and g(x) gives
Σ_{i=1}^n |f(ξ_i)||g(ξ_i)|(x_i − x_{i−1}) ≤ (Σ_i |f(ξ_i)|^p (x_i − x_{i−1}))^{1/p} · (Σ_i |g(ξ_i)|^q (x_i − x_{i−1}))^{1/q},
where on the right-hand side there is the product of the Riemann sums for the integrals ‖f‖_p and ‖g‖_q. Moving to limits, we thus verify Hölder's inequality for integrals:
∫_a^b f(x)g(x) dx ≤ (∫_a^b f(x)^p dx)^{1/p} · (∫_a^b g(x)^q dx)^{1/q},
which is valid for all non-negative real-valued functions f and g in our space of piecewise continuous functions with compact support.
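Both inequalities of this section can be checked numerically in their discrete form. A minimal Python sketch (the helper `p_norm` is an ad-hoc name, not a library function):

```python
import random

def p_norm(v, p):
    """(sum |v_i|^p)^(1/p) for a finite tuple v."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

random.seed(1)
for _ in range(1000):
    p = random.uniform(1.01, 5.0)
    q = p / (p - 1.0)            # conjugate exponent, 1/p + 1/q = 1
    x = [random.uniform(0.0, 5.0) for _ in range(6)]
    y = [random.uniform(0.0, 5.0) for _ in range(6)]
    # Hölder: sum x_i y_i <= ||x||_p ||y||_q  (x, y non-negative here)
    assert sum(a * b for a, b in zip(x, y)) <= p_norm(x, p) * p_norm(y, q) + 1e-9
    # Minkowski: ||x + y||_p <= ||x||_p + ||y||_p
    s = [a + b for a, b in zip(x, y)]
    assert p_norm(s, p) <= p_norm(x, p) + p_norm(y, p) + 1e-9
```

The small additive tolerances only guard against floating-point rounding; the inequalities themselves are exact.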
Note that
f(s) − f(r) = s/(1+s) − r/(1+r) = (s − r) / ((1+s)(1+r)) > 0
whenever s > r ≥ 0. It follows that f is increasing, a fact which can also be verified by examining the first derivative. Therefore,
d(x,z) = |x − y + y − z| / (1 + |x − y + y − z|)
≤ (|x − y| + |y − z|) / (1 + |x − y| + |y − z|)
= |x − y| / (1 + |x − y| + |y − z|) + |y − z| / (1 + |x − y| + |y − z|)
≤ |x − y| / (1 + |x − y|) + |y − z| / (1 + |y − z|)
= d(x,y) + d(y,z), x, y, z ∈ ℝ. □

The metrics in the next problems are defined by norms on vector spaces of functions. See the definitions and discussion in 7.3.1.

7.E.9. Determine the distance between the functions
f(x) = x, g(x) = −x/√(1+x²), x ∈ [1,2],
as elements of the normed vector space S⁰[1,2] of (piecewise) continuous functions on the interval [1,2] with norm
(a) ‖f‖_1 = ∫_1^2 |f(x)| dx,
(b) ‖f‖_∞ = max{|f(x)|; x ∈ [1,2]}.

Solution. The case (a). We need only compute the norm of the difference of the functions:
∫_1^2 |f(x) − g(x)| dx = ∫_1^2 (x + x/√(1+x²)) dx = [x²/2 + √(1+x²)]_1^2 = 3/2 + √5 − √2.
The case (b). It is necessary to compute
max_{x∈[1,2]} |f(x) − g(x)| = max_{x∈[1,2]} (x + x/√(1+x²)).
Since
(x + x/√(1+x²))′ = 1 + 1/(1+x²)^{3/2} > 0, x ∈ [1,2],
it follows that f − g is increasing, and so attains its maximum at the right end point of the interval, x = 2. So
max_{x∈[1,2]} (x + x/√(1+x²)) = 2 + 2/√5.

In just the same way as in the previous paragraph, we can derive the integral form of the Minkowski inequality from Hölder's inequality:
‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p
for all p ≥ 1 (we verified this for p = 1 long ago). We use the word "norm" for the entire space S⁰[a,b] of piecewise continuous functions in this context; however, we should bear in mind that we have to identify those functions which differ only by their values at points of discontinuity. Among these norms, the case of p = 2 is special because of the existence of the inner product. In this case, we could have derived the triangle inequality much more easily using the Schwarz inequality.
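The two distances computed in 7.E.9 can be confirmed numerically; below is a quick midpoint-rule check in Python (the step count is chosen ad hoc):

```python
import math

f = lambda x: x
g = lambda x: -x / math.sqrt(1.0 + x * x)

# (a) midpoint-rule approximation of the L1 distance on [1, 2]
n = 100000
h = 1.0 / n
l1 = sum(abs(f(1.0 + (i + 0.5) * h) - g(1.0 + (i + 0.5) * h)) for i in range(n)) * h
exact_l1 = 1.5 + math.sqrt(5.0) - math.sqrt(2.0)   # [x^2/2 + sqrt(1+x^2)] from 1 to 2

# (b) f - g is increasing, so the sup-norm distance is attained at x = 2
sup_dist = f(2.0) - g(2.0)
exact_sup = 2.0 + 2.0 / math.sqrt(5.0)
```

The midpoint rule converges quadratically in the step size, so the agreement with the closed-form values is far below the tolerance used here.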
For the functions from S⁰[a,b], we can define an analogy of the L_∞-norm on n-dimensional vectors. Since our functions are piecewise continuous, they always have suprema of absolute values on a finite closed interval, so we can set
‖f‖_∞ = sup{|f(x)|, x ∈ [a,b]}
for such a function f. If we consider both the one-sided limits (which always exist by our definition) and the value of the function itself as values f(x) at points of discontinuity, we can work with maxima instead of suprema. It is apparent again that this is a norm (except for the problems with values at discontinuity points).

7.3.6. Completion of metric spaces. Both the real numbers ℝ and the complex numbers ℂ are (with the metric given by the absolute value) complete metric spaces. This is contained in the axiom of the existence of suprema. Recall that the real numbers were created as a "completion" of the space of rational numbers, which is not complete. It is evident that the closure of the set ℚ ⊂ ℝ is ℝ.

Dense and nowhere-dense subsets

We say that a subset A ⊂ X in a metric space X is dense if and only if the closure of A is the whole space X. A set A is said to be nowhere dense in X if and only if the complement X \ Ā of the closure Ā of A is dense. Evidently, A is dense in X if and only if every non-empty open set in X has a non-empty intersection with A.

In all cases of norms on functions from the previous paragraph, the metric spaces defined in this way are not complete, since it can happen that the limit of a Cauchy sequence of functions from our vector space S⁰[a,b] is a function which does not belong to this space any more. Consider the interval [0,1] as the domain of the functions f_n which are zero on [0, 1/n) and equal to sin(1/x) on [1/n, 1]. They converge to the function sin(1/x) in all L_p-norms, but this function does not lie in the space.

The L_1 or L_2 distances, discussed in the beginning of this chapter (cf.
7.1.2), reflect the basic intuition about the distance between graphs of the functions. However, in practice we need to understand more subtle concepts of distance. The most obvious way is to include the derivatives in a way similar to the values of the functions.

7.E.10. Consider the space S¹[a,b] of piecewise differentiable (real or complex) functions on the interval [a,b] and show that the formula
‖f‖ = (∫_a^b |f(x)|² + a²|f′(x)|² dx)^{1/2}
with any real a > 0 is a norm on this vector space (up to the identification of functions differing only in the points of discontinuity). Compute the distance between the functions f(x) = sin(x) + 0.1 (sin(6x))² − 0.03 sin(60x) and g(x) = sin(x) on the interval [−π, π] in this norm, and explain its dependence on a.

Solution. The formula
⟨f,g⟩ = ∫_a^b f(x) \overline{g(x)} + a² f′(x) \overline{g′(x)} dx
defines a scalar product on S¹[a,b]. The mapping is linear in the first argument f, provides the complex conjugate value if the arguments are exchanged, and clearly is positive if f = g is non-zero on some interval (ignore the values at the points of discontinuity, cf. the discussion in 7.1.2). Thus the corresponding quadratic form defines a norm on the complex vector space S¹[a,b].

The distance in this norm is easily computed to be √(0.02639 + 11.3097 a²). Its dependence on a can be seen in the illustration — the values of the function f(x) are nearly equal to sin x, but the very wiggly difference is well apparent in the derivatives.

Completion of a metric space

Let X be a metric space with metric d which is not complete. A metric space X̄ with metric d̄ such that X ⊂ X̄, d is the restriction of d̄ to the subset X, and the closure of X in X̄ is the whole space X̄, is called a completion of the metric space X.

The following theorem says that the completion of an arbitrary (incomplete) metric space X can be found in essentially the same way as the real numbers were created from the rationals.

7.3.7. Theorem. Let X be a metric space with metric d which is not complete.
Then there exists a completion X̄ of X.

Proof. The idea of the construction is identical to the one used when building the real numbers. Two Cauchy sequences x_i and y_i of points belonging to X are considered equivalent if and only if d(x_i, y_i) converges to zero for i approaching infinity. This is a convergence of real numbers, thus the definition is correct. From the properties of convergence of real numbers, it is clear that the relation defined above is an equivalence relation. The reader is advised to verify this in detail. For instance, the transitivity follows from the fact that the sum of two sequences converging to zero converges to zero as well.

We define X̄ as the set of the classes of this equivalence of Cauchy sequences. The original points x ∈ X can be identified with the classes of sequences equivalent to the constant sequence x_i = x, i = 0, 1, ....

It is now easy to define the metric d̄. We put
d̄(x̄, ȳ) = lim_{i→∞} d(x_i, y_i)
for sequences x̄ = {x_0, x_1, ...} and ȳ = {y_0, y_1, ...}. First, we have to verify that this limit exists at all and is finite. Using the triangle inequality, and the fact that both the sequences x̄ and ȳ are Cauchy sequences, it follows that the sequence of real numbers d(x_i, y_i) is a Cauchy sequence, so its limit exists. If we select different representatives x̄ = {x′_0, x′_1, ...} and ȳ = {y′_0, y′_1, ...}, then from the triangle inequality for the distance of real numbers (we need to consider the consequences for differences of distances) we see that
|d(x′_i, y′_i) − d(x_i, y_i)| ≤ |d(x′_i, y′_i) − d(x′_i, y_i)| + |d(x′_i, y_i) − d(x_i, y_i)| ≤ d(y_i, y′_i) + d(x_i, x′_i).
Therefore, the definition is indeed independent of the choice of representatives.

We verify that d̄ is a metric on X̄. The first and second properties are clear, so it remains to prove the triangle inequality. For that purpose, choose three Cauchy representatives of

If a = 0, the distance 0.162 is the usual L_2 distance.
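The distance claimed in 7.E.10 can be reproduced numerically. In the sketch below (Python, midpoint rule; all names are ad hoc), h = f − g = 0.1 sin²(6x) − 0.03 sin(60x) and hp is its derivative:

```python
import math

h  = lambda x: 0.1 * math.sin(6.0 * x) ** 2 - 0.03 * math.sin(60.0 * x)
hp = lambda x: 1.2 * math.sin(6.0 * x) * math.cos(6.0 * x) - 1.8 * math.cos(60.0 * x)

def dist(a, n=200000):
    """Midpoint-rule value of (∫_{-π}^{π} |h|^2 + a^2 |h'|^2 dx)^(1/2)."""
    w = 2.0 * math.pi / n
    s = 0.0
    for i in range(n):
        x = -math.pi + (i + 0.5) * w
        s += h(x) ** 2 + (a * hp(x)) ** 2
    return math.sqrt(s * w)
```

Evaluating dist(0) and dist(1) reproduces the values 0.162 and 3.367 quoted in the text, in agreement with √(0.02639 + 11.3097·a²); the closed form follows because all the cross terms of distinct integer frequencies integrate to zero over [−π, π].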
If a = 1, the distance is 3.367.⁴ □

Now we move to more theoretical considerations. Though these exercises may not look particularly practical, they should help in understanding the basic concepts of metric spaces and convergence, as well as their links to the topological concepts.

7.E.11. Show that the definition of a metric as a function d defined on X × X for a non-empty set X and satisfying
(1) d(x,y) = 0 if and only if x = y, x, y ∈ X,
(2) d(x,z) ≤ d(y,x) + d(y,z), x, y, z ∈ X,
is equivalent to the usual definition, which requires
(1) d(x,y) = 0 if and only if x = y, x, y ∈ X,
(2) d(x,z) ≤ d(x,y) + d(y,z), x, y, z ∈ X,
(3) d(x,y) ≥ 0, x, y ∈ X,
(4) d(x,y) = d(y,x), x, y ∈ X.

Solution. Clearly the usual four conditions imply the two conditions above. Conversely, if we set x = z in (2), we get the non-negativity of the metric from (1). Similarly, the choice y = z in (2) together with (1) implies that d(x,y) ≤ d(y,x) for all points x, y ∈ X. Interchanging the variables x and y then gives d(y,x) ≤ d(x,y), i.e. (4). The triangle inequality in its usual form then follows from (2) and (4). Thus, it is proved that the definitions are equivalent. □

F. Convergence

7.F.1. Describe all sequences in a discrete metric space X which are convergent or Cauchy.

Solution. Since the distance between two points x, y in X is either 1 or 0, the sequence x_1, x_2, ... is Cauchy if and only if all x_i are equal, except for a finite number of them. But then, the sequence is convergent. □

This problem shows a behaviour quite different from the convergence of sequences in the metric spaces X = ℝ or X = ℂ. But sequences of integers would behave in a very

⁴Here is an illustration of the very important concept of Sobolev spaces, where any number of derivatives can be involved. Moreover, we can use L_p, p ≥ 1, in the definition of the norm instead of p = 2. There is much literature on this subject.

the elements x̄, ȳ, z̄, and we obtain
d̄(x̄, z̄) = lim_{i→∞} d(x_i, z_i) ≤ lim_{i→∞} d(x_i, y_i) + lim_{i→∞} d(y_i, z_i) = d̄(x̄, ȳ) + d̄(ȳ, z̄).
The restriction of the metric d̄ just defined to the original space X is identical to the original metric, because the original points are represented by constant sequences.

It is required to prove that X is dense in X̄. Let x̄ = {x_i} be a fixed Cauchy sequence, and let ε > 0 be given.
Since the sequence x_i is a Cauchy sequence, all pairs of its terms x_n, x_m for sufficiently large indices m and n are closer to each other than ε. Then the choice y = x_n for one of those indices necessarily implies that the elements y and x_m are closer together than ε, and so d̄(ȳ, x̄) ≤ ε. Hence there is an element y of the original space such that the distance of the constant sequence of y's from the chosen sequence x̄ does not exceed ε. This establishes the denseness of X.

It remains to prove that the constructed metric space is complete. That is, Cauchy sequences of points of the extended space X̄ with respect to the metric d̄ necessarily converge to a point in X̄. This can be done by approximating the points of a Cauchy sequence x̄_k by points y_k from the original space X, so that the resulting sequence ȳ = {y_i} is the limit of the original sequence with respect to the metric d̄.

Since X is a dense subset in X̄, we can choose, for every element x̄_k of our fixed sequence, an element z_k ∈ X so that the constant sequence ẑ_k satisfies d̄(x̄_k, ẑ_k) < 1/k. Now consider the sequence z = {z_0, z_1, ...}. The original sequence x̄_k is Cauchy. So for a fixed real number ε > 0, there is an index n(ε) such that d̄(x̄_n, x̄_m) < ε/2 whenever both m and n are greater than n(ε). Without loss of generality, the index n(ε) is greater than or equal to 4/ε. Now, for m and n greater than n(ε), we get
d(z_m, z_n) = d̄(ẑ_m, ẑ_n) ≤ d̄(ẑ_m, x̄_m) + d̄(x̄_m, x̄_n) + d̄(x̄_n, ẑ_n) < 1/m + ε/2 + 1/n ≤ 2·(ε/4) + ε/2 = ε.
Hence z_i is a Cauchy sequence of elements in X, and so z̄ ∈ X̄. From the triangle inequality,
d̄(z̄, x̄_n) ≤ d̄(z̄, ẑ_n) + d̄(ẑ_n, x̄_n).
From the previous bounds, both terms on the right-hand side converge to zero. Hence the distances d̄(x̄_n, z̄) approach zero, thereby finishing the proof. □

7.3.8. Uniqueness. We consider now the uniqueness of the completion of metric spaces.

similar way.
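In ℝ with the standard metric, by contrast, a sequence can have consecutive differences tending to zero and still fail to be Cauchy; the partial sums of the harmonic series (the subject of the next problem) are the standard example. A small numerical illustration in Python:

```python
def x(n):
    """Partial sum x_n = 1 + 1/2 + ... + 1/n of the harmonic series."""
    return sum(1.0 / k for k in range(1, n + 1))

# Consecutive terms get arbitrarily close ...
assert x(10**5) - x(10**5 - 1) < 1e-4
# ... yet the Cauchy condition fails: x_{2n} - x_n stays above 1/2,
# being a sum of n terms, each at least 1/(2n).
gaps = [x(2 * n) - x(n) for n in (10, 100, 1000)]
assert all(g > 0.5 for g in gaps)
```

In fact the gaps approach ln 2 ≈ 0.693, so no tail of the sequence can be trapped in an interval shorter than that.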
On the other hand, we deal mostly with metrics on spaces of functions, where intuition gained on the real line ℝ may be useful.

7.F.2. Determine whether or not the sequence {x_n}_{n∈ℕ}, where
x_1 = 1, x_n = 1 + 1/2 + ⋯ + 1/n, n ∈ ℕ \ {1},
is a Cauchy sequence in ℝ with the standard metric.

Solution. Recall that the harmonic series diverges, i.e.
(1) Σ_{k=m+1}^∞ 1/k = ∞, m ∈ ℕ.
Therefore,
lim_{n→∞} |x_n − x_m| = Σ_{k=m+1}^∞ 1/k = ∞, m ∈ ℕ.
Hence the sequence {x_n} is not a Cauchy sequence. Alternatively, {x_n} is not a Cauchy sequence, since if it were, it would converge in the complete metric space ℝ, which contradicts the divergence shown in (1). □

7.F.3. Repeat the question from the previous problem with the metric d given by (cf. 7.E.8)
d(x,y) := |x − y| / (1 + |x − y|), x, y ∈ ℝ.

Solution. Instead of repeating the arguments, we point out the difference between the given metric and the standard one. The difference is expressed by the function f introduced in (1) of 7.E.8. This is a continuous function and, moreover, a bijection between the sets [0,∞) and [0,1), having the property that f(0) = 0. Further, the property of a sequence being Cauchy or convergent in a metric space is defined in terms of the real numbers describing the distances between the elements of the sequence. But continuous mappings preserve convergence and the property of being Cauchy, and hence the solution for the new metric is the same as with the standard one. □

7.F.4. Determine whether or not the metric space C[−1,1] of continuous functions on the interval [−1,1] with metric given by the norm
(a) ‖f‖_p = (∫_{−1}^1 |f(x)|^p dx)^{1/p} for p ≥ 1;
(b) ‖f‖_∞ = max{|f(x)|; x ∈ [−1,1]}
is complete.

Solution. The case (a). For every n ∈ ℕ, define a function
f_n(x) = 0, x ∈ [−1,0); f_n(x) = nx, x ∈ [0, 1/n]; f_n(x) = 1, x ∈ (1/n, 1].
A mapping φ : X_1 → X_2 between metric spaces with metrics d_1 and d_2, respectively, is called an isometry if and only if all elements x, y ∈ X_1 satisfy d_2(φ(x), φ(y)) = d_1(x, y).

Consider two inclusions i_1 : X → X̄_1 and i_2 : X → X̄_2 into two completions of the space X, and denote the corresponding metrics by d, d_1, and d_2, respectively. The mapping
φ = i_2 ∘ i_1^{−1} : i_1(X) → X̄_2
is well-defined on the dense subset i_1(X) ⊂ X̄_1. Its image is the dense subset i_2(X) ⊂ X̄_2 and, moreover, this mapping is clearly an isometry. The dual mapping i_1 ∘ i_2^{−1} works in the same way. Every isometric mapping maps, of course, Cauchy sequences to Cauchy sequences. At the same time, such Cauchy sequences converge to the same element in the completion if and only if this holds for their images under the isometry φ. Thus if such a mapping φ is defined on a dense subset X of a metric space X̄_1, then it has a unique extension to the whole of X̄_1 with values lying in the closure of the image φ(X), i.e. in X̄_2. By the previous ideas, there is a unique extension of φ to a mapping φ̄ : X̄_1 → X̄_2 which is both a bijection and an isometry. Thus, the completions X̄_1 and X̄_2 are indeed identical in this sense. Thus it is proved:

Theorem. Let X be a metric space with metric d which is not complete. Then the completion X̄ of X with metric d̄ is unique up to bijective isometries.

In the following three paragraphs, we introduce three theorems about complete metric spaces. They are highly applicable both in mathematical analysis and in verifying convergence of numerical methods.

7.3.9. Banach's contraction principle. A mapping F : X → X on a metric space X with metric d is called a contraction mapping if and only if there is a real constant 0 ≤ C < 1 such that for all elements x, y in X,
d(F(x), F(y)) ≤ C · d(x, y).

Theorem. Every contraction mapping F on a complete metric space X has exactly one fixed point z ∈ X, i.e. F(z) = z.

For m > n, m, n ∈ ℕ, we compute the inequality
‖f_m − f_n‖_p = (∫_{−1}^1 |f_m(x) − f_n(x)|^p dx)^{1/p} ≤ (∫_0^{1/n} 1 dx)^{1/p} = (1/n)^{1/p}.
It follows that the sequence {f_n}_{n∈ℕ} ⊂ C[−1,1] is a Cauchy sequence of functions. Suppose the sequence {f_n} has a ‖·‖_p limit f in C[−1,1].
We show that this limit cannot be continuous at x = 0. For every ε ∈ (0,1), there exists an n(ε) ∈ ℕ such that
f_n(x) = 0, x ∈ [−1,0], f_n(x) = 1, x ∈ [ε, 1],
for all n > n(ε). Suppose f(y) ≠ 1 at some y ∈ [ε, 1]. Then, since f is continuous, f ≠ 1 on some interval containing y, and so ‖f − f_n‖_p ≥ δ > 0 for all n > n(ε) and some δ > 0, which contradicts f being the limit. Therefore f must satisfy
f(x) = 0, x ∈ [−1,0], f(x) = 1, x ∈ [ε, 1],
for arbitrarily small ε > 0. Thus, necessarily,
f(x) = 0, x ∈ [−1,0], f(x) = 1, x ∈ (0,1].
But this function is not continuous on [−1,1], so it does not belong to the considered metric space. Therefore, the sequence {f_n} does not have a limit in C[−1,1], so this space is not complete.

The case (b). Let an arbitrary Cauchy sequence {f_n}_{n∈ℕ} ⊂ C[−1,1] be given. The terms of this sequence are continuous functions f_n on [−1,1] having the property that for every ε > 0 there is an n(ε) ∈ ℕ such that
(1) max_{x∈[−1,1]} |f_m(x) − f_n(x)| < ε/2, m, n > n(ε).
In particular, for every x ∈ [−1,1], we get a Cauchy sequence {f_n(x)}_{n∈ℕ} ⊂ ℝ of numbers. Since the metric space ℝ with the usual metric is complete, every such sequence {f_n(x)} (for x ∈ [−1,1]) is convergent. Set
f(x) := lim_{n→∞} f_n(x), x ∈ [−1,1].
Letting m → ∞ in (1), we obtain
max_{x∈[−1,1]} |f(x) − f_n(x)| ≤ ε/2 < ε, n > n(ε).
It follows that the sequence {f_n}_{n∈ℕ} converges uniformly (that is, with respect to the given norm) to the function f on [−1,1]. Since the uniform limit of continuous functions is continuous, so is f; hence f ∈ C[−1,1], see 6.3.4. Therefore, the metric space is complete. □

The same reasoning as above, and hence the same results, apply to the more general metric space C[a,b] of continuous

The metric space X, of course, needs to be complete; otherwise it could happen that the limit point does not exist in it. Choose an arbitrary z_0 ∈ X and consider the sequence z_i, i = 0, 1, ...:
z_1 = F(z_0), z_2 = F(z_1), ..., z_{i+1} = F(z_i), ....
From the assumptions, we have
d(z_{i+1}, z_i) = d(F(z_i), F(z_{i−1})) ≤ C d(z_i, z_{i−1}) ≤ ⋯ ≤ C^i d(z_1, z_0).
The triangle inequality then implies that for all natural numbers j,
d(z_{i+j}, z_i) ≤ Σ_{k=1}^j d(z_{i+k}, z_{i+k−1}) ≤ Σ_{k=1}^j C^{i+k−1} d(z_1, z_0) = C^i d(z_1, z_0) Σ_{k=1}^j C^{k−1} < C^i d(z_1, z_0) Σ_{k=1}^∞ C^{k−1} = (C^i / (1 − C)) d(z_1, z_0).
Now, since 0 ≤ C < 1, lim_{i→∞} C^i = 0, so for every positive (no matter how small) ε, the right-hand expression is surely less than ε for sufficiently large indices i, that is,
(C^i / (1 − C)) d(z_1, z_0) < ε.
However, this ensures that the sequence z_i is a Cauchy sequence. Since X is complete, the sequence has a limit z, and all that remains to be proved is F(z) = z. Every contraction mapping is continuous. Therefore,
F(z) = F(lim_{n→∞} z_n) = lim_{n→∞} F(z_n) = z.
This finishes the proof. □

The next two theorems extend the intuitive understanding of the "density" of closed intervals [a,b] ⊂ ℝ, not allowing for any "holes" there. They are essential for the understanding of compactness of metric spaces. In fact, they are both special cases of more general theorems on topological spaces.

7.3.10. Cantor intersection theorem. For any set A in a metric space X with metric d, the real number
diam A = sup_{x,y∈A} d(x,y)
is called the diameter of the set A. The set A is said to be bounded if and only if diam A < ∞.

Theorem. If A_1 ⊇ A_2 ⊇ ⋯ ⊇ A_i ⊇ ⋯ is a non-increasing sequence of non-empty closed subsets in a complete metric space X and if diam A_i → 0, then there is exactly one point x ∈ X belonging to the intersection of all the sets A_i.²

²Georg Cantor is considered the founder of set theory, which he introduced and developed in the last quarter of the 19th century. At this time,

functions on any closed bounded interval [a,b], or to the space C_c of continuous functions with compact support.

7.F.5. Prove that the metric space ℓ_2 is complete.

Solution. Recall that ℓ_2 is the space of sequences of real numbers with the L_2-norm, see 7.3.5.
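As an aside, the proof of 7.3.9 is constructive: iterating a contraction from any starting point converges to the fixed point, which makes Banach's principle a practical numerical method. A minimal Python sketch (the function name is ad hoc); the map F(x) = (x + 2/x)/2 is a contraction on [1,2] with C = 1/2, and its unique fixed point there is √2:

```python
import math

def fixed_point(F, z0, tol=1e-13, max_steps=100):
    """Iterate z_{i+1} = F(z_i) until successive terms differ by less than tol."""
    z = z0
    for _ in range(max_steps):
        z_next = F(z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    raise RuntimeError("no convergence within max_steps")

# |F'(x)| = |1/2 - 1/x^2| <= 1/2 on [1, 2], and F maps [1, 2] into itself
root = fixed_point(lambda x: (x + 2.0 / x) / 2.0, 1.0)
```

Here the convergence is much faster than the geometric rate C^i guaranteed by the proof, since this particular map happens to be Newton's iteration for x² = 2.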
Consider an arbitrary Cauchy sequence {x_n}_{n∈ℕ} in the space ℓ_2. Every term of this sequence is again a sequence, i.e., x_n = {x_k^n}_{k∈ℕ}, n ∈ ℕ. Of course, the range of indices does not matter — there is no difference whether n, k ∈ ℕ or n, k ∈ ℕ ∪ {0}. Introduce auxiliary sequences y_k for k ∈ ℕ so that y_k = {y_k^n}_{n∈ℕ} = {x_k^n}_{n∈ℕ}. If {x_n} is a Cauchy sequence in ℓ_2, then each of the sequences y_k is a Cauchy sequence in ℝ (the sequences y_k are sequences of real numbers). It follows from the completeness of ℝ (with respect to the usual metric) that all of the sequences y_k are convergent. Denote their limits by z_k, k ∈ ℕ. It suffices to prove that z = {z_k}_{k∈ℕ} ∈ ℓ_2 and that the sequence {x_n} converges for n → ∞ in ℓ_2 to the sequence z.

The sequence {x_n}_{n∈ℕ} ⊂ ℓ_2 is a Cauchy sequence; therefore, for every ε > 0, there is an n(ε) ∈ ℕ with the property that
Σ_{k=1}^∞ (x_k^m − x_k^n)² < ε, m, n > n(ε), m, n ∈ ℕ.
In particular,
Σ_{k=1}^l (x_k^m − x_k^n)² < ε, m, n > n(ε), l ∈ ℕ,
whence, letting m → ∞,
Σ_{k=1}^l (z_k − x_k^n)² ≤ ε, n > n(ε), l ∈ ℕ,
i.e. (this time l → ∞)
(1) Σ_{k=1}^∞ (z_k − x_k^n)² ≤ ε, n > n(ε), n ∈ ℕ.
Especially,
Σ_{k=1}^∞ (z_k − x_k^n)² < ∞, n > n(ε),
and, at the same time,
Σ_{k=1}^∞ (x_k^n)² < ∞, n ∈ ℕ,
which follows directly from {x_n}_{n∈ℕ} ⊂ ℓ_2. Since (cf. the special case of Hölder's inequality for p = 2 in 7.3.4)

Proof. Choose a point x_i in each of the sets A_i. Since diam A_i → 0, for every positive real number ε, we can find an index n(ε) such that for all A_i with indices i > n(ε), the diameters are less than ε. For sufficiently large indices i, j, d(x_i, x_j) < ε, and thus our sequence is a Cauchy sequence. Therefore, it has a limit point x ∈ X. x must be a limit point of all the sets A_i, thus it belongs to all of them (since they are all closed). So x belongs to their intersection. This proves the existence of x.

Assume there are two points x and y, both belonging to the intersection of all the sets A_i. Then d(x,y) must be less than the diameter of each of the sets A_i.
But diam A_i → 0, so d(x,y) = 0, hence x = y. This proves the uniqueness of x. □

7.3.11. Theorem (Baire theorem). If X is a complete metric space, then the intersection of every countable system of open dense sets A_i is a dense set in the metric space X.³

Proof. Suppose X contains a system of dense open sets A_i, i = 1, 2, .... It is required to show that the set A = ∩_{i=1}^∞ A_i has a non-empty intersection with every open set U ⊂ X. Proceed inductively, invoking the previous theorem.

Since A_1 is dense in X, there is a point z_1 ∈ A_1 ∩ U; since the set A_1 ∩ U is open, the closure B_1 of an ε_1-neighbourhood U_1 of z_1 (for sufficiently small ε_1 > 0) is contained in A_1 ∩ U. Further, suppose the points z_i and their open ε_i-neighbourhoods U_i are already chosen for i = 1, ..., n. Since the set A_{n+1} is open and dense in X, there is a point z_{n+1} ∈ A_{n+1} ∩ U_n; since A_{n+1} ∩ U_n is open, the point z_{n+1} belongs to it together with a sufficiently small ε_{n+1}-neighbourhood U_{n+1}. Then the closures surely satisfy B_{n+1} = Ū_{n+1} ⊆ U_n, and so the closed set B_{n+1} is contained in A_{n+1} ∩ U_n. Moreover, we can assume that ε_n < 1/n.

If we proceed in this inductive way from the original point z_1 and the set B_1, we obtain a non-increasing sequence of non-empty closed sets B_n whose diameter approaches

the new abstract approach to fundamentals of Mathematics caused fierce objections.
It also lead to the severe internal crises of Mathematics in the beginning of the 20th century. This part of the history of Mathematics is fascinating. 3This theorem is a part of considerations by Rene-Louis Baire in his 1899 doctoral thesis. More generally, a topological space satisfying the property as in the theorem is called a Baire space and the theorem simply says that every complete metric space is a Baire space. 494 CHAPTER 7. CONTINUOUS TOOLS FOR MODELLING E (**-4) = E K -+ K) ), "€N. fc=l k=l Hence < oo. fn(x) = E4 k=l It is proved that z e £2. The fact that {xn} converges for n —> oo to z in £2 follows from (1). □ The next problem addresses the question of the power of different metrics on the same space of functions in terms of convergence. We deal with the space Sc of piecewise continuous functions with compact support, equipped with the Lp metrics. We write briefly Cp for these metric spaces. In particular, we show that convergence in Cp for some positive p does not always imply convergence in Cq for another positive q^p. 7.F.6. Let 0 < p < oo. For each positive integer n, define the sequence of functions iLlp -\/n oo, then \fn(x)\"dx^0. J —oo So || fn \\q —> 0, and the sequence converges to the zero function. Similarly if 0 < p < q and if n —> oo, then \fn(x)\qdx -> OO. So || /„ || diverges, and in particular, /„ cannot converge to any limit. Finally, for q = p we have J \jn(x)\pdx = 2 for all positive integers n, and so as n —> oo we get || /„ || —> 2xlp. At the same time, for any g g Sc, if g(x) ^ 0 at some x ^ 0 where g is continuous, its distance from /„ cannot converge to zero. It follows that /„ converges to 0 in Cq, 0 < q < p, but it does not converge in Cq with q > p. □ The next problem deals with the extremely useful Banach fixed point theorem, showing the necessity of all the requirements in Theorem 7.3.9. zero. Therefore, there is a point z common to all of these sets. 
That is, z g n°ZiUt = n°ZiBt c fti4 n u, which is the statement to be proved. □ 7.3.12. Bounded and compact sets. The following concepts facilitated our discussions when dealing with the real and complex numbers. They can (Rl-Ofe be reformulated for general metric spaces with almost no change: An interior point of a subset A in a metric space is such an element of A which belongs to it together with some of its e-neighbourhoods. A boundary point of a set A is an element x g X such that each its neighbourhood has a non-empty intersection with both A and the complement X \ A. A boundary point may or may not belong to the set A itself. A limit point of a set A is an element x equal to the limit of a sequence Xi g A, such that Xi =^ x for all i. Clearly a limit point may or may not belong to the set A. An isolated point of a set A is an element a g A such that one of its e-neighbourhoods in X has the singleton intersection {a} with A. An open cover of a set A is a system of open sets Ui C X, i e I, such that their union contains A. Compact sets A metric space X is called compact if every sequence of points Xi g X has a subsequence converging to some point xex. Any subset iclina metric space is called compact if it is compact as the metric space with the restricted metric. Clearly, the compact subsets in discrete metric spaces X are exactly the finite subsets of X. In the case of the real numbers R, our definition reveals the compact subsets discussed there and we would also like to come to useful properties as we did for real numbers in the paragraphs 5.2.7-5.2.8. It is suprisingly easy to see, that the continuous functions behave similarly on compact sets in general: Theorem. Let f : X —> Y be a continuous mapping between metric spaces. Then the images of compact sets are compact. Proof. Recall that any convergent sequence of points Xi —> x in X is mapped onto the convergent sequence j(xi) —> j(x) in Y. 
Thus, the statement follows immediately from our definition of the compactness via convergent subsequences. □ In particular we obtain the most useful consequence on the minima and maxima of continuous functions on compact subsets: Corollary. Let f : X —> R be a real function defined on a compact metric space. Then there are the points x0 and yo in 495 CHAPTER 7. CONTINUOUS TOOLS FOR MODELLING 7.F.7. Show that the mapping / : (0,00} —> (0, 00} given by f(x)=x + e-x satisfies, for all x ^ y, the condition l/fa:) - f(y)\ <\x- y\, but it does not have any fixed point, i.e. f(x) =^ x for any x. (Thus the condition \f(x) — f(y) R denned by the formula d{x,y) := 1, x ^ y, d(x, y) := 0, x = y . (a) Decide whether (X, d) is complete. (b) Describe all open, closed, and bounded sets in (X, d). (c) Describe the interior, boundary, limit, and isolated points of an arbitrary set in (X, d). (d) Describe all compact sets in (X, d). Solution. The case (a) was essentially dealt with in 7.F.I. For an arbitrary sequence {sn}n6N to be a Cauchy sequence, it is necessary in this space that there is an index n e N such that xn = xn+m for all m e N. Any sequence with this property then necessarily converges to the common value xn = xn+i = ■ ■ ■ (we talk about almost stationary sequences). So the metric space (X, d) is complete. The case (b). The open 1-neighbourhood of any element contains this element only. Therefore, every singleton set is open. Since the union of any number of open sets is an open set, every set is open in (X, d). By complements, X such that f(x0) = maxxeX{f(x)}, f(y0) = minxeX{f(x)}. Proof. The image f(X) must be a compact subset in R, thus it must achieve both maximum and minimum (which are the supremum and the infimum of the bounded and closed image). □ The concept of boundedness is a little more complicated in the case of general metric spaces. For any point x and subset B c X in a metric space X with metric 0!, we define their distance4 dist(a;, B) = inf {d(x, y)}. 
We say that a metric space X is totally bounded if, for every positive real number ε, there is a finite set A such that dist(x, A) < ε for all points x ∈ X. We call such an A an ε-net of X. Recall that a metric space X is bounded if it has a finite diameter.

We can immediately see that a totally bounded space is always bounded. Indeed, the diameter of a finite set is finite, and if A is the set corresponding to ε from the definition of total boundedness, then the distance d(x,y) of two points can always be bounded by the sum of dist(x,A), dist(y,A), and diam A, which is a finite number.

In the case of a metric on a subset of a finite-dimensional Euclidean space, these concepts coincide, since the boundedness of a set guarantees the boundedness of all coordinates in a fixed orthonormal basis, and this implies total boundedness. (Verify this in detail by yourselves!)

The next theorem provides the promised very useful alternative characterisations of compactness:

7.3.13. Theorem. The following statements about a metric space X are equivalent:
(1) X is compact;
(2) every open covering of X, X = ∪_{i∈I} U_i, contains a finite covering X = ∪_{k=1}^n U_{j_k}, where all j_k ∈ I;
(3) X is complete and totally bounded.

Proof. We show consecutively the implications (1) ⇒ (3) ⇒ (2) ⇒ (1).

(1) ⇒ (3). Assume X is compact. Then for each Cauchy sequence of points x_i, there is a subsequence x_{i_n} converging to a point x ∈ X. We just have to verify that the initial sequence also converges to the same limit, x_i → x. This is easy, and we leave it to the reader. So X is complete. Suppose X is not totally bounded. Then there is ε > 0 such that no finite ε-net exists in X. Then there is a sequence of points x_i such that d(x_i, x_j) ≥ ε for all i ≠ j. (Verify this in detail!)

⁴Notice that the distance between two subsets A, B ⊆ X should express how "different" they are. Thus we define the (Hausdorff) distance as follows:
dist(A,B) = max{sup{dist(x,B), x ∈ A}, sup{dist(y,A), y ∈ B}}.
This distance is finite for bounded sets, and it is easy to see that it vanishes if and only if the closures of A and B coincide.
This distance is finite for bounded sets, and it is easy to see that it vanishes if and only if the closures of A and B coincide.

this also means that every set is closed. The fact that the 2-neighbourhood of any element coincides with the whole space implies that every set is bounded in (X, d).

The case (c). Once again, we use the fact that the open 1-neighbourhood of any element contains this element only. It follows that every point of any set is both an interior and an isolated point of the set, and that the sets have neither boundary nor limit points.

The case (d). Every finite set in an arbitrary metric space is compact (it defines a compact metric space by restricting the domain of d). It follows from the classification of convergent sequences (see (a)) that no infinite set can be compact in (X, d). □

7.G.2. Show that a metric space (X, d) is complete if and only if every nested sequence A₁ ⊇ A₂ ⊇ ⋯ of non-empty closed subsets of X satisfying (1) lim_{n→∞} diam A_n = 0 has non-empty intersection. Show by an example that the requirement (1) cannot be omitted.

(3) ⇒ (2) is more demanding. So assume X is complete and totally bounded, but X does not satisfy (2). Then there is an open covering U_α, α ∈ I, of X which does not contain any finite subcovering. Choose a sequence of positive real numbers ε_k → 0 and consider the finite ε_k-nets from the definition of total boundedness. Further, for each k, consider the system A_k of closed balls with centres in the points of the ε_k-net and diameters 2ε_k. Clearly each such system A_k covers the entire space X.

Altogether, there must be at least one closed ball C in the system A₁ which is not covered by a finite number of the sets U_α. Call it C₁ and notice that diam C₁ ≤ 2ε₁. Next, consider the sets C₁ ∩ C with balls C ∈ A₂, which cover the entire set C₁. Again, at least one of them cannot be covered by a finite number of the U_α; we call it C₂. This way, we inductively construct a sequence of sets C_k satisfying C_{k+1} ⊆ C_k, diam C_k ≤ 2ε_k, ε_k → 0, and none of them can be covered by a finite number of the open sets U_α. Finally, we choose one point x_k ∈ C_k in each of these sets. By construction, this must be a Cauchy sequence.
Consequently, this sequence of points has a limit x, since X is complete. Thus there is some U_{α₀} containing x, and it contains also some δ-neighbourhood B_δ(x). But for k large enough, diam C_k ≤ 2ε_k < δ, and then C_k ⊆ B_δ(x) ⊆ U_{α₀}, which is a contradiction.

The remaining step is to show the implication (2) ⇒ (1). Assume (2) and, considering any sequence of points x_i ∈ X, set C_n equal to the closure of {x_k; k ≥ n}. The intersection of these sets must be non-empty by the following general lemma:

Lemma. Let X be a metric space such that property (2) in the theorem holds. Consider a system of closed sets D_α, α ∈ I, such that each of its finite subsystems D_{α₁}, …, D_{α_k} has non-empty intersection. Then also ∩_{α∈I} D_α ≠ ∅.

This simple lemma is proved by contradiction, again. If the latter intersection is empty, then X = X ∖ (∩_{α∈I} D_α) = ∪_{α∈I} (X ∖ D_α) = ∪_{α∈I} V_α, where the V_α = X ∖ D_α are open sets. Thus, there must be a finite number of them, V_{α₁}, …, V_{α_n}, covering X too. Thus, we obtain

X = ∪_{i=1}^n V_{α_i} = ∪_{i=1}^n (X ∖ D_{α_i}) = X ∖ (∩_{i=1}^n D_{α_i}).

This is a contradiction with our assumptions on the D_α, and the lemma is proved.

Now, let x ∈ ∩_{n=1}^∞ C_n. By construction, there is a subsequence x_{n_k} of our sequence of points x_n ∈ X such that d(x_{n_k}, x) < 1/k. This is a converging subsequence, and so the proof is complete. □

Solution. That the requirement (1) cannot be omitted is probably contrary to many readers' expectations. For a counterexample, consider the set X = ℕ with the metric d(m, n) = 1 + 1/(m + n) for m ≠ n, d(m, n) = 0 for m = n. It is indeed a metric: the first and second properties are clearly satisfied, and to prove the triangle inequality, it suffices to observe that d(m, n) ∈ (1, 4/3] whenever m ≠ n. Hence the only Cauchy sequences are those which are constant from some index on. These sequences are constant except for finitely many terms, sometimes called almost stationary sequences. Thus, every Cauchy sequence is convergent, so the metric space is complete. Define

A_n := {m ∈ ℕ; d(m, n) ≤ 1 + 1/(2n)}, n ∈ ℕ.
As the inequality in their definition is not strict, the sets A_n are guaranteed to be closed. Since A_n = {n, n + 1, …}, the sets A_n are nested, yet they have empty intersection. This does not contradict the completeness of the space, since the condition (1) is not met here:

lim_{n→∞} diam A_n = lim_{n→∞} sup{d(x, y); x, y ∈ A_n} = lim_{n→∞} (1 + 1/(2n + 1)) = 1 ≠ 0. □

7.G.4. Determine whether the set (known as the Hilbert cube) A = {{x_n}, n ∈ ℕ, in ℓ₂; |x_n| ≤ 1/n for all n ∈ ℕ} is compact in ℓ₂. Then determine the compactness of the set B = {{x_n}, n ∈ ℕ, in ℓ_∞; |x_n| < 1/n for all n ∈ ℕ} in the space ℓ_∞.

Solution. The space ℓ₂ is complete (see 7.F.5). Every closed subset of a complete metric space defines a complete metric space. The set A is evidently closed in ℓ₂, so it suffices to show that it is totally bounded, and then by theorem 7.3.13(3) it is compact. To do that, we construct an ε-net of A for any given ε > 0. Begin with the well-known series Σ_{k=1}^∞ 1/k² = π²/6 (see (1)). For every ε > 0, there is an n(ε) ∈ ℕ satisfying Σ_{k=n(ε)+1}^∞ 1/k² < ε²/5.

As an immediate corollary of the latter theorem, each closed subset of a compact metric space is again compact. For subsets of a totally bounded set are totally bounded, and closed subsets of a complete metric space are also complete. Another consequence is an alternative proof that a subset K ⊆ ℝⁿ is compact if and only if it is closed and bounded. Notice also that while the conditions (1) and (3) are given in terms of the metric, the equivalent condition (2) is purely topological.

7.3.14. Continuous functions. We revisit the questions related to continuity of mappings between metric spaces. In fact, many ideas understood for functions of one real variable generalize naturally. In particular, every continuous function f : X → ℝ on a compact set X is bounded and achieves its maximum and minimum. Indeed, consider the open intervals U_n = (n − 1, n + 1) ⊆ ℝ, n ∈ ℤ, covering ℝ.
Then their preimages f⁻¹(U_n) cover X, so that there is a finite number of them covering X as well. Thus f is bounded, and the supremum and infimum of its values exist. Consider sequences f(x_n) and f(y_n) converging to the supremum and the infimum, respectively. Then there must be convergent subsequences of the points x_n and y_n in X, and their limits x and y are in X too. But then f(x) and f(y) are the supremum and infimum of the values of f (and hence its maximum and minimum), since f is continuous and thus respects convergence.

We should also see the differences between the "purely topological" concepts, such as continuity (possibly defined merely by means of open sets), and the following stronger concepts, which are "metric" properties.

Uniformly continuous mappings

A mapping f : X → Y between metric spaces is called uniformly continuous if for each ε > 0 there is a δ > 0 such that d_Y(f(x), f(y)) < ε for all x, y ∈ X with d_X(x, y) < δ.

Notice that this requirement on the uniform continuity of f is equivalent to the condition that for each pair of sequences x_k and y_k in X, d_X(x_k, y_k) → 0 implies d_Y(f(x_k), f(y_k)) → 0.

This observation leads to the following generalization of the behaviour of real functions:

Lemma. Each continuous mapping f : X → Y on a compact metric space X is uniformly continuous.

Proof. Assume f is a continuous mapping. Consider any two sequences x_k and y_k with d_X(x_k, y_k) → 0. Since X is compact, there is a subsequence of x_k converging to some point x ∈ X, and so we may assume x_k → x without loss of generality. Now, d_X(x, y_k) ≤ d_X(x, x_k) + d_X(x_k, y_k) → 0, and so lim_{k→∞} y_k = x, too.

From each of the intervals [−1/n, 1/n] for n ∈ {1, …, n(ε)}, choose finitely many points x₁ⁿ, …, x_{m(n)}ⁿ so that for every x ∈ [−1/n, 1/n],
min_{j∈{1,…,m(n)}} |x − x_jⁿ| < ε/√(5·n(ε)).

Now consider those sequences {y_n}, n ∈ ℕ, from ℓ₂ whose terms with indices n > n(ε) are zero and which at the same time satisfy

y₁ ∈ {x₁¹, …, x_{m(1)}¹}, …, y_{n(ε)} ∈ {x₁^{n(ε)}, …, x_{m(n(ε))}^{n(ε)}}.

There are only finitely many such sequences, and they create the desired ε-net for A: let x = {x_n} ∈ A be arbitrary. According to our choice of the sequences, there is a y = {y_n} among them such that

d(x, y) = √(Σ_{k=1}^∞ (x_k − y_k)²) ≤ √( Σ_{k=1}^{n(ε)} (x_k − y_k)² + Σ_{k=n(ε)+1}^∞ (2/k)² ) < √( n(ε) · ε²/(5·n(ε)) + 4 · ε²/5 ) = ε.

Since ε > 0 is arbitrary, the set A is totally bounded, which implies compactness.

The closure of the set B is B̄ = {{x_n} ∈ ℓ_∞; |x_n| ≤ 1/n for all n ∈ ℕ}. Hence B is not closed, and so it is not compact. The closure B̄, on the other hand, is compact. The proof of this fact is much simpler than for the set A, so we leave it as an exercise for the reader. □

7.G.5. Prove that on each metric space X, the given metric d is a continuous function X × X → ℝ. ○

7.G.6. Show that if F is a continuous mapping on a compact metric space X satisfying the inequality d(F(x), F(y)) < d(x, y) for all x ≠ y, then F has a fixed point. Solution. The continuous function x ↦ d(x, F(x)) attains its minimum a ≥ 0 at some point x₀ of the compact space X. If a ≠ 0, then d(F(x₀), F(F(x₀))) < d(x₀, F(x₀)) = a, which is a contradiction. □

Next, notice that the metric d_Y : Y × Y → ℝ is always a continuous function (cf. the problem 7.E.1 in the other column). But then the continuity of f ensures lim_{k→∞} d_Y(f(x_k), f(y_k)) = d_Y(f(x), f(x)) = 0. By the earlier observation, this is equivalent to the uniform continuity of f. □

A very useful variation on the theme of continuity is the following definition.

Lipschitz continuity

A mapping f : X → Y between metric spaces is called Lipschitz continuous if there is a constant C > 0 such that for all points x, y ∈ X, d_Y(f(x), f(y)) ≤ C · d_X(x, y).

Clearly, every Lipschitz continuous mapping is uniformly continuous.

Consider now a set M of mappings f : X → Y between metric spaces. We say that the functions in M are equicontinuous if for each ε > 0 there is a δ > 0 such that d_Y(f(x), f(y)) < ε for all x, y ∈ X with d_X(x, y) < δ, and for all functions f ∈ M.

Consider the metric space C(X) of all continuous (real or complex) functions on a compact metric space X, with its ‖·‖_∞ norm.
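Exercise 7.G.6 above can be watched in action numerically. The following Python sketch (an illustration added here, not from the book) uses F(x) = cos x on the compact interval [0, 1]: by the mean value theorem, |F(x) − F(y)| ≤ sin(1)·|x − y| < |x − y| for x ≠ y, and iteration converges to the unique fixed point.

```python
import math

# F(x) = cos x maps [0, 1] into [cos 1, 1], a subset of [0, 1], and
# strictly shrinks distances there, so 7.G.6 guarantees a fixed point.
x = 0.0
for _ in range(200):
    x = math.cos(x)

# x is now (numerically) the fixed point of F
assert abs(math.cos(x) - x) < 1e-12
```

The limit is the so-called Dottie number, approximately 0.7390851332. Note that compactness matters: exercise 7.F.7 showed that on the non-compact space (0, ∞), the same strict inequality does not force a fixed point.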
This means that the distance between two functions f, g is the maximum of the distances between their values f(x) and g(x) over x ∈ X. We say that a set M ⊆ C(X) of functions is uniformly bounded if there is a constant K ∈ ℝ such that |f(x)| ≤ K for all functions f ∈ M and points x ∈ X. Of course, bounded sets M of functions in C(X) are always uniformly bounded, by the definition of the norm.

Theorem. Consider a compact metric space X. A set M ⊆ C(X) in the space of continuous functions with the supremum norm ‖·‖_∞ is compact if and only if it is bounded, closed, and equicontinuous.⁵

Proof. Suppose M is compact. Then M is totally bounded (and thus also uniformly bounded, as noticed above). Since every compact subset is closed, it remains to verify the equicontinuity.

⁵ A weaker version providing a sufficient condition was first published by Ascoli in 1883; the complete exposition was given by Arzelà in 1895. Again, there are much more general versions of this theorem in the realm of topological spaces.

Given ε > 0, consider the corresponding ε-net (f₁, f₂, …, f_k) ⊆ M from the definition of the total boundedness of M. Recall that all the functions f_i are uniformly continuous (as continuous functions on a compact set). Thus, for each f_i there is a δ_i such that d_X(x, y) < δ_i implies |f_i(x) − f_i(y)| < ε. Of course, we take δ to be the minimum of the finitely many δ_i, i = 1, …, k. Then the same inequality holds for all the f_i in the ε-net. But now, considering an arbitrary function f ∈ M, there is a function f_j in our ε-net with ‖f − f_j‖_∞ < ε, and so d_X(x, y) < δ implies that |f(x) − f(y)| is at most

|f(x) − f_j(x)| + |f_j(x) − f_j(y)| + |f_j(y) − f(y)| < 3ε,

and the equicontinuity has been proved.

Conversely, suppose that M is a bounded, closed, and equicontinuous subset of C(X), with X a compact metric space. First we show that M is complete.
This was shown, in the case when X is a closed bounded interval of reals, in the problem 7.F.4: exploit the equicontinuity to see that the limit function f is again continuous. The same argument works in general. Thus, we need to find a Cauchy (sub)sequence within any sequence of functions f_n ∈ M.

The compact space X itself is totally bounded, and therefore it contains a countable dense subset A ⊆ X (we may take the points of all 1/k-nets for k ∈ ℕ). Write A = {a₁, a₂, …} as a sequence. Choose a subsequence of functions f_{1j}, j = 1, 2, …, within the functions f_n, so that the sequence of values f_{1j}(a₁) converges. (This is possible, since the set M is bounded in the ‖·‖_∞ norm.) Similarly, the subsequence f_{2j} can be chosen from the f_{1k}, so that f_{2j}(a₂) converges. In general, the m-th subsequence f_{mj} is chosen from the f_{(m−1)k} so that the values f_{mj}(a_m) converge (and, by our construction, it converges at all a_i, i ≤ m). Consider the diagonal sequence g_k = f_{kk}; it converges at every point of A.

Now, fix ε > 0 and find δ_ε > 0 such that |f(x) − f(y)| < ε for all f ∈ M whenever the arguments x and y are closer than δ_ε. Let A_ε ⊆ A be a finite subset forming a δ_ε-net of X. For all sufficiently large indices i and j and all a ∈ A_ε, we know |g_i(a) − g_j(a)| < ε. But then, for every x ∈ X, there is some a ∈ A_ε with d_X(x, a) < δ_ε, and so |g_i(x) − g_j(x)| can be at most

|g_i(x) − g_i(a)| + |g_i(a) − g_j(a)| + |g_j(a) − g_j(x)| < 3ε.

Thus, the sequence g_k is a Cauchy sequence in C(X); since M is complete, it converges to a function in M, and so M is compact. □

H. Additional exercises to the whole chapter

7.H.1. Expand the function sin²(x) on the interval [−π, π] into a Fourier series. ○

7.H.2. Expand the function cos²(x) on the interval [−π, π] into a Fourier series. ○

7.H.3. Sum the two series Σ_{n=1}^∞ 1/n⁴ and Σ_{n=1}^∞ (−1)^{n+1}/n⁴.

Solution. We hint at the procedure by which the series Σ_{n=1}^∞ 1/n^{2k} and Σ_{n=1}^∞ (−1)^{n+1}/n^{2k} for general k ∈ ℕ can be calculated.
Use the identities

(1) Σ_{n=1}^∞ sin(nx)/n = (π − x)/2, x ∈ (0, 2π),

(2) x² = 4π²/3 + 4 Σ_{n=1}^∞ cos(nx)/n² − 4π Σ_{n=1}^∞ sin(nx)/n, x ∈ [0, 2π),

which follow from the constructions of the Fourier series for the functions g(x) = x and g(x) = x², respectively, on the interval [0, 2π). Substituting (1) into (2) gives

Σ_{n=1}^∞ cos(nx)/n² = (3x² − 6πx + 2π²)/12, x ∈ (0, 2π).

Since the values of the series Σ_{n=1}^∞ 1/n² = π²/6 and Σ_{n=1}^∞ (−1)^{n+1}/n² = π²/12 have already been determined, substitution then proves the validity of this last equation at the marginal points x = 0, x = 2π as well. The left-hand series is evidently bounded from above by Σ_{n=1}^∞ 1/n², thus it converges absolutely and uniformly on [0, 2π]. Therefore, it can be integrated term by term:

Σ_{n=1}^∞ sin(nx)/n³ = ∫₀ˣ (3y² − 6πy + 2π²)/12 dy = (x³ − 3πx² + 2π²x)/12, x ∈ [0, 2π].

In fact, every Fourier series may be integrated term by term. Further integration gives

Σ_{n=1}^∞ (1 − cos(nx))/n⁴ = ∫₀ˣ (y³ − 3πy² + 2π²y)/12 dy = (x⁴ − 4πx³ + 4π²x²)/48 = x²(x − 2π)²/48, x ∈ [0, 2π].

Substituting x = π leads to

Σ_{n=1}^∞ (1 + (−1)^{n+1})/n⁴ = Σ_{n=1}^∞ (1 − cos(nπ))/n⁴ = π⁴/48.

Since the numerator on the left-hand side is zero for even numbers n and is 2 for odd numbers n, the series can be written as

(3) Σ_{n=1}^∞ 1/(2n − 1)⁴ = π⁴/96.

From the expression

Σ_{n=1}^∞ 1/n⁴ = Σ_{n=1}^∞ 1/(2n)⁴ + Σ_{n=1}^∞ 1/(2n − 1)⁴ = (1/16) Σ_{n=1}^∞ 1/n⁴ + Σ_{n=1}^∞ 1/(2n − 1)⁴,

it follows that

Σ_{n=1}^∞ 1/n⁴ = (16/15) Σ_{n=1}^∞ 1/(2n − 1)⁴ = (16/15) · π⁴/96 = π⁴/90,

thereby having summed up the first series. As for the second one,

Σ_{n=1}^∞ (−1)^{n+1}/n⁴ = Σ_{n=1}^∞ 1/(2n − 1)⁴ − Σ_{n=1}^∞ 1/(2n)⁴ = π⁴/96 − (1/16) · π⁴/90 = 7π⁴/720.

One can proceed similarly to sum the series Σ_{n=1}^∞ 1/n^{2k} and Σ_{n=1}^∞ (−1)^{n+1}/n^{2k} for other k ∈ ℕ. It is natural to ask for the value of the series Σ_{n=1}^∞ 1/n³. This problem has been tackled by mathematicians for centuries without success.
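The three closed forms just derived can be confirmed by partial sums; a quick Python check (added for illustration, not part of the book):

```python
import math

N = 100_000
zeta4 = sum(1.0 / n**4 for n in range(1, N + 1))                # sum 1/n^4
eta4  = sum((-1)**(n + 1) / n**4 for n in range(1, N + 1))      # alternating
odd4  = sum(1.0 / (2 * n - 1)**4 for n in range(1, N + 1))      # odd terms, (3)

assert abs(zeta4 - math.pi**4 / 90)      < 1e-9
assert abs(eta4  - 7 * math.pi**4 / 720) < 1e-9
assert abs(odd4  - math.pi**4 / 96)      < 1e-9
```

The tails of these series beyond N = 100000 are of the order 1/(3N³), far below the tolerance, so the partial sums already match the exact values to about 12 decimal places.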
The reader may justifiably be surprised by this, since the procedure above seems applicable to all the odd powers as well. For instance, one can start with the identity

Σ_{n=1}^∞ cos(nx)/n = −ln(2 sin(x/2)), x ∈ (0, 2π),

which, by the way, can be proved by expanding the right-hand function into a Fourier series. If, similarly to the above, we integrate the left-hand series term by term twice and substitute x → 0+ in the limit, we get the series Σ_{n=1}^∞ 1/n³. Thus, it should suffice to integrate the right-hand function twice and calculate one limit. However, the integration of the right-hand side leads to a non-elementary integral; that is, the antiderivative cannot be expressed in terms of the elementary functions.⁵ □

7.H.4. In problem 7.45, there occurs an integral over (−∞, ∞) which was evaluated by converting the complex exponential to real trigonometric functions. Evaluate it by converting the real function sin(ωt) to its complex exponential form instead. ○

7.H.5. Determine the convolution f₁ ∗ f₂ of the functions f₁ and f₂, where f₁(x) = 1 for x ∈ [−1, 0] and f₁(x) = 0 otherwise, while f₂(x) = x for x ∈ [0, 1] and f₂(x) = 0 otherwise. ○

7.H.6. Determine the function f whose Fourier transform is the function f̃(ω) = (1/√(2π)) · sin(ω)/ω.

Solution. We might notice that the sinc function appeared as the image of the characteristic function h_Ω of the interval (−Ω, Ω) in one of the previous problems:

h̃_Ω(ω) = (2Ω/√(2π)) sinc(ωΩ).

⁵ The function ζ(p) = Σ_{n=1}^∞ n^{−p} is called the Riemann zeta function.

In this case, Ω = 1 and the function f is one half of h₁. The result can also be computed directly. The inverse Fourier transform is

f(t) = (1/2π) ∫_{−∞}^∞ (sin ω/ω) e^{iωt} dω = (1/2π) ( ∫_{−∞}^0 + ∫_0^∞ ) (sin ω/ω) e^{iωt} dω.

Substitute −ω for ω in the first integral to obtain

f(t) = (1/2π) ∫_0^∞ (sin ω/ω) (e^{iωt} + e^{−iωt}) dω = (1/π) ∫_0^∞ (sin ω/ω) cos(ωt) dω.
Continue via trigonometric identities to obtain

f(t) = (1/2π) ( ∫_0^∞ sin(ω(1 + t))/ω dω + ∫_0^∞ sin(ω(1 − t))/ω dω ).

The substitutions u = ω(1 + t), v = ω(1 − t) then give

f(t) = (1/2π) ( ∫_0^∞ (sin u/u) du − ∫_0^∞ (sin v/v) dv ) = 0, t > 1;
f(t) = (1/2π) ( ∫_0^∞ (sin u/u) du + ∫_0^∞ (sin v/v) dv ), |t| < 1;
f(t) = (1/2π) ( −∫_0^∞ (sin u/u) du + ∫_0^∞ (sin v/v) dv ) = 0, t < −1.

Thus the function f is zero for |t| > 1 and constant (necessarily non-zero) for |t| < 1. (Throughout, we assume that the inverse Fourier transform exists.) The constant is f(t) = 1/2 for |t| < 1, from the standard result

∫_0^∞ (sin u/u) du = π/2.

Alternatively, we can 'guess' that the constant is one half, i.e. that f = g/2 with g(t) = 1 for |t| < 1 and g(t) = 0 for |t| > 1, and compute

g̃(ω) = (1/√(2π)) ∫_{−1}^1 e^{−iωt} dt = (2/√(2π)) · (sin ω)/ω.

So f(0) = g(0)/2 = 1/2, which conversely also establishes ∫_0^∞ (sin u/u) du = π/2. □

7.H.7. Using the relation

(1) ℒ(f′)(s) = s ℒ(f)(s) − lim_{t→0+} f(t),

derive the Laplace transforms of both the functions y = cos t and y = sin t.

Solution. Notice first that from (1) it follows that

ℒ(f″)(s) = s ℒ(f′)(s) − lim_{t→0+} f′(t) = s (s ℒ(f)(s) − lim_{t→0+} f(t)) − lim_{t→0+} f′(t) = s² ℒ(f)(s) − s lim_{t→0+} f(t) − lim_{t→0+} f′(t).

Therefore,

−ℒ(sin t)(s) = ℒ(−sin t)(s) = ℒ((sin t)″)(s) = s² ℒ(sin t)(s) − s lim_{t→0+} sin t − lim_{t→0+} cos t = s² ℒ(sin t)(s) − 1,

whence we get
Show that for real a, (1) £(f(t) ■ Ha(ť))(s) = e-M + a))(s) Solution. /•OO fOO £(f(t)Ha(t))(s)= / /(í)JÍ(í-a)e-sťdí= / /(í)e"sťdí JO J a /*oo /*oo = / /(í + a) e-s(ť+a) dí = e_as / /(f + a) e"3* df io jo = e-M £(f(t + a))(s). □ 504 CHAPTER 7. CONTINUOUS TOOLS FOR MODELLING More details ? 7.H.11. Find a function y{i) satisfying the differential equation y"{t) + Ay{t) = sin2/j and the initial conditions y(0) = 0 and y'(0) = 0. Solution. From the example 7.H.8: s2£(y)(S)+A£(y)(s) = £(sin2i)(s) Now, by 7.D. 1(d) £(sin2f)(s) = It follows that The inverse transform then gives y(t) = ±sin2/j - ±fcos2/j. 7.H.12. Find a function y(t) satisfying the differential equation and the initial conditions: y"(t)+4y(t) = f(t), 2/(0) = 0, y'(0) = -1, where j(t) is the piecewise continuous function s2 +4' 2 □ /(*) cos(2i) for 0 < t < TT, 0 for t > jr. check the reference Solution. This problem is a model for the undamped oscillations of a spring (excluding friction and other phenomena like non-linearities in the toughness of the spring and so on). It is initiated by an exterior force during the initial period only and then ceases. The function j(t) can be written as a linear combination of Heaviside's function H(t) and its shift. That is, f(t) = cas(2t)(H(t)-H„(t)) up to the values t = 0, t = it, ... Since W = s2C{y) - sy(0) - y'(0) = s2C{y) + 1, we get, making use of the previous example 7.H.10 , the right-hand sides to the calculation of the Laplace transform s2C(y) + 1 + 4C(y) = £(cos(2i)(ff(i) - H„(t))) = £(cos(2/j) ■ H(t)) - £(cos(2/j) ■ H^(t)) = £(cos(2f)) -e_,rsi:(cos(2(/j + 7r)) s Hence, m = - = a-' s2 + 4 s2 +4 (s2 + 4)2 The inverse transform then yields the solution in the form s y(t) = -I sin(2í) + \t sin(2i) + C'1 e (s2+4)s 505 CHAPTER 7. CONTINUOUS TOOLS FOR MODELLING According to (1), ^(e^ Tyrh)*) = K-He-™£(*sin(2i))) = (i - tt) sin(2(i - tt)) ■ ^(i). 
Since the Heaviside function Hw(t) is zero for t < tt and equals 1 for t > tt, we get the solution in the form t=Z sin(2i) for 0 < t < tt (^=2 - tt) sin(2i) fori>7r. 506 CHAPTER 7. CONTINUOUS TOOLS FOR MODELLING Key to the exercises 7A.4. We have just to check the orthogonality of the couples Lm(x), Ln(x) with respect to the inner product {f,g)u = Jo°° /Wff W e~* d t. This can be done by integration by parts. 7A.5. The claim follows from the fact that the powers xk appear in the polynomials Tk or Lk the first time. Thus the linear hulls of the first k functions always coincide. 7A.6. x, —-pjx + sin(x); the projection does not change the function \ sin(x) since it already lies in the space. 7A.7. cos(x), ^ cos(x) + x. The projection is 37r/(7r4 — 24)(4cos(x) +itx). Notice that this is a very bad approximation. 7.B.I. We have already checked the orthogonality of the cosine terms in the solution to the example 7.A.3. The sine terms are obtained the same way since they are just shifts of cosines by it/2 in the argument. The mixed couples provide an odd function to be integrated on a symmetric interval around the origin and so the integral also vanishes. 7.B.2. Look at the previous example and use the substitution y = ujx. 7.B.7. Compute exactly as in the exercise 7.B.6 and check the result against the real versions of the same Fourier series, as computed before. The complex coefficients for the real functions are always related by the complex conjugation. That is, c-n = c^, see 7.1.7. OThB^n^checid6 7.B.8. This is again a straightforward computation. 7.C.4. t-f-+A forte [-2,-1] forte [-1,1] ij-2t + 2 forte [1,2] 0 otherwise. 7.C.14. It is a good exercise on derivation and integration by parts. We may differentiate with respect to t inside the integral and f(t—x) can be interpreted as — ^ f(t — x). 7.C.16. Another good exercise on derivation and integration by parts. 
We may differentiate twice with respect to t inside the integral and /'(t — x) can be interpreted either as a derivative with respect to t or x. 7.D.2. The definition of r(t) reveals C(ta)= r e-st ta At =e-xxaAx= r(Q + 1). V ^ Jo Jo 7.D.3. Integrate by parts to obtain C (g) (S) = Jte- e- dt = Jte^ dt = Inn (*^) - 0 - dt = - (inn ^ - ^) (S + l)2 0 0 V / q Differentiating the Laplace transform of a general function —/ (i. e., an improper integral) with respect to the parameter s gives / co \ ' co co /-/(t)e--dt =J-f(t) (e-*)'d* =/*/(*) e-rtdt. \o Jo 0 This means that the derivative of the Laplace transform C(—f) (s) is the Laplace transform of the function tf(t). The Laplace transform of the function y = sinht has already been determined as the function y = s21_1- Therefore, C (h) (s) We could also have determined C (g) (s) this way. 7.D.4. C(coscjt)(s) + i£(smcjt)(s) = C(e"^t)(s) s — IUJ o :Lllm ~^7t ) s — iw t^oo est s — iw (s — iw)(s + iui) S Id --h 1- S2 + Id2 S2 + Id2 507 CHAPTER 7. CONTINUOUS TOOLS FOR MODELLING 7.G.5. Recall the definition of the product metric of the Cartesian product X x X where the distance is given as the maximum of the distance of the components. The claim follows directly from the triangle inequality and the topological definition of the continuity. 7.H.I. \ - ±cos(2x). 7.H.2. \ + f cos(2x). 7.H.5. Determine the convolution of the functions fi and j'2, where fori 6 [-1,0] for t e [0,1] otherwise. 508 CHAPTER 8 Calculus with more variables one variable is not enough? - never mind, just recall vectors! A. Multivariate functions We start this chapter with a couple of easy examples to "grasp" a little multivariate functions. 8.A.I. Solve the system of inequalities. Mark the resulting area in the plane. x2 + y2 < 4 a) b) c) V > y < arctana; y < 1 ,2 + (y-l)2 >4 y + x2 - 2x > 0 y>0 At the beginning of our journey through the mathematical landscape, we saw that vectors can be manipulated nearly as easily as scalars. 
Now, we return to situations where the relations between objects are expressed with the help of more (yet still finitely many) parameters. This is really necessary when modelling processes or objects in practice, where functions ℝ → ℝ of one variable are seldom adequate. At the least, functions dependent on finitely many parameters are necessary, and the dependence of the change of the results on the parameters is often more important than the result itself. There is little need for brand new ideas: many problems we encounter can be reduced to ones we can solve already. We return to the discussion of situations when the values of functions are described in terms of instantaneous changes, that is, we consider ordinary differential equations. In the next chapter, we consider partial differential equations and provide a gentle introduction to variational problems.

1. Functions and mappings on ℝⁿ

Solution. Whenever you have to solve an inequality of the form f(x, y) > 0 (f : ℝ² → ℝ is a function of two variables,

8.1.1. The world of functions. In the sequel, the main objects are mappings between Euclidean spaces, f : ℝⁿ → ℝᵐ. We have seen many such examples already. The complex valued real functions correspond to n = 1, m = 2, while the power series converge inside of a circle in the complex plane, providing examples of f : ℝ² → ℝ². We have also dealt with vector valued real functions, representing parametrized curves c : ℝ → ℝⁿ (see e.g. the paragraphs on curvatures and Frenet frames in 6.1.15 on page 387). In linear algebra and geometry, we saw the linear and affine maps f : ℝⁿ → ℝᵐ defined with the help of matrices A ∈ Mat_{m,n}(ℝ) and constant vectors y ∈ ℝᵐ:

ℝⁿ ∋ x ↦ y + Ax ∈ ℝᵐ.

In coordinates, the value is given by the expression Σ_j a_{ij} x_j + y_i, where A = (a_{ij}) and y = (y_i). Finally, the quadratic forms were mappings ℝⁿ → ℝ given by symmetric matrices S = (s_{ij}) and the formula

ℝⁿ ∋ x ↦ xᵀ S x ∈ ℝ.
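Both coordinate formulas above are elementary to implement. The following Python sketch (an addition for illustration) evaluates an affine map and a quadratic form exactly as written:

```python
def affine(A, y, x):
    """Affine map R^n -> R^m: x |-> y + A x, computed in coordinates
    as sum_j a_ij * x_j + y_i."""
    return [sum(a * xj for a, xj in zip(row, x)) + yi
            for row, yi in zip(A, y)]

def quadratic_form(S, x):
    """Quadratic form x |-> x^T S x for a symmetric matrix S."""
    n = len(x)
    return sum(S[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

A = [[1, 2], [0, 1], [3, 0]]      # A in Mat_{3,2}(R)
y = [1, 0, -1]
assert affine(A, y, [1, 1]) == [4, 1, 2]

S = [[2, 1], [1, 3]]
assert quadratic_form(S, [1, -1]) == 3   # 2 - 1 - 1 + 3 = 3
```

Plain nested lists stand in for matrices here; in practice one would of course reach for a linear algebra library, but the point is that the coordinate expressions translate one-to-one into code.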
but the same method is valid for inequalities with more variables): you just consider the border curve f(x, y) = 0. This curve divides the plane into several areas. Then either all points of any one of these areas satisfy the inequality, or none of them does. If we have a system of inequalities, we solve each inequality separately and then intersect the results. In our cases, we get the areas shown in the pictures.

8.A.2. Determine the domains of the following functions (of two variables, and of three variables in the last case):

a) f(x, y) = xy / ( y (x³ + x² + x + 1) ),
b) f(x, y) = ln(x² − y²),
c) f(x, y) = ln(−x² − y²),
d) f(x, y) = arcsin(2 χ_ℚ(x)), where χ_ℚ denotes the indicator function of the rational numbers,
e) f(x, y, z) = √( ln x · arcsin(y² z) ).

Solution. a) The formula correctly expresses a value iff the denominator of the fraction is non-zero. Therefore, the formula defines a function on the set ℝ² ∖ {[x, 0], [−1, y]; x, y ∈ ℝ}.

b) The formula is correct iff the argument of the logarithm is positive, i.e. |x| > |y|. Therefore, the domain of this function is {(x, y) ∈ ℝ²; |x| > |y|}. You can see the graph of this function in the picture.

In general, all such mappings f : ℝⁿ → ℝᵐ are composed of m components of functions f_i : ℝⁿ → ℝ. So we start with this case.

8.1.2. Multivariate functions. We can stress the dependence on the variables x₁, …, x_n by writing the functions as

f(x₁, x₂, …, x_n) : ℝⁿ → ℝ.

The goal is to extend the methods for monitoring the values of functions and their changes to this situation. We speak about functions of more variables or, more compactly, multivariate functions. We often work with the cases n = 2 or n = 3, so that the concepts being introduced are easier to understand. In these cases, letters like x, y, z are used instead of numbered variables. This means that a function f defined in the "plane" ℝ² is denoted by

ℝ² ∋ (x, y) ↦ f(x, y) ∈ ℝ,

and, similarly, in the "space" ℝ³,

ℝ³ ∋ (x, y, z) ↦ f(x, y, z) ∈ ℝ.
When examining a function given by a concrete formula, the first task is often to find the largest domain on which the formula makes sense. It is also useful to consider the graph of a multivariate function, i. e., the subset G/ cl"xl = Rn+1, defined by Gf = {(xii ■ ■ -,xn,f(xi,. ■ .,xn)); (x1,... ,xn) G A}, where A is the domain of /. For instance, the graph of the function defined in the plane by the formula ti \ x+v f{x,y) = —.—ä xz + yz is quite a smooth surface, see the illustration below. The maximal domain of this function consists of all the points of the plane except for the origin (0, 0). When defining the function, and especially when drawing its graph, fixed coordinates are used in the plane. Fixing the value of either of the coordinates, implies only one variable remains. Fixing the value of x, for example, gives the mapping R^R3, y^(x,y,f(x,y)), i.e., a curve in the space R3. Curves are vector functions of one variable, already worked with in chapter six (see 6.1.13). The images of the curves for some fixed values of the coordi nates x and y are depicted by lines in the illustration. add the "coordinate lines" in the picture 510 CHAPTER 8. CALCULUS WITH MORE VARIABLES c) This formula is again a composition of a logarithm and a polynomial of two variables. However, the polynomial — x2 — y2 takes on only non-positive real values, where the logarithm is undefined (as a function R —> R). d) This formula correctly defines a value iff the argument of the arc sine lies in the interval [—1,1], which is broken by exactly those pairs (= [x, y] e R2 whose first component is rational. The formula thus defines a function on the set {[x,y],xeR\Q}. e) The argument of the square root must be non-negative, that is either the image of the logarithm is positive and the image of arcsine as well, or both images are negative. Thus we get that the domain is the set {[x,y,z] eB3;(i>lAi/^0AO<4) yZ W(x e (0, l)Ay ^ OA-^r < z < 0)V(x > OAy = 0)}. 
yZ □ In the following examples k([x,y];r) means a circle with the center [x, y] and the radius r. 8.A.3. Determine the domain of the function / and mark the resulting area in the plane: i) f(x,y) = ^(x2+y2-l)(4- y Ü) f(x,y) = Vl-x2 + y/1 - y2, iü) f(x,v) = y/g±äEfr, iv) f(x,y) = arcsin f - , , + , V v) f(x,y) = V7!3" vi) f(x,y,z) = y/l- Solution, a) It has to hold (x2 + y2 -1 > 0,4 - x2 - y2 > 0) or (x2 + y2 - 1 < 0,4 - x2 - y2 < 0), that is (x2 + y2 > 8.1.3. Euclidean spaces. In the case of functions of one vari-j§t * ab^e> me entire differential and integral calcu-lus is based on the concepts of convergence, sC2g3~g open neighbourhoods, continuity, and so on. In the last part of chapter seven, these concepts were generalized for the metric spaces, rather than only for the Euclidean spaces R™. Before proceeding it is appropriate to revise these ideas, and do further reading if necessary. We present a brief summary: The Euclidean space En is perceived as a set of points in R™ without any choice of coordinates, and its modelling vector space R™ is considered to be the vector space of all increments that can be added to the points of the space En (the modelling vector space). Moreover, the standard scalar product n u ■ v = y^^Xjyj, i=l is selected on R™, where u = (x±,..., xn) and v = (y1,..., yn) are arbitrary vectors. This gives a metric on En, i.e. a function describing the distance Q\\ between pairs of points P, Q by the formula \p-QW' where u is the vector which yields the point P when added to the point Q. In the plane E2, for instance, the distance between the points Pi = (x1,y1) and P2 = (x2,y2) is given by ll^i-i^ II2 = (xi-x2)2 + (yi-yz)2. The metric denned in this manner satisfies the triangle inequality for every triple of points P,Q, R: ||P-fill = \\(P-Q) + (Q-R)\\ < ||P-Q|| + ||Q-P||. 511 CHAPTER 8. CALCULUS WITH MORE VARIABLES 1, x2 + y2 < 4) or (x2 + y2 < 1, x2 + y2 > 4), which is an annulus between the circles k([0,0]; 1) and k([0,0]; 2). 
b) It is the square with the center $[0,0]$ and vertices $[\pm1,\pm1]$.
c) The area between the circles $k([\tfrac12,0];\tfrac12)$ and $k([1,0];1)$; the smaller circle belongs to the area, the bigger one does not.
d) The area between the lines $y=x$ and $y=-x$ (without these lines).
e) The ellipse (together with its interior) with the center $[0,0]$, with the major axis lying on the $x$-axis with the major radius $a=1$, and the minor axis on the $y$-axis with the minor radius $b=\tfrac12$.
f) The ellipsoid (with its interior) with the center $[0,0,0]$ and semiaxes lying on the $x$, $y$, $z$ axes, respectively, with radii $a$, $b$, and $c$. □

B. The topology of $E_n$

In the previous chapter, we defined general metric spaces, and we studied especially metric spaces consisting of sets of functions. As we have already seen there, many metrics can be defined on the space $\mathbb{R}^n$ (or on its subsets). For instance, considering a map of a state as a subset of $\mathbb{R}^2$, the distance of two points may be defined as the time necessary to get from one of the points to the other by public transport or on foot. In France, for example, the shortest paths between most pairs of points in this metric are far from line segments. In this chapter, we focus on the space $E_n$, that is, $\mathbb{R}^n$ with the usual metric (distance) known to mankind for a long time. The property that the shortest path between any two points of this space is the line segment connecting them could be seen as its defining property (the example above does not satisfy it). Let us examine the space $E_n$ in more detail.

8.B.1. Show that every non-empty proper subset of $E_n$ has a boundary point (which need not lie in it).

Solution. Let $U\subset E_n$ be a non-empty proper subset with no boundary point. Consider a point $X\in U$, a point $Y\in U':=E_n\setminus U$, and the line segment $XY\subset E_n$.
Intuitively, going from $X$ to $Y$ along this segment, we must at some point pass from $U$ to $U'$, and this can happen only at a boundary point (everyone who has ever been to a foreign country is surely well acquainted with this fact). Formally, let $A$ be the point of $XY$ for which $|XA|=\sup\{|XZ|;\ XZ\subset U\}$ (apparently, there is exactly one such point on the segment $XY$). This point

See 3.4.3(1) in geometry, or the axioms of metrics in 7.3.1, or the same inequality 5.2.2(2) for complex scalars.

The concepts defined for real and complex scalars and discussed for metric spaces in detail can be carried over (extended) with no problem to the points $P_i$ of any Euclidean space:

Topology and metric in Euclidean spaces

(1) a Cauchy sequence: a sequence of points $P_i$ such that for every fixed $\varepsilon>0$, $\|P_i-P_j\|<\varepsilon$ holds for all indices but finitely many exceptional ones;
(2) a convergent sequence: a sequence of points $P_i$ converges to a point $P$ if and only if for every fixed $\varepsilon>0$, $\|P_i-P\|<\varepsilon$ holds for all but finitely many indices $i$; the point $P$ is then called the limit of the sequence $P_i$;
(3) a limit point $P$ of a set $A\subset E_n$: there exists a sequence of points in $A$ converging to $P$ and different from $P$;
(4) a closed set: contains all of its limit points;
(5) an open set: its complement is closed;
(6) an open $\delta$-neighbourhood of a point $P$: the set $O_\delta(P)=\{Q\in E_n;\ \|P-Q\|<\delta\}$, $\delta\in\mathbb{R}$, $\delta>0$;
(7) a boundary point $P$ of a set $A$: every $\delta$-neighbourhood of $P$ has non-empty intersection with both $A$ and the complement $E_n\setminus A$;
(8) an interior point $P$ of a set $A$: there exists a $\delta$-neighbourhood of $P$ which lies inside $A$;
(9) a bounded set: lies inside some $\delta$-neighbourhood of one of its points (for a sufficiently large $\delta$);
(10) a compact set: both closed and bounded.
(11) the limit of a mapping: $a\in\mathbb{R}^m$ is the limit of a mapping $f:\mathbb{R}^n\to\mathbb{R}^m$ at a limit point $x_0$ of its domain $A$ if for each $\varepsilon>0$, there is a $\delta$-neighbourhood $U$ of $x_0$ such that $\|f(x)-a\|<\varepsilon$ for all $x\in U$; this happens if and only if for each sequence $x_n\in A$ converging to $x_0$, the values $f(x_n)$ converge to $a$;
(12) continuity: a mapping $f:A\subset\mathbb{R}^n\to\mathbb{R}^m$ is continuous at $x_0\in A$ if the limit $\lim_{x\to x_0}f(x)$ exists and equals $f(x_0)$; the mapping $f$ is continuous on $A$ if it is continuous at all points of $A$.

Both the first and the second items deal with norms of differences of points approaching zero. Since the square of the norm is the sum of squares of the individual components, it is clear that this happens if and only if the individual components approach zero. In particular, a sequence of points $P_i$ is Cauchy or convergent if and only if these properties are possessed by the real sequences obtained from the particular coordinates of the points $P_i$ in every Cartesian coordinate system. Therefore, it also follows from Lemma 5.2.3 that every Cauchy sequence of points in $E_n$ is convergent. Especially, $E_n$ is a complete metric space. Similarly, the mappings from item (11) are $m$-tuples of the component functions, and the limits are given as the $m$-tuples of limits of these components.

Recall some further results already discussed at the more general level of metric spaces in chapter seven:

is a boundary point of $U$: it follows from the definition of $A$ that any line segment $XB$ (with $B\in XA$) is contained in $U$; in particular, $B\in U$. However, if there were a neighbourhood of $A$ contained in $U$, then there would exist a part of the line segment $XY$ longer than $XA$ which would be contained in $U$, which contradicts the definition of the point $A$. Therefore, any neighbourhood of the point $A$ contains a point from $U$ as well as a point from $E_n\setminus U$. □

8.B.2.
Prove that the only non-empty clopen (both closed and open) subset of $E_n$ is $E_n$ itself.

Solution. It follows from the above exercise 8.B.1 that every non-empty proper subset $U$ of $E_n$ has a boundary point. If $U$ is closed, then it is equal to its closure; therefore, it contains all of its boundary points. However, an open set (by definition) cannot contain a boundary point. □

8.B.3. Show that the space $E_n$ cannot be written as the union of (at least two) disjoint non-empty open sets.

Solution. Suppose that $E_n$ can be expressed thus, i.e., $E_n=\bigcup_{i\in I}U_i$, where $I$ is an index set. Let us fix a set $U$ from this union. Then we can write $E_n=U\cup V$, where $V$ is the union of the remaining sets; both $U$ and $V$ (being unions of open sets) are open. However, they are also complements of open sets; therefore, they are closed as well. This contradicts the result of the previous exercise 8.B.2. □

8.B.4. Prove or disprove: a union of (even infinitely many) closed subsets of $E_n$ is a closed subset of $E_n$.

Solution. The proposition does not hold. As a counterexample, consider the union $$\bigcup_{j=3}^{\infty}\left[\tfrac1j,\,1-\tfrac1j\right]$$ of closed subsets of $\mathbb{R}$, which is equal to the open interval $(0,1)$. □

8.B.5. Prove or disprove: an intersection of (even infinitely many) open subsets of $E_n$ is an open subset of $E_n$.

Solution. The proposition does not hold in general. As a counterexample, consider the intersection $$\bigcap_{j=1}^{\infty}\left(1-\tfrac1j,\,1+\tfrac1j\right)$$ of open subsets of $\mathbb{R}$, which is equal to the closed singleton $\{1\}$. □

A mapping is continuous if and only if the preimages of open sets are open (check this carefully!). Further, each continuous function on a compact set $A$ is uniformly continuous, bounded, and attains its maximum and minimum, cf. paragraph 7.3.14 on page 498. The reader should make an appropriate effort to read paragraphs 3.4.3, 5.2.5–5.2.8, 7.3.3–7.3.5, and 7.3.12, as well as try to think out/recall the definitions and connections of all these concepts.

8.1.4. Compact sets.
Working with general open, closed, or compact sets could seem useless in the case of the real line $E_1$, since intervals are almost always used there. In the case of metric spaces in the last part of chapter seven, the ideas are complicated at first sight. However, the same approach is easy in the case of the Euclidean spaces $\mathbb{R}^n$. It is also very useful and important (and it is, of course, a special case of general metric spaces). Just as in the case of $E_1$ or $E_2$, we deal with open covers of sets (i.e., systems of open sets containing the given sets), and Theorem 5.2.8 is again true (with mere reformulations):

Theorem. Subsets $A\subset E_n$ of Euclidean spaces satisfy:
(1) $A$ is open if and only if it is a union of a countable (or finite) system of $\delta$-neighbourhoods,
(2) every point $a\in A$ is either interior or boundary,
(3) every boundary point of $A$ is either an isolated or a limit point of $A$,
(4) $A$ is compact if and only if every infinite sequence contained in it has a subsequence converging to a point in $A$,
(5) $A$ is compact if and only if each of its open covers contains a finite subcover.

Proof. The proof from 5.2.8 can be reused without changes in the case of claims (1)–(3); yet now the concepts have to be perceived in a different way, and the "open intervals" are substituted with multidimensional $\delta$-neighbourhoods of appropriate points. However, the proof of the fourth and fifth claims has to be adjusted properly. Actually, we proved the claims there for $\mathbb{R}$ and $\mathbb{C}$, thus in dimensions one and two. Thus the reader may either extend the two-dimensional reasoning, or rewrite the proof of the corresponding propositions for general metric spaces in 7.3.12, while noticing the parts which can be simplified for Euclidean spaces. □

8.1.5. Curves in $E_n$.
Almost all the discussion about limits, derivatives, and integrals of functions in chapters 5 and 6 concerned functions of a real variable with real or complex values, since only the triangle inequality valid for the magnitudes of the real and complex numbers was used. This argument can be carried over to any function of a real variable with values in a Euclidean space $\mathbb{R}^n$.

8.B.6. Consider the graph of a continuous function $f:\mathbb{R}^2\to\mathbb{R}$ as a subset of $E_3$. Determine whether this subset is open, closed, and compact, respectively.

Solution. The subset is not open, since any neighbourhood of a point $[x_0,y_0,f(x_0,y_0)]$ contains a segment of the line $x=x_0$, $y=y_0$. However, there is a unique point of the graph of the function on this segment, namely the point $[x_0,y_0,f(x_0,y_0)]$.

The continuity of $f$ implies that the subset is closed; we show that every convergent sequence of points of the graph of $f$ converges to a point which also lies in the graph. If such a sequence is convergent in $E_3$, then it must converge in every component, so the sequence $\{[x_n,y_n]\}_{n=1}^{\infty}$ is convergent in $\mathbb{R}^2$. Let us denote this limit by $[a,b]$. Then it follows from the definition of continuity that the function values at the points $[x_n,y_n]$ must converge to the value $f(a,b)$. However, this means that the sequence $\{[x_n,y_n,f(x_n,y_n)]\}_{n=1}^{\infty}$ converges to the point $[a,b,f(a,b)]$, which belongs to the graph of the function $f$. Therefore, the graph is a closed set.

The subset is closed, yet it is not compact, since it is not bounded (its orthogonal projection onto the coordinate plane $xy$ is the whole $\mathbb{R}^2$; recall that a subset of $E_n$ is compact iff it is both closed and bounded). □

And now, let us study the limits of functions (a limit is defined thanks to the topology of $E_n$, see 8.1.3).

Several tools for the work with curves are introduced in paragraphs 6.1.13–6.1.16.
For every (parametrized) curve¹, that is, a mapping $c=(c_1(t),\dots,c_n(t)):\mathbb{R}\to\mathbb{R}^n$ in an $n$-dimensional space, the concepts simply extend the ideas from the univariate functions, with some extra thoughts:

First, note that both the limit and the derivative of curves make sense in an affine space even without selecting the coordinates (the limit is again a point in the original space, while the derivative is a vector in the modelling vector space!). In the case of an integral, curves are considered in the vector space $\mathbb{R}^n$. The reason for this can be seen even in the case of one dimension, where the origin is needed to be able to see the "area under the graph of a function". It is apparent that limits, derivatives, and integrals have to be considered via the $n$ individual coordinate components in $\mathbb{R}^n$. In particular, their existence is determined in the same way:

Basic concepts for curves

(1) limit: $\lim_{t\to t_0}c(t)=\bigl(\lim_{t\to t_0}c_1(t),\dots,\lim_{t\to t_0}c_n(t)\bigr)\in\mathbb{R}^n$,
(2) derivative: $c'(t_0)=\lim_{t\to t_0}\frac{1}{t-t_0}\bigl(c(t)-c(t_0)\bigr)=(c_1'(t_0),\dots,c_n'(t_0))\in\mathbb{R}^n$,
(3) integral: $\int_a^b c(t)\,dt=\Bigl(\int_a^b c_1(t)\,dt,\dots,\int_a^b c_n(t)\,dt\Bigr)\in\mathbb{R}^n$.

C. Limits and continuity of multivariate functions

If we approach limits of multivariate functions, there is one fact we have to deal with. Let us emphasize that there is no analogy of l'Hospital's rule for multivariate functions. Computing limits of the type $\frac00$ or $\frac{\infty}{\infty}$, we have to be "clever". In one dimension, we can approach a point either from the right or from the left (and the limit at the point exists if both one-sided limits exist and are equal). In more dimensions, we can approach a point from infinitely many directions, and the limit at the point exists iff the limits of the function restricted to every path leading to the point exist and are equal to each other.

The easiest way to obtain a limit is (as with functions of one variable) to plug the given point into the function prescription; if we get a meaningful expression, we are done.
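The path-dependence idea can be tested symbolically; a minimal sketch with sympy, using the standard example $f(x,y)=xy/(x^2+y^2)$ of our own choosing (not one of the book's numbered exercises):

```python
import sympy as sp

x, y, k, r, phi = sp.symbols('x y k r phi', real=True)

# Substituting y = k*x: for f = x*y/(x**2 + y**2) the restricted expression
# simplifies to k/(1 + k**2), which depends on k, so the limit at (0, 0)
# does not exist.
f = x * y / (x ** 2 + y ** 2)
along_lines = sp.simplify(f.subs(y, k * x))

# Polar coordinates x = r*cos(phi), y = r*sin(phi): for g = x**2*y/(x**2+y**2)
# the polar form is r*cos(phi)**2*sin(phi), which tends to 0 as r -> 0
# independently of phi, so the limit at (0, 0) is 0.
g = x ** 2 * y / (x ** 2 + y ** 2)
polar = sp.simplify(g.subs({x: r * sp.cos(phi), y: r * sp.sin(phi)}))
```

The first computation proves non-existence of a limit; the second proves existence, since the polar bound is uniform in the angle.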
Otherwise, we can get an "indeterminate" expression. There are some tricks we can use to compute such a limit:

We can directly formulate the analogy of the connection between the Riemann integral and the antiderivative for curves in $\mathbb{R}^n$ (see 6.2.9):

Proposition. Let $c$ be a curve in $\mathbb{R}^n$, continuous on an interval $[a,b]$. Then its Riemann integral $\int_a^b c(t)\,dt$ exists. Moreover, the curve $$C(t)=\int_a^t c(s)\,ds\in\mathbb{R}^n$$ is well-defined, differentiable, and $C'(t)=c(t)$ for all values $t\in[a,b]$.

It is not simple to extend the mean value theorem and, in general, Taylor's expansion with remainder, see 5.3.9 and 6.1.3. They can be applied in a selected coordinate system to the particular coordinate functions of a differentiable curve $c(t)=(c_1(t),\dots,c_n(t))$ on a finite interval $[a,b]$. In the case of the mean value theorem, for instance, there are numbers $t_i$ such that $$c_i(b)-c_i(a)=(b-a)\cdot c_i'(t_i),\quad i=1,\dots,n.$$

¹ In geometry, one often makes a distinction between a curve as a subset of $E_n$ and its parametrization $\mathbb{R}\to\mathbb{R}^n$. The word "curve" means exclusively the parametrized curve here.

(1) factorize the numerator or the denominator according to some known formula and then cancel,
(2) expand the numerator and the denominator by an appropriate term and then cancel,
(3) use $\frac{\text{bounded expression}}{\text{expression tending to }\infty}=0$ and $0\cdot(\text{bounded expression})=0$,
(4) use an appropriate substitution to get a limit of a function of one variable,
(5) try polar coordinates $x=r\cos\varphi$, $y=r\sin\varphi$ (this usually works with the expression $x^2+y^2$: we have $x^2+y^2=r^2\cos^2\varphi+r^2\sin^2\varphi=r^2(\cos^2\varphi+\sin^2\varphi)=r^2$, which is independent of $\varphi$),
(6) try $y=kx$ or $y=kx^2$, or generally $x=f(k)$ and $y=g(k)$ (to prove the non-existence of the limit: if the limit after the substitution depends on $k$, the original limit does not exist).

8.C.1. $\lim\dots$ ○ 8.C.2. $\lim\dots$ ○ 8.C.3. $\lim\dots$ ○ 8.C.4. $\lim_{(x,y)\to(0,2)}\dots$ ○ 8.C.5. $\lim_{(x,y)\to(\infty,\infty)}\dots$ ○ 8.C.6. $\lim_{(x,y)\to(0,0)}\dots$ ○ 8.C.7.
$\lim_{(x,y)\to(0,0)}\dots$ ○ 8.C.8. $\lim\dots$ ○ 8.C.9. $\lim_{(x,y)\to(1,1)}\dots$ ○ 8.C.10. $\lim_{(x,y)\to(0,0)}\dots$ ○ 8.C.11. $\lim_{(x,y)\to(0,0)}xy^2\cos\dots$ ○ 8.C.12. $\lim\dots$ ○ 8.C.13. $\lim_{(x,y)\to(0,0)}\dots$ ○ 8.C.14. $\lim_{(x,y)\to(\infty,\infty)}(x^2+y^2)\,e^{-(x+y)}$ ○ 8.C.15. $\lim_{(x,y)\to(\infty,1)}\bigl(1+\tfrac1x\bigr)^{\dots}$ ○ 8.C.16. $\lim_{(x,y)\to(0,0)}\dots$ ○ 8.C.17. $\lim_{(x,y)\to(0,0)}\dots$ ○ 8.C.18. Prove that $\lim_{(x,y)\to(0,0)}\dots$ does not exist. ○

These numbers $t_i$ are distinct in general, so the difference vector of the boundary points $c(b)-c(a)$ cannot, in general, be expressed as a multiple of the derivative of the curve at a single point. For example, for a differentiable curve $c(t)=(x(t),y(t))$ in the plane $E_2$, $$c(b)-c(a)=\bigl(x'(\xi)(b-a),\,y'(\eta)(b-a)\bigr)$$ for two (in general different) values $\xi,\eta\in[a,b]$. However, this reasoning is still sufficient for the following estimate:

Lemma. If $c$ is a curve in $E_n$ with continuous derivative on a compact interval $[a,b]$, then for all $a\le s\le t\le b$ $$\|c(t)-c(s)\|\le\sqrt{n}\,\bigl(\max_{r\in[s,t]}\|c'(r)\|\bigr)\,|t-s|.$$

Proof. Direct application of the mean value theorem gives, for appropriate points $r_i$ inside the interval $[s,t]$, the following:
$$\|c(t)-c(s)\|^2=\sum_{i=1}^n\bigl(c_i(t)-c_i(s)\bigr)^2=\sum_{i=1}^n\bigl(c_i'(r_i)(t-s)\bigr)^2$$
$$\le(t-s)^2\sum_{i=1}^n\max_{r\in[s,t]}c_i'(r)^2\le n\,\bigl(\max_{r\in[s,t],\,i=1,\dots,n}|c_i'(r)|\bigr)^2(t-s)^2\le n\,\max_{r\in[s,t]}\|c'(r)\|^2\,(t-s)^2.$$ □

Another important concept is the tangent vector to a curve $c:\mathbb{R}\to E_n$ at a point $c(t_0)\in E_n$. It is defined as the vector in the modelling vector space $\mathbb{R}^n$ given by the derivative $c'(t_0)\in\mathbb{R}^n$. Consider $c$ to be the path of an object moving in the space in time. Then the tangent vector at the point $t_0$ can be perceived physically as the instantaneous velocity at this point. The straight line $T$ given parametrically as $$T:\ c(t_0)+(t-t_0)\cdot c'(t_0)$$ is called the tangent line to the curve $c$ at the point $t_0$. Unlike the tangent vector, the (unparametrized) tangent line $T$ is independent of the parametrization of the curve $c$.
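The componentwise derivative and the tangent line $T:c(t_0)+(t-t_0)c'(t_0)$ can be computed symbolically; a minimal sketch with sympy (using the helix as an illustrative curve of our own choosing here):

```python
import sympy as sp

t, s = sp.symbols('t s', real=True)

# Derivatives of curves are taken componentwise; the tangent line at t0 is
# parametrized as c(t0) + s * c'(t0).
c = sp.Matrix([sp.cos(t), sp.sin(t), t])   # a helix in R^3
c_prime = c.diff(t)                        # componentwise: (-sin t, cos t, 1)

t0 = 0
tangent = c.subs(t, t0) + s * c_prime.subs(t, t0)
```

Reparametrizing the curve rescales $c'(t_0)$ but yields the same unparametrized line, as stated in the text.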
The chain rule ensures that changing the parametrization leads to the same tangent vector, up to a multiple.

8.1.6. Partial derivatives. If we look at the multivariate function $f(x_1,\dots,x_n):\mathbb{R}^n\to\mathbb{R}$ as a function of one real variable $x_i$ while the other variables are assumed constant, we can consider the derivative of this function. This is called the partial derivative of the function $f$ with respect to $x_i$, and it is denoted by $\frac{\partial f}{\partial x_i}$, $i=1,\dots,n$, or (without referring to the particular function) as the operator $\frac{\partial}{\partial x_i}$ on functions. More generally, for every function $f:\mathbb{R}^n\to\mathbb{R}$ and an arbitrary curve $c:\mathbb{R}\to\mathbb{R}^n$, their composition $(f\circ c)(t):\mathbb{R}\to\mathbb{R}$ can be considered. This composite function $F=f\circ c$ expresses the behaviour of the function $f$ along the curve

8.C.19. Prove that $$\lim_{(x,y)\to(1,-2)}\frac{2x+xy-y-2}{x^2+y^2-2x+4y+5}$$ does not exist. ○

Solution. The lines through the point $[1,-2]$ have the equations $y=kx-k-2$. As we approach $[1,-2]$ along one of these lines, we get the limit $\frac{k}{1+k^2}$, which is different for different $k$; thus the limit does not exist. □

Let us recall that a function is continuous at the points where its limit exists and is equal to the function value.

8.C.20. Find the discontinuity points of $f(x,y)=\dots$ ○

8.C.21. Find the discontinuity points of $f(x,y)=\frac{\sin(x^2y+xy^2)}{\dots}$ ○

8.C.22. Find the discontinuity points of $$f(x,y)=\begin{cases}\frac{x+y}{x^2+y^2}&\text{for }[x,y]\neq[0,0],\\[2pt]0&\text{for }[x,y]=[0,0].\end{cases}$$ ○

D. Tangent lines, tangent planes, graphs of multivariate functions

8.D.1. A car is moving at velocity given by the vector $(0,1,1)$. At the initial time $t=0$, it is situated at the point $[1,0,0]$. The acceleration of the car at time $t$ is given by the vector $(-\cos t,-\sin t,0)$. Describe the dependency of the position of the car upon the time $t$.

Solution. As we have already discussed in paragraph 8.1.5, we got acquainted with the means of solving this type of problem as early as in chapter 6.
Notice that the "integral curve" $C(t)$ from the theorem of paragraph 8.1.5 starts at the point $(0,0,0)$ (in other words, $C(0)=(0,0,0)$). In the affine space $\mathbb{R}^n$, we can move it so that it starts at an arbitrary point, and this does not change its derivative (this is performed by adding a constant to every component in the parametric equation of the curve). Therefore, up to translation, this integral curve is determined uniquely (nothing other than constants can be added to the components without changing the derivative). When we integrate the curve of acceleration, we get the curve of velocity $(-\sin t,\cos t-1,0)$. Taking the initial velocity into account as well, we obtain the velocity curve of the car: $(-\sin t,\cos t,1)$ (we shifted the curve by the vector $(0,1,1)$, so that the velocity curve at time $t=0$ agrees with the given initial velocity). Further integration leads to the curve

$c$. The simplest case is using parametrized straight lines: choosing the lines $$c_i(t)=(x_1,\dots,x_i+t,\dots,x_n),$$ the derivative of $f\circ c_i$ yields just the partial derivative $\frac{\partial f}{\partial x_i}$. More generally, derivatives can be defined in any direction:

Directional and partial derivatives

Definition. $f:\mathbb{R}^n\to\mathbb{R}$ has a derivative in the direction of a vector $v\in\mathbb{R}^n$ at a point $x\in E_n$ if and only if the derivative $d_vf(x)$ of the composite mapping $t\mapsto f(x+tv)$ exists at the point $t=0$, i.e., $$d_vf(x)=\lim_{t\to0}\frac1t\bigl(f(x+tv)-f(x)\bigr).$$ The partial derivatives are the values $\frac{\partial f}{\partial x_i}=d_{e_i}f$, where $e_i$ are the elements of the standard basis of $\mathbb{R}^n$.

In other words, the directional derivative expresses the infinitesimal increment of the function $f$ in the direction $v$. For functions in the plane,
$$\frac{\partial}{\partial x}f(x,y)=\lim_{t\to0}\frac1t\bigl(f(x+t,y)-f(x,y)\bigr),\qquad \frac{\partial}{\partial y}f(x,y)=\lim_{t\to0}\frac1t\bigl(f(x,y+t)-f(x,y)\bigr).$$
Especially, partial differentiation with respect to a given variable is just the usual one-variable differentiation, considering the other variables to be constants.

8.1.7. The differential of a function.
Partial or directional derivatives are not always good enough to obtain a fair approximation of the behaviour of a function by linear expressions. There are three concerns for a function $f:\mathbb{R}^n\to\mathbb{R}$ here. First, the directional derivatives at a point may not exist in all directions, although the partial derivatives are well defined. Second, the dependence of the directional derivatives $d_vf(x)$ on the direction $v$ need not be linear. Third, even if $d_vf(x)$ is a linear mapping in the argument $v$, the function still may not be "well behaved" around the point $x$. As an example, consider the functions in the plane with coordinates $(x,y)$ given by the formulae
$$g(x,y)=\begin{cases}1&\text{if }xy=0\\0&\text{otherwise}\end{cases}\qquad h(x,y)=\begin{cases}x&\text{if }y=0\\ y&\text{if }x=0\\0&\text{otherwise}\end{cases}\qquad k(x,y)=\begin{cases}x&\text{if }y=x^2\neq0\\0&\text{otherwise.}\end{cases}$$
Both partial derivatives of $g$ at $(0,0)$ exist and no other directional derivatives do, and $g$ is not even continuous at the origin. The functions $h$ and $k$ are continuous at $(0,0)$, and $h$ has all its directional derivatives at the origin equal to zero,

$(\cos t-1,\sin t,t)$. Shifting this by the vector $(1,0,0)$ then fits with the initial position of the car. Therefore, the car moves along the curve $[\cos t,\sin t,t]$ (this curve is called a helix). □

8.D.2. Determine both the parametric and implicit equations of the tangent line to the curve $c:\mathbb{R}\to\mathbb{R}^3$, $c(t)=(c_1(t),c_2(t),c_3(t))=(t,t^2,t^3)$ at the point which corresponds to the parameter value $t=1$.

Solution. The value $t=1$ corresponds to the point $c(1)=[1,1,1]$. The derivatives of the particular components are $c_1'(t)=1$, $c_2'(t)=2t$, $c_3'(t)=3t^2$. The values of the derivatives at the point $t=1$ are $1$, $2$, $3$. Therefore, the parametric equations of the tangent line are:
$$x=c_1'(1)s+c_1(1)=s+1,\quad y=c_2'(1)s+c_2(1)=2s+1,\quad z=c_3'(1)s+c_3(1)=3s+1.$$
In order to get the implicit equations (which are not given canonically), we eliminate the parameter $s$, thereby obtaining: $$2x-y=1,\qquad 3x-z=2.$$ □

8.D.3.
Determine the tangent line $p$ to the curve $c(t)=(\ln t,\arctan t,e^{\sin t})$ at the point $t_0=1$. ○

8.D.4. Find a point on the curve $c(t)=(t^2-1,-2t^2+5t,t-5)$ such that the tangent line passing through it is parallel to the plane $\varrho:3x+y-z+7=0$.

Solution. The direction $c'(t_0)$ of the curve $c(t)$ at $t_0$ has to be perpendicular to the normal of $\varrho$, that is, the scalar product of these two vectors is $0$. The tangent vector at the point $c(t)$ is $(2t,-4t+5,1)$; the normal vector of the plane $\varrho$ is $(3,1,-1)$ (just read off the coefficients of $x$, $y$, and $z$ in the equation of $\varrho$). That is, $$3\cdot2t+1\cdot(-4t+5)+(-1)\cdot1=0,$$ which gives $t=-2$ and the point $[3,-18,-7]$. □

8.D.5. Find the parametric equation of the tangent line to the curve given as the intersection of the surfaces $x^2+y^2+z^2=4$ and $x^2+y^2-2x=0$ at the point $[1,1,\sqrt2]$.

Solution. $p=\{[1-\sqrt2\,s,\ 1,\ \sqrt2+s];\ s\in\mathbb{R}\}$. □

except for the partial derivatives, which are equal to $1$. In particular, $d_vh(0,0)$ is not a linear mapping in the argument $v$. More generally, consider a function $f$ which, along the lines $(r\cos\theta,r\sin\theta)$ with a fixed angle $\theta$, takes the values $\alpha(\theta)r$, where $\alpha(\theta)$ is a periodic odd function of the angle $\theta$, with period $2\pi$. All of its directional derivatives $d_vf$ at $(0,0)$ exist, yet these are not linear expressions in the directions $v$ for general functions $\alpha(\theta)$. The graph of $f$ can be visualized as a "deformed cone", and we can hardly hope for a good linear approximation at its vertex. Finally, $k$ has all directional derivatives zero, i.e., $d_vk(0,0)=0$ for all directions $v$, which is a linear dependence on $v\in\mathbb{R}^2$. But still, the zero mapping is a very bad approximation of $k$ along the parabola $y=x^2$. Check all these claims in detail yourselves!

Therefore, we imitate the case of univariate functions as thoroughly as possible, and avoid such pathological behaviour of functions directly by defining and using the concept of the differential:

Differential of a function

Definition.
A function $f:\mathbb{R}^n\to\mathbb{R}$ has the differential at a point $x$ if and only if all of the following three conditions hold:
(1) the directional derivatives $d_vf(x)$ at the point $x$ exist for all vectors $v\in\mathbb{R}^n$,
(2) $d_vf(x)$ is linearly dependent on the argument $v$,
(3) $\displaystyle\lim_{v\to0}\frac{1}{\|v\|}\bigl(f(x+v)-f(x)-d_vf(x)\bigr)=0$.
The linear expression $d_vf$ (in a vector variable $v$) is then called the differential $df$ of the function $f$ evaluated at the increment $v$.

In words, it is required that the behaviour of the function $f$ at the point $x$ is well approximated by linear functions of increments of the variable quantities. It follows directly from the definition of directional derivatives that the differential can be defined solely by the property (3). If there is a linear form $df(x)$ such that the increments $v$ at the point $x$ satisfy the property (3) with $d_vf(x)=df(x)(v)$, then $df(x)(v)$ is apparently just the directional derivative of the function $f$ at the point $x$, so the properties (1) and (2) are automatically satisfied.

8.1.8. Examine what can be said about the differential of a function $f(x,y)$ in the plane, supposing both partial derivatives $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$ exist and are continuous in a neighbourhood of a point $(x_0,y_0)$. To this purpose, consider any smooth curve $t\mapsto(x(t),y(t))$ with $x_0=x(0)$, $y_0=y(0)$. The idea is to use the mean value theorem for univariate functions for differences of function values, where only one of the variables changes:
$$f(x,y)-f(x_0,y)=\frac{\partial f}{\partial x}(x_1,y)\,(x-x_0)$$
for a suitable $x_1$ between $x_0$ and $x$.

8.D.6. The set of differentiable functions. We can notice that multivariate polynomials are differentiable on the whole of their domain. Similarly, the composition of a differentiable univariate function and a differentiable multivariate function leads to a differentiable multivariate function.
For instance, the function $\sin(x+y)$ is differentiable on the whole $\mathbb{R}^2$; $\ln(x+y)$ is a differentiable function on the set of points with $x>-y$ (an open half-plane, i.e., without the boundary line). The proofs of these propositions are left as an exercise on limits of compositions.

Remark. Notation of partial derivatives. The partial derivative of a function $f:\mathbb{R}^n\to\mathbb{R}$ in variables $x_1,\dots,x_n$ with respect to the variable $x_1$ will be denoted by both $\frac{\partial f}{\partial x_1}$ and the shorter expression $f_{x_1}$. In the exercise part of the book, we will rather keep to the latter notation. On the other hand, the notation $\frac{\partial f}{\partial x_1}$ better captures the fact that this is a derivative of $f$ in the direction of the vector field $\frac{\partial}{\partial x_1}$ (you will learn what a vector field is in paragraph 9.1.1).

8.D.7. Determine the domain of the function $f:\mathbb{R}^2\to\mathbb{R}$, $f(x,y)=x^2\sqrt{y}$. Calculate the partial derivatives where they are defined on this domain.

Solution. The domain of the function in question is the half-plane $\{(x,y);\ y\ge0\}$. In order to determine the partial derivative with respect to a given variable, we consider the other variables to be constants in the formula that defines the function. Then we simply differentiate the expression as a univariate function. We thus get:
$$f_x=2x\sqrt{y},\qquad f_y=\frac{x^2}{2\sqrt{y}}.$$
The partial derivatives exist at all points of the domain except for the boundary line $y=0$. □

8.D.8. Determine the derivative of the function $f:\mathbb{R}^3\to\mathbb{R}$, $f(x,y,z)=x^2yz$ at the point $[1,-1,2]$ in the direction $v=(3,2,-1)$.

Solution. The directional derivative can be calculated in two ways. The first one is to derive it directly from the definition (see paragraph 8.1.6). The second one is to use the differential of the function; see 8.1.7 and theorem 8.1.8. Since the given function is a polynomial, it is differentiable on the whole $\mathbb{R}^3$.
Apply this in both summands of the following expression separately, to obtain
$$\frac1t\bigl(f(x(t),y(t))-f(x_0,y_0)\bigr)=\frac1t\bigl(f(x(t),y(t))-f(x_0,y(t))\bigr)+\frac1t\bigl(f(x_0,y(t))-f(x_0,y_0)\bigr)$$
$$=\frac1t\bigl(x(t)-x_0\bigr)\frac{\partial f}{\partial x}\bigl(\xi,y(t)\bigr)+\frac1t\bigl(y(t)-y_0\bigr)\frac{\partial f}{\partial y}\bigl(x_0,\eta\bigr)$$
for a suitable $\xi$ between $x_0$ and $x(t)$, and a suitable $\eta$ between $y_0$ and $y(t)$. Indeed, by exploiting that the curve $(x(t),y(t))$ is differentiable, there must be such values $\xi$ and $\eta$. Especially, for every sequence of numbers $t_n$ converging to zero, the corresponding sequences of numbers $\xi_n$ and $\eta_n$ converge to $x_0$ and $y_0$, respectively (by the squeeze theorem for three limits), and they all satisfy the above equality. If $t$ converges to $0$, the continuity of the partial derivatives, together with the test for convergence of functions using subsequences of the input values (cf. 5.2.15), as well as the properties of the limits of sums and products of functions (cf. Theorem 5.2.13), imply
$$\frac{d}{dt}f\bigl(x(t),y(t)\bigr)\Big|_{t=0}=x'(0)\frac{\partial f}{\partial x}(x_0,y_0)+y'(0)\frac{\partial f}{\partial y}(x_0,y_0),$$
which is a pleasant extension of the theorem on differentiation of composite functions of one variable to the case $f\circ c$. Of course, with the special choice of parametrized straight lines with direction vector $v=(\xi,\eta)$, $$(x(t),y(t))=(x_0+t\xi,\ y_0+t\eta),$$ the calculation leads to the derivative in the direction $v=(\xi,\eta)$ and the equality
$$d_vf(x_0,y_0)=\frac{\partial f}{\partial x}(x_0,y_0)\,\xi+\frac{\partial f}{\partial y}(x_0,y_0)\,\eta.$$
This formula can be expressed in a neat way describing coordinate expressions of linear functions on vector spaces:
$$df=\frac{\partial f}{\partial x}\,dx+\frac{\partial f}{\partial y}\,dy,$$
where $dx$ stands for the differential of the function $(x,y)\mapsto x$, i.e., $dx(v)=\xi$, and similarly for $dy$. In other words, the directional derivative $d_vf$ is a linear function on the increments, with coordinates given by the partial derivatives. Now we could similarly prove that the assumption of continuous partial derivatives at a given point guarantees the approximation property of the differential as well.
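The chain-rule formula just derived can be verified symbolically; a minimal sketch with sympy, using an arbitrary smooth example of our own (not from the book):

```python
import sympy as sp

x, y, t = sp.symbols('x y t', real=True)

# Check d/dt f(x(t), y(t))|_0 = x'(0) f_x(x0, y0) + y'(0) f_y(x0, y0)
# for an arbitrarily chosen smooth function and curve through (1, 0).
f = x ** 2 * y + sp.sin(x * y)
xt, yt = sp.cos(t), t ** 2 + t          # x(0) = 1, y(0) = 0

# Left-hand side: differentiate the composite function, then set t = 0.
lhs = sp.diff(f.subs({x: xt, y: yt}), t).subs(t, 0)

# Right-hand side: the chain-rule expression evaluated along the curve at t = 0.
rhs = ((sp.diff(xt, t) * sp.diff(f, x) + sp.diff(yt, t) * sp.diff(f, y))
       .subs({x: xt, y: yt}).subs(t, 0))
```

Both sides evaluate to the same number, as the continuity of the partial derivatives guarantees.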
In particular, note that the computation for $f\circ c$ above excluded phenomena like the function $k(x,y)$ above (there $d_vk(0,0)=0$, but the derivative along the curve $(t,t^2)$ was one). We shall better do this for the general multivariate functions straightaway.

8.1.9. The following theorem provides a crucial and very useful observation.

Let us follow the definition:
$$f_v(x,y,z)=\lim_{t\to0}\frac1t\bigl[f(x+3t,y+2t,z-t)-f(x,y,z)\bigr]=\lim_{t\to0}\frac1t\bigl[(x+3t)^2(y+2t)(z-t)-x^2yz\bigr]$$
$$=\lim_{t\to0}\frac1t\bigl[t(6xyz+2x^2z-x^2y)+t^2(\dots)\bigr]=6xyz+2x^2z-x^2y.$$
We have thus derived the derivative in the direction of the vector $(3,2,-1)$ as a function of three real variables which determine the point at which we are interested in the value of the derivative. Evaluating this at the desired point thus leads to $f_v(1,-1,2)=-7$.

In order to compute the directional derivative from the differential of the function, we first have to determine the partial derivatives of the function: $$f_x=2xyz,\qquad f_y=x^2z,\qquad f_z=x^2y.$$ It follows from the note beyond theorem 8.1.8 that we can express
$$f_v(1,-1,2)=3f_x(1,-1,2)+2f_y(1,-1,2)+(-1)f_z(1,-1,2)=3\cdot(-4)+2\cdot2+(-1)\cdot(-1)=-7.$$ □

8.D.9. Determine the derivative of the function $f:\mathbb{R}^3\to\mathbb{R}$, $f(x,y,z)=\frac{\cos(x^2y)}{z}$ at the point $[0,0,2]$ in the direction of the vector $(1,2,3)$.

Solution. The domain of this function is $\mathbb{R}^3$ except for the plane $z=0$. The following calculations are considered only on this domain. The function in question is differentiable at the point $[0,0,2]$ (this follows from the remark 8.D.6). We can determine the value of the examined directional derivative by 8.1.7, using partial derivatives.
First, we determine the partial derivatives of the given function (as we have already mentioned in exercise 8.D.7, in order to determine the partial derivative with respect to $x$, we differentiate it as a univariate function (in $x$) and use the chain rule; similarly for the other partial derivatives):
$$f_x=-\frac{2xy\sin(x^2y)}{z},\qquad f_y=-\frac{x^2\sin(x^2y)}{z},\qquad f_z=-\frac{\cos(x^2y)}{z^2}.$$
Evaluating the expression gives
$$f_x(0,0,2)+2f_y(0,0,2)+3f_z(0,0,2)=1\cdot0+2\cdot0+3\cdot\bigl(-\tfrac14\bigr)=-\tfrac34.$$ □

Continuity of partial derivatives

Theorem. Let $f:E_n\to\mathbb{R}$ be a function of $n$ variables with continuous partial derivatives in a neighbourhood of the point $x\in E_n$. Then its differential $df$ at the point $x$ exists, and its coordinate expression is given by the formula
$$df=\frac{\partial f}{\partial x_1}dx_1+\frac{\partial f}{\partial x_2}dx_2+\dots+\frac{\partial f}{\partial x_n}dx_n.$$

Proof. This theorem can be derived analogously to the procedure described above for the case $n=2$. Care is needed in the details to finish the reasoning about the approximation property. As above, consider a curve $c(t)=(c_1(t),\dots,c_n(t))$, $c(0)=(0,\dots,0)$, and a point $x\in\mathbb{R}^n$, and express the difference $f(x+c(t))-f(x)$ for the composite function $f(c(t))$ as the telescoping sum
$$f\bigl(x_1+c_1(t),\dots,x_n+c_n(t)\bigr)-f\bigl(x_1,x_2+c_2(t),\dots,x_n+c_n(t)\bigr)$$
$$+f\bigl(x_1,x_2+c_2(t),\dots,x_n+c_n(t)\bigr)-f\bigl(x_1,x_2,x_3+c_3(t),\dots,x_n+c_n(t)\bigr)$$
$$+\dots+f\bigl(x_1,x_2,\dots,x_n+c_n(t)\bigr)-f\bigl(x_1,x_2,\dots,x_n\bigr).$$
Now, apply the mean value theorem to all of the $n$ summands, obtaining (similarly to the case of two variables)
$$\bigl(c_1(t)-c_1(0)\bigr)\frac{\partial f}{\partial x_1}\bigl(x_1+c_1(\theta_1),\dots,x_n+c_n(t)\bigr)+\bigl(c_2(t)-c_2(0)\bigr)\frac{\partial f}{\partial x_2}\bigl(x_1,x_2+c_2(\theta_2),\dots,x_n+c_n(t)\bigr)$$
$$+\dots+\bigl(c_n(t)-c_n(0)\bigr)\frac{\partial f}{\partial x_n}\bigl(x_1,x_2,\dots,x_n+c_n(\theta_n)\bigr),$$
for appropriate values $\theta_i$, $0<\theta_i<t$. This is a finite sum, so the same reasoning as in the case of two variables verifies that
$$\frac{d}{dt}f\bigl(x+c(t)\bigr)\Big|_{t=0}=c_1'(0)\frac{\partial f}{\partial x_1}(x)+\dots+c_n'(0)\frac{\partial f}{\partial x_n}(x).$$
The special choice of the curves $c(t)=tv$ for a direction vector $v$ verifies the statement about the existence and linearity of the directional derivatives at $x$. Finally, apply the mean value theorem in the same way to the difference
$$f(x+v)-f(x)=d_vf(x+\theta v)=v_1\frac{\partial f}{\partial x_1}(x+\theta v)+\dots+v_n\frac{\partial f}{\partial x_n}(x+\theta v)$$
with an appropriate $\theta$, $0<\theta<1$, where the latter equality holds according to the formula for directional derivatives derived above, for sufficiently small $v$. Since all the partial derivatives are continuous at the point $x$, for an arbitrarily small $\varepsilon>0$, there is a
Finally, apply the mean value theorem in the same way to the difference
$f(x+v) - f(x) = d_v f(x+\theta v) = v_1\,\frac{\partial f}{\partial x_1}(x+\theta v) + \cdots + v_n\,\frac{\partial f}{\partial x_n}(x+\theta v)$
with an appropriate $\theta$, $0 < \theta < 1$, where the latter equality holds according to the formula for directional derivatives derived above, for sufficiently small $v$. Since all the partial derivatives are continuous at the point $x$, for an arbitrarily small $\varepsilon > 0$, there is a

519 CHAPTER 8. CALCULUS WITH MORE VARIABLES

8.D.10. Given a function $f : \mathbb{R}^n \to \mathbb{R}$ with differential $df(x)$ and a point $x \in \mathbb{R}^n$, determine a unit direction $v \in \mathbb{R}^n$ in which the directional derivative $d_v f(x)$ is maximal.

Solution. According to the note beyond theorem 8.1.5, we are maximizing the function
$f_v(x) = v_1 f_{x_1}(x) + v_2 f_{x_2}(x) + \cdots + v_n f_{x_n}(x)$
in dependence on the variables $v_1,\dots,v_n$, which are bound by the condition $v_1^2 + \cdots + v_n^2 = 1$. We have already solved this type of problem in chapter 3, when we talked about linear optimization (viz 3.A.1). The value $f_v(x)$ can be interpreted as the scalar product of the vectors $(f_{x_1},\dots,f_{x_n})$ and $(v_1,\dots,v_n)$, and this product is maximal if the vectors have the same direction. The vector $v$ can thus be obtained by normalizing the vector $(f_{x_1},\dots,f_{x_n})$. In general, we say that the function grows maximally in the direction $(f_{x_1},\dots,f_{x_n})$. This vector is called the gradient of the function $f$. In paragraph 8.1.25, we will recall this idea and go into further details. □

Computing the differential of a function is technically very easy; just plug into the definition.

8.D.11. Find the differential of a function $f$ at a point $P$:
i) $f(x,y) = \arctan\frac{y}{x}$, $P = [\sqrt3, 1]$,
ii) $f(x,y) = \arcsin\frac{x}{\sqrt{x^2+y^2}}$, $P = [1, \sqrt3]$,
iii) $f(x,y) = x^{x+y}$, $P = [1,1]$.

Solution.
i) $df(\sqrt3, 1) = -\tfrac14\,dx + \tfrac{\sqrt3}{4}\,dy$,
ii) $df(1, \sqrt3) = \tfrac{\sqrt3}{4}\,dx - \tfrac14\,dy$,
iii) $df(1,1) = 2\,dx$. □

Let us note that the differential of a function is a linear map:

8.D.12. Compute the differential of the function $f(x,y,z) = 2^x \sin y \arctan z$ at the point $[-4, \tfrac{\pi}{2}, 0]$, evaluated on $dx = 0.05$, $dy = 0.06$, and $dz = 0.08$.
Solution. $df(-4, \tfrac{\pi}{2}, 0) = 0\,dx + 0\,dy + \tfrac{1}{16}\,dz = 0.005$. □

The differential thus can be used to approximate the values of a function.

8.D.13. Approximate $\sqrt{2.98^2 + 4.05^2}$ with the use of the differential (and not with a calculator).

Solution. We use the differential of the function $f(x,y) = \sqrt{x^2+y^2}$ at $[3,4]$. Then $f_x = \frac{x}{\sqrt{x^2+y^2}}$,

neighbourhood $U$ of the origin in $\mathbb{R}^n$ such that for $w \in U$, all partial derivatives $\frac{\partial f}{\partial x_i}(x+w)$ differ from $\frac{\partial f}{\partial x_i}(x)$ by less than $\varepsilon$. Hence the estimate
$\frac{1}{\|w\|}\big\|f(x+w) - f(x) - d_w f(x)\big\|$
$\le \frac{1}{\|w\|}\big\|f(x+w) - f(x) - d_w f(x+\theta w)\big\| + \frac{1}{\|w\|}\big\|d_w f(x+\theta w) - d_w f(x)\big\|$
$= \frac{1}{\|w\|}\Big\|w_1\Big(\frac{\partial f}{\partial x_1}(x+\theta w) - \frac{\partial f}{\partial x_1}(x)\Big) + \cdots + w_n\Big(\frac{\partial f}{\partial x_n}(x+\theta w) - \frac{\partial f}{\partial x_n}(x)\Big)\Big\| \le n\,\varepsilon,$
where $\theta$ is the parameter for which the expression on the second line vanishes. Thus, the approximation property of the differential is satisfied as well. □

The approximation property of the differential can be written as
$f(x+v) = f(x) + df(x)(v) + \alpha(v),$
where the function $\alpha(v)$ satisfies $\lim_{v\to 0}\frac{\alpha(v)}{\|v\|} = 0$, i.e. $\alpha(v) = o(\|v\|)$ in the asymptotic terminology introduced in 6.1.16 on page 389.

8.1.10. A plane tangent to the graph of a function. The linear approximation of the function behaviour by its differential can be expressed in terms of its graph, similarly to the case of univariate functions. We work with hyperplanes instead of tangent lines. In the case of a function on $E_2$ and a fixed point $(x_0,y_0) \in E_2$, consider the plane in $E_3$ given by the equation in the three coordinates $(x,y,z)$:
$z = f(x_0,y_0) + df(x_0,y_0)(x-x_0,\, y-y_0)$
$= f(x_0,y_0) + \frac{\partial f}{\partial x}(x_0,y_0)(x-x_0) + \frac{\partial f}{\partial y}(x_0,y_0)(y-y_0).$
It is already seen that the increase of the function values of a differentiable function $f : E_n \to \mathbb{R}$ at points $x+tv$ and $x$ is always expressed in terms of the directional derivative $d_v f$ at a suitable point between them. Therefore, this is the only plane containing the point $(x_0, y_0, f(x_0,y_0))$ with the property that all derivatives, and so the tangent lines of all curves $c(t) = (x(t), y(t), f(x(t),y(t)))$, lie in this plane, too.
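For 8.D.12 above, the differential $\tfrac{1}{16}dz = 0.005$ should be close to the actual increment of $f(x,y,z) = 2^x \sin y \arctan z$. A quick numerical check (plain Python, not in the original):

```python
import math

def f(x, y, z):
    return 2**x * math.sin(y) * math.atan(z)

p = (-4.0, math.pi / 2, 0.0)
d = (0.05, 0.06, 0.08)

# At p we have f_x = f_y = 0 (since arctan 0 = 0) and f_z = 2**-4 = 1/16,
# so the differential evaluated on (dx, dy, dz) is:
df = d[2] / 16            # = 0.005
actual = f(p[0] + d[0], p[1] + d[1], p[2] + d[2]) - f(*p)
print(df, actual)         # close: the error is of second order in the increments
```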
It is called the tangent plane to the graph of the function $f$. Two tangent planes to the graph of the function $f(x,y) = \sin(x)\cos(y)$ are shown in the illustration. The diagonal line is the image of the curve $c(t) = (t, t, f(t,t))$.

$f_y = \frac{y}{\sqrt{x^2+y^2}}$, thus
$\sqrt{2.98^2 + 4.05^2} \approx f(3,4) + df(3,4)(2.98-3,\, 4.05-4) = \sqrt{3^2+4^2} + \tfrac35(-0.02) + \tfrac45(0.05) = 5 - 0.012 + 0.04 = 5.028.$ □

8.D.14. With the help of the differential, approximate
i) $\arctan\frac{1.02}{0.95}$,
ii) $\ln(0.97^2 + 0.05^2)$,
iii) $\arcsin$ …,
iv) $1.04^{2.02}$. ○

8.D.15. What is approximately the change (in cm³) of the volume of a cone with base radius $r = 10$ cm and height $h = 10$ cm, if we increase the radius by 5 mm and decrease the height by 5 mm?

Solution. The volume is (as a function of the radius $r$ and the height $h$) $V(r,h) = \tfrac13\pi r^2 h$. The change is approximately given by the differential of $V$ at $[10,10]$ evaluated on $dr = 10.5 - 10 = 0.5$ and $dh = 9.5 - 10 = -0.5$. We get
$dV = \tfrac23\pi r h\,dr + \tfrac13\pi r^2\,dh = \tfrac23\pi\cdot 100\cdot 0.5 - \tfrac13\pi\cdot 100\cdot 0.5 = \tfrac{50}{3}\pi\ \mathrm{cm}^3.$ □

8.D.16. Find the tangent plane to the graph of a function $f : \mathbb{R}^2 \to \mathbb{R}$ at a point $P = [x_0, y_0, f(x_0,y_0)]$:
i) $f(x,y) = \sqrt{1-x^2-y^2}$, $P = [x_0,y_0,z_0] = [\tfrac{1}{\sqrt3}, \tfrac{1}{\sqrt3}, ?]$,
ii) $f(x,y) = \mathrm{e}^{x^2+y^2}$, $P = [x_0,y_0,z_0] = [0,0,?]$,
iii) $f(x,y) = x^2 + xy + 2y^2$, $P = [x_0,y_0,z_0] = [1,1,?]$,
iv) $f(x,y) = \arctan\frac{y}{x}$, $P = [x_0,y_0,z_0] = [1,-1,?]$.

Solution.
i) $z_0 = f(x_0,y_0) = \sqrt{1-\tfrac13-\tfrac13} = \tfrac{1}{\sqrt3}$. Further, $f_x = -\frac{x}{\sqrt{1-x^2-y^2}}$, thus $f_x(x_0,y_0) = -\frac{1/\sqrt3}{1/\sqrt3} = -1$, and by symmetry also $f_y(x_0,y_0) = -1$. The equation of the tangent plane at $[\tfrac{1}{\sqrt3}, \tfrac{1}{\sqrt3}, \tfrac{1}{\sqrt3}]$ is $z = \tfrac{1}{\sqrt3} - (x-\tfrac{1}{\sqrt3}) - (y-\tfrac{1}{\sqrt3})$, or $x + y + z = \sqrt3$;
ii) $z_0 = 1$, $z = 1$;
iii) $z_0 = 4$, $3x + 5y - z = 4$;
iv) $z_0 = -\tfrac{\pi}{4}$, $x + y - 2z = \tfrac{\pi}{2}$.

For the case of functions of $n$ variables, the tangent plane is defined as an analogy to the tangent plane to a surface in three-dimensional space. Instead of being overwhelmed by many indices, it is useful to recall affine geometry, where hyperplanes can be used, see paragraph 4.1.3.

Tangent (hyper)planes

Definition.
A tangent hyperplane to the graph of a function $f : \mathbb{R}^n \to \mathbb{R}$ at a point $x \in \mathbb{R}^n$ is the hyperplane containing the point $(x, f(x))$ whose modelling vector space is the graph of the linear mapping $df(x) : \mathbb{R}^n \to \mathbb{R}$, i.e. the differential at the point $x \in E_n$.

The definition takes advantage of the fact that the directional derivative $d_v f$ is given by the increment in the tangent (hyper)plane corresponding to the increment $v$. Many analogies with the univariate functions follow from the latter fact. In particular, a differentiable function $f$ on $E_n$ has zero differential at a point $x \in E_n$ if and only if its composition with any curve going through this point has a stationary point there, i.e., is neither increasing nor decreasing in the linear approximation. In other words, the tangent plane at such a point is parallel to the hyperplane of the variables (i.e., its modelling space is $E_n \subset E_{n+1}$, having added the last coordinate set to zero). Of course, this does not mean that $f$ should have a local extremum at such a point. Just as in the case of univariate functions, this depends on the values of higher derivatives. But it is a necessary condition for the existence of extrema.

8.1.11. Derivatives of higher orders. The operation of differentiation can be iterated similarly to the case of univariate functions. This time, choose new directions for each iteration.

Fix an increment $v \in \mathbb{R}^n$. The evaluation of the differentials at this argument defines a (differential) operation on differentiable functions $f : E_n \to \mathbb{R}$,
$f \mapsto d_v f = df(v),$
and the result is again a function $df(v) : E_n \to \mathbb{R}$. If this function is differentiable as well, repeat this procedure with another increment, and so on. In particular, work with iterations of partial derivatives. For second-order partial derivatives, write
$\frac{\partial}{\partial x_j}\Big(\frac{\partial f}{\partial x_i}\Big) = \frac{\partial^2 f}{\partial x_j\,\partial x_i}.$

□

8.D.17. Find all points on the conic $k : x^2 + 3y^2 - 2x + 6y - 8 = 0$ such that the normal to the conic at the point is parallel to the $y$ axis.
For each point, write the equation of the tangent at that point.

Solution. The normal to $k$ at a point is parallel to the $y$ axis iff the tangent line at the point is parallel to the $x$ axis, and this happens iff $y'(x_0) = 0$, where $y$ is a function given implicitly by $k$ in a neighborhood of $[x_0,y_0]$. Differentiating the equation of $k$ gives
$2x + 6yy' - 2 + 6y' = 0, \quad\text{that is}\quad y' = \frac{1-x}{3(1+y)}.$
Thus $y'(x_0) = 0$ iff $x_0 = 1$. Substituting into the equation of $k$, we get $1 + 3y_0^2 - 2 + 6y_0 - 8 = 0$, thus $y_0 = 1$ or $y_0 = -3$. The sought points are $[1,1]$ and $[1,-3]$, and the equations of the tangents at these points are $y = 1$ and $y = -3$, respectively. □

8.D.18. On the conic given by the equation $3x^2 + 6y^2 - 3x + 3y - 2 = 0$, find all points where the normal to the conic is parallel to the line $y = x$. For each point, give the equation of the tangent at the point. ○

8.D.19. On the conic given by the equation $x^2 + xy + 2y^2 - x + 3y - 54 = 0$, find all points where the normal to the conic is parallel to the $x$ axis. For each point, give the equation of the tangent at the point. ○

8.D.20. On the graph of the function $u(x,y,z) = x\sqrt{y^2+z^2}$, find all points where the tangent plane is parallel to the plane $x + y - z - u = 0$. ○

8.D.21. Find the points on the ellipsoid $x^2 + 2y^2 + z^2 = 1$ where the tangent planes are parallel to $x - y + 2z = 0$.

Solution. The equation of the tangent plane is determined by the partial derivatives of $z = z(x,y)$, given implicitly by the equation $x^2 + 2y^2 + z^2 = 1$ of the ellipsoid. The normal vector at $[x_0,y_0,z_0]$ is $(z_x'(x_0,y_0),\, z_y'(x_0,y_0),\, -1)$. This vector has to be parallel to the normal $(1,-1,2)$ of the plane, thus
$(-2z_x'(x_0,y_0),\, -2z_y'(x_0,y_0),\, 2) = (1,-1,2),$
which yields $2x_0 = z_0$, $4y_0 = -z_0$, and after substituting into the ellipsoid's equation we get the sought points $\pm\big[\tfrac{\sqrt2}{\sqrt{11}},\, -\tfrac{\sqrt2}{2\sqrt{11}},\, \tfrac{2\sqrt2}{\sqrt{11}}\big]$.

Another solution.
It is useful to realize that the normal vector at $[x_0,y_0,z_0]$ of the surface

In the case of the repeated choice $i = j$, write
$\frac{\partial}{\partial x_i}\Big(\frac{\partial f}{\partial x_i}\Big) = \frac{\partial^2 f}{\partial x_i^2}.$
Proceed in the same way with further iterations and talk about partial derivatives of order $k$,
$\frac{\partial^k f}{\partial x_{i_1}\cdots\partial x_{i_k}}.$
More generally, one can iterate (assuming the function is sufficiently differentiable) any directional derivatives; for instance, $d_v \circ d_w f$ for two fixed increments $v, w \in \mathbb{R}^n$.

$k$-times differentiable functions

Definition. A function $f : E_n \to \mathbb{R}$ is $k$-times (continuously) differentiable at a point $x$ if and only if all its partial derivatives up to order $k$ (inclusive) exist in a neighbourhood of the point $x$ and are continuous at this point. $f$ is $k$-differentiable if it is $k$-times (continuously) differentiable at all points of its domain.

From now on, we work with continuously differentiable functions unless explicitly stated otherwise.

To show the basic features of the higher derivatives in the simplest form, work in the plane $E_2$, supposing the second-order partial derivatives are continuous. In the plane as well as in the space, iterated derivatives are often denoted by mere indices referring to the variable names, for example:
$f_x = \frac{\partial f}{\partial x}, \quad f_{xx} = \frac{\partial^2 f}{\partial x^2}, \quad f_{xy} = \frac{\partial^2 f}{\partial x\,\partial y}, \quad f_{yx} = \frac{\partial^2 f}{\partial y\,\partial x}.$

We show that the continuous partial derivatives commute. That is, the order in which differentiation is carried out does not matter. Suppose that the partial derivatives exist and are continuous, i.e., the limits
$f_{xy}(x,y) = \lim_{t\to 0}\frac1t\big(f_x(x, y+t) - f_x(x,y)\big), \qquad f_x(x,y) = \lim_{s\to 0}\frac1s\big(f(x+s, y) - f(x,y)\big)$
exist. However, since the limits can be expressed by any choice of values $t_n \to 0$ and $s_n \to 0$ and the limits of the corresponding sequences, the second derivative can be expressed as
$f_{xy}(x,y) = \lim_{t\to 0}\frac{1}{t^2}\Big(\big(f(x+t,\, y+t) - f(x,\, y+t)\big) - \big(f(x+t,\, y) - f(x,\, y)\big)\Big),$
if the limit on the right-hand side exists (notice we cannot take this as granted without further arguments).
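The symmetric second difference quotient just introduced can be tested numerically, e.g. on the function $f(x,y) = \sin x\,\cos y$ that reappears later in 8.1.16. A sketch, not in the original:

```python
import math

def f(x, y):
    return math.sin(x) * math.cos(y)

def mixed_quotient(x, y, t):
    # (1/t^2) * ( (f(x+t, y+t) - f(x, y+t)) - (f(x+t, y) - f(x, y)) )
    return (f(x + t, y + t) - f(x, y + t) - f(x + t, y) + f(x, y)) / t**2

x0, y0 = 0.5, 1.2
exact = -math.cos(x0) * math.sin(y0)   # f_xy = -cos(x) sin(y) for this f
approx = mixed_quotient(x0, y0, 1e-3)
print(exact, approx)                   # the quotient approaches f_xy as t -> 0
```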
Consider the expression from which the last limit is taken as the function $\varphi(x,y,t)$, and try to express it in terms of partial derivatives. For a temporarily fixed $t$, denote $g(x,y) =$

given implicitly by $F(x,y,z) = 0$ is the vector
$\big(F_x'(x_0,y_0,z_0),\, F_y'(x_0,y_0,z_0),\, F_z'(x_0,y_0,z_0)\big).$ □

8.D.22. Determine whether the tangent plane to the graph of the function $f : \mathbb{R}\times\mathbb{R}^+ \to \mathbb{R}$, $f(x,y) = x\cdot\ln(y)$, at the point $[1, \tfrac1e]$ goes through the point $[1,2,3] \in \mathbb{R}^3$.

Solution. First of all, we calculate the partial derivatives: $f_x(x,y) = \ln(y)$, $f_y(x,y) = \frac{x}{y}$; their values at the point $[1,\tfrac1e]$ are $-1$ and $e$; further, $f(1,\tfrac1e) = -1$. Therefore, the equation of the tangent plane is
$z = f(1,\tfrac1e) + f_x(1,\tfrac1e)(x-1) + f_y(1,\tfrac1e)(y-\tfrac1e) = -1 - x + ey.$
The given point does not satisfy this equation, so it does not lie in the tangent plane. □

8.D.23. Determine the parametric equation of the tangent line to the intersection of the graphs of the functions $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x,y) = x^2 + xy - 6$, and $g : \mathbb{R}\times\mathbb{R}^+ \to \mathbb{R}$, $g(x,y) = x\cdot\ln(y)$, at the point $[2,1]$.

Solution. The tangent line to the intersection is the intersection of the tangent planes at the given point. The plane that is tangent to the graph of $f$ and goes through the point $[2,1]$ is
$z = f(2,1) + f_x(2,1)(x-2) + f_y(2,1)(y-1) = 5x + 2y - 12.$
The tangent plane to the graph of $g$ is then
$z = g(2,1) + g_x(2,1)(x-2) + g_y(2,1)(y-1) = 2y - 2.$
The intersection line of these two planes is given parametrically as $[2,\, t,\, 2t-2]$, $t \in \mathbb{R}$.

Another solution. The normal to the surface given by the equation $f(x,y) - z = 0$ at the point $b = [2,1,0]$ is $(f_x(b), f_y(b), -1) = (5,2,-1)$; the normal to the surface given by $g(x,y) - z = 0$ at the same point is $(0,2,-1)$. The tangent line is perpendicular to both normals; we can thus obtain a vector parallel to it as the vector product of the normals, which is $(0,5,10)$. Since the tangent line goes through the point $[2,1,0]$, its parametric equation is $[2,\, 1+t,\, 2t]$, $t \in \mathbb{R}$. □

8.D.24. Compute all first-order and second-order partial derivatives of the function $f(x,y,z) = x^{\frac{y}{z}}$.

Solution.
$f_x = \tfrac{y}{z}\,x^{\frac{y}{z}-1}, \qquad f_y = x^{\frac{y}{z}}\,\ln x\cdot\tfrac1z, \qquad f_z = -x^{\frac{y}{z}}\,\ln x\cdot\tfrac{y}{z^2},$
$f_{xx} = \tfrac{y}{z}\big(\tfrac{y}{z}-1\big)\,x^{\frac{y}{z}-2}, \qquad f_{yy} = x^{\frac{y}{z}}\,\frac{\ln^2 x}{z^2},$

$f(x+t,y) - f(x,y)$. Then the expression in the last large parentheses is, by the mean value theorem, equal to
$g(x, y+t) - g(x,y) = t\cdot g_y(x, y+t_0)$
for a suitable $t_0$ which lies between $0$ and $t$ (the value of $t_0$ depends on $t$). Now, $g_y(x,y) = f_y(x+t, y) - f_y(x,y)$, so we may rewrite $\varphi$ as
$\varphi(x,y,t) = \tfrac1t\big(f_y(x+t,\, y+t_0) - f_y(x,\, y+t_0)\big).$
Another application of the mean value theorem yields
$\varphi(x,y,t) = f_{yx}(x+s_0,\, y+t_0)$
for a suitable $s_0$ between $0$ and $t$. The continuity of the second partial derivatives as $t \to 0$ guarantees the desired equality $f_{xy}(x,y) = f_{yx}(x,y)$ at all points $(x,y)$.

8.1.12. The same procedure for functions of $n$ variables proves the following fundamental result:

Commutativity of partial derivatives

Theorem. Let $f : E_n \to \mathbb{R}$ be a $k$-times differentiable function with continuous partial derivatives up to order $k$ (inclusive) in a neighbourhood of a point $x \in \mathbb{R}^n$. Then all partial derivatives of the function $f$ at the point $x$ up to order $k$ (inclusive) are independent of the order of differentiation.

Proof. The proof for the second order is illustrated above in the special case when $n = 2$. In fact, it yields the general case as well. Indeed, notice that for every fixed choice of a pair of coordinates $x_i$ and $x_j$, the discussion of their interchanging takes place in a two-dimensional affine subspace (all the other variables are considered to be constant and do not affect the discussion). So neighbouring partial derivatives may be interchanged. This solves the problem in order two. In the case of higher-order derivatives, the proof can be completed by induction on the order: every order of the indices $i_1,\dots,i_k$ can be obtained from a fixed one by several interchanges of adjacent pairs of indices. □

$f_{zz} = x^{\frac{y}{z}}\,\ln^2 x\cdot\frac{y^2}{z^4} + x^{\frac{y}{z}}\,\ln x\cdot\frac{2y}{z^3}, \qquad f_{xy} = \tfrac1z\,x^{\frac{y}{z}-1} + \tfrac{y}{z^2}\,x^{\frac{y}{z}-1}\ln x,$
$f_{xz} = -\tfrac{y}{z^2}\,x^{\frac{y}{z}-1} - \tfrac{y^2}{z^3}\,x^{\frac{y}{z}-1}\ln x, \qquad f_{yz} = -x^{\frac{y}{z}}\,\frac{\ln x}{z^2} - x^{\frac{y}{z}}\,\frac{y\ln^2 x}{z^3}.$ □

8.D.25.
Find all first and second order partial derivatives of $z = f(x,y)$ at $[1, \sqrt2, 2]$, defined in a neighborhood of the point by $x^2 + y^2 + z^2 - xz - \sqrt2\,yz = 1$. ○

8.D.26. Find all first and second order partial derivatives of $z = f(x,y)$ at $[-2, 0, 1]$, defined in a neighborhood of the point by $2x^2 + 2y^2 + z^2 + 8xz - z + 8 = 0$. ○

8.D.27. Determine all second partial derivatives of the function $f$ given by $f(x,y,z) = \sqrt{xy\ln z}$.

Solution. First, we determine the domain of the given function: the argument of the square root must be non-negative, and the argument of the natural logarithm must be positive. Therefore,
$D_f = \{(x,y,z) \in \mathbb{R}^3 :\ (z > 1 \wedge xy > 0) \vee (0 < z < 1 \wedge xy < 0)\}.$
Now, we calculate the first partial derivatives with respect to each of the three variables:
$f_x = \frac{y\ln z}{2\sqrt{xy\ln z}}, \qquad f_y = \frac{x\ln z}{2\sqrt{xy\ln z}}, \qquad f_z = \frac{xy}{2z\sqrt{xy\ln z}}.$
Each of these three partial derivatives is again a function of three variables, so we can consider (first) partial derivatives of these functions. Those are the second partial derivatives of the function $f$. We write the variable with respect to which we differentiate as a subscript of the function $f$:
$f_{xx} = -\frac{y^2\ln^2 z}{4(xy\ln z)^{3/2}}, \qquad f_{xy} = -\frac{xy\ln^2 z}{4(xy\ln z)^{3/2}} + \frac{\ln z}{2\sqrt{xy\ln z}}, \qquad f_{xz} = -\frac{xy^2\ln z}{4z(xy\ln z)^{3/2}} + \frac{y}{2z\sqrt{xy\ln z}},$
$f_{yy} = -\frac{x^2\ln^2 z}{4(xy\ln z)^{3/2}}, \qquad f_{yz} = -\frac{x^2 y\ln z}{4z(xy\ln z)^{3/2}} + \frac{x}{2z\sqrt{xy\ln z}}, \qquad f_{zz} = -\frac{x^2y^2}{4z^2(xy\ln z)^{3/2}} - \frac{xy}{2z^2\sqrt{xy\ln z}}.$

8.1.13. Hessian. The differential was introduced as the linear form $df(x)$ which approximates the function $f$ at a point $x$ in the best possible way. Similarly, a quadratic approximation of a function $f : E_n \to \mathbb{R}$ is possible.

Hessian

Definition. If $f : \mathbb{R}^n \to \mathbb{R}$ is a twice differentiable function, the symmetric matrix of functions
$Hf(x) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2}(x) & \cdots & \frac{\partial^2 f}{\partial x_1\partial x_n}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n\partial x_1}(x) & \cdots & \frac{\partial^2 f}{\partial x_n^2}(x) \end{pmatrix}$
is called the Hessian of the function $f$.

It is already seen from the previous reasonings that the vanishing of the differential at a point $(x,y) \in E_2$ guarantees stationary behaviour along all curves going through this point.
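The second derivatives of 8.D.27 and the symmetry $f_{xy} = f_{yx}$ etc. can be confirmed with sympy. A sketch, not part of the original text:

```python
import sympy as sp

# x, y > 0 and z > 1 is one branch of the domain of sqrt(xy ln z)
x, y, z = sp.symbols('x y z', positive=True)
f = sp.sqrt(x * y * sp.log(z))

# mixed second partials agree independently of the order of differentiation
for a, b in [(x, y), (x, z), (y, z)]:
    assert sp.simplify(sp.diff(f, a, b) - sp.diff(f, b, a)) == 0

# one of the printed first derivatives, f_z = xy / (2 z sqrt(xy ln z)):
assert sp.simplify(sp.diff(f, z) - x*y/(2*z*sp.sqrt(x*y*sp.log(z)))) == 0
print("all identities confirmed")
```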
The Hessian
$Hf(x,y) = \begin{pmatrix} f_{xx}(x,y) & f_{xy}(x,y) \\ f_{yx}(x,y) & f_{yy}(x,y) \end{pmatrix}$
plays the role of the second derivative. For every parametrized straight line
$c(t) = (x(t), y(t)) = (x_0 + \xi t,\ y_0 + \eta t),$
the derivative of the univariate function $\alpha(t) = f(x(t), y(t))$ can be computed by means of the formula
$\alpha'(t) = f_x(x(t),y(t))\,x'(t) + f_y(x(t),y(t))\,y'(t)$
(derived in 8.1.8), and so the function
$\beta(t) = f(x_0,y_0) + t\,\frac{\partial f}{\partial x}(x_0,y_0)\,\xi + t\,\frac{\partial f}{\partial y}(x_0,y_0)\,\eta + \frac{t^2}{2}\big(f_{xx}(x_0,y_0)\,\xi^2 + 2f_{xy}(x_0,y_0)\,\xi\eta + f_{yy}(x_0,y_0)\,\eta^2\big)$
shares the same derivatives up to the second order (inclusive) at the point $t = 0$ (calculate this on your own!). The function $\beta$ can be written in terms of vectors as
$\beta(t) = f(x_0,y_0) + df(x_0,y_0)(tv) + \tfrac12 Hf(x_0,y_0)(tv, tv),$
where $v = (\xi,\eta)$ is the increment given by the derivative of the curve $c(t)$, and the Hessian is used as a symmetric 2-form.

This is an expression which looks like Taylor's theorem for univariate functions, namely the quadratic approximation of a function by Taylor's polynomial of degree two. The following illustration shows both the tangent plane and this quadratic approximation for two distinct points and the function $f(x,y) = \sin(x)\cos(y)$.

By the theorem about interchangeability of partial derivatives (see 8.1.12), we know that $f_{xy} = f_{yx}$, $f_{xz} = f_{zx}$, $f_{yz} = f_{zy}$. Therefore, it suffices to compute the mixed partial derivatives (the word "mixed" means that we differentiate with respect to more than one variable) just for one order of differentiation. □

E. Taylor polynomials

8.E.1. Write the second-order Taylor expansion of the function $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x,y) = \ln(x^2+y^2+1)$, at the point $[1,1]$.

Solution.
First, we compute the first partial derivatives:
$f_x = \frac{2x}{x^2+y^2+1}, \qquad f_y = \frac{2y}{x^2+y^2+1},$
then the Hessian:
$Hf(x,y) = \begin{pmatrix} \frac{2(y^2-x^2+1)}{(x^2+y^2+1)^2} & \frac{-4xy}{(x^2+y^2+1)^2} \\ \frac{-4xy}{(x^2+y^2+1)^2} & \frac{2(x^2-y^2+1)}{(x^2+y^2+1)^2} \end{pmatrix}.$
The value of the Hessian at the point $[1,1]$ is
$Hf(1,1) = \begin{pmatrix} \frac29 & -\frac49 \\ -\frac49 & \frac29 \end{pmatrix}.$
Altogether, we get that the second-order Taylor expansion at the point $[1,1]$ is
$T_2(x,y) = f(1,1) + f_x(1,1)(x-1) + f_y(1,1)(y-1) + \tfrac12\,(x-1,\, y-1)\,Hf(1,1)\begin{pmatrix} x-1 \\ y-1 \end{pmatrix}$
$= \ln 3 + \tfrac23(x-1) + \tfrac23(y-1) + \tfrac19(x-1)^2 + \tfrac19(y-1)^2 - \tfrac49(x-1)(y-1)$
$= \tfrac19\big(x^2 + y^2 + 8x + 8y - 4xy - 14\big) + \ln 3.$ □

Remark. In particular, we can see that the second-order Taylor expansion of an arbitrary twice differentiable function at a given point is a second-order polynomial.

8.E.2. Determine the second-order Taylor polynomial of the function $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x,y) = xy\cos y$, at the point $[\pi, \pi]$. Decide whether the tangent plane to the graph of this function at the point $[\pi, \pi, f(\pi,\pi)]$ goes through the point $[0, \pi, 0]$.

Solution. As in the above exercises, we find out that
$T_2(x,y) = \tfrac{\pi^2}{2}y^2 - xy - \pi^3 y + \tfrac{\pi^4}{2}.$

8.1.14. Taylor's expansion. The multidimensional version of Taylor's theorem is an example of a mathematical statement where the most difficult part is finding the right formulation. The proof is then quite simple.

The discussion on the Hessians continues. Write $D^k f$ for the $k$-th order approximations of the function $f : E_n \to \mathbb{R}$. It is always a $k$-linear expression in the increments. The differential $D^1 f = df$ (the first order) and the Hessian $D^2 f = Hf$ (the second order) are already discussed. For functions $f : E_n \to \mathbb{R}$, points $x = (x_1,\dots,x_n) \in E_n$, and increments $v = (\xi_1,\dots,\xi_n)$, set
$D^k f(x)(v) = \sum_{1\le i_1,\dots,i_k\le n} \frac{\partial^k f}{\partial x_{i_1}\cdots\partial x_{i_k}}(x)\ \xi_{i_1}\cdots\xi_{i_k}.$

Taylor expansion

Theorem. Let $f : E_n \to \mathbb{R}$ be a $k$-times differentiable function in a neighbourhood $O_\delta(x)$ of a point $x \in E_n$. For every increment $v \in \mathbb{R}^n$ of size $\|v\| < \delta$, there exists a number $\theta$, $0 < \theta < 1$, such that
$f(x+v) = f(x) + D^1 f(x)(v) + \tfrac{1}{2!}D^2 f(x)(v) + \cdots + \tfrac{1}{(k-1)!}D^{k-1} f(x)(v) + \tfrac{1}{k!}D^k f(x+\theta v)(v).$

Proof. Given an increment $v \in \mathbb{R}^n$, consider the parametrized line $c(t) = x + tv$ and the univariate function $\varphi : \mathbb{R} \to \mathbb{R}$ defined by the composition $\varphi(t) = f \circ c(t)$.
Taylor's theorem for univariate functions claims that (see Theorem 6.1.3)

The tangent plane to the graph of the given function at the point $[\pi,\pi]$ is given by the first-order Taylor polynomial at the point $[\pi,\pi]$; its general equation is thus
$z = -\pi x - \pi y + \pi^2,$
and this equation is satisfied by the given point $[0,\pi,0]$. □

8.E.3. Determine the third-order Taylor polynomial of the function $f : \mathbb{R}^3 \to \mathbb{R}$, $f(x,y,z) = x^3y + xz^2 + xy + 1$, at the point $[0,0,0]$. ○

8.E.4. Determine the second-order Taylor polynomial of the function $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x,y) = x^2\sin y + y^2\cos x$, at the point $[0,0]$. Decide whether the tangent plane to the graph of this function at the point $[0,0,0]$ goes through the point $[\pi,\pi,\pi]$. ○

8.E.5. Determine the second-order Taylor polynomial of the function $\ln(x^2 y)$ at the point $[1,1]$. ○

8.E.6. Determine the second-order Taylor polynomial of the function $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x,y) = \tan(xy+y)$, at the point $[0,0]$. ○

F. Extrema of multivariate functions

8.F.1. Determine the stationary points of the function $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x,y) = x^2y + y^2x - xy$, and decide which of these points are local extrema and of which type.

Solution. The first derivatives are $f_x = 2xy + y^2 - y$, $f_y = x^2 + 2xy - x$. Setting both partial derivatives equal to zero simultaneously, the system has the following solutions:
$\{x = y = 0\}, \quad \{x = 0,\, y = 1\}, \quad \{x = 1,\, y = 0\}, \quad \{x = \tfrac13,\, y = \tfrac13\},$
which are the four stationary points of the given function. The Hessian of the function $f$ is
$Hf(x,y) = \begin{pmatrix} 2y & 2x+2y-1 \\ 2x+2y-1 & 2x \end{pmatrix}.$
Its values at the stationary points are, respectively,
$\begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 \\ 1 & 2 \end{pmatrix}, \quad \begin{pmatrix} \frac23 & \frac13 \\ \frac13 & \frac23 \end{pmatrix}.$
Therefore, the first three Hessians are indefinite, and the last one is positive definite. The point $[\tfrac13, \tfrac13]$ is thus a local minimum. □

$\varphi(t) = 0$ (the composition as well as the sum of increasing functions is again an increasing function). Therefore, it has a unique extremum, and that is a minimum at the point $x = 0$.
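The classification in 8.F.1 above can be reproduced mechanically: solve $\nabla f = 0$ and test the Hessian at each stationary point. A sympy sketch, not in the original:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2*y + y**2*x - x*y   # the function from 8.F.1

grad = [sp.diff(f, s) for s in (x, y)]
stationary = sp.solve(grad, [x, y], dict=True)   # the four stationary points
H = sp.hessian(f, (x, y))

# keep only the points where the Hessian is positive definite
minima = [p for p in stationary if H.subs(p).is_positive_definite]
print(stationary)
print(minima)    # expect a single positive-definite case, at (1/3, 1/3)
```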
Similarly, for any fixed value of $x$, $f$ is a shift of the function $f_2$, and $f_2$ has a minimum at the point $y = 0$, which is its only extremum. We have thus proved that $f$ can have a local extremum only at the origin. Since
$f(0,0) = 0, \qquad f(x,y) > 0 \ \text{ for } [x,y] \in \mathbb{R}^2\setminus\{[0,0]\},$

Multi-indices

A multi-index $\alpha$ of length $n$ is an $n$-tuple of non-negative integers $(\alpha_1,\dots,\alpha_n)$. The integer $|\alpha| = \alpha_1 + \cdots + \alpha_n$ is called the size of the multi-index $\alpha$. Monomials are written shortly as $x^\alpha$ instead of $x_1^{\alpha_1}x_2^{\alpha_2}\cdots x_n^{\alpha_n}$. Real polynomials in $n$ variables can be symbolically expressed in a similar way as univariate polynomials:
$f = \sum_\alpha a_\alpha x^\alpha, \qquad g = \sum_\beta b_\beta x^\beta \ \in \mathbb{R}[x_1,\dots,x_n].$
Similarly, for partial derivatives we write $\partial^\alpha f = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1}\cdots\partial x_n^{\alpha_n}}$ and set $\alpha! = \alpha_1!\cdots\alpha_n!$. In this notation, the Taylor expansion of a $k$-times differentiable function $f : E_n \to \mathbb{R}$, for an increment $v \in \mathbb{R}^n$, is the polynomial
(1) $\quad f(x+v) = f(x) + \sum_{1\le|\alpha|\le k-1} \frac{1}{\alpha!}\,\partial^\alpha f(x)\,v^\alpha + \sum_{|\alpha|=k} \frac{1}{\alpha!}\,\partial^\alpha f(x+\theta v)\,v^\alpha.$
If the Taylor series
$\sum_\alpha \frac{1}{\alpha!}\,\partial^\alpha f(x)\,v^\alpha$
converges on some neighborhood of $v = 0$, we call the function $f$ (real) analytic on a neighborhood of $x$. For instance, this happens if all the partial derivatives are uniformly bounded, i.e. $|\partial^\alpha f(x)| \le C$ for all $\alpha$. Indeed, we may estimate
$\sum_{|\alpha|=k} \Big|\frac{1}{\alpha!}\,\partial^\alpha f(x)\,v^\alpha\Big| \le \frac{C}{k!}\big(|v_1| + \cdots + |v_n|\big)^k,$
and thus the Taylor series converges by the Weierstrass criterion. Think about the details. Actually, our argument shows that the Taylor series converges (on a small neighborhood of $x$) if the partial derivatives do not grow with the order $k$ faster than $k!$.

the function $f$ has a strict local (even global) minimum at the point $[0,0]$. □

8.F.4. Examine the local extrema of the function $f(x,y) = (x+y^2)\,\mathrm{e}^{\frac{x}{2}}$, $x, y \in \mathbb{R}$.

Solution. This function has partial derivatives of all orders on the whole of its domain. Therefore, local extrema can occur only at stationary points, where both the partial derivatives $f_x$, $f_y$ are zero. Then, it can be determined whether a local extremum occurs by computing the second derivatives. We can easily determine that
$f_x(x,y) = \mathrm{e}^{\frac{x}{2}} + \tfrac12(x+y^2)\,\mathrm{e}^{\frac{x}{2}}, \qquad f_y(x,y) = 2y\,\mathrm{e}^{\frac{x}{2}}, \qquad x,y\in\mathbb{R}.$
A stationary point $[x,y]$ must satisfy $f_y(x,y) = 0$, i.e. $y = 0$, and, further, $f_x(x,0) = \mathrm{e}^{\frac{x}{2}}\big(1+\tfrac{x}{2}\big) = 0$, i.e. $x = -2$.
We can see that there is a unique stationary point, namely $[-2,0]$. Now, we calculate the Hessian $Hf$ at this point. If this matrix (the corresponding quadratic form) is positive definite, the extremum is a strict local minimum. If it is negative definite, the extremum is a strict local maximum. Finally, if the matrix is indefinite, there is no extremum at the point. We have
$f_{xx}(x,y) = \tfrac12\,\mathrm{e}^{\frac{x}{2}}\big(2 + \tfrac12(x+y^2)\big), \qquad f_{yy}(x,y) = 2\,\mathrm{e}^{\frac{x}{2}}, \qquad f_{xy}(x,y) = f_{yx}(x,y) = y\,\mathrm{e}^{\frac{x}{2}}, \qquad x,y\in\mathbb{R}.$
Therefore,
$Hf(-2,0) = \begin{pmatrix} f_{xx}(-2,0) & f_{xy}(-2,0) \\ f_{yx}(-2,0) & f_{yy}(-2,0) \end{pmatrix} = \begin{pmatrix} \frac{1}{2e} & 0 \\ 0 & \frac{2}{e} \end{pmatrix}.$
We should recall that the eigenvalues of a diagonal matrix are exactly the values on the diagonal. Further, positive definiteness means that all the eigenvalues are positive. Hence it follows that there is a strict local minimum at the point $[-2,0]$. □

8.F.5. Find the local extrema of the function
$f(x,y,z) = x^3 + y^2 + \tfrac{z^2}{2} - 3xz - 2y + 2z, \qquad x,y,z\in\mathbb{R}.$

Solution. The function $f$ is a polynomial; therefore, it has partial derivatives of all orders. It thus suffices to look for its stationary points (the extrema cannot be elsewhere). In order

8.1.16. Local extrema. We examine the local maxima and minima of functions on $E_n$ using the differential and the Hessian, just as in the case of univariate functions. An interior point $x_0 \in E_n$ of the domain of a function $f$ is said to be a (local) maximum or minimum if and only if there is a neighbourhood $U$ of $x_0$ such that for all points $x \in U$, the function value satisfies $f(x) \le f(x_0)$ or $f(x) \ge f(x_0)$, respectively. If strict inequalities hold for all $x \ne x_0$, there is a strict extremum.

To simplify, suppose that $f$ has continuous first-order and second-order partial derivatives on its domain. A necessary condition for the existence of an extremum at a point $x_0$ is that the differential be zero at this point, i.e., $df(x_0) = 0$. If $df(x_0) \ne 0$, then there is a direction $v$ in which $d_v f(x_0) \ne 0$.
However, then the function value is increasing on one side of the point $x_0$ along the line $x_0 + tv$ and decreasing on the other side, see 5.3.2.

An interior point $x \in E_n$ of the domain of a function $f$ at which the differential $df(x)$ is zero is called a stationary point of the function $f$.

To illustrate the concept on a simple function in $E_2$, consider $f(x,y) = \sin(x)\cos(y)$. The shape of this function resembles the well-known egg plates, so it is evident that there are many extrema, and also many stationary points which are not extrema ("saddles" are visible in the picture). Calculate the first derivatives, and then the necessary second-order ones:
$f_x(x,y) = \cos(x)\cos(y), \qquad f_y(x,y) = -\sin(x)\sin(y),$
and both derivatives are zero for two sets of points:
(1) $\cos x = 0$, $\sin y = 0$, that is $(x,y) = \big(\tfrac{2k+1}{2}\pi,\, \ell\pi\big)$, for any $k, \ell \in \mathbb{Z}$;
(2) $\cos y = 0$, $\sin x = 0$, that is $(x,y) = \big(k\pi,\, \tfrac{2\ell+1}{2}\pi\big)$, for any $k, \ell \in \mathbb{Z}$.
The second partial derivatives are
$Hf(x,y) = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix}(x,y) = \begin{pmatrix} -\sin(x)\cos(y) & -\cos(x)\sin(y) \\ -\cos(x)\sin(y) & -\sin(x)\cos(y) \end{pmatrix}.$
So the following Hessians are obtained at the two sets of stationary points:

to find them, we differentiate $f$ with respect to each of the three variables $x$, $y$, $z$ and set the derivatives equal to zero. We thus obtain
$3x^2 - 3z = 0$, i.e., $z = x^2$;
$2y - 2 = 0$, i.e., $y = 1$;
and (utilizing the first equation) $z - 3x + 2 = x^2 - 3x + 2 = 0$, i.e., $x \in \{1, 2\}$.
Therefore, there are two stationary points, namely $[1,1,1]$ and $[2,1,4]$. Now, we compute all second-order partial derivatives:
$f_{xx} = 6x, \quad f_{xy} = f_{yx} = 0, \quad f_{xz} = f_{zx} = -3, \quad f_{yy} = 2, \quad f_{yz} = f_{zy} = 0, \quad f_{zz} = 1.$
Having this, we are able to evaluate the Hessian at the stationary points:
$Hf(1,1,1) = \begin{pmatrix} 6 & 0 & -3 \\ 0 & 2 & 0 \\ -3 & 0 & 1 \end{pmatrix}, \qquad Hf(2,1,4) = \begin{pmatrix} 12 & 0 & -3 \\ 0 & 2 & 0 \\ -3 & 0 & 1 \end{pmatrix}.$
Now, we need to know whether these matrices are positive definite, negative definite, or indefinite in order to determine whether and which extrema occur at the corresponding points. Clearly, the former matrix (for the point $[1,1,1]$) has eigenvalue $\lambda = 2$.
Since its determinant equals $-6$ and it is a symmetric matrix (all eigenvalues are real), the matrix must have a negative eigenvalue as well (because the determinant is the product of the eigenvalues). Therefore, the matrix $Hf(1,1,1)$ is indefinite, and there is no extremum at the point $[1,1,1]$.

We will use the so-called Sylvester's criterion for the latter matrix $Hf(2,1,4)$. According to this criterion, a real-valued symmetric matrix
$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{12} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{13} & a_{23} & a_{33} & \cdots & a_{3n} \\ \vdots & & & \ddots & \vdots \\ a_{1n} & a_{2n} & a_{3n} & \cdots & a_{nn} \end{pmatrix}$
is positive definite if and only if all of its leading principal minors
$d_1 = |a_{11}|, \quad d_2 = \begin{vmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{vmatrix}, \quad d_3 = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{vmatrix}, \quad \dots, \quad d_n = |A|$
are positive. Further, it is negative definite iff

(1) $Hf\big(\tfrac{2k+1}{2}\pi,\, \ell\pi\big) = \pm\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, where the minus sign occurs when $k$ and $\ell$ have the same parity (remainder on division by two), and the sign $+$ occurs in the other case;
(2) $Hf\big(k\pi,\, \tfrac{2\ell+1}{2}\pi\big) = \pm\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, where again the minus sign occurs when $k$ and $\ell$ have the same parity, and the sign $+$ occurs in the other case.

From the proposition of Taylor's theorem for order $k = 2$, there is, in a neighbourhood of one of the stationary points $(x_0,y_0)$,
$f(x,y) = f(x_0,y_0) + \tfrac12 Hf\big(x_0 + \theta(x-x_0),\, y_0 + \theta(y-y_0)\big)(\xi, \eta).$
Here, $Hf$ is considered to be a quadratic form evaluated at the increment $(x-x_0,\, y-y_0) = (\xi, \eta)$. In the case (1), $Hf(x_0,y_0)(\xi,\eta) = \pm(\xi^2+\eta^2)$, while in the case (2), $Hf(x_0,y_0)(\xi,\eta) = \pm 2\xi\eta$. While in the first case, the quadratic form is either always positive or always negative on all nonzero arguments, in the second case, there are always arguments with positive values and other arguments with negative values. Since the Hessian of the function is continuous (i.e.
all the partial derivatives up to order two are continuous), the Hessians at the nearby points are small perturbations of those at $(x_0,y_0)$, and so these properties of the quadratic form $Hf(x,y)$ remain true on some neighbourhood of $(x_0,y_0)$. This is obvious in cases (1) and (2), since a small perturbation of the matrices clearly does not change the latter properties of the quadratic forms in question. A general formal proof is presented below.

The local maximum occurs if and only if the point $(x_0,y_0)$ belongs to the case (1) with $k$ and $\ell$ of the same parity. If the parities are different, then the point from the case (1) is a point of a local minimum. On the other hand, in the case (2) the entire function $f$ behaves similarly to the Hessian, and so the "saddle" points are not extrema.

8.1.17. The decision rules. In order to formulate the general statement about the Hessian and the local extrema at stationary points, it is necessary to remember the discussion about quadratic forms from the paragraphs 4.2.6–4.2.7 in the chapter on affine geometry. There, the following types of quadratic forms $h : E_n \to \mathbb{R}$ are introduced:
• positive definite if and only if $h(u) > 0$ for all $u \ne 0$;
• positive semidefinite if and only if $h(u) \ge 0$ for all $u \in V$;
• negative definite if and only if $h(u) < 0$ for all $u \ne 0$;
• negative semidefinite if and only if $h(u) \le 0$ for all $u \in V$;
• indefinite if and only if $h(u) > 0$ and $h(v) < 0$ for appropriate $u, v \in V$.
This leads to the equations
$$-6x^5 + 4x^3 + 2x - 2xy^2 = 0, \qquad (x^2 - 1)(-2y) = 0,$$
whose solutions are $[x,y] = [0,0]$, $[x,y] = [1,0]$, $[x,y] = [-1,0]$. (In order to find these solutions, it suffices to find the real roots $1$, $-1$ of the polynomial $-6x^4 + 4x^2 + 2$ using the substitution $u = x^2$.) Now, we compute the second-order partial derivatives
$$z_{xx} = -30x^4 + 12x^2 + 2 - 2y^2, \qquad z_{xy} = z_{yx} = -4xy, \qquad z_{yy} = -2(x^2 - 1)$$
and evaluate the Hessian at the stationary points:
$$Hz(0,0) = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}, \qquad Hz(1,0) = Hz(-1,0) = \begin{pmatrix} -16 & 0 \\ 0 & 0 \end{pmatrix}.$$
We can see that the first matrix is positive definite, so the function has a strict local minimum at the origin. However, the second and third matrices are negative semidefinite. Therefore, the knowledge of the second partial derivatives is insufficient for deciding whether there is an extremum at the points $[1,0]$ and $[-1,0]$. On the other hand, we can examine the function values near these points. We have $z(1,0) = z(-1,0) = 0$ and $z(x,0) < 0$ for $x \in (-1,1)$. Further, consider $y$ dependent on $x \in (-1,1)$ by the formula $y = \sqrt{2(1-x^4)}$, so that $y \to 0$ for $x \to \pm 1$. For this choice, we get
$$z\big(x, \sqrt{2(1-x^4)}\big) = (x^2 - 1)(x^4 - 1) > 0, \qquad x \in (-1,1).$$
We have thus shown that in arbitrarily small neighbourhoods of the points $[1,0]$ and $[-1,0]$, the function $z$ takes on both higher and lower values than the function value at the corresponding point. Therefore, these are not extrema. □

There are methods which allow determining whether or not a given form has any of these properties. The Taylor expansion with remainder immediately yields the following rules:

Local extrema

Theorem. Let $f : E_n \to \mathbb{R}$ be a twice continuously differentiable function and let $x \in E_n$ be a stationary point of the function $f$. Then
(1) $f$ has a strict local minimum at $x$ if $Hf(x)$ is positive definite,
(2) $f$ has a strict local maximum at $x$ if $Hf(x)$ is negative definite,
(3) $f$ does not have an extremum at $x$ if $Hf(x)$ is indefinite.

Proof.
The Taylor second-order expansion with remainder, applied to our function $f(x_1,\dots,x_n)$, an arbitrary point $x = (x_1,\dots,x_n)$, and any increment $v = (v_1,\dots,v_n)$ such that all points $x + \theta v$, $\theta \in [0,1]$, lie in the domain of the function $f$, says that
$$f(x+v) = f(x) + df(x)(v) + \tfrac{1}{2} Hf(x + \theta\cdot v)(v)$$
for an appropriate real number $\theta$, $0 < \theta < 1$. Since the differential is supposed to be zero, we obtain
$$f(x+v) = f(x) + \tfrac{1}{2} Hf(x + \theta\cdot v)(v).$$
By assumption, the quadratic form $Hf(x)$ depends continuously on the point $x$, and the definiteness or indefiniteness of quadratic forms can be determined by the signs of the leading principal minors of the matrix $Hf$, see Sylvester's criterion in paragraph 4.2.7. However, the determinant itself is a polynomial expression in the coefficients of the matrix, hence a continuous function. Therefore, the non-vanishing and the signs of the examined determinants are the same in a sufficiently small neighbourhood of the point $x$ as at the point $x$ itself. In particular, for a positive definite $Hf(x)$, it is guaranteed that at the stationary point $x$, $f(x+v) > f(x)$ for all sufficiently small $v \neq 0$. So this is a sharp minimum of the function $f$ at the point $x$. The case of negative definiteness is analogous. If $Hf(x)$ is indefinite, then there are directions $v$, $w$ in which $f(x+v) > f(x)$ and $f(x+w) < f(x)$, so there is no extremum at the stationary point in question. □

The theorem yields no result if the Hessian of the function is degenerate, yet not indefinite, at the point in question. The reason is the same as in the case of univariate functions. In these cases, there are directions in which both the first and second derivatives vanish, so at this level of approximation, it cannot be determined whether the function behaves like $t^3$ or $\pm t^4$ until higher-order derivatives in the necessary directions are calculated. At the same time, even at those points where the differential is non-zero, the definiteness of the Hessian $Hf(x)$ has
8.F.7. Decide whether the polynomial $p(x,y) = x^6 + y^8 + y^4x^4 - x^6y^5$ has a local extremum at the stationary point $[0,0]$.

Solution. We can easily verify that the partial derivatives $p_x$ and $p_y$ are indeed zero at the origin. However, each of the partial derivatives $p_{xx}$, $p_{xy}$, $p_{yy}$ is also equal to zero at the point $[0,0]$. The Hessian $Hp(0,0)$ is thus both positive and negative semidefinite at the same time. However, a simple idea leads to the result: we can notice that $p(0,0) = 0$ and
$$p(x,y) = x^6(1 - y^5) + y^8 + y^4x^4 > 0 \quad \text{for } [x,y] \in \mathbb{R} \times (-1,1) \setminus \{[0,0]\}.$$
Therefore, the given polynomial has a local minimum at the origin. □

8.F.8. Determine the local extrema of the function $f : \mathbb{R}^3 \to \mathbb{R}$, $f(x,y,z) = x^2y + y^2z + x - z$ on $\mathbb{R}^3$. ○

8.F.9. Determine the local extrema of the function $f : \mathbb{R}^3 \to \mathbb{R}$, $f(x,y,z) = x^2y - y^2z + 4x + z$ on $\mathbb{R}^3$. ○

8.F.10. Determine the local extrema of the function $f : \mathbb{R}^3 \to \mathbb{R}$, $f(x,y,z) = xz^2 + y^2z - x + y$ on $\mathbb{R}^3$. ○

8.F.11. Determine the local extrema of the function $f : \mathbb{R}^3 \to \mathbb{R}$, $f(x,y,z) = y^2z - xz^2 + x + 4y$ on $\mathbb{R}^3$. ○

8.F.12. Determine the local extrema of the function $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x,y) = x^2y + x^2 + 2y^2 + y$ on $\mathbb{R}^2$. ○

8.F.13. Determine the local extrema of the function $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x,y) = x^2y + 2y^2 + 2y$ on $\mathbb{R}^2$. ○

8.F.14. Determine the local extrema of the function $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x,y) = x^2 + xy + 2y^2 + y$ on $\mathbb{R}^2$. ○

8.F.15. Determine the local extrema of the function $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x,y) = x^2 + xy - 2y^2 + y$ on $\mathbb{R}^2$. ○

G. Implicitly given functions and mappings

8.G.1. Let $F : \mathbb{R}^2 \to \mathbb{R}$ be the function $F(x,y) = xy\sin\left(\frac{\pi}{2}xy^2\right)$. Show that the equality $F(x,y) = 1$ implicitly defines a function $f : U \to \mathbb{R}$ on a neighbourhood $U$ of the point $[1,1]$ so that $F(x, f(x)) = 1$ for $x \in U$. Determine $f'(1)$.

Solution. The function is differentiable on the whole $\mathbb{R}^2$, so it is such on any neighbourhood of the point $[1,1]$.
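Returning briefly to 8.F.7: the positivity argument there can be spot-checked numerically. This is only an illustrative sketch; the grid of sample points is an ad hoc choice, and `p` simply encodes the polynomial from the problem.

```python
# Spot-check for 8.F.7: p(x, y) = x^6 + y^8 + y^4 x^4 - x^6 y^5 is
# positive at sampled points near the origin (with y in (-1, 1)),
# supporting the local-minimum conclusion. The grid is ad hoc.

def p(x, y):
    return x**6 + y**8 + y**4 * x**4 - x**6 * y**5

pts = [(i / 10, j / 10) for i in range(-9, 10) for j in range(-9, 10)
       if (i, j) != (0, 0)]
assert p(0.0, 0.0) == 0.0
assert all(p(x, y) > 0 for x, y in pts)
print("p > 0 at all sampled points near the origin")
```

Of course, a finite sample proves nothing by itself; the rewriting $p = x^6(1-y^5) + y^8 + y^4x^4$ in the solution is what carries the actual argument.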
Let us evaluate $F_y$ at $[1,1]$:
$$F_y(x,y) = x\sin\left(\tfrac{\pi}{2}xy^2\right) + \pi x^2y^2\cos\left(\tfrac{\pi}{2}xy^2\right),$$

similar consequences as the non-vanishing of the second derivative of a univariate function. For a function $f : \mathbb{R}^n \to \mathbb{R}$, the expression
$$z(x+v) = f(x) + df(x)(v)$$
defines the tangent hyperplane to the graph of the function $f$ in the space $\mathbb{R}^{n+1}$. Taylor's theorem of order two with remainder, as used in the proof above, provides the expression
$$f(x+v) = z(x+v) + \tfrac{1}{2} Hf(x + \theta v)(v).$$
If the Hessian is positive definite, all the values of the function $f$ lie above the values of the tangent hyperplane for arguments in a sufficiently small neighbourhood of the point $x$, i.e. the whole graph is above the tangent hyperplane in a sufficiently small neighbourhood. In the case of negative definiteness, it is the other way round. Finally, when the Hessian is indefinite, the graph of the function has values on both sides of the hyperplane. This happens, in general, along objects of lower dimensions in the tangent hyperplane, so there is no straightforward generalization of inflection points.

8.1.18. The differential of mappings. The concepts of derivative and differential can be easily extended to mappings $F : E_n \to E_m$. Having selected the Cartesian coordinate system on both sides, this mapping is an ordinary $m$-tuple
$$F(x_1,\dots,x_n) = \big(f_1(x_1,\dots,x_n), \dots, f_m(x_1,\dots,x_n)\big)$$
of functions $f_i : E_n \to \mathbb{R}$. $F$ is defined to be a differentiable or $k$-times differentiable mapping if and only if the corresponding property is shared by all the functions $f_1,\dots,f_m$. The differentials $df_i(x)$ of the particular functions $f_i$ give a linear approximation of the increments of their values. Therefore, we can expect that they also give a coordinate expression of the linear mapping $D^1F(x) : \mathbb{R}^n \to \mathbb{R}^m$ between the modelling spaces which linearly approximates the increments of the mapping $F$.

Differential and Jacobi matrix

Consider a differentiable mapping $F : E_n \to \mathbb{R}^m$ with components $(f_1(x_1,\dots,x_n), \dots, f_m(x_1,\dots,x_n))$ and $x$ in its domain. The matrix
$$D^1F(x) = \begin{pmatrix} df_1(x) \\ df_2(x) \\ \vdots \\ df_m(x) \end{pmatrix} = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(x) & \cdots & \frac{\partial f_1}{\partial x_n}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(x) & \cdots & \frac{\partial f_m}{\partial x_n}(x) \end{pmatrix}$$
is called the Jacobi matrix of the mapping $F$ at the point $x$. The linear mapping $D^1F(x)$, defined on the increments $v = (v_1,\dots,v_n)$ by the Jacobi matrix, is called the differential of the mapping $F$ at a point $x$ in the domain if and only if
$$\lim_{v\to 0} \frac{1}{\|v\|}\Big(F(x+v) - F(x) - D^1F(x)(v)\Big) = 0.$$

Recall that the definition of Euclidean distance guarantees that the limits of values in $E_n$ exist if and only if the

so $F_y(1,1) = 1 \neq 0$. Therefore, it follows from theorem 8.1.24 that the equation $F(x,y) = 1$ implicitly determines, on a neighbourhood of the point $(1,1)$, a function $f : U \to \mathbb{R}$ defined on a neighbourhood $U$ of the point (number) $1$. Moreover, we have
$$F_x(x,y) = y\sin\left(\tfrac{\pi}{2}xy^2\right) + \tfrac{\pi}{2}xy^3\cos\left(\tfrac{\pi}{2}xy^2\right),$$
so the derivative of the function $f$ at the point $1$ satisfies
$$f'(1) = -\frac{F_x(1,1)}{F_y(1,1)} = -1. \qquad \Box$$

Remark. Notice that although we are unable to express the function $f$ explicitly from the equation $F(x, f(x)) = 1$, we are able to determine its derivative at the point $1$.

8.G.2. Considering the function $F : \mathbb{R}^2 \to \mathbb{R}$, $F(x,y) = e^x\sin(y) + y - \pi/2 - 1$, show that the equation $F(x,y) = 0$ implicitly defines the variable $y$ as a function of $x$, $y = f(x)$, on a neighbourhood of the point $[0, \pi/2]$. Compute $f'(0)$.

Solution. The function is differentiable in a neighbourhood of the point $[0,\pi/2]$; moreover, $F_y = e^x\cos y + 1$, $F_y(0,\pi/2) = 1 \neq 0$, so the equation indeed defines a function $f : U \to \mathbb{R}$ on a neighbourhood of the point $[0,\pi/2]$. Further, we have $F_x = e^x\sin y$, $F_x(0,\pi/2) = 1$, and the derivative at the point $0$ satisfies
$$f'(0) = -\frac{F_x(0,\pi/2)}{F_y(0,\pi/2)} = -1. \qquad \Box$$

8.G.3. Let $F(x,y,z) = \sin(xy) + \sin(yz) + \sin(xz)$. Show that the equation $F(x,y,z) = 0$ implicitly defines a function $z(x,y) : \mathbb{R}^2 \to \mathbb{R}$ on a neighbourhood of the point $[\pi, 1, 0] \in \mathbb{R}^3$ so that $F(x, y, z(x,y)) = 0$. Determine $z_x(\pi,1)$ and $z_y(\pi,1)$.

Solution.
We will calculate $F_z = y\cos(yz) + x\cos(xz)$, $F_z(\pi,1,0) = \pi + 1 \neq 0$, so the function $z(x,y)$ is defined by the equation $F(x,y,z(x,y)) = 0$ on a neighbourhood of the point $[\pi,1,0]$. In order to find the values of the wanted partial derivatives, we first need to calculate the values of the remaining partial derivatives of the function $F$ at the point $[\pi,1,0]$:
$$F_x(x,y,z) = y\cos(xy) + z\cos(xz), \qquad F_x(\pi,1,0) = -1,$$
$$F_y(x,y,z) = x\cos(xy) + z\cos(yz), \qquad F_y(\pi,1,0) = -\pi,$$

limits of the particular coordinate components do. Direct application of Theorem 8.1.6 about the existence of the differential for functions of $n$ variables to the particular coordinate functions of the mapping $F$ thus leads to the following generalization (prove this in detail by yourselves!):

Existence of the differential

Corollary. Let $F : E_n \to E_m$ be a mapping such that all of its coordinate functions have continuous partial derivatives in a neighbourhood of a point $x \in E_n$. Then the differential $D^1F(x)$ exists, and it is given by the Jacobi matrix $D^1F(x)$.

8.1.19. Lipschitz continuity. Continuous differentiability of mappings allows good control of their variability in the following sense. Assume we are interested in estimates of the difference $F(y) - F(x)$ for all $x$ and $y$ from a convex compact subset $K$ of the domain of $F$. Applying the Taylor theorem with remainder of order one to each of the components of $F = (f_1,\dots,f_m)$ separately gives the estimate (write $v = y - x$)
$$\|F(y) - F(x)\|^2 = \sum_{i=1}^m |f_i(y) - f_i(x)|^2 = \sum_{i=1}^m |D^1 f_i(x + \theta_i v)(v)|^2 \leq nm\left(\max\left|\frac{\partial f_i}{\partial x_j}(z)\right|\right)^2 \|v\|^2 = C^2\|v\|^2$$
for an appropriate constant $C > 0$, where the maximum is taken over all $z \in K$ and all $i$, $j$. The fact that continuous functions are bounded on each compact set is used. This is the property of Lipschitz continuity of $F$ on the compact set $K$:
$$\|F(y) - F(x)\| \leq C\|y - x\|.$$
Thus, every continuously differentiable mapping $F : E_n \to \mathbb{R}^m$ is Lipschitz continuous over convex compact sets.

8.1.20. Differential of composite mappings. The following theorem formulates a very useful generalization of the chain rule for univariate functions.
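The role of the Jacobi matrix as the differential can be illustrated numerically: for a sample mapping (an ad hoc choice, not from the book), central finite differences reproduce the analytic Jacobi matrix to high accuracy.

```python
# Illustration of the Jacobi matrix as the differential: for the ad hoc
# mapping F(x, y) = (x^2 y, x + sin y), compare the analytic Jacobi
# matrix with central finite differences at a sample point.
import math

def F(x, y):
    return (x**2 * y, x + math.sin(y))

def jacobian_analytic(x, y):
    return [[2 * x * y, x**2],
            [1.0, math.cos(y)]]

def jacobian_numeric(x, y, h=1e-6):
    """Central differences column by column: (F(x+h e_j) - F(x-h e_j))/2h."""
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j, (dx, dy) in enumerate([(h, 0.0), (0.0, h)]):
        plus = F(x + dx, y + dy)
        minus = F(x - dx, y - dy)
        for i in range(2):
            J[i][j] = (plus[i] - minus[i]) / (2 * h)
    return J

Ja = jacobian_analytic(1.0, 0.5)
Jn = jacobian_numeric(1.0, 0.5)
assert all(abs(Ja[i][j] - Jn[i][j]) < 1e-5 for i in range(2) for j in range(2))
print("finite differences agree with the Jacobi matrix")
```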
Except for the concept of the differential itself, which is mildly complicated, it is actually the same as the one already seen in the case of one variable. The Jacobi matrix of a univariate function is a single number, namely the derivative of the function at the given point, so the multiplication of Jacobi matrices is simply the multiplication of the derivatives of the outer and inner components of the function. There is, of course, another special case: the formula derived and used several times for the derivative of a composition of a multivariate function with a curve. There,

whence
$$z_x(\pi,1) = -\frac{F_x(\pi,1,0)}{F_z(\pi,1,0)} = \frac{1}{\pi+1}, \qquad z_y(\pi,1) = -\frac{F_y(\pi,1,0)}{F_z(\pi,1,0)} = \frac{\pi}{\pi+1}. \qquad \Box$$

8.G.4. Given the mapping $F : \mathbb{R}^3 \to \mathbb{R}^2$, $F(x,y,z) = (f(x,y,z), g(x,y,z)) = (e^{x\sin y} - 1,\ xyz - \pi)$, show that the equation $F(x, c_1(x), c_2(x)) = (0,0)$ defines a curve $c : \mathbb{R} \to \mathbb{R}^2$ on a neighbourhood of the point $[1, \pi, 1]$. Determine the tangent vector to this curve at the point $1$.

Solution. We will calculate the square matrix of the partial derivatives of the mapping $F$ with respect to $y$ and $z$:
$$H(x,y,z) = \begin{pmatrix} f_y & f_z \\ g_y & g_z \end{pmatrix} = \begin{pmatrix} x\cos y\, e^{x\sin y} & 0 \\ xz & xy \end{pmatrix}.$$
Hence,
$$H(1,\pi,1) = \begin{pmatrix} -1 & 0 \\ 1 & \pi \end{pmatrix} \quad\text{and}\quad \det H(1,\pi,1) = -\pi \neq 0.$$
Now, it follows from the implicit mapping theorem (see 8.1.24) that the equation $F(x, c_1(x), c_2(x)) = (0,0)$ determines, on a neighbourhood of the point $[1,\pi,1]$, a curve $(c_1(x), c_2(x))$ defined on a neighbourhood of the point $1$. In order to find its tangent vector at this point, we need to determine the (column) vector $(f_x, g_x)^T$ at this point:
$$\begin{pmatrix} f_x \\ g_x \end{pmatrix} = \begin{pmatrix} \sin y\, e^{x\sin y} \\ yz \end{pmatrix}, \qquad \begin{pmatrix} f_x(1,\pi,1) \\ g_x(1,\pi,1) \end{pmatrix} = \begin{pmatrix} 0 \\ \pi \end{pmatrix}.$$
The wanted tangent vector is thus
$$\begin{pmatrix} (c_1)_x(1) \\ (c_2)_x(1) \end{pmatrix} = -\begin{pmatrix} f_y(1,\pi,1) & f_z(1,\pi,1) \\ g_y(1,\pi,1) & g_z(1,\pi,1) \end{pmatrix}^{-1} \begin{pmatrix} f_x(1,\pi,1) \\ g_x(1,\pi,1) \end{pmatrix} = \begin{pmatrix} 0 \\ -1 \end{pmatrix}. \qquad \Box$$

H. Constrained optimization

We will begin with a somewhat atypical optimization problem.

8.H.1. A betting office accepts bets on the outcome of a tennis match. Let the odds laid against player A winning be $a : 1$ (i.
e., if a bettor bets $x$ dollars on the event that player A wins and this really happens, then the bettor wins $ax$ dollars), and, similarly, let the odds laid against player B winning be $b : 1$ (fees are neglected). What is the necessary and sufficient condition on the (positive real) numbers $a$ and $b$ so that a bettor cannot guarantee a profit regardless of the actual outcome of the match? (For instance, if the odds were laid $1.5 : 1$ against

the differential is the one-form expressed via the partial derivatives of the outer component, evaluated on the vector of the derivative of the inner component, again given by the product of one line (the form) and one column (the vector).

The chain rule

Theorem. Let $F : E_n \to E_m$ and $G : E_m \to E_r$ be two differentiable mappings, where the domain of $G$ contains the whole image of $F$. Then, the composite mapping $G \circ F$ is also differentiable, and its differential at any point $x$ in the domain of $F$ is given by the composition of differentials
$$D^1(G \circ F)(x) = D^1G(F(x)) \circ D^1F(x).$$
The Jacobi matrix on the left-hand side is the product of the corresponding Jacobi matrices on the right-hand side.

Proof. In paragraph 8.1.6 and in the proof of Taylor's theorem, it was derived how the differentiation of mappings composed of functions and curves behaves. This proved the theorem in the special case of $n = r = 1$. The general case can be proved analogously; one just has to work with more vectors. Fix an arbitrary increment $v$ and calculate the directional derivative for the composition $G \circ F$ at a point $x \in E_n$. This means determining the differentials for the particular coordinate functions of the mapping $G$ composed with $F$. To simplify, write $g \circ F$ for any one of them:
$$d_v(g \circ F)(x) = \lim_{t\to 0} \frac{1}{t}\Big(g(F(x+tv)) - g(F(x))\Big).$$
The expression in parentheses can, by the definition of the differential of $g$, be expressed as
$$g(F(x+tv)) - g(F(x)) = dg(F(x))\big(F(x+tv) - F(x)\big) + \alpha\big(F(x+tv) - F(x)\big),$$
where $\alpha$ is a function defined on a neighbourhood of the point $F(x)$ which is continuous and satisfies $\lim_{v\to 0} \frac{\alpha(v)}{\|v\|} = 0$. Substitution into the equality for the directional derivative yields
$$d_v(g \circ F)(x) = \lim_{t\to 0} \frac{1}{t}\Big(dg(F(x))\big(F(x+tv) - F(x)\big) + \alpha\big(F(x+tv) - F(x)\big)\Big)$$
$$= dg(F(x))\Big(\lim_{t\to 0} \frac{1}{t}\big(F(x+tv) - F(x)\big)\Big) + \lim_{t\to 0} \frac{1}{t}\,\alpha\big(F(x+tv) - F(x)\big) = dg(F(x)) \circ D^1F(x)(v) + 0.$$
The fact that linear mappings between finite-dimensional spaces are always continuous was used. In the last step, the Lipschitz continuity of $F$, i.e. $\|F(x+tv) - F(x)\| \leq C\|v\|t$, was exploited, together with the properties of the function $\alpha$. So the theorem is proved for the particular functions $g_1,\dots,g_r$ of the mapping $G$. The statement in general now follows

the win of A and $5 : 1$ against the win of B, then the bettor could bet 3 dollars on B winning and 7 dollars on A winning and profit from this bet in either case.)

Solution. Let the bettor have $P$ dollars. The bet amount can be divided into $kP$ and $(1-k)P$ dollars, where $k \in (0,1)$. The payout is then $akP$ dollars (if player A wins) or $b(1-k)P$ dollars (if B does). The bettor is always guaranteed to win the lesser of these two amounts; the total profit (or loss) is then obtained by subtracting the bet $P$. Since each of $a$, $b$, $P$ is a positive real number, the function $akP$ is increasing and the function $b(1-k)P$ is decreasing with respect to $k$. For $k = 0$, $b(1-k)P$ is greater; for $k = 1$, $akP$ is. The minimum of the two numbers $akP$ and $b(1-k)P$ is thus maximal for the $k \in (0,1)$, namely $k_0$, which satisfies $ak_0P = b(1-k_0)P$, whence $k_0 = \frac{b}{a+b}$. Therefore, the betting office must choose $a$, $b$ so that $ak_0P = b(1-k_0)P \leq P$, which is equivalent to $ak_0 \leq 1$, i.e. $ab \leq a + b$. □

We managed to solve this constrained optimization problem even without using the differential calculus.
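The conclusion of 8.H.1 can be checked numerically. The function name `guaranteed_win` is an ad hoc helper; the odds $1.5:1$ and $5:1$ are the ones from the problem text.

```python
# Numeric check of 8.H.1: with odds a : 1 and b : 1, the optimal split
# is k0 = b/(a + b), and a guaranteed profit exists iff ab > a + b.

def guaranteed_win(a, b, P=10.0):
    """Best guaranteed payout when betting k0*P on A and (1-k0)*P on B."""
    k0 = b / (a + b)
    return min(a * k0 * P, b * (1 - k0) * P)

# Odds 1.5 : 1 and 5 : 1: ab = 7.5 > 6.5 = a + b, so a profit is guaranteed.
assert guaranteed_win(1.5, 5.0) > 10.0
# Odds 1 : 1 and 1 : 1: ab = 1 < 2 = a + b, so no guaranteed profit exists.
assert guaranteed_win(1.0, 1.0) <= 10.0
print("the condition ab <= a + b rules out a guaranteed profit")
```

With the odds from the problem and $P = 10$, the optimal split even beats the $7$/$3$ split suggested in the problem statement, since $k_0 = 5/6.5$ yields a guaranteed payout of about $11.54$ dollars.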
However, we will not be able to do so in the following problems.

8.H.2. Find the extremal values of the function $h(x,y,z) = x^3 + y^3 + z^3$ on the unit sphere $S$ in $\mathbb{R}^3$ given by the equation $F(x,y,z) = x^2 + y^2 + z^2 - 1 = 0$, as well as on the circle which is the intersection of this sphere with the plane $G(x,y,z) = x + y + z = 0$.

Solution. First, we look for the stationary points of the function $h$ on the sphere $S$. Computing the corresponding gradients (for instance, $\operatorname{grad} h(x,y,z) = (3x^2, 3y^2, 3z^2)$), we get the system
$$0 = 3x^2 - 2\lambda x, \quad 0 = 3y^2 - 2\lambda y, \quad 0 = 3z^2 - 2\lambda z, \quad 0 = x^2 + y^2 + z^2 - 1$$
consisting of four equations in four variables. Before trying to solve this system, we can estimate how many local constrained extrema we should anticipate the function to have. Surely, $h(P)$ is in absolute value at most $1$, and this value is attained exactly at the intersection points of the coordinate axes with

from the definition of matrix multiplication and its links to linear mappings. □

8.1.21. Transformation of coordinates. A mapping $F : E_n \to E_n$ which has an inverse mapping $G : E_n \to E_n$ defined on the entire image of $F$ is called a transformation. Such a mapping can be perceived as a change of coordinates. It is usually required that both $F$ and $G$ be (continuously) differentiable mappings. Just as in the case of vector spaces, the choice of "point of view", i.e. the choice of coordinates, can simplify or deteriorate comprehension of the examined object. The change of coordinates is now being discussed in a much more general form than in the case of affine mappings in the fourth chapter. Sometimes, the term "curvilinear coordinates" is used in this general sense.

An illustrative example is the change of the most usual coordinates in the plane to polar coordinates. That is, the position of a point $P$ is given by its distance $r = \sqrt{x^2 + y^2}$ from the origin and the angle $\varphi = \arctan(y/x)$ between the ray from the origin to it and the $x$-axis (if $x \neq 0$).
The inverse transformation maps $(r, \varphi)$ to $(r\cos\varphi, r\sin\varphi)$. The change of coordinates can thus be considered as a mapping $F : \mathbb{R}^2 \to \mathbb{R}^2$ (for instance, on the domain of all points in the first quadrant except for the points having $x = 0$):
$$r = \sqrt{x^2 + y^2}, \qquad \varphi = \arctan\frac{y}{x}.$$
Consider now the function $g_t : E_2 \to \mathbb{R}$, with free parameter $t \in \mathbb{R}$, given in polar coordinates by
$$g(r, \varphi, t) = \sin(r - t).$$

$S$. Therefore, we are likely to get 6 local extrema. Further, inside every eighth of the sphere given by the coordinate planes, there may or may not be another extremum. The particular quadrants can be easily parametrized, and the function $h$ (considered as a function of two parameters) can be analyzed by standard means (or we can have it drawn in Maple, for example). Actually, solving the system (no matter whether algebraically or in Maple again) leads to a great many stationary points. Besides the six points we have already talked about (two of the coordinates equal to zero and the other equal to $\pm 1$), which have $\lambda = \pm\frac{3}{2}$, there are also the points
$$P_\pm = \pm\left(\frac{\sqrt{3}}{3}, \frac{\sqrt{3}}{3}, \frac{\sqrt{3}}{3}\right),$$
for example, where a local extremum indeed occurs. If we restrict our interest to the points of the circle $K$, we must add the other function $G$ and another free parameter $\eta$ representing its gradient coefficient. This leads to the bigger system
$$0 = 3x^2 - 2\lambda x - \eta, \quad 0 = 3y^2 - 2\lambda y - \eta, \quad 0 = 3z^2 - 2\lambda z - \eta,$$
$$0 = x^2 + y^2 + z^2 - 1, \quad 0 = x + y + z.$$
However, since a circle is also a compact set, $h$ must have both a global minimum and maximum on it. Further analysis is left to the reader. □

Such a function can approximate the waves on a water surface after a point impulse at the origin at the time $t$, see the illustration (there, $t = -\pi/2$). While it was easy to define the function in polar coordinates, it would have been much harder to guess it in Cartesian coordinates. Compute the derivative of this function in Cartesian coordinates. Using the theorem,
$$\frac{\partial g}{\partial x}(x,y,t) = \frac{\partial g}{\partial r}(r,\varphi)\frac{\partial r}{\partial x}(x,y) + \frac{\partial g}{\partial \varphi}(r,\varphi)\frac{\partial \varphi}{\partial x}(x,y) = \cos\left(\sqrt{x^2+y^2} - t\right)\frac{x}{\sqrt{x^2+y^2}} + 0,$$
and, similarly,
$$\frac{\partial g}{\partial y}(x,y,t) = \frac{\partial g}{\partial r}(r,\varphi)\frac{\partial r}{\partial y}(x,y) + \frac{\partial g}{\partial \varphi}(r,\varphi)\frac{\partial \varphi}{\partial y}(x,y) = \cos\left(\sqrt{x^2+y^2} - t\right)\frac{y}{\sqrt{x^2+y^2}} + 0.$$

8.H.3. Determine whether the function $f : \mathbb{R}^3 \to \mathbb{R}$, $f(x,y,z) = x^2y$, has any extrema on the surface $2x^2 + 2y^2 + z^2 = 1$. If so, find these extrema and determine their types.

Solution. Since we are interested in the extrema of a continuous function on a compact set (the ellipsoid is both closed and bounded in $\mathbb{R}^3$), the given function must have both a minimum and a maximum on it. Moreover, since the constraint is given by a continuously differentiable function and the examined function is differentiable, the extrema must occur at stationary points of the function in question on the given set. We can build the following system for the stationary points:
$$2xy = 4kx, \qquad x^2 = 4ky, \qquad 0 = 2kz.$$

8.1.22. The inverse mapping theorem. If the first derivative of a differentiable univariate function is non-zero, its sign determines whether the function is increasing or decreasing. Then the function has this property in a neighbourhood of the point in question, and so an inverse function exists in the selected neighbourhood. The derivative of the inverse function $f^{-1}$ is then the reciprocal of the derivative of the function $f$ (i.e. the inverse with respect to multiplication of real numbers). Interpreting this situation for a mapping $E_1 \to E_1$ and linear mappings $\mathbb{R} \to \mathbb{R}$ as their differentials, the non-vanishing is a necessary and sufficient condition for the differential to be invertible as a linear mapping. In this way, a statement is obtained which is valid for all finite-dimensional spaces in general:

This system is satisfied by the points $\left[\pm\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{6}}, 0\right]$ and $\left[\pm\frac{1}{\sqrt{3}}, -\frac{1}{\sqrt{6}}, 0\right]$. The function takes on only two values at these four stationary points. It follows from the above that the first and second stationary points are maxima of the function on the given ellipsoid, while the other two are minima. □

Remark.
Note that we have used the variable $k$ instead of the $\lambda$ from theorem 8.1.28.

8.H.4. Decide whether the function $f : \mathbb{R}^3 \to \mathbb{R}$, $f(x,y,z) = z - xy^2$, has any minima and maxima on the sphere $x^2 + y^2 + z^2 = 1$. If so, determine them.

Solution. We are looking for solutions of the system
$$kx = -y^2, \qquad ky = -2xy, \qquad kz = 1.$$
The second equation implies that either $y = 0$ or $x = -\frac{k}{2}$. The first possibility leads to the points $[0,0,1]$, $[0,0,-1]$. The second one cannot be satisfied: note that because of the third equation $k \neq 0$, and substituting into the equation of the sphere, we get the equation
$$\frac{k^2}{4} + \frac{k^2}{2} + \frac{1}{k^2} = 1,$$
which has no solution in real numbers (it is a quadratic equation in $k^2$ with negative discriminant). The function has a maximum and a minimum, respectively, at the two computed points on the given sphere. □

8.H.5. Determine whether the function $f : \mathbb{R}^3 \to \mathbb{R}$, $f(x,y,z) = xyz$, has any extrema on the ellipsoid given by the equation $g(x,y,z) = kx^2 + ly^2 + z^2 = 1$, $k, l \in \mathbb{R}^+$. If so, calculate them.

Solution. First, we build the equations which must be satisfied by the stationary points of the given function on the ellipsoid:
$$\frac{\partial f}{\partial x} = \lambda\frac{\partial g}{\partial x}:\ yz = 2\lambda kx, \qquad \frac{\partial f}{\partial y} = \lambda\frac{\partial g}{\partial y}:\ xz = 2\lambda ly, \qquad \frac{\partial f}{\partial z} = \lambda\frac{\partial g}{\partial z}:\ xy = 2\lambda z.$$

The inverse mapping theorem

Theorem. Let $F : E_n \to E_n$ be a mapping, differentiable on a neighbourhood of a point $x_0 \in E_n$, and let the Jacobi matrix $D^1F(x_0)$ be invertible. Then in some neighbourhood of $x_0$, the inverse mapping $F^{-1}$ exists, it is differentiable, and its differential at the point $F(x_0)$ is the inverse mapping to the differential $D^1F(x_0)$. Hence, $D^1(F^{-1})(F(x_0))$ is given by the inverse matrix of the Jacobi matrix of the mapping $F$ at the point $x_0$.

Proof. First, verify that the theorem makes sense and is as expected.
If it is supposed that the inverse mapping exists and is differentiable at $F(x_0)$, then differentiating the composite mapping $F^{-1} \circ F$ enforces the formula
$$\operatorname{id}_{\mathbb{R}^n} = D^1(F^{-1} \circ F)(x_0) = D^1(F^{-1})(F(x_0)) \circ D^1F(x_0),$$
which verifies the formula at the conclusion of the theorem. Therefore, it is known from the beginning which differential for $F^{-1}$ to look for.

Next, suppose that the inverse mapping $F^{-1}$ exists in a neighbourhood of the point $F(x_0)$ and that it is continuous. Since $F$ is differentiable in a neighbourhood of $x_0$, it follows that

(1) $\quad F(x) - F(x_0) - D^1F(x_0)(x - x_0) = \alpha(x - x_0)$

with a function $\alpha : \mathbb{R}^n \to \mathbb{R}^n$ satisfying $\lim_{v\to 0} \frac{\alpha(v)}{\|v\|} = 0$. To verify the approximation properties of the linear mapping $(D^1F(x_0))^{-1}$, it suffices to calculate the following limit for $y = F(x)$ approaching $y_0 = F(x_0)$:
$$\lim_{y\to y_0} \frac{1}{\|y - y_0\|}\Big(F^{-1}(y) - F^{-1}(y_0) - (D^1F(x_0))^{-1}(y - y_0)\Big).$$
Substituting (1) for $y - y_0$ into the latter expression yields
$$\lim_{y\to y_0} \frac{1}{\|y - y_0\|}\Big(x - x_0 - (D^1F(x_0))^{-1}\big(D^1F(x_0)(x - x_0) + \alpha(x - x_0)\big)\Big) = \lim_{y\to y_0} \frac{-1}{\|y - y_0\|}(D^1F(x_0))^{-1}\big(\alpha(x - x_0)\big) = (D^1F(x_0))^{-1}\Big(\lim_{y\to y_0} \frac{-1}{\|y - y_0\|}\,\alpha(x - x_0)\Big),$$
where the last equality follows from the fact that linear mappings between finite-dimensional spaces are always continuous; hence performing this linear mapping commutes with the limit process. The proof is almost finished. The limit at the end of the expression is, using the properties of $\alpha$, zero whenever the values $\|F(x) - F(x_0)\|$ are greater than $C\|x - x_0\|$ for some constant

We can easily see that the equations can only be satisfied by a triple of non-zero numbers. Dividing pairs of equations and substituting into the ellipsoid's equation, we get eight solutions, namely the stationary points
$$x = \pm\frac{1}{\sqrt{3k}}, \qquad y = \pm\frac{1}{\sqrt{3l}}, \qquad z = \pm\frac{1}{\sqrt{3}}.$$
However, the function $f$ takes on only two distinct values at these eight points. Since it is continuous and the given ellipsoid is compact, $f$ must have both a maximum and a minimum on it.
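The stationary points just found for 8.H.5 can be verified numerically for sample values of $k$ and $l$ (ad hoc choices); `stationary_value` is a hypothetical helper name.

```python
# Numeric check of 8.H.5: on the ellipsoid k x^2 + l y^2 + z^2 = 1,
# the positive stationary point of f = xyz is
# (1/sqrt(3k), 1/sqrt(3l), 1/sqrt(3)), with value 1/(3*sqrt(3kl)).
import math

def stationary_value(k, l):
    x = 1 / math.sqrt(3 * k)
    y = 1 / math.sqrt(3 * l)
    z = 1 / math.sqrt(3)
    # the point really lies on the ellipsoid
    assert abs(k * x**2 + l * y**2 + z**2 - 1) < 1e-12
    return x * y * z

for k, l in [(1.0, 1.0), (2.0, 0.5), (4.0, 9.0)]:
    v = stationary_value(k, l)
    assert abs(v - 1 / (3 * math.sqrt(3 * k * l))) < 1e-12
print("stationary values match 1/(3*sqrt(3kl))")
```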
Moreover, since both $f$ and $g$ are continuously differentiable, these extrema must occur at stationary points. Therefore, four of the computed stationary points are local maxima of the function (of value $\frac{1}{3\sqrt{3kl}}$) and the other four are minima (of value $-\frac{1}{3\sqrt{3kl}}$). □

8.H.6. Determine the global extrema of the function
$$f(x,y) = x^2 - 2y^2 + 4xy - 6x - 1$$
on the set of points $[x,y]$ that satisfy the inequalities

(1) $\quad x \geq 0, \quad y \geq 0, \quad y \leq -x + 3$.

Solution. We are given a polynomial with continuous partial derivatives on a compact (i.e. closed and bounded) set. Such a function necessarily has both a minimum and a maximum on this set, and these can occur only at stationary points or on the boundary. Therefore, it suffices to find the stationary points inside the set and those on a finite number of open (or singleton) parts of the boundary, then evaluate $f$ at these points and choose the least and the greatest values. Notice that the set of points determined by the inequalities (1) is clearly the triangle with vertices at $[0,0]$, $[3,0]$, $[0,3]$. Let us determine the stationary points inside this triangle as the solutions of the equations $f_x = 0$, $f_y = 0$. Since
$$f_x(x,y) = 2x + 4y - 6, \qquad f_y(x,y) = 4x - 4y,$$
these equations are satisfied only by the point $[1,1]$. The boundary can be expressed as the union of three line segments given by the pairs of vertices. First, we consider $x = 0$, $y \in [0,3]$, where $f(x,y) = -2y^2 - 1$. We know the graph of this (univariate) function on the interval $[0,3]$, so it is not difficult to find the points at which its global extrema occur: they are the marginal points $[0,0]$, $[0,3]$. Similarly, we can consider $y = 0$, $x \in [0,3]$, also obtaining the marginal points $[0,0]$, $[3,0]$. Finally, we get to the line segment $y = -x + 3$, $x \in [0,3]$. Making some rearrangements, we get
$$f(x,y) = f(x, -x+3) = -5x^2 + 18x - 19, \qquad x \in [0,3].$$

$C > 0$. This can be translated in terms of the inverse as $C\|F^{-1}(y) - F^{-1}(y_0)\| \leq \|y - y_0\|$, i.e.
$\|F^{-1}(y) - F^{-1}(y_0)\| \leq \frac{1}{C}\|y - y_0\|$ for some constant $C > 0$. This is Lipschitz continuity, which is a stronger property than mere continuity of $F^{-1}$. So, now it remains "merely" to prove the existence of a Lipschitz-continuous inverse mapping to the mapping $F$.

To simplify, reduce the general case slightly. Especially, without loss of generality, apply shifts of the coordinates by constant vectors. In particular, it can be assumed that $x_0 = 0 \in \mathbb{R}^n$, $y_0 = F(x_0) = 0 \in \mathbb{R}^n$. So assume this property of the mapping $F$. Further, composing the mapping $F$ with any linear mapping $G$ yields a differentiable mapping again, and it is known how the differential changes. The choice $G(y) = (D^1F(0))^{-1}(y)$ gives $D^1(G \circ F)(0) = \operatorname{id}_{\mathbb{R}^n}$, and thus we may assume that $D^1F(0) = \operatorname{id}_{\mathbb{R}^n}$.

With these assumptions, consider the mapping $K(x) = F(x) - x$. This mapping is also differentiable, and its differential at $0$ is zero. It is already known that each continuously differentiable mapping is Lipschitz continuous over every $\delta$-neighbourhood $U_\delta$ of the origin (in its domain), $\|K(x) - K(y)\| \leq C\|x - y\|$, and since the differential of $K$ vanishes at the origin, the constant $C$ here can be made arbitrarily small by choosing $\delta$ small enough.

8.H.7. Determine whether the function $f : \mathbb{R}^3 \to \mathbb{R}$, $f(x,y,z) = y^2z$, has any extrema on the line segment given by the equations $2x + y + z = 1$, $x - y + 2z = 0$ and the constraint $x \in [-1,2]$. If so, find these extrema and determine their types. Justify all of your decisions.

Solution. We are looking for the extrema of a continuous function on a compact set. Therefore, the function must have both a minimum and a maximum on this set, and this will happen either at the marginal points of the segment or at those points where the gradient of the examined function is a linear combination of the gradients of the functions that give the constraints. First, let us look for the points which satisfy the gradient condition:
$$0 = 2k + l, \qquad 2yz = k - l, \qquad y^2 = k + 2l, \qquad 2x + y + z = 1, \qquad x - y + 2z = 0.$$
The solutions of the system are $[x,y,z] = \left[\frac{2}{3}, 0, -\frac{1}{3}\right]$ and $[x,y,z] = \left[\frac{4}{9}, \frac{2}{9}, -\frac{1}{9}\right]$ (of course, the variables $k$ and $l$ can also be computed, but we are not interested in them).
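These candidates, together with the endpoints of the segment at $x = -1$ and $x = 2$, can be compared in exact rational arithmetic: solving the two linear constraints gives $y = (2-3x)/3$ and $z = (1-3x)/3$, so $f = y^2z$ becomes a function of $x$ alone on the segment. The helper name `f_on_line` is illustrative.

```python
# Exact comparison of the candidate points in 8.H.7: on the line
# 2x + y + z = 1, x - y + 2z = 0, adding the two equations in (y, z)
# gives z = (1 - 3x)/3, and then y = 1 - 2x - z = (2 - 3x)/3.
from fractions import Fraction as Fr

def f_on_line(x):
    x = Fr(x)
    y = (2 - 3 * x) / 3
    z = (1 - 3 * x) / 3
    assert 2 * x + y + z == 1 and x - y + 2 * z == 0  # point lies on the line
    return y**2 * z

# Two stationary x-values and the two endpoints of x in [-1, 2]:
values = {x: f_on_line(x) for x in (-1, Fr(2, 3), Fr(4, 9), 2)}
assert max(values.values()) == Fr(100, 27)   # attained at x = -1
assert min(values.values()) == Fr(-80, 27)   # attained at x = 2
print(values)
```

Using `fractions.Fraction` keeps the comparison exact, so the maximum $100/27$ and minimum $-80/27$ come out without any floating-point tolerance.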
The marginal points of the given line segment are $\left[-1, \frac{5}{3}, \frac{4}{3}\right]$ and $\left[2, -\frac{4}{3}, -\frac{5}{3}\right]$. Considering these four points, the function takes on the greatest value at the first marginal point ($f(x,y,z) = \frac{100}{27}$), which is its maximum on the given segment, and it takes the least value at the second marginal point ($f(x,y,z) = -\frac{80}{27}$), which is thus its minimum there. □

It could seem that the proof is complete, but this is not so. To finish, it is necessary to show that the mapping $F$ restricted to a sufficiently small neighbourhood $U_\delta$ is not only bijective onto its image, but also that it maps open neighbourhoods of zero onto open neighbourhoods of zero.² Decrease the latter neighbourhood $U = U_\delta$ so that the above estimates hold on the boundary of $U$ as well and, at the same time, the Jacobi matrix of the mapping is invertible on all of $U$. This can be done since the determinant is a continuous mapping. Let $B$ denote the boundary of the set $U$, that is, the corresponding sphere. Since $B$ is compact and $F$ is continuous, the function $\rho(x) = \|F(x)\|$ achieves both its maximum and its minimum on $B$. Denote $a = \frac{1}{2}\min_{x\in B}\rho(x)$ and consider any $y \in \mathcal{O}_a(0)$ fixed. Of course, $a > 0$, because $x = 0$ is the only point of $U_\delta$ with $F(x) = 0$. It is necessary to show that there is at least one $x \in U$ such that $y = F(x)$, which completes the proof of the inverse mapping theorem. For this purpose, consider the function ($y$ is a fixed point)
$$h(x) = \|F(x) - y\|^2.$$
The image $h(U) \cup h(B)$ must have a minimum. This minimum cannot occur for $x \in B$: notice that $F(0) = 0$, hence $h(0) = \|y\|^2 < a^2$, while the distance of $y$ from $F(x)$ for $x \in B$ is at least $a$ for all $y \in \mathcal{O}_a(0)$ (since $a$ was selected to be half the minimum of the magnitude of $F(x)$ on the boundary). Therefore, the minimum occurs inside $U$, and it is at a stationary point $z$ of the function $h$.
Fixing such z means that, for all j = 1, …, n, ∂h/∂x_j (z) = Σ_i 2(f_i(z) − y_i) ∂f_i/∂x_j (z) = 0. This is a system of linear equations with variables ξ_i = f_i(z) − y_i and coefficients given by twice the Jacobi matrix D^1F(z). In particular, for z ∈ U, such a system has a unique solution, and this is zero, since the Jacobi matrix is invertible. In this way the desired point x = z ∈ U is found, satisfying, for all i = 1, …, n, the equality f_i(z) = y_i, i.e., F(z) = y. □ 8.1.23. The implicit functions. The next goal is to employ the inverse mapping theorem for clarifying the properties of implicitly defined functions. To start, consider a differentiable function F(x, y) defined in the plane E_2, and look for those points (x, y) where F(x, y) = 0. ² In the literature, there are examples of mappings which continuously and bijectively map a line segment onto a square. So this is not an obvious requirement. 538 CHAPTER 8. CALCULUS WITH MORE VARIABLES 8.H.8. Find the maximal and minimal values of the polynomial p(x, y) = 4x³ − 3x − 4y³ + 9y on the set M = {[x, y] ∈ ℝ²; x² + y² ≤ 1}. Solution. This is again the case of a polynomial on a compact set; therefore, we can restrict our attention to stationary points inside or on the boundary of M and the "marginal" points on the boundary of M. However, the only solutions of the equations p_x(x, y) = 12x² − 3 = 0, p_y(x, y) = −12y² + 9 = 0 are the points [1/2, √3/2], [1/2, −√3/2], [−1/2, √3/2], [−1/2, −√3/2], which are all on the boundary of M. This means that p has no extremum inside M. Now, it suffices to find the maximum and minimum of p on the unit circle k : x² + y² = 1. The circle k can be expressed parametrically as x = cos t, y = sin t, t ∈ [−π, π]. Thus, instead of looking for the extrema of p on M, we are now seeking the extrema of the function f(t) := p(cos t, sin t) = 4 cos³t − 3 cos t − 4 sin³t + 9 sin t on the interval [−π, π].
For t ∈ [−π, π], we have f′(t) = −12 cos²t sin t + 3 sin t − 12 sin²t cos t + 9 cos t. In order to determine the stationary points, we must express the function f′ in a form from which we will be able to calculate the intersection of its graph with the x-axis. To this purpose, we will use the identity 1/cos²t = 1 + tan²t, which is valid provided both sides are well-defined. We get f′(t) = cos³t [−12 tan t + 3(tan t + tan³t) − 12 tan²t + 9(1 + tan²t)] for t ∈ [−π, π] with cos t ≠ 0. However, this condition does not exclude any stationary points since sin t ≠ 0 if cos t = 0. Therefore, the stationary points of f are those points t ∈ [−π, π] for which −4 tan t + tan t + tan³t − 4 tan²t + 3 + 3 tan²t = 0. The substitution s = tan t leads to s³ − s² − 3s + 3 = 0, i.e. (s − 1)(s − √3)(s + √3) = 0. Then, the values An example of this can be the usual (implicit) definition of straight lines and circles: F(x, y) = ax + by + c = 0, a, b, c ∈ ℝ, F(x, y) = (x − s)² + (y − t)² − r² = 0, r > 0. While in the first case, the relation between the quantities x and y can be expressed as the function (for b ≠ 0) y = f(x) = −(a/b)x − c/b for all x; in the other case, for any point (x_0, y_0) satisfying the equation of the circle and such that y_0 ≠ t (the excluded points are the marginal points of the circle in the direction of the coordinate x), there is a neighbourhood of the point x_0 in which either y = f(x) = t + √(r² − (x − s)²), or y = f(x) = t − √(r² − (x − s)²), according to whether (x_0, y_0) belongs to the upper or lower semicircle. If a diagram of the situation is drawn, the reason is clear: describing both the semicircles simultaneously by a single function y = f(x) is not possible. The boundary points of the interval [s − r, s + r] are even more interesting. They also satisfy the equation of the circle with y = t, yet F_y(s ± r, t) = 0, which describes the position of the tangent line to the circle at these points, parallel to the y-axis.
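The factorization of the cubic obtained by the substitution s = tan t can be checked directly. A small sketch, assuming NumPy:

```python
import numpy as np

# Stationary points of f reduce, via s = tan t, to s^3 - s^2 - 3s + 3 = 0.
coeffs = [1, -1, -3, 3]
roots = np.sort_complex(np.roots(coeffs)).real   # all three roots are real
print(roots)                                     # -sqrt(3), 1, sqrt(3)

# The claimed factorization (s - 1)(s - sqrt(3))(s + sqrt(3)) has the same roots:
residuals = np.polyval(coeffs, np.array([1.0, np.sqrt(3), -np.sqrt(3)]))
print(residuals)                                 # all entries ≈ 0
```

This confirms the three tangent values 1, √3, −√3 used below to locate the stationary points of f.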
There are no neighbourhoods of these points in which the circle could be described as a function y = f(x). Moreover, the derivatives of the function y = f(x) = t + √(r² − (x − s)²) can be easily expressed in terms of partial derivatives of the function F: f′(x) = −(x − s)/√(r² − (x − s)²) = −(x − s)/(y − t) = −F_x/F_y. If the roles of the variables x and y are interchanged and a relation x = f(y) such that F(f(y), y) = 0 is sought, then neighbourhoods of the points (s ± r, t) are obtained with no problem. Notice that the partial derivative F_x is non-zero at these points. So it is observed (though for only two examples): for a function F(x, y) and a point (a, b) ∈ E_2 such that F(a, b) = 0, there is the unique function y = f(x) satisfying F(x, f(x)) = 0 on some neighbourhood of a if F_y(a, b) ≠ 0. In this case, f′(a) = −F_x(a, b)/F_y(a, b) can even be computed. We prove that in fact, this proposition is always true. The last statement about derivatives can be remembered (and is quite comprehensible if things are properly understood) from the expression for the differential of the (constant) function g(x) = F(x, y(x)) and the differential dy = f′(x) dx: 0 = dg = F_x dx + F_y dy = (F_x + F_y f′(x)) dx. One can work analogously with the implicit expressions F(x, y, z) = 0, to look for a function g(x, y) such that F(x, y, g(x, y)) = 0. As an example, consider the function s = 1, s = √3, s = −√3 respectively correspond to t ∈ {−3π/4, π/4}, t ∈ {−2π/3, π/3}, t ∈ {−π/3, 2π/3}. Now, we evaluate the function f at each of these points as well as at the marginal points t = −π, t = π. Sorting them, we get f(−π/3) = −1 − 3√3 < f(−3π/4) = −3√2 < f(−2π/3) = 1 − 3√3 < −1, f(−π) = f(π) = −1 < 0, f(2π/3) = 1 + 3√3 > f(π/4) = 3√2 > f(π/3) = −1 + 3√3 > 0. Therefore, the global minimum of the function f is at the point t = −π/3, while the global maximum is at t = 2π/3. Now, let us get back to the original function p.
Since we know the values cos(−π/3) = 1/2, sin(−π/3) = −√3/2, cos(2π/3) = −1/2, sin(2π/3) = √3/2, we can deduce that the polynomial p takes on the minimal value −1 − 3√3 (the same as f, of course) at the point [1/2, −√3/2] and the maximal value 1 + 3√3 at [−1/2, √3/2]. □ 8.H.9. At which points does the function f(x, y) = x² − 4x + y² take on global extrema on the set M : |x| + |y| ≤ 1? Solution. Expressing f in the form f(x, y) = (x − 2)² − 4 + y², we can see that the global maximum and minimum occur at the same points as for the function g(x, y) := √((x − 2)² + y²), [x, y] ∈ M, since neither shifting the function nor applying the increasing function v = √u for u > 0 changes the points of extrema (of course, they can change their values). However, we know that the function g gives the distance of a point [x, y] from the point [2, 0]. Since the set M is clearly a square with vertices [1, 0], [0, 1], [−1, 0], [0, −1], the point of M that is closest to [2, 0] is the vertex [1, 0], while the most distant one is [−1, 0]. Altogether, we have obtained that the minimal value of f occurs at the point [1, 0] and the maximal one at [−1, 0]. □ 8.H.10. Compute the local extrema of the function y = f(x) given implicitly by the equation 3x² + 2xy + x = y² + 3y + 5/4, [x, y] ∈ ℝ² \ {[t, t − 3/2]; t ∈ ℝ}. f(x, y) = x² + y², whose graph is the rotational paraboloid centered at the point (0, 0). This can be defined implicitly by the equation 0 = F(x, y, z) = z − x² − y². Before formulating the result for the general situation, notice which dimensions could/should appear in the problem. If it is desired to find, for this function F, a curve c(x) = (c_1(x), c_2(x)) in the plane such that F(x, c(x)) = F(x, c_1(x), c_2(x)) = 0, then this can be done (even for all initial conditions x = a), yet the result is not unique for a given initial condition. It suffices to consider an arbitrary curve on the rotational paraboloid whose projection onto the first coordinate has a non-zero derivative.
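The conclusion of 8.H.8 above can also be verified by brute force on a dense grid of the parameter interval. A verification sketch only, assuming NumPy:

```python
import numpy as np

# p restricted to the unit circle, as in 8.H.8: f(t) = p(cos t, sin t).
t = np.linspace(-np.pi, np.pi, 1000001)
f = 4*np.cos(t)**3 - 3*np.cos(t) - 4*np.sin(t)**3 + 9*np.sin(t)

print(f.max(), 1 + 3*np.sqrt(3))    # both ≈ 6.196, attained near t = 2*pi/3
print(f.min(), -1 - 3*np.sqrt(3))   # both ≈ -6.196, attained near t = -pi/3
```

The grid maximum and minimum agree with the values 1 + 3√3 and −1 − 3√3 found analytically, at t ≈ 2π/3 and t ≈ −π/3.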
Then consider x to be the parameter of the curve, and c(x) to be its projection onto the plane yz. Therefore, it is expected that one function of m + 1 variables defines implicitly a hypersurface in ℝ^{m+1} which is to be expressed (at least locally) as the graph of a function of m variables. It can be anticipated that n functions of m + n variables define an intersection of n hypersurfaces in ℝ^{m+n}, which is expected to be an "m-dimensional" object. 8.1.24. The general theorem. Consider a differentiable mapping F = (f_1, …, f_n) : ℝ^{m+n} → ℝ^n. The Jacobi matrix of this mapping has n rows and m + n columns. Write it symbolically as D^1F = (∂f_i/∂x_j) = (D^1_x F, D^1_y F), where the points (x_1, …, x_{m+n}) ∈ ℝ^{m+n} are written as (x, y) with x ∈ ℝ^m and y ∈ ℝ^n, D^1_x F is the matrix of n rows and the first m columns in the Jacobi matrix, while D^1_y F is the square matrix of order n, with the remaining columns. The multidimensional analogy to the previous reasoning with the non-zero partial derivative with respect to y is the condition that the matrix D^1_y F is invertible. Solution. In accordance with the theoretical part (see 8.1.24), let us denote F(x, y) = 3x² + 2xy + x − y² − 3y − 5/4, [x, y] ∈ ℝ² \ {[t, t − 3/2]; t ∈ ℝ} and calculate the derivative y′ = f′(x) = −F_x(x, y)/F_y(x, y) = −(6x + 2y + 1)/(2x − 2y − 3). We can see that this derivative is continuous on the whole set in question. In particular, the function f is defined implicitly on this set (the denominator is non-zero). A local extremum may occur only for those x, y which satisfy y′ = 0, i.e., 6x + 2y + 1 = 0. Substituting y = −3x − 1/2 into the equation F(x, y) = 0, we obtain −12x² + 6x = 0, which leads to [x, y] = [0, −1/2], [x, y] = [1/2, −2].
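The stationary points of the implicit function in 8.H.10 can be confirmed by direct evaluation. Note that the constant term 5/4 in F is reconstructed from the printed stationary points (it is garbled in the scan), so this is a consistency check of that reconstruction, in plain Python:

```python
# F and its partial derivatives from 8.H.10 (constant 5/4 reconstructed).
def F(x, y):  return 3*x*x + 2*x*y + x - y*y - 3*y - 5/4
def Fx(x, y): return 6*x + 2*y + 1
def Fy(x, y): return 2*x - 2*y - 3

for (x, y) in [(0.0, -0.5), (0.5, -2.0)]:
    # Each candidate lies on the curve and has vanishing y' = -Fx/Fy;
    # at such points y'' = -6/Fy, whose sign decides the type.
    print(F(x, y), Fx(x, y), -6/Fy(x, y))
```

Both candidates satisfy F = 0 and F_x = 0, with second derivative positive at [0, −1/2] (minimum) and negative at [1/2, −2] (maximum), matching the solution below.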
We can also easily compute that y″ = (y′)′ = −[(6 + 2y′)(2x − 2y − 3) − (6x + 2y + 1)(2 − 2y′)]/(2x − 2y − 3)². Substituting x = 0, y = −1/2, y′ = 0 and x = 1/2, y = −2, y′ = 0, we obtain y″ = −6/(−2) = 3 > 0 for [x, y] = [0, −1/2] and y″ = −6/2 = −3 < 0 for [x, y] = [1/2, −2]. We have thus proved that the implicitly given function has a strict local minimum at the point x = 0 and a strict local maximum at x = 1/2. □ 8.H.11. Find the local extrema of the function z = f(x, y) given on the maximum possible set by the equation (1) x² + y² + z² − xz − yz + 2x + 2y + 2z − 2 = 0. Solution. Differentiating (1) with respect to x and y gives 2x + 2z z_x − z − x z_x − y z_x + 2 + 2z_x = 0, 2y + 2z z_y − x z_y − z − y z_y + 2 + 2z_y = 0. Hence we get that (2) z_x = f_x(x, y) = (z − 2x − 2)/(2z − x − y + 2), z_y = f_y(x, y) = (z − 2y − 2)/(2z − x − y + 2). We can notice that the partial derivatives are continuous at all points where the function f is defined. This implies that the local extrema can occur only at stationary points. These points satisfy z_x = 0, i.e. z − 2x − 2 = 0, and z_y = 0, i.e. z − 2y − 2 = 0. The implicit mapping theorem. Theorem. Let F : ℝ^{m+n} → ℝ^n be a differentiable mapping in an open neighbourhood of a point (a, b) ∈ ℝ^m × ℝ^n = ℝ^{m+n} at which F(a, b) = 0, and det D^1_y F ≠ 0. Then there exists a differentiable mapping G : ℝ^m → ℝ^n defined on a neighbourhood U of the point a ∈ ℝ^m, with image G(U) containing the point b, and such that F(x, G(x)) = 0 for all x ∈ U. Moreover, the Jacobi matrix D^1G of the mapping G is, in the neighbourhood of the point a, given by the product of matrices D^1G(x) = −(D^1_y F)^{-1}(x, G(x)) · D^1_x F(x, G(x)). Proof. For the sake of comprehensibility, first show the proof for the simplest case of the equation F(x, y) = 0 with a function F of two variables. At first sight, it might look complicated, but this situation can be discussed in a way which can be extended for the general dimensions as in the theorem, almost without changes. Extend the function F to F̃ : (x, y) ↦ (x, F(x, y)).
The Jacobi matrix of the mapping F̃ is the 2×2 matrix with rows (1, 0) and (F_x(x, y), F_y(x, y)). It follows from the assumption F_y(a, b) ≠ 0 that the same also holds in a neighbourhood of the point (a, b), so the mapping F̃ is invertible in this neighbourhood, by the inverse mapping theorem. Therefore, there is a uniquely defined differentiable inverse mapping F̃^{-1} in a neighbourhood of the point (a, 0). Denote by π : ℝ² → ℝ the projection onto the second coordinate, and consider the function f(x) = π ∘ F̃^{-1}(x, 0). This function is well-defined and differentiable. It must be verified that the expression F(x, f(x)) = F(x, π(F̃^{-1}(x, 0))) is zero in a neighbourhood of the point x = a. It follows directly from the definition of F̃(x, y) = (x, F(x, y)) that its inverse is of the form F̃^{-1}(x, y) = (x, π F̃^{-1}(x, y)). Therefore, the previous calculation can be resumed: F(x, f(x)) = π(F̃(x, π(F̃^{-1}(x, 0)))) = π(F̃(F̃^{-1}(x, 0))) = π(x, 0) = 0. This proves the first part of the theorem, and it remains to compute the derivative of the function f(x). This derivative can, once again, be obtained by invoking the inverse mapping theorem, using the matrix (D^1F̃)^{-1}. We have thus two equations, which allow us to express the dependency of x and y on z. Substituting into (1), we obtain the points [x, y, z] = [−3 + √6, −3 + √6, −4 + 2√6], [x, y, z] = [−3 − √6, −3 − √6, −4 − 2√6]. Now, we need the second derivatives in order to decide whether the local extrema really occur at the corresponding points. Differentiating z_x in (2), we obtain z_xx = f_xx(x, y) = [(z_x − 2)(2z − x − y + 2) − (z − 2x − 2)(2z_x − 1)]/(2z − x − y + 2)² with respect to x, and z_xy = f_xy(x, y) = [z_y(2z − x − y + 2) − (z − 2x − 2)(2z_y − 1)]/(2z − x − y + 2)² with respect to y. We need not calculate z_yy since the variables x and y are interchangeable in (1) (if we swap x and y, the equation is left unchanged). Moreover, the x- and y-coordinates of the considered points are the same; hence z_xx = z_yy.
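The two candidate points of 8.H.11 can be checked directly: each must satisfy equation (1) together with the stationarity conditions z − 2x − 2 = 0 and z − 2y − 2 = 0. A short check in plain Python:

```python
from math import sqrt

# Left-hand side of equation (1) from 8.H.11.
def eq(x, y, z):
    return x*x + y*y + z*z - x*z - y*z + 2*x + 2*y + 2*z - 2

candidates = [(-3 + sqrt(6), -3 + sqrt(6), -4 + 2*sqrt(6)),
              (-3 - sqrt(6), -3 - sqrt(6), -4 - 2*sqrt(6))]
# Each triple of residuals (equation, z - 2x - 2, z - 2y - 2) should vanish.
residuals = [(eq(x, y, z), z - 2*x - 2, z - 2*y - 2) for x, y, z in candidates]
print(residuals)
```

All residuals vanish up to rounding, so both points lie on the surface and are stationary for the implicit function z = f(x, y).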
Now, we evaluate these at the stationary points: f_xx(−3 + √6, −3 + √6) = f_yy(−3 + √6, −3 + √6) = −1/√6, f_xy(−3 + √6, −3 + √6) = f_yx(−3 + √6, −3 + √6) = 0, f_xx(−3 − √6, −3 − √6) = f_yy(−3 − √6, −3 − √6) = 1/√6, f_xy(−3 − √6, −3 − √6) = f_yx(−3 − √6, −3 − √6) = 0. As for the Hessian, we have Hf(−3 + √6, −3 + √6) diagonal with entries −1/√6, −1/√6, and Hf(−3 − √6, −3 − √6) diagonal with entries 1/√6, 1/√6. Apparently, the first Hessian is negative definite, while the second one is positive definite. This means that there is a strict local maximum of the function f at the point [−3 + √6, −3 + √6], and there is a strict local minimum at the point [−3 − √6, −3 − √6]. □ 8.H.12. Determine the strict local extrema of the function f(x, y) = 1/x + 1/y, x ≠ 0, y ≠ 0, on the set of points that satisfy the equation 1/x² + 1/y² = 4. Solution. Since both the function f and the function given implicitly by the equation 1/x² + 1/y² − 4 = 0 have continuous The following equality is easily verified by multiplying the matrices. It can also be computed directly using the explicit formula for the inverse matrix in terms of the determinant and the algebraically adjoint matrix, see paragraph 2.2.11: (D^1F̃(x, y))^{-1} = (1/F_y(x, y)) times the matrix with rows (F_y(x, y), 0) and (−F_x(x, y), 1). By the definition f(x) = π F̃^{-1}(x, 0), and thus the first entry of the second row of this matrix is the derivative f′(x) with y = f(x), i.e. the Jacobi matrix D^1f. In this simple case, it is exactly the desired scalar −F_x(x, f(x))/F_y(x, f(x)). The general proof is exactly the same, there is no need to change any of the formulae. We obtain the invertible mapping F̃ : ℝ^{m+n} → ℝ^{m+n} and define G(x) = π F̃^{-1}(x, 0), where π : ℝ^{m+n} → ℝ^n, π(x, y) = y. The same check as above reveals that F(x, G(x)) = 0, as requested. Only in the last computation of the derivative of the function do the corresponding parts of the Jacobi matrix D^1_x F and D^1_y F appear, instead of the particular partial derivatives. For the calculation of the Jacobi matrix of the mapping G, use the computation of the inverse matrix.
This time the algebraic procedure from paragraph 2.2.11 is not very advantageous. It is better to be guided by the case in dimension m + n = 2 and to divide the matrix (D^1F̃)^{-1} into blocks A, B, C, D of m and n rows and columns (for instance, A is of type m × m, while C is of type n × m). Now, the matrices A, B, C, D can be determined from the defining equality for the inverse: the product of the block matrix with blocks (id_{ℝ^m}, 0; D^1_x F(x, y), D^1_y F(x, y)) and the block matrix (A, B; C, D) equals the block identity (id_{ℝ^m}, 0; 0, id_{ℝ^n}). Apparently, it follows that A = id_{ℝ^m}, B = 0, D = (D^1_y F)^{-1}, and finally, D^1_x F + D^1_y F · C = 0. The latter equality implies already the desired relation D^1G = C = −(D^1_y F)^{-1} · D^1_x F. This concludes the proof of the theorem. □ 8.1.25. The gradient of a function. Now, we will demonstrate a quicker way how to obtain the result. We know (or we can easily calculate) the second partial derivatives of the Lagrange function L, i.e., the Hessian with respect to the variables x and y: HL(x, y) is the diagonal matrix with entries 2/x³ − 6λ/x⁴ and 2/y³ − 6λ/y⁴. The evaluation The vector D^1F ∈ ℝ^n is called the gradient of the function F. In technical and physical literature, it is also often denoted as grad F. Since M_b is given by a constant value b of the function F, the derivatives of the curves lying in M_b have the property that the differential dF always evaluates to zero along them. For every such curve, F(c(t)) = b, hence (d/dt) F(c(t)) = dF(c′(t)) = 0. On the other hand, we can consider a general vector v = (v_1, …, v_n) ∈ ℝ^n and the magnitude of the corresponding directional derivative |d_v F| = |∂F/∂x_1 v_1 + ⋯ + ∂F/∂x_n v_n| = |cos φ| ‖D^1F‖ ‖v‖, where φ is the angle between the directions of the vector v and the gradient of F, see the discussion about angles of vectors and straight lines in the fourth chapter (cf. definition 4.1.18). Thus it is observed: The maximal growth of a function. Proposition. The gradient D^1F = (∂F/∂x_1, …, ∂F/∂x_n) provides the direction of maximal growth of the function F of n variables.
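The proposition on maximal growth can be illustrated numerically: scanning unit directions v and approximating the directional derivative d_v F by a difference quotient, the largest value occurs for v aligned with the gradient, and that largest value equals ‖grad F‖. A sketch for the sample function F(x, y) = x² + y² at the (arbitrarily chosen) point P = (1, 2), assuming NumPy:

```python
import numpy as np

F = lambda p: p[0]**2 + p[1]**2
P = np.array([1.0, 2.0])
grad = np.array([2*P[0], 2*P[1]])              # grad F(P) = (2, 4)

h = 1e-6
best = None
for phi in np.linspace(0.0, 2*np.pi, 3601):
    v = np.array([np.cos(phi), np.sin(phi)])   # unit direction
    dvF = (F(P + h*v) - F(P)) / h              # approximate d_v F(P)
    if best is None or dvF > best[0]:
        best = (dvF, v)

print(best[0], np.linalg.norm(grad))           # maximal d_v F ≈ |grad F|
print(best[1], grad / np.linalg.norm(grad))    # maximizing v ≈ grad / |grad|
```

Up to the grid and difference-quotient resolution, the maximizing direction matches grad F/‖grad F‖ and the maximal directional derivative matches ‖grad F‖, exactly as the proposition states.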
Moreover, the vanishing directional derivatives are exactly those in directions perpendicular to the gradient. Therefore, it is clear that the tangent plane to a non-empty level set M_b in a neighbourhood of its point with non-zero gradient D^1F is determined by the orthogonal complement to the gradient, and the gradient itself is the normal vector of the hypersurface M_b. For instance, considering a sphere in ℝ³ with radius r > 0, centered at (a, b, c), i.e. given implicitly by the equation F(x, y, z) = (x − a)² + (y − b)² + (z − c)² = r², the normal vectors at a point P = (x_0, y_0, z_0) are obtained as a non-zero multiple of the gradient, i.e. a multiple of D^1F = (2(x_0 − a), 2(y_0 − b), 2(z_0 − c)), and the tangent vectors are exactly the vectors perpendicular to the gradient. Therefore, the tangent plane to the sphere at the point P can always be described implicitly in terms of the gradient by the equation 0 = (x_0 − a)(x − x_0) + (y_0 − b)(y − y_0) + (z_0 − c)(z − z_0). This is a special case of the following general formula: HL(√2/2, √2/2) is the diagonal matrix with entries −2√2, −2√2, and HL(−√2/2, −√2/2) is the diagonal matrix with entries 2√2, 2√2. This then tells us that the quadratic form is negative definite for the former stationary point (there is a strict local maximum) and positive definite for the latter one (there is a strict local minimum). We should be aware of a potential trap in this "quicker" method in the case we obtain an indefinite form (matrix). Then, we cannot conclude that there is no extremum at that point, since, as we have not included the constraint (which we did when computing d²L), we are considering a more general situation. The graph of the function f on the given set is a curve which can be described as the graph of a univariate function; this must correspond to a one-dimensional quadratic form. □ 8.H.13. Find the global extrema of the function f(x, y) = 1/x + 1/y, x ≠ 0, y ≠ 0, on the set of points that satisfy the equation 1/x² + 1/y² = 4. Solution.
This exercise is to illustrate that looking for global extrema may be much easier than looking for local ones (cf. the above exercise), even in the case when the function values are considered on an unbounded set. First, we would determine the stationary points (1) and the values (2) the same way as above. Let us emphasize that we are looking for the function's extrema on a set that is not compact, so evaluating the function at the stationary points does not suffice. The reason is that the function f might not have an extremum on the considered set at all - its range might be an open interval. However, we will show that this is not the case here. Let us thus consider |x| ≥ 10. The equation 1/x² + 1/y² = 4 can then be satisfied only by those values y for which |y| > 1/2. We have thus obtained the bounds −2√2 < −1/10 − 2 ≤ f(x, y) ≤ 1/10 + 2 < 2√2, if |x| ≥ 10. At the same time, we have (interchanging x and y leads to the same task) −2√2 < −1/10 − 2 ≤ f(x, y) ≤ 1/10 + 2 < 2√2, if |y| ≥ 10. Hence we can see that the function f must have global extrema on the considered set, and this must happen inside Tangent hyperplanes to level sets. Theorem. For a function F(x_1, …, x_n) of n variables and a point P = (p_1, …, p_n) in a level set M_b of the function F such that the gradient D^1F is non-vanishing at P, the implicit equation for the tangent hyperplane to M_b is 0 = ∂F/∂x_1 (P)(x_1 − p_1) + ⋯ + ∂F/∂x_n (P)(x_n − p_n). Proof. The statement is clear from the previous discussions. The tangent hyperplane must be (n − 1)-dimensional, so its direction space is given as the kernel of the linear form given by the gradient (zero values of the corresponding linear mapping ℝ^n → ℝ given by multiplying the column of coordinates by the row vector grad F). Clearly, the selected point P satisfies the equation. □ 8.1.26. Illumination of 3D objects. Consider the illumination of a three-dimensional object where the direction v of the light falling onto the two-dimensional surface M of this object is known.
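The intensity formula of 8.1.26 for the unit ball can be evaluated at the two distinguished points of the example below, as a quick consistency check in plain Python:

```python
from math import sqrt

# Relative intensity I(P)/I0 on the unit sphere illuminated along v = (1, 1, -1):
# I(P)/I0 = (-2x - 2y + 2z) / (2*sqrt(3)), as derived in 8.1.26.
def rel_intensity(x, y, z):
    return (-2*x - 2*y + 2*z) / (2*sqrt(3))

s = 1/sqrt(3)
print(rel_intensity(-s, -s, s))   # fully illuminated point: value 1
print(rel_intensity(s, s, -s))    # antipodal point: value -1
```

The point (1/√3)(−1, −1, 1) indeed receives the full intensity I_0, and the antipodal point the value −I_0, matching the discussion in the text.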
Assume M is given implicitly by an equation F(x, y, z) = 0. The light intensity at a point P ∈ M is defined as I cos φ, where φ is the angle between the normal line to M and the vector which is opposite to the flow of the light. As seen, the normal line is determined by the gradient of the function F. The sign of the expression then says which side of the surface is illuminated. For example, consider an illumination with constant intensity I_0 in the direction of the vector v = (1, 1, −1) (i.e. "downward askew"), and let the ball given by the equation F(x, y, z) = x² + y² + z² − 1 ≤ 0 be the object of interest. Then, for a point P = (x, y, z) ∈ M on the surface, the intensity I(P) = −I_0 (grad F · v)/(‖grad F‖ ‖v‖) = I_0 (−2x − 2y + 2z)/(2√3) is obtained. Notice that, as anticipated, the point which is illuminated with the (full) intensity I_0 is the point P = (1/√3)(−1, −1, 1) on the surface of the ball, while the antipodal point is fully illuminated with the minus sign (i.e. on the inside of the sphere). 8.1.27. Tangent and normal spaces. Ideas about tangent and normal lines can be extended to general dimensions. With a mapping F = (f_1, …, f_n) : ℝ^{m+n} → ℝ^n and its coordinate functions, one can also consider the n equations for n + m variables f_i(x_1, …, x_{m+n}) = b_i, i = 1, …, n, expressing the equality F(x) = b for a vector b ∈ ℝ^n. Assuming that the conditions of the implicit function theorem hold, the set of all solutions (x_1, …, x_{m+n}) ∈ ℝ^{m+n} is (at least locally) the graph of a mapping G : ℝ^m → ℝ^n. Technically, it is necessary to have some submatrix in D^1F of the maximal possible rank n. the square ABCD with vertices A = [−10, −10], B = [10, −10], C = [10, 10], D = [−10, 10]. The intersection of the "hundred times reduced" square with vertices at [−1/10, −1/10], [1/10, −1/10], [1/10, 1/10], [−1/10, 1/10] and the given set is clearly the empty set.
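The bounds in 8.H.13 can also be seen globally: the constraint 1/x² + 1/y² = 4 says that the pair (1/x, 1/y) runs over the circle of radius 2, so f = 1/x + 1/y is bounded by 2√2 in absolute value everywhere on the constraint set. A verification sketch, assuming NumPy (parameter values with cos θ = 0 or sin θ = 0 correspond to letting x or y run to infinity along the constraint set):

```python
import numpy as np

# Substitute 1/x = 2 cos(th), 1/y = 2 sin(th) on the constraint 1/x^2 + 1/y^2 = 4.
th = np.linspace(0.0, 2*np.pi, 100001)
f = 2*np.cos(th) + 2*np.sin(th)       # = 1/x + 1/y along the constraint set
print(f.max(), f.min())               # ≈ 2*sqrt(2) and -2*sqrt(2)
```

The extreme values are attained where cos θ = sin θ, i.e. at x = y = ±√2/2, in agreement with the stationary points found above; and since |1/x| ≤ 2 forces |x| ≥ 1/2, the reduced square indeed misses the constraint set.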
Therefore, the global extrema are at points inside the compact set bounded by these two squares. Since f is continuously differentiable on this set, the global extrema can occur only at stationary points. We thus must have f_max = f(√2/2, √2/2) = 2√2, f_min = f(−√2/2, −√2/2) = −2√2. □ 8.H.14. Determine the maximal and minimal values of the function f(x, y, z) = xyz on the set M given by the conditions x² + y² + z² = 1, x + y + z = 0. Solution. It is not hard to realize that M is a circle. However, for our problem, it is sufficient to know that M is compact, i.e. bounded (by the first condition, the equation of the unit sphere) and closed (the set of solutions of the given equations is closed, since if the equations are satisfied by all terms of a converging sequence, then they are satisfied by its limit as well). The function f as well as the constraint functions F(x, y, z) = x² + y² + z² − 1, G(x, y, z) = x + y + z have continuous partial derivatives of all orders (since they are polynomials). The Jacobi constraint matrix has rows (F_x(x, y, z), F_y(x, y, z), F_z(x, y, z)) = (2x, 2y, 2z) and (G_x(x, y, z), G_y(x, y, z), G_z(x, y, z)) = (1, 1, 1). Its rank is reduced (less than 2) if and only if the vector (2x, 2y, 2z) is a multiple of the vector (1, 1, 1), which gives x = y = z, and thus x = y = z = 0 (by the second constraint). However, the set M does not contain the origin. Therefore, we may look for stationary points using the method of Lagrange multipliers. For L(x, y, z, λ_1, λ_2) = xyz − λ_1(x² + y² + z² − 1) − λ_2(x + y + z), the equations L_x = 0, L_y = 0, L_z = 0 give yz − 2λ_1x − λ_2 = 0, xz − 2λ_1y − λ_2 = 0, For a fixed choice b = (b_1, …, b_n), the set of all solutions is, of course, the intersection of the hypersurfaces corresponding to the particular functions f_i. The same must hold for tangent directions, while normal directions are generated by the individual gradients. Therefore, if D^1F is the Jacobi matrix of a mapping which implicitly defines a set M and P = (p_1, …
, p_{m+n}) ∈ M is a point such that M is the graph of a mapping in the neighbourhood of the point P, with D^1F = (∂f_i/∂x_j), i = 1, …, n, j = 1, …, m + n, then the affine subspace in ℝ^{m+n} which contains exactly all tangent lines going through the point P is given implicitly by the following equations: 0 = ∂f_1/∂x_1 (P)(x_1 − p_1) + ⋯ + ∂f_1/∂x_{m+n} (P)(x_{m+n} − p_{m+n}), …, 0 = ∂f_n/∂x_1 (P)(x_1 − p_1) + ⋯ + ∂f_n/∂x_{m+n} (P)(x_{m+n} − p_{m+n}). This subspace is called the tangent space to the (implicitly given) m-dimensional surface M at the point P. The normal space at the point P is the affine subspace generated by the point P and the gradients of all the functions f_1, …, f_n at the point P, i.e. the rows of the Jacobi matrix D^1F. As an illustrative simple example, calculate the tangent and normal spaces to a conic section in ℝ³. Consider the equation of a cone with vertex at the origin, 0 = f(x, y, z) = z − √(x² + y²), and a plane, given by 0 = g(x, y, z) = z − 2x + y + 1. The point P = (1, 0, 1) belongs to both the cone and the plane, so the intersection M of these surfaces is a curve (draw a diagram!). Its tangent line at the point P is given by the following equations: 0 = (−x/√(x² + y²))|_{x=1, y=0} (x − 1) + (−y/√(x² + y²))|_{x=1, y=0} y + (z − 1) = −(x − 1) + (z − 1) = −x + z, 0 = −2(x − 1) + y + (z − 1) = −2x + y + z + 1, while the plane perpendicular to the curve, containing the point P, is given parametrically by the expression (1, 0, 1) + τ(−1, 0, 1) + σ(−2, 1, 1) with real parameters τ and σ. xy − 2λ_1z − λ_2 = 0, respectively. Subtracting the first equation from the second one and from the third one leads to xz − yz − 2λ_1y + 2λ_1x = 0, xy − yz − 2λ_1z + 2λ_1x = 0, i.e., (x − y)(z + 2λ_1) = 0, (x − z)(y + 2λ_1) = 0. The last equations are satisfied in these four cases: x = y, x = z; x = y, y = −2λ_1; z = −2λ_1, x = z; z = −2λ_1, y = −2λ_1, thus (including the constraint G = 0) x = y = z = 0; x = y = −2λ_1, z = 4λ_1; x = z = −2λ_1, y = 4λ_1; x = 4λ_1, y = z = −2λ_1.
Except for the first case (which clearly cannot happen), including the constraint F = 0 yields (4λ_1)² + (−2λ_1)² + (−2λ_1)² = 1, i.e. λ_1 = ±1/(2√6). Altogether, we get the points [1/√6, 1/√6, −2/√6], [−1/√6, −1/√6, 2/√6], [1/√6, −2/√6, 1/√6], [−1/√6, 2/√6, −1/√6], [−2/√6, 1/√6, 1/√6], [2/√6, −1/√6, −1/√6]. We will not verify that these really are stationary points. The only important thing is that all stationary points are among these six. We are looking for the global maximum and minimum of the continuous function f on the compact set M. However, the global extrema (we know they exist) can occur only at points of local extrema with respect to M. And the local extrema can occur only at the aforementioned points. Therefore, it suffices to evaluate the function f at these points. Thus we find out that the wanted maximum is f(−1/√6, −1/√6, 2/√6) = 1/(3√6), while the minimum is f(1/√6, 1/√6, −2/√6) = −1/(3√6). □ 8.1.28. Constrained extrema. Now we come to the first really serious application of the differential calculus of more variables. The typical task in optimization is to find the extrema of values depending on several (yet finitely many) parameters, under some further constraints on the parameters of the model. The problem often has m + n parameters constrained by n conditions. In the language of differential calculus, it is desired to find the extrema of a differentiable function h on the set M of points given implicitly by a vector equation F(x_1, …, x_{m+n}) = 0. Of course, we might first locally parameterize the solution space of the latter equation by m free parameters, express the function h in terms of these parameters and look for the local extrema by inspecting the critical points. However, we have already prepared more efficient procedures for this effort. For every curve c(t) ⊂ M going through P = c(0), it must be ensured that h(c(t)) has an extremum at t = 0. Therefore, the derivative must satisfy (d/dt) h(c(t))|_{t=0} = d_{c′(0)}h(P) = dh(P)(c′(0)) = 0.
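The result of 8.H.14 can be cross-checked by parametrizing the circle M explicitly with an orthonormal basis of the plane x + y + z = 0 and scanning the values of xyz. A verification sketch, assuming NumPy:

```python
import numpy as np

u = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)   # orthonormal basis of the
w = np.array([1.0, 1.0, -2.0]) / np.sqrt(6)   # plane x + y + z = 0

t = np.linspace(0.0, 2*np.pi, 1000001)
pts = np.outer(np.cos(t), u) + np.outer(np.sin(t), w)  # the circle M

# Sanity check: all points satisfy both constraints.
assert np.allclose((pts**2).sum(axis=1), 1.0)
assert np.allclose(pts.sum(axis=1), 0.0, atol=1e-12)

f = pts[:, 0] * pts[:, 1] * pts[:, 2]                  # f = xyz on M
print(f.max(), 1/(3*np.sqrt(6)))                       # both ≈ 0.136083
print(f.min(), -1/(3*np.sqrt(6)))
```

The grid maximum and minimum agree with ±1/(3√6), the values found at the six stationary points by the Lagrange multiplier method.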
This means that the differential of the function h at the point P is zero along all tangent increments to M at P. This property is equivalent to stating that the gradient of h lies in the normal subspace (more precisely, in the modelling vector space of the normal subspace). Such points P ∈ M are called stationary points of the function h with respect to the constraints given by F. As seen in the previous paragraph, the normal space to the set M is generated by the rows of the Jacobi matrix of the mapping F, so the stationary points are described equivalently by the following proposition: Lagrange multipliers. Theorem. Let F = (f_1, …, f_n) : ℝ^{m+n} → ℝ^n be a differentiable function in a neighbourhood of a point P, F(P) = 0. Further, let M be given implicitly by an equation F(x, y) = 0, and let the rank of the matrix D^1F at the point P be n. Then P is a stationary point of a continuously differentiable function h : ℝ^{m+n} → ℝ with respect to the constraints F, if and only if there exist real parameters λ_1, …, λ_n such that grad h = λ_1 grad f_1 + ⋯ + λ_n grad f_n. The procedure suggested by the theorem is called the method of Lagrange multipliers. It is of algorithmic character. Consider the numbers of unknowns and equations: the gradients are vectors of m + n coordinates, so the statement of the theorem yields m + n equations. The variables are, on one side, the coordinates x_1, …, x_{m+n} of the stationary points P with respect to the constraints, and, on the other hand, the n parameters λ_i in the linear combination. It remains to say that the point P belongs to the implicitly given set M, which represents n 8.H.15. Find the extrema of the function f : ℝ³ → ℝ, f(x, y, z) = x² + y² + z², on the plane x + y − z = 1 and determine their types. Solution.
We can easily build the equations that describe the linear dependency between the gradient of the examined function and the normal to the constraint plane: x = k, y = k, z = −k, k ∈ ℝ. Together with the constraint, the only solution is the point [1/3, 1/3, −1/3]. Further, we can notice that the function is increasing and unbounded in the direction of (1, −1, 0), and this direction lies in the constraint plane, so there is no maximum. Therefore, the examined function has a minimum at this point. Another solution. We will reduce this problem to finding the extrema of a two-variable function on ℝ². Since the constraint is linear, we can express z = x + y − 1. Substituting this into the given function then yields a real-valued function of two variables: f(x, y) = x² + y² + (x + y − 1)² = 2x² + 2xy + 2y² − 2x − 2y + 1. Setting both partial derivatives equal to zero, we get the linear equations 4x + 2y − 2 = 0, 4y + 2x − 2 = 0, whose only solution is the point [1/3, 1/3]. Since it is a quadratic function with positive coefficients at the squared unknowns, it is unbounded on ℝ². Therefore, there is a (global) minimum at the obtained point. Then, we can get the corresponding point [1/3, 1/3, −1/3] in the constraint plane from the linear dependency of z. □ 8.H.16. Find the extrema of the function x + y : ℝ³ → ℝ on the circle given by the equations x + y + z = 1 and x² + y² + z² = 4. Solution. The "suspects" are those points which satisfy (1, 1, 0) = k · (1, 1, 1) + l · (x, y, z), k, l ∈ ℝ. Clearly, x = y (= (1 − k)/l). Substituting this into the equations of the circle then leads to the two solutions [1/3 ± √22/6, 1/3 ± √22/6, 1/3 ∓ √22/3]. Since every circle is compact, it suffices to examine the function values at these two points. We find out that there is a maximum of the considered function on the given circle at the former point and a minimum at the latter one. □ 8.H.17. Find the extrema of the function f : ℝ³ → ℝ, f(x, y, z) = x² + y² + z², on the plane 2x + y − z = 1 and determine their types. ○ further equations.
Altogether, there are 2n + m equations for 2n + m variables, so it can be expected that the solution is given by a discrete set of points P (i.e., each one of them is an isolated point). Very often, the system of equations is a seemingly simple system of algebraic equations, but in fact only rarely can it be solved explicitly. We return to special algebraic methods for systems of polynomial equations in chapter 12. There are also various numerical approaches to such systems. Theoretical details are not discussed here, but there are several solved examples in the other column, including also the illustration of how to use the second derivatives to decide about the local extrema under the constraints.

8.1.29. Arithmetic mean versus geometric mean. As an example of practical application of the Lagrange multipliers, we prove the inequality

(1/n)(x_1 + ... + x_n) ≥ (x_1 ··· x_n)^{1/n}

for any n positive real numbers x_1, ..., x_n. Equality occurs if and only if all the x_i's are equal.

Consider the sum x_1 + ... + x_n = c as the constraint for a (non-specified) non-negative constant c. We look for the maxima and minima of the function

f(x_1, ..., x_n) = (x_1 ··· x_n)^{1/n}

with respect to the constraint and the assumption x_1 > 0, ..., x_n > 0. The normal vector to the hyperplane M defined by the constraint is (1, ..., 1). Therefore, the function f can have an extremum only at those points where its gradient is a multiple of this normal vector. Hence there is the following system of equations for the desired points:

(1/(n x_i)) (x_1 ··· x_n)^{1/n} = λ

for i = 1, ..., n and λ ∈ R. This system has the unique solution x_1 = ··· = x_n in the set M. If the variables x_i are allowed to be zero as well, then the set M would be compact, so the function f would have to have both a maximum and a minimum there. However, f is minimal if and only if at least one of the values x_i is zero; so the function necessarily has a strict maximum at the point with x_i = c/n, i = 1, ..., n, and then λ = 1/n. By substituting, the geometric mean equals the arithmetic mean for these extreme values, but it is strictly smaller at all other points with the given sum c of coordinates, which proves the inequality.

2. Integration for the second time

We return to the process of integration, discussed in the second and third parts of chapter six. We saw that the integration with respect to the diverse coordinates can be iterated. Now we extend the concept of the Riemann integration and the Jordan measure to general Euclidean spaces and, again, we shall see that the approaches coincide for many reasonable functions.

547 CHAPTER 8. CALCULUS WITH MORE VARIABLES

8.H.18. Find the maximum of the function f : R^2 → R, f(x, y) = xy on the circle with radius 1 which is centered at the point [x_0, y_0] = [0, 1]. ○

8.H.19. Find the minimum of the function f : R^2 → R, f = xy on the circle with radius 1 which is centered at the point [x_0, y_0] = [2, 0]. ○

8.H.20. Find the minimum of the function f : R^2 → R, f = xy on the circle with radius 1 which is centered at the point [x_0, y_0] = [2, 0]. ○

8.H.21. Find the minimum of the function f : R^2 → R, f = xy on the ellipse x^2 + 3y^2 = 1. ○

8.H.22. Find the minimum of the function f : R^2 → R, f = x^2 y on the circle with radius 1 which is centered at the point [x_0, y_0] = [0, 0]. ○

8.H.23. Find the maximum of the function f : R^2 → R, f(x, y) = x^3 y on the circle x^2 + y^2 = 1. ○

8.H.24. Find the maximum of the function f : R^2 → R, f(x, y) = xy on the ellipse 2x^2 + 3y^2 = 1. ○

8.H.25. Find the maximum of the function f : R^2 → R, f(x, y) = xy on the ellipse x^2 + 2y^2 = 1. ○

I. Volumes, areas, centroids of solids

8.1.1. Find the volume of the solid which lies in the half-space z ≥ 0, inside the cylinder x^2 + y^2 ≤ 1, and in the half-space a) z ≤ x, b) x + y + z ≤ 0.

8.2.1. Integrals dependent on parameters. Recall that integrating a function f(x, y_1, ..., y_n) of n + 1 variables with respect to the single variable x, the result is a function F(y_1, ..., y_n) of the remaining variables.
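The Lagrange-multiplier derivation of the AM-GM inequality above can be probed numerically. The following sketch is our own illustration (standard library only): it checks the inequality on random positive samples and the equality case.

```python
# Numerical probe of the AM-GM inequality: the geometric mean never
# exceeds the arithmetic mean, with equality for equal coordinates.
import math
import random

random.seed(1)
for _ in range(1000):
    xs = [random.uniform(0.01, 10.0) for _ in range(5)]
    am = sum(xs) / len(xs)
    gm = math.prod(xs) ** (1.0 / len(xs))
    assert gm <= am + 1e-12  # AM-GM holds on every sample

# equality case: all coordinates equal
assert math.isclose(sum([2.5] * 5) / 5, math.prod([2.5] * 5) ** 0.2)
print("AM-GM verified on random samples")
```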
Essentially, we proved the following theorem already in 6.3.12 and 6.3.14. This is an extremely useful technical tool, as we saw when handling the Fourier transforms and convolutions in the last chapter. The previous results about extrema of multivariate functions also have a direct application to the minimization of areas or volumes of objects defined in terms of functions dependent on parameters, etc.

Continuity and differentiation

Theorem. Consider a continuous function f(x, y_1, ..., y_n) defined for all x from a finite interval [a, b] and for all (y_1, ..., y_n) lying in some neighbourhood U of a point c = (c_1, ..., c_n) ∈ R^n, and its integral

F(y_1, ..., y_n) = ∫_a^b f(x, y_1, ..., y_n) dx.

Then the function F(y_1, ..., y_n) is continuous on U. Moreover, if the continuous partial derivative ∂f/∂y_j exists on a neighbourhood of the point c, then ∂F/∂y_j (c) exists as well and

∂F/∂y_j (c) = ∫_a^b ∂f/∂y_j (x, c) dx.

Solution. a) The volume can be calculated with ease using cylindric coordinates. There, the cylinder is determined by

Proof. In Chapter 6, we dealt with two variables x, y only, but replacing the absolute value |y| with the norm ‖y‖ of the vector of parameters does not change the argumentation at all. Again, the main point is that continuous real functions on compact sets are uniformly continuous. Since the partial derivative concerns only one of the variables, the rest of the theorem was proved in 6.3.14, too. □

8.2.2. Integration of multivariate functions. In the case of univariate functions, integration is motivated by the idea of the area under the graph of a given function of one variable. Consider now the volume of the part of the three-dimensional space which lies under the graph of a function z = f(x, y) of two variables, and the multidimensional analogues in general. In chapter six, small intervals [x_i, x_{i+1}] were chosen of length Δx_i which divided the whole interval [a, b].
Then, their representatives ξ_i were selected, and the corresponding part of the area was approximated by the area of the rectangle with height given by the value f(ξ_i) at the representative, i.e. the expression f(ξ_i) Δx_i. In the case of functions of two variables, we work with divisions in both variables and with the values representing the height of the graph above the particular little rectangles in the plane.

the inequality r ≤ 1; the half-space z ≤ x by z ≤ r cos(φ). Altogether, we get

V = ∫_0^1 ∫_{−π/2}^{π/2} ∫_0^{r cos(φ)} r dz dφ dr = 2/3.

b) We will reduce this problem to one that is completely analogous to the above part by rotating the solid around the z-axis by the angle π/4 (be it in the positive or the negative direction). Applying the rotation matrix

(√2/2  −√2/2  0
 √2/2   √2/2  0
 0      0     1),

the original inequality x + y + z ≤ 0 is transformed to √2 x′ + z′ ≤ 0 in the new coordinates. Now, it is easy to express the integral that corresponds to the volume of the examined solid:

V = ∫_0^1 ∫_{π/2}^{3π/2} ∫_0^{−√2 r cos(φ)} r dz dφ dr = 2√2/3.

We need not have computed the result as we did; instead, we could notice that this solid differs from the solid of part (a) only by a homothety with coefficient √2 in the direction of the y-axis. See also note 8.1.11. □

8.1.2. Find the volume of the solid in R^3 which is given by x^2 + y^2 + z^2 ≤ 1, 3x^2 + 3y^2 ≥ z^2, x ≥ 0.

Solution. First, we should realize what the examined solid looks like. It is a part of a ball which lies outside a given cone (see the picture). The best way to determine the volume is probably to subtract half the volume of the sector given by the cone from half of the ball's volume (note that the volume of the solid does not change if we replace the condition x ≥ 0 with z ≥ 0; the sector is cut either "horizontally" or "vertically", but always

The first thing to deal with is to determine the integration domain, that is, the region the function f is to be integrated over.
As an example, consider the function z = f(x, y) = √(1 − x^2 − y^2), whose graph is, inside the unit disc, half of the unit sphere. Integrating this function over the unit disc yields the volume of the unit semi-ball. The simplest approach is to consider only those integration domains M which are given by products of intervals, i.e. given by the ranges x ∈ [a, b] and y ∈ [c, d]. In this context, such a domain is called a multidimensional interval. If M is a different bounded set in R^2, we work with a sufficiently large area [a, b] × [c, d], rather than with the set itself, and adjust the function so that f(x, y) = 0 for all points lying outside M. Considering the above case of the unit ball, we integrate over the set M = [−1, 1] × [−1, 1] the function

f(x, y) = √(1 − x^2 − y^2) for x^2 + y^2 ≤ 1, and f(x, y) = 0 otherwise.

The definition of the Riemann integral then faithfully follows the procedure from paragraph 6.2.8. This can be done for an arbitrary finite number of variables. Given an n-dimensional interval I and partitions into k_i subintervals in each variable x_i, select the partition of I into k_1 ··· k_n small n-dimensional intervals, and write Δx_{i_1...i_n} for their volumes. The maximum of the lengths of the sides of the multidimensional intervals in such a partition is called its norm.

Riemann integral

Definition. The Riemann integral of a real-valued function f defined on a multidimensional interval I = [a_1, b_1] × [a_2, b_2] × ··· × [a_n, b_n] exists if for every choice of a sequence of divisions Ξ (dividing the multidimensional interval in all variables simultaneously), and of the representatives ξ_{i_1...i_n} of the little multidimensional intervals in the partitions, with the norm of the partitions converging to zero, the integral sums always converge to the value

S = ∫_I f(x_1, ..., x_n) dx_1 ... dx_n,

independent of the selected sequence of divisions and representatives. The function f is then said to be Riemann-integrable over I.
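The definition can be made concrete for the half-ball function just discussed. In this sketch (our own illustration, standard library only), midpoint Riemann sums over the square [−1, 1] × [−1, 1] approach the volume 2π/3 of the unit semi-ball.

```python
# Riemann sum for the function from the text: sqrt(1 - x^2 - y^2)
# inside the unit disc, 0 outside; midpoints serve as representatives.
import math

def f(x, y):
    r2 = x * x + y * y
    return math.sqrt(1.0 - r2) if r2 <= 1.0 else 0.0

def riemann_sum(n):
    h = 2.0 / n  # each side of [-1,1] x [-1,1] split into n subintervals
    return sum(f(-1 + (i + 0.5) * h, -1 + (j + 0.5) * h) * h * h
               for i in range(n) for j in range(n))

print(riemann_sum(400), 2 * math.pi / 3)  # the sums approach 2*pi/3
```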
As a relatively simple exercise, prove in detail that every function which is Riemann-integrable over an interval I must be bounded there. The reason is the same as in the case of univariate functions: control the norms of the divisions used in the definition somewhat roughly.

to halves). We will calculate in spherical coordinates,

x = r cos(φ) sin(ψ), y = r sin(φ) sin(ψ), z = r cos(ψ), φ ∈ [0, 2π), ψ ∈ [0, π), r ∈ (0, ∞).

The Jacobian of this transformation R^3 → R^3 is r^2 sin(ψ). First of all, let us determine the volume of the ball. As for the integration bounds, it is convenient to express the conditions that bind the solid in the coordinates we will work in. In the spherical coordinates, the ball is given by the inequality x^2 + y^2 + z^2 = r^2 ≤ 1. First, let us find the integration bounds for the variable φ. If we denote by π_φ the projection onto the φ-coordinate in the spherical coordinates (π_φ(r, φ, ψ) = φ), then the image of the solid in question under the projection π_φ gives the integration bounds for the variable φ. We know that π_φ(ball) = [0, 2π) (the inequality r^2 ≤ 1 does not contain the variable φ, so there are no constraints on it, and it takes on all possible values; this can also easily be imagined in space). Having the bounds of one of the variables determined, we can proceed with the bounds of the other variables. In general, those may depend on the variables whose bounds have already been determined (although this is not the case here). Thus, we choose arbitrarily a φ_0 ∈ [0, 2π), and for this φ_0 (fixed from now on), we find the intersection of the solid (ball) and the surface φ = φ_0 and its projection π_ψ on the variable ψ. Similarly as for φ, the variable ψ is not bounded (either by the inequality r^2 ≤ 1 or by the equality φ = φ_0), so it can take on all possible values, ψ ∈ [0, π).
Finally, let us fix a φ = φ_0 and a ψ = ψ_0. Now, we are looking for the projection π_r(U) of the object (a line segment) U given by the constraints r^2 ≤ 1, φ = φ_0, ψ = ψ_0 on the variable r. The only constraint for r is the condition r^2 ≤ 1, so r ∈ (0, 1]. Note that the integration bounds of the variables are independent of each other, so we can perform the integration in any order. Thus, we have

V_ball = ∫_0^1 ∫_0^{2π} ∫_0^π r^2 sin(ψ) dψ dφ dr = (4/3)π.

Now, let us compute the volume of the spherical sector given by x^2 + y^2 + z^2 ≤ 1 and 3x^2 + 3y^2 ≤ z^2, z ≥ 0. Again, we

The situation gets worse when integrating in this way over unbounded intervals; see more remarks in 8.2.6 below. Therefore, consider integration of functions over R^n mainly for functions whose support is compact, that is, functions which vanish outside a bounded interval I. A bounded set M ⊂ R^n is said to be Riemann measurable³ if and only if its indicator function, defined by

χ_M(x_1, ..., x_n) = 1 for (x_1, ..., x_n) ∈ M, and χ_M(x_1, ..., x_n) = 0 for all other points,

is Riemann-integrable over R^n. For any Riemann-measurable set M and a function f defined at all points of M, consider the function f̃ = χ_M f as a function defined on the whole R^n. This function f̃ apparently has a compact support. The Riemann integral of the function f over the set M is defined by

∫_M f dx_1 ... dx_n = ∫_{R^n} f̃ dx_1 ... dx_n,

supposing the integral on the right-hand side exists.

8.2.3. Properties of the Riemann integral. This definition of the integral does not provide reasonable instructions for computing the values of Riemann integrals. However, it does lead to the following basic properties of the Riemann integral (cf. Theorem 6.2.8):

Theorem. The set of Riemann-integrable real-valued functions over a Riemann measurable domain M ⊂ R^n is a vector space over the real scalars, and the Riemann integral is a linear form there. If the integration domain M is given as a disjoint union of finitely many Riemann-measurable domains M_i, then f is integrable on M if and only if it is integrable on all the M_i, and the integral of the function f over M is given by the sum of the integrals over the individual subdomains M_i.

Proof. All the properties follow directly from the definition of the Riemann integral and the properties of convergent sequences of real numbers, just as in the case of univariate functions. Think out the details by yourselves. □

For practical use, we rewrite the theorem into the usual equalities:

³ Better to say "measurable via Riemann integration"; the measure itself is commonly called the Peano-Jordan measure in the literature.

express the conditions in the spherical coordinates: r^2 ≤ 1, 3 sin^2(ψ) ≤ cos^2(ψ), i.e., tan(ψ) ≤ 1/√3. Just like in the case of the ball, we can see that the variables occur independently in the inequalities, so the integration bounds of the variables will be independent of each other as well. The condition r^2 ≤ 1 implies r ∈ (0, 1]; from tan(ψ) ≤ 1/√3 we have ψ ∈ [0, π/6]. The variable φ is not restricted by any condition, so φ ∈ [0, 2π]. Hence

V_sector = ∫_0^{2π} ∫_0^1 ∫_0^{π/6} r^2 sin(ψ) dψ dr dφ = (2 − √3)π/3;

altogether,

V = V_ball/2 − V_sector = (2/3)π − (2 − √3)π/3 = π/√3.

We could also have computed the volume directly:

V = ∫_0^{2π} ∫_0^1 ∫_{π/6}^{π/2} r^2 sin(ψ) dψ dr dφ = π/√3.

In cylindric coordinates x = r cos(φ), y = r sin(φ), z = z with Jacobian r of this transformation, the calculation of the volume as the difference of the two solids considered above looks as follows:

V = (2/3)π − ∫_0^{2π} ∫_0^{1/2} ∫_{√3 r}^{√(1 − r^2)} r dz dr dφ = π/√3.

Note that we cannot compute the volume of the solid by a single iterated integral directly in the cylindric coordinates. Thus, we must split it into two solids defined by the conditions r ≤ 1/2 and r ≥ 1/2, respectively.
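Since the bounds in the direct spherical computation above are constant, the triple integral factors into three one-dimensional integrals. A midpoint-rule evaluation (our own sketch, standard library only) reproduces the value π/√3.

```python
# The integrand r^2 sin(psi) and the constant bounds let the triple
# integral factor into a product of three 1-D integrals.
import math

def midpoint(g, a, b, n=10000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

V = (midpoint(lambda p: 1.0, 0.0, 2 * math.pi)      # d(phi)
     * midpoint(lambda r: r * r, 0.0, 1.0)          # r^2 dr
     * midpoint(math.sin, math.pi / 6, math.pi / 2))  # sin(psi) d(psi)
print(V, math.pi / math.sqrt(3))
```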
□ Another alternative is to compute it as the volume of a solid of revolution, again splitting the solid into the two parts as in the previous case (the part "under the cone" and the part "under the sphere"). However, these solids cannot be obtained by rotating around one of the axes. The volume of the former

Finite additivity and linearity

Any linear combination of Riemann-integrable functions f_i : I → R, i = 1, ..., k (over scalars in R) is again a Riemann-integrable function, and its integral can be computed as follows:

∫_I (a_1 f_1(x_1, ..., x_n) + ... + a_k f_k(x_1, ..., x_n)) dx_1 ... dx_n = a_1 ∫_I f_1(x_1, ..., x_n) dx_1 ... dx_n + ... + a_k ∫_I f_k(x_1, ..., x_n) dx_1 ... dx_n.

Let M_1 and M_2 be disjoint Riemann-measurable sets, and consider a function f : M_1 ∪ M_2 → R. Then f is Riemann-integrable over both sets M_i if and only if it is integrable over their union, and

∫_{M_1 ∪ M_2} f(x_1, ..., x_n) dx_1 ... dx_n = ∫_{M_1} f(x_1, ..., x_n) dx_1 ... dx_n + ∫_{M_2} f(x_1, ..., x_n) dx_1 ... dx_n.

8.2.4. Multiple integrals. Riemann-integrable functions especially involve cases when the boundary of the integration domain M can be expressed step by step via continuous dependencies between the coordinates in the following way. The first coordinate x_1 runs within an interval [a, b]. The range of the next coordinate can be defined by two functions, i.e. y ∈ [φ(x), ψ(x)]; then the range of the next coordinate is expressed as z ∈ [η(x, y), ζ(x, y)], and so on for all of the other coordinates. For example, this is easy in the case of the ball from the introductory example: for x ∈ [−1, 1], define the range for y as y ∈ [−√(1 − x^2), √(1 − x^2)]. The volume of the ball can then be computed by integration of the mentioned function f, or we may integrate the indicator function of the ball, i.e. the function which takes the value one on the subset M ⊂ R^3 which is further defined by z ∈ [−√(1 − x^2 − y^2), √(1 − x^2 − y^2)].
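The step-by-step bounds just described for the ball can be evaluated as an iterated integral. In this sketch (our own illustration, standard library only), the innermost z-integration is done in closed form (the z-range has length 2√(1 − x² − y²)), and the remaining two integrations use midpoint rules; the result approaches the ball volume 4π/3.

```python
# Iterated integration over the ball using the bounds from the text:
# x in [-1, 1], y in [-sqrt(1-x^2), sqrt(1-x^2)], z-range of length
# 2*sqrt(1 - x^2 - y^2) handled exactly.
import math

def inner(x, n=200):
    ymax = math.sqrt(max(0.0, 1.0 - x * x))
    h = 2 * ymax / n
    return sum(2 * math.sqrt(max(0.0, 1.0 - x * x - y * y))
               for y in (-ymax + (j + 0.5) * h for j in range(n))) * h

n = 200
h = 2.0 / n
V = sum(inner(-1 + (i + 0.5) * h) for i in range(n)) * h
print(V, 4 * math.pi / 3)
```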
The following fundamental theorem transforms the computation of a Riemann integral into a sequence of computations of univariate integrals (while the other variables are considered to be parameters, which can appear in the integration bounds as well). Notice that we could have defined the multiple integral directly via the one-dimensional integration, but then we would face the trouble of ensuring the independence of the result of our way of describing M. The theorem reveals that the two approaches coincide and there are no unclear points left.

part can be calculated as the difference between the volumes of the cylinder x^2 + y^2 ≤ 1/4, 0 ≤ z ≤ √3/2, and the part of the cone 3x^2 + 3y^2 ≤ z^2, 0 ≤ z ≤ √3/2. The volume of the latter part is then the difference between the volume of the solid that is created by rotating the part of the arc z = √(1 − x^2), 1/2 ≤ x ≤ 1, around the z-axis and the volume of the cylinder x^2 + y^2 ≤ 1/4, 0 ≤ z ≤ √3/2. Altogether,

V = V_1 + V_2 = (√3π/8 − √3π/24) + (π ∫_0^{√3/2} (1 − r^2) dr − √3π/8) = π/√3. □

8.1.3. Calculate the volume of the spherical segment of the ball x^2 + y^2 + z^2 ≤ 2 cut by the plane z = 1.

Solution. We will compute the integral in spherical coordinates. The segment can be perceived as a spherical sector without the cone (with vertex at the point [0, 0, 0] and the circular base z = 1, x^2 + y^2 ≤ 1). In these coordinates, the sector is the product of the intervals [0, √2] × [0, 2π) × [0, π/4]. We thus integrate in the given bounds, in any order:

V_sector = ∫_0^{2π} ∫_0^{√2} ∫_0^{π/4} r^2 sin(θ) dθ dr dφ = (4/3)(√2 − 1)π.

In the end, we must subtract the volume of the cone. That is equal to (1/3)πR^2 H (where R is the radius of the cone's base and H is its height; both are equal to 1 in our case), so the total volume is

V = (4/3)(√2 − 1)π − (1/3)π = (1/3)π(4√2 − 5). □

Multiple integrals

Theorem. Let M ⊂ R^n be a bounded set, expressed with the help of continuous functions ψ_i, η_i as

M = {(x_1, ..., x_n); x_1 ∈ [a, b], x_2 ∈ [ψ_2(x_1), η_2(x_1)], ..., x_n ∈ [ψ_n(x_1, ..., x_{n−1}), η_n(x_1, ..., x_{n−1})]},

and let f be a function which is continuous on M. Then the Riemann integral of the function f over the set M exists and is given by the formula

∫_M f(x_1, x_2, ..., x_n) dx_1 ... dx_n = ∫_a^b ( ∫_{ψ_2(x_1)}^{η_2(x_1)} ( ··· ( ∫_{ψ_n(x_1,...,x_{n−1})}^{η_n(x_1,...,x_{n−1})} f(x_1, x_2, ..., x_n) dx_n ) ··· ) dx_2 ) dx_1,

where the individual integrals are the one-variable Riemann integrals.

Proof. Consider first the proof for the case of two variables. It can then be seen that there is no need of further ideas in the general case. Consider an interval I = [a, b] × [c, d] containing the set M = {(x, y); x ∈ [a, b], y ∈ [ψ(x), η(x)]} and divisions Ξ of the interval I with representatives ξ_{ij}. The corresponding integral sum is

Σ_{i,j} f(ξ_{ij}) Δx_{ij},

where Δx_{ij} is written for the product of the sizes Δx_i and Δy_j of the intervals which correspond to the choice of the representative ξ_{ij}. Assume that we work only with choices of representatives ξ_{ij} which all share the same first coordinate x_i. If the partition of the interval [a, b] is fixed, and only the partition of [c, d] is refined, the values of the inner sum of the expression approach the value of the integral

S_{x_i} = ∫_{ψ(x_i)}^{η(x_i)} f(x_i, y) dy,

which exists since the function f(x_i, y) is continuous. In this way, a function is obtained which is continuous in the free parameter x_i, see 8.2.1. Therefore, further refinement of the partition of the interval [a, b] leads, in the limit, to the desired formula

Σ_i S_{x_i} Δx_i → S = ∫_a^b ( ∫_{ψ(x)}^{η(x)} f(x, y) dy ) dx.

It remains to deal with the case of general choices of representatives of general divisions Ξ. Since f is a continuous function on a compact set, it is uniformly continuous there.

The volume of a general spherical segment with height h in a ball with radius R could be computed similarly: the radius ρ of the base disc satisfies ρ^2 = R^2 − (R − h)^2 = 2Rh − h^2, so

V = V_sector − V_cone = (2/3)πR^2 h − (1/3)π(2Rh − h^2)(R − h) = (1/3)πh^2(3R − h). □

8.1.4. Find the volume of the part of the cylinder x^2 + z^2 ≤ 16 which lies inside the cylinder x^2 + y^2 ≤ 16.

Solution. We will compute the integral in Cartesian coordinates.
Since the solid is symmetric, it suffices to integrate over the first octant (interchanging x and −x does not change the description of the solid; the same holds for y and for z). The part of the solid that lies in the first octant is the space under the graph of the function z(x, y) = √(16 − x^2) and over the quarter-disc x^2 + y^2 ≤ 16, x ≥ 0, y ≥ 0. Therefore, the volume of the whole solid is equal to

V = 8 ∫_0^4 ∫_0^{√(16 − x^2)} √(16 − x^2) dy dx = 8 ∫_0^4 (16 − x^2) dx = 1024/3. □

Remark. Note that the projection of the considered solid onto both the plane y = 0 and the plane z = 0 is a disc with radius 4, yet the solid is not a ball.

Therefore, if a small real number ε > 0 is selected beforehand, there is always a bound δ > 0 for the norm of the partitions, so that the values of the function f for the general choices ξ_{ij} differ by no more than ε from the choices used above. The limit process results in the same value for general Riemann sums S_Ξ as seen above. Now, the general case can be proved easily by induction. In the case of n = 1, the result is trivial. The presented reasoning can easily be transformed for a general induction step, writing (x_2, ..., x_n) instead of y, having x_1 instead of x, and perceiving the particular little cubes of the divisions as (n − 1)-dimensional cubes Cartesian-multiplied by the last interval. In the last-but-one step of the proof, the induction hypothesis is used, rather than the simple one-dimensional integration. The final argument about uniform continuity remains the same. It is advised to write this proof in detail as an exercise. □

8.2.5. Fubini theorem. The latter theorem has a particularly simple shape in the case of a multidimensional interval M. Then all the functions in the bounds for integration are just the constant bounds from the definition of M. But this means that the integration process can be carried out coordinate by coordinate in any order. We have exploited this behavior already in Chapter 6, cf. 6.3.13.
In this way, the following important corollary is proved:⁴

Fubini theorem

Theorem. Every continuous function f(x_1, ..., x_n) on a multidimensional interval M = [a_1, b_1] × [a_2, b_2] × ··· × [a_n, b_n] is Riemann integrable on M, and its integral

∫_M f(x_1, ..., x_n) dx_1 ... dx_n = ∫_{a_1}^{b_1} ∫_{a_2}^{b_2} ··· ∫_{a_n}^{b_n} f(x_1, ..., x_n) dx_n ... dx_1

is independent of the order in which the multiple integration is performed.

The possibility of changing the order of integration in multiple integrals is extremely useful. We have already taken advantage of this result, namely when studying the relation of Fourier transforms and convolutions, see paragraph 7.1.9.

8.2.6. Unbounded regions and functions. There is no simple concept of an improper integral for unbounded multivariate functions. The following example of multiple integration

⁴ Guido Fubini (1879–1943) was an important Italian mathematician, active also in applied areas of mathematics. The simple derivation of the Fubini theorem builds upon the simple properties of Riemann integration and the continuity of the integrated function. Fubini, in fact, proved this result in a much more general context of integration, while the theorem just introduced had been used by mathematicians like Cauchy at least a century before Fubini.

8.1.5. Find the volume of the part of the cylinder x^2 + y^2 ≤ 4 bounded by the planes z = 0 and z = x + y + 2. ○

of an unbounded function is illustrative in this direction:

∫_0^1 ( ∫_0^1 (x − y)/(x + y)^3 dy ) dx = 1/2, while ∫_0^1 ( ∫_0^1 (x − y)/(x + y)^3 dx ) dy = −1/2.

The reason can be understood by looking at the properties of non-absolutely converging series. There, rearranging the summands can lead to an arbitrary result. The situation is better if the Riemann integral of a bounded non-negative function f(x) ≥ 0 with non-compact support over the whole R^n is calculated. Of course, some extra information is needed on the decay of the function f for large arguments.
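The two iterated integrals of the counterexample above can be evaluated explicitly. In this sketch (our own derivation and illustration, standard library only), we use that for fixed x > 0 the inner integral over y equals 1/(1 + x)^2, while for fixed y > 0 the inner integral over x equals −1/(1 + y)^2; the remaining one-dimensional integrals then disagree.

```python
# The two orders of integration for (x - y)/(x + y)^3 on (0,1)^2:
# the inner integrals are taken in closed form, the outer ones by the
# midpoint rule, giving +1/2 and -1/2 respectively.
def midpoint(g, a, b, n=20000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

dy_first = midpoint(lambda x: 1.0 / (1.0 + x) ** 2, 0.0, 1.0)
dx_first = midpoint(lambda y: -1.0 / (1.0 + y) ** 2, 0.0, 1.0)
print(dy_first, dx_first)  # approximately +0.5 and -0.5
```

The integrand is unbounded near the origin, which is exactly why the Fubini theorem does not apply here.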
For example, if f is Riemann integrable over each n-dimensional interval I and there is a universal bound ∫_I |f(x)| dx ≤ C, then the integrals over an increasing sequence of intervals exhausting R^n converge, and their common limit may serve as the definition of ∫_{R^n} f(x) dx.

Further, V_1 − V_2 = ∫_0^{2π} ∫_0^2 (r^2(sin(φ) + cos(φ)) + 2r) dr dφ = 8π. □

Remark. During the calculation, we made use of the fact that integrating a function of two variables over an area in R^2 yields the difference of the volume of the solid in R^3 determined by the graph of the integrated function and lying above z = 0 and the one lying below z = 0.

8.1.6. Find the volume of the solid in R^3 which is given by the intersection of the ball x^2 + y^2 + z^2 ≤ 4 and the cylinder x^2 + y^2 ≤ 1.

Solution. Thanks to symmetry, it suffices to compute the volume of the part that lies in the first octant. We will integrate in cylindric coordinates given by the equations x = r cos(φ), y = r sin(φ), z = z with Jacobian J = r; the octant part is the space between the plane z = 0 and the graph of the function z = √(4 − x^2 − y^2) = √(4 − r^2). Therefore, we can directly write the volume as the double integral

V = 8 ∫_0^{π/2} ∫_0^1 r √(4 − r^2) dr dφ = (4/3)(8 − 3√3)π. □

8.1.7. Find the volume of the solid in R^3 which is given by the intersection of the ball x^2 + y^2 + z^2 ≤ 2 and the paraboloid z ≥ x^2 + y^2.

8.2.7. Further remarks on integration. The Riemann integral of multivariate functions behaves even worse than in the case of functions of one variable in the sixth chapter. Therefore, more sophisticated approaches to integration have been developed. They are mainly based on the concept of the measure of a set. We consider this problem briefly now. As we shall see in 8.2.10, the Riemann integration of the indicator functions χ_M of sets M ⊂ R^n leads to a finitely additive measure. In probability theory in chapter 10, even elementary problems require a concept of measure which is additive over countable systems of disjoint sets. Having such a measure, measurable functions f can be defined by the condition that their preimages of bounded intervals, f^{−1}([a, b]), are measurable sets, and the integral is built by approximation via such "horizontal strips", see the illustration. This is the starting point of Lebesgue integration. We omit further details here, but note the Riesz representation theorem⁵ saying that for each linear functional I (i.e. a linear mapping valued in R) on continuous functions with compact support on a metric space X, there is a unique measure (with certain regularity properties) such that the integral associated to this measure extends I. In the case of the Riemann integral I on functions on R^n, this provides the Lebesgue measure and the Lebesgue integral.

⁵ Frigyes Riesz (1880–1956) was a famous Hungarian mathematician, active in particular in functional analysis. He introduced this theorem in the special case of X being an interval in R^n in 1909.

Another point of view is the completion procedure for metric spaces. Consider the vector space X = S_c(R^n) of all continuous functions with compact support. It can be equipped with the L_p norms, similar to the univariate case from the seventh chapter, i.e.

‖f‖_p = ( ∫_{R^n} |f(x_1, ..., x_n)|^p dx_1 ... dx_n )^{1/p}

for any 1 ≤ p < ∞. Since the Riemann integral is defined again in terms of partitions and the representative values, the properties of the norm can be verified in the same way as for univariate functions, using Hölder's and Minkowski's inequalities. This yields the metrics ‖·‖_p on X. The general theory provides the completion of X, unique up to isometry, and it can be shown that it is again a space of functions. The Lebesgue integral mentioned above defines exactly these norms. Hence the spaces of functions with Lebesgue integrable powers |f|^p are obtained.

8.2.8. Change of coordinates. When calculating integrals of univariate functions, the "substitution method" is used as one of the powerful tools, cf.
6.2.5. The method works similarly in the case of functions of more variables, when its geometric meaning is understood. Recall and reinterpret the univariate case. There, the integrated expression f(x) dx infinitesimally describes the two-dimensional area of the rectangle whose sides are the (linearized) increment Δx of the variable x, i.e. the one-dimensional rectangle, and the value f(x). If the variable x is transformed by the relation x = u(t), then the linearized increment can be expressed with the help of the differential as

dx = (du/dt) dt,

and so the corresponding contribution to the integral is given by f(u(t)) (du/dt) dt. Here one either supposes that the sign of the derivative u′(t) is positive, or one interchanges the bounds of the integral, so that the sign does not affect the result.

Intuitively, the procedure for n variables should be similar. It is only necessary to recall the formula for the (change of) volume of parallelepipeds. The Riemann integrals are approximated by Riemann sums, based on the n-dimensional volumes Δx_{i_1...i_n} of small multidimensional intervals in the variables, multiplied by the values of the function at the representative points ξ_{i_1...i_n}. If the coordinates are transformed by means of a mapping x = G(y), not only the function values f(G(ξ_{i_1...i_n})) are obtained at the transformed representatives; the volumes of the small intervals are changed as well, in linearized approximation by the factor |det D^1 G|, the absolute value of the determinant of the Jacobi matrix.

Solution. Once again, we will work in cylindric coordinates:

V = ∫_0^{2π} ∫_0^1 ∫_{r^2}^{√(2 − r^2)} r dz dr dφ = (8√2 − 7)π/6. □

8.1.8. Find the volume of the solid in R^3 which is bounded by the elliptic cylinder 4x^2 + y^2 = 1 and the planes z = 2y and z = 0, lying above the plane z = 0.

Solution. Thanks to symmetry, it is advantageous to work in the coordinates x = (1/2) r cos(φ), y = r sin(φ), z = z, with Jacobian r/2. The solid lies above the half of the ellipse with y ≥ 0 (i.e. φ ∈ [0, π]) and below the plane z = 2y = 2r sin(φ), so

V = ∫_0^π ∫_0^1 ∫_0^{2r sin(φ)} (r/2) dz dr dφ = ∫_0^π ∫_0^1 r^2 sin(φ) dr dφ = 2/3. □

Theorem. Let G : N ⊂ R^n → R^n be a continuously differentiable and invertible mapping, and write t = (t_1, ..., t_n), x = (x_1, ..., x_n) = G(t_1, ..., t_n). Further, let M = G(N) be a Riemann measurable set, and f : M → R a continuous function. Then N is also Riemann measurable and

∫_M f(x) dx_1 ... dx_n = ∫_N f(G(t)) |det(D^1 G(t))| dt_1 ... dt_n.

8.2.9. The invariance of the integral. The first thing to be verified is the coincidence of the two definitions of the volume of parallelepipeds (taken for granted in the above intuitive explanation of the latter theorem). Volumes and similar concepts were dealt with in chapter 4, and a crucial property was the invariance of the concepts with respect to the choice of Euclidean frames of R^n, cf. 4.1.22 on page 247, which followed directly from the expression of the volumes in terms of determinants. It is needed to show that the same result holds in terms of the Riemann integration as defined above. It turns out that it is easier to deal with invariance with respect to general invertible linear mappings Ψ : R^n → R^n.

Proposition. Let Ψ : R^n → R^n be an invertible linear mapping and I ⊂ R^n a multidimensional interval. Consider a function f such that f ∘ Ψ is integrable on I. Then M = Ψ(I) is Riemann measurable, f is Riemann integrable on M, and

∫_M f(x_1, ..., x_n) dx_1 ... dx_n = |det Ψ| ∫_I (f ∘ Ψ)(y_1, ..., y_n) dy_1 ... dy_n.

Proof. Each linear mapping is a composition of elementary transformations of three types (see the discussion in chapter 2, in particular paragraphs 2.1.7 and 2.1.9). The first one is the multiplication of one of the coordinates by a constant a (assuming a > 0 without loss of generality):

|det Ψ| ∫_I f(a y_1, y_2, ..., y_n) dy_1 ... dy_n = a ∫ ··· ( ∫ f(a y_1, y_2, ..., y_n) dy_1 ) ··· dy_n = a a^{−1} ∫ ··· ( ∫ f(x_1, x_2, ..., x_n) dx_1 ) ··· dx_n = ∫_M f(x_1, x_2, ..., x_n) dx_1 ... dx_n.

The second type, interchanging two of the coordinates, is even easier, since the order of integration does not matter due to the Fubini theorem. The third type, adding one coordinate to another, is similar to the first one (here |det Ψ| = 1):

∫_I f(y_1 + y_2, y_2, ..., y_n) dy_1 ... dy_n = ∫_{a_n}^{b_n} ··· ( ∫_{a_1}^{b_1} f(y_1 + y_2, y_2, ..., y_n) dy_1 ) ··· dy_n = ∫_{a_n}^{b_n} ··· ( ∫_{a_1 + x_2}^{b_1 + x_2} f(x_1, x_2, ..., x_n) dx_1 ) ··· dx_n = ∫_M f(x_1, x_2, ..., x_n) dx_1 ... dx_n.

The reader should check in detail that the last multiple integral describes the image M = Ψ(I). □

As a direct corollary of the proposition, the Riemann integral is invariant with respect to Euclidean affine mappings. That is, the integral does not depend on the choice of the orthogonal frame in the Euclidean R^n.

8.1.9. Find the volume of the solid bounded by the paraboloid z = 2x^2 + y^2 and the plane z = 2.

Solution. In the coordinates x = (1/√2) r cos(φ), y = r sin(φ), z = z with Jacobian J = (1/√2) r, the equation of the paraboloid is z = r^2, so the volume of the solid is equal to

V = 4 ∫_0^{π/2} ∫_0^{√2} ∫_{r^2}^2 (1/√2) r dz dr dφ = 2√2 ∫_0^{π/2} ∫_0^{√2} (2r − r^3) dr dφ = 2√2 ∫_0^{π/2} dφ = √2 π. □

8.1.10. Calculate the volume of the ellipsoid x^2 + 2y^2 + 3z^2 = 1.

Solution. We will consider the coordinates x = r cos(φ) sin(ψ), y = (1/√2) r sin(φ) sin(ψ), z = (1/√3) r cos(ψ).

8.2.10. Riemann measurable sets. It is necessary to understand how to recognize Riemann measurable domains M. When defining the Riemann integral, a strict analogy of the lower and upper Riemann integrals for univariate functions can be considered.
This means taking infima or suprema of the integrated function over the corresponding multidimensional intervals instead of the function values at the representatives in the Riemann sums. For bounded functions, there are

CHAPTER 8. CALCULUS WITH MORE VARIABLES

changed proportionally to the change of the volume of an infinitesimal volume element, which is the Jacobian. Therefore, if we consider the volume of the ball with a given radius r to be known (in this case, r = 1), we can infer directly that the volume of the ellipsoid is $V = \frac{1}{\sqrt6}\cdot\frac{4}{3}\pi = \frac{4\pi}{3\sqrt6}$.

8.1.12. Find the volume of the solid which is bounded by the paraboloid $2x^2 + 5y^2 = z$ and the plane $z = 1$. Solution. We choose the coordinates $x = \frac{1}{\sqrt2}r\cos(\varphi)$, $y = \frac{1}{\sqrt5}r\sin(\varphi)$, $z = z$. The determinant of the Jacobi matrix is $\frac{r}{\sqrt{10}}$, so the volume is
$$V = \int_0^{2\pi}\!\!\int_0^1\!\!\int_{r^2}^1 \frac{r}{\sqrt{10}}\,dz\,dr\,d\varphi = \frac{\pi}{2\sqrt{10}}. \qquad\square$$

8.1.13. Find the volume of the solid which lies in the first octant and is bounded by the surfaces $y^2 + z^2 = 9$ and $y^2 = 3x$. Solution. In cylindric coordinates ($y = r\cos(\varphi)$, $z = r\sin(\varphi)$, $x = x$),
$$V = \int_0^{\pi/2}\!\!\int_0^3\!\!\int_0^{\frac{r^2\cos^2(\varphi)}{3}} r\,dx\,dr\,d\varphi = \frac{27}{16}\pi. \qquad\square$$

8.1.14. Find the volume of the solid in $\mathbb{R}^3$ which is bounded by the cone part $2x^2 + y^2 = (z-2)^2$, $z \ge 2$, and the paraboloid $2x^2 + y^2 = 8 - z$.

well-defined values of the upper and lower integrals found in this way. If this is done for the indicator function $\chi_M$ of a fixed set M, the inner and outer Riemann measure of the set M is obtained. Evidently, the inner measure is the supremum of the (finite) sums of the volumes of all multidimensional intervals from the partitions which lie inside M; on the other hand, the outer measure is the infimum of the (finite) sums of the volumes of intervals covering M. It follows directly from the definition that a set M is Riemann measurable if and only if its inner and outer measures are equal. The sets whose outer measure is zero are, of course, Riemann measurable. They are called measure zero sets or null sets.
The finite additivity of the Riemann integral makes the measure finitely additive. Hence, a disjoint union of finitely many measurable sets is again a measurable set, and its measure is given by the sum of the measures of the individual sets in the union.

Consider the measurability of any given set $M \subset I \subset \mathbb{R}^n$ inside a sufficiently large multidimensional interval I. Consider the boundary $\partial M$, i.e. the set of all boundary points of M. For any partition $\Xi$ of I from the definition of the Riemann integral of $\chi_M$, each of the intervals with non-trivial intersection with $\partial M$ contributes to the upper integral but might not contribute to the lower integral. On the contrary, for every point in the interior $M^\circ \subset M$, its interval $I_{i_1\dots i_n}$ contributes to both in the same way as soon as the norm of the partition is small enough. This observation leads to the first part of the following claim:

Proposition. A bounded set $M \subset \mathbb{R}^n$ is Riemann measurable if and only if its boundary is of Riemann measure zero. If M is a Riemann measurable set and $G : M \subset \mathbb{R}^n \to \mathbb{R}^n$ is a continuously differentiable and invertible mapping, then G(M) is again Riemann measurable.

Proof. The first claim is already verified. Since both G and $G^{-1}$ are continuous, G maps internal points of M to internal points of G(M). To finish the proof, it must be verified that G maps the boundary $\partial M$, which is a set of measure zero, again to a set of measure zero.

Since every Riemann measurable set M is bounded, its closure $\overline M$ must be compact. It follows that G and all partial derivatives of its components are uniformly continuous on $\overline M$, and in particular on the boundary $\partial M$. Next, consider a partition $\Xi$ of an interval I containing $\partial M$ and a fixed tiny interval J in the partition including a point $t \in \partial M$. Write $R = G(t) + D^1G(t)(J - t)$. J is first shifted to the origin by a translation, then the derivative of G is applied, obtaining a parallelepiped. This is shifted back to be around G(t).
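The inner and outer measures just described are easy to experiment with. Here is a small illustrative sketch (the unit disc, the grid size, and the convexity-based corner tests are our own choices, not the book's construction): counting the grid cells lying fully inside the disc versus those merely touching it squeezes the area $\pi$ from below and above.

```python
# Inner and outer Riemann measure of the unit disc from an n x n grid on [-1,1]^2.
# A cell lies inside the (convex) disc iff its farthest corner does; it meets the
# disc iff its closest point does.

def measures(n):
    h = 2.0 / n
    inner = outer = 0
    for i in range(n):
        for j in range(n):
            x0, y0 = -1.0 + i * h, -1.0 + j * h
            x1, y1 = x0 + h, y0 + h
            # farthest corner of the cell from the origin
            fx, fy = max(abs(x0), abs(x1)), max(abs(y0), abs(y1))
            # closest point of the cell to the origin (clamp the origin into the cell)
            cx = min(max(0.0, x0), x1)
            cy = min(max(0.0, y0), y1)
            if fx * fx + fy * fy <= 1.0:
                inner += 1
            if cx * cx + cy * cy <= 1.0:
                outer += 1
    return inner * h * h, outer * h * h

lo, hi = measures(400)
print(lo, hi)   # both approach pi = 3.14159... as n grows
```

The gap `hi - lo` is exactly the total area of the cells meeting the boundary circle, and it shrinks like the norm of the partition — the boundary has measure zero.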
By the uniform continuity of G and $D^1G$, for each $\varepsilon \gt 0$ there is a bound $\delta$ for the norm of a partition for which
$$G(J) \subset G(t) + (1+\varepsilon)D^1G(t)(J - t)$$

Solution. First of all, we find the intersection of the given surfaces: $(z-2)^2 = -z + 8$, $z \ge 2$; therefore, $z = 4$, and the projection of the intersection satisfies $2x^2 + y^2 = 4$. The substitution $x = \frac{1}{\sqrt2}r\cos(\varphi)$, $y = r\sin(\varphi)$, $z = z$ gives $r^2 = (z-2)^2$, $z \ge 2$, i.e. $z = r + 2$ for the former surface, and $r^2 = 8 - z$, i.e. $z = 8 - r^2$ for the latter. Altogether, the projection of the given solid onto the coordinate $\varphi$ is equal to the interval $[0, 2\pi]$. Having fixed a $\varphi_0 \in [0, 2\pi]$, the projection of the intersection of the solid and the plane $\varphi = \varphi_0$ onto the coordinate r equals (independently of $\varphi_0$) the interval $[0, 2]$. Having fixed both $r_0$ and $\varphi_0$, the projection of the intersection of the solid and the line $r = r_0$, $\varphi = \varphi_0$ onto the coordinate z is equal to the interval $[r_0 + 2, 8 - r_0^2]$. The Jacobian of the considered transformation is $J = \frac{r}{\sqrt2}$, so we can write
$$V = \int_0^{2\pi}\!\!\int_0^2\!\!\int_{r+2}^{8-r^2} \frac{r}{\sqrt2}\,dz\,dr\,d\varphi = \frac{16\sqrt2}{3}\pi. \qquad\square$$

8.1.15. Find the volume of the solid which lies inside the cylinder $y^2 + z^2 = 4$ and the half-space $x \ge 0$ and is bounded by the surface $y^2 + z^2 + 2x = 16$. Solution. In cylindric coordinates,
$$V = \int_0^{2\pi}\!\!\int_0^2\!\!\int_0^{8 - \frac{r^2}{2}} r\,dx\,dr\,d\varphi = 28\pi. \qquad\square$$

8.1.16. The centroid of a solid. The coordinates $(x_t, y_t, z_t)$ of the centroid of a (homogeneous) solid T with volume V in $\mathbb{R}^3$ are given by the following integrals:
$$x_t = \frac{1}{V}\iiint_T x\,dx\,dy\,dz,\quad y_t = \frac{1}{V}\iiint_T y\,dx\,dy\,dz,\quad z_t = \frac{1}{V}\iiint_T z\,dx\,dy\,dz.$$
The centroid of a figure in $\mathbb{R}^2$ or other dimensions can be computed analogously.

8.1.17. Find the centroid of the part of the ellipse $3x^2 + 2y^2 = 1$ which lies in the first quadrant of the plane $\mathbb{R}^2$.

can be guaranteed. The entire image of J lies inside a slightly enlarged linear image of J by the derivative. Now, the outer measure a of the image G(J) satisfies:
$$a \le (1+\varepsilon)^n \operatorname{vol}_n R = (1+\varepsilon)^n\,|\det D^1G(t)|\operatorname{vol}_n J.$$
If $\mu$ is the upper Riemann sum for the measure of $\partial M$ corresponding to the chosen partition, the outer measure of $G(\partial M)$ must be bounded by $(1+\varepsilon)^n \max_{t\in\partial M}|\det D^1G(t)|\,\mu$. Finally, since $\partial M$ has measure zero, the norm of the partition can be chosen so small that $\mu \lt \varepsilon$, too. But then the outer measure is bounded by a constant multiple of $(1+\varepsilon)^n\varepsilon$, with the universal constant $\max_{t\in\partial M}|\det D^1G(t)|$. So the outer measure is zero, as required. $\square$

A slightly extended argumentation as in the proof above leads to understanding that the Riemann integrable functions are exactly those bounded functions with compact support whose set of discontinuity points has (Riemann) measure zero.

8.2.11. Proof of Theorem 8.2.8. A continuous function f and a differentiable change of coordinates are under consideration. So the inverse $G^{-1}$ is continuously differentiable, and the image $G^{-1}(M) = N$ is Riemann measurable. Hence the integrals on both sides of the equality exist, and it remains to prove that their values are equal.

Denote the composite continuous function by $g(t_1,\dots,t_n) = f(G(t_1,\dots,t_n))$, and choose a sufficiently large n-dimensional interval I containing N and its partition $\Xi$. The entire proof is nothing more than a more exact writing of the discussion presented before the formulation of the theorem.

Repeat the estimates on the volumes of images from the previous paragraph on Riemann measurability. It is already known that the images $G(I_{i_1\dots i_n})$ of the intervals from the partition are again Riemann measurable sets. For each small part $I_{i_1\dots i_n}$ of the partition $\Xi$, the integral of f over $J_{i_1\dots i_n} = G(I_{i_1\dots i_n})$ certainly exists, too. Further, if the center $t_{i_1\dots i_n}$ of the interval $I_{i_1\dots i_n}$ is fixed, then the linear image of this interval,
$$R_{i_1\dots i_n} = G(t_{i_1\dots i_n}) + D^1G(t_{i_1\dots i_n})\big(I_{i_1\dots i_n} - t_{i_1\dots i_n}\big),$$
is obtained. This is an n-dimensional parallelepiped (note that the interval is shifted to the origin, the linear mapping given by the Jacobi matrix is applied, and the result is then added to the image of the center).
If the partition is very fine, this parallelepiped differs only a little from the image $J_{i_1\dots i_n}$. By the uniform continuity of the mapping G, there is, for an arbitrarily small $\varepsilon \gt 0$, a norm of the partition such that for all finer partitions
$$G(t_{i_1\dots i_n}) + (1+\varepsilon)D^1G(t_{i_1\dots i_n})\big(I_{i_1\dots i_n} - t_{i_1\dots i_n}\big) \supset J_{i_1\dots i_n}.$$

Solution. First, let us calculate the area of the given part of the ellipse. The transformation $x = \frac{1}{\sqrt3}x'$, $y = \frac{1}{\sqrt2}y'$ with Jacobian $\frac{1}{\sqrt6}$ leads to
$$S = \iint dy\,dx = \frac{1}{\sqrt6}\int_0^1\!\!\int_0^{\sqrt{1-x'^2}} dy'\,dx' = \frac{\pi}{4\sqrt6}.$$
The other integrals we need can be computed directly in the Cartesian coordinates x and y:
$$T_x = \iint x\,dy\,dx = \int_0^{\frac{1}{\sqrt3}} x\sqrt{\frac{1-3x^2}{2}}\,dx = \frac{\sqrt2}{18},\qquad T_y = \iint y\,dy\,dx = \int_0^{\frac{1}{\sqrt3}} \frac{1-3x^2}{4}\,dx = \frac{1}{6\sqrt3}.$$
Therefore, the coordinates of the centroid are $\big[\frac{T_x}{S}, \frac{T_y}{S}\big] = \big[\frac{4\sqrt3}{9\pi}, \frac{2\sqrt2}{3\pi}\big]$. $\square$

8.1.18. Find the volume and the centroid of a homogeneous cone of height h and circular base with radius r. Solution. Positioning the cone so that the vertex is at the origin and points downwards, we have in cylindric coordinates that
$$V = 4\int_0^{\pi/2}\!\!\int_0^r\!\!\int_{\frac{h}{r}\rho}^{h} \rho\,dz\,d\rho\,d\varphi = \frac{1}{3}\pi h r^2.$$
Apparently, the centroid lies on the z-axis. For the z-coordinate, we get
$$z_t = \frac{1}{V}\iiint_{\mathrm{cone}} z\,dV = \frac{4}{V}\int_0^{\pi/2}\!\!\int_0^r\!\!\int_{\frac{h}{r}\rho}^{h} z\rho\,dz\,d\rho\,d\varphi = \frac{3}{4}h.$$
Thus, the centroid lies at a quarter of the cone's height, measured from the center of the cone's base. $\square$

8.1.19. Find the centroid of the solid which is bounded by the paraboloid $2x^2 + 2y^2 = z$, the cylinder $(x+1)^2 + y^2 = 1$, and the plane $z = 0$. Solution. First, we will compute the volume of the given solid. Again, we use the cylindric coordinates ($x = r\cos\varphi$, $y = r\sin\varphi$, $z = z$), where the equation of the paraboloid is $z = 2r^2$ and the equation of the cylinder reads $r = -2\cos(\varphi)$. Moreover, taking into account the fact that the plane x = 0 is tangent to the given cylinder, we can easily

However, then the n-dimensional volumes also satisfy
$$\operatorname{vol}_n(J_{i_1\dots i_n}) \le (1+\varepsilon)^n\operatorname{vol}_n(R_{i_1\dots i_n}) = (1+\varepsilon)^n\,\big|\det D^1G(t_{i_1\dots i_n})\big|\operatorname{vol}_n(I_{i_1\dots i_n}).$$
Now, it is possible to estimate the entire integral:
$$\int_M f(x_1,\dots,x_n)\,dx_1\cdots dx_n = \sum_{i_1,\dots,i_n}\int_{J_{i_1\dots i_n}} f(x_1,\dots,x_n)\,dx_1\cdots dx_n \le \sum_{i_1,\dots,i_n}\Big(\sup_{t\in I_{i_1\dots i_n}} g(t)\Big)\operatorname{vol}_n(J_{i_1\dots i_n}) \le (1+\varepsilon)^n\sum_{i_1,\dots,i_n}\Big(\sup_{t\in I_{i_1\dots i_n}} g(t)\Big)\big|\det D^1G(t_{i_1\dots i_n})\big|\operatorname{vol}_n(I_{i_1\dots i_n}).$$
If $\varepsilon$ approaches zero, then the norms of the partitions approach zero too; the left-hand value of the integral remains the same, while on the right-hand side, the Riemann integral of $g(t)\,|\det D^1G(t)|$ is obtained. Instead of the desired equality, the inequality
$$\int_M f(x)\,dx_1\cdots dx_n \le \int_N f(G(t))\,|\det(D^1G(t))|\,dt_1\cdots dt_n$$
is obtained. The same reasoning can be repeated after interchanging G and $G^{-1}$, the integration domains M and N, and the functions f and g. The reverse inequality is immediately obtained:
$$\int_N g(t)\,|\det(D^1G(t))|\,dt_1\cdots dt_n \le \int_M f(x)\,\big|\det\big(D^1G(G^{-1}(x))\big)\big|\,\big|\det\big(D^1G^{-1}(x)\big)\big|\,dx_1\cdots dx_n = \int_M f(x)\,dx_1\cdots dx_n.$$
The proof is complete. $\square$

8.2.12. An example in two dimensions. The coordinate transformations are quite transparent for the integral of a continuous function f(x,y) of two variables. Consider the differentiable transformation $G(s,t) = (x(s,t), y(s,t))$. Denoting $g(s,t) = f(x(s,t), y(s,t))$,
$$\int_{G(N)} f(x,y)\,dx\,dy = \int_N g(s,t)\,\Big|\frac{\partial x}{\partial s}\frac{\partial y}{\partial t} - \frac{\partial x}{\partial t}\frac{\partial y}{\partial s}\Big|\,ds\,dt$$
is obtained. As a truly simple example, calculate the integral of the indicator function of a disc with radius R (i.e. its area) and the integral of the function $f(t,\theta) = \cos(t)$ defined in polar coordinates inside a circle with radius $\frac{\pi}{2}$ (i.e. the volume hidden under such a "cap placed above the origin", see the illustration).

determine the bounds of the integral that corresponds to the volume of the examined solid:
$$V = \int_{\pi/2}^{3\pi/2}\!\!\int_0^{-2\cos\varphi}\!\!\int_0^{2r^2} r\,dz\,dr\,d\varphi = \int_{\pi/2}^{3\pi/2}\!\!\int_0^{-2\cos\varphi} 2r^3\,dr\,d\varphi = \int_{\pi/2}^{3\pi/2} 8\cos^4\varphi\,d\varphi = 3\pi,$$
where the last integral can be computed using the method of recurrence from 6.2.6. Now, let us find the centroid. Since the solid is symmetric with respect to the plane y = 0, the y-coordinate of the centroid must be zero.
Then, the remaining coordinates $x_t$ and $z_t$ of the centroid can be computed by the following integrals:
$$x_t = \frac{1}{V}\iiint x\,dx\,dy\,dz = \frac{1}{V}\int_{\pi/2}^{3\pi/2}\!\!\int_0^{-2\cos\varphi}\!\!\int_0^{2r^2} r^2\cos\varphi\,dz\,dr\,d\varphi = \frac{1}{V}\int_{\pi/2}^{3\pi/2}\!\!\int_0^{-2\cos\varphi} 2r^4\cos\varphi\,dr\,d\varphi = -\frac{64}{5V}\int_{\pi/2}^{3\pi/2}\cos^6\varphi\,d\varphi = -\frac{4}{3},$$
where the last integral was computed by 6.2.6 again. Analogously for the z-coordinate of the centroid:
$$z_t = \frac{1}{V}\int_{\pi/2}^{3\pi/2}\!\!\int_0^{-2\cos\varphi}\!\!\int_0^{2r^2} zr\,dz\,dr\,d\varphi = \frac{20}{9}.$$
The coordinates of the centroid are thus $\big[-\frac{4}{3}, 0, \frac{20}{9}\big]$. $\square$

8.1.20. Find the centroid of the homogeneous solid in $\mathbb{R}^3$ which lies between the planes z = 0 and z = 2, bounded by the cones $x^2 + y^2 = z^2$ and $x^2 + y^2 = 2z^2$. Solution. The problem can be solved in the same way as the previous ones; it would be advantageous to work in cylindric coordinates. However, we can notice that the solid in question is an "annular cone": it is formed by cutting a cone $K_1$ with base radius 2 out of a cone $K_2$ with base radius $2\sqrt2$, of common height 2. The centroid of the examined solid can be determined by the "rule of lever": the centroid of a system of two solids is the weighted arithmetic mean of the particular solids' centroids, weighted by the masses of the solids. We found out

First, determine the Jacobi matrix of the transformation $x = r\cos\theta$, $y = r\sin\theta$:
$$D^1G = \begin{pmatrix}\cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta\end{pmatrix}.$$
Hence, the determinant of this matrix is equal to $\det D^1G(r,\theta) = r(\sin^2\theta + \cos^2\theta) = r$. Therefore, the calculation can be done directly for the disc S which is the image of the rectangle $(r,\theta) \in [0,R]\times[0,2\pi] = T$. In this way, the area of the disc is obtained:
$$\iint_S dx\,dy = \int_0^{2\pi}\!\!\int_0^R r\,dr\,d\theta = \int_0^R 2\pi r\,dr = \pi R^2.$$
The integration of the function f is very similar, using multiple integration and integration by parts:
$$\iint_S f(x,y)\,dx\,dy = \int_0^{2\pi}\!\!\int_0^{\pi/2} r\cos r\,dr\,d\theta = \pi^2 - 2\pi.$$
In many real life applications, a much more general approach to integration is needed, which allows for dealing with objects over curves, surfaces, and their higher dimensional analogues.
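Both polar-coordinate integrals just computed are easy to confirm with a one-dimensional midpoint rule, since the $\theta$-integration only contributes a factor of $2\pi$. The following sketch is illustrative (the step count is arbitrary):

```python
import math

# Numerical confirmation of the two polar-coordinate integrals above:
# the Jacobian r gives the disc area pi*R^2, and the "cap" integral pi^2 - 2*pi.

def midpoint(g, a, b, n=20000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

R = 1.0
area = 2.0 * math.pi * midpoint(lambda r: r, 0.0, R)                    # disc area
cap = 2.0 * math.pi * midpoint(lambda r: r * math.cos(r), 0.0, math.pi / 2)

print(area, cap)   # close to pi * R**2 and pi**2 - 2*pi
```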
For many simple cases, such tools can be built now with the help of parametrizations of such k-dimensional surfaces, employing the latter theorem to show the independence of the result of such a parametrization. These topics are postponed to the beginning of the next chapter, where a more general and geometric approach is discussed.

3. Differential equations

In this section, we return to (vector) functions of one variable, defined and examined in terms of their instantaneous changes.

8.3.1. Linear and non-linear difference models. The concept of derivative was introduced in order to work with instantaneous changes of the examined quantities. In the introductory chapter, difference equations based on similar concepts in relation to sequences of scalars were discussed. As a motivating introduction to equations containing derivatives of unknown functions, recall first the difference equations.

in exercise 8.1.18 that the centroid of a homogeneous cone is situated at a quarter of its height. Therefore, the centroids of both cones lie at the same point, and this point thus must be the centroid of the examined solid as well. Hence, the coordinates of the wanted centroid are $\big[0, 0, \frac{3}{2}\big]$. $\square$

8.1.21. Find the volume of the solid in $\mathbb{R}^3$ which is bounded by the cone part $x^2 + y^2 = (z-2)^2$ and the paraboloid $x^2 + y^2 = 4 - z$. Solution. We build the corresponding integral in cylindric coordinates, which evaluates as follows:
$$V = \int_0^{2\pi}\!\!\int_0^1\!\!\int_{r+2}^{4-r^2} r\,dz\,dr\,d\varphi = \frac{5}{6}\pi. \qquad\square$$

8.1.22. Find the volume of the solid in $\mathbb{R}^3$ which lies under the cone $x^2 + y^2 = (z-2)^2$, $z \le 2$, and over the paraboloid $x^2 + y^2 = z$. Solution.
$$V = \int_0^{2\pi}\!\!\int_0^1\!\!\int_{r^2}^{2-r} r\,dz\,dr\,d\varphi = \frac{5}{6}\pi.$$
Note that the considered solid is symmetric with the solid from the previous exercise 8.1.21 (the center of the symmetry is the point [0, 0, 2]). Therefore, it must have the same volume. $\square$

8.1.23.
Find the centroid of the figure bounded by the parabola $y = 4 - x^2$ and the line $y = 0$. ○

8.1.24. Find the centroid of the circular sector corresponding to the angle of 60° that was cut out of a disc with radius 1. ○

8.1.25. Find the centroid of the semidisc $x^2 + y^2 \le 1$, $y \ge 0$. ○

8.1.26. Find the centroid of the circular sector corresponding to the angle of 120° that was cut out of a disc with radius 1. ○

8.1.27. Find the volume of the solid in $\mathbb{R}^3$ which is given by the inequalities $z \ge 0$, $z - x \le 0$, and $(x-1)^2 + y^2 \le 1$. ○

8.1.28. Find the volume of the solid in $\mathbb{R}^3$ which is given by the inequalities $z \ge 0$, $z - y \le 0$. ○

8.1.29. Find the volume of the solid bounded by the surface $3x^2 + 2y^2 + 3z^2 + 2xy - 2yz - 4xz = 1$. ○

The simplest difference equations are formulated as $y_{n+1} = F(y_n, n)$, with a function F of two variables. For example, the model describing interests of deposits or loans (this included the Malthusian model of populations) was considered. The increment was proportional to the value, $y_{n+1} = ay_n$, see 1.2.2. Growth by 5% is represented by a = 1.05. Considering continuous modelling, the same request leads to an equation connecting the derivative y'(t) of a function with its value,
(1) $y'(t) = r\,y(t)$,
with the proportionality constant r. Here, the instantaneous growth by 5% corresponds to r = 0.05. It is easy to guess the solution of the latter equation, i.e. a function y(t) which satisfies the equality identically: $y(t) = Ce^{rt}$ with an arbitrary constant C. This constant can be determined uniquely by choosing the initial value $y_0 = y(t_0)$ at some point $t_0$.

If a part of the increment in a model should be given as a constant independent of the value y or t (like bank charges or the natural decrease of a stock population as a result of sending some part of it to slaughterhouses), an equation with a constant s on the right-hand side can be used:
(2) $y'(t) = r\,y(t) + s$.
The solution of this equation is the function $y(t) = Ce^{rt} - \frac{s}{r}$.
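A quick numerical check of the solution of (2): Euler's method for $y' = ry + s$ should track the closed form $y(t) = (y_0 + \frac{s}{r})e^{rt} - \frac{s}{r}$. This is only an illustrative sketch; the parameter values and step size are our own choices.

```python
import math

# Euler's method for y' = r*y + s versus the closed-form solution.
r, s, y0 = 0.05, 2.0, 10.0
T, steps = 10.0, 100000
h = T / steps

y = y0
for _ in range(steps):
    y += h * (r * y + s)            # one explicit Euler step

exact = (y0 + s / r) * math.exp(r * T) - s / r
print(y, exact)                     # the two values agree closely
```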
It is a straightforward matter to produce this solution when it is realized that the set of all solutions of the equation (1) is a one-dimensional vector space, while the solutions of the equation (2) are obtained by adding any one of its solutions to the solutions of the previous equation. The constant solution $y(t) = k$ for $k = -\frac{s}{r}$ is easily found.

Similarly, in paragraph 1.4.1, the logistic model of population growth was created, based on the assumption that the ratio of the change of the population size, $p(n+1) - p(n)$, to its size p(n) is affine with respect to the population size itself. The model behaves similarly to the Malthusian one for small values of the population size and ceases growing when reaching a limit value K. Now, the same relation for the continuous model can be formulated for a population p(t) dependent on time t by the equality
(3) $p'(t) = p(t)\big(-\frac{r}{K}p(t) + r\big)$.
At the value p(t) = K for a (large) constant K, the instantaneous increment of the function p is zero, while for p(t) \gt 0 near zero, the ratio of the rate of increment of the population to its size is close to r, which is the (small) number expressing the rate of increment of the population in good conditions (e.g. r = 0.05 would again mean immediate growth by 5%).

It is not easy to solve such an equation without knowing any theory (although this type of equations will be dealt with in a moment). However, as an exercise on differentiation, it

8.1.30. Find the volume of the part of $\mathbb{R}^3$ lying inside the ellipsoid $2x^2 + y^2 + z^2 = 6$ and in the half-space $x \ge 1$. ○

8.1.31. The area of the graph of a real-valued function f(x,y) in variables x and y. The area of the graph of a function of two variables over an area S in the plane xy is given by the integral
$$\iint_S \sqrt{1 + f_x^2 + f_y^2}\,dx\,dy.$$
Considering the cone $x^2 + y^2 = z^2$, find the area of the part of its lateral surface which lies above the plane z = 0 and inside the cylinder $x^2 + y^2 = y$. Solution.
The wanted area can be calculated as the area of the graph of the function $z = \sqrt{x^2 + y^2}$ over the disc $K : x^2 + \big(y - \frac12\big)^2 \le \frac14$. We can easily see that
$$f_x = \frac{x}{\sqrt{x^2+y^2}},\qquad f_y = \frac{y}{\sqrt{x^2+y^2}},$$
so the area is expressed by the integral
$$\iint_K \sqrt{1 + f_x^2 + f_y^2}\,dx\,dy = \iint_K \sqrt2\,dx\,dy = \sqrt2\int_0^{\pi}\!\!\int_0^{\sin\varphi} r\,dr\,d\varphi = \frac{\sqrt2}{2}\int_0^{\pi}\sin^2\varphi\,d\varphi = \frac{\sqrt2}{4}\pi. \qquad\square$$

8.1.32. Find the area of the part of the paraboloid $z = x^2 + y^2$ over the disc $x^2 + y^2 \le 4$. ○

8.1.33. Find the area of the part of the plane $x + 2y + z = 10$ that lies over the figure given by $(x-1)^2 + y^2 \le 1$ and $y \ge x$. ○

In the following exercise, we will also apply our knowledge of the theory of Fourier transforms from the previous chapter.

8.1.34. Fourier transform and diffraction. Light intensity is a physical quantity which expresses the transmission of energy by waves. The intensity of a general light wave is defined as the time-averaged magnitude of the Poynting vector, which is the vector product of the mutually orthogonal vectors of the electric and magnetic fields. A monochromatic plane wave spreading in the direction of the x-axis satisfies $I = c\varepsilon_0\overline{E_y^2}$, where c is the speed of light and $\varepsilon_0$ is the vacuum permittivity. The monochromatic wave is described by the harmonic function $E_y = \psi(x,t) = A\cos(\omega t - kx)$. The number A is the

is easily verified that the following function is a solution for every constant C:
$$p(t) = \frac{K}{1 + CKe^{-rt}}.$$
For the continuous and discrete versions of the logistic models, the values K = 100, r = 0.05, and C = 1 are chosen in the left hand illustration. The same result as in 1.4.1 occurs in the right hand illustration (i.e. with a = 1.05 and $p_1$ = 1, as expected). The choice C = 1 yields $p(0) = K/(1+K)$, which is very close to 1 if K is large enough. In particular, both versions of this logistic model yield quite similar results. For example, the left hand illustration also contains the dashed line of the graph of the solution of the equation (1) with the same constant r and initial condition (i.e. the Malthusian model of growth).

8.3.2. First-order differential equations.
By an (ordinary) first-order differential equation is usually meant the relation between the derivative y'(t) of a function with respect to the variable t, its value y(t), and the variable itself, which can be written in terms of some real-valued function $F : \mathbb{R}^3 \to \mathbb{R}$ as the equality
$$F(y'(t), y(t), t) = 0.$$
This equation resembles the implicitly defined functions y(t); however, this time there is a dependency on the derivative of the function y(t). We also often suppress the dependence of y = y(t) on the other variable t and write F(y', y, t) = 0 instead.

If the implicit equation is solved at least explicitly with regard to the derivative, i.e. $y' = f(t,y)$ for some function $f : \mathbb{R}^2 \to \mathbb{R}$, it is clear graphically what this equation defines. For every value (t, y) in the plane, the arrow corresponding to the vector (1, f(t,y)) can be considered. That is the velocity with which the point of the graph of the solution moves through the plane, depending on the free parameter t. For instance, the equation (3) in the previous subsection determines the following (illustrating the solution for the initial condition as above):

maximal amplitude of the wave, $\omega$ is the angular frequency, and for any fixed t, the so-called wavelength $\lambda$ is the prime period. The number $k = \frac{2\pi}{\lambda}$ then describes the spatial frequency with which the wave propagates. We have
$$I = c\varepsilon_0\frac{1}{T}\int_0^T E_y^2\,dt = c\varepsilon_0\frac{1}{T}\int_0^T A^2\cos^2(\omega t - kx)\,dt = c\varepsilon_0 A^2\frac{1}{T}\int_0^T \frac{1 + \cos(2(\omega t - kx))}{2}\,dt = \frac{1}{2}c\varepsilon_0 A^2\Big(1 + \frac{\sin(2(\omega T - kx)) - \sin(-2kx)}{2\omega T}\Big) \approx \frac{1}{2}c\varepsilon_0 A^2.$$
The second term in the parentheses can be neglected, since its magnitude is always less than $\frac{1}{\omega T}$, which is below $10^{-6}$ for real detectors of light, so it is much inferior to 1. The light intensity is thus directly proportional to the squared amplitude.
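The key averaging step — that $\cos^2$ averages to $\frac12$ over a whole number of periods — can be verified numerically. The sketch below uses an illustrative unit-frequency wave ($\omega = 2\pi$, arbitrary k, x, A), not the physical values:

```python
import math

# Time average of A^2 * cos^2(w*t - k*x) over 100 full periods, by midpoint rule.
w, k, x, A = 2.0 * math.pi, 1.0, 0.3, 1.0
T, n = 100.0, 200000            # 100 periods, 2000 samples per period
h = T / n
avg = sum(A * A * math.cos(w * ((i + 0.5) * h) - k * x) ** 2
          for i in range(n)) * h / T
print(avg)                      # close to A**2 / 2
```

Over an integer number of periods the oscillating correction term vanishes exactly, which is why the average comes out as $\frac{A^2}{2}$ to machine precision here.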
A diffraction is such a deviation from the straight-line propagation of light which cannot be explained as the result of a refraction or reflection (or the change of the ray's direction in a medium with continuously varying refractive index). The diffraction can be observed when a light beam propagates through a bounded space. The diffraction phenomena are strongest and easiest to see if the light goes through openings or obstacles whose size is roughly the wavelength of the light. In the case of the Fraunhofer diffraction, with which we will deal in the following example, a monochromatic plane wave goes through a very thin rectangular opening and projects on a distant surface. For instance, we can highlight a spot on the wall with a laser pointer. The image we get is the Fourier transform of the function describing the permeability of the shade (i.e. of the opening).

Let us choose the plane of the diffraction shade as the coordinate plane z = 0. Let a plane wave $A\exp(ikz)$ (independent of the point (x, y) of landing on the shade) hit this plane perpendicularly. Let s(x, y) denote the function of the permeability of the shade; then the resulting wave falling onto the projection surface at a point $(\xi, \eta)$ can be described as the integral sum of the waves (Huygens–Fresnel principle) which have gone through the shade and propagate through the medium from all the points (x, y, 0) (as spherical waves) into the point $(\xi, \eta, z)$:

[illustration: the direction field of the logistic equation (3)]

Such illustrations should invoke the idea that differential equations define a "flow" in the plane, and each choice of the initial value $(t_0, y(t_0))$ should correspond to a unique flow-line expressing the movement of the initial point in the time t.
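Returning to the logistic equation (3): the closed-form solution quoted in this section, $p(t) = \frac{K}{1 + CKe^{-rt}}$, can be checked against (3) by central differences. The values K, r, C are those from the text; the step size and sample points are our own choices.

```python
import math

# Verify that p(t) = K / (1 + C*K*e^(-r*t)) satisfies p'(t) = p(t)*(r - (r/K)*p(t)).
K, r, C = 100.0, 0.05, 1.0

def p(t):
    return K / (1.0 + C * K * math.exp(-r * t))

h = 1e-6
max_err = max(
    abs((p(t + h) - p(t - h)) / (2 * h) - p(t) * (r - r / K * p(t)))
    for t in [0.0, 10.0, 50.0, 100.0, 200.0]
)
print(max_err)   # numerically zero
```

Note also that `p(0)` equals K/(1+K), as stated for the choice C = 1.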
It can be anticipated intuitively that for reasonably behaved functions f(t, y) in the equations y' = f(t, y), there is a unique solution for all initial conditions.

8.3.3. Integration of differential equations. Before examining the conditions for the existence and uniqueness of the solutions, we present a truly elementary method for finding the solutions. The idea, mentioned briefly already in 6.2.14 on page 406, is to transform the problem to ordinary integration, which usually leads to an implicit description of the solution.

Equations with separated variables

Consider a differential equation in the form
(1) $y' = f(t)\cdot g(y)$
for two continuous functions of a real variable, f and g. The solution of this equation can be obtained by integration, finding the antiderivatives
$$G(y) = \int\frac{dy}{g(y)},\qquad F(t) = \int f(t)\,dt.$$
This procedure reliably finds the solutions y(t) which satisfy $g(y(t)) \ne 0$, given implicitly by the formula
(2) $F(t) + C = G(y)$
with an arbitrary constant C. Differentiating the latter equation (2) using the chain rule for the composite function G(y(t)) leads to $\frac{1}{g(y)}y'(t) = f(t)$, as required.

As an example, find the solution of the equation y' = ty. Direct calculation gives $\ln|y(t)| = \frac12 t^2 + C$ with an arbitrary constant C. Hence it looks (at least for positive values of y) as $y(t) = e^{\frac12 t^2 + C} = D\,e^{\frac12 t^2}$, where D is an arbitrary positive constant. It is helpful to examine the resulting formula and signs thoroughly. The constant solution y(t) = 0 also satisfies the equation. For negative values of y, the same solution can be used with negative

$$\psi(\xi, \eta) = A\iint_{\mathbb{R}^2} s(x, y)\,e^{-ik(\xi x + \eta y)}\,dx\,dy.$$
For the rectangular opening with sides p and q,
$$\psi(\xi, \eta) = A\int_{-p/2}^{p/2}\!\!\int_{-q/2}^{q/2} e^{-ik(\xi x + \eta y)}\,dy\,dx = A\int_{-p/2}^{p/2} e^{-ik\xi x}\,dx\int_{-q/2}^{q/2} e^{-ik\eta y}\,dy = A\Big[\frac{e^{-ik\xi x}}{-ik\xi}\Big]_{-p/2}^{p/2}\Big[\frac{e^{-ik\eta y}}{-ik\eta}\Big]_{-q/2}^{q/2} = A\,\frac{2\sin(k\xi p/2)}{k\xi}\cdot\frac{2\sin(k\eta q/2)}{k\eta} = Apq\,\frac{\sin(k\xi p/2)}{k\xi p/2}\cdot\frac{\sin(k\eta q/2)}{k\eta q/2}.$$

[illustration: the graph of the function $\frac{\sin x}{x}$, followed by the graph of $\psi$ as a product of two such factors]
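The closed form $Apq\,\mathrm{sinc}\cdot\mathrm{sinc}$ derived above can be compared against a direct numerical evaluation of the aperture integral. The integral factors into two one-dimensional ones, each handled by a midpoint rule; all parameter values below are illustrative.

```python
import cmath, math

# Numerical check of the Fraunhofer pattern of a p x q rectangular opening.

def line_integral(c, half, n=4000):
    """Midpoint rule for the integral of e^{-i*c*x} over [-half, half]."""
    h = 2.0 * half / n
    return h * sum(cmath.exp(-1j * c * (-half + (i + 0.5) * h))
                   for i in range(n))

def sinc(u):
    return 1.0 if u == 0.0 else math.sin(u) / u

A, p, q, k = 2.0, 1.0, 0.5, 10.0
xi, eta = 0.3, -0.2

numeric = A * line_integral(k * xi, p / 2) * line_integral(k * eta, q / 2)
closed = A * p * q * sinc(k * xi * p / 2) * sinc(k * eta * q / 2)
print(abs(numeric - closed))   # numerically zero
```

At the center $\xi = \eta = 0$ the integral reduces to $Apq$, consistent with the intensity $I_0 = A^2p^2q^2$ discussed below.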
In fact, the constant D can be arbitrary, and a solution is found satisfying any initial value. i i \ \ WW \~W-/ /////) \ 1 \ l \ \\ \ N-"-/ / / / / / I I i I, i \ \ \ \ \-w /III!! \ \ \ \ \ \ \ \\^// I ! I I I W W \ W \ / / / / WWW \N<—/ / / / \ \ W \ W^-a-sS// III. \ \ \ \ N\\v-----/ r ill \ \ \\\\x^------/ / / / / \ \ \ \ \ \ \-"'//111 II \\ \ \ \ \ \ w-"-// /////// \ \ \ \ \ W-r—// I I I I I \ \ \ W\\\ y—///////. \ \\ \ \ \ \w"—s/r 11 1 ill \ \\\ // / ill \ \ \\ WW-----"S//ill \ \ \\N w------"/' / / / / .'X---------^^x\\\ \ \ / / f I ///---'-—-^;\\ A \ \ I 1111 I I //S-^---NW \ \ \\\ \ I ill I I I / /,---->v.\.\ \ \ \ \\, \ I / I I I I I / --x \ \ \ \ \ \ \ \ /Iff//// /•—\ \ \ \ \ \ \ \ 1 II) il'lll /.-—-\\ \ \ \ \ \ ill wii 11 \ \ \ \ \ \ / / SSSSS'^--------\ / / / /////'—\ \ \ I I I I I / //--j— ss\ \ \ \ \ \ //////>/'—fcs\\s\\\\ WW I I I 11 n \ \ \ \ \ \ \ W I! If 11 WW \ i \ WW I! I I//---W\ \ \ vV 1 Iii///// /---\ wiim fin//1\ \ \ \ i The illustration shows two solutions which demonstrate the instability of the equation with regard to the initial values: For every to, if we change a small yo from a negative value to a positive one, then the behaviour of the resulting solution changes dramatically. Notice the constant solution y(t) = 0, which satisfies the initial condition y(t0) =0. Using separation of variables, the non-linear equation is easily solved from the previous paragraph which describes the logistic population model. Try this as an exercise. 8.3.4. First order linear equations. In the first chapter, we paid much attention to linear difference equations. Their general solution was determined in paragraph 1.2.2 on page 11. Although it is clear beforehand that it is a one-dimensional affine space of sequences, it is a hardly transparent sum, because all the changing coefficients need to be taken into account. 
Consequently, this can be used as a source of inspiration for the following construction of the solution of a general first-order linear equation
(1) $y' = a(t)y + b(t)$
with continuous coefficients a(t) and b(t). First, find the solution of the homogeneous equation $y' = a(t)y$. This can be computed easily by separation of variables, obtaining the solution with $y(t_0) = y_0$:
$$y(t) = y_0F(t, t_0),\qquad F(t,s) = e^{\int_s^t a(x)\,dx}.$$
In the case of difference equations, the solution of the general non-homogeneous equation was "guessed", and then it was proved by induction that it was correct. It is even simpler now, as it suffices to differentiate the correct solution to verify the statement, once we are told what the right result is:

The solution of first-order linear equations

The solution of the equation (1) with the initial value $y(t_0) = y_0$ is (locally in a neighbourhood of $t_0$) given by the formula
$$y(t) = y_0F(t, t_0) + \int_{t_0}^t F(t,s)\,b(s)\,ds,\qquad\text{where } F(t,s) = e^{\int_s^t a(x)\,dx}.$$

And the diffraction we are describing:

Verify the correctness of the solution by yourselves (pay proper attention to the differentiation of the integral, where t is

[illustration: the diffraction pattern of the rectangular opening, intensity plotted against the angle in radians]

Since $\lim_{x\to0}\frac{\sin x}{x} = 1$, the intensity at the middle of the image is directly proportional to $I_0 = A^2p^2q^2$. The Fourier transform can be easily scrutinized if we aim a laser pointer through a subtle opening between the thumb and the index finger; it will be the image of the function of its permeability. The image of the last picture can be seen if we create a good rectangular opening by, for instance, gluing together some stickers with sharp edges.

J. First-order differential equations

8.J.1. Find all solutions of the differential equation
$$y' = \frac{(1 + \cos^2 x)\sqrt{1-y^2}}{\cos^2 x}.$$
Solution. We are given an ordinary first-order differential equation in the form y' = f(x,y), which is called an explicit form of the equation. Moreover, we can write it as $y' = f_1(x)\cdot f_2(y)$ for continuous univariate functions $f_1$ and $f_2$ (on certain open intervals), i.
e., it is a differential equation with separated variables. First, we replace y' with dy/dx and rewrite the differential equation in the form
$$\frac{dy}{\sqrt{1-y^2}} = \frac{1 + \cos^2 x}{\cos^2 x}\,dx.$$
Since
$$\int\frac{1 + \cos^2 x}{\cos^2 x}\,dx = \int\Big(\frac{1}{\cos^2 x} + 1\Big)dx,$$
we can integrate using the basic formulae, thereby obtaining
(1) $\arcsin y = \operatorname{tg} x + x + C$, $C \in \mathbb{R}$.
However, we must keep in mind that the division by the expression $\sqrt{1-y^2}$ is valid only if it is non-zero, i. e., only for $y \ne \pm1$. Substituting the constant functions y = 1, y = −1 into the given differential equation, we can immediately see that they satisfy it. We have thus obtained two more solutions, both

in the upper bound and a free parameter in the integrand, cf. 6.3.14). In fact, there is the general method called variation of constants which directly yields this solution, see e.g. the problem 8.1.9. It consists in taking the solution of the homogeneous equation in the form $y(t) = cF(t, t_0)$ and considering instead an ansatz for a solution to the non-homogeneous equation in the form $y(t) = c(t)F(t, t_0)$ with an unknown function c(t). Differentiating yields the equation $c' = e^{-\int_{t_0}^t a(x)\,dx}\,b(t)$, and integrating this leads to $c(t) = \int_{t_0}^t e^{-\int_{t_0}^s a(x)\,dx}\,b(s)\,ds$, i.e. $y(t) = c(t)\,e^{\int_{t_0}^t a(x)\,dx}$ as in the above formula. Check the details! Notice also the similarity to the solution for the equations with constant coefficients explicitly computed in the form of a convolution in ?? on the page ??, which could serve as inspiration, too.

As an example, the equation $y' = 1 - xy$ can be solved directly, this time encountering stable behaviour, visible in the following illustration.

8.3.5. Transformation of coordinates. The illustrations suggest that differential equations can be perceived as geometric objects (the "directional field of the arrows"), so the solution can be found by conveniently chosen coordinates. We return to this point of view later. Here are three simple examples of typical tricks as seen from the explicit form of the equations in coordinates.
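The variation-of-constants formula can be exercised on a concrete case. Assuming the illustrative choice a(t) = −1, b(t) = t, $y_0 = 0$ (for which the exact solution $y = t - 1 + e^{-t}$ is classical), the boxed formula reduces to the single quadrature $y(t) = \int_0^t e^{-(t-s)}\,s\,ds$:

```python
import math

# Evaluate the variation-of-constants formula for y' = -y + t, y(0) = 0,
# where F(t, s) = e^{-(t-s)}, by a midpoint quadrature in s.

def y_formula(t, n=200000):
    h = t / n
    acc = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        acc += math.exp(-(t - s)) * s
    return h * acc                      # the y0 * F(t, 0) term vanishes here

t = 2.0
approx = y_formula(t)
exact = t - 1.0 + math.exp(-t)
print(approx, exact)                    # the quadrature reproduces the exact solution
```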
We begin with homogeneous equations of the form

y' = f\Bigl(\frac{y}{t}\Bigr).

Considering the transformation z = \frac{y}{t} and assuming that t \ne 0, then by the chain rule,

z'(t) = \frac{1}{t^2}\bigl(t\,y'(t) - y(t)\bigr) = \frac{1}{t}\bigl(f(z) - z\bigr),

which is an equation with separated variables. Other examples are the Bernoulli differential equations, which are of the form

y'(t) = f(t)\,y(t) + g(t)\,y(t)^n,

which are called singular. We do not have to pay attention to the case \cos x = 0, since this only loses points of the domains (but not any solutions). Now, we comment on several parts of the computation. The expression y' = dy/dx allows us to make many symbolic manipulations. For instance, we have

\frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx}, \qquad \frac{dy}{dx} = \Bigl(\frac{dx}{dy}\Bigr)^{-1}.

The validity of these two formulae is actually guaranteed by the chain rule theorem and the theorem for differentiating an inverse function, respectively. It was just the facility of the manipulations that inspired G. W. Leibniz to introduce this notation, which has been in use up to now. Further, we should realize why we have not written the general solution (1) in the suggestive form

(2) y = \sin(\tg x + x + C),  C \in \mathbb{R}.

As we will not mention the domains of differential equations (i. e., for which values of x the expressions are well-defined), we will not change them by "redundant" simplifications, either. It is apparent that the function y from (2) is defined for all x \in (0,\pi)\setminus\{\pi/2\}. However, for the values of x which are close to \pi/2 (having fixed C), there is no y satisfying (1). In general, the solutions of differential equations are curves which may not be expressible as graphs of elementary functions (on the whole intervals where we consider them). Therefore, we will not even try to do that. □

8.J.2. Find the general solution of the equation y' = (2 - y)\tg x.

Solution. Again, we are given a differential equation with separated variables. We have

\frac{dy}{dx} = (2 - y)\tg x,
\frac{dy}{y - 2} = -\frac{\sin x}{\cos x}\,dx,
\ln|y - 2| = \ln|\cos x| + \ln|C|,  C \ne 0.
Here, the shift obtained from the integration has been expressed by \ln|C|, which is very advantageous (bearing in mind what we want to do next), especially in those cases when we obtain a logarithm on both sides of the equation. Further, we have

where n \ne 0, 1. The choice of the transformation z = y^{1-n} leads to the equation

z'(t) = (1-n)\,y(t)^{-n}\bigl(f(t)y(t) + g(t)y(t)^n\bigr) = (1-n)f(t)\,z(t) + (1-n)g(t),

which is a linear equation, easily integrated. We conclude with the extraordinarily important Riccati equation. It is a form of the Bernoulli equation with n = 2, extended by an absolute term:

y'(t) = f(t)y(t) + g(t)y(t)^2 + h(t).

This equation can also be transformed to a linear equation provided that a particular solution x(t) can be guessed. Then, use the transformation

z(t) = \frac{1}{y(t) - x(t)}.

Verify by yourselves that this transformation leads to the equation

z'(t) = -\bigl(f(t) + 2x(t)g(t)\bigr)z(t) - g(t).

As seen in the case of integration of functions (the simplest type of equations with separated variables), the equations usually do not have a solution expressible explicitly in terms of elementary functions. As with standard engineering tables of values of special functions, books listing the solutions of basic equations are compiled as well.6 Today, the wisdom concealed in them is essentially transferred to software systems like Maple or Mathematica. Here, any task about ordinary differential equations can be assigned, with results obtained in surprisingly many cases. Yet, explicit solutions are not possible for most problems.

8.3.6. Existence and uniqueness. The way out of this is numerical methods, which try only to approximate the solutions. However, to be able to use them, good theoretical starting points are still needed regarding existence, uniqueness, and stability of the solutions. We begin with the Picard–Lindelöf theorem:

Existence and uniqueness of the solutions of ODEs

Theorem. Consider a function f(t,y): \mathbb{R}^2 \to \mathbb{R} with continuous partial derivatives on an open set U.
Then for every point (t_0, y_0) \in U \subset \mathbb{R}^2, there exists a maximal interval I = [t_0 - a, t_0 + b], with positive a, b \in \mathbb{R}, and a unique function y(t): I \to \mathbb{R} which is the solution of the equation y' = f(t,y) on the interval I.

6 For example, the famous book Differentialgleichungen reeller Funktionen, Akademische Verlagsgesellschaft, Leipzig 1930, by E. Kamke, a German mathematician, contains many hundreds of solved equations. It appeared in many editions in the last century.

\ln|y - 2| = \ln|C\cos x|,  C \ne 0,
|y - 2| = |C\cos x|,  C \ne 0,
y - 2 = C\cos x,  C \ne 0,

where we should write \pm C (after removing the absolute value). However, since we consider all non-zero values of C, it makes no difference whether we write +C or -C. We should pay attention to the fact that we have made a division by the expression y - 2. Therefore, we must examine the case y = 2 separately. The derivative of a constant function is zero, so we have found another solution, y = 2. However, this solution is not singular, since it is contained in the general solution as the case C = 0. Thus, the correct result is

y = 2 + C\cos x,  C \in \mathbb{R}. □

8.J.3. Find the solution of the differential equation (1 + e^x)\,y y' = e^x which satisfies the initial condition y(0) = 1.

Solution. If the functions f: (a,b) \to \mathbb{R} and g: (c,d) \to \mathbb{R} are continuous and g(y) \ne 0, y \in (c,d), then the initial problem y' = f(x)g(y), y(x_0) = y_0 has a unique solution for any x_0 \in (a,b), y_0 \in (c,d). This solution is determined implicitly as

\int_{y_0}^{y} \frac{dt}{g(t)} = \int_{x_0}^{x} f(t)\,dt.

In practical problems, we first find all solutions of the equation and then select the one which satisfies the initial condition. Let us compute:

(1 + e^x)\,y\,\frac{dy}{dx} = e^x,
y\,dy = \frac{e^x}{1 + e^x}\,dx,
\frac{y^2}{2} = \ln(1 + e^x) + \ln|C|,  C \ne 0,
\frac{y^2}{2} = \ln\bigl(C\,[1 + e^x]\bigr),  C > 0.

The substitution y = 1, x = 0 then gives \frac{1}{2} = \ln(2C), i.e. C = \frac{\sqrt{e}}{2}. We have thus found the solution

\frac{y^2}{2} = \ln\Bigl(\frac{\sqrt{e}}{2}\,[1 + e^x]\Bigr), i. e., y = \sqrt{2\ln\Bigl(\frac{\sqrt{e}}{2}\,[1 + e^x]\Bigr)}.

Proof.
If a differentiable function y(t) is a solution of an equation satisfying the initial condition y(t_0) = y_0, then it also satisfies the equation

y(t) = y_0 + \int_{t_0}^t y'(s)\,ds = y_0 + \int_{t_0}^t f(s, y(s))\,ds,

where the Riemann integrals exist due to the continuity of f and hence also of y'. However, the right-hand side of this expression is the integral operator

L(y)(t) = y_0 + \int_{t_0}^t f(s, y(s))\,ds

acting on functions y. Solving first-order differential equations is equivalent to finding fixed points of this operator L, that is, to finding a function y = y(t) satisfying L(y) = y. On the other hand, if a Riemann-integrable function y(t) is a fixed point of the operator L, then it immediately follows from the fundamental theorem of calculus that y(t) satisfies the given differential equation, including the initial conditions. It is easy to estimate how much the values L(y) and L(z) differ for various functions y(t) and z(t). Since both partial derivatives of f are continuous, f is itself locally Lipschitz. This means that, restricting the values of f(t,y) to a neighbourhood U of the point (t_0, y_0) with compact closure, there is the estimate

|f(t,y) - f(t,z)| \le C\,|y - z|

with a suitable constant C. Now compute (assuming t > t_0, but the final conclusion works for t < t_0 the same way)

|(L(y) - L(z))(t)| = \Bigl|\int_{t_0}^t \bigl(f(s,y(s)) - f(s,z(s))\bigr)\,ds\Bigr| \le \int_{t_0}^t |f(s,y(s)) - f(s,z(s))|\,ds
\le C \int_{t_0}^t |y(s) - z(s)|\,ds \le C\,\bigl(\max_{t_0 \le s \le t} |y(s) - z(s)|\bigr)\,|t - t_0| = D\,\bigl(\max_{t_0 \le s \le t} |y(s) - z(s)|\bigr),

with D = C\,|t - t_0|. □

8.J.4. Find the solution of the differential equation

y' = \frac{y^2 + 1}{x + 1}

which satisfies y(0) = 1.

Solution. Similarly to the previous example, we get

\frac{dy}{y^2 + 1} = \frac{dx}{x + 1},
\arctan y = \ln|x + 1| + C,  C \in \mathbb{R}.

The initial condition (i. e., the substitution x = 0 and y = 1) gives \arctan 1 = \ln|1| + C, i. e., C = \frac{\pi}{4}. Therefore, the solution of the given initial problem is the function

y(x) = \tg\bigl(\ln(x+1) + \tfrac{\pi}{4}\bigr)

on a neighbourhood of the point [0,1]. □

8.J.5. Solve

(1) y' = \frac{x + y + 1}{2x + 2y - 1}.

Solution. Let a function f: (a,b) \times (c,d) \to \mathbb{R} have continuous second-order partial derivatives and f(x,y) \ne 0, x \in (a,b), y \in (c,d).
Then, the differential equation y' = f(x,y) can be transformed to an equation with separated variables if and only if

\begin{vmatrix} f(x,y) & f'_y(x,y) \\ f'_x(x,y) & f''_{xy}(x,y) \end{vmatrix} = 0,  x \in (a,b),  y \in (c,d).

With a bit of effort, it can be shown that a differential equation of the form y' = f(ax + by + c) can be transformed to an equation with separated variables, and this can be done by the substitution z = ax + by + c. Let us emphasize that the variable z replaces y. We thus set z = x + y, which gives z' = 1 + y'. Substitution into (1) yields

z' - 1 = \frac{z + 1}{2z - 1},
\frac{dz}{dx} = \frac{z + 1}{2z - 1} + 1 = \frac{3z}{2z - 1},
\frac{2z - 1}{3z}\,dz = dx,
\frac{2}{3}z - \frac{1}{3}\ln|z| = x + C,  C \in \mathbb{R},

must leave the chosen space of functions y invariant, i.e. the images L(y) are also there. To begin, choose \varepsilon > 0 and \delta > 0, both small enough so that [t_0 - \delta, t_0 + \delta] \times [y_0 - \varepsilon, y_0 + \varepsilon] \subset U, and consider only those functions y(t) which satisfy, for J = [t_0 - \delta, t_0 + \delta], the estimate \max_{t \in J} |y(t) - y_0| \le \varepsilon. The uniform continuity of f(t,y) on U ensures that fixing \varepsilon and further shrinking \delta implies

\max_{t \in J} |L(y)(t) - y_0| \le \varepsilon.

Finally, the above estimate for \|L(y) - L(z)\| shows that if \delta is decreased sufficiently further, then the latter constant D becomes smaller than one, as required for a contraction. At the same time, L maps the above space of functions into itself. However, for the assumptions of the Banach contraction theorem, which guarantees the uniquely determined fixed point, completeness of the space X of functions on which the operator L works is needed. Since the mapping f(t,y) is continuous, there follows a uniform bound for all of the functions y(t) considered above and the values t > s in their domain:

|L(y)(t) - L(y)(s)| \le \int_s^t |f(r, y(r))|\,dr \le A\,|t - s|

with a universal constant A > 0. Besides the conditions mentioned above, there is a restriction to the subset of all equicontinuous functions in the sense of the Definition 7.3.15.
According to the Arzelà–Ascoli theorem proved in the same paragraph on page 499, this set of continuous functions is already compact, hence it is a complete set of continuous functions on the interval. Therefore, there exists a unique fixed point y(t) of this contraction L by the Theorem 7.3.9. This is the solution of the equation. It remains to show the existence of a maximal interval I = [t_0 - a, t_0 + b]. Suppose that a solution y(t) is found on an interval (t_0, t_1), and, at the same time, the one-sided limit y_1 = \lim_{t \to t_1-} y(t) exists and is finite. It follows from the already proven result that there exists a solution with the initial condition (t_1, y_1) in some neighbourhood of the point t_1. Clearly, it must coincide with the discussed solution y(t) on the left-hand side of t_1. Therefore, the solution y(t) can be extended to the right of t_1. There are only two possibilities when the extension of the solution beyond t_1 does not exist: either there is no finite left limit of y(t) at t_1, or the limit y_1 exists, yet the point (t_1, y_1) is on the boundary of the domain of the function f. In both cases, the maximal extension of the solution to the right of t_0 is found. The argumentation for the maximal solution left of t_0 is analogous. □

\frac{2}{3}z - \frac{1}{3}\ln|Cz| = x,  C \ne 0.

Now, we must get back to the original variable y in one of these forms. The general solution can be written as

\frac{2}{3}(x + y) - \frac{1}{3}\ln|x + y| = x + C,  C \in \mathbb{R},

i. e.,

x - 2y + \ln|x + y| = C,  C \in \mathbb{R}.

At the same time, we have the singular solution y = -x, which follows from the constraint z \ne 0 of the operations we have made (we have divided by the value 3z). □

8.J.6. Solve the differential equation x y' + y \ln x = y \ln y.

Solution. Using the substitution u = y/x, every homogeneous differential equation y' = f(y/x) can be transformed to an equation (with separated variables)

u' = \frac{1}{x}\bigl(f(u) - u\bigr),  i.e.  u'x + u = f(u).
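The substitution u = y/x can be tested numerically: integrating y' = f(y/x) directly must agree with integrating the transformed equation u' = (f(u) - u)/x and scaling back by x. A sketch with f(u) = u\ln u, which is the explicit form of the equation in 8.J.6 after dividing by x (the initial value y(1) = e^2, the interval and the step counts are my own illustrative assumptions):

```python
import math

def rk4(f, x0, y0, x1, n=2000):
    """Classical fourth-order Runge-Kutta integration of y' = f(x, y)."""
    h, x, y = (x1 - x0) / n, x0, y0
    for _ in range(n):
        k1 = f(x, y); k2 = f(x + h/2, y + h*k1/2)
        k3 = f(x + h/2, y + h*k2/2); k4 = f(x + h, y + h*k3)
        y += h * (k1 + 2*k2 + 2*k3 + k4) / 6
        x += h
    return y

F = lambda u: u * math.log(u)                                 # f(u) = u ln u
y2 = rk4(lambda x, y: F(y / x), 1.0, math.e**2, 2.0)          # original equation
u2 = rk4(lambda x, u: (F(u) - u) / x, 1.0, math.e**2, 2.0)    # transformed equation
assert abs(y2 - 2.0 * u2) < 1e-6 * abs(y2)  # y(2) = 2 * u(2)
```

For this choice, separating variables gives \ln u - 1 = Cx with C = 1, i.e. u = e^{x+1}, so u(2) should be e^3; the numeric value confirms this.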
The name of this differential equation comes from the following definition. A function f of two variables is called homogeneous of degree k iff

f(tx, ty) = t^k f(x,y).

Then, a differential equation of the form

P(x,y)\,dx + Q(x,y)\,dy = 0

is a homogeneous differential equation iff the functions P and Q are homogeneous of the same degree k. For instance, we can discover that the given equation

x\,dy + (y\ln x - y\ln y)\,dx = 0

is homogeneous. Of course, it is not difficult to write it explicitly in the form

y' = \frac{y}{x}\,\ln\frac{y}{x}.

The substitution u = y/x then leads to

u'x + u = u\ln u,
x\,\frac{du}{dx} = u(\ln u - 1),

8.3.7. Iterative approximations of solutions. The proof of the previous theorem can be reformulated as an iterative procedure which provides approximate solutions using step-by-step integration. Moreover, an explicit estimate for the constant C from the proof yields bounds for the errors. Think this out as an exercise (see the proof of the Banach fixed-point theorem in paragraph 7.3.9). It can then be shown easily and directly that it is a uniformly convergent sequence of continuous functions, so the limit is again a continuous function (without invoking the complicated theorems from the seventh chapter).

Picard's approximations

Theorem. The unique solution of the equation y' = f(t,y), whose right-hand side f has continuous partial derivatives, can be expressed, on a sufficiently small interval, as the limit of step-by-step iterations beginning with the constant function (Picard's approximation):

y_0(t) = y_0,  y_{n+1}(t) = L(y_n),  n = 0, 1, \dots.

It is a uniformly converging sequence of differentiable functions with differentiable limit y(t).

\frac{du}{u(\ln u - 1)} = \frac{dx}{x}.

Only the Lipschitz condition is needed for the function f, so the latter two theorems are true with this weaker assumption as well. It is seen in the next paragraph that continuity of the function f guarantees the existence of the solution. Yet it is insufficient for the uniqueness.

8.3.8. Ambiguity of solutions.
We begin with a simple example. Consider the equation

y' = \sqrt{|y|}.

Separating the variables, the solution is y(t) = \frac{1}{4}(t + C)^2 for positive values of y, with an arbitrary constant C and t + C > 0. For the initial values (t_0, y_0) with y_0 > 0, this is an assignment matching the previous theorem, so there is locally exactly one solution. The solution must apparently remain non-decreasing; hence for negative values y_0, the solution is the same, only with the opposite sign and t + C < 0. However, for the initial condition (t_0, y_0) = (t_0, 0), there is not only the already discussed solution continuing to the left of t_0 and to the right, but also the identically zero solution y(t) = 0. Therefore, these two branches can be glued arbitrarily (see the diagram, where the thick solution can be continued along the t axis and branch along the parabola at any value t).

where u(\ln u - 1) \ne 0. Using another substitution, namely t = \ln u - 1, we can integrate:

\int \frac{du}{u(\ln u - 1)} = \int \frac{dt}{t},
\ln|\ln u - 1| = \ln|x| + \ln|C|,  C \ne 0,
\ln u - 1 = Cx,  C \ne 0,
u = e^{Cx + 1},  C \ne 0,
\frac{y}{x} = e^{Cx + 1},  C \ne 0,
y = x\,e^{Cx + 1},  C \ne 0.

The excluded cases u = 0 and \ln u = 1 do not lead to two more solutions, since u = 0 implies y = 0, which cannot be put into the original equation. On the other hand, \ln u = 1 gives y/x = e, and the function y = ex is clearly a solution. Therefore, the general solution is

y = x\,e^{Cx + 1},  C \in \mathbb{R}. □

8.J.7. Compute

y' = -\frac{4x + 3y + 1}{3x + 2y + 1}.

Solution.
In general, we are able to solve every equation of the form

(1) y' = f\Bigl(\frac{ax + by + c}{Ax + By + C}\Bigr).

If the system of linear equations

(2) ax + by + c = 0,  Ax + By + C = 0

has a unique solution x_0, y_0, then the substitution u = x - x_0, v = y - y_0 transforms the equation (1) to a homogeneous equation

\frac{dv}{du} = f\Bigl(\frac{au + bv}{Au + Bv}\Bigr).

If the system (2) has no solution or has infinitely many solutions, the substitution z = ax + by transforms the equation (1) to an equation with separated variables (often, the original equation is already such). In this problem, the corresponding system of equations

4x + 3y + 1 = 0,  3x + 2y + 1 = 0

has a unique solution x_0 = -1, y_0 = 1. The substitution u = x + 1, v = y - 1 then leads to the homogeneous equation

\frac{dv}{du} = -\frac{4u + 3v}{3u + 2v}.

Nevertheless, the existence of a solution is guaranteed by the following theorem, known as the Peano existence theorem:

Theorem. Consider a function f(t,y): \mathbb{R}^2 \to \mathbb{R} which is continuous on an open set U. Then for every point (t_0, y_0) \in U \subset \mathbb{R}^2, there exists a solution of the equation y' = f(t,y) locally in some neighbourhood of t_0.

Proof. The proof is presented only roughly, with the details left to the reader. We construct a solution to the right of the initial point t_0. For this purpose, select a small step h > 0 and label the points t_k = t_0 + kh, k = 1, 2, \dots. The value of the derivative f(t_0, y_0) of the corresponding curve of the solution (t, y(t)) is defined at the initial point (t_0, y_0), so a parametrized line with the same derivative can be substituted:

y_{(0)}(t) = y_0 + f(t_0, y_0)(t - t_0).

Label y_1 = y_{(0)}(t_1). Construct inductively the functions and points

y_{(k)}(t) = y_k + f(t_k, y_k)(t - t_k),  y_{k+1} = y_{(k)}(t_{k+1}).

Now, define y_h(t) by gluing the particular linear parts, i.e.,

y_h(t) = y_{(k)}(t)  if t \in [t_0 + kh, t_0 + (k+1)h].

This is a continuous function, called Euler's approximation of the solution.
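The Euler polygons just constructed are easy to implement. A minimal sketch for y' = y, y(0) = 1 (this test equation and the particular step sizes are my own illustrative assumptions): as h decreases, the endpoint of the polygon approaches the exact value e, and the error shrinks roughly in proportion to h, as expected of a first-order method.

```python
import math

def euler(f, t0, y0, t1, n):
    """Endpoint of the Euler polygon with n steps for y' = f(t, y)."""
    h, t, y = (t1 - t0) / n, t0, y0
    for _ in range(n):
        y += h * f(t, y)
        t += h
    return y

# y' = y, y(0) = 1 on [0, 1]; exact value is e
errs = [abs(euler(lambda t, y: y, 0.0, 1.0, 1.0, n) - math.e)
        for n in (10, 20, 40)]
assert errs[0] > errs[1] > errs[2]   # error decreases with the step
assert errs[2] < 0.04
```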
It "only" remains to prove that the limit of the functions y_h for h approaching zero exists and is a solution. For this, one must observe (as done already in the proof of the theorem on uniqueness and existence of the solution) that f(t,y) is uniformly continuous on a sufficiently small neighbourhood U where the solution is sought. For any selected \varepsilon > 0, a sufficiently small \delta exists such that |f(t,y) - f(s,z)| < \varepsilon whenever \|(t-s, y-z)\| < \delta.

Making simple rearrangements, the general solution can be expressed as

(x + y)(2x + y + 1) = D,  D \ne 0.

Now, let us return to the condition z^2 + 3z + 2 \ne 0. It follows from z^2 + 3z + 2 = 0 that z = -1 or z = -2, i. e., v = -u or v = -2u. For v = -u, we have x = u - 1 and y = v + 1 = -u + 1, which means that y = -x. Similarly, for v = -2u, we have y = -2u + 1, hence y = -2x - 1. However, both functions y = -x, y = -2x - 1 satisfy the original differential equation and are included in the general solution for the choice D = 0. Therefore, every solution is known from the implicit form

(x + y)(2x + y + 1) = D,  D \in \mathbb{R}. □

8.J.8. Find the general solution of the differential equation

(x^2 + y^2)\,dx - 2xy\,dy = 0.

Solution. For y \ne 0, simple rearrangements lead to

y' = \frac{x^2 + y^2}{2xy} = \frac{1 + (y/x)^2}{2\,(y/x)}.

Using the substitution u = y/x, we get to the equation

The constructed continuous functions y_h are all in a compact set of functions, so there exists a sequence of values h_n \to 0 such that the corresponding sequence of functions y_{h_n} converges uniformly to a continuous function y(t). Write y_n(t) = y_{h_n}(t), i.e. y_n \to y uniformly. For each of the continuous functions y_n, there are only finitely many points in the interval [t_0, t] where it is not differentiable, so

y_n(t) = y_0 + \int_{t_0}^t y'_n(s)\,ds.

On the other hand, the derivatives on the particular intervals are constant, so (here, k is the largest index such that t_0 + kh_n \le t, while the y_j and t_j are the points from the definition of the function y_{h_n})

y_n(t) = y_0 + \sum_{j=0}^{k-1} \int_{t_j}^{t_{j+1}} f(t_j, y_j)\,ds + \int_{t_k}^t f(t_k, y_k)\,ds.
Instead, the equation

y_n(t) = y_0 + \int_{t_0}^t f(s, y_n(s))\,ds

is wanted, but the difference between this integral and the last two terms in the previous expression is bounded by the possible variation of the function values f(t,y) and the lengths of the intervals. By the universal bound for f(t,y) above, the last integral can be used instead of the actual values in the limit process \lim_{n\to\infty} y_n(t), thereby obtaining

y(t) = \lim_{n\to\infty}\Bigl(y_0 + \int_{t_0}^t f(s, y_n(s))\,ds\Bigr) = y_0 + \int_{t_0}^t \bigl(\lim_{n\to\infty} f(s, y_n(s))\bigr)\,ds = y_0 + \int_{t_0}^t f(s, y(s))\,ds,

where the uniform convergence y_n(t) \to y(t) is employed. This proves the theorem. □

u'x + u = \frac{1 + u^2}{2u}.

8.3.9. Coupled first-order equations. The problem of finding the solution of the equation y' = f(x,y) can also be viewed as looking for a (parametrized) curve (x(t), y(t)) in the plane where the parametrization of the variable x(t) = t is fixed beforehand. If this point of view is accepted, then this fixed choice for the variable x can be forgotten, and the work can be carried out with an arbitrary (finite) number of variables. In the plane, for instance, such a system can be written in the form

x' = f(t, x, y),  y' = g(t, x, y)

with two functions f, g: \mathbb{R}^3 \to \mathbb{R}. A simple example in the plane might be the system of equations

x' = -y,  y' = x.

For u \ne \pm 1 and D = -1/C, we have

x\,\frac{du}{dx} = \frac{1 + u^2 - 2u^2}{2u} = \frac{1 - u^2}{2u},
\frac{2u}{1 - u^2}\,du = \frac{dx}{x},
-\ln|1 - u^2| = \ln|x| + \ln|C|,  C \ne 0,
\ln\frac{1}{|1 - u^2|} = \ln|Cx|,  C \ne 0,
\frac{1}{1 - u^2} = Cx,  1 - u^2 = -\frac{D}{x},  D \ne 0,
1 - \frac{y^2}{x^2} = -\frac{D}{x},  x^2 - y^2 = -Dx,  D \ne 0.

The condition u = \pm 1 corresponds to y = \pm x. While y = 0 is not a solution, both the functions y = x and y = -x are solutions, and can be obtained by the choice D = 0. The general solution is thus

y^2 = x^2 + Dx,  D \in \mathbb{R}. □

It is easily guessed (or at least verified) that there is a solution of this system, x(t) = R\cos t, y(t) = R\sin t, with an arbitrary non-negative constant R, and the curves of the solution are exactly the parametrized circles with radius R.
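The circular orbits of the system x' = -y, y' = x can be confirmed numerically: along any solution, x^2 + y^2 must stay constant. A small Runge-Kutta sketch (the radius R = 2, the step size and the integration time are my own illustrative choices):

```python
import math

def rk4_step(x, y, h):
    """One RK4 step for the system (x', y') = (-y, x)."""
    def f(x, y):
        return (-y, x)
    k1 = f(x, y)
    k2 = f(x + h*k1[0]/2, y + h*k1[1]/2)
    k3 = f(x + h*k2[0]/2, y + h*k2[1]/2)
    k4 = f(x + h*k3[0], y + h*k3[1])
    return (x + h*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0])/6,
            y + h*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1])/6)

x, y, h = 2.0, 0.0, 1e-3        # start at (R, 0) with R = 2
for _ in range(int(math.pi / h)):  # integrate over roughly half a revolution
    x, y = rk4_step(x, y, h)

assert abs(math.hypot(x, y) - 2.0) < 1e-6   # the radius is preserved
assert abs(x + 2.0) < 1e-2 and abs(y) < 1e-2  # the point is near (-2, 0)
```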
In the general case, the vector notation of the system can be used in the form

x' = f(t, x)

for a vector function x: \mathbb{R} \to \mathbb{R}^n and a mapping f: \mathbb{R}^{n+1} \to \mathbb{R}^n. The validity of the theorem on uniqueness and existence of the solution extends to such systems:

Existence and uniqueness for systems of ODEs

Theorem. Consider functions f_i(t, x_1, \dots, x_n): \mathbb{R}^{n+1} \to \mathbb{R}, i = 1, \dots, n, with continuous partial derivatives. Then, for every point (t_0, y) \in \mathbb{R}^{n+1}, y = (c_1, \dots, c_n), there exists a maximal interval I = [t_0 - a, t_0 + b], with positive numbers a, b \in \mathbb{R}, and a unique function x(t): I \to \mathbb{R}^n which is the solution of the system of equations

x'_1 = f_1(t, x_1, \dots, x_n),
\vdots
x'_n = f_n(t, x_1, \dots, x_n),

with the initial condition x(t_0) = y, i.e.

x_1(t_0) = c_1, \dots, x_n(t_0) = c_n.

8.J.9. Solve

y' = \frac{2y}{1 - x^2} + x.

Solution. The given equation is of the form y' = a(x)y + b(x), i. e., a non-homogeneous linear differential equation (the function b is not identically equal to zero). The general solution of such an equation can be obtained using the method of integration factor (the non-homogeneous equation is multiplied by the expression e^{-\int a(x)\,dx}) or the method of variation of constants (the integration constant that arises in the solution of the corresponding homogeneous equation is considered to be a function of the variable x). We will illustrate both of these methods on this problem. As for the former method, we multiply the original equation by the expression

e^{-\int a(x)\,dx} = \frac{x-1}{x+1},

where the corresponding integral is understood to stand for any antiderivative and where any non-zero multiple of the obtained function can be considered (that is why we could remove the absolute value). Thus, consider the equation

y'\,\frac{x-1}{x+1} + \frac{2y}{(x+1)^2} = \frac{x(x-1)}{x+1}.

The core of the method of integration factor is the fact that the expression on the left-hand side is the derivative of y\,\frac{x-1}{x+1}. Integrating this leads to

Proof.
The proof is almost identical to the one of the existence and uniqueness of the solution for a single equation with a single unknown function, as shown in Theorem 8.3.6. The unknown function x(t) = (x_1(t), \dots, x_n(t)) is a curve in \mathbb{R}^n satisfying the given equation, so its components x_i(t) are again expressed in terms of integrals

x_i(t) = x_i(t_0) + \int_{t_0}^t x'_i(s)\,ds = c_i + \int_{t_0}^t f_i(s, x(s))\,ds.

We work with the integral operator y \mapsto L(y), this time mapping curves in \mathbb{R}^n to curves in \mathbb{R}^n. It is desired to find its fixed point. The proof proceeds in much the same way as in the case 8.3.6. It is only necessary to observe that the size of the vector

\|f(t, z_1, \dots, z_n) - f(t, y_1, \dots, y_n)\|

is bounded from above by the sum

\|f(t, z_1, \dots, z_n) - f(t, y_1, z_2, \dots, z_n)\| + \dots + \|f(t, y_1, \dots, y_{n-1}, z_n) - f(t, y_1, \dots, y_n)\|.

It is recommended to go through the proof of Theorem 8.3.6 from this point of view and to think out the details. □

y\,\frac{x-1}{x+1} = \int \frac{x(x-1)}{x+1}\,dx = \frac{x^2}{2} - 2x + 2\ln|x+1| + C,  C \in \mathbb{R}.

Therefore, the solutions are the functions

y = \frac{x+1}{x-1}\Bigl(\frac{x^2}{2} - 2x + 2\ln|x+1| + C\Bigr),  C \in \mathbb{R}.

As for the latter method, we first solve the corresponding homogeneous equation, which is an equation with separated variables. We have

\frac{dy}{y} = \frac{2}{1 - x^2}\,dx,
\ln|y| = -\ln|x-1| + \ln|x+1| + \ln|C|,  C \ne 0,
\ln|y| = \ln\Bigl|C\,\frac{x+1}{x-1}\Bigr|,  C \ne 0,
y = C\,\frac{x+1}{x-1},  C \ne 0,

where we had to exclude the case y = 0. However, the function y = 0 is always a solution of a homogeneous linear differential equation, and it can be included in the general solution. Therefore, the general solution of the corresponding homogeneous equation is

y = C\,\frac{x+1}{x-1},  C \in \mathbb{R}.

Now, we will consider the constant C to be a function C(x).
Differentiating leads to

y' = \frac{C'(x)(x+1)(x-1) + C(x)(x-1) - C(x)(x+1)}{(x-1)^2}.

Substituting this into the original equation, it follows that

C'(x) = \frac{x(x-1)}{x+1},
C(x) = \int \frac{x(x-1)}{x+1}\,dx = \frac{x^2}{2} - 2x + 2\ln|x+1| + C,  C \in \mathbb{R}.

Now, it suffices to substitute:

y = \frac{x+1}{x-1}\Bigl(\frac{x^2}{2} - 2x + 2\ln|x+1| + C\Bigr),  C \in \mathbb{R}.

We can see that the result we have obtained here is of the same form as in the former case. This should not be surprising, as the differences between the two methods are insignificant and the computed integrals are the same.

8.3.10. Example. When dealing with models in practice, it is of interest to consider the qualitative behaviour of the solution in dependence on the initial conditions and free parameters of the system. We consider a simple example of a system of first-order equations from this point of view. The standard population model "predator-prey" was introduced in the 1920s by Lotka and Volterra. Let x(t) denote the evolution of the number of individuals in the prey population and y(t) that of the predators. Assume that the increment of the prey would correspond to the Malthusian model (i.e. exponential growth with coefficient \alpha) if they were not hunted. On the other hand, assume that the predator would only naturally die out if there were no prey (i.e. exponential decrease with coefficient \gamma). Further, consider an interaction of the predator and the prey which is expected to be proportional to the number of both with a certain coefficient \beta, which is, in the case of the predator, supplemented by a multiplicative coefficient \delta expressing the hunting efficiency.

Lotka-Volterra model

This is a system of two equations, x models the prey, y the predator, with positive constants \alpha, \beta, \gamma, \delta:

x' = \alpha x - \beta yx,
y' = -\gamma y + \delta\beta xy.

The diagram illustrates one of the typical behaviours of such dynamical systems, namely the existence of closed orbits on which the system moves in time.
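The closed orbits of the Lotka-Volterra system can be checked numerically: the quantity H = \delta\beta x - \gamma\ln x + \beta y - \alpha\ln y is constant along every solution (differentiate H along the flow to verify this). A sketch with the first set of parameters quoted below for the illustration, \alpha = 1, \beta = 1, \gamma = 0.3, \delta = 0.3 and (x_0, y_0) = (1, 0.5); the step size and integration time are my own assumptions:

```python
import math

a, b, g, d = 1.0, 1.0, 0.3, 0.3   # alpha, beta, gamma, delta

def H(x, y):
    """First integral of the Lotka-Volterra system; constant on orbits."""
    return d*b*x - g*math.log(x) + b*y - a*math.log(y)

def rk4_step(x, y, h):
    def f(x, y):
        return (a*x - b*y*x, -g*y + d*b*x*y)
    k1 = f(x, y)
    k2 = f(x + h*k1[0]/2, y + h*k1[1]/2)
    k3 = f(x + h*k2[0]/2, y + h*k2[1]/2)
    k4 = f(x + h*k3[0], y + h*k3[1])
    return (x + h*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0])/6,
            y + h*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1])/6)

x, y, h = 1.0, 0.5, 1e-3
H0 = H(x, y)
for _ in range(12000):            # integrate up to t = 12, about one cycle
    x, y = rk4_step(x, y, h)
assert x > 0 and y > 0
assert abs(H(x, y) - H0) < 1e-6   # the orbit stays on a level set of H
```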
These are the thick black ovals, while the "comets" indicate the field at the individual points (i.e. their expected movement). The left illustration corresponds to \alpha = 1, \beta = 1, \gamma = 0.3, \delta = 0.3 and the initial condition (x_0, y_0) = (1, 0.5) at t_0 = 0 for the solution, while the other illustration comes with \alpha = 1, \beta = 2, \gamma = 2, \delta = 1 and (x_0, y_0) = (1, 1.5). In both cases, the system is quite stable in the vicinity of the initial condition, and it would be very stable for (x_0, y_0) = (1,1) or (1, 0.5), respectively. But their development differs in speed: the depicted solution cycles close in times of about t = 12 in the first case and t = 5 in the other one.

Finally, we can notice that the solution y of an equation y' = a(x)y can be found in the same way for any continuous function a. We thus always have

y = C\,e^{\int a(x)\,dx},  C \in \mathbb{R}.

Similarly, the solution of an equation y' = a(x)y + b(x) with an initial condition y(x_0) = y_0 can be determined explicitly as (provided the coefficients, i. e. the functions a and b, are continuous)

y = e^{\int_{x_0}^x a(t)\,dt}\Bigl(y_0 + \int_{x_0}^x b(t)\,e^{-\int_{x_0}^t a(s)\,ds}\,dt\Bigr).

Let us remark that the linear equation has no singular solution, and the general solution contains a C \in \mathbb{R}. □

8.J.10. Solve the linear equation (y' + 2xy)\,e^{x^2} = \cos x.

Solution. If we used the method of integration factor, we would only rewrite the equation trivially, since it is already of the desired form: the expression on the left-hand side is the derivative of y\,e^{x^2}. Thus, we can immediately calculate:

\bigl(y\,e^{x^2}\bigr)' = \cos x,
y\,e^{x^2} = \int \cos x\,dx,
y\,e^{x^2} = \sin x + C,  C \in \mathbb{R},
y = e^{-x^2}(\sin x + C),  C \in \mathbb{R}. □

8.J.11. Find all non-zero solutions of the Bernoulli equation y' - \frac{y}{x} = 3xy^2.

Solution. The Bernoulli equation

y' = a(x)y + b(x)y^r,  r \ne 0,\ r \ne 1,\ r \in \mathbb{R},

can be solved by first dividing by the term y^r and then using the substitution u = y^{1-r}, which leads to the linear differential equation

u' = (1 - r)\,\bigl[a(x)u + b(x)\bigr].
In this very problem, the substitution u = y^{1-2} = 1/y gives

u' + \frac{u}{x} = -3x.

Similarly to the previous exercise, we have

It is interesting that the same model captures quite well the development of the unemployment rate in a population, considering the employees to be the predators, while the employers play the role of the prey. Much information about this and other models can be found in the literature.

8.3.11. Stability of systems of equations. In order to illustrate the stability questions, we discuss just one basic theorem. We are interested in the continuity with respect to the norm on the space of functions (i.e. the supremum norm, see 7.3.5). According to the theorem below, the assumption that the partial derivatives of the functions defining the system are continuous (in fact, it suffices to have them Lipschitz) guarantees the continuity of the solutions in dependence on the initial conditions as well as on the defining equations themselves. Note however, that as the distance of t from the initial value t_0 grows, the error estimates grow exponentially! Therefore, this result is of a strictly local character. It is not in contradiction with the example of the unstably behaving equation y' = ty illustrated in paragraph 8.3.3.7

Consider two systems of equations written in the vector form

(1) x' = f(t,x),  y' = g(t,y)

and assume that the mappings f, g: U \subset \mathbb{R}^{n+1} \to \mathbb{R}^n have continuous partial derivatives on an open set U with compact closure. Such functions must be uniformly continuous and uniformly Lipschitz on U, so there are the finite values

C = \sup_{x \ne y;\ (t,x),(t,y)\in U} \frac{\|f(t,x) - f(t,y)\|}{\|x - y\|},
B = \sup_{(t,x)\in U} \|f(t,x) - g(t,x)\|.

With this notation, the fundamental theorem can be formulated:

Theorem. Let x(t) and y(t) be two fixed solutions

x' = f(t, x(t)),  y' = g(t, y(t))

of the systems (1) considered above, given by initial conditions x(t_0) = x_0 and y(t_0) = y_0. Then,

\|x(t) - y(t)\| \le \|x_0 - y_0\|\,e^{C|t - t_0|} + \frac{B}{C}\bigl(e^{C|t - t_0|} - 1\bigr).

Proof. Without loss of generality, t_0 = 0.
From the expression of the solutions x(t) and y(t) as fixed points of the corresponding integral operators follows the estimate

\|x(t) - y(t)\| \le \|x_0 - y_0\| + \int_0^t \|f(s, x(s)) - g(s, y(s))\|\,ds.

7 Much more information can be found for example in Gerald Teschl's book Ordinary Differential Equations and Dynamical Systems, Graduate Studies in Mathematics, Volume 140, Amer. Math. Soc., Providence, 2012.

u = e^{-\ln|x|}\Bigl[\int -3x\,e^{\ln|x|}\,dx\Bigr] = \frac{1}{x}\int -3x\,|x|\,dx,

where \ln|x| was obtained as an (arbitrary) antiderivative of 1/x. The integrand can be further estimated as follows:

\|f(s,x(s)) - g(s,y(s))\| \le \|f(s,x(s)) - f(s,y(s))\| + \|f(s,y(s)) - g(s,y(s))\| \le C\,\|x(s) - y(s)\| + B.

If F(t) = \|x(t) - y(t)\| and a = \|x_0 - y_0\|, then

F(t) \le a + \int_0^t \bigl(C F(s) + B\bigr)\,ds.

8.J.12. Interchanging the variables, solve the equation

We have thus obtained a linear differential equation. Now, we can easily compute its general solution.

The integrand is a derivative:

\frac{d}{ds}\Bigl(e^{-Cs}\int_0^s F(r)\,dr\Bigr) = e^{-Cs}\Bigl(F(s) - C\int_0^s F(r)\,dr\Bigr).

576 CHAPTER 8. CALCULUS WITH MORE VARIABLES

K. Practical problems leading to differential equations

8.K.1. A water purification plant with volume 2000 m³ was contaminated with lead which is spread in the water with density 10 g/m³. Water is flowing in and out of the basin at 2 m³/s. In what time does the amount of lead in the basin decrease below 10 μg/m³ (which is the hygienic norm for the amount of lead in drinkable water by a regulation of the European Community), provided the water keeps being mixed uniformly?

Solution. Let us denote the water's volume in the basin by V (m³) and the speed of the water's flow by v (m³/s). In an infinitesimal (infinitely small) time unit dt, \frac{m}{V}\,v\,dt grams of lead runs out of the basin, so we can construct the differential equation

dm = -\frac{m}{V}\,v\,dt

for the change of the lead's mass in the basin. Separating the variables, we get the equation

\frac{dm}{m} = -\frac{v}{V}\,dt.
Integrating both sides of the equation and getting rid of the logarithms, we get the solution in the form
$$m(t) = m_0\,\mathrm{e}^{-\frac{v}{V}t},$$
where $m_0$ is the lead's mass at time $t = 0$. Substituting the concrete values, we find out that $t = 6$ h $35$ min. □

8.K.2. The speed of transmission of a message in a population consisting of $P$ people is directly proportional to the number of people who have not heard the message yet. Determine the function $f$ which describes the dependency of the number of people who have heard the message on time. Is it appropriate to use this model of message transmission for small or large values of $P$?

Solution. We construct a differential equation for $f$. The speed of the transmission $\frac{\mathrm{d}f}{\mathrm{d}t} = f'(t)$ should be directly proportional to the number of people who have not heard of it, i.e. to the value $P - f(t)$. Altogether,
$$\frac{\mathrm{d}f}{\mathrm{d}t} = k\bigl(P - f(t)\bigr).$$
Separating the variables and introducing a constant $K$ (the number of people who know the message at time $t = 0$ must be $P - K$), we get the solution
$$f(t) = P - K\,\mathrm{e}^{-kt},$$
where $k$ is a positive real constant. Apparently, this model makes sense for large values of $P$ only. □

and the second proposition of the lemma is also proved. □

Now, the proof of the theorem about continuous dependency on the parameters is easily finished. The bound
$$F(t) \le a + \int_0^t \bigl(C\,F(s) + B\bigr)\,\mathrm{d}s$$
is already obtained, and using the slightly modified function $\tilde F(t) = F(t) + \frac{B}{C}$, this yields
$$\tilde F(t) \le \frac{B}{C} + a + \int_0^t C\,\tilde F(s)\,\mathrm{d}s.$$
This is the assumption of Gronwall's inequality with constant parameters, so by the second claim of the lemma,
$$F(t) + \frac{B}{C} \le \Bigl(a + \frac{B}{C}\Bigr)\mathrm{e}^{Ct},$$
which is the statement
$$F(t) \le a\,\mathrm{e}^{Ct} + \frac{B}{C}\bigl(\mathrm{e}^{Ct} - 1\bigr),$$
as desired. □

The continuous dependency on both the initial conditions and the potential further parameters in which the function $f$ would be Lipschitz-continuous follows immediately from the statement of the theorem.
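The separable decay model of 8.K.1 is easy to evaluate in a few lines of code. This is only a sketch: the closed form $t = \frac{V}{v}\ln\frac{c_0}{c}$ follows from $m(t) = m_0\mathrm{e}^{-(v/V)t}$, and the reading of the hygienic norm as "10 µg per cubic metre" is an assumption of the sketch (other unit conventions change the numeric answer).

```python
import math

# Sketch of the separable model from 8.K.1: m(t) = m0 * e^{-(v/V) t}, so a
# concentration c0 falls to a threshold c at time t = (V/v) * ln(c0 / c).
# Unit assumption (hypothetical): both concentrations are measured per m^3.
V = 2000.0        # basin volume, m^3
v = 2.0           # flow in and out, m^3/s
c0 = 10.0         # initial lead concentration, g/m^3
c_limit = 10e-6   # threshold of 10 micrograms per m^3, expressed in g/m^3

t_seconds = (V / v) * math.log(c0 / c_limit)
hours, rest = divmod(t_seconds, 3600)
print(f"about {int(hours)} h {rest / 60:.0f} min")
```

Note that the exponential decay rate depends only on the ratio $V/v$, i.e. on the mean residence time of the water in the basin.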
The extremely simple equations in one variable, $x' = ax$, where $a$ are small constants, with their exponential solutions $x(t) = x_0\,\mathrm{e}^{at}$, show that better general results cannot be expected.

8.3.12. Differentiable dependence. In practical problems, the differentiability of the obtained solutions is often of interest, especially with regard to the initial conditions or other parameters of the system. In the general vector notation of the system of ordinary equations $y' = f(t,y)$, it can always be supposed that the vector function does not depend explicitly on $t$. If it does, then another variable $y_0$ can be added to the other variables $y_1, \dots, y_n$. Then there is the same system of equations for the curve $y(t) = (y_0(t), y_1(t), \dots, y_n(t))$ as
$$y_0' = 1,$$
$$y_1' = f_1(y_0, y_1, \dots, y_n),$$
$$\vdots$$
$$y_n' = f_n(y_0, y_1, \dots, y_n),$$
with the initial conditions
$$y_0(t_0) = t_0, \quad y_1(t_0) = x_1, \ \dots, \ y_n(t_0) = x_n.$$
Such systems, which do not explicitly depend on time, are called autonomous systems of ordinary differential equations. Without loss of generality, we deal with autonomous systems in finite dimension $n$, dependent on parameters $\lambda$ and with initial conditions
$$(1)\qquad y' = f(y,\lambda), \qquad y(t_0) = x.$$

8.K.3. The speed at which an epidemic spreads in a given closed population consisting of $P$ people is directly proportional to the product of the number of people who have been infected and the number of people who have not. Determine the function $f(t)$ describing the number of infected people in time.

Solution. Just like in the previous problem, we construct a differential equation:
$$\frac{\mathrm{d}f}{\mathrm{d}t} = k\cdot f(t)\,\bigl(P - f(t)\bigr).$$
Again, separating the variables and introducing suitable constants $K$ and $L$, we obtain
$$f(t) = \frac{P}{1 + L\,\mathrm{e}^{-Kt}}.$$ □

8.K.4. The speed at which a given isotope of a given chemical element decays is directly proportional to the amount of the given isotope. The half-life of the plutonium isotope ²³⁹Pu is 24,100 years. In what time does a hundredth of a nuclear bomb whose active component is the mentioned isotope disappear?
Solution. Denoting the amount of plutonium by $m$, we can build a differential equation for the rate of the decay:
$$\frac{\mathrm{d}m}{\mathrm{d}t} = -k\cdot m,$$
where $k$ is an unknown constant. The solution is thus the function $m(t) = m_0\,\mathrm{e}^{-kt}$. Substituting into the equation for the half-life ($\mathrm{e}^{-kt} = \frac12$), we get the constant $k \doteq 2.88\cdot 10^{-5}$. The wanted time is then approximately 349 years. □

8.K.5. The acceleration of an object falling in a constant gravitational field with a certain resistance of the environment is given by the formula
$$\frac{\mathrm{d}v}{\mathrm{d}t} = g - kv,$$
where $k$ is a constant which expresses the resistance of the environment. An object was dropped in a gravitational field with $g = 10\ \mathrm{ms}^{-2}$ at the initial speed of $5\ \mathrm{ms}^{-1}$; the resistance constant is $k = 0.5\ \mathrm{s}^{-1}$. What will the speed of the object be in three seconds?

Solution. The general solution of this linear equation is
$$v(t) = \frac{g}{k}\bigl(1 - \mathrm{e}^{-kt}\bigr) + v_0\,\mathrm{e}^{-kt},$$
which gives $v(3) = 20 - 15\,\mathrm{e}^{-3/2}\ \mathrm{ms}^{-1}$ after substitution. □

Without loss of generality, consider the initial value $t_0 = 0$, and write the solution with $y(0) = x$ in the form $y(t,x,\lambda)$ to emphasize the dependency on the parameters. For fixed values of the initial conditions (and the potential parameters $\lambda$), the solution is always once more differentiable than the function $f$. This can be derived inductively by applying the chain rule. If $f$ is continuously differentiable and $y(t)$ is a solution, then (using the matrix notation, where the Jacobi matrix $D^1 f(y)$ of the mapping $f : \mathbb{R}^n \to \mathbb{R}^n$ is multiplied with the column vector $y'$)
$$y'' = D^1 f(y)\cdot y' = D^1 f(y)\cdot f(y)$$
exists and is continuous. With all the derivatives up to order two continuous, there is an expression for the third derivative:
$$y''' = D^2 f(y)\bigl(f(y), f(y)\bigr) + \bigl(D^1 f(y)\bigr)^2\cdot f(y).$$
Here, the chain rule is used again, starting with the differential of the bilinear mapping of matrix multiplication and viewing the second derivative as a bilinear object evaluated on $y'$ in both arguments. Think out the argumentation for this and higher orders in detail.
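The arithmetic of 8.K.4 can be checked directly; the following sketch only evaluates the two closed-form expressions derived above (decay constant from the half-life, then the time for one percent of the material to decay).

```python
import math

# Check of 8.K.4: exponential decay m(t) = m0 * e^{-k t}, with k fixed by
# the half-life T of plutonium-239 via e^{-k T} = 1/2.
T_half = 24100.0                  # half-life in years
k = math.log(2.0) / T_half        # decay constant, roughly 2.88e-5 per year

# Time until one hundredth disappears, i.e. m(t)/m0 = 99/100:
t = math.log(100.0 / 99.0) / k
print(f"k = {k:.3e} per year, t = {t:.1f} years")
```

The computed time is close to 349 years, matching the answer in the text.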
Assume for a while that there is a solution $y(t,x)$ of the system (1) which is continuously differentiable in the parameters $x \in \mathbb{R}^n$, i.e. in the initial condition as well, and forget about the further parameters $\lambda$ for now.

8.K.9. A 100-gram body lengthens a spring by 5 cm if hung on it. Express the dependency of its position on time $t$, provided the speed of the body is 10 cm/s when going through the equilibrium point. ○

Further practical problems that lead to differential equations can be found on page 595.

L. Higher-order differential equations

8.L.1. Underdamped oscillation. Now, we will describe a simple model for the movement of a solid object attached to a point with a strong spring. If $y(t)$ is the deviation of our object from the point $y_0 = y(0) = 0$, then we can assume that the acceleration $y''(t)$ in time $t$ is proportional to the magnitude of the deviation, yet with the other sign. The proportionality constant $k$ is called the spring constant. Considering the case $k = 1$, we get the so-called oscillation equation
$$y''(t) = -y(t).$$
This equation corresponds to the system of equations
$$x'(t) = -y(t), \qquad y'(t) = x(t).$$

Differentiability of the solutions

Theorem. Consider an open subset $U \subset \mathbb{R}^{n+k}$ and a mapping $f : U \to \mathbb{R}^n$ with continuous first derivatives. Then the system of differential equations dependent on a parameter $\lambda \in \mathbb{R}^k$ with initial condition at a point $x \in U$,
$$y' = f(y,\lambda), \qquad y(0) = x,$$
has a unique solution $y(t,x,\lambda)$, which is a mapping with continuous first derivatives with respect to each variable.

Proof. Consider a general system dependent on parameters, but viewed as an ordinary autonomous system with no parameters. More explicitly, consider the parameters to be additional space variables and add the (vector) conditions $\lambda'(t) = 0$ and $\lambda(0) = \lambda$. Hence it suffices to prove the theorem for autonomous systems with no further parameters, with the dependency on the initial conditions only.
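The first-order system obtained from the oscillation equation in 8.L.1 is convenient for numerical experiments. The sketch below integrates $x' = -y$, $y' = x$ with a classical fourth-order Runge–Kutta step (the step size and time horizon are arbitrary choices) and compares the result with the exact solution $(\cos t, \sin t)$.

```python
import math

# Numerical check of the system x' = -y, y' = x from 8.L.1, integrated by a
# classical Runge-Kutta step.  With x(0) = 1, y(0) = 0, the exact solution
# is (x, y) = (cos t, sin t).
def rk4_step(f, state, h):
    k1 = f(state)
    k2 = f([s + 0.5 * h * k for s, k in zip(state, k1)])
    k3 = f([s + 0.5 * h * k for s, k in zip(state, k2)])
    k4 = f([s + h * k for s, k in zip(state, k3)])
    return [s + h / 6.0 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

field = lambda s: [-s[1], s[0]]     # the vector field (x', y') = (-y, x)

state, h, steps = [1.0, 0.0], 0.001, 1000   # integrate up to t = 1
for _ in range(steps):
    state = rk4_step(field, state, h)

print(state, [math.cos(1.0), math.sin(1.0)])
```

Since the field is the rotation field, the numerical trajectory also stays (up to the scheme error) on the unit circle $x^2 + y^2 = 1$.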
Just as in the proof of the fundamental existence theorem 8.3.6, build on the expression of the solutions as fixed points of the integral operators, and prove that the expected derivative, as discussed above, enjoys the properties of the differential. Fix a point $x_0$ as the initial condition, together with a small neighbourhood $x_0 \in V$, which can be further decreased during the following estimates if necessary, so that the Lipschitz estimates on $f$ hold uniformly there. Next, exploit the definition of the remainder $R$: for each $\varepsilon > 0$, there is a bound $\|h\| \le \delta$ for which the remainder $R$ satisfies
$$\|R(y(t,x_0+h),\,y(t,x_0))\| \le \varepsilon\,\|y(t,x_0+h) - y(t,x_0)\| \le \|h\|\,\varepsilon\,\mathrm{e}^{CT}.$$
Therefore, the estimate on $G(t,h)$ can be improved accordingly, and this implies that
$$\lim_{h\to 0} \tfrac{1}{\|h\|}\,G(t,h) = 0,$$
as requested. □

In the same way, it can be proved that continuous differentiability of the right-hand side up to order $k$ (inclusive) guarantees the same order of differentiability of the solutions in all input parameters.

8.3.13. The analytic case. Let us pay additional attention to the case when the right hand side $f$ of the system of equations
$$(1)\qquad y' = f(y), \qquad y(t_0) = y_0$$

…of the spring, and other factors), which is initiated by an outer force. The function $f(t)$ can be written as a linear combination of Heaviside's function $u(t)$ and its shift, i.e.,
$$f(t) = \cos(2t)\bigl(u(t) - u_\pi(t)\bigr).$$
Since
$$\mathcal{L}(y'')(s) = s^2\mathcal{L}(y) - s\,y(0) - y'(0) = s^2\mathcal{L}(y) + 1,$$
we get, applying the results of the above exercises 7 and 8 to the Laplace transform of the right-hand side,
$$s^2\mathcal{L}(y) + 1 + 4\mathcal{L}(y) = \mathcal{L}\bigl(\cos(2t)(u(t) - u_\pi(t))\bigr) = \mathcal{L}\bigl(\cos(2t)\cdot u(t)\bigr) - \mathcal{L}\bigl(\cos(2t)\cdot u_\pi(t)\bigr)$$
$$= \mathcal{L}(\cos(2t)) - \mathrm{e}^{-\pi s}\,\mathcal{L}\bigl(\cos(2(t+\pi))\bigr) = \frac{s}{s^2+4}\bigl(1 - \mathrm{e}^{-\pi s}\bigr).$$
Hence,
$$\mathcal{L}(y) = -\frac{1}{s^2+4} + \bigl(1 - \mathrm{e}^{-\pi s}\bigr)\frac{s}{(s^2+4)^2}.$$
Performing the inverse transform, we obtain the solution in the form
$$y(t) = -\tfrac12\sin(2t) + \tfrac14\,t\sin(2t) + \mathcal{L}^{-1}\Bigl(-\mathrm{e}^{-\pi s}\frac{s}{(s^2+4)^2}\Bigr).$$
However, by formula (1), we have
$$\mathcal{L}^{-1}\Bigl(\mathrm{e}^{-\pi s}\frac{s}{(s^2+4)^2}\Bigr) = \mathcal{L}^{-1}\Bigl(\mathrm{e}^{-\pi s}\,\mathcal{L}\bigl(\tfrac14\,t\sin(2t)\bigr)\Bigr) = \tfrac14\,(t-\pi)\sin\bigl(2(t-\pi)\bigr)\cdot H_\pi(t).$$
Since Heaviside's function is zero for $t < \pi$ and equal to 1 for $t > \pi$, we get the solution in the form
$$y(t) = \begin{cases} -\tfrac12\sin(2t) + \tfrac14\,t\sin(2t) & \text{for } 0 \le t < \pi,\\[2pt] \tfrac{\pi-2}{4}\,\sin(2t) & \text{for } t \ge \pi.\end{cases}$$ □

8.L.3. Find the general solution of the equation
$$y''' - 5y'' - 8y' + 48y = 0.$$

Solution. This is a third-order linear differential equation with constant coefficients, since it is of the form
$$y^{(n)} + a_1 y^{(n-1)} + a_2 y^{(n-2)} + \cdots + a_{n-1}y' + a_n y = f(x)$$
for certain constants $a_1, \dots, a_n \in \mathbb{R}$. Moreover, we have $f(x) = 0$, i.e., the equation is homogeneous. First of all, we will find the roots of the so-called characteristic polynomial
$$\lambda^n + a_1\lambda^{n-1} + a_2\lambda^{n-2} + \cdots + a_{n-1}\lambda + a_n.$$
Each real root $\lambda$ with multiplicity $k$ corresponds to the $k$ solutions
$$\mathrm{e}^{\lambda x},\ x\,\mathrm{e}^{\lambda x},\ \dots,\ x^{k-1}\mathrm{e}^{\lambda x},$$

is analytic in all arguments (i.e. a convergent multidimensional power series $f(y) = \sum_{|\alpha|=0}^{\infty} c_\alpha\,y^\alpha$, see 8.1.15). Exactly as in the previous discussion, we may hide the time variable $t$ as well as further parameters in the variables. The famous theorem below says that the solution of the most general system with analytic right-hand side is analytic in all the parameters as well (including the initial conditions).

ODE version of the Cauchy–Kovalevskaya theorem

Theorem. Assume $f(y)$ is a real analytic vector valued function on a domain in $\mathbb{R}^n$ and consider the differential equation (1). Then the unique solution of this initial problem is real analytic, including the dependency on the initial condition.

Proof. The idea of the proof is identical to the simple one-dimensional case in 6.2.15. As we saw in the beginning of the previous paragraph, there are universal (multidimensional) polynomial expressions for all derivatives of the vector function $y(t)$ in terms of the partial derivatives of the vector function $f$. If we expand them in terms of the individual partial derivatives of the mapping $f$, all of their coefficients are obviously non-negative. Let us write again
$$y^{(k)}(0) = P_k\bigl(f(y(0)),\ \dots,\ \partial_\beta f(y(0)),\ \dots\bigr)$$
for these multivariate vector valued polynomials (the multi-indices $\beta$ in the arguments are all of size up to $k-1$). Without loss of generality, we may consider the initial condition $t_0 = 0$, $y(0) = 0$. Indeed, constant shifts of the variables (say $z = y - y_0$, $x = t - t_0$) transform the general case to this one. Once we know that the components of the solution are power series, the transformed quantities will be analytic too, including the dependency on the values of the initial conditions.

In order to prove that the solution to the problem $y' = f(y)$, $y(0) = 0$ is analytic on a neighborhood of the origin, we shall again look for a majorant $g$ for the vector equation $y' = f(y)$, i.e. we want an analytic function on a neighborhood of the origin $0 \in \mathbb{R}^n$ with $\partial_\alpha g(0) \ge |\partial_\alpha f(0)|$ for all multi-indices $\alpha$. Then, by the universal computation of all the coefficients of the power series $y(t) = \sum_{k=0}^{\infty} \frac{1}{k!}\,y^{(k)}(0)\,t^k$ potentially solving our problem, and similarly for $z' = g(z)$, the convergence of the series for $z$ implies the same for $y$:
$$z^{(k)}(0) = P_k\bigl(g(0), \dots, \partial_\beta g(0), \dots\bigr) \ge P_k\bigl(|f(0)|, \dots, |\partial_\beta f(0)|, \dots\bigr) \ge |y^{(k)}(0)|.$$
As usual, knowing already how to find a majorant in a simpler case, we try to apply a straightforward modification. By the analyticity of $f$, for $r > 0$ small enough there is a constant $C$ such that $\bigl|\frac{1}{\alpha!}\,\partial_\alpha f_i(0)\,r^{|\alpha|}\bigr| \le C$ for all $i = 1, \dots, n$ and multi-indices $\alpha$. This means
$$|\partial_\alpha f_i(0)| \le C\,\frac{\alpha!}{r^{|\alpha|}}.$$
In the 1-dimensional case, we considered the multiple of a geometric series $g(z) = C\frac{r}{r-z}$ with the right

CHAPTER 8. CALCULUS WITH MORE VARIABLES

and every pair of complex roots $\lambda = \alpha \pm \mathrm{i}\beta$ with multiplicity $k$ corresponds to the $k$ pairs of solutions
$$\mathrm{e}^{\alpha x}\cos(\beta x),\ x\,\mathrm{e}^{\alpha x}\cos(\beta x),\ \dots,\ x^{k-1}\mathrm{e}^{\alpha x}\cos(\beta x),$$
$$\mathrm{e}^{\alpha x}\sin(\beta x),\ x\,\mathrm{e}^{\alpha x}\sin(\beta x),\ \dots,\ x^{k-1}\mathrm{e}^{\alpha x}\sin(\beta x).$$
Then, the general solution corresponds to all linear combinations of the above solutions. Therefore, let us consider the polynomial
$$\lambda^3 - 5\lambda^2 - 8\lambda + 48,$$
with roots $\lambda_1 = \lambda_2 = 4$, $\lambda_3 = -3$.
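The roots claimed for 8.L.3 can be confirmed mechanically: $\lambda$ is a multiple root of a polynomial exactly when it annihilates both the polynomial and its derivative. A minimal sketch:

```python
# Check of the characteristic polynomial from 8.L.3,
# p(x) = x^3 - 5x^2 - 8x + 48, with the claimed roots 4 (double) and -3.
p = lambda x: x**3 - 5 * x**2 - 8 * x + 48
dp = lambda x: 3 * x**2 - 10 * x - 8     # p'; a multiple root kills p' too

print(p(4), dp(4), p(-3))   # prints: 0 0 0
```

Since $p(4) = p'(4) = 0$ while $p'(-3) \ne 0$, the root $4$ is indeed double and $-3$ is simple.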
Since we know the roots, we can deduce the general solution as well:
$$y = C_1\mathrm{e}^{4x} + C_2\,x\,\mathrm{e}^{4x} + C_3\mathrm{e}^{-3x}, \qquad C_1, C_2, C_3 \in \mathbb{R}.$$ □

8.L.4. Compute
$$y''' + y'' + 9y' + 9y = \mathrm{e}^x + 10\cos(3x).$$

Solution. First, we will solve the corresponding homogeneous equation. The characteristic polynomial is equal to
$$\lambda^3 + \lambda^2 + 9\lambda + 9,$$
with roots $\lambda_1 = -1$, $\lambda_2 = 3\mathrm{i}$, $\lambda_3 = -3\mathrm{i}$. The general solution of the corresponding homogeneous equation is thus
$$y = C_1\mathrm{e}^{-x} + C_2\cos(3x) + C_3\sin(3x), \qquad C_1, C_2, C_3 \in \mathbb{R}.$$
The solution of the non-homogeneous equation is of the form
$$y = C_1\mathrm{e}^{-x} + C_2\cos(3x) + C_3\sin(3x) + y_p, \qquad C_1, C_2, C_3 \in \mathbb{R},$$
for a particular solution $y_p$ of the non-homogeneous equation. The right-hand side of the given equation is of a special form. In general, if the non-homogeneous part is given by a function $P_n(x)\,\mathrm{e}^{\alpha x}$, where $P_n$ is a polynomial of degree $n$, then there is a particular solution of the form
$$y_p = x^k R_n(x)\,\mathrm{e}^{\alpha x},$$
where $k$ is the multiplicity of $\alpha$ as a root of the characteristic polynomial and $R_n$ is a polynomial of degree at most $n$. More generally, if the non-homogeneous part is of the form
$$\mathrm{e}^{\alpha x}\bigl[P_m(x)\cos(\beta x) + S_n(x)\sin(\beta x)\bigr],$$
where $P_m$ is a polynomial of degree $m$ and $S_n$ is a polynomial of degree $n$, there exists a particular solution of the form
$$y_p = x^k\,\mathrm{e}^{\alpha x}\bigl[R_l(x)\cos(\beta x) + T_l(x)\sin(\beta x)\bigr],$$

derivatives $g^{(k)}(0) = C\frac{k!}{r^k}$. Now the most similar mapping is
$$g(z_1, \dots, z_n) = \bigl(g_1(z_1, \dots, z_n), \dots, g_n(z_1, \dots, z_n)\bigr)$$
with all the components $g_i$ equal to
$$h(z_1, \dots, z_n) = C\,\frac{r}{r - z_1 - \cdots - z_n}.$$
Then the values of all the partial derivatives with $|\alpha| = k$ at $z = 0$ are
$$\partial_\alpha h(z)\big|_{z=0} = C\,r\,k!\,(r - z_1 - \cdots - z_n)^{-k-1}\big|_{z=0} = C\,\frac{k!}{r^k},$$
exactly as suitable. (Check the latter simple computation yourself!) So it remains to prove that the majorant system $z' = g(z)$ has a converging power series solution $z$. Obviously, by the symmetry of $g$ (all components equal to the same $h$, and $h$ symmetric in the variables $z_i$), also the solution $z$ with $z(0) = 0$ must have all the components equal (the system does not see any permutation of the variables $z_i$ at all).
Let us write $z_i(t) = u(t)$ for the common solution components. With this ansatz,
$$u'(t) = h\bigl(u(t), \dots, u(t)\bigr) = C\,\frac{r}{r - n\,u(t)}.$$
This is nearly exactly the same equation as the one in 6.2.15, and we can easily see its solution with $u(0) = 0$:
$$u(t) = \frac{r}{n}\Bigl(1 - \sqrt{1 - \tfrac{2nCt}{r}}\Bigr).$$
Clearly, this is an analytic solution and the proof is finished. □

8.3.14. Vector fields and their flows. Before going to higher-order equations, pause to consider systems of first-order equations from the geometrical point of view. When drawing illustrations of solutions earlier, we already viewed the right hand side of an autonomous system as a "field of vectors" $f(x) \in \mathbb{R}^n$. This shows how fast and in which direction the solution should move in time. This can be formalized.

A tangent vector with a footpoint $x \in \mathbb{R}^n$ is a couple $(x,v) \in \mathbb{R}^n \times \mathbb{R}^n$. The set of all vectors with footpoints in an open set $U \subset \mathbb{R}^n$ is called the tangent bundle $TU$, with the footpoint projection $p : (x,v) \mapsto x$. A vector field $X$ defined on an open set $U \subset \mathbb{R}^n$ is a mapping $X : U \to TU$ which is a section of the projection $p$, i.e., $p \circ X = \mathrm{id}_U$.

The derivative in the direction of the vector field $X$ is defined for all differentiable functions $g$ on $U$ by
$$X(g) : U \to \mathbb{R}, \qquad X(g)(x) = \mathrm{d}_{X(x)}g = \mathrm{d}g(x)\bigl(X(x)\bigr).$$
So the vector field $X$ is a first order linear differential operator mapping functions into functions. Apply pointwise the properties of the directional derivative to obtain the derivative rule (also called the Leibniz rule) for products of functions:
$$(1)\qquad X(gh) = h\,X(g) + g\,X(h).$$
In fixed coordinates, $X(x) = (X_1(x), \dots, X_n(x))$ and
$$X(g)(x) = X_1(x)\frac{\partial g}{\partial x_1}(x) + \cdots + X_n(x)\frac{\partial g}{\partial x_n}(x).$$

where $k$ is the multiplicity of $\alpha + \mathrm{i}\beta$ as a root of the characteristic polynomial and $R_l$, $T_l$ are polynomials of degree at most $l = \max\{m,n\}$. In our problem, the non-homogeneous part is a sum of two functions in the special form (see above).
Therefore, we will look for (two) corresponding particular solutions using the method of undetermined coefficients, and then we will add up these solutions. This will give us a particular solution of the original equation (as well as the general solution, then).

Let us begin with the function $y = \mathrm{e}^x$, which has a particular solution $y_{p1}(x) = A\,\mathrm{e}^x$ for some $A \in \mathbb{R}$. Since
$$y_{p1}(x) = y'_{p1}(x) = y''_{p1}(x) = y'''_{p1}(x) = A\,\mathrm{e}^x,$$
substitution into the original equation, whose right-hand side contains only the function $y = \mathrm{e}^x$, leads to $20A\,\mathrm{e}^x = \mathrm{e}^x$, i.e. $A = \frac{1}{20}$.

For the right-hand side with the function $y = 10\cos(3x)$, we are looking for a particular solution in the form
$$y_{p2}(x) = x\bigl[B\cos(3x) + C\sin(3x)\bigr].$$
Recall that the number $\lambda = 3\mathrm{i}$ was obtained as a root of the characteristic polynomial. We can easily compute the derivatives
$$y'_{p2}(x) = B\cos(3x) + C\sin(3x) + x\bigl[-3B\sin(3x) + 3C\cos(3x)\bigr],$$
$$y''_{p2}(x) = 2\bigl[-3B\sin(3x) + 3C\cos(3x)\bigr] + x\bigl[-9B\cos(3x) - 9C\sin(3x)\bigr],$$
$$y'''_{p2}(x) = 3\bigl[-9B\cos(3x) - 9C\sin(3x)\bigr] + x\bigl[27B\sin(3x) - 27C\cos(3x)\bigr].$$
Substituting them into the equation, whose right-hand side contains the function $y = 10\cos(3x)$, we get
$$-18B\cos(3x) - 18C\sin(3x) - 6B\sin(3x) + 6C\cos(3x) = 10\cos(3x).$$
Comparing the coefficients leads to the system of linear equations
$$-18B + 6C = 10, \qquad -18C - 6B = 0,$$
with the only solution $B = -1/2$ and $C = 1/6$, i.e.,
$$y_{p2}(x) = x\bigl[-\tfrac12\cos(3x) + \tfrac16\sin(3x)\bigr].$$
Altogether, the general solution is
$$y = C_1\mathrm{e}^{-x} + C_2\cos(3x) + C_3\sin(3x) + \tfrac{1}{20}\mathrm{e}^x - \tfrac{x}{2}\cos(3x) + \tfrac{x}{6}\sin(3x), \qquad C_1, C_2, C_3 \in \mathbb{R}.$$

Clearly, there are the special vector fields with coordinate functions equal to zero except for one function $X_i$ which is identically one. Such a field then corresponds to the partial derivative with respect to the variable $x_i$. This is also matched by the common notation $\frac{\partial}{\partial x_i}$ for such vector fields, and in general,
$$X(x) = X_1(x)\frac{\partial}{\partial x_1} + \cdots + X_n(x)\frac{\partial}{\partial x_n}.$$

Remark.
Actually, each derivative on functions, i.e., a linear operator $D$ satisfying (1), is given by a unique vector field. This may be seen as follows. First, $D(1) = D(1\cdot 1) = 2D(1)$, and thus $D(c) = 0$ for constant functions. Next, each function $f(x)$ can be written on a neighborhood of a point $q \in \mathbb{R}^n$ as
$$f(x) = f(q) + \int_0^1 \frac{\mathrm{d}}{\mathrm{d}t}\,f\bigl(q + t(x-q)\bigr)\,\mathrm{d}t = f(q) + \sum_{i=1}^n (x_i - q_i)\int_0^1 \frac{\partial f}{\partial x_i}\bigl(q + t(x-q)\bigr)\,\mathrm{d}t.$$
Applying $D$ and the Leibniz rule (1), and evaluating at $x = q$, only the terms with the values $D(x_i)$ survive:
$$D(f)(q) = \sum_{i=1}^n D(x_i)(q)\,\frac{\partial f}{\partial x_i}(q).$$
Hence $D$ is the derivative in the direction of the vector field $X$ with the components $X_i = D(x_i)$.

The flow of a vector field $X$ is the system of mappings $\mathrm{Fl}^X_t$ assigning to each point $x$ the value at time $t$ of the solution of the equation $x' = X(x)$ with the initial condition $x$ at $t = 0$. If the vector field is complete, i.e. all the solutions exist for all times $t$, the flow consists of diffeomorphisms $\mathrm{Fl}^X_t : \mathbb{R}^n \to \mathbb{R}^n$ with inverse diffeomorphisms $\mathrm{Fl}^X_{-t}$.

CHAPTER 8. CALCULUS WITH MORE VARIABLES

□

8.L.5. Determine the general solution of the equation
$$y'' + 3y' + 2y = \mathrm{e}^{-2x}.$$

Solution. The given equation is a second-order (the highest derivative of the wanted function is of order two) linear (all derivatives are in the first power) differential equation with constant coefficients. First, we solve the homogenized equation $y'' + 3y' + 2y = 0$. Its characteristic polynomial is
$$x^2 + 3x + 2 = (x+1)(x+2),$$
with roots $x_1 = -1$ and $x_2 = -2$. Hence, the general solution of the homogenized equation is
$$c_1\mathrm{e}^{-x} + c_2\mathrm{e}^{-2x},$$
where $c_1, c_2$ are arbitrary real constants. Now, using the method of undetermined coefficients, we will find a particular solution of the original non-homogeneous equation. According to the form of the non-homogeneity, and since $-2$ is a root of the characteristic polynomial of the given equation, we are looking for the solution in the form $y_0 = a\,x\,\mathrm{e}^{-2x}$ for $a \in \mathbb{R}$. Substituting into the original equation, we obtain
$$a\bigl[-4\mathrm{e}^{-2x} + 4x\mathrm{e}^{-2x} + 3(\mathrm{e}^{-2x} - 2x\mathrm{e}^{-2x}) + 2x\mathrm{e}^{-2x}\bigr] = \mathrm{e}^{-2x},$$
hence $a = -1$. We have thus found the function $-x\,\mathrm{e}^{-2x}$ as a particular solution of the given equation. Hence, the general solution is the function space
$$c_1\mathrm{e}^{-x} + c_2\mathrm{e}^{-2x} - x\,\mathrm{e}^{-2x}, \qquad c_1, c_2 \in \mathbb{R}.$$ □

8.L.6. Determine the general solution of the equation
$$y'' + y' = 1.$$

Solution. The characteristic polynomial of the given equation is $x^2 + x$, with roots $0$ and $-1$. Therefore, the general solution of the homogenized equation is
$$c_1 + c_2\mathrm{e}^{-x},$$
where $c_1, c_2 \in \mathbb{R}$. We are looking for a particular solution in the form $ax$, $a \in \mathbb{R}$ (since zero is a root of the characteristic polynomial).
Substituting into the original equation, we get $a = 1$. The general solution of the given non-homogeneous equation is
$$c_1 + c_2\mathrm{e}^{-x} + x, \qquad c_1, c_2 \in \mathbb{R}.$$ □

A simple example of a complete vector field is the field $X(x) = \frac{\partial}{\partial x_1}$. Its flow is given by
$$\mathrm{Fl}^X_t(x_1, \dots, x_n) = (x_1 + t, x_2, \dots, x_n).$$
On the other hand, the vector field $X(x) = x^2\frac{\partial}{\partial x}$ on the one-dimensional space $\mathbb{R}$ is not complete, as the solutions $x(t)$ of the corresponding equation $\frac{\mathrm{d}x}{\mathrm{d}t} = x^2$ are of the form
$$x(t) = \frac{x_0}{1 - t\,x_0},$$
except for the initial condition $x(0) = 0$, so they "run away" towards infinite values in a finite time.

The points $x_0$ in the domain of a vector field $X : U \subset \mathbb{R}^n \to \mathbb{R}^n$ where $X(x_0) = 0$ are called singular points of the vector field $X$. Clearly $\mathrm{Fl}^X_t(x_0) = x_0$ for all $t$ at all singular points.

8.3.15. Local qualitative description. The description of vector fields as assigning the tangent vector in the modelling space to each point of the Euclidean space is independent of the coordinates. It follows that the flows exhibit a geometric concept which must be coordinate-free. It is necessary to know what happens to the fields and their flows when coordinates are transformed.

Suppose $y = F(x)$ is such a transformation with $F : \mathbb{R}^n \to \mathbb{R}^n$ (or on some smaller domain there). Then the solutions $x(t)$ to a system $x' = X(x)$ satisfy $x'(t) = X(x(t))$, and in the transformed coordinates this reads
$$y'(t) = \bigl(F(x(t))\bigr)'(t) = D^1F(x(t))\cdot x'(t) = D^1F(x(t))\cdot X(x(t)).$$
This means that the "transformed field" $Y$ in the new coordinates is
$$Y(F(x)) = D^1F(x)\cdot X(x).$$
At the same time, the flows of these vector fields are related as follows:
$$\mathrm{Fl}^Y_t \circ F(x) = F \circ \mathrm{Fl}^X_t(x).$$
Indeed, fixing $x = x_0$ and writing $x(t) = \mathrm{Fl}^X_t(x_0)$, the curve $F(x(t))$ is the unique solution of the system of equations $y' = Y(y)$ with the initial condition $y_0 = F(x_0)$, i.e. it equals the left-hand side $\mathrm{Fl}^Y_t(F(x_0))$.

The following theorem offers a geometric local qualitative description of all solutions of systems of first order ordinary differential equations in a neighbourhood of each point $x$ which is not singular.
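The incompleteness of the field $x^2\frac{\partial}{\partial x}$ discussed above is easy to illustrate numerically. The sketch below checks the closed-form solution $x(t) = x_0/(1 - x_0 t)$ against the equation $x' = x^2$ by a central difference; the values $x_0$ and $t$ are arbitrary sample choices before the blow-up time $1/x_0$.

```python
# Check of the blow-up solution of x' = x^2: for x0 > 0, the formula
# x(t) = x0 / (1 - x0 t) solves the equation on the interval [0, 1/x0).
x0 = 2.0
x = lambda t: x0 / (1.0 - x0 * t)

h = 1e-6
t = 0.3                                    # sample time before 1/x0 = 0.5
derivative = (x(t + h) - x(t - h)) / (2.0 * h)
print(derivative, x(t) ** 2)               # the two values agree
```

As $t$ approaches $1/x_0$, the denominator vanishes and the solution escapes to infinity, exactly as claimed in the text.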
The flowbox theorem

Theorem. If $X$ is a differentiable vector field defined on a neighbourhood of a point $x_0 \in \mathbb{R}^n$ and $X(x_0) \ne 0$, then there exists a transformation of coordinates $F$ such that in the new coordinates $y = F(x)$, the vector field $X$ is given as the field $\frac{\partial}{\partial y_1}$.

Proof. Construct a diffeomorphism $F$ with the required properties, step by step. Geometrically, the essence of the proof can be summarized as follows: first select a hypersurface which goes through the point $x_0$ and is complementary to the directions $X(x)$ near $x_0$. Then fix the coordinates on it, and finally, extend them to some neighbourhood of the point $x_0$ using the flow of the field $X$.

Without loss of generality, move the point $x_0$ to the origin by a translation. Then, by a suitable linear transformation on $\mathbb{R}^n$, arrange that $X(0) = \frac{\partial}{\partial x_1}(0)$. With such coordinates, write the flow of the field $X$ going through the point $(x_1, \dots, x_n)$ at time $t = 0$ as $x_i(t) = \psi_i(t, x_1, \dots, x_n)$, $i = 1, \dots, n$. Next, define the components $f_i$ of the mapping $F$ as
$$f_i(x_1, \dots, x_n) = \psi_i(x_1, 0, x_2, \dots, x_n).$$
Then
$$\frac{\partial F}{\partial x_1}(0, \dots, 0) = \frac{\mathrm{d}}{\mathrm{d}t}\bigl(\psi_1(t, 0, \dots, 0), \dots, \psi_n(t, 0, \dots, 0)\bigr)\Big|_{t=0}.$$
This follows the strategy. Since $X(0, \dots, 0) = \frac{\partial}{\partial x_1}(0, \dots, 0)$,

8.L.7. Determine the general solution of the equation
$$y'' + 5y' + 6y = \mathrm{e}^{-2x}.$$

Solution. The characteristic polynomial of the equation is $x^2 + 5x + 6 = (x+2)(x+3)$; its roots are $-2$ and $-3$. The general solution of the homogenized equation is thus
$$c_1\mathrm{e}^{-2x} + c_2\mathrm{e}^{-3x}, \qquad c_1, c_2 \in \mathbb{R}.$$
We are looking for a particular solution in the form $a\,x\,\mathrm{e}^{-2x}$ ($-2$ is a root of the characteristic polynomial), $a \in \mathbb{R}$, using the method of undetermined coefficients. Substitution into the original equation yields $a = 1$. Hence, the general solution of the given equation is
$$c_1\mathrm{e}^{-2x} + c_2\mathrm{e}^{-3x} + x\,\mathrm{e}^{-2x}.$$ □

8.L.8. Determine the general solution of the equation
$$y'' - y' = 5.$$
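The particular solutions found by undetermined coefficients in 8.L.5–8.L.7 are easy to verify mechanically. The following sketch checks the 8.L.5 answer $y_p(x) = -x\,\mathrm{e}^{-2x}$ by evaluating the residual of the equation with central finite differences (the step size $h$ and the sample interval are arbitrary choices).

```python
import math

# Finite-difference sanity check of the particular solution from 8.L.5:
# y_p(x) = -x e^{-2x} should satisfy y'' + 3y' + 2y = e^{-2x}.
yp = lambda x: -x * math.exp(-2.0 * x)

def residual(x, h=1e-4):
    d1 = (yp(x + h) - yp(x - h)) / (2.0 * h)                # approx. y'
    d2 = (yp(x + h) - 2.0 * yp(x) + yp(x - h)) / (h * h)    # approx. y''
    return d2 + 3.0 * d1 + 2.0 * yp(x) - math.exp(-2.0 * x)

max_res = max(abs(residual(k / 10.0)) for k in range(-20, 21))
print(max_res)   # small: only the O(h^2) discretization error remains
```

The same three-line check works for any of the worked examples in this section, by swapping the candidate function and the left-hand side.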
while the flow $\mathrm{Fl}^X_t$ at the time $t = 0$ yields the identity in the remaining variables, the Jacobi matrix $D^1F(0)$ is the identity matrix. Hence $F$ is a local diffeomorphism around the origin, and by its construction it transforms the coordinate field $\frac{\partial}{\partial x_1}$ into $X$. The inverse of $F$ is then the requested transformation of coordinates. □

8.3.16. Higher-order equations. An ordinary differential equation of order $k$ (solved with respect to the highest derivative) is an equation
$$(1)\qquad y^{(k)} = f\bigl(t, y, y', \dots, y^{(k-1)}\bigr),$$
where $f$ is a known function of $k+1$ variables, $t$ is the independent variable, and $y(t)$ is an unknown function of one variable. This type of equation is always equivalent to a system of $k$ first-order equations. Introduce new unknown functions in the variable $t$ as follows:
$$y_0(t) = y(t), \quad y_1(t) = y_0'(t), \ \dots, \ y_{k-1}(t) = y_{k-2}'(t).$$
Now, the function $y(t)$ is a solution of the original equation (1) if and only if it is the first component of the solution of

8.L.9. Solve the equation
$$y'' - 2y' + y = \frac{\mathrm{e}^x}{x^2+1}.$$

Solution. We use the method of variation of constants: we look for the solution in the form
$$y = C_1(x)y_1(x) + C_2(x)y_2(x) + \cdots + C_n(x)y_n(x),$$
where $y_1, \dots, y_n$ give the general solution of the corresponding homogeneous equation, and the functions $C_1(x), \dots, C_n(x)$ can be obtained from the system
$$C_1'(x)y_1(x) + \cdots + C_n'(x)y_n(x) = 0,$$
$$C_1'(x)y_1'(x) + \cdots + C_n'(x)y_n'(x) = 0,$$
$$\vdots$$
$$C_1'(x)y_1^{(n-2)}(x) + \cdots + C_n'(x)y_n^{(n-2)}(x) = 0,$$
$$C_1'(x)y_1^{(n-1)}(x) + \cdots + C_n'(x)y_n^{(n-1)}(x) = f(x).$$

CHAPTER 8. CALCULUS WITH MORE VARIABLES

The roots of the characteristic polynomial $\lambda^2 - 2\lambda + 1$ are $\lambda_1 = \lambda_2 = 1$. Therefore, we are looking for the solution in the form $C_1(x)\mathrm{e}^x + C_2(x)x\mathrm{e}^x$, considering the system
$$C_1'(x)\mathrm{e}^x + C_2'(x)x\mathrm{e}^x = 0,$$
$$C_1'(x)\mathrm{e}^x + C_2'(x)\bigl[\mathrm{e}^x + x\mathrm{e}^x\bigr] = \frac{\mathrm{e}^x}{x^2+1}.$$
We can compute the unknowns $C_1'(x)$ and $C_2'(x)$ using Cramer's rule. It follows from
$$\begin{vmatrix}\mathrm{e}^x & x\mathrm{e}^x\\ \mathrm{e}^x & \mathrm{e}^x + x\mathrm{e}^x\end{vmatrix} = \mathrm{e}^{2x}, \qquad \begin{vmatrix}0 & x\mathrm{e}^x\\ \frac{\mathrm{e}^x}{x^2+1} & \mathrm{e}^x + x\mathrm{e}^x\end{vmatrix} = -\frac{x\,\mathrm{e}^{2x}}{x^2+1}, \qquad \begin{vmatrix}\mathrm{e}^x & 0\\ \mathrm{e}^x & \frac{\mathrm{e}^x}{x^2+1}\end{vmatrix} = \frac{\mathrm{e}^{2x}}{x^2+1}$$
that
$$C_1(x) = -\int \frac{x}{x^2+1}\,\mathrm{d}x = -\tfrac12\ln\bigl(x^2+1\bigr) + C_1, \quad C_1 \in \mathbb{R},$$
$$C_2(x) = \int \frac{\mathrm{d}x}{x^2+1} = \arctan x + C_2, \quad C_2 \in \mathbb{R}.$$
Hence, the general solution is
$$y = C_1\mathrm{e}^x + C_2\,x\mathrm{e}^x - \tfrac12\,\mathrm{e}^x\ln\bigl(x^2+1\bigr) + x\mathrm{e}^x\arctan x, \qquad C_1, C_2 \in \mathbb{R}.$$ □

8.L.10. Find the only function $y$ which satisfies the linear differential equation
$$y''' - 3y' - 2y = 2\mathrm{e}^x,$$
with initial conditions $y(0) = 0$, $y'(0) = 0$, $y''(0) = 0$.

Solution. The characteristic polynomial is $x^3 - 3x - 2$, with roots $2$ and $-1$ (double). We are looking for a particular solution in the form $a\mathrm{e}^x$, $a \in \mathbb{R}$, easily finding out that it is the function $-\frac12\mathrm{e}^x$.
The general solution of the given equation is thus
$$c_1\mathrm{e}^{2x} + c_2\mathrm{e}^{-x} + c_3\,x\mathrm{e}^{-x} - \tfrac12\mathrm{e}^x.$$
Substituting into the original conditions, we get the only satisfactory function,
$$\tfrac29\mathrm{e}^{2x} + \tfrac{5}{18}\mathrm{e}^{-x} + \tfrac13\,x\mathrm{e}^{-x} - \tfrac12\mathrm{e}^x.$$ □

Further problems concerning higher-order differential equations can be found on page 599.

the system of equations
$$y_0' = y_1,$$
$$y_1' = y_2,$$
$$\vdots$$
$$y_{k-2}' = y_{k-1},$$
$$y_{k-1}' = f(t, y_0, y_1, \dots, y_{k-1}).$$
Hence the following direct corollary of the theorems 8.3.9–8.3.12:

Solutions of higher-order ODEs

Theorem. Consider a function $f(t, y_0, \dots, y_{k-1}) : U \subset \mathbb{R}^{k+1} \to \mathbb{R}$ with continuous partial derivatives on an open set $U$. Then for every point $(t_0, z_0, \dots, z_{k-1}) \in U$, there exists a maximal interval $I_{\max} = [t_0 - a, t_0 + b]$, with positive numbers $a, b \in \mathbb{R}$, and a unique function $y(t) : I_{\max} \to \mathbb{R}$ which is a solution of the $k$-th order equation
$$y^{(k)} = f\bigl(t, y, y', \dots, y^{(k-1)}\bigr)$$
with the initial condition
$$y(t_0) = z_0, \ y'(t_0) = z_1, \ \dots, \ y^{(k-1)}(t_0) = z_{k-1}.$$
This solution depends differentiably on the initial conditions and on potential further parameters entering the function $f$ differentiably. Moreover, the solution is analytic if the latter dependence is analytic.

In particular, the theorem shows that in order to determine unambiguously the solution of an ordinary $k$-th order differential equation, the values of the solution and its first $k-1$ derivatives must be determined at one point.

With a system of $l$ equations of order $k$, the same procedure transforms this system to a system of $kl$ first-order equations. Therefore, an analogous statement about existence, uniqueness, continuity, and differentiability is also true. If the right-hand side $f$ of the equation is differentiable up to order $r$ or analytic, including the parameters, then the same property is enjoyed by the solutions as well.

8.3.17. Linear differential equations. The operation of differentiation can be viewed as a linear mapping from (sufficiently) smooth functions to functions.
Multiplying the derivatives of the particular orders $j$ by fixed functions $a_j(t)$ and adding these expressions gives the linear differential operators $y(t) \mapsto D(y)(t)$:
$$D(y)(t) = a_k(t)y^{(k)}(t) + \cdots + a_1(t)y'(t) + a_0(t)y(t).$$
To solve the corresponding homogeneous linear differential equation of order $k$ then means finding a function $y$ satisfying $D(y) = 0$. The sum of two solutions is again a solution, since for any functions $y_1$ and $y_2$,
$$D(y_1 + y_2)(t) = D(y_1)(t) + D(y_2)(t).$$

M. Applications of the Laplace transform

Differential equations with constant coefficients can also be solved using the Laplace transform.

8.M.1. Let $\mathcal{L}(y)(s)$ denote the Laplace transform of a function $y(t)$. Integrating by parts, prove that
$$(1)\qquad \mathcal{L}(y')(s) = s\,\mathcal{L}(y)(s) - y(0),$$
$$\mathcal{L}(y'')(s) = s^2\mathcal{L}(y)(s) - s\,y(0) - y'(0),$$
and, by induction,
$$\mathcal{L}\bigl(y^{(n)}\bigr)(s) = s^n\mathcal{L}(y)(s) - s^{n-1}y(0) - s^{n-2}y'(0) - \cdots - y^{(n-1)}(0).$$

8.M.2. Find the function $y(t)$ which satisfies the differential equation
$$y''(t) + 4y(t) = \sin 2t$$
as well as the initial conditions $y(0) = 0$, $y'(0) = 0$.

Solution. It follows from the above exercise 7.H.8 that
$$s^2\mathcal{L}(y)(s) + 4\,\mathcal{L}(y)(s) = \mathcal{L}(\sin 2t)(s) = \frac{2}{s^2+4},$$
i.e.,
$$\mathcal{L}(y)(s) = \frac{2}{(s^2+4)^2}.$$
The inverse transform leads to
$$y(t) = \tfrac18\sin 2t - \tfrac14\,t\cos 2t.$$ □

8.M.3. Find the function $y(t)$ which satisfies the differential equation
$$y''(t) + 6y'(t) + 9y(t) = 50\sin t$$
and the initial conditions $y(0) = 1$, $y'(0) = 4$.

Solution. The Laplace transform yields
$$s^2\mathcal{L}(y)(s) - s - 4 + 6\bigl(s\,\mathcal{L}(y)(s) - 1\bigr) + 9\,\mathcal{L}(y)(s) = 50\,\mathcal{L}(\sin t)(s),$$
i.e.,
$$\bigl(s^2 + 6s + 9\bigr)\mathcal{L}(y)(s) = \frac{50}{s^2+1} + s + 10,$$
$$\mathcal{L}(y)(s) = \frac{50}{(s^2+1)(s+3)^2} + \frac{s+10}{(s+3)^2}.$$
Decomposing the first term into partial fractions, we obtain
$$\frac{50}{(s^2+1)(s+3)^2} = \frac{As+B}{s^2+1} + \frac{C}{s+3} + \frac{D}{(s+3)^2},$$
$$50 = (As+B)(s+3)^2 + C(s^2+1)(s+3) + D(s^2+1).$$

A constant multiple of a solution is again a solution. So the set of all solutions of a $k$-th order linear differential equation is a vector space.
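The derivative rule of 8.M.1 can also be illustrated numerically by approximating both transforms with Simpson's rule; this is only a sketch, with the truncation horizon $T$ and the test function $y(t) = \cos t$ as arbitrary choices (so that $y' = -\sin t$ and $y(0) = 1$).

```python
import math

# Numerical illustration of L(y')(s) = s L(y)(s) - y(0) from 8.M.1.
# Both Laplace transforms are approximated by composite Simpson's rule on
# [0, T]; for s = 2 and T = 40 the discarded tail e^{-sT} is negligible.
def laplace(f, s, T=40.0, n=4000):
    h = T / n
    total = f(0.0) + math.exp(-s * T) * f(T)
    for i in range(1, n):
        w = 4.0 if i % 2 else 2.0
        total += w * math.exp(-s * i * h) * f(i * h)
    return total * h / 3.0

s = 2.0
lhs = laplace(lambda t: -math.sin(t), s)       # L(y')(s)
rhs = s * laplace(math.cos, s) - 1.0           # s L(y)(s) - y(0)
print(lhs, rhs)   # both close to -1/(s^2 + 1) = -0.2
```

The two sides agree up to the quadrature error, in accordance with the exact transforms $\mathcal{L}(\cos t) = \frac{s}{s^2+1}$ and $\mathcal{L}(-\sin t) = -\frac{1}{s^2+1}$.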
Apply the previous theorem about existence and uniqueness to obtain the following:

The space of solutions of linear equations

Theorem. The set of all solutions of a homogeneous linear differential equation of order $k$ with continuously differentiable coefficients is a vector space of dimension $k$. Therefore, the solutions can be described as linear combinations of any set of $k$ linearly independent solutions. Such solutions are determined uniquely by linearly independent initial conditions on the value of the function $y(t)$ and its first $k-1$ derivatives at a fixed point $t_0$.

Proof. Choose $k$ linearly independent initial conditions at a fixed point. For each of them, there is a unique solution. A linear combination of these initial conditions then leads to the same linear combination of the corresponding solutions. All of the possible initial conditions are exhausted, so the entire space of solutions of the equation is obtained in this way. □

The same arguments as with the first order linear differential equations in the paragraph 8.3.4 reveal that all solutions of the non-homogeneous $k$-th order equation $D(y) = b(t)$, with a fixed continuous function $b(t)$, are the sums of one fixed solution $y(t)$ of this problem and all solutions $y$ of the corresponding homogeneous equation. Thus the entire space of solutions is an affine $k$-dimensional space of functions. The method of variation of constants exploited in 8.3.4 is one of the possible approaches to guess one non-homogeneous solution if we know the complete solution to the homogeneous problem.

We shall illustrate the latter results on the most simple case:

8.3.18. Linear equations with constant coefficients. The previous discussion recalls the situation with homogeneous linear difference equations dealt with in paragraph 3.2.1 of the third chapter. The analogy goes further when all of the coefficients $a_j$ of the differential operator $D$ are constant.
Such first-order equations (1) have solutions given by an exponential with an appropriate constant in the argument. Just as in the case of difference equations, this suggests trying whether such a form of the solution, $y(t) = e^{\lambda t}$ with an unknown parameter $\lambda$, can satisfy an equation of order $k$. Substitution yields
$$D(e^{\lambda t}) = \big(a_k\lambda^k + a_{k-1}\lambda^{k-1} + \cdots + a_1\lambda + a_0\big)\,e^{\lambda t}.$$
The parameter $\lambda$ leads to a solution of a linear differential equation with constant coefficients if and only if $\lambda$ is a root of the characteristic polynomial $a_k\lambda^k + \cdots + a_1\lambda + a_0$. If this polynomial has $k$ distinct roots, then we have a basis of the whole vector space of solutions. Otherwise, if $\lambda$ is a multiple root, then a direct calculation, making use of the fact that $\lambda$ is then a root of the derivative of the characteristic polynomial as well, yields that the function $y(t) = t\,e^{\lambda t}$ is also a solution.
Substituting $s=-3$, we get $50 = 10D$, hence $D=5$. Comparing the coefficients at $s^3$, we have $0 = A + C$, hence $A = -C$. Comparing the coefficients at $s$, we obtain $0 = 9A + 6B + C = -8C + 6B$, hence $B = \tfrac{4}{3}C$. Finally, comparing the absolute terms, we infer $50 = 9B + 3C + D = 12C + 3C + 5$, hence $C = 3$, $B = 4$, $A = -3$. Since
$$\frac{s+10}{(s+3)^2} = \frac{s+3+7}{(s+3)^2} = \frac{1}{s+3} + \frac{7}{(s+3)^2},$$
we have
$$\mathcal{L}(y)(s) = \frac{-3s+4}{s^2+1} + \frac{3}{s+3} + \frac{5}{(s+3)^2} + \frac{1}{s+3} + \frac{7}{(s+3)^2} = \frac{-3s}{s^2+1} + \frac{4}{s^2+1} + \frac{4}{s+3} + \frac{12}{(s+3)^2}.$$
Now, the inverse Laplace transform yields the solution in the form
$$y(t) = -3\cos t + 4\sin t + 4e^{-3t} + 12te^{-3t}. \qquad\Box$$
8.M.4. Find the function $y(t)$ which satisfies the differential equation
$$y''(t) = \cos(\pi t) - y(t), \quad t \in (0, +\infty)$$
and the initial conditions $y(0) = c_1$, $y'(0) = c_2$.
Solution. First, we should emphasize that it follows from the theory of ordinary differential equations that this equation has a unique solution. Further, we should recall that
$$\mathcal{L}(f'')(s) = s^2\mathcal{L}(f)(s) - s\lim_{t\to 0^+} f(t) - \lim_{t\to 0^+} f'(t).$$
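For a quick illustration of the characteristic polynomial at work (my own sketch, with an equation chosen just for the example): for $y'' - 3y' + 2y = 0$ the roots of $\lambda^2 - 3\lambda + 2$ are $1$ and $2$, so every solution is $C_1e^{t} + C_2e^{2t}$.

```python
import math

# Roots of the characteristic polynomial lambda^2 - 3*lambda + 2 (example of mine).
a, b, c = 1.0, -3.0, 2.0
disc = math.sqrt(b * b - 4 * a * c)
l1, l2 = (-b - disc) / (2 * a), (-b + disc) / (2 * a)
print(l1, l2)  # 1.0 2.0

def y(t, C1=2.0, C2=-1.0):
    # an arbitrary linear combination of the two basis solutions
    return C1 * math.exp(l1 * t) + C2 * math.exp(l2 * t)

def residual(t, h=1e-4):
    # y'' - 3y' + 2y via central differences; should vanish
    ypp = (y(t + h) - 2 * y(t) + y(t - h)) / (h * h)
    yp = (y(t + h) - y(t - h)) / (2 * h)
    return ypp - 3 * yp + 2 * y(t)

print(abs(residual(0.7)) < 1e-5)  # True
```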
Similarly, for a root $\lambda$ of higher multiplicity $\ell$, there are $\ell$ distinct solutions
$$e^{\lambda t},\ t\,e^{\lambda t},\ \ldots,\ t^{\ell-1}e^{\lambda t}.$$
In the case of a general linear differential equation, a nonzero value of the differential operator $D$ is wanted. Again, as for systems of linear equations or linear difference equations, the general solution of this type of (non-homogeneous) equation $D(y) = b(t)$, for a fixed function $b(t)$, is the sum of an arbitrary solution of this equation and the set of all solutions of the corresponding homogeneous equation $D(y)(t) = 0$. The entire space of solutions is a finite-dimensional affine space, hidden in the huge space of functions. The methods for finding a particular solution are introduced in concrete examples in the other column. In principle, they are based on looking for a solution of a similar form as the right-hand side, or on the method of variation of constants.
8.3.19. Matrix systems with constant coefficients. Before leaving the area of differential equations, consider a very special case of first-order systems, whose right-hand side is given by multiplication of a matrix $A \in \mathrm{Mat}_n(\mathbb{R})$ of constant coefficients and an $n^2$-dimensional unknown matrix function $Y(t)$:
(1) $Y'(t) = A\cdot Y(t)$.
Clearly this is a strict analogy to the iterative models in chapter 3. Combine knowledge from linear algebra and univariate function analysis to guess the solution. Define the exponential of a matrix by the formula
$$e^{tA} = \sum_{k=0}^{\infty}\frac{t^k}{k!}A^k.$$
Since
$$\mathcal{L}(\cos(bt))(s) = \frac{s}{s^2+b^2},$$
applying the Laplace transform to the given differential equation then gives
$$s^2\mathcal{L}(y)(s) - sc_1 - c_2 = \frac{s}{s^2+\pi^2} - \mathcal{L}(y)(s),$$
i.e.,
(1) $\mathcal{L}(y)(s) = \dfrac{s}{(s^2+1)(s^2+\pi^2)} + \dfrac{c_1s + c_2}{s^2+1}.$
Therefore, it suffices to find a function $y$ which satisfies (1). Performing partial fraction decomposition, we obtain
The right-hand expression can be formally viewed as a matrix whose entries $b_{ij}$ are infinite series created from the mentioned products.
If all entries of $A$ are estimated by the maximum $C = \|A\|$ of their absolute values, then the $k$-th summand in $b_{ij}(t)$ is at most $\frac{t^k}{k!}n^kC^k$ in absolute value. Hence, every series $b_{ij}(t)$ is necessarily absolutely and uniformly convergent, and it is bounded above by the value $e^{tnC}$. Differentiating the terms of the series one by one gives a uniformly convergent series with limit $A\,e^{tA}$. Therefore, by the general properties of uniformly convergent series, the derivative
$$\frac{d}{dt}\big(e^{tA}\big) = A\,e^{tA}$$
also equals this expression. The general solution of the system (1) is obtained in the form
$$Y(t) = e^{tA}\cdot Y_0,$$
where $Y_0 \in \mathrm{Mat}_n(\mathbb{R})$ is the arbitrary initial condition $Y(0) = Y_0$. The exponential $e^{tA}$ is a well defined invertible matrix for all $t$. So we have a vector space of the proper dimension, and hence all solutions to the system (1). Notice
$$\frac{s}{(s^2+1)(s^2+\pi^2)} = \frac{1}{\pi^2-1}\left(\frac{s}{s^2+1} - \frac{s}{s^2+\pi^2}\right).$$
The above expression of $\mathcal{L}(\cos(bt))(s)$ and the already proved formula $\mathcal{L}(\sin t)(s) = \frac{1}{s^2+1}$ then yield the wanted solution
$$y(t) = \frac{1}{\pi^2-1}\big(\cos t - \cos(\pi t)\big) + c_1\cos t + c_2\sin t. \qquad\Box$$
8.M.5. Solve the system of differential equations
$$x''(t) + x'(t) = y(t) - y''(t) + e^t,$$
$$x'(t) + 2x(t) = -y(t) + y'(t) + e^{-t}$$
with the initial conditions $x(0) = 0$, $y(0) = 0$, $x'(0) = 1$, $y'(0) = 0$.
Solution. Again, we apply the Laplace transform. This transforms the first equation to
$$s^2\mathcal{L}(x)(s) - s\lim_{t\to0^+}x(t) - \lim_{t\to0^+}x'(t) + s\mathcal{L}(x)(s) - \lim_{t\to0^+}x(t) = \mathcal{L}(y)(s) - \Big(s^2\mathcal{L}(y)(s) - s\lim_{t\to0^+}y(t) - \lim_{t\to0^+}y'(t)\Big) + \frac{1}{s-1}$$
and the second one to
$$s\mathcal{L}(x)(s) - \lim_{t\to0^+}x(t) + 2\mathcal{L}(x)(s) = -\mathcal{L}(y)(s) + s\mathcal{L}(y)(s) - \lim_{t\to0^+}y(t) + \frac{1}{s+1}.$$
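The series definition of $e^{tA}$ is easy to test numerically. In this sketch (mine, not the book's) the truncated series is evaluated for the rotation generator $A = \begin{pmatrix}0&1\\-1&0\end{pmatrix}$, for which $e^{tA}$ is known in closed form: $A^2 = -E$, so $e^{tA} = (\cos t)E + (\sin t)A$.

```python
import math

def mat_mul(A, B):
    # product of 2x2 matrices
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(A, t, N=40):
    # truncated series sum_{k=0}^{N} t^k A^k / k!
    E = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]      # current summand t^k A^k / k!
    for k in range(1, N + 1):
        term = mat_mul(term, A)
        term = [[x * t / k for x in row] for row in term]
        E = [[E[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return E

A = [[0.0, 1.0], [-1.0, 0.0]]
E = expm(A, 1.0)
print(E[0][0], math.cos(1.0))   # both approximately 0.5403
print(E[0][1], math.sin(1.0))   # both approximately 0.8415
```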
Evaluating the limits (according to the initial conditions), we obtain the linear equations
$$s^2\mathcal{L}(x)(s) - 1 + s\mathcal{L}(x)(s) = \mathcal{L}(y)(s) - s^2\mathcal{L}(y)(s) + \frac{1}{s-1}$$
and
$$s\mathcal{L}(x)(s) + 2\mathcal{L}(x)(s) = -\mathcal{L}(y)(s) + s\mathcal{L}(y)(s) + \frac{1}{s+1},$$
with the only solution
$$\mathcal{L}(x)(s) = \frac{2s-1}{2(s-1)(s+1)^2}.$$
Once again, we perform partial fraction decomposition, getting
$$\mathcal{L}(x)(s) = \frac{1}{8}\,\frac{1}{s-1} + \frac{3}{4}\,\frac{1}{(s+1)^2} - \frac{1}{8}\,\frac{1}{s+1} = \frac{3}{4}\,\frac{1}{(s+1)^2} + \frac{1}{4}\,\frac{1}{s^2-1}.$$
Since we have already computed that
$$\mathcal{L}(te^{-t})(s) = \frac{1}{(s+1)^2}, \quad \mathcal{L}(\sinh t)(s) = \frac{1}{s^2-1}, \quad \mathcal{L}(t\sinh t)(s) = \frac{2s}{(s^2-1)^2},$$
we get
$$x(t) = \tfrac{3}{4}\,te^{-t} + \tfrac{1}{4}\sinh t, \qquad y(t) = \tfrac{3}{4}\,t\sinh t.$$
We definitely advise the reader to verify that these functions $x$ and $y$ are indeed the wanted solution. The reason is that the Laplace transforms of the functions $y = e^t$, $y = \sinh t$ and $y = t\sinh t$ were obtained only for $s > 1$. □
that in order to get a solution, it is necessary to multiply by $Y_0$ from the right. It is remarkable that dealing with a vector equation with a constant matrix $A \in \mathrm{Mat}_n(\mathbb{R})$,
(2) $y'(t) = A\cdot y(t)$,
for an unknown function $y : \mathbb{R} \to \mathbb{R}^n$, the columns of the matrix exponential $e^{tA}$ provide $n$ linearly independent solutions. The general solution is then given by their linear combinations. The general solutions of the system (2) may be understood better by invoking some linear algebra, namely the Jordan canonical form of linear mappings, see e.g. 3.4.10. In terms of vector fields $X$, the system has the linear expression $X(y) = \phi(y)$, where $\phi$ is the linear mapping with the matrix $A$ in coordinates. Clearly linear transformations of the system lead to another vector field with such a linear description, since the differential of a linear mapping is the mapping itself. A linear transformation of coordinates $\tilde y = Ty$ with a constant invertible matrix $T$ transforms the system into
$$\tilde y' = (Ty)' = (TAT^{-1})\cdot(Ty) = \tilde A\cdot\tilde y.$$
In particular, a suitable change of coordinates $T$ provides the matrix $A$ in the Jordan canonical form expressing

the sum of a diagonalisable and a nilpotent part. The solutions corresponding to an eigenvalue $\lambda$ of $A$ then take the form
$$y(t) = e^{\lambda t}\sum_{j=0}^{k} t^j p_j,$$
where $k$ is the order of nilpotency of the Jordan block corresponding to the eigenvalue $\lambda$ of the matrix $A$, and the $p_j$ are suitable constant vectors. In particular, if the nilpotent part of $A$ is trivial, then $k = 0$. This important result allows many generalizations. For example, the Floquet–Lyapunov theory generalizes this behaviour of solutions to systems with periodically time-dependent matrices $A(t)$.
8.3.20. Return to singular points. Finally, recall the first-order matrix system in paragraph 8.3.12, where the derivative of the solutions of vector equations with respect to the initial conditions was discussed. Consider a differentiable vector field $X(x)$ defined on a neighbourhood of its singular point $x_0 \in \mathbb{R}^n$, i.e. $X(x_0) = 0$. Then the point $x_0$ is a fixed point of its flow $\mathrm{Fl}^X_t(x)$. The differential $\Phi(t) = D_x\,\mathrm{Fl}^X_t(x_0)$ satisfies the matrix system with initial condition (see (2) on page 578)
$$\Phi'(t) = D^1X(x_0)\cdot\Phi(t), \qquad \Phi(0) = E.$$
The important point is that the differential $D^1X$ is evaluated along the constant flow line $x_0$, since this is a singular point of the system. The solution is known explicitly, and this describes the evolution of the differential. Given $\varepsilon > 0$, there is $N$ with $\|A^k - A_\infty\| < \varepsilon$ for all $k > N$. Hence, the limit of the expression $y(t) = e^{-t}\sum_k \frac{t^k}{k!}A^k y_0$ for $t \to \infty$ can be controlled: the whole expression has been estimated (for $k > N$ and $t > T > 0$) by the value $\varepsilon(C+1)\|y_0\|$. Summarizing, a very interesting statement is proved, which resembles the discrete version of Markov processes:
Continuous processes with a stochastic matrix
Theorem.
Every primitive stochastic matrix $A$ determines a vector system of equations
$$y'(t) = (A - E)\cdot y(t)$$
with the following properties:
• the basis of the vector space of all solutions is given by the columns of the matrix $Y(t) = e^{-t}\,e^{tA}$,
• if the initial condition $y_0 = y(t_0)$ is a stochastic vector, then the solution $y(t)$ is also a stochastic vector for all values of $t$,
• every stochastic solution converges for $t \to \infty$ to the stochastic eigenvector $y_\infty$ of the matrix $A$ corresponding to the eigenvalue 1.
8.3.22. Remarks on numerical methods. Except for exceptionally simple equations, for example, linear equations with constant coefficients, analytically solvable equations are seldom encountered in practice. Therefore, some techniques are required to approximate the solutions of the equations. Approximations have already been considered in many other situations (recall the interpolation polynomials and splines, the exploitation of Taylor polynomials in methods for numerical differentiation and integration, Fourier series etc.). With a little courage, consider difference and differential equations to be mutual approximations. In one direction, replace differences with differentials (for example, in economic or population models). In other situations, the differences may imitate well the continuous changes in models. Use the terminology for asymptotic estimates, as introduced in 6.1.16. In particular, an expression $G(h)$ is asymptotically equal to $F(h)$ for $h$ approaching zero or infinity, written $G(h) = O(F(h))$, if the finite limit of $G(h)/F(h)$ exists. A good example is the approximation of a multivariate function $f(x)$ by its Taylor polynomial of order $k$ at a point $x_0$. Taylor's theorem says that the error of this approximation is $O(\|h\|^{k+1})$, where $h = x - x_0$ is the increment of the argument. In the case of ordinary differential equations, the simplest scheme is approximation using Euler polygons.
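The theorem can be illustrated on a concrete $2\times 2$ primitive stochastic matrix (an example of mine, not the book's). Integrating $y' = (A - E)y$ by small Euler steps, the sum of the components stays 1 and the solution approaches the eigenvector of $A$ for the eigenvalue 1, here $(2/3, 1/3)$:

```python
# Columns of A sum to 1 and all entries are positive, so A is primitive stochastic.
A = [[0.9, 0.2], [0.1, 0.8]]
y = [1.0, 0.0]                    # stochastic initial vector
h = 0.001
for _ in range(50_000):           # integrate y' = (A - E) y up to t = 50
    dy0 = (A[0][0] - 1.0) * y[0] + A[0][1] * y[1]
    dy1 = A[1][0] * y[0] + (A[1][1] - 1.0) * y[1]
    y = [y[0] + h * dy0, y[1] + h * dy1]
print(sum(y))   # stays equal to 1 (up to rounding)
print(y)        # close to the stationary vector [2/3, 1/3]
```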
Present this method for a single ordinary equation with two quantities: one independent and one dependent. It works analogously for systems of equations, where the scalar quantities and their derivatives in time $t$ are replaced with vectors dependent on time and their derivatives. This procedure was used before in the proof of Peano's existence theorem, see 8.3.8. Consider an equation
$$y' = f(t, y)$$
with continuous right-hand side $f$. Denote the discrete increment of time by $h$, i.e. set $t_n = t_0 + nh$. It is desired to approximate $y(t)$. It follows from Taylor's theorem (with remainder of order two) and the equation that
$$y(t_{n+1}) = y(t_n) + y'(t_n)h + O(h^2) = y(t_n) + f(t_n, y(t_n))h + O(h^2).$$
Define recurrently the values $y_j$ by the first-order formula
$$y_{j+1} = y_j + f(t_j, y_j)\,h.$$
This leads to the local approximation error $O(h^2)$, occurring in one step of the recurrence. If $n$ such steps are needed with increment $h$ from $t_0$ to $t = t_n$, the error could be up to $nO(h^2) = \tfrac{1}{h}(t-t_0)O(h^2) = O(h)$. More care is needed, since the function $f(t,y)$ is evaluated at the approximate points $(t_j, y_j)$ with the already approximate previous values $y_j$. In order to keep control, $f(t,y)$ must be Lipschitz in $y$. Assuming inductively that the estimate is true for all $i < j$,
$$|f(t_j, y(t_j)) - f(t_j, y_j)| \le C\,|y(t_j) - y_j| \le C\,|t - t_0|\,O(h),$$
where $C$ is the Lipschitz constant, assuming that the error does not exceed $O(h)$ with a globally valid constant for $y_j$. Inductively, the expected bound $O(h)$ for the global error estimate is obtained. Think about the details. The Euler procedure is the simplest method within the class of Runge–Kutta methods. Dealing with higher-order equations, we may either view them as vector-valued first-order systems (as in the theoretical column), and then even the Euler method provides results for the initial conditions on the necessary number of derivatives at one point.
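A minimal sketch (my own) of the Euler recurrence for $y' = y$, $y(0) = 1$ on $[0,1]$; halving $h$ roughly halves the global error at $t = 1$, in agreement with the $O(h)$ estimate above:

```python
import math

def euler(f, t0, y0, t_end, h):
    # the recurrence y_{j+1} = y_j + f(t_j, y_j) * h
    y = y0
    n = round((t_end - t0) / h)
    for j in range(n):
        y += h * f(t0 + j * h, y)
    return y

for h in (0.1, 0.05, 0.025):
    err = abs(euler(lambda t, y: y, 0.0, 1.0, 1.0, h) - math.e)
    print(h, err)   # the error shrinks roughly linearly with h
```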
But in practical problems, it is often needed to find solutions passing through more than one prescribed point. For example, with second-order equations, we may prescribe two values $y(t_1)$ and $y(t_2)$ of the solution. This needs completely different methods.
O. Additional exercises to the whole chapter
8.O.1. A basin with volume 300 hl contains 100 hl of water in which 50 kg of salt is dissolved. Water with 2 kg of salt per 1 hl starts flowing into the basin at 6 hl/min. The mixture, kept homogeneous by permanent stirring, leaves the basin at 4 hl/min. Express the amount of salt (in kg) in the basin after $t$ minutes as a function of the variable $t \in [0, 100]$. ○
8.O.2. During a controlled experiment, a small smelting furnace is slowly cooling down while the outer temperature keeps at 300 K. The experiment began at noon. At 1 pm, the temperature in the furnace was estimated at 1300 K. At 3 pm, it was only 550 K. Supposing the measurements were accurate, compute what the temperature in the furnace was at 2 pm. ○
8.O.3. The half-life of the radioactive sulfur isotope ³⁵S is 87.5 days. After what period are there only 900 grams left of the original amount of 1 kilogram of this isotope? (You may express the result in terms of the natural logarithm.) ○
8.O.4. The half-life of a radioactive element A is 5 years; for an element B, it is 1 year. If we have 5 kg of element B and 1 kg of element A, after what period will we have the same amount of both? (You may express the result in terms of the natural logarithm.) ○
8.O.5. The half-life of a radioactive element A is 8 years; for an element B, it is 2 years. If we have 3 kg of element B and 1 kg of element A, after what period will we have the same amount of both? (You may express the result in terms of the natural logarithm.) ○
8.O.6. The half-life of the radioactive cobalt isotope ⁶⁰Co is 5.27 years. Having 4 kg of this isotope, after what period does 1 kg of it decay?
(You may express the result in terms of the natural logarithm.) ○
8.O.7. Solve the following differential equation for the function $y = y(x)$: $y' = \dfrac{1+y^2}{1+x^2}$. ○
8.O.8. Determine all solutions of the following equation with separated variables: $y - y^2 + xy' = 0$. ○
8.O.9. Solve the equation $1 + y' = e^x$. ○
8.O.10. Solve the equation $2y = x^3y'$. ○
8.O.11. Determine all solutions of the equation $\sqrt{4-y^2}\,dx + y\,dy = 0$. ○
8.O.12. Solve $y'\tan x = y^2 + 1 - 2y$. ○
8.O.13. Determine the general solution of the differential equation $\dfrac{x+1}{x} = \dfrac{yy'}{1-y^2}$. ○
8.O.14. Find the general solution of the differential equation $(x+1)\,dy + xy\,dx = 0$. ○
8.O.15. Find the solution of the differential equation $\sin y\cos x\,dy = \cos y\sin x\,dx$ which satisfies $4y(0) = \pi$. ○
8.O.16. Solve the initial problem $(x^2+1)(y^2-1) + xyy' = 0$, $y(1) = \sqrt{2}$. ○
8.O.17. Determine the particular solution of the equation $y'\sin x = y\ln y$ which goes through the point $[\pi/2, e]$. ○
8.O.18. Find all solutions of the differential equation $2(1+e^x)yy' = e^x$ which satisfy $y(0) = 0$. ○
8.O.19. Solve the homogeneous equation $(xy' - y)\cos\frac{y}{x} = x$. ○
8.O.20. Determine the general solution of the homogeneous differential equation $y^3 = x^3y'$. ○
8.O.21. Find all solutions of the equation $xy' = \sqrt{x^2-y^2} + y$. ○
8.O.22. Determine the general solution if we are given $xy' = y\cos\big(\ln\frac{y}{x}\big)$. ○
8.O.23. Solve the equation $(x+y)\,dx - (x-y)\,dy = 0$ as homogeneous. ○
8.O.24. Calculate $y' = (x+y)^2$. ○
8.O.25. Find the general solution for $y' = \dfrac{x-y+3}{x+y-5}$. ○
8.O.26. Calculate $y' = x - y + 1$. ○
8.O.27. Determine all solutions of the differential equation $y' = \dfrac{5y-5x-1}{2y-2x-1}$. ○
8.O.28. Find the general solution of the equation $y' = \dfrac{x-y-1}{x+y+3}$. ○
8.O.29. Determine the general solution for the equation $y' = \dfrac{2x-y-5}{x-3y-5}$. ○
8.O.30. Express the solutions of the equation $y' = \dfrac{x+2y-7}{x-3}$
8.O.36. Find all solutions of the equation $y'\cos x = (y + 2\cos x)\sin x$. ○
as explicitly given functions. ○
8.O.31. Using the method of variation of constants, calculate $y' + 2y = x$. ○
8.O.32. Determine the general solution of the equation $y' = 6x + 2y + 3$. ○
8.O.33. Solve the linear equation $y' = 4xy + (2x+1)e^{2x^2}$. ○
8.O.34. Solve the equation $y'x + y = x\ln x$. ○
8.O.35. Calculate the linear differential equation $y'x = y + x^2\ln x$. ○
8.O.37. Find the solution of the equation $y' = 6x - 2y$ which satisfies the initial condition $y(0) = 0$. ○
8.O.38. Solve the initial problem $y' + y\sin x = \sin x$, $y(\pi) = 2$. ○
8.O.39. Find the solution of the equation $y' = 4y + \cos x$ which goes through the point $[0, 1]$. ○
8.O.40. Solve the following equation for any $a, b \in \mathbb{R}$: $ay' + y = e^x$, $y(a) = b$. ○
8.O.41. Determine the general solution of the equation $3x^2y' + xy = \frac{4}{x}$. ○
8.O.42. Solve the Bernoulli equation $y' - xy = ye^{\,\cdots}$ (the statement of the right-hand side is illegible in the source). ○
8.O.43. Calculate the Bernoulli equation $y' - \frac{y}{x} = y^2\sin x$. ○
8.O.44. Find all solutions of the equation $y' = \frac{\cdots + x\cdots}{\cdots}$ (illegible in the source). ○
8.O.45. Solve the equation $xy' + 2y + x^5y^3e^x = 0$. ○
8.O.46. Find the general solution of the following equation provided $a, b > 0$: $y\,dy = (a\cdots + b\cdots)\,dx$ (illegible in the source). ○
8.O.47. Interchanging the variables, solve $2y + (y^2 - 6x)\,y' = 0$. ○
8.O.48. Solve $y' = \dfrac{y}{2y\ln y + y - x}$. ○
8.O.49. Calculate the general solution of the following equation: $x\,dx = \big(\cdots - y^3\big)\,dy$ (illegible in the source). ○
8.O.50. Interchanging the variables, calculate $(x+y)\,dy = y\,dx + y\ln y\,dy$. ○
8.O.51. Solve $y'\big(e^{-y} - x\big) = 1$. ○
8.O.52. Calculate $y' = \dfrac{y}{2x - y^2}$. ○
8.O.53. Solve the equation $2y\,dx + x\,dy = 2y^3\,dy$. ○
8.O.54. Calculate $y'' + 3y' + 2y = (20x + 29)\,e^{3x}$. ○
8.O.55. Find any solution of the non-homogeneous linear equation $y'' + y' + \cdots\,y = 25\cos(2x)$ (the coefficient is illegible in the source). ○
8.O.56. Determine the solution of the equation $y'' + 2y' + 2y = 3e^{-x}\cos x$. ○
8.O.57. Find the solution of the equation $y'' = 2y' + y + 1$ which satisfies $y(0) = 0$ and $y'(0) = 1$. ○
8.O.58.
Find the solution of the equation $y'' = 4y - 3y' + 1$ which satisfies $y(0) = 0$ and $y'(0) = 2$. ○
8.O.59. Determine the general solution of the linear equation $y'' - 2y' + 5y = 5e^{2x}\sin x$. ○
8.O.60. Taking advantage of the special form of the right-hand side, find all solutions of the equation $y'' + y' = x^2 - x + 6e^{2x}$. ○
8.O.61. Solve $y^{(4)} - 2y'' + y = 8\,(e^x + e^{-x}) + 4\,(\sin x + \cos x)$. ○
8.O.62. Using the method of variation of constants, calculate $y'' - 2y' + y = e^x$. ○
8.O.63. Solve $y'' + 4y' + 4y = e^{-2x}\ln 2$. ○
8.O.64. Using the method of variation of constants, find the general solution of the equation $y'' + 4y = \cdots$ (the right-hand side is illegible in the source). ○
8.O.65. Solve the equation $y'' + y = \tan^2 x$. ○
8.O.66. Find the solution of the differential equation $y^{(3)} = -2y'' - 2y' - y + \sin(x)$ which satisfies $y(0) = -\tfrac12$, $y'(0) = \cdots$, and $y''(0) = -1 - \cdots$ (the remaining values are illegible in the source). ○
8.O.67. Calculate the equation $y''' - 2y'' - y' + 2y = 0$. ○
8.O.68. Find the general solution of the equation $y^{(4)} + 2y'' + y = 0$. ○
8.O.69. Solve $y^{(6)} + 2y^{(5)} + 4y^{(4)} + 4y''' + 5y'' + 2y' + 2y = 0$. ○
8.O.70. Find the general solution of the linear equation $y^{(5)} - 3y^{(4)} + 2y''' = 8x - 12$. ○
Key to the exercises
8.C.1. 2.
8.C.2. Factorize the denominator to get $\tfrac14$.
8.C.3. 0.
8.C.4. Expand the fraction and use the substitution $t = xy$ ($(x,y) \to (0,2)$ means $t \to 0$). 2.
8.C.5. Use the polar coordinates $x = r\cos\varphi$, $y = r\sin\varphi$; $(x,y) \to (\infty,\infty)$ means $r \to \infty$, $\varphi \in (0, \tfrac\pi2)$. 0.
8.C.6. Use two different parametrizations $y = 1 - e^x$ and $y = x$ to get two different values $-4$ and $0$, thus the limit does not exist.
8.C.7. Try polar coordinates, where $r \to 0$,

, that is, it depends on the direction of approach to $[0,0]$, thus the limit does not exist. 8.C.8. Try polar coordinates, where $r \to \infty$,

With the projection $p : TU \to U$ assigning the foot points to the tangent vectors, we write $T_xU$ for the vector space of all vectors $X$ with $p(X) = x$ at a point $x \in U$, and we use the notation $\mathcal{X}(U)$ for the set of all smooth vector fields on the open subset $U$. The linear combinations of the special vector fields $\frac{\partial}{\partial x_i}$, admitting smooth functions as the coefficients, generate the entire $\mathcal{X}(U)$. Thus we write general vector fields as
CHAPTER 9. CONTINUOUS MODELS - FURTHER SELECTED TOPICS
$$2\iint_D dx\,dy = 2\,[x]_{-2}^{2}\,[y]_{-2}^{2} = 32. \qquad\Box$$
9.B.2. Compute $\oint_c x^4\,dx + xy\,dy$, where $c$ is the positively oriented curve going through the vertices $A = [0,0]$, $B = [1,0]$, $C = [0,1]$.
Solution. The curve $c$ is the boundary of the triangle $ABC$. The integrated functions are continuously differentiable on the whole $\mathbb{R}^2$, so we can use Green's theorem:
$$\oint_c x^4\,dx + xy\,dy = \iint_D y\,dx\,dy = \int_0^1\int_0^{-x+1} y\,dy\,dx = \int_0^1 \frac{x^2 - 2x + 1}{2}\,dx = \frac16. \qquad\Box$$
9.B.3. Calculate $\oint_c (xy + x + y)\,dx + (xy + x - y)\,dy$, where $c$ is the circle with radius 1 centered at the origin.
Solution. Again, the prerequisites of Green's theorem are satisfied, so we can use Green's theorem, which now gives
$$\oint_c (xy+x+y)\,dx + (xy+x-y)\,dy = \iint_D (y + 1 - x - 1)\,dx\,dy = \int_0^1\int_0^{2\pi} r^2(\sin\varphi - \cos\varphi)\,d\varphi\,dr$$
$$= \int_0^1 r^2\,dr\int_0^{2\pi}(\sin\varphi - \cos\varphi)\,d\varphi = \frac13\,\big[-\cos\varphi - \sin\varphi\big]_0^{2\pi} = 0. \qquad\Box$$
A smooth mapping $F$ between two open sets $U \subset \mathbb{R}^n$, $V \subset \mathbb{R}^m$ defines the mapping $F_* : TU \to TV$ by applying the differential $D^1F$ to the individual tangent vectors. Thus if $y = F(x) = (f_1(x), \ldots, f_m(x))$, then
$$F_* : T_xU \to T_{F(x)}V, \qquad F_*\Big(\sum_{i=1}^n X_i(x)\frac{\partial}{\partial x_i}\Big) = \sum_{j=1}^m\sum_{i=1}^n \frac{\partial f_j(x)}{\partial x_i}\,X_i(x)\,\frac{\partial}{\partial y_j}.$$
When we studied the vector spaces in chapter two, we came across the useful concept of linear forms. They were defined in paragraph 2.3.17 on page 106. This idea extends naturally now. A scalar-valued linear mapping defined on the tangent space $T_xU$ is a linear form at the foot point $x$. The vector space of all such forms $T_x^*U = (T_xU)^*$ is thus naturally isomorphic to $\mathbb{R}^{n*}$, and the collection $T^*U$ of these spaces comes equipped with the projection to the foot points, denoted again by $p$.
Having a mapping $\eta : U \subset \mathbb{R}^n \to T^*U$ with values $\eta(x) \in T_x^*U$ on an open subset $U$, i.e., $p \circ \eta = \mathrm{id}_U$, we talk about a differential form $\eta$ on $U$, or a linear form. Every differentiable function $f$ on an open subset $U \subset \mathbb{R}^n$ defines the differential form $df$ on $U$ (cf. 8.1.7). We use the notation $\Omega^1(U)$ for the set of all smooth linear differential forms on the open set $U$. In the chosen coordinates $(x_1, \ldots, x_n)$, we can use the differentials of the particular coordinate functions to express every linear form $\eta$ as
$$\eta(x) = \eta_1(x)\,dx_1 + \cdots + \eta_n(x)\,dx_n,$$
where the $\eta_i(x)$ are uniquely determined functions. Such a form $\eta$ evaluates on a vector field $X(x) = X_1(x)\frac{\partial}{\partial x_1} + \cdots + X_n(x)\frac{\partial}{\partial x_n}$ as
$$\eta(X)(x) = \eta(x)(X(x)) = \eta_1(x)X_1(x) + \cdots + \eta_n(x)X_n(x).$$
If the form $\eta$ is the differential of a function $f$, we get back just the expression
$$X(f)(x) = df(X(x)) = \frac{\partial f}{\partial x_1}X_1(x) + \cdots + \frac{\partial f}{\partial x_n}X_n(x)$$
for the derivative of $f$ in the direction of the vector field $X$.
9.1.2. Exterior differential forms. As we discussed already in chapters 1 and 4, the volume of $k$-dimensional parallelepipeds $S$, as a quantity depending on the $k$ vectors spanning $S$, is an antisymmetric $k$-linear form in these vectors, see 2.3.22 on page 111. Remember also the computation of the volume of parallelepipeds in terms of determinants in 4.1.22 on page 247. Thus, if we want to talk about the (linearized) volume of $k$-dimensional objects, we need a concept which is linear in $k$ distinct tangent vector arguments and assigns a scalar quantity to them. Moreover, we require that interchanging any pair of arguments swaps the sign, in accordance with the orientations.
9.B.4. Compute $\oint_c (2e^{2x}\sin y - 3y^3)\,dx + (e^{2x}\cos y + \tfrac43 x^3)\,dy$, where $c$ is the positively oriented ellipse $4x^2 + 9y^2 = 36$.
Solution. We will use Green's theorem, choosing the linear deformation of polar coordinates $x = 3r\cos\varphi$, $y = 2r\sin\varphi$, $\varphi \in [0, 2\pi]$, $r \in [0,1]$ (the Jacobian of the transformation is $6r$), leading to
$$\oint_c (2e^{2x}\sin y - 3y^3)\,dx + (e^{2x}\cos y + \tfrac43 x^3)\,dy = \iint_D \Big(2e^{2x}\cos y + 4x^2 - \big(2e^{2x}\cos y - 9y^2\big)\Big)\,dx\,dy$$
$$= \int_0^1\int_0^{2\pi} 6r\,\big[4(3r\cos\varphi)^2 + 9(2r\sin\varphi)^2\big]\,d\varphi\,dr = 216\int_0^1 r^3\,dr\int_0^{2\pi} d\varphi = 216\cdot\Big[\frac{r^4}{4}\Big]_0^1\cdot 2\pi = 108\pi. \qquad\Box$$
9.B.5. Compute $\oint_c (e^x\ln y - y^2x)\,dx + \big(\tfrac{e^x}{y} - \tfrac12 x^2y\big)\,dy$, where $c$ is the positively oriented circle $(x-2)^2 + (y-2)^2 = 1$.
Solution.
$$\oint_c (e^x\ln y - y^2x)\,dx + \Big(\frac{e^x}{y} - \frac12 x^2y\Big)\,dy = \iint_D \Big(\frac{e^x}{y} - xy - \frac{e^x}{y} + 2xy\Big)\,dx\,dy = \iint_D xy\,dx\,dy$$
$$= \int_0^1\int_0^{2\pi} r\,(r\cos\varphi + 2)(r\sin\varphi + 2)\,d\varphi\,dr = \int_0^1\int_0^{2\pi}\big(r^3\sin\varphi\cos\varphi + 2r^2(\sin\varphi + \cos\varphi) + 4r\big)\,d\varphi\,dr$$
$$= \frac14\Big[\frac{\sin^2\varphi}{2}\Big]_0^{2\pi} + \frac23\big[-\cos\varphi + \sin\varphi\big]_0^{2\pi} + 4\pi = 4\pi. \qquad\Box$$
Exterior differential forms
Definition. The vector space of all $k$-linear antisymmetric forms on a tangent space $T_xU$, $U \subset \mathbb{R}^n$, will be denoted by $\Lambda^k(T_xU)^*$. We talk about exterior $k$-forms at the point $x \in U$. The assignment of a $k$-form $\eta(x) \in \Lambda^kT_x^*U$ to every point $x \in U$ of an open subset in $\mathbb{R}^n$ defines an exterior differential $k$-form on $U$. The set of smooth exterior $k$-forms on $U$ is denoted $\Omega^k(U)$.
Next, let us consider a smooth mapping $G : V \to U$ between two open sets $V \subset \mathbb{R}^m$ and $U \subset \mathbb{R}^n$, an exterior $k$-form $\eta(G(x)) \in \Lambda^k(T^*_{G(x)}U)$, and choose arbitrarily $k$ vectors $X_1(x), \ldots, X_k(x)$ in the tangent space $T_xV$. Just like in the case of linear forms, we can evaluate the form $\eta$ at the images of the vectors $X_i$ under the mapping $y = G(x) = (g_1(x), \ldots, g_n(x))$. This operation is called the pullback of the form $\eta$ by $G$:
$$G^*(\eta(G(x)))(X_1(x), \ldots, X_k(x)) = \eta(G(x))\big(G_*(X_1(x)), \ldots, G_*(X_k(x))\big),$$
which is an exterior form in $\Lambda^k(T_x^*V)$. In the case of linear forms, this is the dual mapping to the differential $D^1G$.
We can compute directly from the definition that, for instance,
(1) $G^*(dy_i) = \dfrac{\partial g_i}{\partial x_1}\,dx_1 + \cdots + \dfrac{\partial g_i}{\partial x_m}\,dx_m,$
which extends to the linear combinations of all $dy_i$ over functions. Another immediate consequence of the definition is the formula for pullbacks of arbitrary $k$-forms by a composition of two diffeomorphisms:
(2) $(G \circ F)^*\alpha = F^*(G^*\alpha).$
Indeed, as a mapping on $k$-tuples of vectors,
$$(G\circ F)^*\alpha = \alpha\circ\big((D^1G\circ D^1F)\times\cdots\times(D^1G\circ D^1F)\big) = G^*(\alpha)\circ\big(D^1F\times\cdots\times D^1F\big) = F^*\circ G^*\alpha,$$
as expected.
9.1.3. Wedge product of exterior forms. Given a $k$-form $\alpha \in \Lambda^k\mathbb{R}^{n*}$ and an $\ell$-form $\beta \in \Lambda^\ell\mathbb{R}^{n*}$, we can create a $(k+\ell)$-form $\alpha\wedge\beta$ by going through all possible permutations $\sigma$ of the arguments. We just have to alternate the arguments in all possible orders and take the right sign each time:
$$(\alpha\wedge\beta)(X_1, \ldots, X_{k+\ell}) = \sum_{\sigma\in S_{k+\ell}} \frac{\operatorname{sgn}\sigma}{k!\,\ell!}\,\alpha\big(X_{\sigma(1)}, \ldots, X_{\sigma(k)}\big)\,\beta\big(X_{\sigma(k+1)}, \ldots, X_{\sigma(k+\ell)}\big).$$
9.B.6. Calculate the integral $\oint_c (e^x\sin y - xy^2)\,dx + \big(e^x\cos y - \cdots\big)\,dy$ (the second component is illegible in the source), where $c$ is the positively oriented circle $x^2 + y^2 + 4x + 4y + 7 = 0$. ○
9.B.7. Compute $\oint_c (3y - e^{\sin x})\,dx + (7x + \sqrt{y^4+1})\,dy$, where $c$ is the positively oriented circle $x^2 + y^2 = 9$. ○
Compute the integral $\oint_c \big(\tfrac{y^3}{x} + 2xy - \cdots\big)\,dx + \big(\cdots + x^2 + \tfrac{x^3}{3}\big)\,dy$ (partially illegible in the source), where $c$ is the positively oriented boundary of the set $D = \{(x,y)\in\mathbb{R}^2 : 4 \le \cdots\}$.
By the antisymmetry, there are no nonzero $k$-linear antisymmetric forms for $k > \dim U$. Thus, $\Omega^k(U)$ contains only the trivial zero form in this case. Another straightforward consequence of the definition is that the pullback of the wedge product by a smooth mapping $G : V \to U$ satisfies
$$G^*(\alpha\wedge\beta) = G^*\alpha\wedge G^*\beta.$$
We should also notice that the 0-forms $\Omega^0(\mathbb{R}^n)$ are just smooth functions on $\mathbb{R}^n$. The wedge product of a 0-form $f$ and a $k$-form $\alpha$ is just the multiple of the form $\alpha$ by the function $f$. Similarly, the top degree forms in $\Omega^n(U)$ are all generated by the single generator $dx_1\wedge\cdots\wedge dx_n$, since there is just one possibility of $n$ different choices among $n$ coordinates, up to ordering. This means that the $n$-forms $\omega$ are actually identified with functions via the formula $\omega(x) = f(x)\,dx_1\wedge\cdots\wedge dx_n$.
At the same time, while the pullback of the functions $f \in \Omega^0(U)$ by a transformation $F : \mathbb{R}^n \to \mathbb{R}^n$, $y = F(x)$, is trivial, i.e. $F^*f(x) = f(y) = f\circ F(x)$, a straightforward computation reveals
(1) $F^*\omega(x) = \det(D^1F)(x)\,f(F(x))\,dx_1\wedge\cdots\wedge dx_n$
for all $\omega = f\,dy_1\wedge\cdots\wedge dy_n$.
9.B.11. Find the area bounded by the cycloid given parametrically as $\psi(t) = [a(t - \sin t);\ a(1 - \cos t)]$, for $a > 0$, $t \in (0, 2\pi)$, and the x-axis.
Solution. Let the curves that bound the area be denoted by $c_1$ and $c_2$. For the area, we get
$$m(D) = \frac12\int_{c_1} -y\,dx + x\,dy\ +\ \frac12\int_{c_2} -y\,dx + x\,dy.$$
Now, we will compute the mentioned integrals step by step. The parametric equation of the curve $c_1$ (a segment of the x-axis) is $(t; 0)$, $t \in [0; 2a\pi]$, so we obtain for the first integral
$$\int_{c_1} -y\,dx + x\,dy = -\int_0^{2a\pi} 0\cdot 1\,dt + \int_0^{2a\pi} t\cdot 0\,dt = 0.$$
The parametric equation of the curve $c_2$ is $\psi(t) = \big(a(t-\sin t),\, a(1-\cos t)\big)$, $t \in [2\pi; 0]$. The formula for the area expects a positively oriented curve, which means for the considered parametric equation that we are moving against the parametrization direction, i.e. from the upper bound to the lower one. We thus get for the area of the cycloid
$$\frac12\int_{2\pi}^{0}\big(-a(1-\cos t)\cdot a(1-\cos t) + a(t-\sin t)\cdot a\sin t\big)\,dt = \frac{a^2}{2}\int_0^{2\pi}\big((1-\cos t)^2 - t\sin t + \sin^2 t\big)\,dt$$
$$= \frac{a^2}{2}\int_0^{2\pi}\big(2 - 2\cos t - t\sin t\big)\,dt = \frac{a^2}{2}\big[2t - 2\sin t + t\cos t - \sin t\big]_0^{2\pi} = \frac{a^2}{2}(4\pi + 2\pi) = 3\pi a^2. \qquad\Box$$
9.B.12. Compute $I = \iint_S x^3\,dy\,dz + y^3\,dx\,dz + z^3\,dx\,dy$, where $S$ is given by the sphere $x^2 + y^2 + z^2 = 1$.
Solution. It is advantageous to work in spherical coordinates $x = \rho\sin\psi\cos\varphi$, $y = \rho\sin\psi\sin\varphi$, $z = \rho\cos\psi$, with $\rho \in [0,1]$, $\varphi \in [0, 2\pi]$, $\psi \in [0, \pi]$.
For a transformation $G = (g_1, \ldots, g_n)$, it means we will evaluate $\omega$ at a point $G(y) = x$ at the values of the vectors $G_*(X_1), \ldots, G_*(X_n)$. However, this means we will integrate the form $G^*\omega$ in the coordinates $(y_1, \ldots, y_n)$, and we already saw in the previous paragraph, cf. 9.1.3(1), that
$$(G^*\omega)(y) = f(G(y))\,\det(D^1G(y))\,dy_1\wedge\cdots\wedge dy_n.$$
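The value $3\pi a^2$ in 9.B.11 can be cross-checked numerically (a sketch of mine, not part of the book) by evaluating $\frac12\oint(x\,dy - y\,dx)$ along the arch with the midpoint rule; the segment on the x-axis contributes nothing:

```python
import math

a, N = 1.0, 100000
total = 0.0
for i in range(N):
    t = 2 * math.pi * (i + 0.5) / N       # midpoint rule on [0, 2*pi]
    x, y = a * (t - math.sin(t)), a * (1 - math.cos(t))
    xp, yp = a * (1 - math.cos(t)), a * math.sin(t)
    total += (x * yp - y * xp) * (2 * math.pi / N)
area = abs(total) / 2                      # orientation is reversed, hence abs
print(area, 3 * math.pi * a * a)           # both about 9.42478
```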
Substituting into our interpretation of the integral, we get G*(/o;Kn) = / /(G(y)) det(731G(U))dy1 ■ ■ ■ dyn, V JG-^lf) which is, by the theorem 8.2.8 on the coordinate substitution in the integral, the same value as fv fuiM.™ if the determinant of the Jacobian matrix is positive, and the same value up to the sign if it is negative. Our new interpretation thus provides the geometrical meaning for the integral of an n-form on R™, supposing the corresponding Riemann integral exists in some (and hence any) coordinates. This integration takes into account the orientation of the area we are integrating over. We shall come back to this point in a moment. 9.1.5. Integrals along curves. Our next goal is to integrate 'fit objects over domains which are similar to curves or surfaces in R3. Let us first shape 5£2|£gb£ our mind on the simplest case of the lowest dimension, i.e. the curves in R™. 610 CHAPTER 9. CONTINUOUS MODELS - FURTHER SELECTED TOPICS The Jacobian of this transformation is — p2 sin p. The given integral is then equal to I = dy dz + y dx dz + z dx dy = 3x2 + 3y2 + 3z2 dx dy dz = 3 0 0 0 „2 „„„2 , +p2 cos2 p) dp dp dip 1 2l IT = 3 pA sin ip (sin2 ip (cos2 ip + sin2 ip) + ooo „2 , 2tt- Recall the calculation of the length of a curve in R™ by univariate integrals, which was discussed in paragraph 6.1.6 on page 378. The curve was parametrized as a mapping c(t) : R —> R™, and the size of the tangent vector \\c'(t)| was expressed in the Euclidean vector space. This procedure was given by the universal relation for an arbitrary tangent vector, i.e., we actually found the function p : R™ —> R which gave the true size when evaluated at d (t). This mapping satisfied p(a v) = \a\p(v) since we ignored the orientation of the curve given by our parametrization. If we wanted a signed p2 sin p (p2 sin2 p cos2 ip + p2 sin2 p sin2 ip+ length, respecting the orientation, then our mapping p would be linear on every one-dimensional subspace L c R™. 
Of course, we could have multiplied the Euclidean size by a positive function and integrated this quantity. In view of our geometric approach to integration, we should rather integrate linear forms along curves, while the size of vectors is given by a quadratic form rather than a linear one. However, in dimension one, we take the square root of the values of the (positive definite) quadratic form in order to get a linear form (up to sign), which is just the size of the vectors. Let us proceed in a much similar way dealing with linear differential forms $\eta$ on $\mathbb{R}^n$. The simplest ones are the differentials $df$ of functions $f$ on $\mathbb{R}^n$. In order to motivate our development, let us consider the following task. Imagine we are cycling along a path $c(t)$ in $\mathbb{R}^2$, and the function $f$ is the altitude of the terrain. If we want to compute the total gain of altitude along the path $c(t)$, we should "integrate" the immediate infinitesimal gains, which should be the derivatives of $f$ in the directions of the tangent vectors to the path, i.e. $df(c'(t))$. Thus, let us consider a differentiable curve $c(t)$ in $\mathbb{R}^n$, $t \in [a,b]$, write $M$ for the image $c([a,b])$, and assume that a differentiable function $f$ is defined on a neighborhood of $M$. The differential of this function gives for every tangent vector the increment of the function in the given direction. It is expressed by the differential of the composite mapping $f\circ c$:
$$\frac{d(f\circ c)}{dt}(t) = \frac{\partial f}{\partial x_1}(c(t))\,c_1'(t) + \cdots + \frac{\partial f}{\partial x_n}(c(t))\,c_n'(t).$$
9.B.13. The vector form of the Gauss–Ostrogradsky theorem. The divergence of a vector field $F(x,y,z) = f(x,y,z)\frac{\partial}{\partial x} + g(x,y,z)\frac{\partial}{\partial y} + h(x,y,z)\frac{\partial}{\partial z}$ is defined as $\operatorname{div} F := f_x + g_y + h_z$. Then, the Gauss–Ostrogradsky theorem can be formulated as follows:
$$\iiint_V \operatorname{div} F(x,y,z)\,dx\,dy\,dz = \iint_S F(x,y,z)\cdot n(x,y,z)\,dS,$$
where $n(x,y,z)$ is the outer unit normal to the surface $S$ at the point $[x,y,z] \in S$ ($S$ is the boundary of the normal domain $V$).
9.B.14. Find the flux of the vector field given by the function $F = (xy^2,\, yz,\, x^2z)$ over the boundary of the cylinder $x^2 + y^2 = 4$, $1 \le z \le 3$.
Solution. First of all, we compute the divergence of the vector field:
$$\operatorname{div} F = \nabla\cdot F = \frac{\partial(xy^2)}{\partial x} + \frac{\partial(yz)}{\partial y} + \frac{\partial(x^2z)}{\partial z} = y^2 + z + x^2.$$
We can thus try to define the value of the integral in the following way,
$$\int_M df = \int_a^b\Big(\frac{\partial f}{\partial x_1}(c(t))\,c_1'(t) + \cdots + \frac{\partial f}{\partial x_n}(c(t))\,c_n'(t)\Big)\,dt,$$
and we immediately verify that a change of the parametrization of the curve has no effect upon the value. Indeed, writing $c(t) = c(\psi(s))$, $a = \psi(\tilde a)$, $b = \psi(\tilde b)$, our procedure yields
$$\int_{\tilde a}^{\tilde b}\Big(\frac{\partial f}{\partial x_1}(c(\psi(s)))\,c_1'(\psi(s)) + \cdots + \frac{\partial f}{\partial x_n}(c(\psi(s)))\,c_n'(\psi(s))\Big)\,\psi'(s)\,ds,$$
and the theorem about coordinate transformations for univariate integrals gives just the same value if $\psi' > 0$, i.e., if we keep the orientation of the curve, and the same value up to sign if the derivative of the transformation is negative. If we extend the same definition to an arbitrary linear form $\eta = \eta_1\,dx_1 + \cdots + \eta_n\,dx_n$, we arrive at the same formula with $\eta_i$ replacing the derivatives $\frac{\partial f}{\partial x_i}$,
$$\int_M \eta = \int_a^b\big(\eta_1(c(t))\,c_1'(t) + \cdots + \eta_n(c(t))\,c_n'(t)\big)\,dt,$$
again independent of the parametrization of the curve $c$ as above. In the above example with $n=2$, $f$ was the altitude of the terrain, and the integral of $df$ along the path modelled the total gain of elevation.

9.B.14. Find the flux of the vector field given by the function $F = (xy^2,\ yz,\ x^2z)$ through the boundary of the cylinder $x^2+y^2\le 4$, $1\le z\le 3$.

Solution. First of all, we compute the divergence of the vector field:
$$\operatorname{div}F = \nabla\cdot F = \frac{\partial(xy^2)}{\partial x} + \frac{\partial(yz)}{\partial y} + \frac{\partial(x^2z)}{\partial z} = y^2 + z + x^2.$$
Therefore, the flux $T$ of the vector field is equal to
$$T = \iiint_V \big(y^2+z+x^2\big)\,dx\,dy\,dz = \int_1^3\!\int_0^{2\pi}\!\int_0^{2} \rho\,\big(\rho^2\sin^2\varphi + z + \rho^2\cos^2\varphi\big)\,d\rho\,d\varphi\,dz$$
$$= \int_1^3\!\int_0^{2\pi}\!\int_0^{2} \big(\rho^3 + \rho z\big)\,d\rho\,d\varphi\,dz = 2\pi\int_1^3\Big[\frac{\rho^4}{4}+\frac{\rho^2}{2}z\Big]_0^2\,dz = 2\pi\int_1^3 (4+2z)\,dz = 2\pi\big[4z+z^2\big]_1^3 = 2\pi\,(12+9-4-1) = 32\pi.\ \square$$

9.B.15. Find the flux of the vector field given by the function $F = (y,\ x,\ z^2)$ through the sphere $x^2+y^2+z^2=4$.
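Hand computations like the one in 9.B.14 are easy to cross-check symbolically. Below is a small sketch using SymPy (an assumption on our side; the text itself uses no software, and all variable names are ours) repeating the divergence-theorem computation in cylindrical coordinates:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
rho, phi = sp.symbols('rho phi', nonnegative=True)

# the field from 9.B.14 and its divergence
F = (x*y**2, y*z, x**2*z)
divF = sum(sp.diff(Fi, v) for Fi, v in zip(F, (x, y, z)))  # y**2 + z + x**2

# divergence theorem over the solid cylinder rho <= 2, 1 <= z <= 3;
# the volume element in cylindrical coordinates contributes the factor rho
integrand = divF.subs({x: rho*sp.cos(phi), y: rho*sp.sin(phi)}) * rho
flux = sp.integrate(integrand, (rho, 0, 2), (phi, 0, 2*sp.pi), (z, 1, 3))
```

The symbolic result agrees with the value $32\pi$ obtained by hand above.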
Thus, we should expect that the total gain along the path depends on the values $c(a)$ and $c(b)$ only, while different curves with the same boundary points would produce different integrals of $\eta$ for a general 1-form $\eta$. This will indeed be a special claim of the Stokes theorem below. Before we treat the higher dimensional analogues, we shall look at a more abstract approach to suitable subsets in $\mathbb R^n$ and the role of coordinates on them.

9.1.6. Manifolds. The straightforward generalizations of parameterized curves $c(t): \mathbb R\to\mathbb R^n$ are the differentiable mappings $\varphi: V\subset\mathbb R^k\to\mathbb R^n$, $k\le n$, with injective differential $d\varphi(u)$ at every point of its open domain $V$. Such mappings are called immersions. With the curves, we did not care about their self-intersections etc. Now, for technical reasons, we shall be more demanding.

Manifolds in $\mathbb R^n$
A subset $M\subset\mathbb R^n$ is called a manifold of dimension $k$ if every point $x\in M$ has a neighborhood $U\subset\mathbb R^n$ which is the image of a diffeomorphism $\tilde\varphi: V\times\tilde V\to U$, $V\subset\mathbb R^k$, $\tilde V\subset\mathbb R^{n-k}$, such that
• the restriction $\varphi = \tilde\varphi|_V: V\to M$ is an immersion,
• $\tilde\varphi^{-1}(M) = V\times\{0\}\subset\mathbb R^n$.
The manifolds $M$ carry the topology inherited from $\mathbb R^n$.

Solution. The divergence of the given vector field is
$$\operatorname{div}F = \nabla\cdot F = \frac{\partial y}{\partial x} + \frac{\partial x}{\partial y} + \frac{\partial z^2}{\partial z} = 2z.$$
Thus, the wanted flux equals
$$\iiint_V 2z\,dx\,dy\,dz = \int_0^{2}\!\int_0^{2\pi}\!\int_0^{\pi} \rho^2\sin\theta\cdot 2\rho\cos\theta\,d\theta\,d\varphi\,d\rho = 2\int_0^2\rho^3\,d\rho\int_0^{2\pi}d\varphi\int_0^{\pi}\sin\theta\cos\theta\,d\theta = 2\,\Big[\frac{\rho^4}{4}\Big]_0^2\cdot\big[\varphi\big]_0^{2\pi}\cdot\Big[\frac{\sin^2\theta}{2}\Big]_0^{\pi} = 2\cdot 4\cdot 2\pi\cdot 0 = 0.\ \square$$

C. Equation of heat conduction

9.C.1. Find the solution to the so-called equation of heat conduction (equation of diffusion)
$$u_t(x,t) = a^2\,u_{xx}(x,t),\qquad x\in\mathbb R,\ t>0,$$
satisfying the initial condition $\lim_{t\to 0^+} u(x,t) = f(x)$.
Notes: The symbol $u_t = \frac{\partial u}{\partial t}$ stands for the partial derivative of the function $u$ with respect to $t$ (i.e., differentiating with respect to $t$ and considering $x$ to be constant), and similarly, $u_{xx} = \frac{\partial^2 u}{\partial x^2}$ denotes the second partial derivative with respect to $x$ (i.
e., twice differentiating with respect to $x$ while considering $t$ to be constant). The physical interpretation of this problem is as follows: we are trying to determine the temperature $u(x,t)$ in a thermally isolated and homogeneous bar of infinite length (the range of the variable $x$) if the initial temperature of the bar is given as the function $f$. The cross-section of the bar is constant and the heat can spread in it by conduction only. The coefficient $a^2$ then equals the quotient $\frac{\alpha}{c\varrho}$, where $\alpha$ is the coefficient of thermal conductivity, $c$ is the specific heat and $\varrho$ is the density. In particular, we assume that $a^2>0$.

Solution. We apply the Fourier transform to the equation, with respect to the variable $x$. We have
$$\mathcal F(u_t)(\omega,t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} u_t(x,t)\,e^{-i\omega x}\,dx,$$
where we differentiate with respect to $t$ under the integral sign, i.e., $\mathcal F(u_t)(\omega,t) = \big(\mathcal F(u)(\omega,t)\big)' = \big(\mathcal F(u)\big)_t(\omega,t)$. At the same time, we know that
$$\mathcal F\big(a^2 u_{xx}\big)(\omega,t) = a^2\,\mathcal F(u_{xx})(\omega,t) = -a^2\omega^2\,\mathcal F(u)(\omega,t).$$
Denoting $y(\omega,t) = \mathcal F(u)(\omega,t)$, we get to the equation
$$y_t = -a^2\omega^2\,y.$$
We already solved a similar differential equation when we were calculating Fourier transforms, so it is now easy for us to determine all of its solutions,
$$y(\omega,t) = K(\omega)\,e^{-a^2\omega^2 t},\qquad K(\omega)\in\mathbb R.$$
It remains to determine $K(\omega)$. The transformation of the initial condition gives
$$\mathcal F(f)(\omega) = \lim_{t\to 0^+}\mathcal F(u)(\omega,t) = \lim_{t\to 0^+} y(\omega,t) = K(\omega)\,e^0 = K(\omega),$$
hence
$$y(\omega,t) = \mathcal F(f)(\omega)\,e^{-a^2\omega^2 t}.$$
Now, using the inverse Fourier transform, we can return to the original differential equation with solution

This definition is illustrated by the picture above. Manifolds can be typically (at least locally) given implicitly as the level sets of differentiable mappings, see paragraph 8.1.23 and the discussion in 8.1.25. The mapping $\varphi$ from the definition is called the local parametrization or local map of the manifold $M$. The manifolds are a straightforward generalization of curves and surfaces in the plane $\mathbb R^2$ or the space $\mathbb R^3$.
We have excluded curves and surfaces which are self-intersecting, and even those which are self-approaching. For instance, we can surely imagine a curve representing the figure 8, parametrized with a mapping $\varphi$ with everywhere-injective differential. However, we will be unable to satisfy the second property from the manifold definition in a neighborhood of the point where the two branches of the curve meet.

Tangent and cotangent bundles of manifolds
The tangent bundle $TM$ of the manifold $M$ is the collection of vector subspaces $T_xM\subset T_x\mathbb R^n$ which contain all vectors tangent to the curves in $M$. There is the footpoint projection $p: TM\to M$. Similarly, the cotangent bundle $T^*M$ of the manifold $M$ is the collection of the dual spaces $(T_xM)^*$, together with the footpoint projection.

Clearly, every parametrization $\varphi$ defines a diffeomorphism $\varphi_*: TV\to T(\varphi(V))\subset TM$, $\varphi_*(c'(t)) = \frac{d}{dt}\varphi(c(t))$. Due to the chain rule, this definition does not depend on the choice of the representing curve $c(t)$. We shall also write $T\varphi$ for the mapping $\varphi_*$. In particular, the local maps $\varphi$ (extended to $\tilde\varphi$, as in the above definition) induce the local maps $\varphi_*: TV = V\times\mathbb R^k\to TM\subset\mathbb R^n\times\mathbb R^n$ of the tangent bundle. Thus, the tangent bundle $TM$ is again a manifold, which locally looks as $U\times\mathbb R^k$ over sufficiently small open subsets $U\subset M$. But we shall see that $TM$ might be quite different from $M\times\mathbb R^k$ globally. Dealing with the cotangent bundle, we can use the dual mappings $(T\varphi^{-1})^*$ on the individual fibers $T_x^*M$ to obtain local parametrizations.

$$u(x,t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathcal F(f)(\omega)\,e^{-a^2\omega^2 t}\,e^{i\omega x}\,d\omega.$$
Computing the Fourier transform $\mathcal F(f)$ of the function $f(t) = e^{-ct^2}$ for $c>0$, we have obtained (while relabeling the variables)
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-c\tau^2}\,e^{-ip\tau}\,d\tau = \frac{1}{\sqrt{2c}}\,e^{-\frac{p^2}{4c}},\qquad c>0.$$
According to this formula (consider $c = a^2t > 0$, $p = s-x$), we have
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-a^2\omega^2 t}\,e^{-i\omega(s-x)}\,d\omega = \frac{1}{\sqrt{2a^2t}}\,e^{-\frac{(s-x)^2}{4a^2t}}.$$
Therefore,
$$u(x,t) = \frac{1}{2a\sqrt{\pi t}}\int_{-\infty}^{\infty} f(s)\,e^{-\frac{(s-x)^2}{4a^2t}}\,ds.\ \square$$

D. Variational Calculus
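The closed-form solution of 9.C.1 above can be verified symbolically: the kernel should satisfy the heat equation and integrate to one, so that $u(x,t)\to f(x)$ as $t\to 0^+$. A sketch with SymPy (the naming is ours, not part of the text):

```python
import sympy as sp

x, s = sp.symbols('x s', real=True)
a, t = sp.symbols('a t', positive=True)

# the heat kernel from the solution of 9.C.1
K = 1/(2*a*sp.sqrt(sp.pi*t)) * sp.exp(-(s - x)**2/(4*a**2*t))

# K solves u_t = a^2 u_xx ...
residual = sp.simplify(sp.diff(K, t) - a**2 * sp.diff(K, x, 2))

# ... and has total mass 1 for every t > 0
mass = sp.simplify(sp.integrate(K, (s, -sp.oo, sp.oo)))
```

Both checks are exact symbolic identities, valid for all $a, t > 0$.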
E. Complex analytic functions

9.E.1. Check that the mapping $z\mapsto z^n$, $n\in\mathbb N$, defined on the entire $\mathbb C$, has got the complex derivative $z\mapsto nz^{n-1}$.

Solution. We can proceed in two ways. Either check the definition of the complex derivative directly, cf. ??, or use the detour via the explicit expression of the mapping $f(x+iy) = (x+iy)^n$ in two real coordinates $x$, $y$ and see that its derivative is complex. The first, very simple, possibility repeats the computation with polynomials in one real variable and is shown in 9.4.1. $\square$

Notice that two differentiable immersions $\varphi$ and $\psi$ parametrizing the same open subset $U\subset M$ provide the composition $\psi^{-1}\circ\varphi$. We view this as a coordinate change for $U$, and we have just seen that coordinate changes on $M$ induce coordinate changes on $TM$. Further, if $M$ and $N$ are two manifolds and $F: M\to N$ a mapping, we say that $F$ is differentiable (up to order $r$, or smooth, or analytic) if the compositions $\psi^{-1}\circ F\circ\varphi$ with local parametrizations $\varphi$ of $M$ and $\psi$ of $N$ (of the same order of differentiability as we want to check) are differentiable (up to order $r$, or smooth, or analytic). Again, the chain rule property of differentiation shows that this definition does not depend on the particular choice of the parametrizations. Each differentiable mapping $F: M\to N$ defines the tangent mapping $TF: TM\to TN$ between the tangent spaces, which clearly is differentiable of order one less than the assumed differentiability of $F$.

Vector fields and differential forms on manifolds
Smooth vector fields $X$ on a manifold $M$ are smooth sections $X: M\to TM$ of the footpoint projection $p: TM\to M$, i.e., $p\circ X = \mathrm{id}_M$. Smooth $k$-forms $\eta$ on a manifold $M$ are sections $M\to\Lambda^k(TM)^*$ such that the pullback of this form by any parametrization $V\to M$ yields a smooth exterior $k$-form on $V$. We write $\mathcal X(M)$ for the space of smooth vector fields on $M$, while $\Omega^k(M)$ stands for the space of all smooth exterior $k$-forms on $M$.
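The second route suggested in the solution of 9.E.1 — expanding in the two real coordinates — can be carried out symbolically. A sketch for the sample choice $n=5$ (nothing in the text fixes this value):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
n = 5
w = sp.expand((x + sp.I*y)**n)
u, v = sp.re(w), sp.im(w)       # real and imaginary parts of (x+iy)^n

# Cauchy-Riemann equations: u_x = v_y and u_y = -v_x
cr1 = sp.expand(sp.diff(u, x) - sp.diff(v, y))
cr2 = sp.expand(sp.diff(u, y) + sp.diff(v, x))

# the complex derivative u_x + i v_x agrees with n z^(n-1)
deriv_diff = sp.expand(sp.diff(u, x) + sp.I*sp.diff(v, x)
                       - n*(x + sp.I*y)**(n - 1))
```

All three expressions expand to zero, confirming that the real derivative of the mapping is complex and equals $nz^{n-1}$.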
Notice that all our coordinate formulae for the vector fields, forms, pullbacks etc. on $\mathbb R^m$ hold true in the more abstract setting of manifolds and their local parametrizations.¹

9.1.7. Integration of exterior forms on manifolds. Now, we are almost ready for the definition of the integral of $k$-forms on $k$-dimensional manifolds. For the sake of simplicity, we will examine smooth forms $\omega$ with compact support only. First, let us assume that we are given a $k$-dimensional manifold $M\subset\mathbb R^n$ and one of its local parametrizations $\varphi: V\subset\mathbb R^k\to U\subset M\subset\mathbb R^n$. We consider the standard orientation on $\mathbb R^k$ given by the standard basis (cf. 4.1.22 for the definition of the orientation of a vector space). The choice of the parametrization $\varphi$ also fixes the orientation of the manifold $U\subset M$. This orientation will be the same for those choices of local parametrizations which differ by diffeomorphisms with positive determinants of their Jacobian matrices. The orientation will be the other one in the case of negative determinants. The manifold $M$ is called orientable if there is a covering of the entire set $M$ by local parametrizations

¹ Actually, instead of dealing with manifolds as subsets of $\mathbb R^n$, we might use the same concept of local parametrizations of a space $M$ with differentiable transition functions $\psi^{-1}\circ\varphi$. We just need to know what the "open subsets" in $M$ are, thus we could start at the level of topological spaces. On the other hand, there is the general result (the so-called Whitney embedding theorem) that each such abstract $n$-dimensional manifold can be realized as embedded in $\mathbb R^{2n}$, so we essentially do not lose any generality here.

$V\subset\mathbb R^k$, we can easily compute the result, following the same definition. Let us denote $\varphi^*(\omega)(u) = f(u)\,du_1\wedge\cdots\wedge du_k$. Invoking the relation 9.1.2(2) for the pullback of a form by a composite mapping, we get the same value for all compatible choices of the parametrization.

Recall the smooth function $f_{\varepsilon,r}$ constructed earlier, with $0\le f_{\varepsilon,r}(t)\le 1$ everywhere. At the same time, we had $f_{\varepsilon,r}(t)\ne 0$ if and only if $|t| < r+\varepsilon$. Next, if we define $\chi_{r,\varepsilon,x_0}(x) = f_{\varepsilon,r}(|x-x_0|)$, then we get a smooth function which takes the value 1 inside the ball $B_{r-\varepsilon}(x_0)$, with support exactly $\overline{B_{r+\varepsilon}(x_0)}$, and with values between 0 and 1 everywhere.

Lemma (Whitney's theorem). Every closed set $K\subset\mathbb R^n$ is the set of all zero points of some smooth non-negative function.

Proof. The idea of the proof is quite simple. If $K=\mathbb R^n$, the zero function fulfills the conditions, so we can further assume that $K\ne\mathbb R^n$. The open set $U = \mathbb R^n\setminus K$ can be expressed as the union of (at most) countably many open balls $B_{r_i}(x_i)$, and for each of them, we choose a smooth non-negative function $f_i$ on $\mathbb R^n$ whose support is just $\overline{B_{r_i}(x_i)}$, see the function $\chi_{r,\varepsilon,x_0}$ above. Now, we add up all these functions into an infinite series
$$f(x) = \sum_{k=1}^{\infty} a_k f_k(x),$$
where the positive coefficients $a_k$ are selected so small that this series converges to a smooth function $f(x)$. To this purpose, it suffices to choose $a_k$ so that all partial derivatives of all functions $a_k f_k(x)$ up to order $k$ (inclusive) are bounded from above by $2^{-k}$. Then, the series $\sum_k a_k f_k$ is bounded from above by the series $\sum_k 2^{-k}$, hence by the Weierstrass criterion, it converges uniformly on the entire $\mathbb R^n$.

² This property is called paracompactness and, actually, each metric space is paracompact. Thus in particular all our manifolds enjoy this property too. But we do not want to go into details of the proof.
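The cutoff functions $\chi_{r,\varepsilon,x_0}$ entering this construction can be written down concretely. A minimal numerical sketch, assuming the standard $e^{-1/t}$ construction (the text's own definition of $f_{\varepsilon,r}$ is not reproduced in this extract):

```python
import math

def h(t):
    # smooth on all of R, identically zero for t <= 0
    return math.exp(-1.0/t) if t > 0 else 0.0

def smooth_step(t):
    # smooth, equal to 0 for t <= 0 and to 1 for t >= 1
    return h(t) / (h(t) + h(1.0 - t))

def chi(pt, x0, r, eps):
    # value 1 on B_{r-eps}(x0), support in the closed ball B_{r+eps}(x0),
    # values in [0, 1] everywhere -- a smooth "characteristic function"
    d = math.dist(pt, x0)
    return smooth_step((r + eps - d) / (2.0*eps))
```

Since the denominator $h(t)+h(1-t)$ is always positive, `smooth_step` (and hence `chi`) is smooth, which is exactly the property used when building partitions of unity.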
Moreover, we get the same for all series of partial derivatives, since we can always write, for a derivative of order $r$,
$$\frac{\partial^r f}{\partial x_{i_1}\cdots\partial x_{i_r}} = \sum_{k=1}^{r-1} a_k\,\frac{\partial^r f_k}{\partial x_{i_1}\cdots\partial x_{i_r}} + \sum_{k=r}^{\infty} a_k\,\frac{\partial^r f_k}{\partial x_{i_1}\cdots\partial x_{i_r}},$$
where the first part is a smooth function as it is a finite sum of smooth functions, and the second part can again be bounded from above by an absolutely converging series of numbers, so this expression converges uniformly to $\frac{\partial^r f}{\partial x_{i_1}\cdots\partial x_{i_r}}$. It is apparent from the definition that the function $f(x)$ satisfies the conditions of the lemma. $\square$

Partition of unity on a manifold
Consider a manifold $M\subset\mathbb R^n$ equipped with a locally finite cover by open images $U_i$ of parametrizations $\varphi_i$ such that the closures of all images $\varphi_i(V_i)$ are compact, and choose a partition of unity $f_i$ subordinated to this cover. The integral is defined by the formula
$$\int_M\omega = \int_M\sum_i f_i\,\omega = \sum_i\int_{U_i} f_i\,\omega,$$
where the right-hand integrals have already been defined, since each of the forms $f_i\omega$ has support inside the image under the parametrization $\varphi_i$ (and they equal $\int_M f_i\omega$ for the same reason). Actually, we can assume that our sum is finite, since it suffices to consider integrals over the images of parametrizations covering the compact support of $\omega$. Hence, it is a well-defined number, yet it remains to verify that the resulting value is independent of all our choices. To this purpose, let us choose another system of parametrizations $\psi_j: \tilde V_j\to\tilde U_j$, again with compatible orientations, providing a locally finite cover of $M$. Let $g_j$ be the corresponding partition of unity. Then the sets $W_{ij} = U_i\cap\tilde U_j$ form again a locally finite covering, and the set of functions $f_ig_j$ provides the partition of unity subordinated to this covering.
We arrive at the following equalities:
$$\sum_i\int_M f_i\,\omega = \sum_i\int_M f_i\Big(\sum_j g_j\Big)\,\omega = \sum_{i,j}\int_M f_ig_j\,\omega,$$
$$\sum_j\int_M g_j\,\omega = \sum_j\int_M g_j\Big(\sum_i f_i\Big)\,\omega = \sum_{i,j}\int_M f_ig_j\,\omega,$$
where the potentially infinite sums inside of the integrals are all locally finite, while the sums outside of the integrals can be viewed as finite due to the compactness of the support of $\omega$. Thus, we have checked that the choices of the partition of unity and the parametrizations do not influence the value of the integral.

9.1.10. Exterior differential. As we have seen, the differential of a function can be interpreted as a mapping $d: \Omega^0(\mathbb R^n)\to\Omega^1(\mathbb R^n)$. By means of parametrizations, this definition extends (in a coordinate-free way) to functions $f$ on manifolds $M$, where the differential $df$ is a linear form on $M$. The following theorem extends this differential to arbitrary exterior forms on manifolds $M\subset\mathbb R^n$.

Exterior differential
Theorem. For all $m$-dimensional manifolds $M\subset\mathbb R^n$ and $k = 0,\dots,m$, there is the unique mapping $d:\Omega^k(M)\to\Omega^{k+1}(M)$ such that
(1) $d$ is linear with respect to multiplication by real numbers;
(2) for $k=0$, this is the differential of functions;
(3) if $\alpha\in\Omega^k(M)$, $\beta$ arbitrary, then $d(\alpha\wedge\beta) = (d\alpha)\wedge\beta + (-1)^k\,\alpha\wedge(d\beta)$;
(4) $d(df) = 0$ for every function $f$ on $M$.
The mapping $d$ is called the exterior differential. The equality $d\circ d = 0$ is valid for all degrees $k$.

Proof. Each $k$-form can be written locally in the form
$$\alpha = \sum_{i_1<\cdots<i_k} a_{i_1\cdots i_k}\,dx_{i_1}\wedge\cdots\wedge dx_{i_k}.$$
The conditions of the theorem then force the coordinate formula
$$d\alpha = \sum_{i_1<\cdots<i_k}\sum_i\frac{\partial a_{i_1\cdots i_k}}{\partial x_i}\,dx_i\wedge dx_{i_1}\wedge\cdots\wedge dx_{i_k},$$
and a direct check shows that $d(d\alpha)$ collects the terms
$$\sum_{i<j}\Big(\frac{\partial^2 a_{i_1\cdots i_k}}{\partial x_i\partial x_j} - \frac{\partial^2 a_{i_1\cdots i_k}}{\partial x_j\partial x_i}\Big)\,dx_j\wedge dx_i\wedge dx_{i_1}\wedge\cdots\wedge dx_{i_k},$$
which all vanish due to the symmetry of the second partial derivatives. It remains to verify that the coordinate formula does not depend on the choice of the coordinates. Consider a change of coordinates $x = G(y) = (g_1(y),\dots,g_m(y))$, and compute $G^*(d\alpha)$ of an exterior form $\alpha = a\,dx_{i_1}\wedge\cdots\wedge dx_{i_k}$ (which gives the result for sums of such expressions, too). This is straightforward:
$$G^*(d\alpha) = \sum_j\Big(\frac{\partial a}{\partial x_j}\circ G\Big)\,G^*(dx_j)\wedge G^*(dx_{i_1})\wedge\cdots\wedge G^*(dx_{i_k}) = \sum_j\Big(\frac{\partial a}{\partial x_j}\circ G\Big)\Big(\frac{\partial g_j}{\partial y_1}\,dy_1+\cdots\Big)\wedge\Big(\frac{\partial g_{i_1}}{\partial y_1}\,dy_1+\cdots\Big)\wedge\cdots.$$
Now, notice $d(G^*(dx_j)) = G^*(d(dx_j)) = 0$, and thus
$$d(G^*(\alpha)) = d\big((a\circ G)\,G^*(dx_{i_1})\wedge\cdots\big) = d(a\circ G)\wedge G^*(dx_{i_1})\wedge\cdots\wedge G^*(dx_{i_k}) = \sum_j\Big(\frac{\partial a}{\partial x_j}\circ G\Big)\Big(\frac{\partial g_j}{\partial y_1}\,dy_1+\cdots\Big)\wedge G^*(dx_{i_1})\wedge\cdots,$$
clearly the same expressions. $\square$

9.1.11.
Manifolds with boundary. In practical problems, we often work with manifolds $M$ like an open ball in the three-dimensional space. At the same time, we are interested in the boundaries $\partial M$ of these manifolds, which is a sphere in the case of a ball. The simplest case is the one of connected curves. Either it is a closed curve (like a circle in the plane), and then its boundary is empty, or the boundary is formed by two points. These points will be considered including the orientation inherited from the curve, i.e. the initial point will be taken with the minus sign, and the terminal point with the plus sign. The curve integral is the easiest one, and we can notice that, integrating the differential $df$ of a function along the curve $M$ defined as the image of a parametrization $c: [a,b]\to M$, we get directly from the definition
$$\int_M df = \int_{[a,b]} c^*(df) = \int_a^b\frac{d}{dt}(f\circ c)(t)\,dt = f(c(b)) - f(c(a)).$$
Therefore, the result is not only independent of the selected parametrization, but also of the actual curve. Only the initial and terminal points matter. Splitting the curve into several consecutive disjoint intervals, the integral splits into the sum of differences of the values at the splitting points. This sum is telescoping (i.e., the middle terms cancel out), resulting in the same value again.

³ Such operators are intrinsically defined on all manifolds. Actually, for all $k>0$, the only operation $d:\Omega^k\to\Omega^{k+1}$ commuting with pullbacks and with values depending only on the behavior of the argument $\alpha$ on any small neighborhood of $x$ (locality of the operator) is the exterior derivative. Thus even the linearity, as well as the dependence on the first derivatives, are direct consequences of naturality. See the book Natural Operations in Differential Geometry, Springer, 1993, by I. Kolář, P. W. Michor and J. Slovák for the full proof of this astonishing claim.
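The two facts just discussed — independence of the parametrization and dependence on the endpoints only — can be illustrated numerically. A sketch (the function, the curve, and the reparametrization are all our arbitrary choices):

```python
import math

def line_integral_df(grad_f, c, dc, a, b, n=20000):
    # midpoint rule for the integral of grad f(c(t)) . c'(t) over [a, b]
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        g = grad_f(c(t))
        v = dc(t)
        total += sum(gi * vi for gi, vi in zip(g, v)) * h
    return total

# sample "altitude" f(x, y) = x^2 + sin(y) and its gradient
f = lambda p: p[0]**2 + math.sin(p[1])
grad_f = lambda p: (2*p[0], math.cos(p[1]))

# a curve on [0, 1] and the orientation-preserving reparametrization t = s^2
c   = lambda t: (math.cos(t), t**2)
dc  = lambda t: (-math.sin(t), 2*t)
c2  = lambda s: c(s*s)
dc2 = lambda s: tuple(x * 2*s for x in dc(s*s))

I1 = line_integral_df(grad_f, c, dc, 0.0, 1.0)
I2 = line_integral_df(grad_f, c2, dc2, 0.0, 1.0)
exact = f(c(1.0)) - f(c(0.0))   # value predicted by the formula above
```

Both numerical integrals agree with $f(c(b))-f(c(a))$ to within the quadrature error.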
Notice, we have already proved the behavior expected in 9.1.5 when dealing with the elevation gain by a cyclist. We shall now discuss this phenomenon in general dimensions. To be able to do this, we need to formalize the concept of the boundary of a manifold and its orientation. The simplest case is the closed half-space $M = (-\infty,0]\times\mathbb R^{n-1}$. Its boundary is
$$\partial M = \{(x_1,x_2,\dots,x_n)\in\mathbb R^n;\ x_1 = 0\}.$$
The orientation on this boundary inherited from the standard orientation is the one determined by the form $dx_2\wedge\cdots\wedge dx_n$.

Oriented boundary of a manifold
Let us consider a closed subset $M\subset\mathbb R^n$ such that its interior $\mathring M\subset M$ is an oriented $m$-dimensional manifold covered by compatible parametrizations $\varphi_i$. Further, let us assume that for every boundary point $x\in\partial M = M\setminus\mathring M$, there is a neighborhood in $M$ with parametrization $\varphi: V\subset(-\infty,0]\times\mathbb R^{m-1}\to M$ such that the points $x\in\partial M\cap\varphi(V)$ form just the image of the boundary of the half-space $(-\infty,0]\times\mathbb R^{m-1}$. The subset $M\subset\mathbb R^n$ covered by the above parametrizations with compatible orientations is called an oriented manifold with boundary. The restrictions of the parametrizations including boundary points to the boundary $\partial M$ define the structure of an $(m-1)$-dimensional oriented manifold on $\partial M$.

Think of closed balls $B(x,r)\subset\mathbb R^n$ as such manifolds. Their interiors are $n$-dimensional manifolds, just open subsets in $\mathbb R^n$, but their boundaries $S^{n-1}$ are the spheres with the inherited structure of $(n-1)$-dimensional manifolds. The inherited orientations are well understood via the outward normals to the spheres. Another example is a plane disc sitting as a 2-dimensional manifold in $\mathbb R^3$, with its 1-dimensional boundary being a circle. Here the chosen position of the normal to the plane defines the orientation of the circle, one way or the other. In practice, we often deal with slightly more general manifolds, where we allow for corners in the boundary of all smaller dimensions.
A good example is the cube in $\mathbb R^3$, having the sides as 2-dimensional parts of the boundary, the edges between them as 1-dimensional parts, and the vertices as 0-dimensional parts of the boundary. Yet another class of examples is formed by all simplexes and their curved embeddings in $\mathbb R^n$. Since those lower dimensional parts of the boundary have Riemann measure zero, we can neglect them when integrating over $\partial M$. Thus we shall not go into details of this technical extension of our definitions.

9.1.12. Stokes' theorem. Now, we get to a very important and useful result. We shall formulate the main theorem about the multidimensional analogy of curve integrals for smooth forms and smooth manifolds. A brief analysis of the proof shows that, actually, we need once continuously differentiable exterior forms as integrands on twice continuously differentiable parametrizations of the manifold. In practice, the boundary of the region is often similar as in the case of the unit cube in $\mathbb R^3$, i.e., we have discontinuities of the derivatives on a Riemann measurable set with measure zero in the boundary. In such a case, we divide the integration into smooth parts and add the results up. We can notice that although new pieces of boundaries appear, they are adjacent and have opposite orientations in the adjacent regions, so their contributions cancel out (just like in the above case of boundary points of a piecewise differentiable curve).

Stokes' theorem
Theorem. Consider a smooth exterior $(k-1)$-form $\omega$ with compact support on an oriented $k$-dimensional manifold $M$ with boundary $\partial M$ with the inherited orientation. Then we have
$$\int_M d\omega = \int_{\partial M}\omega.$$

Proof.
Using an appropriate locally finite cover of the manifold $M$ and a partition of unity subordinated to it, we can express the integrals on both sides as the sum (even a finite sum, since the support of the considered form $\omega$ is compact) of integrals of forms supported in individual parametrizations. Thus we can restrict ourselves to just two cases, $M = \mathbb R^k$ or the half-space $M = (-\infty,0]\times\mathbb R^{k-1}$. In both cases, $\omega$ will surely be the sum of forms
$$\omega_j = a_j(x)\,dx_1\wedge\cdots\wedge\widehat{dx_j}\wedge\cdots\wedge dx_k,$$
where the hat indicates the omission of the corresponding linear form, and $a_j(x)$ is a smooth function with compact support. Their exterior differentials are
$$d\omega_j = (-1)^{j-1}\,\frac{\partial a_j}{\partial x_j}\,dx_1\wedge\cdots\wedge dx_k.$$
Again, we can verify the claim of the theorem for such forms $\omega_j$ separately. Let us compute the integrals $\int_M d\omega_j$ using the Fubini's theorem. This is most simple if $M = \mathbb R^k$,
$$\int_{\mathbb R^k} d\omega_j = (-1)^{j-1}\int_{\mathbb R^{k-1}}\Big(\int_{-\infty}^{\infty}\frac{\partial a_j}{\partial x_j}\,dx_j\Big)\,dx_1\cdots\widehat{dx_j}\cdots dx_k = (-1)^{j-1}\int_{\mathbb R^{k-1}}\big[a_j\big]_{x_j=-\infty}^{\infty}\,dx_1\cdots\widehat{dx_j}\cdots dx_k = 0.$$
Notice, we are allowed to use the Fubini's theorem for the entire $\mathbb R^k$, since the support of the integrated function is in fact compact and thus we can replace the integration domain by a large multidimensional interval $I$. At the same time, the forms $\omega_j$ are all zero outside of such a large interval $I$, and thus the integrals $\int_{\partial M}\omega_j$ all vanish and the claim of the Stokes' theorem is verified in this case. Actually, we may also say that $\partial M = \emptyset$ and thus the integral is zero.

Next, let us assume $M$ is the half-space $(-\infty,0]\times\mathbb R^{k-1}$. If $j>1$, the form $\omega_j$ evaluates identically to zero on the boundary $\partial M$, since $x_1$ is constant there and thus $dx_1$ is identically zero on all tangent directions to $\partial M$. Integration over the interior $\mathring M$ yields zero, using the same approach as above:
$$\int_M d\omega_j = (-1)^{j-1}\int_{(-\infty,0]\times\mathbb R^{k-2}}\Big(\int_{-\infty}^{\infty}\frac{\partial a_j}{\partial x_j}\,dx_j\Big)\,dx_1\cdots\widehat{dx_j}\cdots dx_k = 0,$$
since the function $a_j$ has compact support.
So the theorem is also true in this case. However, if $j=1$, then we obtain
$$\int_M d\omega_1 = \int_{\mathbb R^{k-1}}\Big(\int_{-\infty}^{0}\frac{\partial a_1}{\partial x_1}\,dx_1\Big)\,dx_2\cdots dx_k = \int_{\mathbb R^{k-1}} a_1(0,x_2,\dots,x_k)\,dx_2\cdots dx_k = \int_{\partial M}\omega_1.$$
This finishes the proof of Stokes' theorem. $\square$

9.1.13. Green's theorem. We have proved an extraordinarily strong result which covers several standard integral relations from the classical vector analysis. For instance, we can notice that by Stokes' theorem, the integral of the exterior differential $d\omega$ of any $k$-form over a compact manifold without boundary is always zero (for example, the integral of any 2-form $d\omega$ over the sphere $S^2\subset\mathbb R^3$ vanishes). Let us look step by step at the cases of Stokes' theorem with $k$-dimensional boundaries $\partial M$ in $\mathbb R^n$ in low dimensions.

Green's theorem
In the case $n=2$, $k=1$, we are examining a domain $M$ in the plane, bounded by a closed curve $C = \partial M$. Differential 1-forms are $\omega(x,y) = f(x,y)\,dx + g(x,y)\,dy$, with the differential $d\omega = \big(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\big)\,dx\wedge dy$. Therefore, Stokes' theorem yields the formula
$$\oint_C f(x,y)\,dx + g(x,y)\,dy = \int_M\Big(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\Big)\,dx\wedge dy,$$
which is one of the standard forms of the Green's theorem.

Using the standard scalar product on $\mathbb R^2$, we can identify the vector field $X$ with a linear form $\omega_X$ such that $\omega_X(Y) = \langle Y,X\rangle$. In the standard coordinates $(x,y)$, this just means that the field $X = f(x,y)\frac{\partial}{\partial x} + g(x,y)\frac{\partial}{\partial y}$ corresponds to the form $\omega = f(x,y)\,dx + g(x,y)\,dy$ given above. The integral of $\omega_X$ over a curve $C$ has the physical interpretation of the work done by movement along this curve in the force field $X$. Green's theorem then says, besides others, that if $\omega_X = dF$ for some function $F$, then the work done along a closed curve is always zero. Such fields are called potential fields, and the function $F$ is the potential of the field $X$. In other words, the work done when moving in potential fields does not depend on the path; it depends only on the initial and terminal points.
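Green's theorem is easy to test numerically. A sketch for the sample form $\omega = -y^3\,dx + x^3\,dy$ on the unit disc (our arbitrary choice; the exact common value of both sides is $3\pi/2$):

```python
import math

# boundary side: the circulation of -y^3 dx + x^3 dy over c(t) = (cos t, sin t)
n = 20000
h = 2.0*math.pi / n
lhs = 0.0
for i in range(n):
    t = (i + 0.5) * h
    cx, cy = math.cos(t), math.sin(t)
    dx, dy = -math.sin(t), math.cos(t)     # c'(t)
    lhs += (-(cy**3) * dx + cx**3 * dy) * h

# interior side: the integral of (g_x - f_y) = 3(x^2 + y^2), in polar coordinates
m = 5000
rhs = 0.0
for i in range(m):
    r = (i + 0.5) / m
    rhs += 3.0 * r**2 * r * (1.0/m) * 2.0*math.pi
```

Both midpoint-rule sums converge to $3\pi/2$, so the circulation along the boundary matches the integral of $\frac{\partial g}{\partial x}-\frac{\partial f}{\partial y}$ over the interior.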
With Green's theorem, we have verified once again that integrating the differential of a function along a curve depends solely on the initial and terminal points of the curve.

9.1.14. The divergence theorem. The next case deals with integrating over some open subset in $\mathbb R^3$, and it has got a lot of incarnations in practical use. We shall mention a few.

Gauss-Ostrogradsky's theorem
In the case $n=3$, $k=2$, we are examining a region $M\subset\mathbb R^3$, bounded by a surface $S$. All 2-forms are of the form $\omega = f(x,y,z)\,dy\wedge dz + g(x,y,z)\,dz\wedge dx + h(x,y,z)\,dx\wedge dy$, and we get $d\omega = \big(\frac{\partial f}{\partial x} + \frac{\partial g}{\partial y} + \frac{\partial h}{\partial z}\big)\,dx\wedge dy\wedge dz$. The Stokes' theorem says that
$$\int_S f\,dy\wedge dz + g\,dz\wedge dx + h\,dx\wedge dy = \int_M\Big(\frac{\partial f}{\partial x} + \frac{\partial g}{\partial y} + \frac{\partial h}{\partial z}\Big)\,dx\wedge dy\wedge dz.$$
This is the statement of the Gauss-Ostrogradsky theorem.

This theorem has a very illustrative physical interpretation, too. Every vector field $X = f(x,y,z)\frac{\partial}{\partial x} + g(x,y,z)\frac{\partial}{\partial y} + h(x,y,z)\frac{\partial}{\partial z}$ can be plugged into the first argument of the standard volume form $\omega_{\mathbb R^3} = dx\wedge dy\wedge dz$ on $\mathbb R^3$. Clearly, the result is a 2-form
$$\omega_X(x,y,z) = f(x,y,z)\,dy\wedge dz + g(x,y,z)\,dz\wedge dx + h(x,y,z)\,dx\wedge dy.$$
The latter 2-form infinitesimally describes the volume of the parallelepiped given by the flux caused by the field $X$ through a linearized piece of surface. If we consider the vector field to be the velocity of the flow of the particular points of the space, this infinitesimally describes the volume transported pointwise by the flow through the given surface $S$. Thus the left hand side is the total change of volume inside of $S$, caused by the flow of $X$. The integrand of the right-hand side of the integral is related to the so-called divergence of the vector field, which is the expression defined as
$$d(\omega_X) = (\operatorname{div}X)\,dx\wedge dy\wedge dz.$$
The Gauss-Ostrogradsky theorem says
$$\int_S i_X\omega_{\mathbb R^3} = \int_M (\operatorname{div}X)\,\omega_{\mathbb R^3},$$
i.e. the volume of the total flow through a surface is given as the integral of the divergence of the vector field over the interior.
In particular, if $\operatorname{div}X$ vanishes identically, then the total volume flow through the boundary surface of the region is zero as well. Such fields, with $\operatorname{div}X = 0$, are called divergence free or solenoidal vector fields. They correspond to dynamics without changes of volumes (e.g. modelling dynamics of incompressible liquids).

In order to reformulate the theorem completely in terms of functions, let us observe that the inherited volume form $\omega_S$ on $S$ is defined by the property $\nu^*\wedge\omega_S = \omega_{\mathbb R^3}$ at all points of $S$, where $\nu^*$ is the dual form to the oriented (outward) unit normal $\nu$ to $S$. All forms of degree 2 on $S$ are multiples of $\omega_S$ by functions. In particular, $i_X(\nu^*\wedge\omega_S) = (\nu\cdot X)\,\omega_S$ on $S$, i.e. we have to integrate the scalar product of the vector field $X$ with the unit normal vector with respect to the standard volume on $S$. Thus, we have proved the following result formulated in the classical vector analysis style. Actually, a simple check reveals that the above arguments work for all open submanifolds $M\subset\mathbb R^n$ with boundary hypersurface $S$ and vector fields $X$. The reader should easily verify this in detail.

Divergence theorem
Theorem. Let $X$ be a vector field on an $n$-dimensional manifold $M\subset\mathbb R^n$ with hypersurface boundary $S$. Then
$$\int_M \operatorname{div}X\,dx_1\cdots dx_n = \int_S X\cdot\nu\,dS,$$
where $\nu$ is the oriented (outward) unit normal to $S$ and $dS$ stands for the volume inherited from $\mathbb R^n$ on $S$.
Notice the 2-dimensional case coincides with the Green's theorem above.

9.1.15. The original Stokes theorem. If $\omega$ is any linear form, then the integral of $d\omega$ over a surface depends on the boundary curve only. This is the most classical Stokes' theorem:

The classical Stokes' theorem
In the case $n=3$, $k=1$, we deal with a surface $M$ in $\mathbb R^3$ bounded by a curve $C$. The general linear forms are $\omega = f\,dx + g\,dy + h\,dz$, with the integral
$$\oint_C f\,dx + g\,dy + h\,dz = \int_M d\omega,$$
where
$$d\omega = \Big(\frac{\partial h}{\partial y} - \frac{\partial g}{\partial z}\Big)\,dy\wedge dz + \Big(\frac{\partial f}{\partial z} - \frac{\partial h}{\partial x}\Big)\,dz\wedge dx + \Big(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\Big)\,dx\wedge dy.$$
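For the 2-form $d\omega$ written out above, one can check symbolically that $\omega = dF$ always yields $d\omega = 0$: all three coefficients are differences of mixed second partial derivatives of $F$. A sketch with SymPy (the potential $F$ is an arbitrary sample of ours):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
F = sp.exp(x*y)*sp.cos(z) + x**3*y      # an arbitrary sample potential

# X = grad F, i.e. ω_X = dF with components (f, g, h)
f, g, h = sp.diff(F, x), sp.diff(F, y), sp.diff(F, z)

# the three coefficients of dω_X, i.e. the components of rot X
rot = tuple(sp.simplify(c) for c in (
    sp.diff(h, y) - sp.diff(g, z),
    sp.diff(f, z) - sp.diff(h, x),
    sp.diff(g, x) - sp.diff(f, y),
))
```

All three components simplify to zero, illustrating that gradient fields are irrotational.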
Again, we use the standard scalar product to identify the vector field $X = f\frac{\partial}{\partial x} + g\frac{\partial}{\partial y} + h\frac{\partial}{\partial z}$ with the form $\omega_X = f\,dx + g\,dy + h\,dz$. Finally, reverting the above relation between the vector fields and two-forms on $\mathbb R^3$, the 2-form $d\omega_X$ can be identified with the vector field $\operatorname{rot}X$,
$$d\omega_X = \omega_{\mathbb R^3}(\operatorname{rot}X,\ \cdot\ ,\ \cdot\ ).$$
This field is called the rotation or curl of the vector field $X$. The Stokes' theorem now reads:
$$\oint_C\omega_X = \int_M\omega_{\mathbb R^3}(\operatorname{rot}X,\ \cdot\ ,\ \cdot\ ).$$
Consequently, the fields $X$ with the property $\omega_X = dF$ for some function $F$ (the fields of gradients of functions) have got the property $\operatorname{rot}X = 0$. They are called conservative (or potential) vector fields.

9.1.16. Another kind of integration. As we have seen, solutions to ODEs are flows of vector fields. As a modification, we can prescribe one-dimensional subspaces in the tangent spaces instead.

Consider a manifold $M$ and any vector fields $X$, $Y$ on $M$ valued in $D$. Since $D_y = T_yN$ for all $y\in i(N)$, $X$ and $Y$ are tangent to $i(N)\subset M$. We claim that the restriction of the Lie bracket $[X,Y]$ to $i(N)$ is the image $i_*([\tilde X,\tilde Y])$, where the vector fields $\tilde X$, $\tilde Y$ are viewed as the given fields on $N$, i.e., $i_*\tilde X(x) = X(i(x))$, $i_*\tilde Y(x) = Y(i(x))$. Thus, the bracket has to be in the image again. The latter claim is a consequence of a more general statement:

Claim. If $\varphi: N\to M$ is a smooth map and two couples of vector fields $\tilde X$, $\tilde Y$ on $N$ and $X$, $Y$ on $M$ satisfy $T\varphi\circ\tilde X = X\circ\varphi$, $T\varphi\circ\tilde Y = Y\circ\varphi$, then their Lie brackets satisfy the same relation:
$$T\varphi\circ[\tilde X,\tilde Y] = [X,Y]\circ\varphi.$$

⁴ Marius Sophus Lie (1842–1899) was an excellent Norwegian mathematician, the father of the Lie theory. Originally invented to deal with systems of partial differential equations via continuous groups of their symmetries, the theory of Lie groups and Lie algebras is nowadays in the core of a vast part of Mathematics. It is a pity we do not have time and space to devote more attention to this marvelous mathematical story in this textbook.
Indeed, consider a smooth function f on M and compute, using

(X(f ∘ p))(x) = ((Tp ∘ X)f)(x) = ((X̃ ∘ p)(x))f = (X̃f)(p(x)),

and similarly for Y:

[X, Y](f ∘ p)(x) = XY(f ∘ p)(x) − YX(f ∘ p)(x)
= X((Ỹf) ∘ p)(x) − Y((X̃f) ∘ p)(x)
= X̃(Ỹf)(p(x)) − Ỹ(X̃f)(p(x))
= (([X̃, Ỹ]f) ∘ p)(x),

which proves the claim. □

Back in the proof of the theorem, consider the projection p onto the submanifold Q forgetting the first coordinate. On the submanifold Q, there is the (n − 1)-dimensional involutive distribution D̃ generated by the fields Ỹ_i = Y_i|_Q, i = 2, …, n (notice we again use the argument from the beginning of the proof about the brackets of restricted fields). Now, our assumption says we find suitable coordinates (q₂, …, q_m) on Q around the point x ∈ Q, so that for all small constants b_{n+1}, …, b_m, the integral submanifolds of D̃ are defined by q_{n+1} = b_{n+1}, …, q_m = b_m.

Finally, we need to adjust the original coordinate functions y_i all over the neighborhood U of x. The obvious idea is to use the flow of X₁ = Y₁ to extend the latter coordinates on Q. Thus we define the coordinate functions in all y ∈ U using the projection p,

x₁(y) = y₁(y), x₂(y) = q₂(p(y)), …, x_m(y) = q_m(p(y)).

The hope is that all submanifolds N given by equations x_{n+1} = b_{n+1}, …, x_m = b_m (for small b_j) will be tangent to all fields Y₁, …, Y_n. Technically, this means Y_i(x_j) = 0 for all i = 1, …, n, j = n+1, …, m. By our definition, this is obvious for the restriction to Q, and obviously Y₁(x_j) = 0 in all other points, too. Let us look closely at what is happening with one of our functions Y_i(x_j) along the flow of the field X₁. We easily compute with the help of the definition of the Lie bracket

d/dx₁ (Y_i(x_j)) = Y₁(Y_i(x_j)) = Y_i(Y₁(x_j)) + [Y₁, Y_i](x_j)
= Y_i(Y₁(x_j)) + c_{1i1} Y₁(x_j) + Σ_{k=2}^{n} c_{1ik} Y_k(x_j)
= Σ_{k=2}^{n} c_{1ik} Y_k(x_j).

This is a system of linear ODEs for the unknown functions Y_i(x_j) in the one variable x₁ along the flow lines of Y₁. The initial condition at the point in Q is zero, and thus this constant zero value has to propagate along the flow lines, as requested. The induction step is complete. □

9.1.19. Formulation via exterior forms.
As we know from linear algebra, a vector subspace of codimension k is defined by k independent linear forms. Thus, every smooth n-dimensional distribution D ⊂ TM on an m-dimensional manifold M can be (at least locally) defined by m − n linear forms ω_j on M. A direct computation in coordinates reveals that the differential of a linear form ω evaluates on two vector fields as follows:

(1) dω(X, Y) = X(ω(Y)) − Y(ω(X)) − ω([X, Y]).

629 CHAPTER 9. CONTINUOUS MODELS - FURTHER SELECTED TOPICS

Indeed, if X = Σ_i X_i ∂/∂x_i, Y = Σ_i Y_i ∂/∂x_i, ω = Σ_i ω_i dx_i, then

X(ω(Y)) − Y(ω(X)) = Σ_{i,j} X_i ∂/∂x_i (ω_j Y_j) − Σ_{i,j} Y_i ∂/∂x_i (ω_j X_j) = dω(X, Y) + ω([X, Y]).

Thus, the involutivity of a distribution defined by linear forms ω_{n+1}, …, ω_m should be closely linked to properties of the differentials on the common kernel. Indeed, there is the following version of the latter theorem:

Frobenius' theorem

Theorem. The distribution D defined on an m-dimensional manifold M by (m − n) independent smooth linear forms ω_{n+1}, …, ω_m is integrable if and only if there are linear forms α_{iℓ} such that dω_i = Σ_ℓ α_{iℓ} ∧ ω_ℓ.

Proof. Let us write ω = (ω_{n+1}, …, ω_m) for the R^{m−n}-valued form. The distribution is D = ker ω. Now, the formula (1) (applied to all components of ω) implies that involutivity of D is equivalent to dω|_D = 0. If the assumption of the theorem on the forms holds true, dω_i clearly vanishes on the kernel of ω and therefore D is involutive, and one of the implications of the theorem is proved.

Next, assume D is integrable. By the stronger claim proved in the latter Frobenius theorem, for each point x ∈ M, there are coordinates (x₁, …, x_m) such that D is the common kernel of the differentials dx_{n+1}, …, dx_m. In particular, our forms ω_j are linear combinations (over functions) of the latter (m − n) differentials. Moreover, there must be smooth invertible matrices of functions A = (a_{kℓ}) such that dx_k = Σ_ℓ a_{kℓ} ω_ℓ, k, ℓ = n+1, …, m.
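The coordinate identity (1) is easy to test numerically. The following sketch (with fields and form chosen by us for illustration) approximates all derivatives by central differences on R², for ω = −y dx + x dy, X = ∂/∂x and Y = x ∂/∂y, and compares both sides of (1) at a sample point.

```python
# Numerical check of  d omega(X, Y) = X(omega(Y)) - Y(omega(X)) - omega([X, Y])
# on R^2, using central finite differences (step H).
H = 1e-5

def omega(p):   # components (omega_1, omega_2) of omega = -y dx + x dy
    x, y = p
    return (-y, x)

def X(p):       # the constant field d/dx
    return (1.0, 0.0)

def Y(p):       # the field x d/dy
    return (0.0, p[0])

def pair(w, v, p):   # evaluate the 1-form w on the vector field v at p
    wx, wy = w(p)
    vx, vy = v(p)
    return wx * vx + wy * vy

def deriv(f, v, p):  # directional derivative of the scalar f along v
    vx, vy = v(p)
    x, y = p
    fx = (f((x + H, y)) - f((x - H, y))) / (2 * H)
    fy = (f((x, y + H)) - f((x, y - H))) / (2 * H)
    return vx * fx + vy * fy

def bracket(p):      # [X, Y]^k = X(Y^k) - Y(X^k), componentwise
    return tuple(deriv(lambda q, k=k: Y(q)[k], X, p)
                 - deriv(lambda q, k=k: X(q)[k], Y, p) for k in (0, 1))

def d_omega_XY(p):   # d omega = (d omega_2/dx - d omega_1/dy) dx ^ dy
    x, y = p
    d2x = (omega((x + H, y))[1] - omega((x - H, y))[1]) / (2 * H)
    d1y = (omega((x, y + H))[0] - omega((x, y - H))[0]) / (2 * H)
    Xp, Yp = X(p), Y(p)
    return (d2x - d1y) * (Xp[0] * Yp[1] - Xp[1] * Yp[0])

p0 = (0.7, -1.3)
br = bracket(p0)
w0 = omega(p0)
lhs = d_omega_XY(p0)
rhs = (deriv(lambda q: pair(omega, Y, q), X, p0)
       - deriv(lambda q: pair(omega, X, q), Y, p0)
       - (w0[0] * br[0] + w0[1] * br[1]))
print(abs(lhs - rhs))   # close to 0
```

Since all the data here are polynomial of degree at most two, the central differences are essentially exact and the two sides agree up to rounding.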
Finally, dω_k includes only terms dx_i ∧ dx_j with j > n, and all dx_j can be expressed via our forms ω_ℓ from the previous equation. Thus the differentials have got the requested form. □

2. Remarks on Partial Differential Equations

The aim of our excursion into the landscape of differential equations is modest. We do not have space in this rather elementary guide to come close enough to this subtle, beautiful, and extremely useful part of mathematics dealing with differential equations. Still we mention a few issues. First, the simplest method reducing the problem to already mastered ordinary differential equations is explained, based on the so called characteristics. Then we show more simple methods how to get some families of solutions. Next, we present a more complicated theoretical approach dealing with the formal solvability of even higher order systems of differential equations and its convergence, the famous Cauchy-Kovalevskaya theorem. This is the only instance of a general existence and uniqueness theorem for differential equations involving partial derivatives. Unfortunately, it does not cover many of the interesting problems of practical importance. Finally, we display a few classical methods to solve boundary problems involving some of the most common equations of second order.

9.2.1. Initial observations. In practical problems, we often meet equations relating unknown functions of more variables and their derivatives. We already handled the very special case where the relations concerned functions x(t) of just one variable t. More explicitly, we dealt with vector equations

x^{(k)} = F(t, x, ẋ, ẍ, …, x^{(k−1)}), F : R^{nk+1} → Rⁿ,

where the dots over x ∈ Rⁿ mean the (iterated) derivatives of x(t) = (x₁(t), …, x_n(t)), up to the order k. The goal was to find a (vector) curve x(t) in Rⁿ which makes this equation valid.
Two more comments are due: 1) we can omit the explicit appearance of t at the cost of adding one more variable and the equation ẋ₀ = 1; and 2) giving new names to the iterated derivatives x_j = x^{(j)} and adding the equations ẋ_j = x_{j+1}, j = 1, …, k − 1, we can always reduce the problem to a first order system of equations (on a much bigger space).

Thus, we should like to work similarly with equations

F(x, y, u, u_x, u_y, u_xx, u_xy, u_yy, …) = 0,

where u is an unknown function (possibly vector valued) of two variables x and y (or even more variables) and, as usual, the indices denote the partial derivatives. Even if we expect the implicit equation to be solved in some sense with respect to some of the highest partial derivatives, we cannot hope for a general existence and uniqueness result similar to the ODE case. Let us start with a most simple example illustrating the general problem related to the choice of the initial conditions.

9.2.2. The simplest linear case. Consider one real function u = u(x, y), subject to the linear homogeneous equation

(1) a(x, y) u_x + b(x, y) u_y = 0,

where a and b are known functions of two variables defined for x, y in a domain Ω ⊂ R². We consider the equation in the tubular domain Ω × R ⊂ R³. Usually, Ω is an open set together with a nice boundary, a curve ∂Ω in our case. An obvious simple idea suggests to write Ω as a union of non-intersecting curves and look for u constant along those curves. Moreover, if those curves were transversal to the boundary ∂Ω, then initial conditions along the boundary should extend inside of Ω. Thus, consider such a potentially existing curve c(t) = (x(t), y(t)) and write

0 = d/dt u(c(t)) = u_x(c(t)) ẋ(t) + u_y(c(t)) ẏ(t).

This yields the conditions for the requested curves:

(2) ẋ = a(x, y), ẏ = b(x, y).
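A tiny numerical sketch (our own example, not the book's): for the constant-coefficient transport equation u_x + 2u_y = 0 the characteristic equations (2) read ẋ = 1, ẏ = 2, so the characteristics are the lines y − 2x = const, and u(x, y) = h(y − 2x) solves the PDE for any differentiable h. We integrate one characteristic by Euler steps and check that such a u stays constant along it.

```python
# Characteristics of u_x + 2 u_y = 0: x' = 1, y' = 2, and any candidate
# solution u = h(y - 2x) must be constant along the flow.
import math

def h(s):                      # an arbitrary smooth profile
    return math.sin(s) + 0.5 * s

def u(x, y):                   # candidate solution u = h(y - 2x)
    return h(y - 2 * x)

x, y, dt, n = 0.3, -1.0, 1e-4, 20000
u0 = u(x, y)
drift = 0.0
for _ in range(n):
    x += 1.0 * dt              # x' = a(x, y) = 1
    y += 2.0 * dt              # y' = b(x, y) = 2
    drift = max(drift, abs(u(x, y) - u0))
print(drift)                   # close to 0
```

Here y − 2x is preserved exactly by the Euler steps, so the drift is pure rounding error; for non-constant coefficients a Runge-Kutta integrator would be the appropriate replacement.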
Since u is considered constant along the curve, we obtain a unique possibility for the function u along the curves for all initial conditions x(0), y(0), and u(x(0), y(0)), if the coefficients a and b are at least Lipschitz in x and y. The latter curves are called the characteristics of the first order partial differential equation (1) and they are solutions of its characteristic equations (2). If the coefficients are differentiable in all variables, then also the solution u will be differentiable for differentiable choices of initial conditions on a curve transversal to the characteristics, and we might have solved the problem (1) locally.

Still it might fail. Let us look at the homogeneous linear problem

(3) y u_x − x u_y = 0, u(x, 0) = x.

We saw already the solutions to the characteristic equations ẋ = y, ẏ = −x, and the characteristics are circles with centers in the origin, x(t) = R sin t, y(t) = R cos t. If we choose any even differentiable function φ(x) = u(x, 0) for the initial conditions at the points (x, 0), we are lucky to see that the solution will work. Indeed, each characteristic circle meets the initial line y = 0 in the two points (R, 0) and (−R, 0), so the initial condition must satisfy φ(R) = φ(−R). But for odd functions, e.g. our choice φ(x) = x, there will be no solution of our problem in any neighbourhood of the origin. Clearly, this failure is linked to the fact that the origin is a singular point of the characteristic equations.

9.2.3. The quasi-linear case. The situation seems to get more tricky once we add a nontrivial right-hand value f(x, y, u) to the equation (1), i.e. we try to solve the problem (allowing a and b to depend on u)

(1) a(x, y, u) u_x + b(x, y, u) u_y = f(x, y, u).

But in fact, the very same idea leads to characteristic equations on R³, writing z = u(x, y) for the unknown function along the characteristics. Geometrically, we seek a vector field tangent to all graphs of solutions in the tubular domain Ω × R.
Remind that z = u(x, y), restricted to a curve in the graph, implies ż = u_x ẋ + u_y ẏ, and thus we may set ż = f(x, y, z), ẋ = a(x, y, z), ẏ = b(x, y, z) in order to get such a characteristic vector field.

Characteristic equations and integrals

The characteristic equations of the equation (1) are

(2) ẋ = a(x, y, z), ẏ = b(x, y, z), ż = f(x, y, z).

This autonomous system of three equations is uniquely solvable for each initial condition if a, b, and f are Lipschitz. A function ψ on Ω × R which is constant on each flow line of the characteristic vector field, i.e., ψ(x(t), y(t), z(t)) = const for all solutions of (2), is called an integral of the equation (1).

If ψ_z ≠ 0, then the implicit function theorem guarantees the unique existence of the function z = u(x, y) satisfying the chosen initial conditions. Check yourself that the latter functions u are solutions to our problem. This approach covers the homogeneous case as well; we just consider the autonomous characteristic equations with ż = 0 added.

Let us come back to our simple equation 9.2.2(3) and choose f(x, y, u) = y for the right-hand side. The characteristic equations yield x = R sin t, y = R cos t as before, while ż = y = R cos t and hence z = R sin t + z(0) = x + z(0). Thus, we may choose ψ(x, y, z) = z − x as an integral of the equation, and the solution u(x, y) = x + C with any constant C. Notice, there will be plenty of solutions here, since we may add any solution of the homogeneous problem, i.e. all functions of the form

(3) u(x, y) = h(x² + y²)

with any differentiable function h. Thus, the general solution u(x, y) = x + h(x² + y²) depends on one function of one variable (the above constant C is a special case of h). We may also conclude that for "reasonable" curves ∂Ω ⊂ R² (those transversal to the circles centred at the origin and not containing the origin) and "reasonable" initial values u|_∂Ω (we have to watch the multiple intersections of the circles with ∂Ω!)
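The integral ψ = z − x of the example above can be confirmed numerically; a sketch with a hand-rolled Runge-Kutta step (no external solver assumed):

```python
# Characteristics of y u_x - x u_y = y (example 9.2.2(3) with f = y):
# integrate (2): x' = y, y' = -x, z' = y, and check that the integral
# psi(x, y, z) = z - x stays constant along the flow line.
def rhs(s):
    x, y, z = s
    return (y, -x, y)

def rk4_step(s, dt):
    def add(u, v, c):  # u + c*v, componentwise
        return tuple(ui + c * vi for ui, vi in zip(u, v))
    k1 = rhs(s)
    k2 = rhs(add(s, k1, dt / 2))
    k3 = rhs(add(s, k2, dt / 2))
    k4 = rhs(add(s, k3, dt))
    return tuple(si + dt / 6 * (a + 2 * b + 2 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))

s = (1.0, 0.5, 2.0)            # initial point (x, y, z)
psi0 = s[2] - s[0]             # psi = z - x
for _ in range(5000):
    s = rk4_step(s, 1e-3)
drift = abs((s[2] - s[0]) - psi0)
print(drift)                   # close to 0
```

Since ż − ẋ = y − y = 0, the quantity z − x is conserved exactly by every Runge-Kutta stage as well, so the drift is pure rounding noise.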
there will be (at least locally) a unique solution extending the initial values to an open neighborhood of ∂Ω.

Of course, we may similarly use characteristics and integrals for any finite number of variables x = (x₁, …, x_n) and equations of the form

a₁(x, u) ∂u/∂x₁ + ⋯ + a_n(x, u) ∂u/∂x_n = f(x, u)

with the unknown function u = u(x₁, …, x_n). As we shall see later, typically we obtain generic solutions dependent on one function of n − 1 variables, similarly to the above example.

9.2.4. Systems of equations. Let us look at what happens if we add more equations. There are two quite different ways how to couple the equations. We may seek an unknown vector valued function u = (u₁, …, u_m) : Rⁿ → R^m, subject to m equations

(1) A_i(x, u) · ∇u_i = f_i(x, u), i = 1, …, m,

where the left hand side means the scalar product of a vector valued function A_i : R^{m+n} → Rⁿ and the gradient vector of the function u_i. Such systems behave similarly to the scalar ones and we shall come back to them later.

The other option leads to the so called overdetermined systems of equations. Actually, we shall not pay more attention to this case in the sequel, and so the reader might jump to 9.2.6 if getting lost. Consider a (scalar) function u on a domain Ω ⊂ Rⁿ and its gradient vector ∇u. For each matrix A = (a_{ij}) with m rows and n columns, with differentiable functions a_{ij}(x, u) on Ω × R, and the right hand value function F(x, u) : Ω × R → R^m, we can consider the system of equations

(2) A(x, u) · ∇u = F(x, u).

Of course, in both cases we have got m individual equations of the type from the previous paragraph, and we could apply the same idea of characteristic vector fields for all of them. The problem consists in the coupling of the equations and obtaining possibly inconsistent necessary conditions from the individual characteristic fields. Let us look at the overdetermined case now.
We can get most close to the situation with ordinary differential equations if A is invertible and we move it to the right hand side, arriving at the system of equations

(3) ∇u = A^{−1}(x, u) · F(x, u) = G(x, u).

The simplest non-trivial case consists of two equations in two variables:

u_x = f(x, y, u), u_y = g(x, y, u).

Geometrically, we describe the graph of the solution as a surface in R³ by prescribing its tangent plane through each point. An obvious condition for the existence of such u is obtained by differentiating the equations and employing the symmetry of the higher order partial derivatives, i.e. the condition u_xy = u_yx. Indeed,

u_xy = f_y + f_u g = g_x + g_u f = u_yx,

where we substituted the original equations after applying the chain rule. We shall see in a moment that this condition is also sufficient for the existence of the solutions. Moreover, if the solutions exist, then they are determined by their values at one point, similarly to ordinary differential equations.

9.2.5. Frobenius' theorem again. Similarly, we can deal with the gradient ∇u of an m-dimensional vector valued function u. For example, if m = 2 and n = 2, we are describing the tangent planes to the two-dimensional graph of the solution u. In general we face mn equations

(1) ∂u_p/∂x_i = F_{pi}(x, u), i = 1, …, n, p = 1, …, m.

The necessary conditions imposed by the symmetry of higher order derivatives then read

(2) ∂F_{pi}/∂x_j + Σ_q (∂F_{pi}/∂u_q) F_{qj} = ∂F_{pj}/∂x_i + Σ_q (∂F_{pj}/∂u_q) F_{qi}

for all i, j and p. Let us reconsider our problem from the geometric point of view now. We are seeking the graph of the mapping u : Rⁿ → R^m. The equations (1) describe the n-dimensional distribution D on R^{m+n}, and the graphs of possible solutions u = (u₁, …, u_m) are just the integral manifolds of D. The distribution D is clearly defined by the m linear forms

ω_p = du_p − Σ_i F_{pi} dx_i, p = 1, …, m,

while the vector fields generating the common kernel of all ω_p can be chosen as

X_i = ∂/∂x_i + Σ_q F_{qi} ∂/∂u_q.
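The compatibility condition u_xy = u_yx can be tried out on a concrete (hypothetical) choice of ours: f = 2xy, g = x². Then f_y + f_u g = 2x = g_x + g_u f, so a solution through (0, 0, 0) exists; integrating u_x along the x-axis and then u_y upwards must reproduce it (here u = x²y).

```python
# Overdetermined system u_x = f, u_y = g with f = 2*x*y, g = x*x.
# The compatibility condition f_y + f_u*g = g_x + g_u*f holds (2x = 2x),
# so path-integration of the gradient recovers the solution u = x^2 * y.
def f(x, y, u):
    return 2 * x * y

def g(x, y, u):
    return x * x

def integrate(x1, y1, n=20000):
    """Integrate du along the L-shaped path (0,0) -> (x1,0) -> (x1,y1)."""
    u = 0.0
    for i in range(n):                 # du = f dx on the horizontal leg
        x = (i + 0.5) * x1 / n
        u += f(x, 0.0, u) * x1 / n
    for i in range(n):                 # du = g dy on the vertical leg
        y = (i + 0.5) * y1 / n
        u += g(x1, y, u) * y1 / n
    return u

x1, y1 = 0.8, 1.5
print(abs(integrate(x1, y1) - x1 * x1 * y1))   # close to 0
```

Had the compatibility condition failed, integrating along different paths would give different values and no function u could match both prescribed derivatives.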
Now we compute the differentials dω_p and evaluate them on the fields X_i:

−dω_p = Σ_{i,j} (∂F_{pi}/∂x_j) dx_j ∧ dx_i + Σ_{i,q} (∂F_{pi}/∂u_q) du_q ∧ dx_i,

−dω_p(X_j, X_i) = (∂F_{pi}/∂x_j + Σ_q (∂F_{pi}/∂u_q) F_{qj}) − (∂F_{pj}/∂x_i + Σ_q (∂F_{pj}/∂u_q) F_{qi}).

Thus, the vanishing of the differentials on the common kernel is equivalent to the necessary conditions deduced above, and the Frobenius theorem says that the latter conditions are sufficient, too. We have proved the following:

Theorem. The system of equations (1) admits solutions if and only if the conditions (2) are satisfied. Then the solutions are determined locally uniquely around x ∈ Ω by the initial conditions u(x) ∈ R^m.

Remark. The Frobenius' theory deals with the so called overdetermined systems of PDEs, i.e. we have got too many equations and this causes obstructions to their integrability. Although the case in the last paragraph sounds very special, the actual use of the theory consists in considering differential consequences of a given system until we reach a point where the special theorem applies and gives not only further obstructions but also the sufficient conditions.

9.2.6. General solutions to PDEs. In a moment, we shall deal with diverse boundary conditions for the solutions of PDEs. In most cases we shall be happy to have good families of simple "guessed" solutions which are not subject to any further conditions. We talk about general solutions in this context. Unlike the situation with ODEs, we should not hope to get a universal expression for all possible solutions this way (although we can come close to that in some cases, cf. 9.2.3(3)). Instead, we often try to find the right superpositions (i.e. linear combinations) or integrals built from suitable general solutions.

Let us look at the simplest linear second order equations in two variables, homogeneous with constant coefficients:

(1) A u_xx + 2B u_xy + C u_yy + D u_x + E u_y + F u = 0,

where A, B, C, D, E, F are real constants and at least one of A, B, C is non-zero.
Similarly to the method of characteristics, we try to reduce the problem to ODEs. Let us again assume solutions in the form u = f(p), where f is an unknown function of p, and p(x, y) should be nice enough to get close to solutions. The necessary derivatives are

u_x = f′ p_x, u_y = f′ p_y,
u_xx = f″ p_x p_x + f′ p_xx, u_xy = f″ p_x p_y + f′ p_xy, u_yy = f″ p_y p_y + f′ p_yy.

Thus (1) becomes too complicated in general, but restricting to affine p(x, y) = αx + βy with constants α, β, we arrive at

(2) (Aα² + 2Bαβ + Cβ²) f″ + (Dα + Eβ) f′ + F f = 0.

This is a nice ODE as soon as we fix the values of α and β. Let us look at several simple cases of special importance.

Assume D = E = F = 0, A ≠ 0. Then, after dividing by α², we solve the equation

(A + 2Bλ + Cλ²) f″ = 0,

and the right choice of the ratio λ = β/α ≠ 0 kills the entire coefficient at f″. Thus, (2) will hold true for any (twice differentiable) function f and we arrive at the general solution u(x, y) = f(p(x, y)), with p(x, y) = x + λy. Of course, the behavior will very much depend on the number of real roots of the quadratic equation A + 2Bλ + Cλ² = 0.

The wave equation. Put A = 1, B = 0, C = −1/c²; thus our equation is u_xx = (1/c²) u_yy, the wave equation in dimension 1. Then the equation 1 − (1/c²)λ² = 0 has got two real roots λ = ±c, and we obtain p = x ± cy, leading to the general solution

u(x, y) = f(x − cy) + g(x + cy)

with two arbitrary twice differentiable functions of one variable f and g. In Physics, the equation models one-dimensional wave development in the space parametrized by x, while y stands for the time. Notice c corresponds to the speed of the wave u(x, 0) = f(x) + g(x) initiated at the time y = 0, and while the f part moves forwards, the other part moves backwards. Indeed, imagine u(x, y) = f(x − cy) describes the displacement of a string at the point x in time y. This remains constant along the lines x − cy = constant.
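The d'Alembert-type general solution can be checked directly by finite differences; a sketch with f, g chosen by us (a Gaussian bump and a sine):

```python
# Finite-difference check that u(x, y) = f(x - c*y) + g(x + c*y) satisfies
# the 1D wave equation u_xx = (1/c^2) u_yy for arbitrary smooth f, g.
import math

c = 2.0
f = lambda s: math.exp(-s * s)
g = lambda s: math.sin(s)

def u(x, y):
    return f(x - c * y) + g(x + c * y)

def second(fun, t, h=1e-4):
    """Central second difference of a single-variable function."""
    return (fun(t + h) - 2 * fun(t) + fun(t - h)) / (h * h)

x0, y0 = 0.4, -0.7
u_xx = second(lambda x: u(x, y0), x0)
u_yy = second(lambda y: u(x0, y), y0)
print(abs(u_xx - u_yy / (c * c)))   # close to 0
```

Changing f and g to any other smooth profiles leaves the residual equally small, illustrating that the solution really depends on two arbitrary functions.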
Thus, a stationary observer sees the initial displacement u(x, 0) moving along the x-axis with the speed c. In particular, we see that the initial condition along a line in the plane is not enough to determine the solution, unless we request that the solution moves only in one of the possible directions (i.e. we posit either f or g to be zero).

The Laplace equation. Now we consider A = C = 1, B = 0, i.e. the equation u_xx + u_yy = 0. This is the Laplace equation in two dimensions and its solutions are called harmonic functions. Proceeding as before, we obtain two imaginary solutions of the equation λ² + 1 = 0, and our method produces p = x ± iy, a complex valued function instead of the expected real one. This looks ridiculous, but we could consider f to be a mapping f : C → C viewed as a mapping on the complex plane. Remind that some of such mappings have got differentials D¹f(p) which actually are multiplications by complex numbers at each point, cf. ??. This is in particular true for any polynomial or converging power series. We may request that this property holds true for all iterated derivatives of this kind. In general, we call such functions on C holomorphic and we discuss them in the last part of this chapter. The reader is advised to come back to this exposition on general solutions to the Laplace equation after reading through the beginning of the part on complex analytic functions below, starting in 9.4.1.

Now, assuming f is holomorphic, we can repeat the above computation and arrive again at (λ² + 1) f″(p) = 0 independently of the choice of f (here f′(p) means the complex number given by the differential D¹f, and f″(p) is the iteration of this kind of derivative). Moreover, the derivatives of vector valued functions are computed for the components separately, and thus both the real and the imaginary parts of the general solution f(x + iy) + g(x − iy) will be real general solutions.
For example, consider f(p) = p², leading to u(x, y) = (x + iy)² = (x² − y²) + i 2xy, and a simple check shows that both terms satisfy the equation separately. Notice the two solutions x² − y² and xy provide a basis of the 2-dimensional vector space of harmonic homogeneous polynomials of degree two.

The diffusion equation. Next assume A = κ, B = C = D = F = 0, and add the first order term with E = −1. This provides the equation

u_y = κ u_xx,

the diffusion equation in dimension one. Applying the same method again, we arrive at the ODE

κ α² f″ − β f′ = 0,

which is easy to solve. We know the solutions are found in the form f(p) = e^{νp} with ν satisfying the condition κα²ν² − βν = 0. The zero solution is not interesting, thus we are left with ν = β/(κα²), and we obtain the general solution to our problem by substituting p(x, y) = αx + βy:

u(x, y) = e^{ν(αx + βy)}, ν = β/(κα²).

Again, a simple check reveals that this is a solution. But it is not very "general", since it depends just on two scalars α and β. We have to find much better ways how to find solutions of such equations.

9.2.7. Nonhomogeneous equations. As always with linear equations, the space of solutions of a homogeneous linear equation is a real vector space (or complex, if we deal with complex valued solutions). Let us write the equation as Lu = 0, where L is the differential operator on the left hand side. For instance,

L = A ∂²/∂x² + 2B ∂²/∂x∂y + C ∂²/∂y² + D ∂/∂x + E ∂/∂y + F

in the case of the linear equation 9.2.6(1).

The solutions of the corresponding non-homogeneous equation Lu = f with a given function f on the right hand side form an affine space. Indeed, if Lu₁ = f, Lu₂ = f, Lu₃ = 0, then clearly L(u₁ − u₂) = 0, while L(u₁ + u₃) = f. Thus, if we succeed to find a single solution of Lu = f, then we can add any general solution of the homogeneous equation to obtain a general solution. Let us illustrate our observation on some of our basic examples.
The non-homogeneous wave equation u_xx − u_yy = x + y has got the general solution

u(x, y) = (1/6)(x³ − y³) + f(x − y) + g(x + y),

depending on two twice differentiable functions. The non-homogeneous Laplace equation is called the Poisson equation. A general complex valued solution of the Poisson equation u_xx + u_yy = x + y is

u(x, y) = (1/6)(x³ + y³) + f(x − iy) + g(x + iy),

depending on two holomorphic functions f and g.

9.2.8. Separation of variables. As we have experienced, a straightforward attempt to get solutions is to expect them in a particular simple form. The method of separation of variables is based on the assumption that the solution will appear as a product of single variable functions in all the variables in question. Let us apply this method to our three special examples.

The diffusion equation. We expect to find a general solution of κu_xx = u_t in the form u(x, t) = X(x)T(t). Thus the equation says κX″(x)T(t) = T′(t)X(x). Assume further u ≠ 0 and divide this equation by κu = κXT:

X″(x)/X(x) = T′(t)/(κT(t)).

Now the crucial observation comes. Notice the terms on the left and on the right are functions of different variables, and thus the equation may be satisfied only if both sides are constant. We shall have to distinguish the signs of this separation constant, so let us write it as −α² (choosing the negative option). Thus we have to solve two independent ODEs

X″ + α²X = 0, T′ + α²κT = 0.

The general solutions are

X(x) = A cos αx + B sin αx, T(t) = C e^{−α²κt},

with free real constants A, B, C. When combining these solutions in the product, we may absorb the constant C into the other ones, and thus we arrive at the general solution

u(x, t) = (A cos αx + B sin αx) e^{−α²κt}.

This solution depends on three real constants.

If we choose a positive separation constant instead, i.e. λ², there will be a sign change in our equations and the resulting general solution is

u(x, t) = (A cosh λx + B sinh λx) e^{λ²κt}.
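The separated solution of the diffusion equation is easy to verify by finite differences; a sketch with constants of our own choosing:

```python
# Check that u(x, t) = (A cos(a x) + B sin(a x)) * exp(-a^2 * kappa * t)
# satisfies the diffusion equation  kappa * u_xx = u_t.
import math

kappa, a, A, B = 0.7, 1.3, 0.5, -1.1

def u(x, t):
    return (A * math.cos(a * x) + B * math.sin(a * x)) \
        * math.exp(-a * a * kappa * t)

h = 1e-4
x0, t0 = 0.3, 0.2
u_xx = (u(x0 + h, t0) - 2 * u(x0, t0) + u(x0 - h, t0)) / (h * h)
u_t = (u(x0, t0 + h) - u(x0, t0 - h)) / (2 * h)
print(abs(kappa * u_xx - u_t))   # close to 0
```

The residual stays small for any choice of the three constants, in line with the three-parameter family found above.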
If the separation constant vanishes, then we obtain just u(x, t) = A + Bx, independent of t.

The Laplace equation. Assume u(x, y) = X(x)Y(y) satisfies the equation u_xx + u_yy = 0 and proceed exactly as above. Thus, X″Y + Y″X = 0, and dividing by XY and choosing the separation constant α², we arrive at

X″ = α²X, Y″ = −α²Y.

The general solution depends on four real constants A, B, C, D:

u(x, y) = (A cosh αx + B sinh αx)(C cos αy + D sin αy).

If the separation constant is negative, i.e. −α², the roles of x and y swap.

The wave equation. Let us look how the method works if there are more variables. Consider a solution u(x, y, z, t) = X(x)Y(y)Z(z)T(t) of the 3D wave equation

(1/c²) u_tt = u_xx + u_yy + u_zz.

Playing the same game again, we arrive at the equation

(1/c²) T″XYZ = X″YZT + Y″XZT + Z″XYT.

Dividing by u ≠ 0,

(1/c²) T″/T = X″/X + Y″/Y + Z″/Z,

and since all the individual terms depend on different single variables, they have to be constant. Again, we shall have to pay attention to the signs of the separation constants. For instance, let us choose all constants negative and look at the individual four ODEs

T″ = −c²α²T, X″ = −β²X, Y″ = −γ²Y, Z″ = −δ²Z,

with the constants satisfying −α² = −β² − γ² − δ². The general solution is u(x, y, z, t) = X(x)Y(y)Z(z)T(t) with the linear combinations

T(t) = A cos cαt + B sin cαt
X(x) = C cos βx + D sin βx
Y(y) = E cos γy + F sin γy
Z(z) = G cos δz + H sin δz

with eight real constants A through H. If we choose any of the separation constants positive, the corresponding component in the product would display hyperbolic sine and cosine instead. Of course, the relation between the constants sees the signs as well. We can also work with complex valued solutions and choose the exponentials as our building blocks (i.e. X(x) = e^{±iβx} or X(x) = e^{±βx}, etc.). For instance, take one of the solutions with all the separation constants negative:

u(x, y, z, t) = e^{iβx} e^{iγy} e^{iδz} e^{−icαt} = e^{i(βx + γy + δz − cαt)}.
Similarly to the 1D situation, we can again see a "plane wave" propagating along the direction (β, γ, δ) with the angular frequency cα.

9.2.9. Boundary conditions. We continue with our examples of second order equations and discuss the three most common boundary conditions for them. Let us consider a domain Ω ⊂ Rⁿ, bounded or unbounded, and a differential operator L defined on (real or complex valued) functions on Ω. We write ∂Ω for the boundary of Ω and assume this is a smooth manifold. Locally, such a submanifold in Rⁿ is given by one implicit function p : Rⁿ → R, and the unit normal vector ν(x), x ∈ ∂Ω, to the hypersurface ∂Ω is given by the normalized gradient

ν(x) = ∇p(x)/‖∇p(x)‖.

We say that a function u is differentiable on Ω if it is differentiable on its interior and the directional derivatives D_ν u(x) exist at all points of the boundary. Typically we write ∂u/∂ν for the derivative in the normal direction. For simplicity, let us restrict ourselves to L of the form considered above and look at the equation Lu = F(x, y, u, ∂u/∂ν).

Cauchy boundary problem

At each point of the boundary x ∈ ∂Ω we prescribe both the value φ(x) = u(x) and the derivative ψ(x) = ∂u/∂ν(x) in the normal unit direction. The Cauchy problem is to solve the equation

Lu = F

on Ω, subject to u = φ and ∂u/∂ν = ψ on ∂Ω.

We shall see that the Cauchy problems very often lead locally to unique solutions, subject to certain geometric conditions on the boundary ∂Ω. At the same time, it is often not the convenient setup for practical problems. We shall illustrate this phenomenon on the 2D Laplace equation in the next but one paragraph. An even simpler possibility is to request only the condition on the values of u on the boundary ∂Ω. Another possibility, often needed in direct applications, is to prescribe the derivatives only. We shall see that this is reasonable for the Laplace and Poisson equations.
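The normalized-gradient formula for the unit normal ν can be sketched numerically; our example boundary is the unit circle, given implicitly by p(x, y) = x² + y² − 1 = 0, where the outward normal at a boundary point (x, y) is (x, y) itself.

```python
# nu = grad p / |grad p| for the implicitly given boundary p = 0;
# gradients are approximated by central differences.
import math

def p(x, y):
    return x * x + y * y - 1.0

def normal(x, y, h=1e-6):
    px = (p(x + h, y) - p(x - h, y)) / (2 * h)
    py = (p(x, y + h) - p(x, y - h)) / (2 * h)
    n = math.hypot(px, py)
    return (px / n, py / n)

# On the unit circle the outward unit normal at (x, y) equals (x, y):
x, y = math.cos(0.7), math.sin(0.7)
nx, ny = normal(x, y)
print(abs(nx - x) + abs(ny - y))   # close to 0
```

The same routine works for any smooth implicit boundary; only the sign convention (outward vs. inward) depends on the choice of p.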
Dirichlet and Neumann boundary problems

At each point of the boundary x ∈ ∂Ω we prescribe the value φ(x) = u(x) or the derivative ψ(x) = ∂u/∂ν(x) in the normal unit direction. The Dirichlet problem is to solve the equation

Lu = F

on Ω, subject to the condition u = φ on ∂Ω. The Neumann problem is to solve the equation

Lu = F

on Ω, subject to the condition ∂u/∂ν = ψ on ∂Ω.

9.2.10. Uniqueness for Poisson equations. Because the proof of the next theorem works in all dimensions n ≥ 2, we shall formulate it for the general Poisson equation

(1) Δu = F.

Theorem. Assume u is a twice differentiable solution of the Poisson equation (1) on a domain Ω ⊂ Rⁿ. If u satisfies the Dirichlet condition u = φ on ∂Ω, then u is the only solution of the Dirichlet problem. If u satisfies the Neumann condition ∂u/∂ν = ψ on ∂Ω, then u is the unique solution of the Neumann problem, up to an additive constant.

The proof of this theorem relies on a straightforward consequence of the divergence theorem. Remind 9.1.14, saying that for each vector field X on a domain Ω ⊂ Rⁿ with hypersurface boundary ∂Ω,

(2) ∫_Ω div X dx₁ … dxₙ = ∫_{∂Ω} X · ν d∂Ω,

where ν is the oriented (outward) unit normal to ∂Ω and d∂Ω stands for the volume inherited from Rⁿ on ∂Ω.

1st and 2nd Green's identities

Lemma. Let M ⊂ Rⁿ be an n-dimensional manifold with boundary hypersurface S, and consider two differentiable functions φ and ψ. Then

(3) ∫_M (φΔψ + ∇φ · ∇ψ) dx₁ … dxₙ = ∫_S φ ∇ψ · ν dS.

This version of the divergence theorem is called the 1st Green's identity. Next, let us consider one more differentiable function μ and X = μφ∇ψ − μψ∇φ. Then the divergence theorem yields the so called 2nd Green's identity

(4) ∫_M (φ(∇·(μ∇))ψ − ψ(∇·(μ∇))φ) dx₁ … dxₙ = ∫_S μ(φ∇ψ − ψ∇φ) · ν dS,

where ∇·(μ∇) means the formal scalar product of the two vector valued differential operators.

Proof of the Green's identities.
The first claim follows by applying (2) to X = φ∇ψ, where φ and ψ are differentiable functions and ∇ψ is the gradient of ψ. Indeed,

i_X ω_{Rⁿ} = φ(∇ψ · ν) dS, div X = φΔψ + ∇φ · ∇ψ,

where the dot in the second term denotes the scalar product of the two gradients. Let us also notice that the scalar product ∇ψ · ν is just the derivative of ψ in the direction of the oriented unit normal ν. The second identity is computed the same way, and the two terms with the scalar products of two gradients cancel each other. The reader should check the details. □

Remark. A special case of the 2nd Green's identity is worth mentioning. Namely, if μ = 1 and both ψ and φ vanish on the boundary ∂Ω, we obtain

∫_Ω (φΔψ − ψΔφ) dx₁ … dxₙ = 0.

This means that the Laplace operator is self-adjoint with respect to the L² scalar product on such functions.

Proof of the uniqueness. Assume u₁ and u₂ are solutions of the Poisson equation on Ω; thus u = u₁ − u₂ is a solution of the homogeneous Laplace equation,

Δu = Δu₁ − Δu₂ = F − F = 0.

At the same time, either u = u₁ − u₂ = 0 on ∂Ω, or ∂u/∂ν = 0 on ∂Ω. Now we exploit the first Green's identity (3) with φ = ψ = u:

∫_Ω (uΔu + ∇u · ∇u) dx₁ … dxₙ = ∫_{∂Ω} u (∂u/∂ν) dS.

In both problems, Dirichlet or Neumann, the right hand side vanishes. The first term in the left hand integrand vanishes, too. We conclude

∫_Ω ‖∇u‖² dx₁ … dxₙ = 0,

but this is possible only if ∇u = 0, since the integrand is continuous. Thus, u = u₁ − u₂ is constant. If we solve a Dirichlet problem, then u₁ and u₂ coincide on the boundary and thus they are equal. □

9.2.11. Well posed problems. Consider the Cauchy boundary problem for u_xx + u_yy = 0, with ∂Ω given by y = 0 and

φ(x) = u(x, 0) = A_α sin αx, ψ(x) = u_y(x, 0) = B_α sin αx,

with the scalar coefficients A_α and B_α depending on the chosen frequency α.
Simple inspection reveals, that we can find such a solution within the result from the separation method: 1 u (x, y) = (Aa cosh ay H--Ba sinh ay) sin ax. a Now, choose Ba = 0 and Aa = ^, i.e. u(x,y) = — cosh ay sinaa;. a Obviously, when moving a towards infinity, the Cauchy boundary conditions can become arbitrarily small and still small change of Ba causes arbitrarily big increase of the values of u in any close vicinity of the line y = 0. Imagine, the equation describes some physical process and the boundary conditions reflect some measurements, including some periodic small errors. The results will be horribly instable with respect to these errors in the derivatives. We should admit that the problem is in some sense ill-posed, even locally. This motivates the following definition. 642 CHAPTER 9. CONTINUOUS MODELS - FURTHER SELECTED TOPICS Well-posed and ill-posed boundary problems The problem Lu = F on the domain fl with boundary conditions on dfl is called well-posed if all three conditions hold true: (1) The boundary problem has got a solution u (a classical solution means u is twice continuously differentiable); (2) the solution u is unique; (3) the solution is stable with respect to initial data, i.e. "small" change of the boundary conditions results in a "small" change of the solution. The problem is called ill-posed, if any of the above conditions fails. Usualy, the stability in the third condition means that the solution is continuously dependent on the boundary conditions in a suitable topology on the chosen space of functions. Also the uniqueness required in the second condition has to be taken reasonably. For instance, only uniquenes up to some additive constant makes sense for the Neumann problems. 9.2.12. Quasilinear equations. Now we exploit our experience and focus on the (local) Cauchy type problems for equations of arbitrary order. 
Similarly to the ODEs, we shall deal with problems, where the highest order derivatives are prescribed (more or less) explicitly and the initial conditions are given on a hypersurface up to the order k — 1. Some notation will be useful. We shall use the multi-indices to express multivariate plynomials and derivatives, cf. 8.1.15. Further we shall write Vfeu = {dau; \a\ = k} for the vector of all derivatives of order k. In particular, Vu means again the gradient vector of u. Quasi-linear PDEs For unknown scalar function u on a domain fl C R™ we prescribe its derivatives (1) ^aa(x,u,.. ^V^ujdaU = b(x,u,Vfe_1u), \a\=k where b and aa are functions on the tubular domain fl x RN, accomodating all the derivatives, with at least one of aa nonzero. We call such equations the (scalar) quasilinear partial differential equations (PDE) of order r. We call (1) semilinear if all aa do not depend on u and its derivatives (thus all the non-linearity hides in b). The principal symbol of a semi-linear PDE of order k is the symmetric fc-linear form P on fl, P(x) : (Rn')k —> R, P(x,t,...,$= 5>Q(x)r- For instance, the Poisson equation Au = f(x, y, u, Vu) on R2 is a semi-linear equation and its principal symbol is the positive definite quadratic form P((, n) = (2 + if, independent of (x, y). 643 CHAPTER 9. CONTINUOUS MODELS - FURTHER SELECTED TOPICS The diffusion equation ^ = Au on R3 has got the symbol P(t, £, v) — C2 + V2> i-e- a positive semi-definite qua-dratic form, while the wave equation Du = — = 0 has got the indefinite symbol P(t, Q = r2 — (2 on R2. We shall focus on the scalar equations and reduce the problem to a special situation which allows a further reduction to a system of first order equations (quite similarly to the ODE theory). Thus we extend the previous definition to systems of equations. Notice, these are systems of the first kind mentioned in 9.2.4. 
Systems of quasi-linear PDEs A system of quasi-linear PDEs determines a vector valued function u : fl c R™ —> Rm, subject to the vector equation (2) A(x, u,..., Vfe_1u) ■ Vfeu = b(x, u,..., Vfe_1u). Here A is a matrix of type m x M with functions ajiQ : flxKNas entries, M = (n+£_1) is the number of k-combinations with repetition from n objects, Vfeu is the vector of vectors of all the fcth-order derivatives of the components of u, b : fl x RN —> Rm, and ■ means the scalar products of the individual rows in A with the vectors VkUi of the individual components of u, matching the individual components in b. 9.2.13. Cauchy data. Next, we have to clarify the boundary condition data. Let us consider a domain U C R™ and a smooth hypersurface r C U, e.g. r given by an implicit equation f(x1,...,xn) = 0 locally. Consider the unit normal vector u(x) at each point x e T (i.e. v = V/ if given implicitly). We would like to find minimal data along r determining a solution of 9.2.12(1), at least locally around a given point. To make things easy, let us first assume that r is prescribed by xn = 0. Then v(x) = (0,..., 0,1) at all x e r and knowing the restriction of u to r, we also now all derivatives da with a = (qi, ..., qn-i, 0), 0 < \a\. Thus, we have to choose reasonably differentiable functions cj on r, j' = 0,..., k — 1, and posit for all j dau(x) = Cj(x), a = (0,..., 0, j), x e T. All the other derivatives 9a« on f, 0 < \a\ < oo with a„ < k axe, computed inductively by the symmetry of partial derivatives. Moreover, if a(o,...,o,fc) 7^ 0, we can establish the remaining fc-th order derivative by means of the equation 9.2.12(1) and hope to be able to continue inductively. Indeed, writing a — a(o,...,o,k)(x,u, ■ ■ ■, Vfe_1(u)) 7^ 0 (and similarly leaving out the arguments of the other functions aa), the equation 9.2.12(1) can be rewritten as (1) -fa—ku=-(- Yl a^u + bix,^...,^-1^). 
n \a\=k,an^:k Now, on r we can use the already known derivatives to compute directly all the dau with an < k + 1. But differentiating 644 CHAPTER 9. CONTINUOUS MODELS - FURTHER SELECTED TOPICS the latter equation by we obtain the missing derivative of order k + 1 from the known quantities on the right-hand side. By induction, we obtain all the derivatives, as requested. In the general situation we can iterate the derivative P>l(x)u °f u m the direction of the unit normal vector v to the hypersurface F\ Cauchy data for scalar PDE The (smooth or analytic) Cauchy data for the fcth order quasi-linear PDE 9.2.12(1) consist of a hypersurface F c U andfc (smooth or analytic) functions Cj, 0 < j < k—1, prescribing the derivatives in the normal directions to r (2) {Dl(x)yu{x) = c3{x), xeF A normal direction v(x), x e r, is called characteristic for the given Cauchy data, if (3) aa{x,u,...,SJk-1u)v{x)a = 0. \a\=k The Cauchy data are called non-characteristic if there are no characteristic normals to r. Notice the situation simplifies for the semi-linear equations. Then the characteristic directions do not depend on the chosen functions Cj from the Caychy data and they are directly related to the properties of the principal symbol of the equation. In the case of the hyperplane r = {xn = 0} treated above, the Cauchy data are non-characteristic if and only if a(o,...,o,fc) 7^ 0. For instance, semi-linear equations of first order always admit characteristic directions since their principal symbols are linear forms and so they must have non-trivial kernels (hy-perplanes of characteristic directions). In the three second order examples of the Laplace equation, diffusion equation, and wave equation very different phenomena occur. Since the symbol of the Laplace equation is a positive definite quadratic form, characteristic directions can never appear, independently of our choice of r. On the contrary, there are always non-trivial characteristic directions in the other two cases. 
Characteristic cones of semi-linear PDEs The characteristic directions of a semi-linear PDE on a domain n C K™ generate the characteristic cone C(x) (ZTxfl in the tangent bundle, C(x) = {£ e TXQ; P(x)(t,...,t) = 0}. The Cauchy data on a hypersurface r are non-characteristic if and only if (TT)1- DC = {0}, i.e. the orthogonal complements to the tangent spaces to r with respect to the standard scalar product on R™ never meet the characteristic cone. Notice, cones for linear forms are hyperplanes in the tangent space, quadratic cones appear with second order, etc. The tangent vectors to characteristics of the first order quasi-linear equations (as introduced in 9.2.2) are orthogonal to the characteristic normals. We have learned that the first order equations propagate the solutions along the characteristic 645 CHAPTER 9. CONTINUOUS MODELS - FURTHER SELECTED TOPICS lines and so we are not free to prescribe the Cauchy data for the solution in such a case. 9.2.14. Cauchy-Kovalevskaya Theorem. As seen so many times already, the analytic mappings are very rigid and most questions related to them boil down to some estimates and smart combinatorial ideas. It is time to remind what happens for analytic equations and Cauchy data in the very special case of theODEs. For a single scalar autonomous ODE of first order, the Cauchy data consist of a single point "hypersurface'T = {x} in n C K and the value u(x). In particular, the Cauchy data are always non-characteristic in dimension one. Already in 6.2.15 we gave a complete proof that the induced derivatives of u provide a converging power series and thus the only solution, on certain neighborhood of x. In 8.3.13 we extended the same proof to autonomous systems of ODEs, which verified the same phenomenon for general systems of ODEs of any order k. Here the Cauchy data again consist of the only point in r and all derivatives of u of orders less than k (and again, they are always non-characteristic). 
In subsequent paragraphs we shall comment on how to extend the ODE proof to the following very famous theorem. In particular, the statement says that we have to expect general solutions to fcth order scalar equations in n variables to depend on k independent functions of n — 1 variables. This is in accordance with our experience from simple examples. Cauchy-Kovalevskaya theorem Theorem. The analytic Cauchy problem consisting of quasi-linear equation 9.2.12(1) with analytic coefficients and right hand side, and analytic non-characteristic Cauchy data 9.2.13(2) has got a unique analytic solution on a neighborhood of each point in T. Notice that we have computed explicitly the formal power series for the solution (by an inductive procedure) for the special case when T is denned by xn = 0. In this case, the theorem claims that this formal series always converges with non-trivial radius of convergence. The full proof is very technical and we do not have space to bother the readers with all details. In the next paragraphs, we shall provide indications toward the steps in the proof. If the track (or interest) will be lost, the reader should rather jump to 9.2.18. 9.2.15. Flattening the Cauchy data. The first step in the proof is to transform the non-characteristic data to the "flat" hypersurface T discussed in the beginning of 9.2.13. Remind that for such T the non-characteristic condition in 9.2.13(3) reads a(o,...,o,fc) 7^ 0. Let us start with the general equation and its analytic Cauchy data on an analytic let" (we omit the arguments 646 CHAPTER 9. CONTINUOUS MODELS - FURTHER SELECTED TOPICS of all the functions and I = 0,..., k — 1) (ft u (1) ^ aadau = b, -^-j(x) = Ci(x), x e T. \a\=k We shall work locally around some unspecified fixed point in r. Since r is an analytic hypersurface in R™, there are new local coordinates y = \P(x), such that r = {x; Vn{x) = 0}. Moreover,

dv d$ — du dyn oyn 0 and (perhaps big) constant C such that -^\\daasri\r^ < C, ±\dabs\r^ < C, and thus \daasrt\ < C|a|!r-IQI, \dabs\ < C\a\\r-^, for all coefficients and multiindices a. In particular, all the coefficients can be majorized by the function Cr h(x1,. . . ^-i^o, . . .,VN) = - „_!-=^v-' Now, the majorizing system for the vector (Vb,..., Vjv) is |r = £ n£hJLVr+h,s = 0,...,N. ™ ^ at ■ i 2 0ls«e--- domain 17 c R™ with one of the usual boundary conditions. Thus consider a general linear operator written in coordinates (xi,..., xn). If we consider any other coordinate system y = where A is the Laplace operator A = ^2 + ■ ■ ■ + 0 a real constant. The operator L lives on domains in Rn+1. Let us first return to the 2D wave equation utt = at k > 0, the diffusion equation is considered on domains in Rn x R. Again, let us have a look at the simplest ID diffusion equation ut = kuxx. It describes the diffusion process in a one-dimensional object with diffusivity k (assumed to be constant here) in time. First of all, let us notice that the usual boundary value presription of the state at time t = 0 is not matching the assumption of the Cauchy-Kovalevskaya theroem. Indeed, taking r = {t = 0}, the normal direction vector ■§? is characteristic. 652 CHAPTER 9. CONTINUOUS MODELS - FURTHER SELECTED TOPICS The intuition related to the expectation on diffusion prob-,, lems suggests that Dirichlet boundary data should suf-[^» fice (we just need the inicial state and the diffusion r\£^ then does the rest), or we can combine them with some Neumann data (if we supply heat at some parts of the boundary). Moreover, the process should not be reversible in time, so we should not expect that the solution would extend accross the line t = 0. Let us look at a classical example considered already by Kovalevskaya. 
Posit u(0,x)=g(x) = on a neighborhood of the origin (perfect analytic boundary data and equation), and expect u is a solution of ut = uxx in the form u(t, x) = J2k,e>o C 0, once we find the inverse Fourier image of the Gaussian /(£) = e~Kt$ . But Fourier images T(f) of Gaussians / are again Gaussians, up to constant, see ??, with any real constant a > 0. Thus, we can write for t > 0 u(Z,t) = H 0+ is exactly the function ip, as expected. We shall come back to such convolution based principles a few pages later, after investigating simpler methods. 9.2.22. Superposition of the solutions. A general idea to solve boundary value problems is to take a good supply of general solutions and try to take lin-^t-^_ ear combination of even infinite many of them. This means we consider the solution in a form of a series. The type of the series is governed by the available solutions. Let us illustrate the method on the diffusion equation discussed above. Imagine we want to model the temperature of a a homogeneous bar of length d. Initially, at time t = 0, the temperature at all points x is zero. At one of its ends we keep the temperature zero, while the other end will be heated with some constant intensity. Set the bar as the interval x G [0, d] c K, and the domain Q = [0, d] x [0, oo). Our boundary problem is d (1) ut = kuxx, u(x, 0) = 0, u(0, t) = 0, T^u(d, t) = p, where p is a constant representing the effect of the heating. The idea is to exploit the general solutions 2 , u(x, t) = (A cos ax + B sin ax) ea from 9.2.6 with free parameters a, A, and B. We want to consider a superposition of such solutions with properly chosen parameters and get the solution to our boundary problem in the form combining Fourier series terms wit the exponentials. This approach is often called the Fourier method. The condition u(0,t) = 0 suggests to restrict ourselves to A = 0. Then, ux (x, t) = Ba cos(aa;) e~Q Kt. 
It seems to be difficult now to guess how to combine such solutions, to get something constant in time, as the Neumann part boundary condition requests. But we can help with a small trick. 654 CHAPTER 9. CONTINUOUS MODELS - FURTHER SELECTED TOPICS There are some further obvious solutions to the equation -those with u depending on the space coordinate only. We may consider v(x, t) = px and seek further our solution in the form u(x, t)+v(x). Then u must again be a solution of the same diffusion equation (1), but the boundary conditions change to u(x, 0) = — px, u(0,t) = 0, £u(d,t) = 0. Now, we want ux(x, t) = Bacos(ax) e~a'iKt = 0, i.e. we should restrict to frequencies a = ^nir, with odd non-negative integers n. This has settled the second of the boundary condition. The remaining one is u(x, 0) = — px which sets the condition on the coefficients B in the superposition 2^ B2k+1 sin i -—- i = -px k>0 ^ ' on the interval x e [0, d\. This is a simple task of finding the Fourier series of the function x, which we handled in 7.1.10. Combining all this, we get the requested solution u(x, t) to our problem: - *P** E sin(^) e"*12^ • fc>0 Even though our supply of general solutions was not big, superposing countably many of them helped us to solve our problem. Notice the behavior at the heated end. If t —> oo, then the all exponential terms in the sum vanish faster than the very first one, the sine terms are bounded, and thus the entire component with the sum vanishes quite fast. Thus, for big t, the heated end will increase its temperature nearly linearly with the speed p. 9.2.23. Separation in transformed coordinates. As we have seen several times, it is very useful to view a given equation rather as an inpendent object expressed in some particular coordinates. The practical problems mostly include some symmetries and then we should like to find some suitable coordinates in order to see the equation in some simple form. 
As an example, let us look at the Laplace operator A in the polar coordinates in the plane, and cylindrical or spherical coordinates in the space. Writing as usual x = r cos p, y = r sin p for the polar transformation, the Laplace operator gets the neat form (1) 4= ii , Ii!_,il = Iifri),Iii V. ) qr2 r2 Qyj2 r dr r dr \ dr) r2 dip2 The reader should perform the tedious but straightforward computation. Similarly, (2) A - i-^-lV-^A + + =- W ^ ~ r dr V dr! ^ r2 dip2 ^ dz2 ' (2) A = JS-fr2-^-) H__K__^- H__i__^-(smib-Q-) ^ ^ r2 dry dr) r2 sin ip dip2 r2 si'ntpdtpK ^ dtp) in the cylindrical and spherical coordinates, respectively. 655 CHAPTER 9. CONTINUOUS MODELS - FURTHER SELECTED TOPICS Let us illustrate the use on the following problem. Imagine a twisted circular drum, whose rim suffers a small vertical displacement. We should model the stabilized position of the drumskin. Intuitively, we should describe the drumskin position by the 2D wave equation, but since we are interested in the state with vanishing, we actually take u as the vertical displacement in the interior of the unit circle, fl = {x2 + yy < 1} C R2 and request Au = 0, subjet to the Dirichlet boundary problem prescribing the vertical displacement u(x,y) = f(x,y) of the rim. Obviously, we want to consider the problem in the polar coordinates, where the boundary condition gets the neat form u(l, 0. We shall apply the separation of variables method to these data. Expecting the solution in the form u(r,p) = R(r)" + a2

0) $(" + p2$ = 0, r2R" + rR' + (a2r2 -02)R = O. The angular component equation has got the obvious solutions A cos /3p + B sin ftp, and again we have to restrict (3 to integers in order to get single-valued solutions. With (3 = m, the radial equation is the well known Bessel's ODE of order m (notice our equation gets the form we had in ?? once we substitute z = ar), with the general solution R(r) = CJm(ar) + DYm(ar), where Jm and Ym are the special Bessel functions of the first and second kinds. We have obtained a general solution which is very useful in practical problems, cf. ??. Non-homogeneous equations. Finally, we add a few comments on the non-homogeneous linear PDEs. , T Although we provide arguments for the claims, we - *| shall not go into technical details of proofs because of the lack of space. Still, we hope this limited insight will motivate the reader to seek for further sources to learn more. As always, facing a problem Lu = /, we have to find a single particular solution to this problem, and we may then add all solutions to the homogeneous problem Lu = 0. Thus, if we have to match say Dirichlet conditions u = g on the boundary dfi of a domain fi, and we know some solution w, i.e. Lw = / (not taking care of the boundary conditions), than we should find a solution v to the homogenenous Dirichlet problem with the boundary condition g — w^n. Clearly the sum u = v + w will solve our problem. In principle, we may always consider superpositions of known solutions as in the Fourier method above. We shall indicate a more conceptual and general approach now briefly. 657 CHAPTER 9. CONTINUOUS MODELS - FURTHER SELECTED TOPICS Let us come back to the ID diffusion equation and our solution of a homogeneous problem by means of the Fourier transform in 9.2.21. 
The solution of ut = kuxx with u(x, 0) = p is a convolution of the boundary values u(x, 0) with the heat kernel Now, the crucial observation is that u(x,t) = Q(x,t) is a solution to L(u) =ut — kuxx = 0 for all x and t > 0, while on neigborhood of the origin it behaves as the Dirac delta function in the variable x. (The first part is a matter of direct computation, the second one was revealed in 9.2.21 already.) The latter observation suggests, how to find the particular solutions to a non-homogeneous problem. Consider the integral of the convolution (2) u(x,t) = J^J Q(x-y,t-s)f(y,s)dy^ds. The derivative ut will have two terms. In the first one we differentiate with respect to the upper limit of the outer integral, while the other one is the derivative inside the integrals. The derivatives with respect to x are evaluated inside the integrals. Thus, in the evaluation of L = ^ — k-^j the terms inside of the integral cancel each other (remember Q is a solution for all x, and t > 0) and only the first term of ut survives. It seems obvious that this term is the evaluation of the integrand with s = t. Although, these values are not properly denned, we may verify this claim in terms of taking limit (t— s) -> 0+. This leads to lim / Q(x-y,t-s)f(y,s)dy = f(x,s). Thus, (2) is a particular solution and clearly u(x, 0) = 0. The solution of the general Dirichlet problem L(u) = /, u(x, 0) = p on fi = R x [0, oo) is u(x,t)= Q(x - y,t)p(y) dy + (3) Q(x -y,t- s)f(y, s) dy ) ds. o Let us summarize the achievements and try to get generalization to general dimensions. First, we can generalize the heat kernel function Q writing its nD variant depending on the distance r from the origin only. Consider the formula with x e R™ as the product of the ID heat kernels for each of the variables in x. 1 IMI2 (4) Q(x,t) = -==e--Gr . 
sJ(AllKt)n Then taking the n-dimensional (iterated) convolution of Q with the boundary condition p on the hyperplane t = 0 provides the solution candidate (5) u(x,t)= Q(x-y,t)p(y)dy1...dyn. 658 CHAPTER 9. CONTINUOUS MODELS - FURTHER SELECTED TOPICS Indeed, a straightforward (but tedious) computation reveals that Q is a solution to L(u) = 0 in all points (x, t) with t > 0, and Q behaves again as the Dirac delta at the origin. In particular (5) is a solution to the Dirichlet problem L(u) = 0, u(x, 0) = p and we can allso obtain the non-homogeneous solutions similarly to the ID case. 9.2.26. The Green's functions. The solutions to the (non-homogeneous) diffusion equation constructed in the last paragraph are built on a very simple idea - we find a solution G to our equation which is denned everywehere expcept in the origin and blows up in the origin at the speed making it into a Dirac delta function at the origin. A convolution with such kernel G is then a good candidate for solutions for. Let us try to mimic this approach for the Laplace and Poisson equations now. Actually, we shall modify the strategy by requesting 659 CHAPTER 9. CONTINUOUS MODELS - FURTHER SELECTED TOPICS about 1 page to be finished - the spherical symmetric solution to Laplace => Green's function => solution to poisson similarly to the diffusion. 660 CHAPTER 9. CONTINUOUS MODELS - FURTHER SELECTED TOPICS 3. Remarks on Variational Calculus Many practical problems look for minima or maxima of real functions J : S —> R defined on some spaces of functions. In particular, many laws of nature can be expressed as certain "minimum principle" concerning some space of mappings. The basic idea is exactly the same as in the elementary differential calculus: we aim at finding the best linear approximations of J at fixed arguments u G 5, we recognize the critical points (those with vanishing linearization), and then we perhaps look at the quadratic approximations at the critical points. 
However, all these steps are far more intricate, need a lot of care, and may provide nasty surprises. 9.3.1. Simple examples first. If we know the sizes of tangent vectors to curves, we may ask what is the shortest distance between two points. In the plane R2, this means we have got a quadratic form g(x) = (gij(x)), 1 < i, j < 2, at each x e R2 and we want to integrate (the dots mean derivatives in time t, u(t) = u2(t)) are differentiable paths) (1) J(u)= j2^/g(u(ť))(ii(ť))dt Jt1 to get the distance between the two given points = (ui(íi),u2(íi)) = A and u(t2) = («1(^2),u2{t2)) = B. If the size of the vectors is just the Euclidean one, and we consider curves u (ť) = (t, v (ť)), i.e., graphs of functions of one variable, the length (1) becomes the well known formula (2) J{U)= / y/l+v(t)2dt. Quite certainly we all believe that the mimimum for fixed boundary values v(ti) and v(t2) must be a straight line. But so far, we have not formulated the problem itself. What is the space of curves we deal with? If we allowed non-continuous ones, then shorter paths are available! So we should aim at proving that the lines are the minimal curves among the continuous ones. Do we need them to be differentiable? In some sense we do, since the derivative appears in our formula for J, but we need to have the integrand defined only almost ev-erywehere. For example, this will be true for all Lipschitz curves. In general, g{u)(u) = gn{u)u\ + 2g12(u)u1ii2 + g22(u)u\. Such lengths of vectors are automatically inherited from the ambient Euclidean R3 on every hypersurface in the space. Thus, finding the minimum of J means finding the shortest track in a real terrain (with hills and valleys). If we choose a positive function a on R2 and consider g(x) = a(x)2 idR2, i.e., the Euclidean size of vectors scaled by ot(x) > 0 at each point x e R2, we obtain (3) J(u)= [ 2a{t,v{t))^l + v{tfdt. 
We can imagine the speed 1/q of a moving particle (or light) in the plane depends of the values of a (the smaller is a, the 661 CHAPTER 9. CONTINUOUS MODELS - FURTHER SELECTED TOPICS bigger is the speed) and our problem will be to find the shortest path in terms of the time necessary to pass from A to B. As a warm up, consider a = 1 in the entire plane, except the vertical strip V = {(t, y); t e [a,a + b]} where a = N and take A = (0,0), B = (a+b, c), a,b,c> 0. Wecanimag-ine V is a lake, you have to get from A to B by running and swimming, and you are swimming N times slower than running. If we believe that the straight lines are the minimizers for constant a, then it is clear that we have to find the optimal point P = (a, p) on the bank of the lake where we start swimming. The total time T(p) will then be (s is our actual speed when running straight) \AP\ \PB\ J-- + s/N J (VP2 + a2 + AV(c-p)2 + &2) and we want to find the minimum of T(p). The critical point is given by P Ar C-P _, • M ■ I —P== = iv —-======== => sin ip = iv sin tp, VP2 + a2 V(c-p)2 + &2 where p is the angle betwen our running track and the normal to the boundary of V, while ip is the angle between our swimming track and the normal to the boundary (draw a picture yourself!). Thus we have recovered the famous Snell law of light diffraction saying that the proportion of the sine values of the angles is equal to the proportion of the speeds. (Of course, to finish the solution of the problem, the reader should find the solution p of the quartic equation and check that it is a minimum.) 9.3.2. Variational problems. We shall restrict our attention to the following class of problems. General first order variational problems Consider an open Riemann measurable set fi C K™, the space C1 (i?) 
of all differentiable mappings u : fl —> Rm, a C2 function F = F(x, y,p) :8"xRmx Rnm and set the functional (1) J{u)= I F(xXz),£>Vz))a^, Jn i.e., J(u) is computed as the ordinary integral of aRiemann integrable function j(x) = F(x,u(x), D1u(x)) where D1 u is the Jacobi matrix (the differential) of u. The function -F is called the Lagrangian of the variational problem and our task is to find the minimum of J and the corresponding minimizer u with prescribed boundary values u on the boundary dfl (and perhaps some further conditions restricting u). Mostly we shall restrict ourselves to the case n = m = l, like in the previous paragraph, where u is a real differentiable function denned on an interval (ti, t2) and the function F = F(t,y,p) : R3 ->• R, (2) J(u) = f F(t,u(t),u(t))dt. Jn 662 CHAPTER 9. CONTINUOUS MODELS - FURTHER SELECTED TOPICS We saw F = sj\ + p2, F = a{t)\J\ + p2 in the previous paragraph. If we take F = y\J\ + p2, the functional J computes the area of the rotational surface given by the graph of the function u (up to a constant multiple). In all cases we may set the boundary values u(ti) and u(t2). Actually, our differentiability assumptions are too strict as we saw already in our last example above, where F was dif-ferentiable except of the boundary of the lake V. We can easily extend our space of functions to piecewise differentiable u and request F(t, u(t),u(t)) to be piecewise differentiable for all such u's (as always, picewise differentiable means the one-side derivatives exist at all points). A maybe shocking example is the following functional: (3) J(u) = [ (u(t)2 - l)2 dt Jo on piece-wise differentiable functions on [0,1] (i.e. F is the neat polynomial (p2 — l)2). Clearly, J(u) > 0 for all u and if we set u(0) = = 0, then any zig-zag piecewise linear function u with derivatives ±1 satisfying the boundary conditions achieves the zero minimum. 
At the same time, there is no minimum among the differentiable functions u (find a quick proof of that!), but we can approximate any of the zigzag minima by smooth ones at any precision. 9.3.3. More examples. Let us develop a general method how to find the analogy to the critical points form the elementary calculus here. We shall find the necessary steps dealing with a specific set of problems in this paragraph. Let us work with the Lagrangian generalizing the previous examples: (1) F(t,y,p) =yVl+P2 r > 0, and write Ft, Fy, Fp, etc., for the corresponding partial derivatives. Consider the variational problem on an interval I = (h, t2) with fixed boundary conditions u(ti) and u(t2) and assume u e C2 (7), u(t) > 0. Let us consider any differentiable v on I with v(ti) = v(t2) = 0 (or even better v with compact support inside of 7). Then u + Sv fulfills the boundary conditions for all small real <5's and consider J(u + Sv) = [ 2F{t,u{t) + Sv{t),u{t) + Sv{t)). Jt! Of course, the necessary condition for u being a critical point must be \0J(U + Sv) = 0, i.e., (remind the derivative with respect to a parameter can be swapped with the integration) (2) 0= / Fy(t,u(t),u(t))v(t) + Fp(t,u(t),u(t))v(t)dt. Jt! Integrating the second term in (2) per partes immediately yields (remember v(ti) = v(t2) = 0) 0 = jf \Fy{t,u{t),u{t))v{t) - jtFp{t,u{t),u{t)))v{t)dt. 663 CHAPTER 9. CONTINUOUS MODELS - FURTHER SELECTED TOPICS (3) —Fp(t,u(t),u(t))=Fy(t,u(t),u(t)). This condition will be certainly satisfied if the so called Euler equation holds true for u (we prove this is a necessary condition in lemma 9.3.6) d It An equivalent form of this equation for ii(t) =^ 0 is (we omit the arguments t of u and ii) (4) Ft(t,u,u) = jt(F{t,u,ii) -uFp(t,u,uj). Inourcaseof_F(f,y,p) = yr(l+p2)1/2,Ft vanishes identically, Fp = yrp(l + p2)-1/2 and thus, if we further assume r / 0, ti > 0, the term in the bracket has to be a positive constant C: C7 = ur(lW)1/2-uuru(l+u2)-1/2 = ur(l+u2)-1/2. 
We have arrived at the differential equation (5) u = C(l + u2)1/2r which we are going to solve. Consider the transformation ii = tan r, i.e., u = C(l + (tanr)2)1/21- = C(cos r)-1/1-, and so du = (cos t)~1It tanrdr. Consequently, dt = idu = £(cosr)_1/r(ir and by integration we arrive at the very useful parametrization of the solutions by the parameter r (which is actually the slope of the tangent to the solution graph): (6) t = t0 + — [ (coss-^r)ds u = C(cosr)-1/r. r Jo Now, we can summarize the result for several interest-|^ mg values of r. First, if r = 0 (which r ^W^wSv we excluded on the way), then the Euler equation (3) reads u{\ + u2)-3'2 = 0, which implies ii = 0 and thus the potential minimizers should be straight lines as expected. (Notice that we have not proved yet that the Euler equation is indeed a necessary condition, we shall come to that in the next paragraphs.) For general r/0, the Euler equation (3) tells (a straightforward computation!) 1 + ii2 u = r- u and thus the sign of the second derivative coincides with the sign of r. In particular, the potential minimizers are always concave functions (if r < 0) or convex (if r > 0). If r = —1, the parametrization (6) leads to (an easy integration!) (7) t = to — C sin r, u = C cos r, thus for r G [—tt/2, tt/2] our solutions are half-circles with radius C in the upper halfplane, centred at (to, 0). For r = —1/2, the solution is C C (8) t = t0 - — (2r + sin2r), — (l + cos2r) 664 CHAPTER 9. CONTINUOUS MODELS - FURTHER SELECTED TOPICS which is a parametric description of a fixed point on a circle with diameter C rolling along the t axis, the so called cycloid curve. Now, r e [—7r/2,7r/2] provides t running from to + \C-k to to — \C-k, while u is zero in the point to±\Cn and reaches the highest point at t = to. (Draw pictures!) Next, look at r = 1/2. Another quick integration reveals f = fo + 2C tanr = i0 + 2Cw, and we can compute ii and substitute into (5) to obtain u = C+-^(t-t0)2. 
Thus the potential minimizers are parabolas with the axis of symmetry $t = t_0$. If we fix $A = (0,1)$ and a $t_0$, there are two relevant choices $C = \frac12\big(1 \pm \sqrt{1-t_0^2}\big)$ whenever $|t_0| < 1$ (and no options for $|t_0| > 1$). The two parabolas will have two points of intersection, $A$ and another point $B$. Clearly only one of them should be the actual minimizer. Moreover, the reader could try to prove that the parabola $u = \frac14 t^2$ touches all of them and has them all on the left (this is the so called envelope of the family of parabolas). Thus, there will be no potential minimizer joining the point $A = (0,1)$ to an arbitrary point on the right of the parabola $u = \frac14 t^2$.

The last case we come to is $r = 1$, i.e., the case of the area of the surface of the rotational body drawn by the graph. Here we better use another parametrization of the slope of the tangent: we set $\dot u = \sinh\tau$. A very similar computation as above then immediately leads to $t = t_0 + C\tau$, and we arrive at the result⁶

(9) $u(t) = C\cosh\frac{t-t_0}{C}.$

9.3.4. Critical points of functionals. Now we shall develop a bit of theory verifying that the steps done in the previous examples really provided necessary conditions for solutions of the variational problems. In order to underline the essential features, we shall first introduce the basic tools in the realm of general normed vector spaces, see 7.3.1. The spaces of piece-wise differentiable functions on an interval with the $L_p$ norms can serve as typical examples. We shall deal with mappings $\mathcal F : S \to \mathbb R$ called (real) functionals.

The first differential

Let $S$ be a vector space equipped with a norm $\|\;\|$. A continuous linear mapping $L : S \to \mathbb R$ is called a continuous linear functional. A functional $\mathcal F : S \to \mathbb R$ is said to have the differential $D\mathcal F(u)$ at a point $u \in S$ if there is a continuous linear functional $L$ such that

(1) $\displaystyle\lim_{v\to 0} \frac{\mathcal F(u+v) - \mathcal F(u) - L(v)}{\|v\|} = 0.$
⁶ Some more details on the set of examples of this paragraph can be found in the article "Elementary Introduction to the Calculus of Variations" by Magnus R. Hestenes, Mathematics Magazine, Vol. 23, No. 5 (May-Jun., 1950), pp. 249-267.

In the very special case of the Euclidean $S = \mathbb R^n$, we have recovered the standard definition of the differential, cf. 8.1.7 (just notice that all linear functionals are continuous on a finite dimensional vector space). Again, the differential is computed via the directional derivatives.⁷ Indeed, if (1) holds true, then for each fixed $v \in S$,

(2) $\displaystyle \delta\mathcal F(u)(v) = \lim_{t\to 0}\frac{\mathcal F(u+tv) - \mathcal F(u)}{t} = \frac{d}{dt}\Big|_0 \mathcal F(u+tv)$

exists and $L(v) = \delta\mathcal F(u)(v)$. We call $\delta\mathcal F(u)$ the variation of the functional $\mathcal F$ at $u$. A point $u \in S$ is called a critical point if $\delta\mathcal F(u) = 0$.

We say that $\mathcal F$ has got a local minimum at $u$ if there is an open neighborhood $U$ of $u$ such that $\mathcal F(w) \ge \mathcal F(u)$ for all $w \in U$. Similarly, we define local maxima and talk about local extrema. If $u$ is an extreme of $\mathcal F$, then in particular $t = 0$ must be an extreme of the function $\mathcal F(u+tv)$ of one real variable $t$, where $v$ is arbitrary. Thus the extremes have to be at critical points, if the variations exist.

Next, let us assume the variations exist at all points in a neighborhood of a critical point $u \in S$. Then, again exactly as in the elementary calculus, considering two increments $v, w \in S$, we consider the limit

(3) $\displaystyle \delta^2\mathcal F(u)(v,w) = \lim_{t\to 0}\frac{\delta\mathcal F(u+tw)(v) - \delta\mathcal F(u)(v)}{t}.$

If the limits exist for all $v$, $w$, then clearly $\delta^2\mathcal F(u)$ is a bilinear mapping. Then $\delta^2\mathcal F(u)(w,w)$ is a quadratic form which we can consider as a second order approximation of $\mathcal F$ at $u$. We call it the second variation of $\mathcal F$. Moreover, again as in the elementary calculus, $\delta^2\mathcal F(u)(w,w) = \frac{d^2}{dt^2}\big|_0 \mathcal F(u+tw)$, if the second variation exists. We may summarize:

Theorem. Let $\mathcal F : S \to \mathbb R$ be a functional with a local extreme in $u \in S$. If the variation $\delta\mathcal F(u)$ exists, then it has to vanish.
If the second variation $\delta^2\mathcal F(u)$ exists (thus in particular, the variations exist on a neighborhood of $u$), then $\delta^2\mathcal F(u)(w,w) \ge 0$ for a minimum, while $\delta^2\mathcal F(u)(w,w) \le 0$ for a maximum.

Proof. Assume $\mathcal F$ has got a local minimum at $u$. We have already seen that $f(t) = \mathcal F(u+tv)$ has to achieve a local minimum for each $v$ at $t = 0$. Thus $f'(0) = 0$ if $f(t)$ is differentiable, and so $\delta\mathcal F(u)$ vanishes.

Now assume $\delta^2\mathcal F(u)(w,w) = f''(0) = r < 0$ for some $w$. Then the mean value theorem implies

$$f(t) - f(0) = f'(c)\,t = \frac{f'(c)-f'(0)}{c}\,ct$$

for some $t > c > 0$. Since $\frac{f'(c)-f'(0)}{c}$ approaches $f''(0) = r < 0$ as $t \to 0$, we get $f(t) - f(0) < 0$ for $t$ small enough, which contradicts $f(0)$ being a local minimum. The claim for a maximum follows analogously (or we may apply the already proved result to the functional $-\mathcal F$). □

⁷ In functional analysis, this directional derivative is usually called the Gateaux differential, while the continuous functional $L$ satisfying (1) is usually called the Frechet differential, going back to two of the founders of functional analysis from the beginning of the 20th century.

Corollary. On top of all assumptions of the above theorem, suppose $\mathcal F(v+tw)$ is twice differentiable at $t = 0$ and $\delta^2\mathcal F(v)(w,w) \ge 0$ for all $v$ in a neighborhood of the critical point $u$ and all $w \in S$. Then $\mathcal F$ has got a minimum at $u$.

Proof. As before, we consider $f(t) = \mathcal F(u+tw)$, $w = z - u$. Thus, for some $0 < c < 1$,

$$\mathcal F(z) - \mathcal F(u) = f(1) - f(0) = f'(0) + \tfrac12 f''(c) = \tfrac12\,\delta^2\mathcal F(u+cw)(w,w) \ge 0. \qquad\Box$$

Remark. Actually, the condition from the corollary is far too strong in infinite dimensional spaces. It is possible to replace it by the condition that $\delta^2\mathcal F$ is continuous at $u$ and $\delta^2\mathcal F(u)(w,w) \ge C\|w\|^2$ for some real constant $C > 0$ just in the critical point $u$. In the finite dimensional case, this is equivalent to the requirement that $\delta^2\mathcal F$ is continuous and positive definite.

9.3.5. Back to variational problems.
As we already noticed, the answer to a variational problem minimizing a functional (we omit the arguments $t$ of the unknown function $u$)

(1) $\displaystyle J(u) = \int_{t_1}^{t_2} F(t,u,\dot u)\,dt$

depends very much on the boundary conditions and the space of functions we deal with. If we posit $u(t_1) = A$, $u(t_2) = B$ with arbitrary $A, B \in \mathbb R$, we may deal with spaces of differentiable or piecewise differentiable functions satisfying these boundary conditions. But these subsets will not be vector spaces any more. Thus, strictly speaking, we cannot apply the concepts from the previous paragraph here. However, we may fix any differentiable function $v$ on $[t_1,t_2]$ satisfying $v(t_1) = A$, $v(t_2) = B$, e.g. $v(t) = A + (B-A)\frac{t-t_1}{t_2-t_1}$, and replace the functional $J$ by

$$\tilde J(u) = J(u+v) = \int_{t_1}^{t_2} F(t,\, u+v,\, \dot u + \dot v)\,dt.$$

Now the initial problem transforms to one with boundary conditions $u(t_1) = u(t_2) = 0$, and computing the variations $\frac{d}{d\delta}\tilde J(u+\delta w) = \frac{d}{d\delta}J(u+v+\delta w)$ does not change anything, i.e. we have to request $w(t_1) = w(t_2) = 0$ and we differentiate in a vector space. Essentially, we just exploit the natural affine structures on the subsets of functions defined by the general boundary conditions, and thus the derivatives have to live in their modelling vector subspaces.

The first and second variations

Corollary. Let $F(t,y,p)$ be a twice differentiable Lagrangian and consider the variational problem of finding a minimum of the functional (1) on the space of differentiable functions $u \in C^1[t_1,t_2]$ with boundary conditions $u(t_1) = A$, $u(t_2) = B$. Then the first and second variations exist and can be computed for all $v \in S = \{v \in C^1[t_1,t_2];\ v(t_1) = v(t_2) = 0\}$ as follows:

(2) $\displaystyle \delta J(u)(v) = \int_{t_1}^{t_2}\big(F_y(t,u,\dot u)\,v + F_p(t,u,\dot u)\,\dot v\big)\,dt,$

(3) $\displaystyle \delta^2 J(u)(v,v) = \int_{t_1}^{t_2}\big(F_{yy}(t,u,\dot u)\,v^2 + 2F_{yp}(t,u,\dot u)\,v\dot v + F_{pp}(t,u,\dot u)\,\dot v^2\big)\,dt.$

If $u$ is a local minimum of the variational problem, then $\delta J(u)(v) = 0$ for all $v \in S$, while $\delta^2 J(u)(v,v) \ge 0$ for all $v$ in a neighborhood of the origin in $S$.

Proof.
Thanks to our strong assumptions on the differentiability of $F$, $u$, and $v$, we may differentiate the real function $f(t) = J(u+tv)$ at $t = 0$, swapping the integral and the derivative. This immediately provides both formulae. The remaining two claims are straightforward consequences of the theorem and corollary in the previous paragraph 9.3.4. □

9.3.6. Euler-Lagrange equations. We are following the path which we already tried when discussing our first bunch of examples in 9.3.3. Our next step was to guess the consequences of the vanishing of the first variation in terms of a differential equation. Now we complete the arguments. We start with a simple result called the fundamental lemma of the calculus of variations.

Lemma. Assume $u$ is a continuous function on the interval $[t_1,t_2]$ and

$$\int_{t_1}^{t_2} u(t)\varphi(t)\,dt = 0$$

for all compactly supported smooth $\varphi$ on $(t_1,t_2)$. Then $u(t) = 0$ for all $t \in [t_1,t_2]$.

Proof. Assume $u(c) > 0$ for some $c \in (t_1,t_2)$. Due to the continuity, $u(t) > u(c)/2 > 0$ on a neighborhood $(c-s,c+s) \subset (t_1,t_2)$, $s > 0$. Next, recall the smooth variants of indicator functions constructed in 6.1.6. For every pair of positive numbers $0 < \varepsilon < r$, we constructed a smooth function $\varphi_{\varepsilon,r}$ with values in $[0,1]$, identically one on the interval of radius $r-\varepsilon$ around the origin and vanishing outside the interval of radius $r+\varepsilon$. Choosing $r+\varepsilon < s$ and $\varphi(t) = \varphi_{\varepsilon,r}(t-c)$, we obtain

$$0 = \int_{t_1}^{t_2} u(t)\varphi(t)\,dt \ge \frac{u(c)}{2}\,2(r-\varepsilon) > 0,$$

a contradiction. If we find some negative value $u(c)$, the same argumentation finishes the proof (notice there is no need to consider the boundary points, due to the continuity of $u$). □

Euler-Lagrange equations

Theorem. Consider a twice differentiable Lagrangian $F(t,y,p)$ on $[t_1,t_2] \times \mathbb R^2$ and a differentiable critical point $u$ of the functional $J(u) = \int_{t_1}^{t_2} F(t,u,\dot u)\,dt$ with fixed boundary values $u(t_1)$, $u(t_2)$. Then $u$ is a solution of the differential equation

(1) $F_y(t,u,\dot u) - \dfrac{d}{dt}F_p(t,u,\dot u) = 0.$

Notice that the derivative in the second term of the Euler-Lagrange equation means the so called total derivative, i.e. we should differentiate the composed mapping via the chain rule. This can be a problem if we do not assume $u$ to be twice differentiable.

Proof.
We already know that the vanishing of the first variation $\delta J(u)$ is a necessary condition for $u$ being a critical point. Thus we can start with the equality (2) in the previous paragraph 9.3.5 and compute by integrating per partes:

(2) $\displaystyle 0 = \int_{t_1}^{t_2}\big(F_y(t,u,\dot u)\,v + F_p(t,u,\dot u)\,\dot v\big)\,dt = \int_{t_1}^{t_2}\Big(F_y(t,u,\dot u) - \frac{d}{dt}F_p(t,u,\dot u)\Big)v\,dt + F_p(t,u,\dot u)\,v\big|_{t=t_2} - F_p(t,u,\dot u)\,v\big|_{t=t_1}.$

Finally, we exploit the above fundamental lemma of the calculus of variations with arbitrary smooth test functions $v$ with compact supports inside $(t_1,t_2)$. Thus the last terms with the boundary values vanish and, by the lemma, the Euler-Lagrange equation has to hold true for $u$. □

9.3.7. Remarks. We made our life comfortable by taking very strong differentiability assumptions in the theorem above. In the last century, there was a lot of effort to get much more general results with weaker assumptions. This is really important in practice, where we need to deal with piece-wise differentiable extremals. On the other hand, we need even twice differentiable critical points in order to write down the Euler-Lagrange equation explicitly.

Another difficult point is to recognize which of the critical points are the minima or maxima of the functional. We saw that the second variation is a very specific quadratic functional, see 9.3.5(3), and there is a rich theory dealing with its properties. We do not have time to go into details here, but we mention just one simple necessary condition for the extreme (with a bit tricky proof), to get a feeling for the topic.

Lemma (Legendre necessary condition). Consider a twice differentiable Lagrangian $F(t,y,p)$ on $[t_1,t_2] \times \mathbb R^2$ and a differentiable critical point $u$ of the functional $J(u) = \int_{t_1}^{t_2} F(t,u,\dot u)\,dt$ with fixed boundary values $u(t_1)$, $u(t_2)$. If $u$ is a local minimum of $J$, then $F_{pp}(t,u(t),\dot u(t)) \ge 0$ on the entire interval $[t_1,t_2]$.

Proof. Assume there is $t_0 \in (t_1,t_2)$ such that $F_{pp}(t_0,u(t_0),\dot u(t_0)) = -\mu < 0$.
Similarly as in the proof of the fundamental lemma in the previous paragraph, we choose $s > 0$ so that $F_{pp}(t,u,\dot u) < -\frac{\mu}{2}$ on $(t_0-s,t_0+s)$, and a smooth analog of an indicator function $\psi = \varphi_{\varepsilon,r}$, centred at $t_0$ and satisfying $r+\varepsilon < s$. Then the second variation evaluated on $v(t) = a\psi\big(\frac{t-t_0}{a}\big)$ with some small $a > 0$ can be estimated as follows (we use a constant $C > |F_{yy}(t,u,\dot u)|$, $C > |F_{yp}(t,u,\dot u)|$ on the entire interval, and the fact that $\dot v(t) = \dot\psi\big(\frac{t-t_0}{a}\big)$):

$$\delta^2 J(u)(v,v) \le \int_{t_0-as}^{t_0+as}\big(Ca^2 + 2Ca|\dot v|\big)\,dt - \frac{\mu}{2}\int_{t_0-as}^{t_0+as}\dot v^2\,dt = 2Csa^3 + 2Ca^2\int_{-s}^{s}|\dot\psi(\tau)|\,d\tau - \frac{\mu}{2}\Big(\int_{-s}^{s}\dot\psi(\tau)^2\,d\tau\Big)a.$$

The integral on the right-hand side is strictly positive; the first two terms are of order $a^3$ and $a^2$, while the negative term is linear in $a$. Thus the entire expression is negative if $a$ is small enough. This is a contradiction with the non-negativity of the second variation at the minimum, and the proof is complete. □

9.3.8. Special cases. Very often the Lagrangians do not depend on all the variables, and then the variations and the Euler-Lagrange equations get special forms. The following summary is a straightforward consequence of the general equation 9.3.6(1), whose equivalent form we saw already in 9.3.3(4).

Special forms of Lagrangians

Case 1. If the Lagrangian is $F(t,y)$, i.e., does not depend on the derivatives, then the Euler-Lagrange equation says

(1) $F_y(t,u) = 0,$

which is an implicit equation for $u(t)$. Moreover, the second variation is $\delta^2 J(u)(v,v) = \int_{t_1}^{t_2} F_{yy}(t,u)\,v^2\,dt$.

Case 2. If the Lagrangian is $F(t,p)$, then the Euler-Lagrange equation is

(2) $\dfrac{d}{dt}F_p(t,\dot u) = 0$

and its solutions are given by the first order differential equation $F_p(t,\dot u) = C$ with a constant parameter $C$. Moreover, the second variation is $\delta^2 J(u)(v,v) = \int_{t_1}^{t_2} F_{pp}(t,\dot u)\,\dot v^2\,dt$.

Case 3. If the Lagrangian is $F(y,p)$, then there is a consequence of the Euler-Lagrange equation (for $\dot u \neq 0$)

(3) $\dfrac{d}{dt}\big(F(u,\dot u) - \dot u\,F_p(u,\dot u)\big) = 0,$

which again reduces the equation to the first order, including a free constant parameter.

9.3.9. Remarks on higher dimensional problems.
9.3.10. Problems with free boundary conditions.

9.3.11. Constrained and isoperimetric problems.

4. Complex Analytic Functions

In the rest of the chapter, we shall look at the (single complex variable) functions defined on the complex plane $\mathbb C = \mathbb R^2$. On many occasions we saw how helpful it was to extend objects from the real line into the complex plane. We provide a few glimpses into the rich classical theory, and we hope the readers will enjoy the use of it in the practical column.

9.4.1. Complex derivative. An open and connected subset $\Omega \subset \mathbb C$ is called a region, or a domain. A mapping $f : \Omega \to \mathbb C$ is called a complex function of a single complex variable. Working with complex numbers, we may repeat the definition of the derivative:

Complex derivative

We say that a complex function $f : \Omega \to \mathbb C$ has the complex derivative $f'(a)$ at a point $a \in \Omega$, if the complex limit

$$f'(a) = \lim_{z\to a}\frac{f(z)-f(a)}{z-a}$$

exists. We say that $f$ is differentiable in the complex sense, or holomorphic, on $\Omega$, if its complex derivative $f'(z)$ exists at each $z \in \Omega$.

Clearly, this definition restricts to the definition of the derivative of functions of one real variable along $\mathbb R \subset \mathbb C$, when restricting the defining limit to real $z$ and $a$. We shall see that the existence of a complex derivative is much more restrictive than in the real case.

The simplest example of a differentiable complex function is $z \mapsto z^n$, $n \in \mathbb N$. Indeed, exactly as with the real polynomials, we compute

$$(z+h)^n - z^n = h\big(nz^{n-1} + \tfrac12 n(n-1)z^{n-2}h + \cdots + h^{n-1}\big)$$

and thus for all $z \in \mathbb C$ we obtain the limit

$$(z^n)' = \lim_{h\to 0}\frac{(z+h)^n - z^n}{h} = nz^{n-1}.$$
By the very definition, the mapping $f \mapsto f'$ is linear over the complex scalars, and thus all polynomials $f(z)$ are differentiable this way:

$$f(z) = \sum_{k=0}^{n} a_k z^k \;\mapsto\; f'(z) = \sum_{k=0}^{n-1}(k+1)\,a_{k+1}\,z^k.$$

9.4.2. Analytic functions. A complex function $f : \Omega \to \mathbb C$ is called analytic in the region $\Omega$ if for each $a \in \Omega$ there is an open disc $D = \{|z-a| < R\} \subset \Omega$ on which $f$ is given by a convergent power series $f(z) = \sum_{n=0}^\infty c_n(z-a)^n$.⁸

Theorem. The radius of convergence $R$ of a power series $f(z) = \sum_{n=0}^\infty c_n(z-a)^n$ satisfies $\frac1R = \limsup_{n\to\infty}|c_n|^{1/n}$ (with $R = \infty$ if the limit superior is zero). Further, if the radius of convergence is $R > 0$, then $f$ is differentiable (in the complex sense) for $|z-a| < R$, and its derivative equals the power series

$$f'(z) = \sum_{n=1}^{\infty} n\,c_n(z-a)^{n-1}$$

obtained by differentiating $f(z)$ term by term. Moreover, the power series representing $f'(z)$ has the same radius of convergence as $f(z)$.

The open disc $|z-a| < R$ is called the disc of convergence of $f(z) = \sum_{n=0}^\infty c_n(z-a)^n$.

Proof. Let $L = \frac1R$. Suppose $|z-a| < R$. We show that $\sum_{n=0}^\infty c_n(z-a)^n$ converges. Notice, we mean that $L = 0$ if $R = \infty$, and so the statement is trivially true in this case. For our fixed $z$, there is $\varepsilon > 0$ such that $(L+\varepsilon)|z-a| < 1$. Since $L = \limsup_{n\to\infty}|c_n|^{1/n}$, this means that $|c_n|^{1/n} < L+\varepsilon$, i.e. $|c_n| < (L+\varepsilon)^n$, for sufficiently large $n$. Therefore, $|c_n||z-a|^n < (L+\varepsilon)^n|z-a|^n$, and $\sum |c_n||z-a|^n$ is majorized by the convergent geometric series $\sum_{n=0}^\infty \rho^n$ with $\rho = (L+\varepsilon)|z-a| < 1$. Therefore $f(z) = \sum_{n=0}^\infty c_n(z-a)^n$ converges.

Now suppose $|z-a| > R$. By the definition of $\limsup$, for any $\varepsilon > 0$ there exist infinitely many $c_n$ satisfying $|c_n| > (L-\varepsilon)^n$. Choose $\varepsilon > 0$ small enough such that $(L-\varepsilon)|z-a| > 1$. Then $|c_n||z-a|^n > (L-\varepsilon)^n|z-a|^n$ for infinitely many $n$, and because $(L-\varepsilon)|z-a| > 1$, this implies that $c_n(z-a)^n$ does not converge to $0$ as $n \to \infty$. Therefore $\sum_{n=0}^\infty c_n(z-a)^n$ diverges.

Next, we move to the derivative. First, notice that $f(z) = \sum_{n=0}^\infty c_n(z-a)^n$ and $g(z) = \sum_{n=1}^\infty n\,c_n(z-a)^{n-1}$ have the same radius of convergence because $\lim_{n\to\infty} n^{1/n} = 1$.

⁸ Actually the opposite implication is true as well: a holomorphic function on a domain $\Omega$ is analytic on $\Omega$. We shall not provide a full proof of this result, but we come close to it below.
The reader may find the full argument in nearly all basic textbooks on complex analysis.

Fix $z_0$ in the disc of convergence, so that $|z_0-a| < r < R$ for some value of $r$. Let $S_N(z)$, $E_N(z)$ be defined by

$$S_N(z) = \sum_{n=0}^{N} c_n(z-a)^n, \qquad E_N(z) = \sum_{n=N+1}^{\infty} c_n(z-a)^n.$$

We may think of $S_N$, which consists of the lower order terms of the power series, as the main term, and of $E_N$ as the error term. Notice that $g(z) = \sum_{n=1}^\infty n\,c_n(z-a)^{n-1}$ is the term-by-term derivative of $f(z)$. We prove that $f'(z_0) = g(z_0)$, which means

$$\lim_{h\to 0}\frac{f(z_0+h)-f(z_0)}{h} - g(z_0) = 0.$$

Thus, given any $\varepsilon > 0$, we must show that there exists a $\delta > 0$ such that if $0 < |h| < \delta$, then the expression above has absolute value less than $\varepsilon$. To do so, we break the expression into three parts and estimate each of them separately. More precisely, since $f(z) = S_N(z) + E_N(z)$, and we know the derivative $S_N'$ of the polynomial $S_N$, we write

$$\frac{f(z_0+h)-f(z_0)}{h} - g(z_0) = \Big(\frac{S_N(z_0+h)-S_N(z_0)}{h} - S_N'(z_0)\Big) + \big(S_N'(z_0) - g(z_0)\big) + \frac{E_N(z_0+h)-E_N(z_0)}{h}.$$

We analyze the individual terms. The first term contains the main term and its derivative, which exists because $S_N$ is a polynomial. Thus, this term approaches $0$ as $h \to 0$. In other words, given $\frac\varepsilon3 > 0$, we can find $\delta > 0$ such that $0 < |h| < \delta$ implies

$$\Big|\frac{S_N(z_0+h)-S_N(z_0)}{h} - S_N'(z_0)\Big| < \frac\varepsilon3.$$

The second term is $S_N'(z_0) - g(z_0)$. Since $S_N'(z_0) \to g(z_0)$ as $N \to \infty$ (because we know that $g(z)$ is a power series which converges absolutely for $z$ in its convergence disc centred at $a$, and $S_N'(z)$ is the $N$-th partial sum of this power series), this means that for $\frac\varepsilon3 > 0$ we can find some $N_1$ such that if $N > N_1$, then $|S_N'(z_0) - g(z_0)| < \frac\varepsilon3$.

The third term is the most tricky one to estimate effectively. We can write

$$E_N(z_0+h) - E_N(z_0) = \sum_{n=N+1}^{\infty}\big(c_n(z_0+h-a)^n - c_n(z_0-a)^n\big).$$

Expanding $(z_0+h-a)^n - (z_0-a)^n = h\big((z_0+h-a)^{n-1} + (z_0+h-a)^{n-2}(z_0-a) + \cdots + (z_0-a)^{n-1}\big)$, we obtain

$$\frac{E_N(z_0+h)-E_N(z_0)}{h} = \sum_{n=N+1}^{\infty} c_n\big((z_0+h-a)^{n-1} + (z_0+h-a)^{n-2}(z_0-a) + \cdots + (z_0-a)^{n-1}\big).$$
Observe that for $h$ sufficiently small, $|z_0+h-a| < r$ as well as $|z_0-a| < r$. Therefore, if we replace all terms by their absolute values and apply the triangle inequality, we obtain:

$$\Big|\frac{E_N(z_0+h)-E_N(z_0)}{h}\Big| \le \sum_{n=N+1}^{\infty} |c_n|\,n\,r^{n-1}.$$

The series on the right converges, and furthermore, its value approaches $0$ as $N \to \infty$. Indeed, $\sum_{n=N+1}^\infty |c_n|\,n\,r^{n-1}$ is just the tail of the series $g(r)$ with absolute values on all of its individual terms, and we know that $g(z)$ converges absolutely for $|z-a| < R$. So the series in question does converge, and since it is the tail of a convergent series, its value must approach $0$ as $N \to \infty$. Therefore, given $\frac\varepsilon3 > 0$, we can find $N_2$ such that for all sufficiently small $h$ and $N > N_2$,

$$\Big|\frac{E_N(z_0+h)-E_N(z_0)}{h}\Big| < \frac\varepsilon3.$$

Select now $N > \max\{N_1, N_2\}$. Then an application of the triangle inequality yields:

$$\Big|\frac{f(z_0+h)-f(z_0)}{h} - g(z_0)\Big| \le \Big|\frac{S_N(z_0+h)-S_N(z_0)}{h} - S_N'(z_0)\Big| + |S_N'(z_0) - g(z_0)| + \Big|\frac{E_N(z_0+h)-E_N(z_0)}{h}\Big| < \varepsilon. \qquad\Box$$

9.4.3. Corollaries. We can apply the above theorem any number of times to obtain the following consequences. In particular, notice the straightforward existence of the antiderivative, which we shall link with integrals in the next subsections.

Corollaries on the derivatives of power series

Corollary. Consider any power series $f(z) = \sum_{n=0}^\infty c_n(z-a)^n$ with convergence radius $R > 0$ and write $D$ for its convergence disc.

(1) $f(z)$ is infinitely (complex) differentiable in $D$ and each of its $k$-th derivatives $f^{(k)}(z)$ can be obtained by differentiating term-by-term $k$ times. The resulting power series has radius of convergence again equal to $R$.

(2) There exists the (complex) antiderivative

$$F(z) = \sum_{n=0}^\infty \frac{1}{n+1}\,c_n(z-a)^{n+1},$$

such that $F'(z) = f(z)$ in the disc of convergence $D$, which is the same for both series.

(3) The coefficients $c_k$ of the power series $f(z)$ are given by the derivatives at the centre: $c_k = \frac{1}{k!}f^{(k)}(a)$.

Proof. All the claims are more or less obvious. Differentiating consecutively, we see that $f$ is infinitely differentiable at all $z \in D$, as claimed in (1).
Furthermore, we see that $f^{(k)}(z) = \sum_{n=k}^\infty n(n-1)\cdots(n-k+1)\,c_n(z-a)^{n-k}$, which in particular yields (2) with $k = 1$. Finally, substituting $z = a$ gives $f^{(k)}(a) = k!\,c_k$, since all the terms containing $(z-a)^{n-k}$ with $n > k$ vanish at $z = a$. Therefore $c_k = \frac{f^{(k)}(a)}{k!}$. □

9.4.4. Links to the real calculus. Each complex valued function $f$ on $\Omega$ can be viewed as a mapping $f : \Omega \subset \mathbb R^2 \to \mathbb R^2$. If this mapping is differentiable in the real sense (i.e. all partial derivatives are continuous), we may write

$$f(z) = f(a) + D^1 f(a)(z-a) + (z-a)\,\alpha(z)$$

for $a \in \Omega$ and $z$ in a small neighborhood of $a$ in $\Omega$, with $D^1 f$ being the Jacobi matrix of first partial derivatives in the two real coordinates and $\lim_{z\to a}\alpha(z) = 0$. Thus it is legitimate to question whether the real linear approximation $D^1 f(a)$ is complex linear. Obviously this happens if and only if the complex derivative $f'(a)$ exists.

If $f(z) = u(x+iy) + i\,v(x+iy)$ is the coordinate expression of a complex differentiable function $f$ viewed as a differentiable mapping $\mathbb R^2 \to \mathbb R^2$, $z = x+iy$, then clearly

$$\frac{\partial f}{\partial x}(z) = f'(z)\cdot 1, \qquad \frac{\partial f}{\partial y}(z) = f'(z)\cdot i.$$

Thus, $\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = -i\big(\frac{\partial u}{\partial y} + i\frac{\partial v}{\partial y}\big)$, and we have arrived at the sufficient and necessary conditions for $D^1 f(z)$ being complex linear:

(1) $u_x = v_y, \qquad u_y = -v_x.$

Yet another argument goes as follows: the rank two matrix describing the multiplication by a complex number $a+ib$ has $a$ in the diagonal entries, while $-b$ and $b$ are the other two entries.

Differentiating the equations (1) once more (for twice differentiable $f$), we see that this implies $u_{xx} + u_{yy} = 0$. The same Laplace equation holds true for the other component function $v$ of any holomorphic function $f = u + iv$.

At the level of differentials, it is useful to consider two (complex valued) linear forms

$$dz = dx + i\,dy, \qquad d\bar z = dx - i\,dy,$$

together with the dual basis of the complexified tangent space

$$\frac{\partial}{\partial z} = \frac12\Big(\frac{\partial}{\partial x} - i\frac{\partial}{\partial y}\Big), \qquad \frac{\partial}{\partial\bar z} = \frac12\Big(\frac{\partial}{\partial x} + i\frac{\partial}{\partial y}\Big).$$

A straightforward check reveals that a differentiable function $f : \Omega \subset \mathbb C = \mathbb R^2 \to \mathbb C$ is complex differentiable if and only if $\frac{\partial f}{\partial\bar z} = 0$.

9.4.5. Integrals along paths.
Another important link concerns integration along paths. A continuous path $\gamma$ in the complex plane is a continuous mapping $\gamma : J \subset \mathbb R \to \mathbb C$ defined on a bounded closed interval $J = [a,b]$. A path is called a simple closed path, or a Jordan curve in the complex plane⁹, if it does not intersect itself and $\gamma(a) = \gamma(b)$.

⁹ We shall be interested in piecewise differentiable Jordan curves only, and it is quite easy to see that these curves always divide the complex plane into exactly two connected components (it is obvious for piecewise linear curves, and the rest comes via approximation). For general Jordan curves, this is a difficult topological result attributed to the French mathematician Camille Jordan (1838-1922). This is the same Jordan related to the Jordan canonical form of matrices discussed in Chapter 4.

The composition of a path $\gamma$ with any continuous mapping $f$ defined on the target of $\gamma$ is again continuous, and thus the (complex valued) Riemann integral $\int_a^b f\circ\gamma(t)\,dt$ exists, but it is dependent on the parametrization of the path. An easy way out is to restrict ourselves to differentiable paths with the derivative $\dot\gamma(t) \neq 0$ for all $t$, and to define the integral $J_\gamma$ along a path $\gamma$ as the Riemann integral

$$J_\gamma = \int_\gamma f(z)\,dz = \int_a^b f(\gamma(t))\,\dot\gamma(t)\,dt.$$

This coincides perfectly with the Riemann integrals of real functions of one variable restricted to a reparametrization $\gamma$ of an interval in $\mathbb R \subset \mathbb C$. Writing $f = u + iv$ and $\gamma(t) = x(t) + i\,y(t)$,

$$f(\gamma)\,\dot\gamma = (u+iv)(\dot x + i\dot y) = (u\dot x - v\dot y) + i(v\dot x + u\dot y).$$

Now we may check that the complex value $J_\gamma$ is independent of the choice of parametrization directly by the substitution formula for real integrals. We should also notice that actually the (complex valued) linear form $f(z)\,dz$ on $\mathbb R^2 = \mathbb C$ equals

$$(u+iv)(dx+i\,dy) = (u\,dx - v\,dy) + i(v\,dx + u\,dy).$$

Thus, $J_\gamma$ equals the integral of the linear form $f(z)\,dz$ over the (unparametrized) submanifold $\gamma \subset \mathbb C$ in the sense introduced in the first part of this chapter:

$$J_\gamma = \int_a^b (u\dot x - v\dot y)\,dt + i\int_a^b (v\dot x + u\dot y)\,dt = \int_\gamma f(z)\,dz.$$

In fact, any choice of parametrization ($\dot\gamma \neq 0$ on $J$) determines the orientation of $\gamma$. Thus the integral is independent of the parametrization, up to sign.
If $\gamma$ is a composition $\gamma_2 \circ \gamma_1$ of two paths (we simply concatenate the curves $\gamma_1 : [a,b] \to \mathbb C$ and $\gamma_2 : [b,c] \to \mathbb C$, $\gamma_2(b) = \gamma_1(b)$), then clearly

$$\int_\gamma f(z)\,dz = \int_{\gamma_1} f(z)\,dz + \int_{\gamma_2} f(z)\,dz.$$

In particular, if $\gamma_2 = \gamma_1^{-1}$, i.e. the same curve with the opposite parametrization, then $\int_\gamma f(z)\,dz = 0$.

Clearly, our definition of integration extends to piecewise differentiable paths. By uniform continuity over compact domains, the value $J_\gamma$ depends continuously on the choice of the path $\gamma$ in the $C^0$ metric on the functions on the interval $J$. Thus we may approximate any integral $J_\gamma$ by integrating the same function over a piecewise linear path $\tilde\gamma$.

9.4.6. Antiderivatives. If $F(z)$ is an antiderivative of $f(z)$, then clearly

$$\frac{d}{dt}F(\gamma(t)) = F'(\gamma(t))\,\dot\gamma(t) = f(\gamma(t))\,\dot\gamma(t),$$

and therefore we have verified the straightforward generalization of the Newton integral formula from one real variable calculus:

Newton integral formula

For each piecewise differentiable path $\gamma : [a,b] \to \mathbb C$ and antiderivative $F(z)$ of the function $f(z)$ defined on a neighborhood of $\gamma$,

(1) $J_\gamma = \int_\gamma f(z)\,dz = F(\gamma(b)) - F(\gamma(a)).$

In particular, the value of the integral depends only on the values of $F$ in the endpoints of $\gamma$ and not on the path itself.

Corollary. Antiderivatives of complex functions on connected domains are uniquely defined, up to complex constants.

Proof. Assume $F'(z) = G'(z)$, i.e. $(F-G)'(z) = 0$. Then for each path $\gamma$, $\gamma(0) = z$, $\gamma(1) = w$,

$$(F-G)(w) - (F-G)(z) = \int_\gamma 0\,dz = 0,$$

and thus $F - G$ is a constant function. □

As an example, consider the paths $\gamma_r : [0,2\pi] \to \mathbb C$, $\gamma_r(t) = r\,\mathrm e^{it}$, i.e. the positively oriented boundary of the ball $B(0,r)$ centred at the origin, with radius $r > 0$. It is easy to compute the integral of $f(z) = z^n$ along these paths, for all $n \in \mathbb Z$.
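Before carrying out the closed-form computation, the integrals along $\gamma_r$ can be probed numerically by discretizing $\int_0^{2\pi} f(\gamma_r(t))\,\dot\gamma_r(t)\,dt$ as a Riemann sum. A small Python sketch (the radius $r = 1.3$, the sample exponents and the step count are arbitrary illustrative choices):

```python
import cmath
import math

def circle_integral(f, r=1.3, steps=20000):
    """Riemann-sum approximation of the path integral of f over
    gamma_r(t) = r*e^{it}, t in [0, 2*pi], using gamma'(t) = i*r*e^{it}."""
    total = 0.0
    dt = 2 * math.pi / steps
    for k in range(steps):
        z = r * cmath.exp(1j * k * dt)   # point on the circle
        dz = 1j * z                      # gamma'(t) = i * r * e^{it}
        total += f(z) * dz * dt
    return total

# Any nonnegative power (a polynomial term) integrates to essentially 0 ...
assert abs(circle_integral(lambda z: z ** 3)) < 1e-9
# ... as do the negative powers other than -1 ...
assert abs(circle_integral(lambda z: z ** -2)) < 1e-6
# ... while 1/z yields 2*pi*i, independently of the radius.
assert abs(circle_integral(lambda z: 1 / z) - 2j * math.pi) < 1e-9
```

The exceptional role of the exponent $n = -1$ shows up clearly in the numbers, matching the closed-form evaluation that follows.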
(2) $\displaystyle \int_{\gamma_r} z^n\,dz = \int_0^{2\pi} r^n \mathrm e^{int}\, i\,r\,\mathrm e^{it}\,dt = \begin{cases} \Big[\dfrac{r^{n+1}}{n+1}\mathrm e^{i(n+1)t}\Big]_0^{2\pi} = 0, & n \neq -1,\\[2mm] i\displaystyle\int_0^{2\pi} \mathrm e^{0}\,dt = 2\pi i, & n = -1.\end{cases}$

In particular, we see that the integral of any polynomial along any circle vanishes (cf. more details in ??).

9.4.7. Cauchy integral theorem. The formula 9.4.6(1) can be applied to closed paths, and we arrive immediately at the important Cauchy integral theorem on the convergence discs of analytic functions. This result is actually available for all holomorphic functions on much more general domains. Recall that a domain $\Omega$ is called simply connected if every simple closed continuous path in $\Omega$ can be shrunk continuously into a point without leaving $\Omega$.

Cauchy integral theorem

Theorem. Let $f : \Omega \to \mathbb C$ be analytic on a simply connected domain $\Omega$ and $\gamma$ be a closed piecewise differentiable path in $\Omega$. Then $\int_\gamma f(z)\,dz = 0$.

Sketchy proof. The analytic function $f$ has an antiderivative $F$ on each of its convergence discs. Assume first that the entire domain $\Omega$ is contained in one such disc. Then we break the closed path $\gamma : [0,1] \to \Omega$ into the intervals given by $0 = t_0 < t_1 < \cdots < t_m = 1$ on which $\gamma$ is differentiable, and

$$\int_\gamma f(z)\,dz = \sum_{j=0}^{m-1}\big(F(\gamma(t_{j+1})) - F(\gamma(t_j))\big) = F(\gamma(1)) - F(\gamma(0)) = 0,$$

since $\gamma(0) = \gamma(1)$. In particular, if $T$ is a triangle lying entirely in a convergence disc of the analytic function $f(z)$ inside of $\Omega$, then $\int_{\partial T} f(z)\,dz = 0$, where $\partial T$ is the oriented boundary of $T$.

Next, without loss of generality, the path $\gamma$ can be viewed as a polygon, since any piecewise differentiable path $\gamma(t)$ can be uniformly approximated by piecewise linear functions $\gamma_n(t)$ that form closed polygons. The integrals of $f$ over $\gamma_n$ approximate the integral in question. Thus, if we show $\int_{\gamma_n} f(z)\,dz = 0$, this would imply that $\int_\gamma f(z)\,dz = 0$, too. It seems to be clear that the interior of any closed polygon $\gamma_n$ can be triangulated into closed triangles $T_j$ so that all $T_j$ lie in $\Omega$ with their interiors.
Actually, here we need the assumption that $\Omega$ is simply connected if we want to fill in all the details. The integral along the path $\gamma_n$ is then equal to the sum of the integrals over all the individual triangles (notice we integrate twice in the opposite directions over each edge which does not belong to $\gamma_n$). Finally, possibly refining the polygon $\gamma_n$ and the triangulation, we may assume that the size of each triangle $T_j$ is so small that $T_j$ lies entirely in some convergence disc of $f(z)$. Therefore,

$$\int_{\gamma_n} f(z)\,dz = \sum_j \int_{\partial T_j} f(z)\,dz = 0,$$

and hence $\int_\gamma f(z)\,dz = 0$, as requested. □

9.4.8. Cauchy integral theorem again. We were quite sloppy about the topological issues in the above sketch of the proof. Actually, there is a more general theorem deducing the conclusion of the Cauchy integral theorem under the assumption that the function $f$ is complex differentiable (holomorphic). We shall prove this theorem under the additional assumption that $f$ is (continuously) differentiable as a mapping of two real variables. Both conditions are obviously satisfied for analytic functions. We remark that the general claim of the theorem is proved by a procedure similar to the above argumentation, dealing first with the claim for triangles, etc. The reader may find the full proof in any basic textbook on complex analysis.

Theorem (Cauchy integral theorem). If $f(z)$ is holomorphic in a simply connected domain $\Omega \subset \mathbb C$, then for every piecewise differentiable simple closed path $\gamma \subset \Omega$,

$$\int_\gamma f(z)\,dz = 0.$$

Proof of a special case. Without loss of generality, assume that $\gamma$ is a piecewise differentiable path bounding some simply connected region $G$. Write as usual $f(z) = u(x,y) + i\,v(x,y)$, so that

$$\int_{\partial G} f(z)\,dz = \int_{\partial G} u\,dx - v\,dy + i\int_{\partial G} v\,dx + u\,dy.$$
Now, assuming $f$ is differentiable as a function of two real variables, the Green's version 9.1.13 of the general Stokes' theorem 9.1.12 implies

$$\int_{\partial G} f(z)\,dz = \int_G \Big(-\Big(\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\Big) + i\Big(\frac{\partial u}{\partial x} - \frac{\partial v}{\partial y}\Big)\Big)\,dx\,dy = 2i\int_G \frac{\partial f}{\partial\bar z}\,dx\,dy = 0,$$

since $\frac{\partial f}{\partial\bar z} = 0$ is equivalent to $f$ being holomorphic. □

The Cauchy integral theorem has an immediate consequence, ensuring the existence of antiderivatives:

9.4.9. Theorem. For every analytic function $f(z)$ in a simply connected region $\Omega$, its antiderivative $F(z)$ exists in that region.

Proof. If $\Omega$ is the convergence disc of a power series expression for $f$, then the claim is obvious, cf. 9.4.3. In general, fix a point $z_0 \in \Omega$, and consider an arbitrary $\zeta \in \Omega$ and any path $\gamma \subset \Omega$ with the initial point $z_0$ and end point $\zeta$. Define

$$F(\zeta) = \int_\gamma f(z)\,dz.$$

Choose any other path $\tilde\gamma$ with the same beginning and end, and prolong the path $\gamma$ by $\tilde\gamma^{-1}$. This provides a closed path $\hat\gamma = \tilde\gamma^{-1}\circ\gamma$, and therefore, by the Cauchy integral theorem, $\int_{\hat\gamma} f(z)\,dz = 0$. Thus $F(\zeta)$ is well defined, independent of the choice of $\gamma$.

Next, consider a small $h$ such that the entire oriented segment $\nu$ joining $\zeta$ and $\zeta+h$ is in $\Omega$. Then

$$F(\zeta+h) - F(\zeta) = \int_{\nu} f(z)\,dz = \int_0^1 f(\zeta+ht)\,h\,dt.$$

In particular,

$$\lim_{h\to 0}\frac{F(\zeta+h)-F(\zeta)}{h} = \lim_{h\to 0}\int_0^1 f(\zeta+ht)\,dt = f(\zeta),$$

and thus $F(\zeta)$ is the requested antiderivative. □

Clearly, the antiderivative of an analytic function on a simply connected domain is again analytic. This follows immediately from corollary 9.4.6 and the local formula for the antiderivative, $F(z) = \sum_{n=0}^\infty \frac{1}{n+1}c_n(z-a)^{n+1}$, of the analytic functions $f(z) = \sum_{n=0}^\infty c_n(z-a)^n$ on the individual convergence discs. It is related to the much more general concept of analytic extension which we shall discuss now.

9.4.10. Uniqueness. At first glance, the power series representing locally an analytic function in a domain $\Omega$ should glue together. We shall deal first with the uniqueness issues.

Uniqueness theorem

Lemma.
Consider an analytic function f(z) in Ω and a sequence of its zeroes aₙ ∈ Ω which has a limit point a ∈ Ω. Then f(z) = 0 everywhere in Ω.

Proof. We start with a simple observation on the non-vanishing of analytic functions:

Claim. Let f(z) ≢ 0 be analytic in Ω, with f(a) = 0 for some a ∈ Ω. Then there exists ε > 0 such that f(z) ≠ 0 for 0 < |z − a| < ε.

Indeed, in some neighbourhood of a, the analytic function f(z) is represented by a power series f(z) = Σ_{n=0}^∞ cₙ (z − a)ⁿ. Since f(a) = 0, we have c₀ = 0. Let c_k be the first non-zero coefficient in the series. Then f(z) = (z − a)ᵏ g(z), where g(z) = Σ_{n=k}^∞ cₙ (z − a)ⁿ⁻ᵏ and g(a) ≠ 0. Therefore, by the continuity of g(z), there exists a disc centred at a of radius ε > 0 where g(z) does not have zeroes and, consequently, f(z) ≠ 0 for 0 < |z − a| < ε.

The lemma is now a simple corollary of the above claim. Under the assumptions, f(z) must vanish identically on a non-trivial disc centred at a. Assume f(w) ≠ 0 for some w ∈ Ω, and choose a path γ with γ(0) = a, γ(1) = w. Define t₀ to be the infimum of the non-empty set {t ∈ [0,1]; f(γ(t)) ≠ 0}. Then t₀ > 0 and f(γ(t)) is identically zero for t ∈ [0, t₀). Thus the above claim applies at a = γ(t₀), which leads to a contradiction with the vanishing of f(γ(t)) for t < t₀, and we are done. □

As a corollary, we see that any function f(z) analytic in two concentric discs is represented in those discs by the same power series f(z) = Σ_{n=0}^∞ cₙ (z − a)ⁿ, where a is the centre of those discs, and the c_k are the Taylor coefficients c_k = f⁽ᵏ⁾(a)/k!, k = 0, 1, ....

9.4.11. Analytic extension. The basic idea for gluing power series together is very simple. Consider a power series f(z) = Σ_{n=0}^∞ cₙ (z − a)ⁿ converging in D = {|z − a| < r} and some point b ∈ D.
If |z − b| + |b − a| < r, then Σ_{n=0}^∞ |cₙ| (|z − b| + |b − a|)ⁿ converges and, thus, on the smaller disc D_b = {|z − b| < s = r − |b − a|} we may rewrite the power series f(z) by expanding (z − a)ⁿ = (z − b + b − a)ⁿ:

Σ_{n=0}^∞ cₙ (z − a)ⁿ = Σ_{n=0}^∞ cₙ Σ_{k=0}^n (n choose k) (z − b)ᵏ (b − a)ⁿ⁻ᵏ = Σ_{k=0}^∞ ( Σ_{n=k}^∞ (n choose k) cₙ (b − a)ⁿ⁻ᵏ ) (z − b)ᵏ,

where all the series converge absolutely, and so the order of summation is irrelevant.

Thus, writing d_k = Σ_{n=k}^∞ (n choose k) cₙ (b − a)ⁿ⁻ᵏ, the new power series f(z) = Σ_{k=0}^∞ d_k (z − b)ᵏ converges at least on the disc D_b, and we call it the re-expansion of Σ_{n=0}^∞ cₙ (z − a)ⁿ at the centre b. The re-expansion is guaranteed to converge for |z − b| < r − |b − a|. However, the radius of convergence of Σ_{n=0}^∞ dₙ (z − b)ⁿ can be larger. The concept of analytic extension is based on this.

Analytic elements and extensions

An analytic element

is a pair Φ = (D, f) of an open disc D = {|z − a| < R} and an analytic function f on D; the point a is called the centre of the element. The element is called canonical if D is the disc of convergence of the power series of f centred at a. Two elements Φ₀ = (D₀, f₀) and Φ₁ = (D₁, f₁) are immediate extensions of each other if D₀ ∩ D₁ ≠ ∅ and f₀ = f₁ on D₀ ∩ D₁. An element Ψ is an analytic extension of Φ along a chain of elements if there is a finite sequence of elements

Φ = Φ₀, Φ₁, ..., Φₙ = Ψ

in which each Φᵢ₊₁ is an immediate extension of Φᵢ. Finally, a family of canonical elements Φ_t = (D_t, f_t), t ∈ [0,1], is an analytic extension of Φ₀ along a path γ : [0,1] → ℂ if (i) for all t ∈ [0,1] the radii of convergence of the discs D_t are R_t > 0 and the centres of D_t are the points γ(t), and (ii) for each t there is ε > 0 such that Φ_s is an immediate extension of Φ_t whenever |s − t| < ε.

Lemma. If Φ₁ is an immediate extension of Φ₀, Φ₂ is an immediate extension of Φ₁, and D₀ ∩ D₁ ∩ D₂ ≠ ∅, then f₂ = f₀ on D₀ ∩ D₂. Moreover, if Φ_t and Ψ_t are two analytic extensions of the same element Φ₀ = Ψ₀ along the same path γ, then Ψ_t = Φ_t for all t ∈ [0,1].

Proof. Clearly, f₀ = f₁ = f₂ on the non-empty open subset D₀ ∩ D₁ ∩ D₂ of D₀ ∩ D₂. By the uniqueness theorem, f₂ = f₀ everywhere in D₀ ∩ D₂, which proves the first claim. Next, consider S = {t ∈ [0,1] : Ψ_t = Φ_t}. For all s close to a fixed t, the elements Φ_s and Ψ_s are immediate extensions of Φ_t and Ψ_t, respectively; thus the first claim shows that S is open as well as closed in [0,1]. Since 0 ∈ S, we conclude S = [0,1]. □

9.4.12. Technical observations. In a sequence of simple observations, we show that analytic extension along a path can always be obtained by analytic extension along some chain of elements, and vice versa. First, consider the family of canonical elements Φ_t extending Φ₀ along a path γ : [0,1] → ℂ and write R(t) for the corresponding radii of convergence.

Proposition. If R(τ) = ∞ for some τ ∈ [0,1], then it is infinite for all t. If finite, then R(t) is continuous on [0,1].

Proof. If R(τ) = ∞, then the element Φ_τ is given by an entire function, and the uniqueness of extensions implies that all the elements Φ_t share this function; in particular, R(t) = ∞ for all t. If the radii are finite, recall from 9.4.11 that the re-expansion of Φ_{t₁} at the centre γ(t₂) ∈ D_{t₁} converges at least on the disc of radius R(t₁) − |γ(t₁) − γ(t₂)|. Hence |R(t₁) − R(t₂)| ≤ |γ(t₁) − γ(t₂)| for all close t₁, t₂, and the continuity of R(t) follows from the continuity of γ. □

Proposition. A canonical element Ψ is an analytic extension of Φ₀ along the path γ if and only if Ψ is an analytic extension of Φ₀ along some chain of canonical elements centred at finitely many points of γ.

Proof. Assume Φ_t extends Φ₀ along γ. Since R(t) > 0 for all t and R(t) is continuous, by the compactness of [0,1] it is separated from zero, and therefore R(t) > c for some constant c > 0. Uniform continuity of γ(t) implies that there is δ > 0 such that |t₂ − t₁| < δ yields |γ(t₂) − γ(t₁)| < c. Since the intervals J_t = (t − δ/2, t + δ/2) cover [0,1], one can choose a finite subcover J_{t₁}, ..., J_{t_{n−1}} for some t₁ < ... < t_{n−1}, to which we append t₀ = 0 and tₙ = 1 if these terminal points are missing from the sequence. Then |γ(t_{j+1}) − γ(t_j)| < c, so the centre of each Φ_{t_{j+1}} lies in the disc of Φ_{t_j}, and the elements Φ_{t₀}, ..., Φ_{tₙ} form the requested chain. Conversely, a chain of elements centred at points of γ refines to a continuous family of canonical elements along γ, using the re-expansions at the centres γ(t). □

9.4.13. Homotopy invariance. Next, we deal with continuous deformations of paths.

Theorem. Let γ_s : [0,1] → ℂ, s ∈ [0,1], be a continuous family of paths with common end points γ_s(0) = a and γ_s(1) = b, and suppose the canonical element Φ₀ centred at a extends to the element Φ₁ˢ along every path γ_s. Then the terminal elements Φ₁ˢ coincide for all s.

Proof. Let R_s(t) be the radius of convergence of the element Φ_{s,t} of the extension along γ_s. Since R_s(t) > 0 for all (s, t) ∈ [0,1] × [0,1], there exists ρ > 0 such that R_s(t) > ρ for all s and t. Notice that γ(s, t) is uniformly continuous, and thus, fixing s₀ ∈ [0,1], we may choose an interval V_{s₀} around s₀ on the s-axis such that

max_{t∈[0,1]} |γ(s, t) − γ(s₀, t)| < ρ/4.

Then, for all s ∈ V_{s₀}, the result of the analytic extension remains the same, as every element of the extension along γ_s(t) is an immediate extension of the corresponding element along γ_{s₀}(t). Thus, the set E of those s for which the extensions along γ_s and γ₀ produce the same terminal element is open as well as closed in [0,1], and it contains s = 0.
Hence E = [0,1], which proves the theorem. □

Corollary (Monodromy theorem). Consider a simply connected region Ω ⊂ ℂ and a canonical element Φ = (D, f) centred at a ∈ Ω. Suppose that Φ extends along any path γ ⊂ Ω through a. Then for any b ∈ Ω, the extension of Φ along any path terminating at b is independent of the path and, as a result, produces the same analytic element for any such path. Thus, an analytic extension of Φ to every point in Ω generates an analytic function that is represented as a convergent power series in any disc inscribed in Ω.

9.4.14. Remarks. We look at some simple examples where the analytic extension is crucial. Consider the function f(z) = √z. Clearly we may choose the two different options f(1) = ±1, and each of the choices leads to a canonical element. The analytic extensions of these are called the branches of the multivalued complex function f. Notice that √z is not analytic at the origin, since the derivatives blow up to infinity there. Intuitively, two closed paths in ℂ \ {0} are homotopic if and only if they run around the origin the same number of times (the winding number). We may imagine what happens to the values if we move z along a circle z = r e^{iθ}. The two initial options lead to

f₁(z) = √r e^{iθ/2}, f₂(z) = √r e^{i(θ+2π)/2} = −√r e^{iθ/2}.

Once we run θ from 0 to 2π, the values of the two branches swap. See ?? in the other column for more observations on root functions.

Another very important example is f(z) = z⁻¹. Since f(z) integrates to 2πi over each circle centred at the origin, there cannot exist an antiderivative of f along any such circle. But locally, the antiderivative is the logarithmic function log z. The canonical element 9.4.11(1) extends to one of infinitely many branches, and running along a circle must change its value by the constant 2πi. We return to general analytic functions.
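The swapping of the two branches of √z can be observed numerically. The following Python sketch is our illustration (it is not part of the text): it follows a continuous branch of the square root around the unit circle by always picking the root closer to the previously chosen value, and it arrives at −1 rather than 1 after one full loop.

```python
import cmath

def track_sqrt_along_circle(n=1000):
    # Follow a continuous branch of sqrt(z) along the unit circle z = e^{it},
    # starting with the choice sqrt(1) = 1, by always taking the square root
    # closer to the previously chosen value.
    w = 1.0 + 0.0j
    for k in range(1, n + 1):
        z = cmath.exp(2j * cmath.pi * k / n)
        r = cmath.sqrt(z)          # one of the two roots; the other is -r
        w = r if abs(r - w) <= abs(r + w) else -r
    return w

print(track_sqrt_along_circle())   # close to -1: the branch has swapped
```

Running the loop a second time returns the branch to its initial value, in accordance with the winding-number picture above.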
As promised a few pages back, the monodromy theorem implies that for any analytic function f(z) in a simply connected region Ω, there exists an analytic function representing its antiderivative. Moreover, for each analytic element of f, this is just the antiderivative of the power series representing the function on the disc. Now we may also deduce the Cauchy integral theorem for simply connected regions Ω in another way: the integral along the closed path ∂Ω is given by the difference of the values of the antiderivative at the terminal points. Since the boundary ∂Ω is homotopic to a point, the integral vanishes.

9.4.15. Cauchy theorem the third time. The Cauchy integral theorem holds also for analytic functions on domains Ω which are not simply connected. A bounded open domain Ω ⊂ ℂ is said to have a regular boundary ∂Ω if the set of boundary points ∂Ω consists of finitely many piecewise smooth and mutually disjoint Jordan curves γ₀, γ₁, ..., γₙ. Notice that the Jordan curves in the boundary divide the complex plane into n+2 connected components. Just one of them is unbounded, one of them coincides with Ω, and all the others are bounded "holes" inside of Ω. We write γ₀ for the oriented exterior boundary, i.e. the boundary of the connected unbounded component of ℂ \ Ω, oriented counter-clockwise, while γ₁, ..., γₙ form the oriented interior boundary, i.e. the curves oriented clockwise.

Cauchy integral theorem

Theorem. Let Ω ⊂ ℂ be a bounded region with regular boundary and f(z) analytic on the closure Ω̄ (i.e. analytic on some domain containing Ω̄). Then

∮_{∂Ω} f(z) dz = 0.

Proof. With our choice of orientation of the boundary, we must prove

∮_{∂Ω} f(z) dz = ∮_{γ₀} f(z) dz + Σ_{i=1}^n ∮_{γᵢ} f(z) dz = 0.

We proceed by induction on n. If n = 0, then clearly Ω is simply connected and the theorem is proved already, see Theorem 9.4.8.
If n = 1, then there is exactly one interior part of the boundary, γ₁, and we may choose two smooth paths μ₁ and μ₂ joining the left-most points and the right-most points of γ₀ and γ₁, respectively. This way, we split Ω into two simply connected regions Ω⁺ (say, the upper one) and Ω⁻ (the lower one), whose boundaries ∂Ω⁺ and ∂Ω⁻ are composed of the corresponding arcs of γ₀ and γ₁ together with the cuts μ₁ and μ₂, each cut appearing in ∂Ω⁺ and in ∂Ω⁻ with opposite orientations. At the same time,

∮_{∂Ω} f(z) dz = ∮_{∂Ω⁺} f(z) dz + ∮_{∂Ω⁻} f(z) dz,

since the integrations over μᵢ and μᵢ⁻¹, i = 1, 2, cancel each other on the right-hand side. Moreover, the boundaries ∂Ω± are again piecewise differentiable Jordan curves, and therefore both integrals on the right-hand side vanish.

The general induction step is completely analogous. If n > 1, we find one of the interior boundaries γᵢ closest to γ₀ and choose two cuts μ₁, μ₂ so that one of the two newly created components of Ω is simply connected. Then the other component has one less interior boundary, and thus the theorem follows by induction. □

9.4.16. Cauchy integral formula. Consider an open disc without its centre, Ω = B(z₀, r) \ {z₀}, an analytic function f(z) in Ω, and positively oriented Jordan curves γ ⊂ Ω including z₀ in their interior. Due to the Cauchy integral theorem, the integral ∮_γ f(z) dz does not depend on the choice of such γ. Indeed, the region enclosed between two such choices γ₁, γ₂ is bounded by them, but with opposite orientations. Thus the vanishing of the integral over the boundary means that the integrals over γ₁ and γ₂ are actually equal. Next, recall that the integral of z⁻¹ over any circle centred at the origin is 2πi, see 9.4.6(2). These observations suggest the following essential formula (we may expect that f(ζ) behaves similarly to the constant f(z) for γ very small):

Cauchy integral formula

Theorem.
Let f(z) be an analytic function in the closure of the region Ω ⊂ ℂ with regular boundary. Then for all z in Ω,

f(z) = (1/2πi) ∮_{∂Ω} f(ζ)/(ζ − z) dζ.

Proof. Fix z ∈ Ω and consider an open disc D_ρ = {ζ ∈ ℂ; |ζ − z| < ρ} lying inside Ω. The function g(ζ) = f(ζ)/(ζ − z) is analytic in the closure of Ω \ D̄_ρ. Adopting the counterclockwise orientation of the boundary of D_ρ, the Cauchy integral theorem implies

∮_{∂Ω} f(ζ)/(ζ − z) dζ = ∮_{∂D_ρ} f(ζ)/(ζ − z) dζ.

We aim at showing that ∮_{∂D_ρ} f(ζ)/(ζ − z) dζ = 2πi f(z). We know 2πi f(z) = ∮_{∂D_ρ} f(z)/(ζ − z) dζ. Thus, we consider the difference

∮_{∂D_ρ} (f(ζ) − f(z))/(ζ − z) dζ

and estimate

|∮_{∂D_ρ} (f(ζ) − f(z))/(ζ − z) dζ| ≤ 2πρ · max_{|ζ−z|=ρ} |f(ζ) − f(z)|/ρ = 2π max_{|ζ−z|=ρ} |f(ζ) − f(z)|.

Clearly, the right-hand side approaches zero as ρ → 0. Hence

∮_{∂D_ρ} f(ζ)/(ζ − z) dζ = 2πi f(z),

and the formula in the theorem has been verified. □

Notice that if we consider z ∈ ℂ \ Ω̄ in the above theorem, then the function ζ ↦ f(ζ)/(ζ − z) is analytic in Ω, and thus the integral vanishes by the Cauchy integral theorem.

9.4.17. Corollaries. Taking consecutive derivatives with respect to z in the above formula, we obtain expressions for all derivatives of f(z):

Cauchy integral formula for derivatives

Corollary. Let f(z) be an analytic function in the closure of the region Ω ⊂ ℂ with regular boundary. Then for all z in Ω,

f⁽ⁿ⁾(z) = (n!/2πi) ∮_{∂Ω} f(ζ)/(ζ − z)ⁿ⁺¹ dζ.

Proof. Indeed, z is an independent argument in the smooth integrand; thus we may differentiate under the integral, which yields the formula. □

Apply the Cauchy integral formula to the disc D_r = {|z − a| < r} to obtain:

Mean value theorem

Theorem. For f(z) analytic on the closure of D_r, the value at the centre of the disc can be evaluated as

f(a) = (1/2π) ∫₀^{2π} f(a + r e^{iθ}) dθ.

Proof. By the Cauchy integral formula,

f(a) = (1/2πi) ∮_{|ζ−a|=r} f(ζ)/(ζ − a) dζ.

Substitute ζ = a + r e^{iθ} and dζ = i r e^{iθ} dθ to obtain

f(a) = (1/2π) ∫₀^{2π} f(a + r e^{iθ}) dθ. □

9.4.18. Laurent series. Already in 6.3.10, we noticed that quotients of two polynomials enjoy a quite nice expansion similar to power series. We called series of the form Σ_{n=−∞}^∞ cₙ (z − a)ⁿ Laurent series. Now we do the same with complex arguments and coefficients.
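Before turning to Laurent series, both the integral theorem and the integral formula invite a quick numerical check. The following Python sketch is our illustration (not part of the text): it approximates a contour integral over a circle by a Riemann sum of the parametrization ζ(t) = a + R e^{it}.

```python
import cmath

def circle_integral(g, a=0.0, radius=1.0, n=4000):
    # Riemann-sum approximation of the contour integral of g over the
    # counterclockwise circle zeta(t) = a + radius * e^{it}, t in [0, 2*pi].
    step = 2 * cmath.pi / n
    total = 0j
    for k in range(n):
        zeta = a + radius * cmath.exp(1j * k * step)
        total += g(zeta) * 1j * (zeta - a) * step   # g(zeta) * zeta'(t) * dt
    return total

# Cauchy integral theorem: exp is analytic, so the integral is (numerically) 0.
print(abs(circle_integral(cmath.exp)))

# Cauchy integral formula: f(z) = (1/2πi) ∮ f(ζ)/(ζ − z) dζ for z inside.
z0 = 0.3 + 0.2j
lhs = circle_integral(lambda zeta: cmath.exp(zeta) / (zeta - z0)) / (2j * cmath.pi)
print(lhs, cmath.exp(z0))   # the two values agree

# For z outside the circle, the same integral vanishes.
print(abs(circle_integral(lambda zeta: cmath.exp(zeta) / (zeta - 2.0))))
```

The equispaced Riemann sum over a circle is in fact exceptionally accurate for analytic integrands, which is why a few thousand nodes already reproduce the formulas to machine precision.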
The part of the Laurent series with non-negative powers, Σ_{n=0}^∞ cₙ (z − a)ⁿ, is called the regular part, while the remaining part Σ_{n=−∞}^{−1} cₙ (z − a)ⁿ, consisting of the negative powers of (z − a), is called its principal part. A Laurent series is called convergent if both its regular and principal parts converge.

Laurent series

Theorem. Every function f(z) analytic in the annulus A = {r < |z − a| < R}, with 0 ≤ r < R ≤ ∞, admits a representation by a Laurent series

(1) f(z) = Σ_{n=−∞}^∞ cₙ (z − a)ⁿ,

where the coefficients cₙ can be calculated as

(2) cₙ = (1/2πi) ∮_{|ζ−a|=ρ} f(ζ)/(ζ − a)ⁿ⁺¹ dζ, n ∈ ℤ,

with an arbitrary r < ρ < R.

Proof. Fix z ∈ A and choose r < r′ < |z − a| < R′ < R. The Cauchy integral formula, applied to the annulus {r′ < |ζ − a| < R′} with its regular boundary, yields

f(z) = (1/2πi) ∮_{|ζ−a|=R′} f(ζ)/(ζ − z) dζ − (1/2πi) ∮_{|ζ−a|=r′} f(ζ)/(ζ − z) dζ.

If |ζ − a| = R′, then |z − a| < |ζ − a| and

f(ζ)/(ζ − z) = f(ζ)/((ζ − a) − (z − a)) = Σ_{n=0}^∞ f(ζ) (z − a)ⁿ/(ζ − a)ⁿ⁺¹,

which, via term-by-term integration, provides the regular part of the series (1) with the coefficients (2). If |ζ − a| = r′, then |ζ − a| < |z − a| and, similarly to the above, the expansion

f(ζ)/(ζ − z) = −f(ζ)/((z − a) − (ζ − a)) = −Σ_{n=0}^∞ f(ζ) (ζ − a)ⁿ/(z − a)ⁿ⁺¹

leads to the equality (via term-by-term integration over |ζ − a| = r′)

−(1/2πi) ∮_{|ζ−a|=r′} f(ζ)/(ζ − z) dζ = Σ_{n=1}^∞ c₋ₙ (z − a)⁻ⁿ,

i.e. to the principal part. Thus, we have obtained the Laurent series representation f(z) = Σ_{n=−∞}^∞ cₙ (z − a)ⁿ, as requested.

On the other hand, if we are given a Laurent series (1), then, fixing an arbitrary n ∈ ℤ and multiplying this formula by (z − a)^{−(n+1)}, we can integrate over |z − a| = ρ in order to obtain

∮_{|z−a|=ρ} f(z)/(z − a)ⁿ⁺¹ dz = 2πi cₙ.

The circle {|z − a| = ρ} with r < ρ < R was chosen arbitrarily, and in particular we see that the integrals ∮_{|z−a|=ρ} f(z)(z − a)^{−n−1} dz cannot depend on ρ. □

9.4.19. Remarks on convergence. Given a Laurent series, its regular part Σ_{n=0}^∞ cₙ (z − a)ⁿ represents a power series that converges absolutely and uniformly on compact sets in its disc of convergence {|z − a| < R}, with 1/R = lim sup_{n→∞} |cₙ|^{1/n}, see the Cauchy-Hadamard formula in Theorem 9.4.2. The principal part Σ_{n=−∞}^{−1} cₙ (z − a)ⁿ becomes a power series Σ_{n=1}^∞ c₋ₙ wⁿ after the coordinate change w = 1/(z − a), and this series converges for |w| < 1/r with r = lim sup_{n→∞} |c₋ₙ|^{1/n}. Thus we have verified:

Proposition. For any set of coefficients {cₙ, n ∈ ℤ}, set

1/R = lim sup_{n→∞} |cₙ|^{1/n}, r = lim sup_{n→∞} |c₋ₙ|^{1/n}.
Then the Laurent series

f(z) = Σ_{n=−∞}^∞ cₙ (z − a)ⁿ

converges absolutely and uniformly on any compact set in the annulus A = {r < |z − a| < R}. It is analytic in A, and if |z − a| < r, then the principal part Σ_{n=−∞}^{−1} cₙ (z − a)ⁿ diverges, while the regular part Σ_{n=0}^∞ cₙ (z − a)ⁿ diverges for |z − a| > R.

9.4.20. Link to Fourier series. There is a very interesting link between Laurent and Fourier series. If f is analytic in A = {1 − ρ < |z| < 1 + ρ} for some ρ > 0, then its n-th Laurent series coefficient (centred at 0) is

cₙ = (1/2πi) ∮_{|z|=1} f(z)/zⁿ⁺¹ dz = (1/2π) ∫₀^{2π} f(e^{it}) e^{−int} dt.

Therefore, cₙ represents the n-th Fourier coefficient, in the complex form, of the function t ↦ f(e^{it}).

9.4.21. Liouville's theorem. For |f(ζ)| ≤ M(ρ) on the circle {|ζ − a| = ρ}, the formula 9.4.18(2) for the Laurent coefficients immediately yields the Cauchy inequalities

(1) |cₙ| ≤ M(ρ)/ρⁿ, n ∈ ℤ.

Theorem (Liouville). If f(z) is analytic and bounded on all of ℂ, then it is constant.

Proof. If |f(z)| ≤ M on ℂ, then the Cauchy inequalities give |cₙ| ≤ M/Rⁿ for any R > 0. Thus, cₙ = 0 for any n ≥ 1, and consequently f(z) = c₀. □

9.4.22. Isolated singularities. We look at typical examples of analytic functions around "suspicious" points. Consider the fraction f(z) = sin z / z. The origin is a zero point of both sin z and z, and since they behave very similarly for small z, we see lim_{z→0} f(z) = 1.

On the other hand, f(z) = 1/z grows towards infinity, lim_{z→0} 1/z = ∞, in the sense of the extended complex plane ℂ̄ = ℂ ∪ {∞} (also called the Riemann sphere). We can imagine ℂ̄ as the sphere with the stereographic projection onto the plane ℂ, see the picture. Then clearly lim_{z→a} f(z) = ∞ if and only if lim_{z→a} |f(z)| = ∞ in the sense of standard analysis in real variables. It might easily happen that the limit does not exist at all, see the theorem below. For example, take f(z) = e^{1/z} around the point z = 0. It is given by the infinite principal part of a Laurent series, f(z) = Σ_{n=0}^∞ (1/n!) z⁻ⁿ. In general, we talk about isolated singular points:

Isolated singularities

If f(z) is analytic in a punctured neighbourhood V = {0 < |z − a| < ρ}, then a is called an isolated singular point of f(z).
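The coefficient formula 9.4.18(2) also gives a practical numerical probe of an isolated singularity: computing a few coefficients c₋₁, c₋₂, ... of the Laurent series at a = 0 distinguishes the three types of behaviour discussed below. The Python sketch is our illustration, not part of the text.

```python
import cmath

def laurent_coeff(f, n, rho=0.5, m=4000):
    # c_n = (1/2πi) ∮_{|z|=rho} f(z) / z^(n+1) dz, approximated by a Riemann
    # sum over z(t) = rho * e^{it}; this is formula (2) of 9.4.18 with a = 0.
    step = 2 * cmath.pi / m
    total = 0j
    for k in range(m):
        z = rho * cmath.exp(1j * k * step)
        total += f(z) / z ** (n + 1) * 1j * z * step
    return total / (2j * cmath.pi)

examples = [
    ("sin(z)/z", lambda z: cmath.sin(z) / z),     # removable singularity
    ("1/z^2", lambda z: 1 / z ** 2),              # pole of order 2
    ("exp(1/z)", lambda z: cmath.exp(1 / z)),     # essential singularity
]
for name, f in examples:
    # the principal-part coefficients vanish exactly in the removable case
    coeffs = [laurent_coeff(f, -k) for k in (1, 2, 3)]
    print(name, [round(abs(c), 6) for c in coeffs])
```

For sin(z)/z all principal coefficients are (numerically) zero, for 1/z² only c₋₂ = 1 survives, and for e^{1/z} one finds c₋ₖ = 1/k! for all k, matching the expansion displayed above.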
We say that the singular point is
• removable, if there is a finite limit lim_{z→a} f(z) = b ∈ ℂ;
• a pole, if lim_{z→a} f(z) = ∞;
• an essential singularity, if lim_{z→a} f(z) does not exist in ℂ̄.

A function f(z) with only isolated singularities in a domain Ω ⊂ ℂ and without any essential singularities is called a meromorphic function in Ω.

The function f(z) = tan(1/z) provides an example of a non-isolated singularity at a = 0, as 0 is the limit of the poles (π/2 + nπ)⁻¹, n ∈ ℤ, of f(z). On the other hand, all rational functions f(z)/g(z) are meromorphic in ℂ.

The following theorem classifies isolated singularities and poles in terms of Laurent series.

Theorem. The following properties are equivalent:
• the point z = a is a removable singularity of f(z);
• |f(z)| is bounded in some punctured neighbourhood V = {0 < |z − a| < ρ};
• the Laurent series of f(z) in V = {0 < |z − a| < ρ} is the Taylor series f(z) = Σ_{n=0}^∞ cₙ (z − a)ⁿ, i.e. the principal part vanishes;
• f(a) can be defined so that f(z) becomes analytic in {|z − a| < ρ}.

Further, the point z = a is a pole of f(z) if and only if the principal part of the Laurent series of f(z) in {0 < |z − a| < ρ} contains only finitely many terms, i.e. f(z) = Σ_{n=−N}^∞ cₙ (z − a)ⁿ for some integer N (the smallest N with this property is called the order of the pole at z = a).

Finally, the Laurent series f(z) = Σ_{n=−∞}^∞ cₙ (z − a)ⁿ in the punctured neighbourhood of a contains infinitely many terms with non-zero coefficients cₙ, n < 0, if and only if z = a is an essential singularity of f(z).

Proof. If |f(z)| ≤ M for 0 < |z − a| < ρ, then by the Cauchy inequalities 9.4.21(1), |c₋ₙ| ≤ M εⁿ, n > 0, for all 0 < ε < ρ. Therefore, all coefficients with negative indices vanish, and

f(z) = Σ_{n=0}^∞ cₙ (z − a)ⁿ.

Define f(a) = c₀ to obtain a power series that converges in the entire disc {|z − a| < ρ}. This implies the equivalence of the four conditions in the first part of the theorem.
By the definition of a pole, f(z) ≠ 0 in some punctured disc D = {0 < |z − a| < ρ′ ≤ ρ} around a, since lim_{z→a} f(z) = ∞. Therefore, g(z) = 1/f(z) is analytic in D and lim_{z→a} g(z) = 0. Hence, g(z) becomes analytic in the whole disc upon setting g(a) = 0 and, therefore, g(z) = (z − a)ᴺ h(z) for some positive integer N and an analytic function h(z) with h(a) ≠ 0. Thus, 1/h(z) is also analytic in a neighbourhood of a and, therefore,

f(z) = (z − a)⁻ᴺ · (1/h(z))

has a Laurent series with only finitely many terms in its principal part. Conversely, if f(z) = (z − a)⁻ᴺ h(z), where h is analytic and h(a) ≠ 0, then lim_{z→a} f(z) = ∞ and a is a pole of f.

Finally, an isolated singularity of f(z) is neither removable nor a pole if and only if the principal part of its Laurent series is infinite, and this observation finishes the proof. □

9.4.23. Some consequences. There are several straightforward corollaries of our classification of isolated singularities. In particular, if lim_{z→a} f(z) does not exist, then f(z) has truly chaotic behaviour:

Theorem. If a ∈ ℂ is an essential singularity of f(z), then for any w ∈ ℂ̄ there is a sequence zₙ → a such that lim_{n→∞} f(zₙ) = w.

Proof. Let w = ∞. Since the singularity z = a is not removable, f(z) cannot be bounded in any punctured neighbourhood of a. So there exists a sequence zₙ → a such that lim_{n→∞} f(zₙ) = ∞. For w ∈ ℂ: if every punctured neighbourhood of a contains a point z such that f(z) = w, then by collecting such points we obtain a sequence zₙ → a with f(zₙ) = w, as required. If there is a punctured neighbourhood of a where f(z) ≠ w, then g(z) = 1/(f(z) − w) also has an isolated singularity at z = a, which can be neither a pole nor removable, as otherwise f(z) = w + 1/g(z) would have a limit (possibly infinite) as z → a. Therefore, z = a is an essential singularity of g(z) and, thus, there is a sequence zₙ → a such that g(zₙ) → ∞, which implies lim_{n→∞} f(zₙ) = w. □

We say that ∞ ∈ ℂ̄ is an isolated singularity of f(z) if f(z) is analytic in {|z| > R} for some R > 0. The following claims are straightforward consequences of the Liouville theorem if ∞ is the only singularity of f(z):

Corollary.
If f(z) is analytic in ℂ and z = ∞ is a removable singularity of f(z), then f(z) is a constant. If f(z) is analytic in ℂ and z = ∞ is a pole, then f(z) is a polynomial f(z) = Σ_{j=0}^N cⱼ zʲ.

Proof. The first claim is a simple reformulation of the Liouville theorem, cf. 9.4.21. To deal with the other claim, consider g(w) = f(1/w). Then w = 0 is a pole of g(w). Let P(w) = Σ_{j=1}^N cⱼ w⁻ʲ be the principal part of the Laurent series of g(w). Then h(w) = g(w) − P(w) is analytic in ℂ \ {0}, with a removable singularity at w = 0. Moreover, lim_{w→∞} h(w) = lim_{w→∞} g(w) = f(0). Thus, |h(w)| is bounded, and by the Liouville theorem h(w) = const = f(0) = c₀. Hence f(z) = g(z⁻¹) = Σ_{j=0}^N cⱼ zʲ, which is a polynomial in z. □

9.4.24. Residues. Next we return to the Cauchy integral theorem, with our knowledge of isolated singularities. The residue of an analytic function f(z) at an isolated singular point a ∈ ℂ is defined as

res_a f = (1/2πi) ∮_{|z−a|=r} f(z) dz,

where 0 < r < ρ. Obviously, the definition does not depend on the choice of r.

Residue theorem

Theorem. If f(z) is represented by the Laurent series Σ_{n=−∞}^∞ cₙ (z − a)ⁿ, then res_a f = c₋₁. Further, consider a domain D ⊂ ℂ and a function f(z) analytic in D \ {a₁, ..., aₙ}, where aⱼ ∈ D, j = 1, ..., n. Then

∮_{∂D} f(z) dz = 2πi Σ_{j=1}^n res_{aⱼ} f.

Proof. Integrating the Laurent series Σ_{n=−∞}^∞ cₙ (z − a)ⁿ term by term and using the fact that ∮_{|z−a|=ρ} (z − a)ⁿ dz = 0 unless n = −1, while ∮_{|z−a|=ρ} (z − a)⁻¹ dz = 2πi, we obtain res_a f = c₋₁. Next, choose ρ > 0 such that the open discs Dⱼ = {|z − aⱼ| < ρ}, j = 1, ..., n, have pairwise empty intersections and their closures D̄ⱼ belong to D. Then the Cauchy integral theorem 9.4.15, applied to D_ρ = D \ ⋃_{j=1}^n D̄ⱼ, yields the claim. □

9.4.25. Residues at infinity. Recall that when integrating along the circle |z − a| = R, we always assume the counterclockwise orientation of the circle.
Thus, we use the minus sign in the definition: if f(z) is analytic in the closure of the exterior of a disc, {|z| > R}, then

res_∞ f = −(1/2πi) ∮_{|z|=R} f(z) dz.

In terms of the Laurent series f(z) = Σ_{n=−∞}^∞ cₙ zⁿ, valid in {|z| > R}, we have res_∞ f = −c₋₁. Note that if f(z) is analytic in ℂ \ {a₁, ..., aₙ}, then

res_∞ f + Σ_{j=1}^n res_{aⱼ} f = 0.

Indeed, by taking a disc {|z| < R} of sufficiently large radius, containing all the points aⱼ and with no singularities on its boundary, we conclude that

(1/2πi) ∮_{|z|=R} f(z) dz = Σ_{j=1}^n res_{aⱼ} f.

9.4.26. Examples of applications. Residues of analytic functions are used for the evaluation of improper integrals in real analysis. The following lemma turns out to be very useful for such purposes. We write M(R) for the maximum of |f(z)| over the upper half of the circle with radius R, i.e. M(R) = max_{|z|=R, Im z≥0} |f(z)|.

Jordan's lemma

Lemma. Consider a function f(z) continuous on {Im z ≥ 0, |z| = R}. Then, for each positive real parameter t,

|∫_{|z|=R, Im z≥0} f(z) e^{itz} dz| ≤ (π/t) M(R).

Consequently, if f(z) is continuous on {Im z ≥ 0, |z| ≥ R₀} and lim_{R→∞} M(R) = 0, then

lim_{R→∞} ∫_{|z|=R, Im z≥0} f(z) e^{itz} dz = 0.

Proof. We estimate the integral from the lemma:

|∫₀^π f(R e^{iθ}) e^{−tR sin θ + itR cos θ} i R e^{iθ} dθ| ≤ R M(R) ∫₀^π e^{−tR sin θ} dθ.

To evaluate the latter integral, we observe that sin θ ≥ (2/π)θ for 0 ≤ θ ≤ π/2. Thus, using t > 0, we arrive at

R M(R) ∫₀^π e^{−tR sin θ} dθ = 2R M(R) ∫₀^{π/2} e^{−tR sin θ} dθ ≤ 2R M(R) ∫₀^{π/2} e^{−2tRθ/π} dθ = (π/t) M(R)(1 − e^{−Rt}) ≤ (π/t) M(R).

The consequence for R → ∞ is obvious. □

Typically, the Jordan lemma is used to compute improper integrals of real analytic (complex valued) functions g(x) =
If the corresponding complex analytic function f(z) has only a finite number of poles ak in the upper half plane and lim^oo M(R) = 0, then we may compute the real integral where jr is the path composed byf the interval [-R, R] and the half upper circle of radius R. See the diagram and the examples in the other column. 9.4.27. Concluding remarks. Of course we have not touched on many important issues in this short introduction. These include the conformed properties of all analytic functions, i.e. they preserve all angles of curves, the richness of analytic functions which allow the mapping of any simply connected region fl bijectively to the unit open disc (i.e. both the map and its inverse are analytic, the Riemann mapping theorem). The proper setup for analytic extensions are the Riemann surfaces with their fascinating topological properties. Also, we only commented on the possibility of proving the Cauchy integral theorem for triangles just assuming the existence of complex derivative. The analyticity of all holomorhic functions then follows from the Cauchy integral formula. Moreover, we have not mentioned the functions of several complex variables at all! We hope that all of these interesting issues will challenge the readers to go for further more detailed study in the relevant literature. 699 CHAPTER 10 Statistics and probability methods Is statistics a part of mathematics ? - whenever it is so, we need much of mathematics there... ! A. Dots, lines, rectangles The obtained data from reality can be displayed in many ways. Let us illustrate some of them. 10.A.1. Presenting the collected data. 20 mathematicians were asked about the number of members of their household. The following table displays the frequency of each number of members. Number of members 1 2 3 4 5 6 Number of households 5 5 1 6 2 1 Create the frequency distribution table. Find the mean, median and mode of the number of members. Build a column diagram of the data. Solution. 
Let us begin with the frequency distribution table. There, we write not only the frequencies, but also the cumulative frequencies and relative frequencies (i.e., the probability that a randomly picked household has the given number of members). Let us denote the number of members by xᵢ, the corresponding frequency by nᵢ, the relative frequency by pᵢ (= nᵢ / Σ_{j=1}^6 nⱼ = nᵢ/20), the cumulative frequency by Nᵢ (= Σ_{j=1}^i nⱼ), and the relative cumulative frequency by Fᵢ

Roughly speaking, statistics is any processing of numerical or other type of data about a population of objects, and their presentation. In this context, we talk about descriptive statistics. Its objective is thus to process and comprehensibly represent data about objects of a given "population": for instance, the annual income of all citizens obtained from the complete data of revenue authorities, or the quality of hotel accommodation in some region. In order to achieve this, we focus on simple numerical characterizations and visualization of the data.

Mathematical statistics uses mathematical methods to derive conclusions valid for the whole (potentially infinite) population of objects, based on a "small" sample. For instance, we might want to find out how much a certain disease is spread in the population by collecting data about a few randomly chosen people, but we interpret the results with regard to the entire population. In other words, mathematical statistics makes conclusions about a large population of objects based on the study of a small (usually randomly selected) sample collection. It also estimates the reliability of the resulting conclusions.

Mathematical statistics is based on the tools of probability theory, which is very useful (and amazing) in itself. Therefore, probability theory is discussed first.
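The quantities nᵢ, pᵢ, Nᵢ, Fᵢ just introduced are purely mechanical to compute. The following small Python sketch (our illustration, not part of the text) produces the frequency distribution table of 10.A.1:

```python
from collections import Counter

# Household sizes from 10.A.1: five 1s, five 2s, one 3, six 4s, two 5s, one 6.
members = [1] * 5 + [2] * 5 + [3] + [4] * 6 + [5] * 2 + [6]
n = len(members)                 # 20 households

counts = Counter(members)        # x_i -> n_i
table = {}
cumulative = 0
for x in sorted(counts):
    cumulative += counts[x]      # N_i = n_1 + ... + n_i
    table[x] = (counts[x], counts[x] / n, cumulative, cumulative / n)

print(" x_i  n_i    p_i   N_i    F_i")
for x, (ni, pi, Ni, Fi) in table.items():
    print(f"{x:4} {ni:4} {pi:6} {Ni:5} {Fi:6}")
```

The printed rows reproduce the table of the solution below, with the relative frequencies given as decimal fractions instead of the fractions 1/4, 1/20, etc.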
This chapter provides an elementary introduction to the methods of probability theory, which should be sufficient for a correct comprehension of the ordinary statistical information all around us. However, for a serious understanding of a mathematical statistician's work, one must look for other resources.

1. Descriptive statistics

Descriptive statistics alone is not a mathematical discipline, although it uses many manipulations with numbers and sometimes even very sophisticated methods. However, it is a good opportunity for illustrating the mathematical approach to building generally useful tools. At the same time, it should serve as a motivation for studying probability theory, because of the later applications in statistics.

(= Nᵢ/20 = Σ_{j=1}^i pⱼ):

xᵢ | nᵢ | pᵢ | Nᵢ | Fᵢ
1 | 5 | 1/4 | 5 | 1/4
2 | 5 | 1/4 | 10 | 1/2
3 | 1 | 1/20 | 11 | 11/20
4 | 6 | 3/10 | 17 | 17/20
5 | 2 | 1/10 | 19 | 19/20
6 | 1 | 1/20 | 20 | 1

Now, we can easily construct the wanted (column) graphs of the (relative, cumulative) frequencies. The mean number of members of a household is

x̄ = (5·1 + 5·2 + 1·3 + 6·4 + 2·5 + 1·6)/20 = 2.9.

The median is the arithmetic mean of the tenth and eleventh sorted values, which are respectively 2 and 3, i.e., x̃ = 2.5. The mode is the most frequent value, i.e., x̂ = 4.

The collected data can also be presented using a box plot. The upper and lower sides of the "box" correspond respectively to the third (upper) and the first (lower) quartile, so its height is equal to the interquartile range. The thick horizontal line is drawn at the median level; the lower and upper horizontal lines correspond respectively to the minimum and maximum elements of the data set, or to the value that is 1.5 times the interquartile range below the lower side of the box (and above the upper side, respectively). The data outside this range would be shown as circles.
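The characteristics just computed, together with the five-number summary on which the box plot is based, can be reproduced with Python's statistics module. This is our illustration, not part of the text; note also that software packages use several slightly different quartile conventions, and we take the "inclusive" one here.

```python
import statistics

members = [1] * 5 + [2] * 5 + [3] + [4] * 6 + [5] * 2 + [6]

mean = statistics.mean(members)       # (5*1 + 5*2 + 3 + 6*4 + 2*5 + 6)/20 = 2.9
median = statistics.median(members)   # mean of the 10th and 11th sorted values
mode = statistics.mode(members)       # the most frequent value

# Quartiles for the box plot (linear interpolation on the sorted data).
q1, q2, q3 = statistics.quantiles(members, n=4, method="inclusive")

print(mean, median, mode)
print("five-number summary:", min(members), q1, q2, q3, max(members))
```

The interquartile range q3 − q1 gives the height of the box, and the median q2 agrees with the value found above.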
We can also build a histogram of the data:

In our brief introduction, we first introduce the concepts measuring the positions of data values and the variability of the data values (means, percentiles, etc.). We touch the problem of how to visualize or otherwise present the data sets (diagrams). Then we deal with the potential relations between several data sets (covariance and principal components) and, finally, we deal with data without numerical values, relying just on their frequencies of appearance (entropy).

10.1.1. Probability, or statistics? It is not by accident that we return to a part of the motivating hints from the first chapter as soon as we have managed to gather enough mathematical tools, both discrete and continuous. Nowadays, many communications are of a statistical nature, be it in media, politics, or science. Nevertheless, in order to properly understand the meaning of such a communication and to use particular statistical methods and concepts, one must have a broad knowledge of miscellaneous parts of mathematics. In this subsection, we move away from the mathematical theory and think about the following steps and our objectives.

As an example of a population of objects, consider the students of a given basic course. Then the examined numerical data can be:
• the "mean number of points" obtained during the course in the previous semester and the "variance" of these values,
• the "mean marks" for the examination of this and other courses and the "correlation" (i.e. mutual dependence) of these results,
• the "correlation" of these data with the past results of the given students,
• the "correlation" of the number of failed exams of a given student and the number of hours spent in a temporary job,
• ...

With regard to the first item, the arithmetic mean itself does not carry enough information about the quality of the lecture or of the lecturer, nor about the results of particular students.
Maybe the value which is "in the middle" of the population, or the number of points achieved by the student who was just better than half of the students, is of more concern. Similarly, the first quarter, the last quarter, the first tenth, etc., may be of interest. Such data are called statistics of the population. Such statistics are interesting for the students in question as well, and it is quite easy to define, compute, and communicate them.

From general experience, or as a theoretical result outside mathematics, a reasonable assessment should be "normally" distributed. This is a concept of probability theory, and it requires quite advanced mathematics to be properly defined. Comparing the collected data about even a small random population of students to theoretical results can serve in two ways: we can estimate the parameters of the distribution, as well as draw a conclusion whether the assessment is reasonable.

Histogram of x

Note that the frequencies of one- and two-member households were merged into a single rectangle. This is done in order to make the data "easier to read"; there exist various (and ambiguous) rules for the merging. We simply mention this fact without presenting an exact procedure (it is chosen as anyone likes). □

10.A.2. Given a data set x = (x₁, x₂, ..., xₙ), find the mean and variance of the centred values xᵢ − x̄ and the standardized values (xᵢ − x̄)/sₓ.

Solution. The mean of the centred values can be found directly using the definition of the arithmetic mean:

(1/n) Σ_{i=1}^n (xᵢ − x̄) = (1/n) Σ_{i=1}^n xᵢ − (x̄/n) Σ_{i=1}^n 1 = x̄ − x̄ = 0.

The variance of the centred values is clearly the same as for the original ones (sₓ²). For the standardized values, the mean is equal to zero again, and the variance is

(1/n) Σ_{i=1}^n ((xᵢ − x̄)/sₓ)² = sₓ²/sₓ² = 1. □

10.A.3. Prove that the variance satisfies sₓ² = (1/n) Σ_{i=1}^n xᵢ² − x̄².

Solution.
Using the definitions of variance and arithmetic mean, we get:
$$s_x^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i^2 - 2\bar{x}\,x_i + \bar{x}^2\right) = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - 2\bar{x}\cdot\frac{1}{n}\sum_{i=1}^{n} x_i + \bar{x}^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2. \qquad\Box$$

At the same time, the numerical values of statistics for a given population can yield a qualitative description of the likelihood of our conclusions. We can compute statistics which reflect the variability of the examined values, rather than where these values are positioned within a given population. For instance, if the assessment does not show enough variability, it may be concluded that it is badly designed, because the students' skills are of course different. The same applies if the collected data seem completely random. In the above paragraph, it is assumed that the examined data are reliable. This is not always the case in practice. On the contrary, the data are often perturbed with errors due to the construction of the experiment and the data collection itself. In many cases, not much is known about the type of the data distribution. Then, methods of non-parametric statistics are often used (to be mentioned at the end of this chapter). Very interesting conclusions can be found if we compare the statistics for different quantities and then derive information about their relations. For example, if there is no evident relation between the history of previous studies and the results in a given course, then it may be that the course is managed wrongly. These ideas can be summarized as follows:
• In descriptive statistics, there are tools which allow the understanding of the structure and nature of even a huge collection of data;
• in mathematics, one works with an abstract mathematical description of probability, which can be used for the analysis of given data.
Especially, this is the case when there is a theoretical model to which the data should correspond;
• conclusions of a statistical investigation of samples of particular data sets can be given by mathematical statistics;
• mathematical statistics can also estimate how adequate such a description is for a given data set.

10.1.2. Terminology. Statisticians have introduced a great many concepts which need mastering. The fundamental concept is that of a statistical population, which is an exactly defined set of basic statistical units. These can be given by enumeration or, in the case of a larger population, by some rules. On every statistical unit, statistical data are measured, with the "measurement" perceived very broadly. For instance, the population can consist of all students of a given university. Then, each of the students is a statistical unit, and much data can be gathered about these units: the numerical values obtainable from the information system, their favourite colour, what they had for dinner before their last test, etc. The basic object for examining particular pieces of data is a data set. It usually consists of ordered values. The ordering can be either natural (when the data values are real numbers, for example) or we can define it (for instance, when we observe colours, we can express them in the RGB format

10.A.4. The following values have been collected: 10; 7; 7; 8; 8; 9; 10; 9; 4; 9; 10; 9; 11; 9; 7; 8; 3; 9; 8; 7. Find the arithmetic mean, median, quartiles, variance, and draw the corresponding box diagram.

Solution. Denoting the individual values by $a_i$ and their frequencies by $n_i$, we can arrange the given data set into the following table.
$a_i$:  3  4  7  8  9  10  11
$n_i$:  1  1  4  4  6  3   1

From the definition of the arithmetic mean, we have
$$\bar{x} = \frac{3 + 4 + 4\cdot 7 + 4\cdot 8 + 6\cdot 9 + 3\cdot 10 + 11}{1+1+4+4+6+3+1} = \frac{162}{20} = 8.1.$$
Since the tenth least collected value is $x_{(10)} = 8$ and the eleventh one is $x_{(11)} = 9$, the median is equal to $\tilde{x} = \frac{8+9}{2} = 8.5$. The first quartile is $x_{0.25} = \frac{x_{(5)}+x_{(6)}}{2} = 7$, and the third quartile is $x_{0.75} = \frac{x_{(15)}+x_{(16)}}{2} = 9$. From the definition of variance, we get
$$s_x^2 = \frac{5.1^2 + 4.1^2 + 4\cdot 1.1^2 + 4\cdot 0.1^2 + 6\cdot 0.9^2 + 3\cdot 1.9^2 + 2.9^2}{1+1+4+4+6+3+1} = 3.59.$$
The histogram and box diagram are shown in the following pictures.

[Figure: Histogram of x and the corresponding box diagram]

Here, we have used a "statistics" method to make the histogram "nice" and "clear". You can find a lot of these conventions in the books on statistics, but if you do not know them,

and order them with respect to this encoding). We can also work with unordered values. Since statistical description aims at telling comprehensible information about the entire population, we should be able to compare and take ratios of the data values. Therefore, we need to have a measurement scale at our disposal. In most cases, the data values are expressed as numbers. However, the meaning of the data can be quantified variously, and thus we distinguish between the following types of data measurement scales.

Types of data measurement scales
The data values are called:
• nominal if there is no relation between particular values; they are just qualitative names, i.e.
possible values (for instance, political parties, or lecturers at a university when surveying how popular they are);
• ordinal if they are as above, but with an ordering (for example, the number of stars of hotels in guidebooks);
• interval if the values are numbers which serve for comparisons but do not correspond to any absolute value (for example, when expressing temperature in Celsius or Fahrenheit degrees, the position of zero is only conventional);
• ratio if both the scale and the position of zero are fixed (most physical and economical quantities).
With nominal types, we can interpret only equalities $x_1 = x_2$; with ordinal types, we can also interpret inequalities $x_1 < x_2$ (or $x_1 > x_2$); with interval types, we can also interpret differences $x_1 - x_2$. Finally, with ratio types, we also have ratios $x_1/x_2$ available.

10.1.3. Data sorting. In this subsection, we work with a data set $x_1, x_2, \dots, x_n$ whose values can be ordered (thus, their type is not nominal) and which have been obtained through measurement on $n$ statistical units. These values are sorted into the sorted data set
(1)  $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$.
The integer $n$ is called the size of the data set. When working with large data sets in which only a few values occur, the simplest way to represent the data set is to enumerate the values' frequencies. For instance, when surveying political party preferences or when presenting the quality of a hotel, write only the number of occurrences of each value. If there are many possible values (or there can even be continuously distributed real values), divide them into a suitable number of intervals and then observe the frequencies in the given intervals. The intervals are also called classes and the frequencies are called class frequencies. We also use cumulative frequencies and cumulative class frequencies, which correspond to the sum of the frequencies of the values not exceeding a given one.

you are lost. This is the default setting of the R program.
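The computations of exercise 10.A.4 above, together with the identities of 10.A.2 and 10.A.3, can be replayed in a few lines of plain Python. The sketch below is purely illustrative (the variable names are ours); it follows the definitions of this section literally.

```python
# Recomputing 10.A.4 from the raw values, and checking 10.A.2/10.A.3.
x = [10, 7, 7, 8, 8, 9, 10, 9, 4, 9, 10, 9, 11, 9, 7, 8, 3, 9, 8, 7]
n = len(x)
s = sorted(x)                              # the sorted data set x_(1) <= ... <= x_(n)

mean = sum(x) / n
median = (s[n // 2 - 1] + s[n // 2]) / 2   # n = 20 is even: average the middle pair
q1 = (s[4] + s[5]) / 2                     # (x_(5) + x_(6)) / 2
q3 = (s[14] + s[15]) / 2                   # (x_(15) + x_(16)) / 2
var = sum((xi - mean) ** 2 for xi in x) / n

print(mean, median, q1, q3, round(var, 2))   # -> 8.1 8.5 7.0 9.0 3.59

# 10.A.2: centering gives mean 0 and leaves the variance unchanged
centered = [xi - mean for xi in x]
assert abs(sum(centered) / n) < 1e-12
assert abs(sum(c ** 2 for c in centered) / n - var) < 1e-12

# 10.A.3: s_x^2 = (1/n) * sum x_i^2  -  mean^2
assert abs(var - (sum(xi ** 2 for xi in x) / n - mean ** 2)) < 1e-9
```

Note that the quartile formulas above hard-code the positions valid for $n = 20$; a general implementation would interpolate between order statistics.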
For example, if you replace just the value 3 by 2, you get a quite different-looking histogram:

[Figure: Histogram of x with the value 3 replaced by 2]

□

Most often, the mean $a_i$ of a given class is considered to be its representative, and the value $a_i n_i$ (where $n_i$ is the frequency of the class) is the total contribution of the class. Relative frequencies $n_i/n$ and relative cumulative frequencies can also be considered. A graph which has the intervals of the particular classes on one axis and rectangles above them whose heights correspond to the frequencies is called a histogram. Cumulative frequencies are represented similarly. The following diagram shows histograms of data sets of size $n = 500$ which were randomly generated with various standard distributions (called normal and $\chi^2$, respectively).

10.1.4. Measures of the position of statistical values. If the magnitude of the values around which the collected data gather is to be expressed, then the concepts of the definition below can be used. There, we work with ratio or interval types of scales. Consider an (unsorted) data set $(x_1, \dots, x_n)$ of the values for all examined statistical units, and let $n_1, \dots, n_m$ be the class frequencies of the $m$ distinct values $a_1, \dots, a_m$ that occur in this set.

10.A.5. 425 carps were fished, and each one was weighed. Then, mass intervals were set, resulting in the following frequency distribution table:

Weight (kg):     0-1  1-2  2-3  3-4  4-5  5-6  6-7
Class midpoint:  0.5  1.5  2.5  3.5  4.5  5.5  6.5
Frequency:       75   90   97   63   48   42   10

Draw a histogram, and find the arithmetic, geometric, and harmonic means of the carps' weights. Furthermore, find the median, quartiles, mode, variance, standard deviation, coefficient of variation, and draw a box plot.

Solution. The histogram looks as follows:

[Figure: Histogram of the carps' weights]

Means
Definition. The arithmetic mean (often only mean) is given as
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\sum_{j=1}^{m} n_j a_j.$$
The geometric mean is given as
$$\bar{x}_G = \sqrt[n]{x_1 x_2 \cdots x_n}$$
and makes sense for positive values $x_i$ only.
The harmonic mean is given as
$$\bar{x}_H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
and is also used for positive values $x_i$ only.

The arithmetic mean is the only one of the three above which is invariant with respect to affine transformations: for all scalars $a$, $b$,
$$\overline{(a + b\,x)} = \frac{1}{n}\sum_{i=1}^{n} (a + b\,x_i) = a + b\cdot\frac{1}{n}\sum_{i=1}^{n} x_i = a + b\,\bar{x}.$$

[Figure: Histogram of the carps' weights with the expected normal density]

Therefore, the arithmetic mean is especially suitable for interval types. The logarithm of the geometric mean is the arithmetic mean of the logarithms of the values. It is especially suitable for those quantities which cumulate multiplicatively, e.g. interest rates: if the interest rate for the $i$-th time period is $x_i\,\%$, then the final result is the same as if the interest rate had the constant value $\bar{x}_G\,\%$. See 10.A.9 for an example where the harmonic mean is appropriate. In subsection 8.1.29 (page 547), we use the methods invented there to prove that the geometric mean never exceeds the arithmetic mean. The harmonic mean never exceeds the geometric mean, and so $\bar{x}_H \le \bar{x}_G \le \bar{x}$.

From the definitions of the corresponding concepts in subsection 10.1.4, we can directly compute that the arithmetic mean is $\bar{x} = 2.7$ kg, the geometric mean is $\bar{x}_G = 2.1$ kg, and the harmonic mean is $\bar{x}_H = 1.5$ kg. By the definitions of subsection 10.1.5, the median is equal to $\tilde{x} = x_{0.5} = 2.5$ kg, the lower quartile to $x_{0.25} = 1.5$ kg, the upper quartile to $x_{0.75} = 3.5$ kg, and the mode is $\hat{x} = 2.5$ kg. From the definitions of subsection 10.1.6, we compute the variance of the weights, which is $s_x^2 = 2.7$ kg², whence it follows that the standard deviation is $s_x = 1.7$ kg, and the coefficient of variation is $V_x = 0.6$. □

10.A.6. Prove that the entropy is maximal if the nominal values are distributed uniformly, i.e., the relative frequency of each class is $p_i = \frac{1}{n}$.

Solution.
By the definition of entropy (see 10.1.11), we are looking for the maximum of the function $H_x = -\sum_{i=1}^{n} p_i \ln p_i$ with respect to the unknown relative frequencies $p_i = \frac{n_i}{n}$, which satisfy $\sum_{i=1}^{n} p_i = 1$. Therefore, this is a typical example of finding constrained extrema, which can be solved using Lagrange multipliers. The corresponding Lagrange function is
$$L(p_1, \dots, p_n, \lambda) = -\sum_{i=1}^{n} p_i \ln p_i + \lambda\Big(\sum_{i=1}^{n} p_i - 1\Big).$$
The partial derivatives are $\frac{\partial L}{\partial p_i} = -\ln p_i - 1 + \lambda$; hence the stationary point is determined by the equations $p_i = e^{\lambda - 1}$ for all $i = 1, \dots, n$. Moreover, we know that the sum of the relative frequencies $p_i$ is equal to one. This means that $n\,e^{\lambda - 1} = 1$, whence we get $\lambda = 1 - \ln n$. Substitution then yields $p_i = \frac{1}{n}$. □

10.1.5. Median, quartile, decile, percentile, ... Another way of expressing the position or distribution of the values is to find, for a number $\alpha$ between zero and one, such a value $x_\alpha$ that $100\alpha\,\%$ of the values from the set are at most $x_\alpha$ and the remaining ones are greater than $x_\alpha$. If such a value is not unique, one can choose the mean of the two nearest possibilities. The number $x_\alpha$ is called the $\alpha$-quantile. Thus, if the result of a contestant puts him into $x_{1.00}$, it does not mean that he is better than anyone else yet; however, there is surely no one better than him. The most common values of $\alpha$ are the following:
• The median (also sample median) is defined by
$$\tilde{x} = x_{0.50} = \begin{cases} x_{((n+1)/2)} & \text{for odd } n, \\ \frac{1}{2}\big(x_{(n/2)} + x_{(n/2+1)}\big) & \text{for even } n, \end{cases}$$
where $x_{(i)}$ corresponds to the value in the sorted data set 10.1.3(1).
• The first and third quartiles are $Q_1 = x_{0.25}$ and $Q_3 = x_{0.75}$, respectively.
• The $p$-th quantile (also sample quantile or percentile) $x_p$, where $0 < p < 1$ (usually rounded to two decimal places).
One can also meet the mode, which is the value $\hat{x}$ that is most frequent in the data set $x$. The arithmetic mean, median (with ratio types), and mode (with ordinal or nominal types) correspond to the "anticipated" values.
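The three means can be computed directly from the carp table of exercise 10.A.5. The following illustrative sketch (plain Python, using the class midpoints as representatives) also checks the inequality chain stated above, and previews exercise 10.A.9, where equal distances driven at two speeds make the harmonic mean the right notion of average speed.

```python
# The three means of 10.1.4 on the carp data of 10.A.5:
# class midpoints a_j with class frequencies n_j.
import math

mid = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
freq = [75, 90, 97, 63, 48, 42, 10]
n = sum(freq)                                   # 425 carps

arith = sum(a * f for a, f in zip(mid, freq)) / n
geom = math.exp(sum(f * math.log(a) for a, f in zip(mid, freq)) / n)
harm = n / sum(f / a for a, f in zip(mid, freq))

print(round(arith, 1), round(geom, 1), round(harm, 1))   # -> 2.7 2.1 1.5
assert harm <= geom <= arith        # harmonic <= geometric <= arithmetic

# 10.A.9 preview: equal distances at 160 and 120 km/h
v_avg = 2 / (1 / 160 + 1 / 120)     # harmonic mean of the two speeds
assert abs(v_avg - 960 / 7) < 1e-9  # about 137.1 km/h, not (160+120)/2
```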
Note that all $\alpha$-quantiles with interval scales are invariant with respect to affine transformations of the values (check this yourselves!).

10.1.6. Measures of the variability. Surely any measure of the variability of a data set $x \in \mathbb{R}^n$ should be invariant with respect to constant translations. In the Euclidean space $\mathbb{R}^n$, both the standard distance and the sample mean have this property. Therefore, choose the following:

10.A.7. The following graphs depict the frequencies of the particular amounts of points obtained by students of the MB104 lecture at the Faculty of Informatics of Masaryk University in 2012. The axes of the cumulative graph are "swapped", as opposed to the previous example. The frequencies of the particular amounts of points are enumerated in the following table:

points:    20.5  20  19  18.5  18  17.5  17  16.5  16  15.5  15  14.5  14  13.5  13  12.5  12  11.5  11  10.5  10
students:  1     1   2   1     2   3     2   4     3   5     7   6     14  21    21  19    17  18    31  22    53

points:    9.5  9  8.5  8  7.5  7  6.5  6  5.5  5  4.5  4  3.5  3  2.5  2   1.5  1  0.5  0
students:  9    9  13   8  13   4  7    4  8    7  9    5  7    8  8    14  8    2  6    9

The corresponding histogram looks as follows:

[Figure: Histogram of the points, taken from the Information System of Masaryk University]

We can see that the data are shown in a somewhat unusual way: individual amounts of points correspond to "double rectangles". It is a matter of taste how to represent the data (it is possible to merge some

Variance and standard deviation
Definition. The variance of a data set $x$ is defined by
$$s_x^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2.$$
The standard deviation $s_x$ is defined to be the square root of the variance.

As requested, the variability of statistical values is independent of a constant translation of all values. Indeed, the unsorted data set $y = (x_1 + c, x_2 + c, \dots, x_n + c)$ has the same variance, $s_y^2 = s_x^2$. Sometimes, the sample variance is used, where there is $(n-1)$ in the denominator instead of $n$. The reason will be clear later, cf. 10.3.2.
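The role of the $(n-1)$ denominator can already be previewed by simulation. The sketch below (our own illustration, not from the text) draws many small samples from a population with known variance, the uniform distribution on $[0,1]$ with variance $1/12$, and compares the averages of the two estimators: the $1/n$ formula underestimates the population variance systematically, while the $1/(n-1)$ version does not.

```python
# Why divide by (n - 1)? Average both variance formulas over many samples
# from a population whose true variance is known (uniform on [0,1]: 1/12).
import random

random.seed(1)

def var_n(xs):          # the descriptive variance of 10.1.6 (divide by n)
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def var_sample(xs):     # the sample variance (divide by n - 1)
    return var_n(xs) * len(xs) / (len(xs) - 1)

n, trials = 5, 20000
avg_n = avg_s = 0.0
for _ in range(trials):
    xs = [random.random() for _ in range(n)]
    avg_n += var_n(xs) / trials
    avg_s += var_sample(xs) / trials

true_var = 1 / 12
# var_n is too small on average (by the factor (n-1)/n); var_sample is not
assert avg_n < true_var < avg_n * 1.35
assert abs(avg_s - true_var) < 0.003
```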
In the case of class frequencies $n_j$ of the values $a_j$ for $m$ classes, this expression leads to the value of the variance $s_x^2 = \frac{1}{n}\sum_{j=1}^{m} n_j (a_j - \bar{x})^2$. In practice, it is recommended to use Sheppard's correction, which decreases $s_x^2$ by $h^2/12$, where $h$ is the width of the intervals that define the classes. Further, one can encounter the data-set range $R = x_{(n)} - x_{(1)}$ and the interquartile range $Q_3 - Q_1$, as well as the mean deviation, which is defined as the mean distance of the values from the median:
$$d_x = \frac{1}{n}\sum_{i=1}^{n} |x_i - \tilde{x}|.$$
The following theorem clarifies why these measures of variability are chosen:

Theorem. The function $S(t) = \frac{1}{n}\sum_{i=1}^{n} (x_i - t)^2$ has its minimum value at $t = \bar{x}$, i.e., at the sample mean. The function $d(t) = \frac{1}{n}\sum_{i=1}^{n} |x_i - t|$ has its minimum value at $t = \tilde{x}$, i.e., at the median.

Proof. The minimum of the quadratic polynomial $f(t) = \sum_{i=1}^{n} (x_i - t)^2$ is at the only root of its derivative
$$f'(t) = -2\sum_{i=1}^{n} (x_i - t).$$
Since the sum of the deviations of all values from the sample mean is zero, $t = \bar{x}$ is the requested root, and the first proposition is proved. As for the second proposition, return to the definition of the median. For this purpose, rearrange the sum so that the first and the last summands are added, then the second and the last-but-one summands, etc. In the first case, this leads to the

values, thereby decreasing the number of rectangles, or to use thinner rectangles). □

[Figure: Cumulative graph of the points, with the students' rank on the horizontal axis]

We can notice that the mode of the values is 10, which, accidentally, was also the number of points necessary to pass the course. The mean of the obtained points is 9.48.

10.A.8. Here, we present column diagrams of the amounts of points of MB101 students in autumn 2010 (the very first semester of their studies). The first one corresponds to all students of the course; the second one to those who (3 years later) successfully finished their studies and got the bachelor's degree.
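The mode and mean quoted for the MB104 points can be checked directly against the frequency table of 10.A.7; the short sketch below (illustrative code) does exactly that.

```python
# Mode and mean of the MB104 point data of 10.A.7, computed from the
# frequency table (point value -> number of students).
freq = {
    20.5: 1, 20: 1, 19: 2, 18.5: 1, 18: 2, 17.5: 3, 17: 2, 16.5: 4,
    16: 3, 15.5: 5, 15: 7, 14.5: 6, 14: 14, 13.5: 21, 13: 21, 12.5: 19,
    12: 17, 11.5: 18, 11: 31, 10.5: 22, 10: 53, 9.5: 9, 9: 9, 8.5: 13,
    8: 8, 7.5: 13, 7: 4, 6.5: 7, 6: 4, 5.5: 8, 5: 7, 4.5: 9, 4: 5,
    3.5: 7, 3: 8, 2.5: 8, 2: 14, 1.5: 8, 1: 2, 0.5: 6, 0: 9,
}
n = sum(freq.values())                      # total number of students
mode = max(freq, key=freq.get)              # point value with the largest count
mean = sum(p * c for p, c in freq.items()) / n

print(n, mode, round(mean, 2))              # -> 411 10 9.48
```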
expression $|x_{(1)} - t| + |x_{(n)} - t|$, and this is equal to the distance $x_{(n)} - x_{(1)}$ provided $t$ lies inside the range, and it is even greater otherwise. Similarly, the next pair in the sum gives $x_{(n-1)} - x_{(2)}$ if $x_{(2)} \le t \le x_{(n-1)}$, and it is greater otherwise. Therefore, the minimality assumption leads to $t = \tilde{x}$. □

In practice, it is required to compare the variability of data sets of different statistical populations. For this purpose, it is convenient to relativize the scale, and so use the coefficient of variation of a data set $x$:
$$V_x = \frac{s_x}{|\bar{x}|}.$$
This relative measure of variability can be perceived as the percentage of the deviation with respect to the sample mean $\bar{x}$.

10.1.7. Skewness of a data set. If the values of a data set are distributed symmetrically around the mean value, then $\bar{x} = \tilde{x}$. However, there are distributions where $\bar{x} > \tilde{x}$. This is common, for instance, with the distribution of salaries in a population, where the mean is driven up by a few very large incomes, while much of the population is below the average. A useful characteristic concerning this is the Pearson coefficient, given by
$$\beta = 3\,\frac{\bar{x} - \tilde{x}}{s_x}.$$
It estimates the relative measure (the absolute value of $\beta$) and the direction of the skewness (the sign). In particular, note that the standard deviation is always positive, so it is already the sign of $\bar{x} - \tilde{x}$ which shows the direction of the skewness.

Quantile coefficients of skewness
More detailed information can be obtained from the quantile coefficients of skewness
$$\beta_p = \frac{x_{1-p} + x_p - 2\tilde{x}}{x_{1-p} - x_p}$$
for each $0 < p < 1/2$. Their meaning is clear when the numerator is expressed as $(x_{1-p} - \tilde{x}) - (\tilde{x} - x_p)$. In particular, the quartile coefficient of skewness is obtained by selecting $p = 0.25$.

Again, the results can be depicted in an alternative way:

10.1.8. Diagrams. People's eyes are well suited to perceiving information with a complicated structure. That is why there exist many standardized tools for displaying statistical data or their correlations.
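The theorem in 10.1.6 can also be checked by brute force: over a fine grid of candidate centres $t$, the mean squared deviation $S(t)$ is smallest at the arithmetic mean, and the mean absolute deviation $d(t)$ at the median. The sketch below (illustrative, reusing the data of 10.A.4) does this numerically.

```python
# Numeric check of the minimization theorem of 10.1.6.
x = [10, 7, 7, 8, 8, 9, 10, 9, 4, 9, 10, 9, 11, 9, 7, 8, 3, 9, 8, 7]
n = len(x)
mean = sum(x) / n
s = sorted(x)
median = (s[n // 2 - 1] + s[n // 2]) / 2

def S(t):   # mean squared deviation from t
    return sum((xi - t) ** 2 for xi in x) / n

def d(t):   # mean absolute deviation from t
    return sum(abs(xi - t) for xi in x) / n

grid = [i / 100 for i in range(0, 1501)]    # t = 0.00, 0.01, ..., 15.00
best_S = min(grid, key=S)
assert abs(best_S - mean) < 1e-9            # minimized exactly at the mean

# for even n, d(t) is flat between the two middle values, and the median
# attains the minimum (here d is minimal on the whole interval [8, 9])
assert d(median) <= min(d(t) for t in grid) + 1e-12
```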
One of them is the box diagram.

[Figure: Column diagrams of the points of all MB101 students]

And these are the graphs of the amounts of points obtained by those students who continued their studies:

[Figure: Column diagrams of the points of the students who finished the bachelor's degree]

We can see that in the former case the mode is equal to 0, while in the latter case it is 10 again. The frequency distribution is close to that of the MB104 course, which is recommended for the fourth semester.

Box diagram
The diagram illustrates a histogram and a box diagram of the same data set (normal distribution with mean equal to 10 and variance equal to 3, $n = 500$). The middle line is the median; the edges of the box are the quartiles; the "paws" show 1.5 of the interquartile range, but not more than the edges of the sample range. Potential outliers are indicated too.

Common displaying tools allow us to view potential dependencies of two data sets. For instance, in the left-hand diagram below, the coordinates are chosen as the values of two independent normal distributions with mean equal to 10 and variance equal to 3. In the right-hand illustration, the first coordinate is from the same data set, and the second coordinate is given by the formula $y = 3x + 4$, perturbed with a small error.

10.1.9. Covariance matrix. Actually, the dependencies between several data sets associated to the same statistical units are at the core of our interest in many real-world problems. When defining the variance in 10.1.6 above, we employed the Euclidean distance, i.e. we evaluated the scalar product of the vector of deviations from the mean with itself. Thus, having two vectors of data sets, we may define:

10.A.9. A car was traveling from Brno to Prague at 160 km/h, and then back from Prague to Brno at 120 km/h. What was its average speed?

Solution.
This is an example where one might think of using the arithmetic mean, which is incorrect. The arithmetic mean would be the correct result if the car had spent the same period of time going at each speed. However, in this case, it traveled the same distance, not the same time, at each speed. Denoting by $d$ the distance between Brno and Prague and by $v_p$ the average speed, we obtain
$$v_p = \frac{2d}{\frac{d}{160} + \frac{d}{120}} = \frac{2}{\frac{1}{160} + \frac{1}{120}} = \frac{960}{7} \approx 137.1.$$
Therefore, the average speed is the harmonic mean (see 10.1.4) of the two speeds. □

B. Visualization of multidimensional data

The above examples were devoted to displaying one numerical characteristic measured for several objects (the number of points obtained by individual students, for example). Graphical visualization of data helps us understand them better. However, how do we depict the data if we measure $p$ different characteristics, $p > 3$, of $n$ objects? Such measurements cannot be displayed using the graphs we have met.

10.B.1. One possible method is the so-called principal component analysis. In this method, we use the eigenvectors and eigenvalues (see 2.4.2) of the sample covariance matrix (see 10.2.35). We will use the following notation:
• the random vectors of the measurements $x_i = (x_{i1}, x_{i2}, \dots, x_{ip})$, $i = 1, \dots, n$,
• the mean of the $j$-th component $m_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$, $j = 1, \dots, p$,
• the sample variance of the $j$-th component,
• the vector of means $m = (m_1, \dots, m_p)$,
• the sample covariance matrix $S = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - m)(x_i - m)^T$ (note that each summand is a $p$-by-$p$ matrix).
The covariance matrix is symmetric, hence all its eigenvalues are real and its eigenvectors are pairwise orthogonal. Moreover, considering the eigenvectors of unit length, we

Covariance and covariance matrix
Consider two data sets $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ and their means $\bar{x}$, $\bar{y}$. We define their covariance by the formula
$$\operatorname{cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}).$$
If there are $k$ sample sets $x^{(1)} = (x_1^{(1)}, \dots, x_n^{(1)})$, ..., $x^{(k)} = (x_1^{(k)}, \dots, x_n^{(k)})$, then their covariance matrix is the symmetric matrix $C = (c_{ij})$ with $c_{ij} = \operatorname{cov}(x^{(i)}, x^{(j)})$. Again, the sample covariance and the sample covariance matrix are defined by the same formulae with $n$ replaced by $(n-1)$.

Clearly, the covariance matrix has the variances of the individual data sets on its diagonal. In order to imagine what the covariance should say, consider the two possible behaviours of two data sets: (a) they deviate from their means in a very similar way (comparing $x_i$ and $y_i$ individually); (b) they behave quite independently. In the first case, we should expect that the signs of the deviations mostly coincide, and thus the sum in the definition leads to a rather large positive number. In the other case, the signs should be rather independent, and thus the positive and negative contributions should effectively cancel each other in the covariance sum. Thus we expect the covariance of data sets expressing independent features to be close to zero, while the covariance of dependent sets should be far from zero. The sign of the covariance shows the character of the dependence. For example, the two sets of data depicted in the left-hand diagram above had covariance about $-0.11$, while the covariance of the data from the right-hand picture was about $25.9$. Similarly to the variance, we are often interested in normalized values. The correlation coefficient takes the covariance and divides it by the standard deviations of each of the data sets. In our two latter cases, the correlation coefficients are about $-0.01$ and $0.99$. As expected, they very clearly indicate which of the data are correlated.

10.1.10. Principal component analysis. If we deal with statistics involving many parameters and we need to decide quickly about their similarity (correlation) with some given patterns, we may use a simple idea from linear algebra. Assume we have $k$ data sets $x^{(i)}$.
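The sign-cancellation intuition above is easy to test on synthetic numbers. The sketch below uses made-up data mimicking the right-hand illustration ($y = 3x + 4$ plus a small perturbation; both the values and the perturbations are our own choices) and computes the covariance and the correlation coefficient exactly as just defined.

```python
# Covariance and correlation coefficient from 10.1.9 on synthetic data.
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):                 # the 1/n covariance of the definition
    mx, my = mean(xs), mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)

def corr(xs, ys):                # covariance normalized by both deviations
    return cov(xs, ys) / (cov(xs, xs) ** 0.5 * cov(ys, ys) ** 0.5)

x = [7.2, 9.9, 10.4, 8.5, 12.1, 10.8, 9.3, 11.6]
noise = [0.2, -0.1, 0.3, -0.2, 0.1, 0.0, -0.3, 0.2]
y = [3 * xi + 4 + e for xi, e in zip(x, noise)]   # nearly linear dependence

r = corr(x, y)
assert 0.99 < r <= 1.0           # strongly correlated, as expected
assert cov(x, x) > 0             # the diagonal of C carries the variances
```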
Since their covariance matrix $C$ is symmetric, there is an orthonormal basis $e$ of $\mathbb{R}^k$ in which the quadratic form given by $C$ enjoys a diagonal matrix. The relevant basis $e$ consists of the real eigenvectors $e_i \in \mathbb{R}^k$ for the eigenvalues $\lambda_i$. The bigger the absolute value $|\lambda_i|$, the bigger the variation of the orthogonal projection $x$ of all the $k$ data sets onto the one-dimensional subspace spanned by $e_i$. Thus we may restrict ourselves to just this one data set $x$ and consider the statistics concerning this one set as representing the multi-parametric data sets $x^{(i)}$. Similarly, we may

can see that the eigenvalue corresponding to an eigenvector of the covariance matrix yields the variance of (the size of) the projection of the data onto this direction (the projection takes place in the $p$-dimensional space). The goal of this method is to find the direction (in the $p$-dimensional space of the measured characteristics) for which the variance of the projections is as great as possible. Thus, this direction corresponds to the eigenvector of the covariance matrix whose eigenvalue is the greatest. The linear combination given by the components of this vector is called the first principal component. The size of the projection onto this direction estimates the data quite well (the principal component can be viewed as a characteristic which substitutes for the $p$ characteristics, i.e., it is a random vector with $n$ components). If we subtract this projection from the data and consider the direction of the greatest variance again, we get the second principal component. Repeating this procedure further, we obtain the other principal components. The directions of the principal components correspond to the eigenvectors of the covariance matrix, in decreasing order with respect to the size of the corresponding eigenvalues.

10.B.2.
Find the first principal component of the following simple data and the vector which substitutes for them: Five people had their index finger length, little finger length, and height measured. The measured data are shown in the following table (in centimeters).

Solution.

           Martin  Michael  Matthew  John  Peggy
index f.   9       11       8        8     8
little f.  7.5     8        6.3      6     6.5
height     186     187      173      174   167

The vectors of the collected data are $x_1 = (9, 7.5, 186)$, $x_2 = (11, 8, 187)$, $x_3 = (8, 6.3, 173)$, $x_4 = (8, 6, 174)$, $x_5 = (8, 6.5, 167)$. The matrices $(x_i - m)(x_i - m)^T$ of these vectors (with the deviations rounded to one decimal place) are:
$$\begin{pmatrix} 0.04 & 0.14 & 1.72 \\ 0.14 & 0.49 & 6.02 \\ 1.72 & 6.02 & 73.96 \end{pmatrix},\quad \begin{pmatrix} 4.84 & 2.64 & 21.12 \\ 2.64 & 1.44 & 11.52 \\ 21.12 & 11.52 & 92.16 \end{pmatrix},\quad \begin{pmatrix} 0.64 & 0.64 & 3.52 \\ 0.64 & 0.64 & 3.52 \\ 3.52 & 3.52 & 19.36 \end{pmatrix},$$
$$\begin{pmatrix} 0.64 & 0.64 & 2.72 \\ 0.64 & 0.64 & 2.72 \\ 2.72 & 2.72 & 11.56 \end{pmatrix},\quad \begin{pmatrix} 0.64 & 0.24 & 8.32 \\ 0.24 & 0.09 & 3.12 \\ 8.32 & 3.12 & 108.16 \end{pmatrix}.$$

also use several of the biggest eigenvalues instead of one and reduce the dimension of our parameter space in this way. Finally, considering the unit-length eigenvector $(\alpha_1, \dots, \alpha_k)$ corresponding to the chosen eigenvalue $\lambda$, the values $\alpha_j$ provide the right coefficients in the orthogonal projection $(x^{(1)}, \dots, x^{(k)}) \mapsto x = \alpha_1 x^{(1)} + \dots + \alpha_k x^{(k)}$. See the exercise 10.B.2 for an illustration, together with another description of how to proceed with the data in 10.B.1. The latter approach is called the principal component analysis.

10.1.11. Entropy. We also need to describe the variability of data sets even with nominal types, for instance in statistical physics or information theory. The only thing at our disposal is the class frequencies, so the principle of classical probability can be used (see the fourth part of chapter one). There, the relative frequency of the $i$-th class, $p_i = \frac{n_i}{n}$, is understood to be the probability that a random object belongs to this class. The variance of ratio-type values with class frequencies $n_j$ was given by the formula (see 10.1.6)
$$s_x^2 = \sum_j p_j (a_j - \bar{x})^2,$$
where $p_j$ denotes the (classical) probability that the value is in the $j$-th class.
Therefore, it is a weighted mean of the adjusted values, where the weight of the term $(a_j - \bar{x})^2$ is $p_j$. The variability of nominal values is expressed similarly (denote it by $H_x$). Even though there are no numerical values $a_j$ for the indices $j$, we can be interested in functions $F$ that depend on the relative frequencies $p_j$. For a data set $x$, we can define
$$H_x = \sum_j p_j F(p_j),$$
where $F$ is an unknown function with some reasonable properties. If the data set has only one value, i.e. $p_k = 1$ for some $k$ and $p_j = 0$ otherwise, then we agree that the variability is zero, and so $F(1) = 0$. Moreover, $H_x$ is required to have the following property: if a data set $Z$ consists of pairs of values from data sets $X$ and $Y$ (for example, one can observe the eye colour and the hair colour of people, the statistical units), it is reasonable that the variability of $Z$ be the sum of the variabilities, that is, $H_Z = H_X + H_Y$. The relative class frequencies $p_i$ for the values of the data set $X$ and $q_j$ for those of $Y$ are known. The relative class frequencies for $Z$ are then $r_{ij} = p_i q_j$, so we demand the equality (the ranges of the sums are clear from the context)
$$\sum_{i,j} p_i q_j F(p_i q_j) = \sum_i p_i F(p_i) + \sum_j q_j F(q_j).$$

The sample covariance matrix is then a quarter of the sum of the five matrices above, i.e.,
$$S = \begin{pmatrix} 1.70 & 1.075 & 9.35 \\ 1.075 & 0.825 & 6.725 \\ 9.35 & 6.725 & 76.30 \end{pmatrix}.$$
The eigenvalues of the summed matrix $4S$ are approximately 2.7, 312.2, and 0.38 (the eigenvalues of $S$ itself are a quarter of these). The unit eigenvector corresponding to the greatest one is approximately $(0.122, 0.09, 0.989)$. Thus, the first principal component is $(185.5, 186.8, 172.4, 173.4, 166.5)$, which is not far from the people's heights. □

10.B.3. The students of a class had the following marks in various subjects (see the table below). Find the first principal component of the data and the vector which substitutes for them.

Solution. The vectors of observation are $x_1 = (1, 1, 2, 2, 1)$, ..., $x_{10} = (2, 3, 1, 2, 1)$. The corresponding matrices $(x_i - m)(x_i - m)^T$ for $x_1$ and $x_{10}$ are:
$$\begin{pmatrix} 1.21 & 1.10 & -0.33 & -0.33 & 0.11 \\ 1.10 & 1.00 & -0.30 & -0.30 & 0.10 \\ -0.33 & -0.30 & 0.09 & 0.09 & -0.03 \\ -0.33 & -0.30 & 0.09 & 0.09 & -0.03 \\ 0.11 & 0.10 & -0.03 & -0.03 & 0.01 \end{pmatrix},\quad \begin{pmatrix} 0.01 & -0.10 & 0.07 & -0.03 & 0.01 \\ -0.10 & 1.00 & -0.70 & 0.30 & -0.10 \\ 0.07 & -0.70 & 0.49 & -0.21 & 0.07 \\ -0.03 & 0.30 & -0.21 & 0.09 & -0.03 \\ 0.01 & -0.10 & 0.07 & -0.03 & 0.01 \end{pmatrix}.$$
The sample covariance matrix is then one ninth of the sum of the ten matrices $(x_i - m)(x_i - m)^T$, i.e.,
$$S = \begin{pmatrix} 0.99 & 0.44 & -0.08 & 0.26 & -0.01 \\ 0.44 & 0.89 & -0.22 & 0.22 & -0.11 \\ -0.08 & -0.22 & 0.45 & 0.23 & 0.03 \\ 0.26 & 0.22 & 0.23 & 0.45 & -0.08 \\ -0.01 & -0.11 & 0.03 & -0.08 & 0.10 \end{pmatrix}.$$
The dominant eigenvalue of the summed matrix $9S$ is about 13.68 (hence about 1.52 for $S$ itself), and the corresponding unit eigenvector is approximately $(0.70, 0.65, -0.13, 0.28, -0.07)$. Therefore, the first principal

Student id  Maths  Physics  History  English  PE
1           1      1        2        2        1
2           1      3        1        1        1
3           2      1        1        1        1
4           2      2        2        2        1
5           1      1        3        2        1
6           2      1        2        1        2
7           3      3        2        2        1
8           3      2        1        1        1
9           4      3        2        3        1
10          2      3        1        2        1

Since $p_i$ and $q_j$ are relative frequencies, they sum to 1. So the right-hand side of the equality can be rewritten using the factors $\sum_j q_j = 1$ and $\sum_i p_i = 1$, leading to
$$\sum_{i,j} p_i q_j F(p_i q_j) = \sum_{i,j} p_i q_j \big(F(p_i) + F(q_j)\big).$$
This is satisfied by any constant multiple of a logarithm of any fixed base $a > 1$. It can be shown that no other continuous solution $F$ exists. Since $p_i \le 1$, we have $\log_a p_i \le 0$. The variability must be non-negative, so $F$ is chosen to be a logarithmic function multiplied by $-1$. Such a choice also satisfies $F(1) = 0$, as desired.

Entropy
The measure of variability of nominal values is expressed in terms of entropy. It is given by
$$H_x = -\sum_{j=1}^{k} p_j \ln p_j,$$
where $k$ is the number of sample classes. Sometimes (especially in information theory), the binary logarithm is used instead of the natural logarithm.

One often works with the quantity $e^{H_x}$ (or with the corresponding power of another logarithm base). In this form, for a data set $X$ with $k$ equal class frequencies, we compute $H_x = \ln k$, so $e^{H_x} = k$, which is independent of the sample size. The next illustration shows the 2-based entropy $y$ for the number of occurrences of the letters a and b in 10-letter words consisting of these characters, where $x$ is the number of occurrences of b.
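The maximality of the uniform distribution proved in 10.A.6 can also be seen numerically; the illustrative sketch below compares the entropy of random frequency distributions over $k = 8$ classes against the uniform one, whose entropy is $\ln k$ (so that $e^{H} = k$).

```python
# Entropy H_x = -sum p_j ln p_j of 10.1.11: the uniform distribution over
# k classes maximizes it, with H = ln k.
import math
import random

def entropy(freqs):
    n = sum(freqs)
    return -sum(f / n * math.log(f / n) for f in freqs if f > 0)

k = 8
uniform = entropy([1] * k)
assert abs(uniform - math.log(k)) < 1e-12
assert abs(math.exp(uniform) - k) < 1e-9     # e^H recovers the class count

random.seed(0)
for _ in range(100):                         # random class frequencies
    freqs = [random.randint(1, 50) for _ in range(k)]
    assert entropy(freqs) <= uniform + 1e-12
```

Replacing `math.log` by `math.log2` gives the binary entropy used in the illustrations, with maximum $\log_2 2 = 1$ for two equally frequent letters.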
Note that the maximum entropy 1 occurs for the same number of $a$'s and $b$'s, and indeed $2^1 = 2$ as computed above. The following illustration displays the entropy of 11 randomly chosen strings of length 10 made of 8 characters. The values are all much less than the theoretical maximal value of 3. This reflects the fact that the numbers of occurrences of the individual 8 characters cannot all be equal (or rather, this could happen only with a very small probability, and only if the length of the string were a multiple of 8). If the same is done with, say, strings of length 10000, we get very close to 3 (typically, the difference is of the order of $10^{-3}$, if the random string generator is good enough).

component is $(1.58,\ 2.73,\ 2.13,\ 2.93,\ 1.45,\ 1.93,\ 4.28,\ 3.48,\ 5.26,\ 3.71)$. □

Another possible method of visualization of multidimensional data is the so-called cluster analysis, but we will not go into further details here. ♦

C. Classical and conditional probability

In the first chapter, we met the so-called classical probability, see 1.4.1. Just to recall it, let us try to solve the following (a bit more complicated) problem:

10.C.1. Ales wants to buy a new bike, which costs 5100 crowns. He has 2500 crowns left from organizing a camp. Ales is no dope: he took 50 more crowns from his pocket money and went to the casino to play roulette. Ales always bets on red. This means that the probability of winning is 18/37, and the amount he wins is equal to the amount he has bet. His betting strategy is as follows: The first time, he bets 10 crowns. Each time he loses, he bets twice the previous bet (if he does not have enough money to make this bet, he leaves the casino, deeply depressed). Each time he wins, he bets 10 crowns again. What is the probability that, using this strategy, he wins the desired 2550 more crowns? (As soon as this happens, he immediately runs to buy the bike.)

Solution. First of all, we calculate how many times Ales can lose in a row. If he bets 10 crowns the first time, then in order to bet $n$ times in a row, he needs
$$10 + 20 + \cdots + 10\cdot 2^{n-1} = 10\,\frac{2^n - 1}{2 - 1} = 10\,(2^n - 1)$$
crowns. As we can see, the number 2550 is of the form $10(2^n - 1)$ for $n = 8$. This means that Ales can always bet eight times in a row. He can never bet nine times in a row, because for that he would have to have $10(2^9 - 1) = 5110$ crowns, which he will never reach (he stops betting as soon as he has 5100 crowns). Therefore, Ales loses the whole game if and only if he loses eight consecutive bets. The probability of losing one bet is 19/37; hence, the probability of losing eight consecutive (independent) bets is $(19/37)^8$. Thus, the probability that he wins 10 crowns (using his strategy) is $1 - (19/37)^8$. In order to win 2550 crowns, he must win 255 times, and the probability of this is
$$\left(1 - \left(\tfrac{19}{37}\right)^{8}\right)^{255}.$$

2. Probability

Before further reading, the reader is advised to go through the fourth part of chapter one (the subsection beginning on page 18). Back then, we worked mainly with classical finite probability. We defined the basics of a formalism which we extend now. The main extension is that the sample space $\Omega$ can be infinite, even uncountable. Recall that when we talked about geometric probability at the end of the fourth part of chapter one, the sample space for the description of an event was a part of the Euclidean space, and events were suitable subsets of it. All of those sets were uncountable. We begin with a simple (infinite, yet still discrete) example, to which we return from time to time throughout this section.

10.2.1. Why infinite sets of events? Imagine an experiment where a coin is repeatedly tossed until it comes up heads. There are many questions to be asked about this experiment: What is the probability of tossing the coin at least 3 times (or exactly 3 times, or at most 10 times, etc.)?
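Such questions can already be explored numerically. A quick sketch (Python; our illustration, not the book's): "at least 3 tosses" simply means that the first two tosses are tails, so the exact answer is $1/4$, which a simulation of the experiment confirms.

```python
import random

# P(first head exactly on toss k) = 2**(-k), so
# P(at least 3 tosses) = sum over k >= 3, which telescopes to 1/4.
exact_at_least_3 = sum(2.0 ** -k for k in range(3, 60))

def tosses_until_heads():
    """Simulate one run: count fair-coin tosses until heads appears."""
    n = 1
    while random.random() < 0.5:  # tails with probability 1/2
        n += 1
    return n

random.seed(1)
runs = 100_000
estimate = sum(tosses_until_heads() >= 3 for _ in range(runs)) / runs
assert abs(exact_at_least_3 - 0.25) < 1e-12
assert abs(estimate - 0.25) < 0.01
```

The helper name `tosses_until_heads` is ours; the model (fair coin, independent tosses) is the one described in the text.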
The outcomes of this experiment can be considered in the form $\omega_k$, $k \in \mathbb{N}_{\ge 1} \cup \{\infty\}$, where $\omega_k$ could be read as "the coin comes up heads for the first time in the $k$-th toss". Note that $k = \infty$ is included, since the possibility that the coin always comes up tails must be allowed, too. This problem is solved if the classical probability 1/2 of the coin coming up heads in one toss is used (and the same for tails). In the abstract model, the total number of tosses cannot be bounded by any natural number $N$. On the other hand, the probability that the coin comes up tails in the first $k - 1$ tosses and heads in the $k$-th one, out of the total number of $n \ge k$ tosses, is given by the fraction
$$\frac{2^{n-k}}{2^n} = 2^{-k},$$
where the numerator is the number of favorable possibilities out of the $n$ independent tosses (i.e. the number of possibilities how to distribute the two values over the $n - k$ remaining positions), while the denominator is the number of all possible outcomes. As expected, this probability is independent of the chosen $n$, and $\sum_{k=1}^{\infty} 2^{-k} = 1$. Therefore, the probability of tossing only tails is zero. Thus we can define probability on the sample space $\Omega$ with sample points (outcomes) $\omega_k$, whose probability is $2^{-k}$. This leads to a probability according to the definitions below. We return to this example throughout this section.

Therefore, the probability of winning using his strategy is much lower than if he bet everything on red straightaway. □

10.C.2. You could try to solve a slight modification of the above problem: Joe stops playing only if he loses all his money; if he still has some money, but not enough to bet twice the previous bet, he bets 10 dollars again.

We also met the conditional probability in the first chapter, see 1.4.8.

10.C.3. Let $A$, $B$ be two events such that $B$ is a disjoint union of events $B_1, B_2, \dots, B_n$. Using the definition of conditional probability (see 10.2.6), prove that
$$\text{(1)}\qquad P(A|B) = \sum_{i=1}^{n} P(A|B_i)\,P(B_i|B).$$

Solution. First, note that the events $A\cap B_1, A\cap B_2, \dots, A\cap B_n$ are also disjoint. Therefore, we can write
$$P(A|B_1\cup\cdots\cup B_n) = \frac{P\big(A\cap(B_1\cup\cdots\cup B_n)\big)}{P(B_1\cup\cdots\cup B_n)} = \frac{P\big((A\cap B_1)\cup(A\cap B_2)\cup\cdots\cup(A\cap B_n)\big)}{P(B)} = \sum_{i=1}^{n}\frac{P(A\cap B_i)}{P(B_i)}\cdot\frac{P(B_i)}{P(B)} = \sum_{i=1}^{n} P(A|B_i)\,P(B_i|B). \qquad\square$$

10.C.4. We have four bags with balls: In the first bag, there are four white balls. In the second bag, there are three white balls and one black ball. In the third bag, there are two white and two black balls. Finally, in the fourth bag, there are four black balls. We randomly pick a bag and take two balls out of it (without putting the first one back). Find the probability that
a) the balls are of different colors;
b) the second ball is white provided the first ball was white.

Solution. Since there is the same number of balls in each of the bags, any ball has the same probability of being taken (similarly for any pair of balls lying in the same bag). Therefore, we can solve this problem using classical probability.
a) Altogether, there are 24 pairs of balls that can be taken. Out of them, 7 consist of balls of different colors. Therefore, the wanted probability is 7/24.

10.2.2. σ-fields. Work with a fixed non-empty set $\Omega$, which contains the possible outcomes of the experiment and which is called the sample space. The possible outcomes $\omega \in \Omega$ are also called sample points. In probability models, not all subsets of outcomes need be admitted. In particular, the singletons $\{\omega\}$ need not be considered. Those subsets whose probability we want to measure are required to satisfy the axioms of the so-called σ-algebras. The axioms listed below are chosen from a larger collection of natural requirements in a minimal form. The first one is based on the assumption that the universal event should be a measurable set. The second one is forced by the assumption that events can be negated.
The third one reflects the necessity to examine the event of the occurrence of at least one event from a countably infinite collection. (For instance, in the example from the previous subsection, the coin is tossed only finitely many times, but there is no upper bound on the number of tosses.)

σ-algebras of subsets

A collection $\mathcal{A}$ of subsets of the sample space is called a σ-algebra or σ-field, and its elements are called events or measurable sets, if and only if
• $\Omega \in \mathcal{A}$, i.e., the sample space is an event;
• if $A, B \in \mathcal{A}$, then $A \setminus B \in \mathcal{A}$, i.e., the set difference of two events is also an event;
• if $A_i \in \mathcal{A}$, $i \in I$, is a countable collection of events, then their union is also an event, i.e., $\bigcup_{i\in I} A_i \in \mathcal{A}$.

As usual, the basic axioms imply simple corollaries which describe further (intuitively required) properties in the form of mathematical theorems. The reader should check carefully that both of the following properties hold.
• The complement $A^c = \Omega \setminus A$ of an event $A$ is again an event.
• The intersection of two events is again an event, since for any two subsets $A, B \subseteq \Omega$, $A \setminus (\Omega \setminus B) = A \cap B$.
Actually, for any countable system of events $A_i$, $i \in I$, the set $\Omega \setminus \bigcup_{i\in I} A_i^c = \bigcap_{i\in I} A_i$ is also an event in the σ-algebra $\mathcal{A}$. Altogether, a σ-algebra is a collection of subsets of the sample space which is closed with respect to set differences, countable unions, and countable intersections.

b) Let $A$ denote the event that the first ball is white and $B$ denote the event that the second ball is white. Then, $P(B \cap A)$ is the probability that both balls are white, and this is equal to $10/24 = 5/12$, since there are 10 such pairs. Again, we can use classical probability to calculate $P(A)$: there are 16 balls in total, and 9 of them are white. Altogether, we have
$$P(B|A) = \frac{P(B\cap A)}{P(A)} = \frac{5/12}{9/16} = \frac{20}{27}.$$

Another solution. The event $A$ can be viewed as the union of three mutually exclusive events $A_1, A_2, A_3$ that we took a white ball from the first, second, and third bag, respectively. Since there is the same number of balls in each of the bags, the probability of taking any (white) ball is also the same (independent of which ball it is), so we get $P(A) = \frac{9}{16}$ and
$$P(A_1|A) = \frac{4}{9}, \qquad P(A_2|A) = \frac{3}{9} = \frac{1}{3}, \qquad P(A_3|A) = \frac{2}{9},$$
while clearly $P(B|A_1) = 1$, $P(B|A_2) = \frac{2}{3}$, $P(B|A_3) = \frac{1}{3}$. Applying (5), we obtain
$$P(B|A) = P(B|A_1)P(A_1|A) + P(B|A_2)P(A_2|A) + P(B|A_3)P(A_3|A) = 1\cdot\frac{4}{9} + \frac{2}{3}\cdot\frac{3}{9} + \frac{1}{3}\cdot\frac{2}{9} = \frac{20}{27}. \qquad\square$$

10.2.3. Probability space. Now introduce probability in the mathematical model, recalling the concepts used already in the first chapter.

Elementary concepts

Use the following terminology in connection with events:
• the entire sample space $\Omega$ is called the universal event; the empty set $\emptyset \in \mathcal{A}$ is called the null event;
• the singletons $\{\omega\}$, $\omega \in \Omega$, are called elementary events (note that $\{\omega\}$ may not even be an event in $\mathcal{A}$);
• the intersection of events $\bigcap_{i\in I} A_i$ corresponds to the simultaneous occurrence of all the events $A_i$, $i \in I$;
• the union of events $\bigcup_{i\in I} A_i$ corresponds to the occurrence of at least one of the events $A_i$, $i \in I$;
• if $A\cap B = \emptyset$, then $A, B \in \mathcal{A}$ are called exclusive events or disjoint events;
• if $A \subseteq B$, then the event $A$ implies the event $B$;
• if $A \in \mathcal{A}$, then the event $B = \Omega \setminus A$ is called the complementary event to $A$ and denoted $B = A^c$.

We have seen an example of probability defined on an infinite sample space in 10.2.1 above. In general, probability is defined as follows:

10.C.5. We have four bags with balls: In the first bag, there are four white balls. In the second bag, there are three white balls and one black ball. In the third bag, there are two white and two black balls. Finally, in the fourth bag, there are one white and three black balls. We randomly pick a bag and take a ball out of it, finding out that it is black. Then we throw away this bag, pick another one, and take a ball out of it.
What is the probability that it is white?

Solution. Similarly as in the above exercise, let $A$ denote the event that the very first ball is black. This event can be viewed as the union of mutually exclusive events $A_i$, $i = 2, 3, 4$, where $A_i$ is the event of picking the $i$-th bag and taking a black ball from there. Again, the probability of picking any (black) ball is the same. Hence, $P(A_2|A) = \frac{1}{6}$, $P(A_3|A) = \frac{2}{6} = \frac{1}{3}$, and $P(A_4|A) = \frac{3}{6} = \frac{1}{2}$. Let $B$ denote the event that the second ball is white. If the thrown-away bag is the second one, then there are a total of 7 white balls remaining, so the probability of taking one of them is $P(B|A_2) = \frac{7}{12}$ (we can use classical probability again because each of the bags contains the same number of balls, so any ball has the same probability of being taken). Similarly, $P(B|A_3) = \frac{8}{12} = \frac{2}{3}$

Probability

Definition. A probability space is the σ-algebra $\mathcal{A}$ of subsets of the sample space $\Omega$ together with a scalar function $P : \mathcal{A} \to \mathbb{R}$ with the following properties:
• $P$ is non-negative, i.e., $P(A) \ge 0$ for all events $A$;
• $P$ is countably additive, i.e.,
$$P\Big(\bigcup_{i\in I} A_i\Big) = \sum_{i\in I} P(A_i)$$
for every countable collection of mutually exclusive events;
• the probability of the universal event is 1.
The function $P$ is called the probability function on $(\Omega, \mathcal{A})$.

Immediately from the definition, the complementary event satisfies $P(A^c) = 1 - P(A)$. In chapter one, theorems on addition of probabilities were derived. Although dealing with finite sample spaces, the arguments remain the same now. In particular, the inclusion-exclusion principle says, for any finite collection of $k$ events $A_i$, that
$$P\Big(\bigcup_{i=1}^k A_i\Big) = \sum_{i=1}^k P(A_i) - \sum_{i=1}^{k-1}\sum_{j=i+1}^{k} P(A_i\cap A_j) + \sum_{i=1}^{k-2}\sum_{j=i+1}^{k-1}\sum_{\ell=j+1}^{k} P(A_i\cap A_j\cap A_\ell) - \cdots + (-1)^{k-1}\, P(A_1\cap A_2\cap\cdots\cap A_k).$$

and $P(B|A_4) = \frac{9}{12} = \frac{3}{4}$. Applying (5), we get that the wanted probability is
$$P(B|A) = P(B|A_2)P(A_2|A) + P(B|A_3)P(A_3|A) + P(B|A_4)P(A_4|A) = \frac{7}{12}\cdot\frac{1}{6} + \frac{2}{3}\cdot\frac{1}{3} + \frac{3}{4}\cdot\frac{1}{2} = \frac{25}{36}. \qquad\square$$

10.C.6. We have four bags with balls: In the first bag, there are a white ball and a black ball. In the second bag, there are three white balls and one black ball. In the third bag, there are one white and two black balls. Finally, in the fourth bag, there are one white and three black balls. We randomly pick a bag and take a ball out of it, finding out that it is white. Then we throw away this bag, pick another one, and take a ball out of it. What is the probability that it is white?

Solution. Similarly as in the above exercise, we view the event $A$ of the first ball being white as the union of four mutually exclusive events $A_1, A_2, A_3$, and $A_4$ that we take a white ball from the first, second, third, and fourth bag, respectively. The probability of taking a white ball out of the first bag is $P(A_1) = \frac{1}{4}\cdot\frac{1}{2}$ (the probability of $A_1$ is the product of the probability that we pick the first bag and the probability that we take a white ball from there); similarly, $P(A_2) = \frac{1}{4}\cdot\frac{3}{4}$, $P(A_3) = \frac{1}{4}\cdot\frac{1}{3}$, $P(A_4) = \frac{1}{4}\cdot\frac{1}{4}$. Note that
$$P(A) = P(A_1) + P(A_2) + P(A_3) + P(A_4) = \frac{11}{24};$$
the probability $P(A)$ cannot be calculated classically, i.e., by simply dividing the number of white balls by the total number of the balls, because, for instance, the probability of taking a white ball from the first bag is twice greater than from the fourth bag. As for the conditional probabilities, we have $P(A_1|A) = P(A_1)/P(A) = \frac{3}{11}$, $P(A_2|A) = \frac{9}{22}$, $P(A_3|A) = \frac{2}{11}$, $P(A_4|A) = \frac{3}{22}$. Now, let $B$ denote the event that we take another white ball after we have thrown away the first bag. We want to apply (5) again. It remains to compute $P(B|A_i)$, $i = 1, \dots, 4$. The probability $P(B|A_1)$ can be computed as the sum of the probabilities of the mutually exclusive events $B_2, B_3, B_4$ (given $A_1$) that the second white ball comes from the second, third, and fourth bag, respectively. Altogether, we have
$$P(B|A_1) = P(B_2|A_1) + P(B_3|A_1) + P(B_4|A_1) = \frac{1}{3}\cdot\frac{3}{4} + \frac{1}{3}\cdot\frac{1}{3} + \frac{1}{3}\cdot\frac{1}{4} = \frac{4}{9}.$$
Similarly, $P(B|A_2) = \frac{1}{3}\left(\frac{1}{2} + \frac{1}{3} + \frac{1}{4}\right) = \frac{13}{36}$ and $P(B|A_3) = \frac{1}{3}\left(\frac{1}{2} + \frac{3}{4} + \frac{1}{4}\right) = \frac{1}{2}$.

The reader should look back at 1.4.5 and think about the details.

10.2.4.
Independent events. The definition of stochastically independent events also remains unchanged. It reflects the intuition that the probability of the simultaneous occurrence of independent events is equal to the product of the particular probabilities.

Stochastic independence

Events $A$, $B$ are said to be stochastically independent if and only if
$$P(A \cap B) = P(A)\,P(B).$$

Of course, the universal event and the null event are stochastically independent of any event. Recall that replacing an event $A_i$ with the complementary event $A_i^c$ in a collection of stochastically independent events $A_1, A_2, \dots$ again results in a collection of stochastically independent events, and (see 1.4.7, page 23)
$$P(A_1 \cup \cdots \cup A_k) = 1 - P(A_1^c \cap \cdots \cap A_k^c) = 1 - \big(1 - P(A_1)\big)\cdots\big(1 - P(A_k)\big).$$

Classical finite probability remains the fundamental example of probability, used as the inspiration during the creation of the mathematical model. Recall that in this case, $\Omega$ is a finite set, the σ-algebra $\mathcal{A}$ is the collection of all subsets of $\Omega$, and the classical probability is the probability space $(\Omega, \mathcal{A}, P)$ with the probability function $P : \mathcal{A} \to \mathbb{R}$,
$$P(A) = \frac{|A|}{|\Omega|}.$$
This corresponds precisely to the intuition about the relative frequency $p_A$ of an event $A$ when drawing a random element from the sample set $\Omega$. This definition of probability guarantees reasonable behaviour of monotone sequences of events:

10.2.5. Theorem. Consider a probability space $(\Omega, \mathcal{A}, P)$ and a non-decreasing sequence of events $A_1 \subseteq A_2 \subseteq \cdots$. Then,
$$P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \lim_{i\to\infty} P(A_i).$$
Similarly, if $A_1 \supseteq A_2 \supseteq A_3 \supseteq \cdots$, then
$$P\Big(\bigcap_{i=1}^{\infty} A_i\Big) = \lim_{i\to\infty} P(A_i).$$

Proof. The considered union $A = \bigcup_{i=1}^{\infty} A_i$ can be rewritten in terms of the mutually exclusive events $\tilde{A}_i = A_i \setminus A_{i-1}$, defined for all $i = 2, 3, \dots$; set $\tilde{A}_1 = A_1$. Then,
$$P(A) = P\Big(\bigcup_{i=1}^{\infty} \tilde{A}_i\Big) = \sum_{i=1}^{\infty} P(\tilde{A}_i) = \lim_{k\to\infty} \sum_{i=1}^{k} P(\tilde{A}_i).$$
For the finite sums,
$$\sum_{i=1}^{k} P(\tilde{A}_i) = P(A_1) + \sum_{i=2}^{k} \big(P(A_i) - P(A_{i-1})\big) = P(A_k),$$
which proves the first part of the theorem. In the second part, consider the complements $B_i = A_i^c$ instead of the events $A_i$. They satisfy the assumptions of the first part of this theorem, and the complement of the considered intersection is
$$B = A^c = \Big(\bigcap_{i=1}^{\infty} A_i\Big)^c = \bigcup_{i=1}^{\infty} B_i.$$
The desired statement follows from the fact that
$$P(A) = 1 - P(B) = 1 - \lim_{i\to\infty} P(B_i) = \lim_{i\to\infty} \big(1 - P(B_i)\big) = \lim_{i\to\infty} P(A_i),$$
which completes the proof. □

and $P(B|A_4) = \frac{1}{3}\left(\frac{1}{2} + \frac{3}{4} + \frac{1}{3}\right) = \frac{19}{36}$. Altogether, we get
$$P(B|A) = P(B|A_1)P(A_1|A) + P(B|A_2)P(A_2|A) + P(B|A_3)P(A_3|A) + P(B|A_4)P(A_4|A) = \frac{4}{9}\cdot\frac{3}{11} + \frac{13}{36}\cdot\frac{9}{22} + \frac{1}{2}\cdot\frac{2}{11} + \frac{19}{36}\cdot\frac{3}{22} = \frac{19}{44}. \qquad\square$$

10.C.7. Two shooters shoot at a target; each makes two shots. Their respective accuracies are 80 % and 60 %. We have found two hits in the target. What is the probability that both of them belong to the first shooter?

Solution. The probability of hitting the target is 4/5 for the first shooter, and 3/5 for the second one. Consider the events:
A ... there are two hits in the target, both of the first shooter,
B ... there are two hits in the target.
Our task is to find $P(A|B)$. We can divide the event $B$ into six disjoint events according to which shot(s) of each shooter was/were successful. We enumerate the events in a table and, for each of them, compute its probability. This is easy, as each of the events is the intersection of four independent events (the results of the four shots). A hit is denoted by 1, a miss by 0.

      | Shooter 1 | Shooter 2 | probability
B1    |   0 1     |   0 1     | $\frac{1}{5}\cdot\frac{4}{5}\cdot\frac{2}{5}\cdot\frac{3}{5} = \frac{24}{625}$
B2    |   0 1     |   1 0     | $\frac{24}{625}$
B3    |   1 0     |   1 0     | $\frac{24}{625}$
B4    |   1 0     |   0 1     | $\frac{24}{625}$
B5    |   1 1     |   0 0     | $\frac{64}{625}$
B6    |   0 0     |   1 1     | $\frac{9}{625}$

Adding up the probabilities of these disjoint events, we get
$$P(B) = \sum_{i=1}^{6} P(B_i) = \frac{169}{625}.$$
Now, we can compute the conditional probability, using the formula of subsection 10.2.6:
$$P(A|B) = \frac{P(A\cap B)}{P(B)} = \frac{P(B_5)}{P(B)} = \frac{64}{169} \approx 0.38. \qquad\square$$

10.2.6. Conditional probability.
Consider the following problem: On average, 40% of students succeed in course X and 80% of students succeed in course Y. If a random student enrolled in both of these courses tells us that he has passed one of them (but we did not catch which one), what is the probability that he meant course X? As mentioned in subsection 1.4.8 (page 24), such problems can be formalized in the way described below. (We shall come back to the solution of this problem in 10.3.12.)

Conditional probability

Definition. Let $H$ be an event with non-zero probability in the σ-algebra $\mathcal{A}$ of a probability space $(\Omega, \mathcal{A}, P)$. The conditional probability $P(A|H)$ of an event $A \in \mathcal{A}$ with respect to the hypothesis $H$ is defined as
$$P(A|H) = \frac{P(A\cap H)}{P(H)}.$$

The definition corresponds to the intuition from classical probability that the probability of the events $A$ and $H$ occurring simultaneously, provided the event $H$ has occurred, is $P(A\cap H)/P(H)$. Directly from the definition, the hypothesis $H$ and the event $A$ are independent if and only if $P(A) = P(A|H)$. At first sight, it may seem that introducing conditional probability does not add anything new. Actually, it is a very important type of approach, which is needed in statistics as well. The hypothesis can carry the prior probability (i.e. the prior belief assumed beforehand), and the resulting probability is said to be posterior (i.e., it is considered to be a consequence of the assumption). This is the core of the Bayesian approach to statistics, as is seen later. The definition also implies the following result.

10.C.8. We toss a coin. If it comes up heads, we put a white ball into an (initially empty) bag; otherwise, we put a black ball there. This is repeated $n$ times. Then, we take a ball randomly from the bag (without replacement). Suppose it is white. What is the probability that another ball we take randomly from the bag is black?

Solution. We will solve the problem for a general (possibly biased) coin. In particular, we assume that the individual tosses are independent and that there is a fixed probability of the coin coming up heads, which we denote $p$. The event "a ball in the bag is white" corresponds to the event "the coin came up heads in the corresponding toss". Since the first ball was white, we deduce that $p > 0$. We can also see that the probability space "taking a random ball from the bag" is isomorphic to the probability space "tossing a coin". Since we assume that the individual tosses are independent, we also get the independence of the colors of the selected balls. This leads to the conclusion that the probability in question is $1 - p$. Is this reasoning correct? Do we not expect the probability of taking a black ball to be greater than $1 - p$? See, there were approximately $np$ white and $n(1-p)$ black balls in the bag, so if we have removed one white ball, the probability of selecting a black one should increase, shouldn't it? Before reading further, try to figure out which (if any) of these two presented reasonings is correct, and whether the probability also depends on $n$ (the number of balls in the bag before any were removed).

Now, we select a more sophisticated approach to the problem. Let $B_i$ denote the event "there were $i$ white balls in the bag" (before any were removed), $i \in \{0, 1, 2, \dots, n\}$. Further, let $A$ denote the event "the first ball is white" and $C$ denote the event "the second ball is black". Actually, the event $B_i$ says that the coin came up heads $i$ times out of $n$; hence, its probability is
$$P(B_i) = \binom{n}{i}\, p^i (1-p)^{n-i}.$$
The conditional probability of taking a white ball provided there are exactly $i$ white balls in the bag is equal to
$$P(A|B_i) = \frac{i}{n}.$$
We are interested in the probability of $C$, knowing that $A$ has occurred, i.e., we want to know $P(C|A)$. Since the events $B_i$ are pairwise disjoint, this is also true for the events $C\cap B_i$.

Lemma. Let an event $B$ be the union of mutually exclusive events $B_1, B_2, \dots, B_n$. Then,
Since C can be decomposed as the disjoint union \J™=0(C n (1) P(A\B) = ^P(A\Bi)P(Bi\B) A n Br, are Proof. The events A n BltA n B2, also mutually exclusive. Therefore, P ((A n Bi) u (A n B2) u ■ ■ ■ u (A n Bn)) P(B) P(AnBj)P(Bj) = P(Bi) P(B) YJP(A\Bi)P(Bi\B). □ Consider the special case B = Q. Then, the events Bi can be considered the "possible states of the universe", P {A | Bi) expresses the probability of A provided the universe is in its i-th state, and P{Bi\fl) = P(Bi) is the probability of the universe being in its i-th state. By the above lemma, n P(A) = P{A\tt) = YJP(ABi)P{Bi). i=l This formula is called the law of total probability. 10.2.7. Bayes' theorem. Simple rearrangement of the conditional probability formula leads to P(A n B) = P(B nA) = P{A)P{B\A) = P{B)P{A\B). There are two important corollaries: Bayes' rules Theorem. The probabilities of events A and B satisfy P(A)P(B\A) P(B) P(A)P(B\A) (1) P(A\B) = (2) P(A\B) p(A}p(B\A} + p(Ac)P{B\Ac)' The first proposition is called the inverse probability formula. The second proposition is called the first Bayes' formula. Proof. The first statement is a mere rearrangement of the formula above the theorem. To obtain the second statement, note that P(B) = P(B nA) + P(B n AO-Applying the law of total probability, P(B) = P(A)P(B\A) + P(AC)P(B\AC) can be substituted into the inverse probability formula, thereby obtaining the second statement of the theorem. □ 717 CHAPTER 10. STATISTICS AND PROBABILITY THEORY Bi), we can write ^)-p(U(CnB.^)-EP('C^))n-'). 1 " ^ > i=0 1 " = pJA) Y p(A n B«)p(c'\A n B«) = ^ ' i=0 n = plAjY P(Bt)P(A\Bt)P(C\A n Bt). We use the law of total probability and substitute for P(A), which leads to J2P(BAP(A\BAP(C\AnB, P{C\A) = (1) P{A) n Y,P(Bi)P(A\Bi)P(C\Ar\Bl i=0_ n E P{Bt)P{A\Bt) i=0 This formula is sometimes called the second Bayes' formula; it holds in general provided the space fl is a disjoint union of the events Bi. Since we tossed the coin at least once, we have n > 1. 
Now, we can calculate: Pi,Bi)P(A\Bi) = u J p^1 - p)n~l ■ -n ^(,-l)!(n-,)!Pl W j=0 n-1 i!(n-i-l)! PJ+1(1-P)T =pE("7>i(1-p)B-1" j=0 ^ ^ = p(p+ (1 1 =p, =£("V(i-p)B- j=0 n-1 2 71 — j E (71-2)! ^ (i-l)!(7i-i-l)! 71 71—1 =E-,i""o2)l.„p<+1(i-p)"-<-1 Bayes' rule is sometimes formulated in a somewhat more general form, proved similarly as in (2): Let the sample space fl be the union of mutually exclusive events Ai,... An. Then, for any i e {1,..., n}, P(B\Ai)P(Ai) (3) P(A\B) J2nk=1P(B\Ak)P(Ak) 10.2.8. Example and remarks. Now, the introductory question from 10.2.6 can be dealt with easily. Consider the event A which corresponds to "the student having passed an exam" and the event B which corresponds to "the exam in question concerning course X". Assume that the probabilities of the exam concerning either course are the same, i.e., P(B) = P(BC) = 0.5. While the wanted probability P(B\A) is unclear, the probability P(A\B) = 0.4 is given, as well as P(A\BC) = 0.8. This is a typical application of Bayes' formula 10.2.7(2). There is no need to calculate P(A) at all: P{B)P{A\B) p(B\A) = P(B)P(A\B) + P(BC)P(A\BC) 0.5 ■ 0.4 _ 1 ~ 3' i=0 i!(7i-2-i)r 0.5-0.4 + 0.5-0.8 In order to better understand the role of the prior probability hypothesis, here is another example. Consider a university using entrance exams with the following reliability: 99% of intelligent people pass them, while concerning non-intelligent people, only 0.5% are able to pass. It is desired to find the probability that a random student (accepted applicant) of the university is intelligent. Thus, let A be the event "a random person is intelligent" and B be the event "the person passed the exams successfully". Using Bayes' formula, the probability that A occurs provided B has occurred can be computed. It is only necessary to supply the general probability p = P(A) that a random applicant is intelligent. P(A\m = p-°M K 1 ' p ■ 0.99 +(l-p)- 0.005' The following table presents the result for various values of p. 
The first column corresponds to the case that every other applicant is intelligent, etc. p j 0.5 0.1 0.05 0.01 0.001 0.0001 P(A\B) I 0.99 0.96 0.91 0.67 017 0.02 Therefore, if every other applicant is intelligent, then 99% of the students are intelligent. If only 1% of the population meets an expectation of "intelligence" and the applicants form a good random sample, then only about two thirds of the students are intelligent, etc. Consider similar tests for the occurrence of a disease, say HIV. There may be a test with the same reliability as the one above and use it to test all students that are present at the university. In this case, assume that the parameter p is close to the one for the entire population (say 1 out of 10000 people is infected, on average), which corresponds to the last column 718 CHAPTER 10. STATISTICS AND PROBABILITY THEORY =p{i-p)52(n-2)pi{i-Pr-*-i = i=o \ ' = \p(l-p), n>l [0, n = l. Substituting this into the second Bayes' formula, we obtain the wanted probability P{C\A) = l°> n = 1' [1 -p, n > 1. Thus, the simple reasoning about the probability spaces being isomorphic led to the correct result. The second reasoning was wrong because it omitted the fact that since the first ball was white, the expected number of white balls in the bag (before removing the first one) was greater than np. The calculation highlights the singular case n = 1. □ 10.C.9. Once upon a time, there was a quiz where the first prize was Ferrari 599 GTB Fiorano. The contestant who won the final round was taken into a room where there were three identical doors. Behind two of them, there were goats, while the third one contained the car. In order to win the car, the contestant had to guess the correct door. First, the contestant pointed at one of the three doors. Then, an assistant opened one of the other two doors behind which there was a goat. Now, the contestant is given the option to change his guess. Should he do so? Solution. 
Of course, we assume that the contestant wants to win the car. First of all, try to examine your intuition for random events. For example, you can reason as follows: "One of the two remaining doors contains the car, each with the same probability. Therefore, it does not matter which door we choose." Or: "The probability of choosing the correct door at the beginning is |. The shown goat changes nothing, so the probability that the guess is wrong is |. Therefore, we should change the door, thereby winning by |." Apparently, it is wise to change the door only if the probability of the car being behind that door is greater than behind the initially chosen one. We consider the following events: H stands for "the initial guess is correct", A stands for "we have changed the door", and C for "we have won". We are thus interested in the probabilities P(C\A) and P(C\AC). First, we choose one of three doors, and the Ferrari is behind one of them, so of the table above. Clearly the result of the test is catastrophi-cally unreliable. Only about 2% of the students who are tested positive are really infected! Note that the problem with both tests is the same one. It is clear that real entrance exams require good selectivity and reliability. So the university marketing must ensure that the actual applicants do not provide a good random sample of population. Perhaps the university should try to discourage "non-intelligent" people from applying and thus secure a sufficiently low number of such applicants. With diseases, even the very rare occurrence of healthy people tested positively can be devastating. If the test is improved so that it is 100% reliable for positive people, it would have almost no impact on the resulting probabilities in the table. Thus, if a person is tested positive when diagnosing a rare disease, it is necessary to make further tests. Then, the result P(A\B) of the first test plays the role of the prior probability P(A) during the second test, etc. 
This approach allows one to "cumulate the experience".

10.2.9. Borel sets. In practice, one is interested in the probability of events which are expressed by asking whether some numerical quantity falls into a given interval. We illustrate this on the example dealing with the results of students in a given course, measured for instance by the number of points in a written exam (cf. 10.1.1). On one hand, there is only a finite number of students, and there is only a finite number of possible results (say, the numbers of points in the written exam can be the integers 0 through 20). On the other hand, imagining the results of the students as an analogy to independent rolls of a regular die is inappropriate. Even if a regular 21-hedron existed (it cannot, see chapter 13), that would be somewhat weird. Thus, it is better to focus on the assessing function $X : \Omega \to \mathbb{R}$ on the sample space $\Omega$ of all students and model the probability that its value falls into a fixed interval when a random student is picked. For instance, if the table transferring points into marks A through F is fixed, the probability that the student obtains an A or a B can be modeled. In the case of a reasonable course, we should expect that the most probable results are somewhere in the middle of the "interval of success", while the ideal result of the full number of points is not very probable. Similarly, if many values of $X$ lie in the interval of failure, this may be perceived, at most universities, as a significant failure of the lecturer. This is a typical example of the random variables or random vectors defined below (depending on whether the result of just one or of several students is chosen randomly). One way to proceed is to model the behaviour of $X$ as a probability defined for all intervals. This requires the following σ-algebra:¹

¹ In this connection, we also talk about the σ-algebra of Borel-measurable sets on $\mathbb{R}^k$, and then the following definition says that random variables are Borel-measurable functions.
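Returning briefly to the entrance-exam table of 10.2.8: the posterior values there follow directly from Bayes' formula 10.2.7(2), and iterating the formula is exactly the "cumulated experience" just described. A short check (Python; our illustration, with our own function name) recomputes the table:

```python
def posterior(p, pass_if_yes=0.99, pass_if_no=0.005):
    """Bayes' formula 10.2.7(2): P(A|B) for the prior p = P(A)."""
    return p * pass_if_yes / (p * pass_if_yes + (1 - p) * pass_if_no)

# priors from the table in 10.2.8
for p in (0.5, 0.1, 0.05, 0.01, 0.001, 0.0001):
    print(f"P(A) = {p:<7g} ->  P(A|B) = {posterior(p):.2f}")

# cumulating experience: feed the posterior back in as the new prior
# after a second, independent positive test
second_test = posterior(posterior(0.0001))
```

Feeding `posterior(p)` back in as the new prior lifts the disastrous 2% of the last column to a far more informative value after a second positive test.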
This requires the following σ-algebra:¹ In this connection, we also talk about the σ-algebra of Borel-measurable sets on ℝ^k, and then the following definition says that random variables are exactly the Borel-measurable functions.

719 CHAPTER 10. STATISTICS AND PROBABILITY THEORY

We assume that the event of changing the door is independent of the original guess, hence

P(A|H) = P(A|H^c) = P(A),   P(A^c|H) = P(A^c|H^c) = P(A^c).

If the original guess is correct and it is changed, then we surely lose; while if it is originally wrong and then it is changed, then we surely win. Therefore, we have

P(C|A ∩ H) = 0 = P(C|A^c ∩ H^c),   P(C|A^c ∩ H) = 1 = P(C|A ∩ H^c).

It follows from the second Bayes' formula (1) that

P(C|A) = (P(H)·P(A|H)·P(C|A ∩ H) + P(H^c)·P(A|H^c)·P(C|A ∩ H^c)) / P(A) = P(H^c) = 2/3,

and, analogously,

P(C|A^c) = (P(H)·P(A^c|H)·P(C|A^c ∩ H) + P(H^c)·P(A^c|H^c)·P(C|A^c ∩ H^c)) / P(A^c) = P(H) = 1/3.

We have thus obtained P(C|A) > P(C|A^c), which means that it is wise to change the door. Note that the solution is based upon the assumption that the assistant deliberately opens a door behind which there is a goat.

Borel sets

The Borel sets in ℝ are all those subsets that can be obtained from intervals using complements, countable unions, and countable intersections. More generally, on the sample space Ω = ℝ^k, one considers the smallest σ-algebra B which contains all k-dimensional intervals. The sets in B are called the Borel sets on ℝ^k.

10.2.10. Random variables. The probabilities of the individual intervals in the Borel algebra are usually given as follows. Consider a numerical quantity X on any sample space, that is, a function X : Ω → ℝ. Since it is desired to work with the probability of X taking on values from any fixed interval, the probability space and the properties of the function X have to allow this. Notice that when working with finite probability spaces where all subsets are events, every function X : Ω → ℝ is a random variable in the following sense.
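The Bayes computation above can be double-checked by a small Monte Carlo sketch (not part of the book's text; the helper `monty_hall` and its parameters are our own, and we assume the host always deliberately opens a goat door):

```python
import random

def monty_hall(trials=100_000, switch=True, rng=random.Random(0)):
    """Estimate the probability of winning the car, with or without switching."""
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)        # door hiding the car
        guess = rng.randrange(3)      # contestant's initial guess
        # the host opens a goat door different from the contestant's guess
        opened = next(d for d in range(3) if d != guess and d != car)
        if switch:
            # switch to the only remaining closed door
            guess = next(d for d in range(3) if d != guess and d != opened)
        wins += (guess == car)
    return wins / trials

p_switch = monty_hall(switch=True)   # should approach P(C|A)   = 2/3
p_stay = monty_hall(switch=False)    # should approach P(C|A^c) = 1/3
```

The estimates cluster around 2/3 and 1/3, in agreement with the computation.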
If the contestant believes it was an accident, or if instead, say, he happens to see (or hear) a goat behind one of the two not chosen doors, then the first reasoning is correct and the probability remains 1/2. □

Random variables and vectors

Definition. A random variable X on a probability space (Ω, A, P) is a function X : Ω → ℝ such that the inverse image X⁻¹(B) lies in A for every Borel set B ∈ B on ℝ. The real-valued function P_X(B) = P(X⁻¹(B)), defined on all intervals B ⊆ ℝ, is called the (probability) distribution of the random variable X.

A random vector X = (X₁, ..., X_k) on (Ω, A, P) is a k-tuple of random variables X_i : Ω → ℝ defined on the same probability space (Ω, A, P).

10.C.10. We have two bags. The first one contains two white and two black balls, while the second one contains one white and two black balls. We randomly select one of the bags and take two balls out of it (without replacement). What is the probability that the second ball is black provided the first one is white? O

D. What is probability?

First of all, recall the geometric probability, which was introduced in ??.

10.D.1. Buffon's needle. A plane is covered with parallel lines, creating bands of width l. Then, a needle of length l is thrown onto the plane. What is the probability that the needle crosses one of the lines?

If intervals I₁, ..., I_k in ℝ are chosen, then the probability of simultaneous occurrence of all of the k events X_i ∈ I_i must exist. Thus, as in the scalar case, there is a real-valued function defined on the k-dimensional intervals B = I₁ × ⋯ × I_k, namely P_X(B) = P(X⁻¹(B)) (and thus also on all Borel sets B ⊆ ℝ^k). It is called the probability distribution of the random vector X.

10.2.11. Distribution function. The distribution of random variables is usually given by a rule which shows how the probability grows as the interval B is extended. In particular, consider the intervals I with endpoints a, b, −∞ ≤ a < b ≤ ∞.
Denote by P(a < X < b) the probability of X lying in I = (a, b), or by P(X < b) if a = −∞; and analogously for other types of intervals. In the special case of a singleton, write P(X = a). In the case of a random vector X = (X₁, ..., X_k), write P(a₁ < X₁ < b₁, ..., a_k < X_k < b_k) for the probability of simultaneous occurrence of the events where the values of X_i fall into the corresponding intervals (which may also be closed, unbounded, etc.).

Solution. The position of the needle is given by two independent parameters: the distance d of the needle's center from the closest line (d ∈ [0, l/2]) and the angle α (α ∈ [0, π/2]) between the lines and the needle's direction. The needle crosses one of the lines if and only if (l/2)·sin α > d. The space of all events (α, d) is the rectangle [0, π/2] × [0, l/2]. The favorable events (α, d) (i.e. those for which (l/2)·sin α > d) correspond to those points in the rectangle which lie under the curve (l/2)·sin α (α being the variable of the x-axis). By 6.2.21, the area of this figure is

∫₀^{π/2} (l/2)·sin α dα = l/2.

Thus, the wanted probability is (see ??)

(l/2) / ((π/2)·(l/2)) = 2/π. □

The following well-known problem, which also deals with geometric probability, illustrates that we must be cautious about what is assumed to be "clear".

10.D.2. Bertrand's paradox. What is the probability that a random chord of a given circle is longer than the side of an equilateral triangle inscribed into the circle?

Solution. We will show three ways how to find "this" probability.

1) Every chord is determined by its center. Thus, a random choice of the chord is given by a random choice of the center. The chord is longer than the side of the inscribed equilateral triangle if and only if its center lies inside the concentric circle with half the radius. The center is chosen "randomly" from the whole inside of the circle. Therefore, the probability that it lies in the inner disc is given by the ratio of the areas of these discs, which is 1/4.
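The Buffon computation above lends itself to a quick numerical check; a minimal sketch (our own, with the band width and the needle length both normalized to 1):

```python
import math
import random

def buffon(trials=200_000, seed=1):
    """Needle of length 1 on lines spaced 1 apart: a crossing occurs
    iff (1/2)*sin(a) > d, with d and a uniform as in the text."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        d = rng.uniform(0.0, 0.5)          # distance of the center to the nearest line
        a = rng.uniform(0.0, math.pi / 2)  # acute angle between needle and lines
        hits += (0.5 * math.sin(a) > d)
    return hits / trials

estimate = buffon()   # should be close to 2/pi
```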
2) Unlike above, we claim that the wanted probability does not change if the direction of the chord is fixed. Then, the centers of such chords lie on a fixed diameter of the circle. The favorable centers are those which lie inside the inner circle (see 1)), i.e., on a fixed diameter of the inner circle. The ratio of the diameters is 1 : 2, hence the wanted probability is 1/2.

3) Now, we observe that a chord is determined by its endpoints (which must lie on the circle). Let us fix one of the endpoints (call it A); thanks to the apparent symmetry, this should not affect the resulting probability. Then, the chord satisfies the given condition if and only if the other endpoint

Distribution function

Definition. The distribution function or cumulative distribution function of a random variable X is the function F_X : ℝ → [0, 1] defined for all x ∈ ℝ by F_X(x) = P(X < x). Similarly, the distribution function of a random vector X = (X₁, ..., X_k) is the function F_X : ℝ^k → [0, 1] defined for all vectors x = (x₁, ..., x_k) ∈ ℝ^k by F_X(x) = P(X₁ < x₁, ..., X_k < x_k).

10.2.12. Theorem. The distribution function F = F_X : ℝ → [0, 1] of any random variable X has the following properties:

(1) F is a non-decreasing function;
(2) F has both one-sided limits at every point x ∈ ℝ, yet these limits may differ;
(3) F is left-continuous;
(4) at the infinite points, the limits of F are lim_{x→∞} F(x) = 1 and lim_{x→−∞} F(x) = 0;
(5) the probability of X taking on the value x is given by P(X = x) = lim_{y→x+} F(y) − F(x);
(6) the distribution function of a random variable always has only countably many points of discontinuity.

Proof. The proof consists of quite simple and straightforward calculations. In particular, note that the events a ≤ X < b and X < a are exclusive, so P(a ≤ X < b) = P(X < b) − P(X < a). To verify the left-continuity at a point x, choose a sequence rₙ > 0 which converges to 0, and consider the events Aₙ given by X < x − rₙ. The union of these events is exactly the event A given by X < x. Of course, the event A does not depend on the choice of the sequence rₙ. By the first proposition of 10.2.5, P(A) = lim P(Aₙ).

In the literature, the definition with the non-strict inequality F(x) = P(X ≤ x) is often met.
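The three chord-sampling schemes of Bertrand's paradox can be compared directly by simulation; a sketch under our own conventions (unit circle, so the chord is long enough iff its midpoint lies within distance 1/2 of the center, equivalently iff the angular separation of the endpoints exceeds 2π/3):

```python
import math
import random

def bertrand(trials=200_000, seed=2):
    """Estimate the probability of a 'long' chord under the three schemes."""
    rng = random.Random(seed)
    hits = [0, 0, 0]
    for _ in range(trials):
        # 1) midpoint uniform in the disc (radius has density 2r, so r = sqrt(U))
        hits[0] += (math.sqrt(rng.random()) < 0.5)
        # 2) direction fixed, midpoint uniform on a diameter
        hits[1] += (abs(rng.uniform(-1, 1)) < 0.5)
        # 3) both endpoints uniform on the circle
        t = abs(rng.uniform(0, 2 * math.pi) - rng.uniform(0, 2 * math.pi))
        t = min(t, 2 * math.pi - t)            # angular separation in [0, pi]
        hits[2] += (t > 2 * math.pi / 3)
    return [h / trials for h in hits]

p1, p2, p3 = bertrand()   # close to 1/4, 1/2, 1/3 respectively
```

The three answers 1/4, 1/2, 1/3 reappear, confirming that the ambiguity lies in the sampling scheme, not in the computations.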
In this case, the probability P(X = x) is also included in F_X(x). Then, the distribution function has similar properties to those in 10.2.12, only it is right-continuous instead of left-continuous, etc.

lies on the shorter arc BC, where ABC is the inscribed equilateral triangle. However, the length of this arc is one third of the length of the entire circle, which means that the wanted probability is equal to 1/3.

How is it possible that we came to three different probabilities? It is caused by a hidden ambiguity in the statement of the problem. It is necessary to specify what exactly it means to choose a chord "randomly". Each of the three results is correct provided the chord is chosen in the corresponding way. However, these ways are not equivalent; this is apparent not only from the different results, but also from the distribution of the chords' centers. In the first case, they are distributed uniformly throughout the inside of the circle. In the second and third cases, the centers are concentrated more towards the center of the circle. □

10.D.3. Two envelopes. There are two envelopes, each containing a certain amount of money. We know that the amount in one of them is twice as great as in the other one. We can choose either of the envelopes (and take its contents). As soon as we choose one, we are allowed to change our mind and take the other envelope instead. Is it advantageous to do so?

Solution. At first sight, it must not matter which envelope we choose. The probability of choosing the one which contains more is 1/2, so it is no good to change our choice. However, consider the following reasoning: the envelope we have chosen contains the amount a. Therefore, the other one contains a/2 or 2a, each with probability 1/2. This means that if we change the envelope, then we get a/2 with probability 1/2 and 2a with probability 1/2, i.e., the expected outcome is

(1/2)·(a/2) + (1/2)·2a = (5/4)·a.

Therefore, it is wise to change the envelope. What is wrong with this reasoning? There are several issues. Mainly, it is not generally true that if there is the amount a in one of the envelopes, then the second one contains a/2 with probability 1/2. This depends on the initial distribution of the amounts that have been put into the envelopes, which is not precisely stated in the problem. However, the paradox is rooted not only in the concealed a priori distribution. There are (even discrete) distributions for which the choice of changing the envelope always produces a greater expected outcome than that of not changing it. Nevertheless, any distribution with this property must have

The distribution function is non-decreasing, and thus the left-sided limit equals the supremum of the values to the left. Thus, the left-sided limit of F_X at x exists and equals P(A). This proves one half of proposition (2) as well as all of proposition (3).

Similarly, the above sequence rₙ can be used to define the events Aₙ by X < x + rₙ. This time, it is a non-increasing sequence A₁ ⊇ A₂ ⊇ ..., and its intersection is the event X ≤ x. By the second property of 10.2.5,

P(A) = lim_{n→∞} P(Aₙ) = P(X ≤ x),

which verifies that the right-sided limit of F at x exists. At the same time, property (5) is proved.

The limit values of property (4) can be derived similarly by applying theorem 10.2.5, as shown for the one-sided limits above. In the first case, use the events Aₙ given by X < rₙ, for an arbitrary increasing sequence rₙ → ∞. Their union is the universal event Ω. In the second case, use the events Aₙ given by X < rₙ, for any decreasing sequence rₙ → −∞; their intersection is the null event.

It remains to prove the last statement. As already shown, the discontinuity points of the distribution function are exactly those values x which the random variable attains with non-zero probability, i.e., P(X = x) ≠ 0. Now, let Mₙ denote the set of points x for which P(X = x) > 1/n.
Clearly, the set M of all discontinuity points equals the union of the sets Mₙ: M = ⋃_{n=2}^∞ Mₙ. Since the sum of probabilities of mutually exclusive events cannot exceed 1, Mₙ can contain no more than n − 1 elements. Thus, M is a countable union of finite sets, hence it is countable. □

10.2.13. Probability measure. The probability that a random variable has a value lying in an arbitrarily chosen interval can be computed purely from the knowledge of its distribution function. The distribution function F_X thus determines the entire probability distribution of the random variable X. How a particular random variable X is defined can be ignored: X can be viewed directly as a probability defined on the σ-algebra of all the Borel sets in ℝ. In this sense, every function F : ℝ → ℝ satisfying the first four properties of the latter theorem is the distribution function of a unique random variable. (Check the properties of the probability function defined on all intervals this way!) The probability obtained in this way is also called a probability measure on ℝ. Similarly, one deals with probability measures on the algebra of Borel sets in ℝ^k in terms of the distribution functions of random vectors. In this sense, a random variable or random vector can be considered without any explicit link to a probability space (Ω, A, P).

10.2.14. Discrete random variables. Random variables behave substantially differently according to whether the non-zero probability is "concentrated in isolated points" or is "continuously distributed" along (a part of) the real axis.

infinite expected value (if the expectation is finite, then there is always a value which, when seen in the envelope, makes it more advantageous to keep the envelope), so it is dubious to say that it is better to get a "greater" infinity on average. □

E. Random variables, density, distribution function

10.E.1. Consider rolling a die. The set of sample points is Ω = {ω₁, ..., ω₆},
where ωᵢ means that we have rolled the number i. Further, consider the σ-field A = {∅, {ω₁, ω₂}, {ω₃, ω₄, ω₅, ω₆}, Ω}. Find whether the mapping X : Ω → ℝ defined by

i) X(ωᵢ) = i for each i ∈ {1, 2, 3, 4, 5, 6},
ii) X(ω₁) = X(ω₂) = −2, X(ω₃) = X(ω₄) = X(ω₅) = X(ω₆) = 3

is a random variable with respect to A.

Solution. First of all, we should make sure that the set A really satisfies all the axioms of 10.2.2, i.e., that it is a well-defined σ-field. Then, by definition 10.2.10, a random variable is any function X : Ω → ℝ such that the preimage of every Borel-measurable set B ⊆ ℝ lies in A.

As for the first case, consider the interval [2, 3]. Since X⁻¹([2, 3]) = {ω₂, ω₃} ∉ A, we can see that the function X is not a random variable. In the second case, we can easily see that X is a random variable: Consider any interval in ℝ. Then, exactly one of the following four cases occurs: 1) If the interval contains neither −2 nor 3, then the preimage is the empty set. 2) If it contains −2 but not 3, then the preimage is {ω₁, ω₂}. 3) On the other hand, if it contains 3 but not −2, then the preimage is {ω₃, ω₄, ω₅, ω₆}. 4) Finally, if it contains both these numbers, then the preimage is the whole sample space Ω. In each case, the preimage lies in the σ-field A. □

10.E.2. Consider a σ-field (Ω, A), where Ω = {ω₁, ω₂, ω₃, ω₄, ω₅} and A = {∅, {ω₁, ω₂}, {ω₃}, {ω₄, ω₅}, {ω₁, ω₂, ω₃}, {ω₁, ω₂, ω₄, ω₅}, {ω₃, ω₄, ω₅}, Ω}. Find a mapping X : Ω → ℝ, as general as possible, which is a random variable with respect to A.

Solution. Since the events ω₁, ω₂ do not occur individually in A, the random variable X must map them to the same number, i.e., X(ω₁) = X(ω₂) = a for some a ∈ ℝ. For the same reason, we must have X(ω₄) = X(ω₅) = b for some b ∈ ℝ.
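The measurability check of 10.E.1 can be mechanized on a finite space; a sketch with our own helper names, encoding ωᵢ simply as the integer i (on a finite value set it suffices to test the preimages of the half-lines (−∞, c), since all other preimages arise from these by the σ-field operations):

```python
OMEGA = {1, 2, 3, 4, 5, 6}
SIGMA = [set(), {1, 2}, {3, 4, 5, 6}, {1, 2, 3, 4, 5, 6}]   # the sigma-field A of 10.E.1

def preimage(f, predicate):
    return {w for w in OMEGA if predicate(f(w))}

def is_random_variable(f):
    """Measurability on a finite space: preimages of half-lines (-inf, c)
    separating the finitely many values of f must lie in A."""
    values = sorted({f(w) for w in OMEGA})
    cuts = [(u + v) / 2 for u, v in zip(values, values[1:])] + [values[-1] + 1]
    return all(preimage(f, lambda x, c=c: x < c) in SIGMA for c in cuts)

X = lambda w: w                          # case i):  X(w_i) = i
Y = lambda w: -2 if w in {1, 2} else 3   # case ii)
```

As in the text, `is_random_variable(X)` fails (the preimage {1} of (−∞, 1.5) is not in A), while `is_random_variable(Y)` succeeds.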
If an interval contains both a and b, then its preimage is the union of {ω₁, ω₂} and {ω₄, ω₅}, possibly together with {ω₃}; all such unions lie in A. Hence the remaining value X(ω₃) = c ∈ ℝ may be arbitrary, and every such mapping is a random variable with respect to A. □

Discrete random variables

If a random variable X assumes only finitely many values x₁, x₂, ..., xₙ ∈ ℝ or countably infinitely many values x₁, x₂, ..., it is called a discrete random variable. One can define its probability mass function f(x) by

f(x) = P(X = xᵢ) if x = xᵢ, and f(x) = 0 otherwise.

Since the probability is countably additive and the singleton events X = xᵢ are mutually exclusive, the sum of all values f(xᵢ) is given by either a finite sum or an absolutely convergent series:

Σᵢ f(xᵢ) = 1.

The probability distribution of a random variable X satisfies

P(X ∈ B) = Σ_{xᵢ ∈ B} f(xᵢ).

In particular, the distribution function is of the form

F_X(t) = Σ_{xᵢ < t} f(xᵢ).

The graph looks as follows: □

10.E.4. An archer keeps shooting at a target until he hits it. He has 4 arrows at his disposal. In each attempt, the probability that he hits the target is 0.6. Let X be the random variable which gives the number of unused arrows. Find the probability mass function and the distribution function of X and draw their graphs.

Solution. Clearly, the probability of k consecutive misses followed by a hit is equal to 0.4^k · 0.6. Therefore,

f_X(x) = P(X = x) = 0.4^(3−x) · 0.6 for x ∈ {1, 2, 3}.

If the archer misses three times, then there will be no arrow left at the end, no matter whether he hits with the last one or not. Thus, f_X(0) = P(X = 0) = 0.4³.

Note that the distribution function F(x) of a continuous random variable X is always differentiable. Its derivative is the density function of X, i.e., F'(x) = f(x).

10.2.16. The general case. Of course, there are also random variables with mixed behaviour, where some part of the probability is distributed continuously, while there are values that are taken on with non-zero probability. This means that the probability measure of some singletons x ∈ ℝ is non-zero and still X is not a discrete random variable.
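The archer's distribution in 10.E.4 is small enough to tabulate explicitly; a sketch (our own helpers, using the book's left-continuous convention F(t) = P(X < t)):

```python
def archer_pmf(p_hit=0.6, arrows=4):
    """X = number of unused arrows when shooting until the first hit (10.E.4)."""
    q = 1 - p_hit
    # x unused arrows <=> hit on attempt (arrows - x), preceded by misses
    f = {x: q ** (arrows - 1 - x) * p_hit for x in range(1, arrows)}
    # three misses leave no arrow regardless of the outcome of the last shot
    f[0] = q ** (arrows - 1)
    return f

def cdf(f, t):
    """Left-continuous distribution function F(t) = P(X < t), as in 10.2.11."""
    return sum(p for x, p in f.items() if x < t)

f = archer_pmf()
```

For instance, f[3] = 0.6 (hit with the first arrow) and f[0] = 0.4³ = 0.064.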
For instance, consider a chaotic lecturer who remains standing at his laptop with probability p throughout the entire lecture, but once he decides to move, he happens to be at any position in front of the lecture room with equal probability. Then, the random variable which corresponds to his position (assume that the desk with the laptop is at position 0 and the lecture room is bounded by the values ±1) has the following distribution function:

F(t) = 0 for t ≤ −1,
F(t) = (1/2)·(1 − p)·(t + 1) for t ∈ (−1, 0],
F(t) = p + (1/2)·(1 − p)·(t + 1) for t ∈ (0, 1],
F(t) = 1 for t > 1.

The distribution functions of all such variables can be expressed directly using the Riemann-Stieltjes integral

F(t) = ∫_{−∞}^t f(x) d(g(x)),

developed in subsection 6.3.15 (page 431). In the example above, choose f(x) = 1 and

g(x) = 0 for x ≤ −1,
g(x) = (1/2)·(1 − p)·(x + 1) for −1 < x ≤ 0,
g(x) = p + (1/2)·(1 − p)·(x + 1) for 0 < x ≤ 1,
g(x) = 1 for x > 1.

This corresponds again to the idea that the distribution function is equivalent to a probability measure. Thus the measure of any interval is given by integrating its indicator function with respect to this measure. This is what the Riemann-Stieltjes integral achieves. The Riemann integral corresponds to the choice g(x) = x. One could instead put only the jump p at x = 0 into g (i.e. g(x) = x for x ≤ 0, while g(x) = x + p otherwise) and leave the constant density to f(x), which would be non-zero only on [−1, 1]. This corresponds to splitting the probability measure into its discrete part (hidden in g) and its continuous part (expressed by the probability density).

Notice that any distribution function can have only countably many points of discontinuity.

10.2.17. Basic discrete distributions. The requirements on the properties of probability distributions of random variables are based on the modeled situations. Here is a list of the simplest discrete distributions.

By the definition of the distribution function (see 10.2.11), we have

F_X(x) = P(X < x) = 0 for x ≤ 0,
F_X(x) = 0.064 for 0 < x ≤ 1,
F_X(x) = 0.16 for 1 < x ≤ 2,
F_X(x) = 0.4 for 2 < x ≤ 3,
F_X(x) = 1 for x > 3.
Degenerate distribution

The distribution which corresponds to a constant random variable X = μ is called the degenerate distribution Dg(μ). Its distribution function F_X and probability mass function f_X are given by

F_X(t) = 0 for t ≤ μ, F_X(t) = 1 for t > μ;
f_X(t) = 1 for t = μ, f_X(t) = 0 otherwise.

The graphs of the probability mass function and the distribution function are as follows: □

10.E.5. The distribution function of a random variable X is

F_X(x) = 0 for x ≤ 3, F_X(x) = (x − 3)/3 for 3 < x ≤ 6, F_X(x) = 1 for 6 < x.

i) Justify that F_X is indeed a distribution function.

Here follows a description of an experiment with two possible outcomes, called success and failure. If the probability of success is p, then the probability of failure must be 1 − p. It is convenient to take the values 0 and 1 for the two possible results.

Bernoulli distribution

The distribution of a random variable X which is 0 (failure) with probability q = 1 − p and 1 (success) with probability p is called the Bernoulli distribution A(p). Its distribution function F_X and probability mass function f_X are given by

F_X(t) = 0 for t ≤ 0, F_X(t) = 1 − p for 0 < t ≤ 1, F_X(t) = 1 for t > 1;
f_X(t) = 1 − p for t = 0, f_X(t) = p for t = 1, f_X(t) = 0 otherwise.

Further, consider a random variable X which corresponds to n independent experiments described by the Bernoulli distribution, where X measures the number of successes. Clearly, the probability mass function is non-zero exactly at the integers t = 0, ..., n, which correspond to the total number of successes in the experiments (the order does not matter). The probability that the t successes are encountered in t chosen experiments out of n is p^t·(1 − p)^(n−t). It is necessary to sum over all the (n choose t) possibilities. This leads to the binomial distribution of X:

Binomial distribution

The binomial distribution Bi(n, p) has probability mass function

f_X(t) = (n choose t)·p^t·(1 − p)^(n−t) for t ∈ {0, 1, ..., n}, and f_X(t) = 0 otherwise.

The illustration shows the probability mass functions for Bi(50, 0.2) and Bi(50, 0.9).
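The binomial probability mass function is easy to evaluate directly; a minimal sketch (our own helper, using the standard-library binomial coefficient):

```python
import math

def binom_pmf(n, p):
    """Probability mass function of Bi(n, p): f(t) = C(n, t) p^t (1-p)^(n-t)."""
    return [math.comb(n, t) * p ** t * (1 - p) ** (n - t) for t in range(n + 1)]

f = binom_pmf(50, 0.2)
mode = max(range(len(f)), key=f.__getitem__)   # the most probable outcome
```

The values sum to 1, and the mode sits at t = 10 = n·p, in line with the intuition that most outcomes occur near np.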
The distribution of the probability corresponds to the intuition that most outcomes occur near the value np.

ii) Find the density of the random variable X. iii) Compute P(2 < X < 4).

Solution. a) Clearly, F_X is continuous and non-decreasing. Moreover, we have lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1, as needed.

b) By 10.2.14, the density of a continuous random variable is the derivative of its distribution function. We can see that on the interval (3, 6), the density is equal to f(x) = 1/3, while on the intervals (−∞, 3) and (6, ∞), it is equal to zero. Therefore, the variable X has the uniform distribution, see 10.2.20.

c) It follows from the definition of the distribution function that P(2 < X < 4) = F(4) − F(2) = 1/3 − 0 = 1/3. □

10.E.6. Consider the function f : ℝ → ℝ given by f(x) = a/(1 + x²) for x ∈ ℝ, where a is a parameter. Suppose that f is the density of a random variable X. Find

i) the value of a,
ii) the distribution function of X,
iii) P(−1 < X < 1).

Solution. a) If the function f is to be a probability density, then its integral over ℝ must be equal to one. This yields the condition

a·∫_{−∞}^∞ dx/(1 + x²) = a·[arctan x]_{−∞}^∞ = a·π = 1.

Hence a = 1/π.

b) By 10.2.14, the distribution function is given by the following integral:

F_X(x) = ∫_{−∞}^x f(t) dt = (1/π)·∫_{−∞}^x dt/(1 + t²) = (1/π)·arctan x + 1/2.

c) By b) and the definition of the distribution function, we have P(−1 < X < 1) = F_X(1) − F_X(−1) = 1/2. □

10.E.7. The joint probability mass function of a random vector (X, Y) is given by the following table:

         Y = 2    Y = 5    Y = 6
X = 1     1/5     1/10     1/20
X = 2     1/10    1/20      0
X = 3     3/10    1/20     3/20

Find the marginal distributions of X and Y, the joint distribution function, and P(Y > 3X).

Solution. a) By 10.2.22, the marginal distribution of the random variable X is obtained by summing the joint probability mass function over all possible values of Y in each row. Similarly, the marginal distribution of Y is obtained by summing the entries in each column. Thus, we get the following:

X       1      2      3
f_X    7/20   3/20   1/2

Y       2      5      6
f_Y    3/5    1/5    1/5

b) The joint distribution function at a point (a, b) is equal to the sum of all values of the joint probability mass function f(x, y) such that x < a and y < b. This corresponds to the values of the subtable whose lower-right corner is (a, b).
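The normalization and the probabilities in 10.E.6 can be verified numerically; a sketch (our own code, approximating the integral over ℝ by a midpoint rule on a wide interval):

```python
import math

a = 1 / math.pi   # from a * [arctan x] over R = a * pi = 1

def F(x):
    """Distribution function of the density f(x) = 1 / (pi (1 + x^2)) (10.E.6)."""
    return math.atan(x) / math.pi + 0.5

# midpoint-rule check that the density integrates to (almost) 1 on [-1000, 1000]
N, L = 100_000, 1000.0
h = 2 * L / N
total = sum(a / (1 + (-L + (i + 0.5) * h) ** 2) * h for i in range(N))
```

The small deficit of `total` from 1 is exactly the mass of the two heavy tails beyond ±1000, roughly 2/(1000π).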
Precisely, the joint distribution function F(x, y) looks as follows:

F(x, y)        y ∈ (2, 5]   y ∈ (5, 6]   y ∈ (6, ∞)
x ∈ (1, 2]       1/5          3/10         7/20
x ∈ (2, 3]       3/10         9/20         1/2
x ∈ (3, ∞)       3/5          4/5           1

and on the regions (−∞, 1] × ℝ and ℝ × (−∞, 2], F is clearly zero.

c) Apparently, P(Y > 3X) = P(X = 1, Y = 5) + P(X = 1, Y = 6) = 1/10 + 1/20 = 3/20. □

10.E.8. Find the probability P(2X > Y) provided the density of the random vector (X, Y) is given by

f_(X,Y)(x, y) = (1/6)·(4x − y) for 1 < x < 2, 2 < y < 4, and f_(X,Y)(x, y) = 0 otherwise.

Solution. By definition, we have

P(2X > Y) = ∫_{−∞}^∞ ∫_{−∞}^{2x} f_(X,Y)(x, y) dy dx = ∫₁² ∫₂^{2x} (1/6)·(4x − y) dy dx
         = ∫₁² (x² − (4/3)·x + 1/3) dx = [x³/3 − (2/3)·x² + x/3]₁² = 2/3. □

Observe the distribution of the molecules. Then, the behaviour of Xₙ as the number n of boxes as well as the number rₙ of objects increases so that their ratio rₙ/n = λ remains constant is of interest. In other words, every box is to contain (approximately) the same number λ of elements, on average. We are interested in the asymptotic behaviour of the variables Xₙ as n → ∞. Letting lim_{n→∞} rₙ/n = λ, the standard procedure (with details to be added; take it as a challenge to recall the methods from the analysis of univariate functions!) leads to:

lim_{n→∞} P(Xₙ = k) = lim_{n→∞} (rₙ choose k)·(1/n)^k·(1 − 1/n)^(rₙ−k)
 = lim_{n→∞} (rₙ·(rₙ − 1)⋯(rₙ − k + 1)/(n^k·k!))·(1 − 1/n)^(rₙ−k) = (λ^k/k!)·e^(−λ),

since the functions (1 + x/n)ⁿ converge uniformly to the function e^x on every bounded interval in ℝ.

Poisson distribution

The Poisson distribution Po(λ) describes the random variables with probability mass function

f_X(k) = (λ^k/k!)·e^(−λ) for k ∈ ℕ, and f_X(k) = 0 otherwise.

Of course,

Σ_{k=0}^∞ (λ^k/k!)·e^(−λ) = e^(−λ+λ) = 1.

As seen above, this discrete distribution Po(λ), with an arbitrary λ > 0 (distributed over infinitely many points), is a good approximation of the binomial distributions Bi(n, λ/n) for large values of n.

10.2.19. Two examples.
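The approximation Bi(n, λ/n) ≈ Po(λ) can be checked numerically; a sketch (our own helpers, λ = 3 chosen as an arbitrary sample value):

```python
import math

def binom_pmf(n, p, k):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    return lam ** k / math.factorial(k) * math.exp(-lam)

lam = 3.0
# largest pointwise gap between Bi(1000, 3/1000) and Po(3) over the relevant range
err = max(abs(binom_pmf(1000, lam / 1000, k) - poisson_pmf(lam, k)) for k in range(20))
```

Already for n = 1000 the two probability mass functions agree to within about a thousandth at every point.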
Besides the physical model mentioned above, such behaviour can be encountered when observing occurrences of events in a space with a constant expected density per unit volume. Observing bacteria under a microscope, where the bacteria are expected to occur in any part of the image with the same probability, provides an example. If the "mean density of occurrence" in a unit area is λ and the whole region is divided into n identical parts, then the occurrence of k events in a fixed part is modeled by a random variable X with the Poisson distribution. When diagnosing in practice, such an observation allows one to compute the total number of bacteria with relatively good accuracy from the actual counts in only a few randomly chosen samples.

10.E.9. Find the marginal distribution functions and the joint and marginal densities of the random vector (X, Y), provided

F_(X,Y)(x, y) = 0 for x ≤ 0 or y ≤ 0; F_(X,Y)(x, y) = (1/4)·x²y² for 0 < x ≤ 1, 0 < y ≤ 2; F_(X,Y)(x, y) = 1 for x > 1, y > 2.

Solution. The density of the random vector (X, Y) is obtained by differentiation with respect to x and y. Thus, for 0 < x < 1, 0 < y < 2, we have

f_(X,Y)(x, y) = xy;

elsewhere, the density is zero. The marginal density of the random variable X is then

f_X(x) = ∫_{−∞}^∞ f_(X,Y)(x, y) dy = ∫₀² xy dy = [xy²/2]₀² = 2x.

Similarly, for Y, we get f_Y(y) = y/2. The marginal distribution functions are

F_X(x) = ∫_{−∞}^x f_X(t) dt = ∫₀^x 2t dt = x²

and

F_Y(y) = ∫_{−∞}^y f_Y(t) dt = ∫₀^y (t/2) dt = y²/4. □

10.E.10. In a bag, there are 14 balls: 4 red, 5 white, and 5 blue ones. We randomly take 6 balls out of the bag (without replacement). Find the distribution of the random vector (X, Y), where X stands for the number of red balls taken and Y for the number of white ones. In addition, find the marginal distributions of X and Y. Then, compute P(X ≤ 3) and P(1 ≤ Y ≤ 4).

Solution. The value of the probability mass function at a point (x, y) is defined as the probability P(X = x, Y = y), i.e.,
the probability of taking x red balls and y white ones. The number of ways to take x red balls is (4 choose x); for y white balls, it is (5 choose y); and the remaining 6 − x − y blue balls can be selected in (5 choose 6−x−y) ways. Altogether, there are (4 choose x)·(5 choose y)·(5 choose 6−x−y) possibilities. The values of this expression for all x, y are in the following table.

x\y     0     1     2     3     4    5   | Σ_y
0       0     5    50   100    50    5   |  210
1       4   100   400   400   100    4   | 1008
2      30   300   600   300    30    0   | 1260
3      40   200   200    40     0    0   |  480
4      10    25    10     0     0    0   |   45
Σ_x    84   630  1260   840   180    9   | 3003

The values in the last column and row are the sums over all values of y and x, respectively. Then, the values of the probability mass function are obtained after dividing by the number of all possibilities of how to take 6 balls out of 14, i.e., (14 choose 6) = 3003.

The second example is more challenging. We describe events which occur randomly at a time t > 0. Here, the probability of an occurrence during a short time period of length h does not depend on what happened before and equals hλ for a fixed λ > 0. At the same time, the probability that the event occurs more than once in a given short time period is small. Let X_t denote the random variable which corresponds to the number of occurrences of the examined event in the interval [0, t). The requirements are expressed infinitesimally. We want:

• the probability of exactly one event in each time period of length h to equal hλ + a(h), where the function a(h) satisfies lim_{h→0+} a(h)/h = 0;
• the probability β(h) of more than one event occurring in a time period of length h to satisfy lim_{h→0+} β(h)/h = 0;
• the events X_t = j and X_{t+h} − X_t = k to be independent for all j, k ∈ ℕ and t, h > 0.

Use the notation p_k(t) = P(X_t = k), k ∈ ℕ, and set the initial conditions p₀(0) = 1 and p_k(0) = 0 for k > 0.
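The table above can be recomputed and cross-checked mechanically; a sketch (our own helper names):

```python
import math

def count(x, y):
    """Ways to draw x red (of 4), y white (of 5) and 6-x-y blue (of 5) balls."""
    z = 6 - x - y
    if z < 0 or z > 5:
        return 0
    return math.comb(4, x) * math.comb(5, y) * math.comb(5, z)

total = sum(count(x, y) for x in range(5) for y in range(6))
p_x_le_3 = sum(count(x, y) for x in range(4) for y in range(6)) / total
```

The grand total reproduces (14 choose 6) = 3003, a Vandermonde identity, and `p_x_le_3` is the value P(X ≤ 3) ≈ 0.985 computed later in the solution.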
Compute directly

p₀(t + h) = p₀(t)·P(X_{t+h} − X_t = 0) = p₀(t)·(1 − hλ − a(h) − β(h)),

and similarly,

p_k(t + h) = P(X_t = k, X_{t+h} − X_t = 0) + P(X_t = k − 1, X_{t+h} − X_t = 1) + P(X_{t+h} = k, X_{t+h} − X_t ≥ 2).

Hence, with functions o(h) satisfying lim_{h→0+} o(h)/h = 0,

(p₀(t + h) − p₀(t))/h = −λ·p₀(t) + (1/h)·o(h),
(p_k(t + h) − p_k(t))/h = −λ·p_k(t) + λ·p_{k−1}(t) + (1/h)·o(h).

Letting h → 0+, an (infinite!) system of ordinary differential equations is obtained:

p₀′(t) = −λ·p₀(t),   p₀(0) = 1,
p_k′(t) = −λ·p_k(t) + λ·p_{k−1}(t),   p_k(0) = 0,

for all t > 0 and k ∈ ℕ, with the given initial conditions. The first equation has the unique solution

p₀(t) = e^(−λt),

The marginal distributions of X and Y correspond to the last column and row, respectively. The probability P(X ≤ 3) can be calculated easily from the marginal distribution of X:

P(X ≤ 3) = F_X(4) = (210 + 1008 + 1260 + 480)/3003 ≈ 0.985.

Similarly, for the probability P(1 ≤ Y ≤ 4), we have

P(1 ≤ Y ≤ 4) = F_Y(5) − F_Y(1) = (630 + 1260 + 840 + 180)/3003 ≈ 0.969. □

which can be immediately substituted into the second equation. This leads to

p₁(t) = λt·e^(−λt).

A trivial induction argument shows that the system has the unique solution

p_k(t) = ((λt)^k/k!)·e^(−λt),   t ≥ 0, k ∈ ℕ.

It is thus verified that for every process which satisfies the three properties above, the random variable X_t which corresponds to the number of occurrences in the time period [0, t) has the distribution Po(λt). In practice, these processes are connected with the failure rate of machines.

10.2.20. Continuous distributions. The simplest example of a continuous distribution is the uniform distribution on an interval [a, b]: its density is the constant 1/(b − a) on [a, b] and zero elsewhere, so that F(t) = 0 for t ≤ a, F(t) = (t − a)/(b − a) for a < t ≤ b, and F(t) = 1 for t > b.

10.E.11. The density of a random vector (X, Y, Z) is

f(x, y, z) = c·(x + y + z) for 0 < x, y, z < 2, and f(x, y, z) = 0 otherwise.

Find the value of the parameter c as well as the distribution function of the vector, and compute P(0 < X < 1/2, 0 < Y < 1/2, 0 < Z < 1/2).

10.E.12. Find the value of the parameter a so that the function f(x) = a·ln(x) for 1 ≤ x ≤ 2, and f(x) = 0 otherwise, defines a probability density.

Let p(t) denote the probability that the event does not occur during the time interval [0, t). By the independence assumption, p(t + s) = p(t)·p(s) for all t, s > 0. Moreover, assume that p is differentiable and p(0) = 1. Then,

ln p(t + s) = ln p(t) + ln p(s).

Letting s → 0+ (and applying l'Hospital's rule),

(ln p)′(t) = lim_{s→0+} (ln p(t + s) − ln p(t))/s = lim_{s→0+} (ln p(s))/s = lim_{s→0+} (ln p(s))′ = p′(0)/p(0) = p′(0).
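The claimed solution p_k(t) = ((λt)^k/k!)·e^(−λt) of the system of differential equations can be verified numerically; a sketch (our own code, integrating a truncation of the infinite system with the explicit Euler method):

```python
import math

lam, T, steps, K = 2.0, 1.0, 50_000, 6
h = T / steps
p = [1.0] + [0.0] * K                     # initial data: p_0(0) = 1, p_k(0) = 0
for _ in range(steps):
    # Euler step for p0' = -lam*p0 and pk' = -lam*pk + lam*p(k-1)
    new = [p[0] + h * (-lam * p[0])]
    new += [p[k] + h * (-lam * p[k] + lam * p[k - 1]) for k in range(1, K + 1)]
    p = new

exact = [(lam * T) ** k / math.factorial(k) * math.exp(-lam * T) for k in range(K + 1)]
err = max(abs(u - v) for u, v in zip(p, exact))
```

The truncation at K = 6 is harmless here because mass flows only from p_{k−1} to p_k; the numerical and closed-form values agree closely.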
Solution. Integrating by parts, a primitive function of ln(x) is

∫ ln(x) dx = x·ln(x) − ∫ 1 dx = x·ln(x) − x = x·(ln(x) − 1).

Altogether,

∫₁² a·ln(x) dx = a·[x·(ln(x) − 1)]₁² = a·(2·ln(2) − 1),

Thus, p′(0) = −λ (note: λ > 0, since p′(0) cannot be positive, as p(0) = 1 is the maximal value of p). Then, p(t) satisfies ln p(t) = −λt + C. The initial condition leads to the only solution

p(t) = e^(−λt).

so a = 1/(2·ln(2) − 1). □

10.E.13. A child has become lost in a forest whose shape is that of a regular hexagon. Suppose that the probability that the child happens to be in a given part of the forest is directly proportional to the area of that part, but independent of its position in the forest.

• What is the probability distribution of the distance of the child from a given side (extended to a straight line) of the forest?
• What is the probability distribution of the distance of the child from the closest side of the forest?

Solution. • Let a be the length of the sides of the hexagon (forest). Then, the probability density satisfies

f(x) = 0 for x ≤ 0,
f(x) = (4/(9a²))·x + 2/(3√3·a) for 0 < x ≤ (√3/2)·a,
f(x) = −(4/(9a²))·x + 2/(√3·a) for (√3/2)·a < x ≤ √3·a,
f(x) = 0 for x > √3·a,

as for the first question.

• First, let us compute the distribution function F of the wanted random variable X that corresponds to the distance of the child from the closest side. The distance can be anywhere in the interval I = (0, (√3/2)·a). Then, for y ∈ I, we have

F(y) = P(X < y) = 1 − 4·((√3/2)·a − y)²/(3a²).

Thus, the density, being the derivative of the distribution function, satisfies

f(y) = 0 for y ≤ 0,
f(y) = (8/(3a²))·((√3/2)·a − y) for y ∈ (0, (√3/2)·a),
f(y) = 0 for y ≥ (√3/2)·a. □

10.E.14. Let a random variable X have the uniform distribution on an interval (0, r). Find the distribution function and probability density of the volume of the ball whose radius is equal to X.

Now, consider the random variable X which corresponds to a (random) moment when the event occurs for the first time. Apparently, the distribution function of X is given by

F_X(t) = 1 − p(t) = 1 − e^(−λt) for t ≥ 0, and F_X(t) = 0 for t < 0.
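The distribution function in 10.E.13 can be checked by simulation; a sketch (our own code, side a = 1; we sample a uniform point in the hexagon by rejection from its bounding box and use the fact that, for a convex polygon, the distance to the boundary is the minimum of the distances to the three slab boundaries with normals 60° apart):

```python
import math
import random

def dist_to_nearest_side(rng, a=1.0):
    """Distance to the nearest side for a uniform point in a regular hexagon."""
    h = math.sqrt(3) * a / 2                     # apothem
    normals = [(math.cos(t), math.sin(t)) for t in (0.0, math.pi / 3, 2 * math.pi / 3)]
    while True:
        x, y = rng.uniform(-a, a), rng.uniform(-a, a)
        d = max(abs(x * nx + y * ny) for nx, ny in normals)
        if d <= h:                               # point accepted: it is inside
            return h - d

rng = random.Random(3)
h = math.sqrt(3) / 2
hits = sum(dist_to_nearest_side(rng) < h / 2 for _ in range(100_000))
p_est = hits / 100_000    # compare with F(h/2) = 1 - 4*(h/2)^2/3 = 3/4
```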
This function has the desired properties: its values lie between zero and one, it is non-decreasing, and it has the required behaviour at $\pm\infty$. The density of this random variable can be obtained by differentiation of the distribution function.

Exponential distribution
The distribution corresponding to the continuous random variable $X$ with density
$$f_X(t) = \begin{cases} \lambda e^{-\lambda t} & t \ge 0, \\ 0 & t < 0, \end{cases}$$
is called the exponential distribution $\mathrm{Ex}(\lambda)$.

The exponential distribution belongs to the more general family of important distributions with densities of the form $c\,x^{a-1} e^{-bx}$ for $x > 0$, with given constants $a > 0$, $b > 0$, while the constant $c$ is to be computed. The following expression is required to equal one:
$$\int_0^\infty c\,x^{a-1} e^{-bx}\,dx = \int_0^\infty c\Big(\frac{t}{b}\Big)^{a-1} e^{-t}\,\frac{dt}{b} = \frac{c}{b^a}\,\Gamma(a).$$
$\Gamma$ is the famous transcendental function providing the analytic extension of the factorial function, discussed in 6.2.17 on page 410.

Gamma distribution
The distribution whose density is zero for $x \le 0$, while for $x > 0$ it is given by
$$f(x) = \frac{b^a}{\Gamma(a)}\,x^{a-1} e^{-bx},$$
is called the gamma distribution $\Gamma(a,b)$ with parameters $a > 0$, $b > 0$. Thus, the exponential distribution is the special case of this one for the value $a = 1$.

10.2.21. Normal distribution. Recall the binomial distribution. If the success rate $p$ is left constant, but the number $n$ of experiments is increased, the probability mass function keeps its shape (although the scale changes). As $n$ increases, the values of the probability mass function merge into a curve that should correspond to the density of a continuous distribution which is a good approximation for $\mathrm{Bi}(n,p)$ for large values of $n$.
Recall the smooth function $y = e^{-x^2/2}$, mentioned in subsection 6.1.5 (page 376) as an appropriate tool for the construction of functions which are smooth but not analytic.

Solution. First, we find the distribution function $F$: for $0 \le d \le \frac43\pi r^3$,
$$F(d) = P\Big(\tfrac43\pi X^3 \le d\Big) = P\bigg(X \le \sqrt[3]{\tfrac{3d}{4\pi}}\bigg) = \frac1r\sqrt[3]{\frac{3d}{4\pi}}.$$
Altogether,
$$F(x) = \begin{cases} 0 & \text{for } x < 0, \\ \sqrt[3]{\frac{3x}{4\pi r^3}} & \text{for } 0 \le x < \frac43\pi r^3, \\ 1 & \text{for } x \ge \frac43\pi r^3. \end{cases}$$
Differentiating this, we obtain the density:
$$f(x) = \begin{cases} 0 & \text{for } x \le 0, \\ \frac13\Big(\frac{3}{4\pi r^3}\Big)^{1/3} x^{-2/3} & \text{for } 0 < x < \frac43\pi r^3, \\ 0 & \text{for } x \ge \frac43\pi r^3. \end{cases} \qquad\square$$

10.E.15. Find the value(s) of the parameter $a \in \mathbb{R}$ so that the function
$$f(x) = \begin{cases} 0 & \text{for } x < 0, \\ ax^2 & \text{for } 0 \le x \le 3, \\ 0 & \text{for } x > 3 \end{cases}$$
defines the probability density of a random variable $X$. Then, find the distribution function, probability density, and the expected value of the volume of the cube whose edge-length has probability density determined by $f$.
Solution. Simply, $a = \frac19$, since $\int_0^3 ax^2\,dx = 9a$ must equal one. Thus, the distribution function of the random variable $X$ is $F_X(t) = \frac{1}{27}t^3$ for $t \in (0,3)$, zero for smaller values of $t$, and one for greater. Let $Z = X^3$ denote the random variable corresponding to the volume of the considered cube. It lies in the interval $(0,27)$. Thus, for $t \in (0,27)$ and the distribution function $F_Z$ of the random variable $Z$, we can write
$$F_Z(t) = P[Z \le t] = P[X^3 \le t] = P\big[X \le \sqrt[3]{t}\,\big] = \frac{t}{27},$$
so $Z$ has the uniform distribution on $(0,27)$, with constant density $\frac{1}{27}$ there, and $\mathrm{E}\,Z = \frac{27}{2}$. $\square$

10.E.16. Find the value of the parameter $a \in \mathbb{R}$ so that the function
$$f(x) = \begin{cases} 0 & \text{for } x < 0, \\ ax & \text{for } 0 \le x \le 3, \\ 0 & \text{for } x > 3 \end{cases}$$
defines the probability density of a random variable $X$. Then, find its distribution function, probability density, and the expected value of the area of the square whose side-length has probability density determined by $f$.

The illustration compares this curve (in the right-hand part) to the values of $\mathrm{Bi}(40, 0.5)$. This suggests looking for a convenient continuous distribution whose density would be given by a suitably adjusted variation of this function.
The function $e^{-x^2/2}$ is everywhere positive, so it suffices to compute $\int_{-\infty}^\infty e^{-x^2/2}\,dx$. If this results in a finite value, just multiply the function by its reciprocal value. Unfortunately, this integral cannot be computed in terms of elementary functions. Luckily, multidimensional integration and Fubini's theorem can be used. Transform to polar coordinates, to obtain
$$\Big(\int_{-\infty}^\infty e^{-x^2/2}\,dx\Big)^2 = \int_{-\infty}^\infty\!\!\int_{-\infty}^\infty e^{-(x^2+y^2)/2}\,dx\,dy = \int_0^\infty\!\!\int_0^{2\pi} e^{-r^2/2}\,r\,d\theta\,dr = 2\pi$$
(cf. the notes at the end of subsection 8.2.5; verify that the integrated function satisfies the conditions given there, and compute that thoroughly!).
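The value $2\pi$ from the polar-coordinate computation can be checked by crude one-dimensional quadrature; a minimal sketch (the truncation bounds $\pm 12$ and the grid size are my arbitrary choices — the tails beyond them are negligible):

```python
import math

def gauss_integral(lo=-12.0, hi=12.0, n=100_000):
    # midpoint rule for the integral of e^(-x^2/2) over the real line
    h = (hi - lo) / n
    return h * sum(math.exp(-0.5 * (lo + (i + 0.5) * h) ** 2) for i in range(n))

def phi(x):
    # standard normal density, using the normalizing constant 1/sqrt(2*pi)
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
```

Squaring the numerical integral recovers $2\pi$, and dividing by $\sqrt{2\pi}$ normalizes the density.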
Hence the integral results in $\sqrt{2\pi}$, so the function $f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$ is a well-defined density of a random variable.

Normal distribution
The distribution of the random variable $Z$ with density
$$\varphi(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}$$
is called the (standard) normal distribution $\mathrm{N}(0,1)$. The corresponding distribution function
$$\Phi(z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^z e^{-x^2/2}\,dx$$
cannot be expressed in terms of elementary functions. It is called the Gaussian function, and the graph of $\varphi(x)$ is often called the Gaussian curve.

So far, the correct density which approximates the binomial distribution is not found. The diagram that compares the probability function of the binomial distribution to the Gaussian curve shows that the position of the maximum must be moved, as well as an application of shrinkage or stretch to the curve horizontally. The first goal is easily reached by a constant

Solution. We proceed similarly as in the previous example. Again, we can easily find that $a = \frac29$. Thus, the distribution function of the random variable $X$ is $F_X(t) = \frac19 t^2$ for $t \in (0,3)$, zero for smaller values of $t$, and one for greater. Let $Z = X^2$ denote the random variable corresponding to the area of the considered square. It lies in the interval $(0,9)$. Thus, for $t \in (0,9)$ and the distribution function $F_Z$ of the random variable $Z$, we can write
$$F_Z(t) = P[Z \le t] = P[X^2 \le t] = P\big[X \le \sqrt{t}\,\big] = \frac{t}{9},$$
so $Z$ has the uniform distribution on $(0,9)$ and $\mathrm{E}\,Z = \frac92$. $\square$

10.E.17. Find the value of the parameter $a$ so that the function $f(x) = \dots$ defines the probability density of a random variable $X$. Then, find its distribution function, probability density, and the expected value of the volume of the cube whose edge-length has probability density determined by $f$. $\bigcirc$

10.E.18. We randomly cut a line segment of length $l$ into two pieces. Find the distribution function and the density of the area of the rectangle whose side-lengths are equal to the obtained pieces.
Solution.
Let us compute the distribution function. Let $X$ denote the random variable with uniform distribution on the interval $(0,l)$ which corresponds to the length of one of the pieces (then, the length of the other piece is $l - X$). The area $S = X(l-X)$ of the rectangle, for $X \in (0,l)$, can lie anywhere in the interval $(0, l^2/4)$. Setting $d \in (0, l^2/4)$, we can write
$$F(d) = P[S \le d] = P\Big[X \le \tfrac{l - \sqrt{l^2-4d}}{2}\Big] + P\Big[X \ge \tfrac{l + \sqrt{l^2-4d}}{2}\Big] = 1 - \frac{\sqrt{l^2-4d}}{l}.$$
The density is obtained by differentiation:
$$f(x) = \begin{cases} 0 & \text{for } x \le 0, \\ \frac{2}{l\sqrt{l^2-4x}} & \text{for } 0 < x < \frac{l^2}{4}, \\ 0 & \text{for } x \ge \frac{l^2}{4}. \end{cases} \qquad\square$$

while stretching by a parameter $\sigma > 0$ does the rest. Thus, there are two real parameters $\mu$ and $\sigma > 0$, and the density function is of the form
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/(2\sigma^2)}.$$
Simple variable substitution leads to
$$\int_{-\infty}^\infty e^{-(x-\mu)^2/(2\sigma^2)}\,dx = \sigma\sqrt{2\pi}.$$
Thus there is an entire two-parametric class of densities of random variables. The corresponding distributions are denoted by $\mathrm{N}(\mu,\sigma^2)$.
We return to the asymptotic closeness of the normal and binomial distributions for $n \to \infty$ after creating suitable tools. The following illustration reveals how well this works. The discrete values correspond to $\mathrm{Bi}(40, 0.5)$, while the curve depicts the density of $\mathrm{N}(20, 10)$.

10.2.22. Distributions of random vectors. As for the scalar random variables, one defines the distribution functions and the density or the probability mass function for continuous and discrete random vectors. There are joint probability mass functions and densities.
For two discrete random variables, i.e. a discrete vector $(X,Y)$ of random variables, define their (joint) probability mass function
$$f(x,y) = \begin{cases} P(X = x_i \wedge Y = y_j) & x = x_i,\ y = y_j, \\ 0 & \text{otherwise.} \end{cases}$$
A random vector $(X,Y)$ is called continuous if its distribution function is defined as for continuous random variables. This means, for all $a, b \in \mathbb{R}$,
$$F(a,b) = P(X < a,\ Y < b) = \int_{-\infty}^a\!\!\int_{-\infty}^b f(x,y)\,dy\,dx.$$

10.E.19. Independent random variables $X$ and $Y$ have the following probability densities:
$$f_X(t) = \begin{cases} 0 & t \le 0, \\ 1 & 0 < t < 1, \\ 0 & 1 \le t, \end{cases} \qquad f_Y(t) = \begin{cases} 0 & t \le 0, \\ 2t & 0 < t < 1, \\ 0 & 1 \le t. \end{cases}$$
Determine the distribution function of the random variable which gives the area of the rectangle with sides $X$ and $Y$.
Solution.
Denote the area by $Z = XY$. For $t \in (0,1)$,
$$F_Z(t) = P(XY \le t) = \int_0^t 2y\,dy + \int_t^1 \frac{t}{y}\,2y\,dy = t^2 + 2t(1-t) = 2t - t^2,$$
so altogether
$$F_Z(t) = \begin{cases} 0 & \text{for } t \le 0, \\ 2t - t^2 & \text{for } 0 < t < 1, \\ 1 & \text{for } 1 \le t. \end{cases} \qquad\square$$

10.E.20. Let $X$, $Y$ be independent random variables, where $X$ has uniform distribution on the interval $(0,2)$ and $Y$ is given by its density function
$$f(x) = \begin{cases} 0 & \text{for } x < 0, \\ 2x & \text{for } 0 \le x \le 1, \\ 0 & \text{for } x > 1. \end{cases}$$
Find the probability that $Y$ is less than $X^2$.
Solution. Since $X$ and $Y$ are independent random variables, the joint density $f_{(X,Y)}$ of the vector $(X,Y)$ is given by the product of the densities $f_X$ and $f_Y$ of the individual random variables. Thus, we have
$$f_{(X,Y)}(u,v) = f_X(u)\cdot f_Y(v) = \begin{cases} \frac12\cdot 2v = v & \text{for } (u,v) \in (0,2)\times(0,1), \\ 0 & \text{otherwise.} \end{cases}$$
Then, the wanted probability $P$ is the integral of the density $f_{(X,Y)}$ over the part $O$ of the plane where $Y < X^2$:
$$P = \iint_O f_{(X,Y)}\,dx\,dy = 1 - \iint_{\mathbb{R}^2\setminus O} f_{(X,Y)}\,dx\,dy = 1 - \int_0^1\!\!\int_{x^2}^1 v\,dv\,dx = 1 - \frac25 = \frac35. \qquad\square$$

For a general continuous random vector $X = (X_1,\dots,X_n)$, define
$$F(a_1,\dots,a_n) = P(X_1 < a_1,\dots,X_n < a_n) = \int_{-\infty}^{a_1}\!\!\cdots\int_{-\infty}^{a_n} f(x_1,\dots,x_n)\,dx_n\cdots dx_1,$$
and similarly for discrete random vectors with more components.
A random vector $(X,Y)$ with both $X$ and $Y$ continuous is not always a continuous vector in the above sense. For example, taking a continuous variable $X$, the random vector $(X, 2X)$ is neither continuous nor discrete, since the entire probability mass is concentrated along the line $y = 2x$ in the plane, but not in individual points.
The marginal distribution for one of the variables can be obtained by summation or integration over the others. For instance, in the case of a discrete random vector $(X,Y)$, the events $(X = x_i, Y = y_j)$ for all possible values $x_i$ and $y_j$ with non-zero probabilities for $X$ and $Y$, respectively, form an exhaustive collection of events for the vector $(X,Y)$. Thus
$$P(X = x_i) = \sum_{j=1}^\infty P(X = x_i,\ Y = y_j),$$
which relates the marginal probability distribution of the random variable $X$ to the joint probability distribution of the random vector $(X,Y)$. In the case of continuous random vectors, proceed similarly, using integrals instead of sums.
10.2.23. Stochastic independence.
It is known from subsection 10.2.3 what (in)dependence means for events. Random variables $X_1, \dots, X_n$ are (stochastically) independent if and only if, for any $a_i \in \mathbb{R}$, the events $X_1 \le a_1$, ..., $X_n \le a_n$ are independent. In view of the definition of the distribution function $F$ of the random vector $(X_1,\dots,X_n)$, this is equivalent to
$$F(x_1,\dots,x_n) = F_{X_1}(x_1)\cdots F_{X_n}(x_n),$$
where $F_{X_i}$ are the distribution functions of the individual components.
It follows that the events corresponding to $X_k \in I_k$ for arbitrarily chosen intervals $I_k$ are also independent. The probability of $X_1 \in [a,b)$ and simultaneously $X_i \in (-\infty, c_i)$ for the other components is
$$F(b,c_2,\dots,c_n) - F(a,c_2,\dots,c_n) = \big(F_{X_1}(b) - F_{X_1}(a)\big)F_{X_2}(c_2)\cdots F_{X_n}(c_n),$$
and so on. The densities and probability mass functions behave well, too:
Proposition. For any random vector $(X_1,\dots,X_n)$, the following two conditions are equivalent:
• The random variables $X_1,\dots,X_n$ are stochastically independent.
• The joint distribution function $F$ of the random vector $(X_1,\dots,X_n)$ is the product of the marginal distribution functions $F_{X_i}$ of the individual components.

10.E.21. Let $X$, $Y$ be independent random variables, where $X$ has density function
$$f_1(x) = \begin{cases} 0 & \text{for } x < 0, \\ 2x & \text{for } 0 \le x \le 1, \\ 0 & \text{for } x > 1, \end{cases}$$
and $Y$ has density function
$$f_2(x) = \begin{cases} 0 & \text{for } x < 0, \\ \frac12 & \text{for } 0 \le x \le 2, \\ 0 & \text{for } x > 2. \end{cases}$$
Find the probability that $Y$ is greater than $X^2$.
Solution. The joint density is $f_{(X,Y)}(u,v) = 2u\cdot\frac12 = u$ for $(u,v) \in (0,1)\times(0,2)$, and $f_{(X,Y)}(u,v) = 0$ otherwise. Then, the wanted probability is
$$P = \int_0^1\!\!\int_{u^2}^2 u\,dv\,du = \int_0^1 u(2-u^2)\,du = 1 - \frac14 = \frac34. \qquad\square$$

10.E.22. Let $X$, $Y$ be independent random variables, where $X$ has density function
$$f_1(x) = \begin{cases} 0 & \text{for } x < 0, \\ \dots & \text{for } 0 \le x \le 3, \\ 0 & \text{for } x > 3, \end{cases}$$
and $Y$ has density function
$$f_2(x) = \begin{cases} 0 & \text{for } x < 0, \\ \frac12 & \text{for } 0 \le x \le 2, \\ 0 & \text{for } x > 2. \end{cases}$$
Find the probability that $Y$ is greater than $X^3$.
Solution. $\bigcirc$

For a random variable $X$ with the geometric distribution with success probability $p$, one similarly finds
$$\mathrm{E}(X^2) = \sum_{k=0}^\infty k^2\,p(1-p)^k = \frac{(1-p)(2-p)}{p^2},$$
hence the variance is
$$\operatorname{var} X = \mathrm{E}(X^2) - (\mathrm{E}\,X)^2 = \frac{1-p}{p^2}. \qquad\square$$
10.F.3.
A random variable $X$ is defined by its density $f_X(x) = \frac{3}{x^4}$ for $x \in (1,\infty)$ and $f_X(x) = 0$ elsewhere. Find its distribution function, expected value, and variance.
Solution. By the definition of the distribution function, we have, for $x \in (1,\infty)$,
$$F_X(x) = \int_1^x \frac{3}{t^4}\,dt = 1 - \frac{1}{x^3}.$$
The expected value of $X$ is equal to
$$\mathrm{E}\,X = \int_1^\infty \frac{3}{x^3}\,dx = \frac32,$$
and the expected value of $X^2$ is
$$\mathrm{E}(X^2) = \int_1^\infty \frac{3}{x^2}\,dx = 3.$$
Therefore, $\operatorname{var} X = 3 - \big(\frac32\big)^2 = \frac34$. $\square$

...the joint distribution function at the points $(\pm1,\pm1)$ is zero, while the marginal distribution functions are non-zero for the values $|x| < 1$ and $|y| < 1$. Expressing this random vector in polar coordinates... Since the preimages $\psi^{-1}(U)$ are open for continuous functions $\psi$ and open sets $U$, a continuous function $\psi$ always satisfies the condition.

10.F.4. A random variable $X$ is defined by its density $f_X(x) = \cos x$ for $x \in \big(0,\frac{\pi}{2}\big)$ and $f_X(x) = 0$ elsewhere. Find its expected value, variance, and median.
Solution. Using the definition and integration by parts, we get
$$\mathrm{E}\,X = \int_0^{\pi/2} x\cos x\,dx = \big[x\sin x + \cos x\big]_0^{\pi/2} = \frac{\pi}{2} - 1.$$
Using double integration by parts, we obtain
$$\mathrm{E}(X^2) = \int_0^{\pi/2} x^2\cos x\,dx = \big[x^2\sin x + 2x\cos x - 2\sin x\big]_0^{\pi/2} = \frac{\pi^2}{4} - 2,$$
so the variance is equal to $\operatorname{var} X = \frac{\pi^2}{4} - 2 - \big(\frac{\pi}{2} - 1\big)^2 = \pi - 3$. By definition, the distribution function is equal to $F_X(x) = \int_0^x \cos t\,dt = \sin x$, and the median is $F^{-1}(0.5) = \frac{\pi}{6}$. $\square$

10.F.5. A random variable $X$ is defined by its density $f_X(x) = \lambda e^{-\lambda x}$ for $x > 0$ and $f_X(x) = 0$ elsewhere (the so-called exponential distribution; $\lambda > 0$ is a fixed parameter). Find its expected value, variance, mode (the real number where the density reaches its maximum), and median.
Solution. Using the definition and integration by parts, we get
$$\mathrm{E}\,X = \int_0^\infty x\lambda e^{-\lambda x}\,dx = \Big[-x e^{-\lambda x} - \tfrac1\lambda e^{-\lambda x}\Big]_0^\infty = \frac1\lambda,$$
$$\mathrm{E}(X^2) = \int_0^\infty x^2\lambda e^{-\lambda x}\,dx = \Big[-x^2 e^{-\lambda x} - \tfrac{2x}{\lambda}e^{-\lambda x} - \tfrac{2}{\lambda^2}e^{-\lambda x}\Big]_0^\infty = \frac{2}{\lambda^2},$$
hence $\operatorname{var} X = \mathrm{E}(X^2) - (\mathrm{E}\,X)^2 = \frac{1}{\lambda^2}$. Since $f_X'(x) = -\lambda^2 e^{-\lambda x} < 0$ for $x > 0$, the density keeps decreasing. Therefore, its maximum is at zero. By definition, we have
$$F(x) = \int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x},$$
so the median is equal to $F^{-1}(0.5) = -\frac1\lambda\ln\big(\frac12\big) = \frac{\ln 2}{\lambda}$. $\square$
10.F.6.
The joint probability mass function of a discrete random vector $(X_1, X_2)$ is defined by $\pi(0,-1) = c$, $\pi(1,0) = \pi(1,1) = \pi(2,1) = 2c$, $\pi(2,0) = 3c$, and zero elsewhere. Find the parameter $c$ and compute the covariance $\operatorname{cov}(X_1, X_2)$.
Solution. If $\pi$ is to be a probability mass function, then the sum of its values over the entire domain must be equal to 1, i.e.,
$$\sum_{i,j}\pi(i,j) = c + 3\cdot 2c + 3c = 10c = 1,$$

Functions of random variables and vectors
For a continuous function $\psi: \mathbb{R} \to \mathbb{R}$ and a random variable $X$, there is also the random variable $Y = \psi(X)$; $Y$ is said to be a function of the random variable $X$. In the case of a random vector $(X_1,\dots,X_n)$ and a continuous function $\psi: \mathbb{R}^n \to \mathbb{R}$, we talk about a function $Y = \psi(X_1,\dots,X_n)$ of the random vector.

It is useful to know whether independent random variables remain independent after transformations. The answer and its verification are simple:
Proposition. Consider two independent random variables $X$ and $Y$ and two functions $g$ and $h$ such that $U = g(X)$, $V = h(Y)$ are random variables. Then $U$ and $V$ are independent, too.
Proof. For fixed reals $u$, $v$, write $A_u = \{x;\ g(x) \le u\}$, $B_v = \{y;\ h(y) \le v\}$. Then the joint distribution function for the vector $(U,V)$ is
$$F_{U,V}(u,v) = P(U \le u,\ V \le v) = P(X \in A_u,\ Y \in B_v) = P(X \in A_u)\,P(Y \in B_v) = F_U(u)\,F_V(v),$$
by the independence of $X$ and $Y$. $\square$

For a discrete random variable $X$, the probability mass function of $Y = \psi(X)$ is
$$f_{\psi(X)}(y) = \sum_{x_i:\ \psi(x_i) = y} f(x_i).$$
Thus, in the case of the affine dependency $Y = a + bX$, the probability mass function is non-zero exactly at the points $y_i = a + b x_i$.
As an example of a function of a random vector $X = (X_1,\dots,X_n)$, consider the sum of $n$ independent random variables with the Bernoulli distribution $X_i \sim \mathrm{A}(p)$. Of course, this leads just to the binomial distribution $\mathrm{Bi}(n,p)$. The above formula for $f_{\psi(X)}(y)$ reveals the already known probability function for $Y = X_1 + \cdots + X_n$: only $y \in \{0,\dots,n\}$ can be reached, and one collects all the possibilities of summing $y$ ones, where each of them appears with probability $p^y(1-p)^{n-y}$.
Similarly, proceed with continuous random variables.
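The formula $f_{\psi(X)}(y) = \sum_{\psi(x_i)=y} f(x_i)$ is simply a regrouping of probability mass, which can be sketched directly (the helper name `pushforward_pmf` and the tiny example distribution are mine, chosen for illustration):

```python
from collections import defaultdict

def pushforward_pmf(pmf, psi):
    # probability mass function of Y = psi(X): add up the masses of all
    # values x that share the same image psi(x)
    out = defaultdict(float)
    for x, p in pmf.items():
        out[psi(x)] += p
    return dict(out)
```

For example, pushing the uniform distribution on $\{-1,0,1\}$ through $\psi(x) = x^2$ merges the masses of $-1$ and $1$.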
so $c = \frac{1}{10}$. The probability mass function $\pi_1$ of $X_1$ is given by the sum of the joint function over all possible values of $X_2$, i.e., $\pi_1(i) = \sum_j \pi(i,j)$. Thus, we have $\pi_1(0) = c$, $\pi_1(1) = 4c$, $\pi_1(2) = 5c$ and zero elsewhere. Similarly, for the probability mass function $\pi_2$ of $X_2$, we get $\pi_2(-1) = c$, $\pi_2(0) = 5c$, $\pi_2(1) = 4c$ and zero elsewhere. Hence, $\mathrm{E}\,X_1 = \sum_i i\,\pi_1(i) = 14c = 1.4$ and $\mathrm{E}\,X_2 = \sum_j j\,\pi_2(j) = 3c = 0.3$. By the definition of the covariance, we have
$$\operatorname{cov}(X_1,X_2) = \sum_{i,j}(i - 1.4)(j - 0.3)\,\pi(i,j) = 0.18. \qquad\square$$

10.F.7. In many scientific fields, the behavior of a random variable which is bounded onto an interval is modeled using the so-called beta-distribution. That is a continuous distribution defined by its density on the interval $[0,1]$:
$$f_X(x) = \frac{1}{B(\alpha,\beta)}\,x^{\alpha-1}(1-x)^{\beta-1},$$
where $\alpha$, $\beta$ are fixed parameters, chosen suitably for the description of the given random variable, and $B(\alpha,\beta)$ is a normalizing constant, guaranteeing that the integral of $f_X(x)$ over $[0,1]$ is equal to one. Find its a) mode, b) expected value, and c) variance.
Solution. a) By definition, the mode is the value where $f_X(x)$ reaches its maximum. Thus, let us look at its stationary points. We can easily calculate that the equation $f_X'(x) = 0$ is equivalent to
$$(\alpha-1)(1-x) - x(\beta-1) = 0,$$
and this is satisfied for $x = \frac{\alpha-1}{\alpha+\beta-2}$. Since $f_X(0) = f_X(1) = 0$ and the function is positive inside the interval, this must be the wanted maximum.
b) By definition, we have
$$\mathrm{E}\,X = \frac{1}{B(\alpha,\beta)}\int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\cdot x\,dx.$$
Integrating by parts, we get
$$\mathrm{E}\,X = -\frac{1}{B(\alpha,\beta)\,\beta}\big[x^\alpha(1-x)^\beta\big]_0^1 + \frac{\alpha}{B(\alpha,\beta)\,\beta}\int_0^1 x^{\alpha-1}(1-x)^\beta\,dx.$$
Clearly, the first term is equal to zero. Refining the second one (write $(1-x)^\beta = (1-x)^{\beta-1} - x(1-x)^{\beta-1}$), we obtain
$$\mathrm{E}\,X = \frac{\alpha}{B(\alpha,\beta)\,\beta}\int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx - \frac{\alpha}{B(\alpha,\beta)\,\beta}\int_0^1 x^{\alpha}(1-x)^{\beta-1}\,dx.$$
Now, the integral in the first term is, thanks to the normalization, equal to $B(\alpha,\beta)$, and the second integral is the expected value $\mathrm{E}\,X$ multiplied by $B(\alpha,\beta)$, so $\mathrm{E}\,X = \frac{\alpha}{\beta} - \frac{\alpha}{\beta}\,\mathrm{E}\,X$, whence $\mathrm{E}\,X = \frac{\alpha}{\alpha+\beta}$.

The two-parameter family $Y = \mu + \sigma Z$, where $Z \sim \mathrm{N}(0,1)$, has been met already in 10.2.21. This is verified easily:
$$F_Y(y) = P(Y \le y) = P\Big(Z \le \frac{y-\mu}{\sigma}\Big) = F_Z\Big(\frac{y-\mu}{\sigma}\Big).$$
Applying the rule for transformation of coordinates, we can compute the expected value of $Y$ as
$$\int_{-\infty}^\infty y f_Y(y)\,dy = \int_{-\infty}^\infty \psi(x) f_X(x)\,dx,$$
and similarly for the variance of $Y$.

If $X$ is the random variable corresponding to the won amount, it seems that the correct answer is "anything below the expected value $\mathrm{E}\,X$". As derived in 10.2.1, $P(T = k) = 2^{-k}$, provided that the coin is fair. Sum up all the probabilities multiplied by $2^k$, to obtain $\sum_k 1 = \infty$. Therefore, the expected value does not exist. So it seems that it is advantageous for the gambler to play even if the initial amount is very high... Simulate the game for a while, to obtain that the amount won is somewhere around 24. The reason is that no one is able to play infinitely long; hence the extremely high amounts are not feasible enough to be won, so such amounts cannot be taken seriously. In decision theory, these cases (when the expected value does not directly correspond to the evaluated utility) are called the St. Petersburg paradox, and much literature has been devoted to this topic.³

10.2.29. Properties of the expected value. In the case of simple distributions, compute the expected value directly from the definition. For instance, for the Bernoulli distribution $\mathrm{A}(p)$, it is immediate that
$$\mathrm{E}\,X = (1-p)\cdot 0 + p\cdot 1 = p.$$
Similarly, compute the expected value $np$ of the binomial distribution $\mathrm{Bi}(n,p)$. This requires more thought. The result is a direct corollary of the following general theorem, since $\mathrm{Bi}(n,p)$ is the sum of $n$ random variables with the Bernoulli distributions $\mathrm{A}(p)$.
For any random variables $X$, $Y$ and real constants $a$, $b$, consider the expected values of the functions of random variables $X + Y$ and $a + bX$, provided the expected values $\mathrm{E}\,X$ and $\mathrm{E}\,Y$ exist. It follows directly from the definition that the constant random variable $a$ has expected value $a$. Further, $\mathrm{E}(bX) = b\,\mathrm{E}\,X$, since the constant $b$ can be factored out from the sums or integrals.
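The St. Petersburg game described above is easy to simulate; the sketch below (seed and sample size are arbitrary choices of mine) also computes the expected value of the game truncated at a maximal number of rounds, which grows without bound as the cap grows — the source of the paradox:

```python
import random

def truncated_expectation(max_rounds):
    # payout 2^k occurs with probability 2^-k for k = 1..max_rounds,
    # so each round contributes exactly 1 to the capped expected value
    return sum(2 ** k * 2 ** -k for k in range(1, max_rounds + 1))

def play(rng):
    # flip a fair coin until the first head; the payout doubles each round
    payout = 2
    while rng.random() < 0.5:
        payout *= 2
    return payout
```

The capped expectation equals the cap itself, while any finite simulated session produces a modest average — exactly the mismatch the paradox describes.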
More generally, the expected value of the product of independent random variables $X$ and $Y$ can be computed as follows. Suppose the components of the vector $(X,Y)$ are discrete and independent, with probability mass functions $f_X(x_i)$, $f_Y(y_j)$. Then,
$$\mathrm{E}(XY) = \sum_i\sum_j x_i y_j\, f_X(x_i) f_Y(y_j) = \Big(\sum_i x_i f_X(x_i)\Big)\Big(\sum_j y_j f_Y(y_j)\Big) = \mathrm{E}\,X\;\mathrm{E}\,Y.$$
Similarly, verify the equality $\mathrm{E}(XY) = \mathrm{E}\,X\;\mathrm{E}\,Y$ for independent continuous random variables.

³ Going back to Bernoulli, 1738: the real value is given by the utility, rather than the price.

10.G.1. Consider a random variable $X$ with density $f(x)$. Find the density of the random variable $Y$ defined by
i) $Y = e^X$, $x > 0$; ii) $Y = \sqrt{X}$, $x > 0$; iii) $Y = \ln X$, $x > 0$; iv) $Y = \frac1X$, $x > 0$.
Solution. We can simply apply the formula for the density of a transformed random variable, which yields
a) $f_Y(y) = f(\ln y)\,\frac1y$, b) $f_Y(y) = 2f(y^2)\,y$, c) $f_Y(y) = f(e^y)\,e^y$, d) $f_Y(y) = f\big(\frac1y\big)\frac{1}{y^2}$. $\square$

10.G.2. Consider a random variable $X$ which has uniform distribution on the interval $\big(-\frac{\pi}{2},\frac{\pi}{2}\big)$. Find the density of $X$ as well as the densities of the transformed variables $Y = \sin X$, $Z = \tan X$.
Solution. Since the length of the interval where the density of $X$ is non-zero is $\pi$, the density of $X$ is $f_X(x) = \frac1\pi$ for $x \in \big(-\frac{\pi}{2},\frac{\pi}{2}\big)$ and zero elsewhere. Applying the formula for the density of a transformed random variable and the derivatives of the elementary functions, we get
$$f_Y(y) = f_X(\arcsin(y))\arcsin'(y) = \frac{1}{\pi\sqrt{1-y^2}}, \qquad f_Z(z) = f_X(\arctan(z))\arctan'(z) = \frac{1}{\pi(1+z^2)}. \qquad\square$$

10.G.3. Consider a random variable $X$ whose density is $\cos x$ for $x \in \big(0,\frac{\pi}{2}\big)$ and zero elsewhere. Find the density of the random variable $Y = X^2$ and calculate $\mathrm{E}\,Y$ and $\operatorname{var} Y$.
Solution. Applying the formula for the density of a transformed random variable, we get
$$f_Y(y) = f_X(\sqrt{y})\,(\sqrt{y})' = \frac{\cos\sqrt{y}}{2\sqrt{y}}.$$
It is simpler to compute the expected value and variance of $Y$ directly from the density of $X$: we have $\mathrm{E}\,Y = \int_{-\infty}^\infty x^2 f_X(x)\,dx$. Thus,
$$\mathrm{E}\,Y = \int_0^{\pi/2} x^2\cos x\,dx = \big[x^2\sin x + 2x\cos x - 2\sin x\big]_0^{\pi/2} = \frac{\pi^2}{4} - 2.$$
The integral was computed by parts.
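The two routes to $\mathrm{E}\,Y$ in 10.G.3 — integrating $x^2$ against $f_X$, or integrating $y$ against the transformed density $f_Y$ — can be compared numerically; a minimal sketch (the quadrature helper and grid size are my choices):

```python
import math

def midpoint(f, lo, hi, n=200_000):
    # plain midpoint rule; crude but sufficient for a sanity check
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# E Y computed from the density of X: integral of x^2 * cos(x) over (0, pi/2)
ey_from_x = midpoint(lambda x: x * x * math.cos(x), 0.0, math.pi / 2)

# E Y computed from f_Y(y) = cos(sqrt(y)) / (2*sqrt(y)) over (0, (pi/2)^2);
# note y * f_Y(y) simplifies to (sqrt(y)/2) * cos(sqrt(y)), which is bounded
ey_from_y = midpoint(lambda y: 0.5 * math.sqrt(y) * math.cos(math.sqrt(y)),
                     0.0, (math.pi / 2) ** 2)
```

Both agree with the closed form $\frac{\pi^2}{4} - 2$.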
Applying this method again, we obtain
$$\mathrm{E}(Y^2) = \int_0^{\pi/2} x^4\cos x\,dx = \big[(x^4 - 12x^2 + 24)\sin x + 4(x^3 - 6x)\cos x\big]_0^{\pi/2}.$$

Now compute $\mathrm{E}(X+Y)$ for arbitrary random variables. For discrete distributions of $X$ and $Y$,
$$\mathrm{E}(X+Y) = \sum_i\sum_j (x_i + y_j)\,P(X = x_i,\ Y = y_j) = \sum_i x_i P(X = x_i) + \sum_j y_j P(Y = y_j),$$
where the absolute convergence of the first double sum follows from the triangle inequality and the absolute convergence of the sums that stand for the expected values of the particular random variables. Absolute convergence is used in order to interchange the sums.
Dealing with continuous variables $X$ and $Y$ whose expected values exist, proceed analogously:
$$\int_{-\infty}^\infty\!\!\int_{-\infty}^\infty (x+y) f_{X,Y}(x,y)\,dx\,dy = \int_{-\infty}^\infty x\!\int_{-\infty}^\infty f_{X,Y}(x,y)\,dy\,dx + \int_{-\infty}^\infty y\!\int_{-\infty}^\infty f_{X,Y}(x,y)\,dx\,dy = \int_{-\infty}^\infty x f_X(x)\,dx + \int_{-\infty}^\infty y f_Y(y)\,dy = \mathrm{E}\,X + \mathrm{E}\,Y,$$
where the absolute convergence of the integrals of the expected values $\mathrm{E}\,X$ and $\mathrm{E}\,Y$ is used to interchange the integrals by Fubini's theorem. Altogether, the expected formula
$$\mathrm{E}(X+Y) = \mathrm{E}\,X + \mathrm{E}\,Y$$
is obtained whenever the expected values $\mathrm{E}\,X$ and $\mathrm{E}\,Y$ exist. Straightforward application of this result leads to the following:

Affine nature of expected values
For any constants $a, b_1, \dots, b_k$ and random variables $X_1, \dots, X_k$,
$$\mathrm{E}(a + b_1 X_1 + \cdots + b_k X_k) = a + b_1\,\mathrm{E}\,X_1 + \cdots + b_k\,\mathrm{E}\,X_k.$$

The following theorem extends this behaviour with respect to affine transformations of random vectors, and shows that the expected value is invariant with respect to affine transformations, as is the arithmetic mean:
Theorem. Let $X = (X_1,\dots,X_n)$ be a random vector with expected value $\mathrm{E}\,X$, and let $a \in \mathbb{R}^m$ and a matrix $B \in \mathrm{Mat}_{m,n}(\mathbb{R})$ be given. Then,
$$\mathrm{E}(a + B\cdot X) = a + B\cdot\mathrm{E}\,X.$$
Proof. There is almost nothing remaining to be proved. Since the expected value of a vector is defined as the vector of the expected values of the components, it suffices to restrict attention to a single item in $\mathrm{E}(a + B\cdot X)$. Thus, it can be assumed that $a$ is a scalar and $B$ is a matrix with a single row.
Hence, $\mathrm{E}(Y^2) = \big(\frac{\pi}{2}\big)^4 - 12\big(\frac{\pi}{2}\big)^2 + 24 = \frac{\pi^4}{16} - 3\pi^2 + 24$, so
$$\operatorname{var} Y = \frac{\pi^4}{16} - 3\pi^2 + 24 - \frac{\pi^4 - 16\pi^2 + 64}{16} = 20 - 2\pi^2. \qquad\square$$

10.G.4. Let $X$ be a random variable which takes the values 0 and 1, each with probability $\frac12$. Similarly, let $Y$ take the values $-1$ and $1$, each with probability $\frac12$, independently of $X$. Show that the random variables $X$ and $Z = XY$ are uncorrelated, yet dependent. Give an example of two continuous random variables with this property.
Solution. First of all, we compute the expected values of our random variables: $\mathrm{E}\,X = 0\cdot\frac12 + 1\cdot\frac12 = \frac12$ and $\mathrm{E}\,Z = \mathrm{E}(XY) = 0\cdot\frac12 + (-1)\cdot\frac14 + 1\cdot\frac14 = 0$. As for the expected value of their product, we have $\mathrm{E}(XZ) = \mathrm{E}(X^2Y) = 1\cdot\frac14 + (-1)\cdot\frac14 = 0$. By theorem 10.2.33, the covariance is equal to $\operatorname{cov}(X,Z) = 0 - \frac12\cdot 0 = 0$. Thus, the variables $X$ and $Z$ are uncorrelated.
At the same time, the conditional probability $P(Z = 1 \mid X = 0)$ is clearly zero, i.e., we also have $P(Z = 1,\ X = 0) = 0$, while $P(Z = 1) = \frac14$ and $P(X = 0) = \frac12$, so $P(Z = 1)\cdot P(X = 0) = \frac18 \ne 0$. We can see that $P(Z = 1)\cdot P(X = 0) \ne P(Z = 1,\ X = 0)$, which means that $X$ and $Z$ are dependent.
It can be easily verified from the corresponding definitions that if $X$ is any random variable with zero expected value, finite second moment and zero third moment, then $X$ and $Y = X^2$ are dependent, but uncorrelated. $\square$

H. Inequalities and limit theorems

Markov's inequality provides a rough estimate of the behavior of a non-negative random variable if we know nothing more than its expected value. In exact words, for any non-negative random variable $X$ and any $a > 0$, it holds that
$$P(X \ge a) \le \frac{\mathrm{E}\,X}{a}.$$

10.H.1. Consider a non-negative random variable $X$ with expected value $\mu$. With no further information about $X$, bound $P(X \ge 3\mu)$. Then, compute $P(X \ge 3\mu)$ if you know that $X \sim \mathrm{Ex}\big(\frac1\mu\big)$.
Solution. If the non-negative random variable $X$ does not take zero with probability 1, then its expected value $\mu$ is positive.
Therefore, the wanted probability can be bounded using Markov's inequality as
$$P(X \ge 3\mu) \le \frac{\mu}{3\mu} = \frac13.$$

Then, the expected value of a finite sum of random variables is obtained, and by the above results, it exists and is given as the sum of the expected values of the individual items. This is exactly what was to be proved. $\square$

10.2.30. Quantiles and critical values. Introduce numerical characteristics that are analogous to those from descriptive statistics. There, among the useful characteristics are the quantiles, cf. 10.1.5.
Consider a random variable $X$ whose distribution function $F_X$ is strictly monotone. This is satisfied by any random variable whose density is nowhere equal to zero, which is the case for the normal distribution, for example. In this case, define the quantile function $F_X^{-1}$ simply as the inverse function $(F_X)^{-1}: (0,1) \to \mathbb{R}$. This means that the value $y = F^{-1}(\alpha)$ is such that $P(X < y) = \alpha$. This corresponds precisely to the quantiles from descriptive statistics, using relative frequencies for the probabilities.

Quantile function
For any random variable $X$ with distribution function $F_X(x)$, define its quantile function by
$$F^{-1}(\alpha) = \inf\{x \in \mathbb{R};\ F(x) \ge \alpha\}, \quad \alpha \in (0,1).$$

Clearly, this is a generalization of the previous definition in the case when the distribution function is strictly monotone. As seen in descriptive statistics, the most used quantiles are those for $\alpha = 0.5$ (the median), $\alpha = 0.25$ (the first quartile), and $\alpha = 0.75$ (the third quartile). Similarly for deciles and percentiles, when $\alpha$ is equal to (integer) multiples of tenths and hundredths, respectively.
It follows directly from the definition that the quantile function for a given random variable $X$ allows the determination of intervals into which the values of $X$ fall with a chosen probability. For instance, the value $F^{-1}(0.975)$, approximately 1.96, corresponds to the percentile 97.5 for the normal distribution $\mathrm{N}(0,1)$.
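The quantile 1.96 can be recomputed from the definition $F^{-1}(\alpha) = \inf\{x;\ F(x)\ge\alpha\}$; the sketch below evaluates $\Phi$ via the error function and inverts it by bisection (the bracketing interval $[-10,10]$ and the function names are my choices):

```python
import math

def norm_cdf(x):
    # distribution function of N(0,1), expressed through the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_quantile(alpha, lo=-10.0, hi=10.0):
    # F^{-1}(alpha) = inf{x : F(x) >= alpha}, located by bisection,
    # which works because norm_cdf is strictly increasing
    for _ in range(200):
        mid = (lo + hi) / 2
        if norm_cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For $\alpha = 0.975$ this returns approximately 1.96, matching the value quoted above.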
This says that with the probability of 2.5 %, the value of such a random variable $Z \sim \mathrm{N}(0,1)$ is at least 1.96. Since the density of the variable $Z$ is symmetric with respect to the origin, this observation can be interpreted as saying that there is only a 5 % probability that the value of $|Z|$ is greater than 1.96. Similar intervals and values appear when discussing the reliability of estimates of characteristics of random variables.

Critical values
For a random variable $X$ and a real number $0 < \alpha < 1$, define its critical value $x(\alpha)$ at level $\alpha$ by
$$P(X > x(\alpha)) = \alpha.$$
This means that $x(\alpha) = F_X^{-1}(1-\alpha)$, where $F_X^{-1}$ is the quantile function of the random variable $X$.

If we know that $X \sim \mathrm{Ex}\big(\frac1\mu\big)$, then $P(X \ge 3\mu) = 1 - P(X < 3\mu) = 1 - F(3\mu)$, where $F$ is the distribution function of the exponential distribution. By definition, this is
$$F(x) = \int_0^x \frac1\mu\,e^{-t/\mu}\,dt = \big[-e^{-t/\mu}\big]_0^x = 1 - e^{-x/\mu}.$$
Hence, $P(X \ge 3\mu) = e^{-3}$. $\square$

10.H.2. At a particular place, the average speed of wind is 20 kilometers per hour.
• Regardless of the distribution of the speed as a random variable, bound the probability that in a given observation, the speed does not exceed 60 km/h.
• Find the interval in which the speed lies with probability at least 0.9 if you know that the standard deviation is $\sigma = 1$ km/h.
Solution. Let $X$ denote the random variable that corresponds to the speed. In the first case, we can only use Markov's inequality, leading to
$$P(X < 60) = 1 - P(X \ge 60) \ge 1 - \frac{20}{60} = \frac23.$$
In the second case, we know the variance (or standard deviation) of the speed, so we can use Chebyshev's inequality (see 10.2.32):
$$P(|X - 20| < x) = 1 - P(|X - 20| \ge x) \ge 1 - \frac{1}{x^2},$$
so it suffices that $1 - \frac{1}{x^2} \ge 0.9$. Hence, $x \ge \sqrt{10} \approx 3.2$, and the wanted interval is $(16.8\ \mathrm{km/h},\ 23.2\ \mathrm{km/h})$. $\square$

10.H.3. Each yogurt of an undisclosed company contains a photo of one of 26 ice-hockey world champions. Suppose the players are distributed uniformly at random.
How many yogurts must Vera buy if she wants the probability of getting at least 5 photos of Jaromir Jagr to be at least 0.95?
Solution. Let $X$ denote the random variable that corresponds to the number of obtained photos of Jaromir Jagr (parametrized by the number $n$ of yogurts bought). Clearly, $X \sim \mathrm{Bi}\big(n, \frac{1}{26}\big)$. We are looking for the value of $n$ for which $P(X \ge 5) = 0.95$, i.e., $F_X(4) = P(X \le 4) = 0.05$. In order to find it, we use the de Moivre–Laplace theorem and approximate the binomial distribution with the normal distribution (we assume that $n$ is large, so the approximation error will be small). For the binomial distribution, the expected value of $X$ is $\mathrm{E}\,X = \frac{n}{26}$ and its variance is $\operatorname{var} X = \frac{25n}{676}$. Denoting the corresponding

10.2.31. Variance and standard deviation. The simple numerical characteristics concerning the variability of sample values in descriptive statistics were the variance and the standard deviation. Define them similarly for random variables.

Variance of a random variable
Given a random variable $X$ with finite expected value, its variance is defined as
$$\operatorname{var} X = \mathrm{E}\big((X - \mathrm{E}\,X)^2\big),$$
provided the right-hand expected value exists. Otherwise, the variance of $X$ does not exist. The square root $\sqrt{\operatorname{var} X}$ of the variance is called the standard deviation of the random variable $X$.

Using the properties of the expected value, a simpler formula can be derived for the variance of a random variable $X$ whose expected value exists:
$$\operatorname{var} X = \mathrm{E}(X - \mathrm{E}\,X)^2 = \mathrm{E}\big(X^2 - 2X(\mathrm{E}\,X) + (\mathrm{E}\,X)^2\big) = \mathrm{E}(X^2) - 2(\mathrm{E}\,X)^2 + (\mathrm{E}\,X)^2 = \mathrm{E}(X^2) - (\mathrm{E}\,X)^2.$$
Consider how affine transformations change the variance of a random variable. Given real numbers $a$, $b$ and a random variable $X$ with expected value and variance, consider the random variable $Y = a + bX$. Compute
$$\operatorname{var} Y = \mathrm{E}\big((a + bX) - \mathrm{E}(a + bX)\big)^2 = \mathrm{E}\big(b(X - \mathrm{E}\,X)\big)^2 = b^2\operatorname{var} X.$$
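The affine behaviour $\operatorname{var}(a+bX) = b^2\operatorname{var} X$ just computed can be checked on a small discrete distribution; a minimal sketch (the particular pmf and constants $a$, $b$ are arbitrary choices of mine):

```python
def var_pmf(pmf):
    # variance of a discrete random variable given as {value: probability}
    ex = sum(x * p for x, p in pmf.items())
    return sum((x - ex) ** 2 * p for x, p in pmf.items())

pmf = {-1: 0.2, 0: 0.3, 2: 0.5}
a, b = 3.0, -2.0
# the pmf of Y = a + bX simply relocates the same masses
shifted = {a + b * x: p for x, p in pmf.items()}
```

The shift $a$ drops out and the factor $b$ enters squared, so the sign of $b$ is irrelevant.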
Thus are derived the following useful formulae:

Properties of variance
(1) $\operatorname{var} X = \mathrm{E}(X^2) - (\mathrm{E}\,X)^2$,
(2) $\operatorname{var}(a + bX) = b^2\operatorname{var} X$,
(3) $\sqrt{\operatorname{var}(a + bX)} = |b|\sqrt{\operatorname{var} X}$.

Given a random variable $X$ with expected value and non-zero variance, define its standardization as the random variable
$$Z = \frac{X - \mathrm{E}\,X}{\sqrt{\operatorname{var} X}}.$$
Thus, the standardized variable is the affine transformation of the original variable whose expected value equals zero and variance equals one.

10.2.32. Chebyshev's inequality. A good illustration of the usefulness of variance is Chebyshev's inequality. This connects the variance directly to the probability that the random variable assumes values that are distant from its expected value.
Theorem. Let $X$ be a random variable whose variance exists, write $\mu = \mathrm{E}\,X$, and let $\varepsilon > 0$. Then,
$$P(|X - \mathrm{E}\,X| \ge \varepsilon) \le \frac{\operatorname{var} X}{\varepsilon^2}.$$
Proof (for a continuous random variable with density $f$):
$$\operatorname{var} X = \int_{-\infty}^\infty (x-\mu)^2 f(x)\,dx \ge \int_{|x-\mu|\ge\varepsilon} (x-\mu)^2 f(x)\,dx \ge \varepsilon^2\!\!\int_{|x-\mu|\ge\varepsilon} f(x)\,dx = \varepsilon^2\,P(|X - \mu| \ge \varepsilon). \qquad\square$$

standardized variable by $Z$, we can reformulate the condition as
$$0.05 = P(X \le 4) = P\bigg(Z \le \frac{4 - \frac{n}{26}}{\frac{5\sqrt n}{26}}\bigg) = F_Z\Big(\frac{104 - n}{5\sqrt n}\Big),$$
where, by the approximation assumption, $F_Z \approx \Phi$ is the distribution function of the normal distribution $\mathrm{N}(0,1)$. Since we must have $n > 104$, solving $\frac{104-n}{5\sqrt n} = \Phi^{-1}(0.05) \approx -1.645$ yields $\sqrt n \approx 15.1$, i.e., $n \approx 229$ yogurts.

10.H.4. A fair die is rolled 1200 times. Estimate the probability that the number of 6s lies between 150 and 250.
Solution. (1) Let $X$ denote the number of 6s; then $X \sim \mathrm{Bi}\big(1200, \frac16\big)$, so $\mathrm{E}\,X = 1200\cdot\frac16 = 200$ and $\operatorname{var} X = 200\big(1 - \frac16\big) = \frac{500}{3}$. The condition on the number of 6s can be written as $|X - 200| \le 50$. Using Chebyshev's inequality 10.2.32, we get
$$P(|X - 200| \le 50) = 1 - P(|X - 200| \ge 51) \ge 1 - \frac{500}{3\cdot 51^2} \approx 0.94.$$
(2) The exact value of the wanted probability is given by the expression $P(150 \le X \le 250) = F_X(250) - F_X(150)$, where $F_X$ is the distribution function of the binomial distribution. By definition,
$$P(150 \le X \le 250) = \sum_{k=150}^{250}\binom{1200}{k}\Big(\frac16\Big)^k\Big(\frac56\Big)^{1200-k}.$$
This expression is hard to evaluate without a computer, so we use the de Moivre–Laplace theorem, replacing $X$ with the standardized random variable.

The analogous proof for discrete random variables is left as an exercise for the reader. Realizing that the variance is the square of the standard deviation $\sigma$, the choice $\varepsilon = k\sigma$ yields the probability
$$P(|X - \mathrm{E}\,X| \ge k\sigma) \le \frac{1}{k^2}.$$
Chebyshev's inequality helps in understanding asymptotic descriptions of limit processes.
For instance, consider the sequence of random variables X₁, X₂, … with probability distributions Xₙ ~ Bi(n, p), with a fixed value of p, 0 < p < 1. Intuitively, it is expected that the relative frequency of success should approach the probability p as n increases, i.e., that the values of the random variables Yₙ = (1/n)Xₙ should approach p. Clearly,

E Yₙ = p,  var Yₙ = (1/n²) var Xₙ = p(1 − p)/n.

Direct application of Chebyshev's inequality yields, for any fixed ε > 0, that

P(|Yₙ − p| ≥ ε) ≤ p(1 − p)/(nε²).

By 10.2.40, the standardized variable Z = √3(X − 200)/(10√5) satisfies Z ≈ N(0, 1), i.e., F_Z ≈ Φ. Thus,

P(150 ≤ X ≤ 250) ≈ 2Φ(√15) − 1 ≈ 0.9999. □

We learn that, for every ε > 0,

lim_{n→∞} P(|Xₙ/n − p| ≥ ε) = 0.

This result is known as Bernoulli's theorem (one of many). This type of limit behaviour is called convergence in probability. Thus it is proved (as a corollary of Chebyshev's inequality) that the random variables Yₙ converge in probability to the constant random variable p.

10.H.5. At the Faculty of Informatics, 10 % of students have grade average less than 1.2 (let us call them successful). How many students must we meet if the probability that there are 8–12 % successful ones among them is to be at least 0.95? Solve this problem using Chebyshev's inequality, and then using the de Moivre–Laplace theorem.

Solution. Let X denote the random variable that corresponds to the number of successful students, parametrized by the number n of students we meet. Since a randomly met student has probability 10 % of being successful, when meeting n students, we have X ~ Bi(n, 1/10). Hence, E X = 0.1n and var X = 0.09n. By Chebyshev's inequality 10.2.32, the wanted probability satisfies

P(|X − 0.1n| ≤ 0.02n) = 1 − P(|X − 0.1n| > 0.02n) ≥ 1 − (0.1 · 0.9 · n)/(0.02n)² = 1 − 225/n.

The inequality 1 − 225/n ≥ 0.95, and hence P(|X − 0.1n| ≤ 0.02n) ≥ 0.95, holds for n ≥ 4500.

The exact value of the probability is given in terms of the distribution function F_X of the binomial distribution: P(0.08n ≤ X ≤ 0.12n) = F_X(0.12n) − F_X(0.08n).
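The Chebyshev step of 10.H.5 is a one-line computation; a minimal Python sketch (the helper name `chebyshev_lower_bound` is our own, standard library only) reproduces the bound 1 − 225/n and the threshold n = 4500:

```python
import math

p, eps, conf = 0.1, 0.02, 0.95

def chebyshev_lower_bound(n):
    # P(|X - pn| <= eps*n) >= 1 - var/(eps*n)^2, with var X = n p (1-p)
    return 1 - (n * p * (1 - p)) / (eps * n) ** 2

# smallest n with 1 - p(1-p)/(eps^2 n) >= conf
n_min = math.ceil(p * (1 - p) / (eps ** 2 * (1 - conf)))
print(n_min, chebyshev_lower_bound(n_min))
```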
Using the de Moivre–Laplace theorem (see 10.2.40), we can approximate the standardized random variable Z = (X − 0.1n)/(0.3√n) with the standard normal distribution, F_Z ≈ Φ, so

0.95 = P(0.08n ≤ X ≤ 0.12n) ≈ P(−√n/15 ≤ Z ≤ √n/15) = 2Φ(√n/15) − 1.

Hence √n = 15 z(0.975), and we learn n ≈ 864.4. Thus, we can see that it is sufficient to meet 865 students. □

10.H.6. The probability that a planted tree will grow is 0.8. What is the probability that out of 500 planted trees, at least 380 trees will grow?

Solution. The random variable X that corresponds to the number of trees that will grow has binomial distribution X ~ Bi(500, 4/5). Hence, E X = 400 and var X = 80. The standardized random variable is Z = (X − 400)/√80. By the de

10.2.33. Covariance. We return to random vectors. In the case of the expected value, the situation is very simple: just take the vector of expected values. When characterizing the variability, the dependencies between the individual components are also of much interest. We follow the idea from 10.1.9 again.

Covariance

Given random variables X, Y whose variances exist, define their covariance as

cov(X, Y) = E((X − E X)(Y − E Y)).

The basic properties of the concept can be derived very easily:

Theorem. For any random variables X, Y, Z whose variances exist and real numbers a, b, c, d,

(1) cov(X, Y) = cov(Y, X)
(2) cov(X, Y) = E(XY) − (E X)(E Y)
(3) cov(X + Y, Z) = cov(X, Z) + cov(Y, Z)
(4) cov(a + bX, c + dY) = bd cov(X, Y)
(5) var(X + Y) = var X + var Y + 2 cov(X, Y).

Moreover, if X and Y are independent, then cov(X, Y) = 0, and consequently

(6) var(X + Y) = var X + var Y.

Proof. Directly from the definition, the covariance is symmetric in the arguments. The second proposition follows immediately from the properties of the expected value:

cov(X, Y) = E((X − E X)(Y − E Y)) = E(XY − (E Y)X − (E X)Y + (E X)(E Y)) = E(XY) − (E X)(E Y).
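The final step of 10.H.5 can be reproduced with `statistics.NormalDist` from the Python standard library (available since Python 3.8; a sketch, not the authors' code):

```python
from statistics import NormalDist
import math

z = NormalDist().inv_cdf(0.975)   # quantile z(0.975), approximately 1.96
n = (15 * z) ** 2                 # from sqrt(n) = 15 z(0.975)
students = math.ceil(n)           # smallest sufficient whole number of students
print(round(n, 1), students)
```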
The next proposition also follows easily if the definition is expanded and the fact that the expected value of a sum of random variables equals the sum of their expected values is used. The fourth proposition can be computed directly:

cov(a + bX, c + dY) = E((a + bX − E(a + bX))(c + dY − E(c + dY)))
= E((bX − b E X)(dY − d E Y))
= E(bd(X − E X)(Y − E Y))
= bd E((X − E X)(Y − E Y)) = bd cov(X, Y).

Moivre–Laplace theorem, we have F_Z ≈ Φ, so

P(X ≥ 380) = P(Z ≥ (380 − 400)/√80) = P(Z ≥ −√5) ≈ Φ(2.24) ≈ 0.987. □

10.H.7. Using the distribution function of the standard normal distribution, find the probability that the absolute difference between the numbers of heads and tails in 1600 tosses of a coin is at least 82.

Solution. Let X denote the random variable that corresponds to the number of times the coin came up heads. Then X has binomial distribution Bi(1600, 1/2) (with expected value 800 and standard deviation 20), so for a large value of n = 1600, by the de Moivre–Laplace theorem, the distribution function of the variable (X − 800)/20 can be approximated with the distribution function Φ. The difference between the numbers of heads and tails is at least 82 if and only if |X − 800| ≥ 41, so the wanted probability is approximately 2Φ(−2.05) = 0.0404. □

10.H.8. Using the distribution function of the standard normal distribution, find the probability that the absolute difference between the numbers of heads and tails in 3600 tosses of a coin is at most 66.

Solution. Let X denote the random variable that corresponds to the number of times the coin came up heads. Then X has binomial distribution Bi(3600, 1/2) (with expected value 1800 and standard deviation 30), so for a large value of n = 3600, the distribution function of the variable (X − 1800)/30 can be approximated, by the de Moivre–Laplace theorem, with the distribution function Φ.

The other propositions about the variance are quite simple corollaries:

var(X + Y) = E((X + Y) − E(X + Y))²
= E((X − E X) + (Y − E Y))²
= E(X − E X)² + 2 E((X − E X)(Y − E Y)) + E(Y − E Y)²
= var X + 2 cov(X, Y) + var Y.

Furthermore, if X and Y are independent, then E(XY) = (E X)(E Y), and hence their covariance is zero. □

Directly from the definition, var(X) = cov(X, X). The latter theorem claims that covariance is a symmetric bilinear form on the real vector space of random variables whose variance exists. The variance is the corresponding quadratic form. The covariance can be computed from the variances of the particular random variables and of their sum, as seen in linear algebra; see property (5).

Notice that the random variable equal to the sum of n independent and identically distributed random variables Y_i behaves very much differently than the multiple nY. In fact,

var(Y₁ + ⋯ + Yₙ) = n var Y, while var(nY) = n² var Y.

10.2.34. Correlation of random variables. To a certain extent, covariance corresponds to dependency between the random variables. Its relative version is called the correlation of random variables and, similarly as for the standard deviation, the following concept is defined:

Correlation coefficient

The correlation coefficient of random variables X and Y whose variances are finite and non-zero is defined as

ρ_{X,Y} = cov(X, Y) / (√(var X) √(var Y)).

As seen from theorem 10.2.33, the correlation coefficient of random variables equals the covariance of the standardized variables (X − E X)/√(var X) and (Y − E Y)/√(var Y).
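The two coin-tossing approximations above (10.H.7 and 10.H.8) amount to evaluating Φ at a few points; a quick check with `statistics.NormalDist` (a sketch, standard library only; note that Φ(1.1) − Φ(−1.1) ≈ 0.7287):

```python
from statistics import NormalDist

Phi = NormalDist().cdf

# 10.H.7: 1600 tosses, |heads - tails| >= 82  <=>  |X - 800| >= 41, sd = 20
p_at_least_82 = 2 * Phi(-41 / 20)           # 2 Phi(-2.05)

# 10.H.8: 3600 tosses, |heads - tails| <= 66  <=>  |X - 1800| <= 33, sd = 30
p_at_most_66 = Phi(33 / 30) - Phi(-33 / 30)

print(round(p_at_least_82, 4), round(p_at_most_66, 4))
```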
The following equalities hold (here, a, b, c, d are real constants, bd ≠ 0, and X, Y are random variables with finite nonzero variances):

ρ_{a+bX, c+dY} = sgn(bd) ρ_{X,Y},
ρ_{X,X} = 1.

Moreover, if X and Y are independent, then ρ_{X,Y} = 0.

Note that if the variance of a random variable X is zero, then it assumes the value E X with probability 1. Indeed, if the value of X fell into an interval I not containing E X with probability p ≠ 0, then the expression var X = E(X − E X)² would be positive. Stochastically, random variables with zero variance behave as constants.

The difference between the numbers of heads and tails is at most 66 if and only if |X − 1800| ≤ 33, so the wanted probability is approximately

Φ(33/30) − Φ(−33/30) = Φ(1.1) − Φ(−1.1) ≈ 0.7287. □

10.H.9. The probability that a seed will grow is 0.9. How many seeds must we plant if we require that, with probability at least 0.995, the relative number of grown items differs from 0.9 by at most 0.034?

Solution. The random variable X that corresponds to the number of grown seeds, out of n planted ones, has binomial distribution X ~ Bi(n, 9/10). Hence, E X = 0.9n and var X = 0.09n, so the standardized variable is Z = (X − 0.9n)/(0.3√n). The condition in question can be written as

P(|X − 0.9n| ≤ 0.034n) = P(|Z| ≤ 0.034n/(0.3√n)) = P(|Z| ≤ (0.34/3)√n) ≥ 0.995.

By the de Moivre–Laplace theorem, for large n, the distribution function of Z can be approximated by the distribution function Φ.

Altogether, we get the condition

2Φ((0.34/3)√n) − 1 ≥ 0.995.

From this we compute n ≥ (3z(0.9975)/0.34)² ≈ 615. □

10.H.10. The service life (in hours) of a certain kind of gadget has exponential distribution with parameter λ = 1/10. Using the central limit theorem, bound the probability that the total service life of 100 such gadgets lies between 900 and 1050 hours.

Solution. In exercise 10.F.5, we computed that the expected value and variance of a random variable X_i with exponential distribution are equal to E X_i = 1/λ and var X_i = 1/λ², respectively. Thus, the expected service life of each gadget is E X_i = µ = 10 hours, with variance var X_i = σ² = 100 hours². By the central limit theorem, the distribution

Since the covariance is a symmetric bilinear form, it would follow from the Cauchy–Schwarz inequality (see 3.4.3) that

(1) |ρ_{X,Y}| ≤ 1.

The following theorem claims more. It shows that the full correlation or anti-correlation, i.e. ρ_{X,Y} = ±1, of random variables X and Y says that they are bound by an affine relation Y = kX + c, where the sign of k corresponds to the sign in ρ_{X,Y} = ±1. On the other hand, a zero correlation coefficient says that the (potential) dependency between the variables is very far from any affine relation of the mentioned type. (Note, however, that this does not mean that the variables must be independent.) For instance, consider random variables Z ~ N(0, 1) and Z². Then cov(Z, Z²) = E Z³ = 0, since the density of Z is an even function, and thus the expected value of an odd power of Z is zero, if it exists.

Theorem. If the correlation coefficient is defined, then |ρ_{X,Y}| ≤ 1. Equality holds if and only if there are constants k, c such that P(Y = kX + c) = 1.

Proof. A stochastic affine relation between Y and X with nonzero coefficient at Y is sought. This is equivalent to Y + sX ~ D(c) for some fixed value of the parameter s and constant c. In such a case, the variance vanishes.
Thus one considers the following non-negative quadratic expression in t:

0 ≤ var((Y − E Y)/√(var Y) + t (X − E X)/√(var X)) = 1 + 2tρ_{X,Y} + t².

The right-hand quadratic expression does not have two distinct real roots; hence its discriminant cannot be positive. So 4(ρ_{X,Y})² − 4 ≤ 0. Hence the desired inequality is obtained. Moreover, the discriminant vanishes if and only if ρ_{X,Y} = ±1. For the only (double) root t₀, the corresponding random variable has zero variance; thus it assumes a fixed value with probability 1. This yields the affine relation as expected. □

10.2.35. Covariance matrix. The variability of a random vector must be considered. This suggests considering the covariances of all pairs of components. The following definition and theorem show that this leads to an analogy of the variance for vectors, including the behaviour of the variance under affine transformations of the random variables.

of the transformed random variable

(1/(σ√n)) Σ_{i=1}^{n} (X_i − µ) = (1/100) (Σ_{i=1}^{100} X_i − 1000)

approaches the standard normal distribution as n tends to infinity. Thus, the wanted probability for the service life of 100 gadgets,

P(900 ≤ Σ X_i ≤ 1050) = P(−1 ≤ (Σ X_i − 1000)/100 ≤ 1/2),

can be approximated with the distribution function of the normal distribution:

P(900 ≤ Σ X_i ≤ 1050) ≈ Φ(0.5) − Φ(−1) ≈ 0.533. □

10.H.11. We keep putting items into a chest. The expected mass of an item is 3 kg and the standard deviation is 0.8 kg. What is the maximum number of items that we can put into the chest so that, with probability at least 99 %, the total mass does not exceed one ton?

Solution. Let X_i denote the random variable that corresponds to the mass of the i-th item. Then, we have µ = E X_i = 3 and σ = √(var X_i) = 0.8 (in kilograms), and we want to have P(Σ X_i ≤ 1000) ≥ 0.99. By the central limit theorem 10.2.40, the distribution of the random variable

S_n = (1/(0.8√n)) Σ_{i=1}^{n} (X_i − 3)

can be approximated by the standard normal distribution.
Thus, we get

P(Σ X_i ≤ 1000) = P(S_n ≤ (1000 − 3n)/(0.8√n)) ≈ Φ((1000 − 3n)/(0.8√n)).

We learn that z(0.99) ≈ 2.326, so the wanted n satisfies the quadratic equation (in √n)

(1000 − 3n)/(0.8√n) = 2.326,

whence we get n ≈ 322. □

I. Testing samples from the normal distribution

In subsection 10.3.4, we introduced the so-called two-sided interval estimate of an unknown parameter µ of the normal distribution N(µ, σ²). In some cases, we may be interested only in an upper or lower estimate, i.e. a statistic U or L for which P(µ < U) = 1 − α or P(L < µ) = 1 − α, respectively. Then, we talk about a one-sided confidence interval (−∞, U) or (L, ∞). The formula for these intervals can be derived similarly as for the two-sided interval. Now, we have for the random variable Z = √n(X̄ − µ)/σ ~ N(0, 1) that

1 − α = Φ(z(1 − α)) = P(Z < z(1 − α)).

Hence it immediately follows that

1 − α = P(X̄ − (σ/√n) z(1 − α) < µ),

Consider a random vector X = (X₁, …, Xₙ)ᵀ, all of whose components have finite variances.

Covariance matrix

The covariance matrix of the random vector X is defined in terms of the expected value as (notice the vector X is viewed as a column of random variables now)

var X = E((X − E X)(X − E X)ᵀ).

Using the definition of the expected value of a vector and expanding the matrix multiplication, it is immediate that the covariance matrix var X is the symmetric matrix

⎛ var X₁       cov(X₁, X₂)  ⋯  cov(X₁, Xₙ) ⎞
⎜ cov(X₂, X₁)  var X₂       ⋯  cov(X₂, Xₙ) ⎟
⎜ ⋮                             ⋮          ⎟
⎝ cov(Xₙ, X₁)  cov(Xₙ, X₂)  ⋯  var Xₙ      ⎠

Theorem. Consider a random vector X = (X₁, …, Xₙ)ᵀ, all of whose components have finite variances. Further, consider the transformed random vector Y = BX + c, where B is an m-by-n matrix of real constants and c ∈ ℝᵐ is a vector of constants. Then,

var Y = var(BX + c) = B(var X)Bᵀ.

Proof. The claim follows from direct computation, using the properties of the expected value:

var Y = E(((BX + c) − E(BX + c))((BX + c) − E(BX + c))ᵀ)
= E((B(X − E X))(B(X − E X))ᵀ)
= B E((X − E X)(X − E X)ᵀ) Bᵀ = B(var X)Bᵀ. □
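The transformation rule var(BX + c) = B(var X)Bᵀ just proved can be verified on a small concrete example. The matrices below are arbitrary illustrative choices (pure-Python matrix arithmetic, no external libraries); the constant part c is omitted because, as the proof shows, it drops out entirely:

```python
Sigma = [[4.0, 1.0],
         [1.0, 9.0]]                 # an assumed covariance matrix var X

B = [[1.0, 2.0],
     [0.0, 1.0],
     [3.0, -1.0]]                    # a 3-by-2 transformation matrix

def matmul(A, C):
    # plain triple-loop matrix product
    return [[sum(A[i][k] * C[k][j] for k in range(len(C)))
             for j in range(len(C[0]))] for i in range(len(A))]

Bt = [list(row) for row in zip(*B)]  # transpose of B
varY = matmul(matmul(B, Sigma), Bt)  # B (var X) B^T
print(varY)
```

The result is again symmetric, as a covariance matrix must be.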
The constant part of the transformation has no impact, while with respect to the linear part of the transformation, the covariance matrix behaves as the matrix of a quadratic form.

10.2.36. Moments and moment function. The expected value and variance reflect the square of the deviation of values of a random variable from the average. In descriptive statistics, one also examines the skewness of the data, and it is natural to examine the variability of random variables in terms of higher powers of the given random variable X. The characteristic E(X^k) is called the k-th moment; the characteristic µ_k = E((X − E X)^k) is called the k-th central moment of a random variable X. What also comes in handy is the k-th absolute moment, given by E|X|^k. From the definition it follows that for a continuous random variable X,

E X^k = ∫ x^k f_X(x) dx.

so L = X̄ − (σ/√n) z(1 − α). Similarly, we find U = X̄ + (σ/√n) z(1 − α), and for a distribution with unknown variance,

µ > X̄ − (S/√n) t_{n−1}(1 − α) and µ < X̄ + (S/√n) t_{n−1}(1 − α).

If we want to estimate the variance σ² of a normal distribution, then we use theorem 10.3.3, similarly as when we derived it for the expected value. This time, we use the second part of the theorem, by which the random variable ((n − 1)/σ²) S² has distribution χ²_{n−1}. Then, we can immediately see that

1 − α = P(χ²_{n−1}(α/2) < ((n − 1)/σ²) S² < χ²_{n−1}(1 − α/2)).

Thus, the two-sided 100(1 − α)% confidence interval for the variance is

((n − 1)S² / χ²_{n−1}(1 − α/2), (n − 1)S² / χ²_{n−1}(α/2)),

and similarly for the one-sided upper and lower estimates, we get

σ² < (n − 1)S² / χ²_{n−1}(α), resp. (n − 1)S² / χ²_{n−1}(1 − α) < σ².

10.I.1. We roll a die 600 times, obtaining only 45 sixes. Is it possible to say that the die is ideal at level α = 0.01?

Solution. For an ideal die, the probability of rolling a six is always p = 1/6. The number of sixes in 600 rolls is given by a random variable X with binomial distribution X ~ Bi(600, 1/6). By 10.2.40, this distribution can be approximated by the distribution N(100, 250/3).
The measured value X = 45 can be considered a random sample consisting of one item. Assuming that the variance is known and applying 10.3.4, we get that the 99% (two-sided) confidence interval for the expected value µ equals

(45 − √(250/3) z(0.995), 45 + √(250/3) z(0.995)).

We learn that the quantile is approximately z(0.995) ≈ 2.58, which gives the interval (21, 69). However, for an ideal die, we clearly have µ = 100, so our die is not ideal at level α = 0.01. □

10.I.2. Suppose the height of 10-year-old boys has normal distribution N(µ, σ²) with unknown expected value µ and known variance σ² = 39.112. Taking the height of 15 boys, we get the sample mean X̄ = 139.13. Find
i) the 99% two-sided confidence interval for the parameter µ;
ii) the lower estimate for µ at significance level 95 %.

Similarly, for a discrete random variable X whose probability is concentrated in the points x_i,

E X^k = Σ_i x_i^k f_X(x_i).

The next theorem shows that all the moments completely describe the distribution of the random variable, as a rule. For the sake of computations, it is advantageous to work with a power series in which the moments appear in the coefficients. Since the coefficients of the Taylor series of a function at a given point can be obtained using differentiation, it is easy to guess the right choice of such a function:

Moment generating function

Given a random variable X, consider the function M_X(t): ℝ → ℝ defined by

M_X(t) = E e^{tX} = Σ_i e^{t x_i} f_X(x_i) if X is discrete,
M_X(t) = E e^{tX} = ∫_{−∞}^{∞} e^{tx} f_X(x) dx if X is continuous.

If this expected value exists, we can discuss the moment generating function of the random variable X.

It is clear that this function M_X(t) is always analytic in the case of discrete random variables with finitely many values.

Theorem. Let X be a random variable whose moment generating function exists and is analytic on an interval (−a, a).
Then, M_X(t) is given on this interval by the absolutely convergent series

M_X(t) = Σ_{k=0}^{∞} (t^k/k!) E X^k.

If two random variables X and Y share their moment generating functions over a nontrivial interval (−a, a), then their distribution functions coincide.

Proof. The verification of the first statement is a simple exercise on the techniques of differential and integral calculus. In the case of discrete variables, there are either finite sums or absolutely and uniformly converging series. In the case of continuous variables, there are absolutely converging integrals. Thus, the limit process and the differentiation can be interchanged. Since (d/dt) e^{tx} = x e^{tx}, it is immediate that

(d^k/dt^k) M_X(t) |_{t=0} = E X^k,

as expected.

The second claim is obvious for two discrete variables X and Y with only a finite number of values x₁, …, x_k for which either f_X(x_i) ≠ 0 or f_Y(x_i) ≠ 0. Indeed, the functions e^{t x_i} are linearly independent, and thus their coefficients in the common moment function M(t) = e^{t x₁} f(x₁) + ⋯ + e^{t x_k} f(x_k) must be the shared probability function values for both random variables X and Y.

Solution. a) By 10.3.4, the 100(1 − α)% two-sided confidence interval for the unknown expected value µ of the normal distribution is

(1) µ ∈ (X̄ − (σ/√n) z(1 − α/2), X̄ + (σ/√n) z(1 − α/2)),

where X̄ is the sample mean of n items, σ² is the known variance, and z(1 − α/2) is the corresponding quantile. Substituting the given values n = 15, σ ≈ 6.254 and the learned z(0.995) ≈ 2.576, we get (σ/√n) z(1 − α/2) ≈ 4.16, i.e., µ ∈ (134.97, 143.29).

b) The lower estimate L for the parameter µ at significance level 95 % is given by the expression L = X̄ − (σ/√n) z(0.95). We learn that z(0.95) ≈ 1.645, and direct substitution leads to µ ∈ (136.474, ∞). □

10.I.3. A customer tests the quality of bought products by examining 21 randomly chosen ones. He will accept the delivery if the sample standard deviation does not exceed 0.2 mm.
We know that the pursued property of the products has normal distribution of the form N(10 mm; 0.0734 mm²). Using statistical tables, find the probability that the delivery will be accepted. How does the answer change if the customer, in order to save expenses, tests only 4 products?

Solution. The problem asks for the probability P(S ≤ 0.2). By theorem 10.3.3, when sampling n products, the random variable ((n − 1)/σ²) S² has distribution χ²_{n−1}. In our case, n = 21 and σ² = 0.0734, so

P(S ≤ 0.2) = P((20/0.0734) S² ≤ (20/0.0734) · 0.2²) = F_{χ²₂₀}(20 · 0.2²/0.0734).

In the case of continuous variables X and Y sharing their generating function M(t), the argument is more involved, and only an indication is provided. Notice that M(t) is analytic, and thus it is defined for all complex numbers t, |t| < a. In particular,

M(it) = ∫ e^{itx} f(x) dx,

which is the inverse Fourier transform of f(x), up to the constant multiple √(2π); see 7.2.5 (on page 474). If this works for all t, then clearly f is obtained by the Fourier transform of (√(2π))⁻¹ M(it), and thus must be the same for both X and Y. Further details, in particular covering general random variables, would need much more input from measure theory and Fourier analysis, and thus are not provided here. □

It can also be shown that the assumptions of the theorem are true whenever both M_X(−a) < ∞ and M_X(a) < ∞.

10.2.37. Properties of the moment function. By the properties of the exponential function, it is easy to compute the behaviour of the moment function under affine transformations and sums of independent random variables.

Proposition. Let a, b ∈ ℝ and let X, Y be independent random variables with moment generating functions M_X(t) and M_Y(t), respectively. Then, the moment generating functions of the random variables V = a + bX and W = X + Y are

M_{a+bX}(t) = e^{at} M_X(bt),
M_{X+Y}(t) = M_X(t) M_Y(t).

Proof. The first formula can be computed directly from the definition:

M_V(t) = E e^{(a+bX)t} = E(e^{at} e^{btX}) = e^{at} M_X(bt).
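Both formulas of the proposition can be checked numerically on Bernoulli variables, whose moment generating function p e^t + (1 − p) is computed below in 10.2.38. The sample values of p, a, b, t are arbitrary choices for illustration (a sketch, standard library only):

```python
import math

p = 0.3
M_X = lambda t: p * math.exp(t) + (1 - p)   # MGF of a Bernoulli variable A(p)

# first formula: V = a + bX takes value a+b with prob. p and a with prob. 1-p
a, b, t = 2.0, -1.5, 0.7
lhs = p * math.exp((a + b) * t) + (1 - p) * math.exp(a * t)  # E e^{tV} directly
rhs = math.exp(a * t) * M_X(b * t)

# second formula for two independent copies: X + Y has pmf p^2, 2p(1-p), (1-p)^2
lhs2 = p**2 * math.exp(2 * t) + 2 * p * (1 - p) * math.exp(t) + (1 - p) ** 2
rhs2 = M_X(t) * M_X(t)

print(lhs - rhs, lhs2 - rhs2)
```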
As for the second formula, recall that e^{tX} and e^{tY} are independent variables. Use the fact that the expected value of the product of independent random variables equals the product of the expected values:

M_W(t) = E e^{t(X+Y)} = E(e^{tX} e^{tY}) = (E e^{tX})(E e^{tY}) = M_X(t) M_Y(t). □

The expression in the argument of the distribution function is approximately 10.9, and we can learn from the table of the χ² distribution that F_{χ²₂₀}(10.9) ≈ 0.05. Thus, the probability that the delivery will be accepted is only 5 %. We could have expected the probability to be low: indeed, E S² = σ² = 0.0734 > 0.2² = 0.04.

If the customer tests only 4 products, then the probability of acceptance is given by the expression

F_{χ²₃}(3 · 0.2²/0.0734) = F_{χ²₃}(1.63).

The value of the distribution function of χ²₃ at this argument cannot be found in most tables. Therefore, we estimate it using linear interpolation. For instance, if the nearest known points are F_{χ²₃}(0.58) = 0.1 and F_{χ²₃}(6.25) = 0.9, then

F_{χ²₃}(1.63) ≈ (1.63 − 0.58) · (0.9 − 0.1)/(6.25 − 0.58) + 0.1 ≈ 0.25.

10.2.38. Normal and binomial distributions. As an illustrating example, compute the moment function of two random variables X ~ N(µ, σ²) and X ~ Bi(n, p).

Moment generating function for N(µ, σ²)

Proposition. If X ~ N(µ, σ²), then

M_X(t) = e^{µt} e^{σ²t²/2}.

In particular, it is an analytic function on all of ℝ.

Although this result is only an estimate, we can be sure that the probability of acceptance is much greater than when testing 21 products. □

10.I.4. From a population with distribution N(µ, σ²), where σ² = 0.06, we have sampled the values 1.3, 1.8, 1.4, 1.2, 0.9, 1.5, 1.7. Find the two-sided 95% confidence interval for the unknown expected value.

Solution. We have a random sample of size n = 7 from the normal distribution with known variance σ² = 0.06. The sample mean is

X̄ = (1.3 + 1.8 + 1.4 + 1.2 + 0.9 + 1.5 + 1.7)/7 = 1.4,

and we can learn for the given confidence level α = 0.05 that z(1 − α/2) = z(0.975) ≈ 1.96.
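The χ² probabilities in 10.I.3 are not in most tables, but the distribution function of χ²_k can be evaluated directly from the standard power series of the regularized lower incomplete gamma function. The following is a sketch (our own helper, standard library only); its output suggests that the linear interpolation above is rather rough, the true value for 4 products being close to 0.35:

```python
import math

def chi2_cdf(x, k, terms=500):
    # P(chi^2_k <= x) = P(k/2, x/2), regularized lower incomplete gamma,
    # via its power series; adequate here since x/2 is not huge
    s, y = k / 2, x / 2
    term = 1.0 / s
    total = term
    for m in range(1, terms):
        term *= y / (s + m)
        total += term
    return total * math.exp(-y + s * math.log(y) - math.lgamma(s))

accept21 = chi2_cdf(20 * 0.2**2 / 0.0734, 20)  # testing 21 products
accept4 = chi2_cdf(3 * 0.2**2 / 0.0734, 3)     # testing only 4 products
print(round(accept21, 3), round(accept4, 3))
```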
Substituting into (1), we immediately obtain the wanted interval (1.22, 1.58). □

10.I.5. Let X₁, …, Xₙ be a random sample from the distribution N(µ, 0.04). Find the least number of measurements that are necessary so that the length of the 95% confidence interval for µ would not exceed 0.16.

Solution. Since we have a normal distribution with known variance, we know from (1) that the length of the 100(1 − α)% confidence interval is (2σ/√n) z(1 − α/2). Substituting the given values, we get that the number n of measurements satisfies the inequality

(2 · 0.2/√n) z(0.975) ≤ 0.16.

Since z(0.975) ≈ 1.96, we obtain n ≥ 24.01. Thus, at least 25 measurements are necessary. □

10.I.6. Consider a random variable X with distribution N(µ, σ²), where µ, σ² are unknown. The following table shows the frequencies n_i of the individual values x_i of this random variable:

x_i | 8 | 11 | 12 | 14 | 15 | 16 | 17 | 18 | 20 | 21
n_i | 1 |  2 |  3 |  4 |  7 |  5 |  4 |  3 |  2 |  1

Calculate the sample mean, sample variance, sample standard deviation, and find the 99% confidence interval for the expected value µ.

Solution. The sample mean is given by the expression X̄ = Σ n_i x_i / Σ n_i. Substituting the given values, we get X̄ = 490/32 ≈ 15.3. By definition, the sample variance is S² = Σ n_i (x_i − X̄)² / (Σ n_i − 1). Substituting the given values, we get S² = 242.875/31 ≈ 7.8, so the sample standard deviation is S ≈ 2.8. The formula for the two-sided 100(1 − α)% confidence interval for the expected value µ, when the variance

Proof. Suppose Z ~ N(0, 1). Then

M_Z(t) = (1/√(2π)) ∫ e^{tx} e^{−x²/2} dx
= (1/√(2π)) ∫ e^{−(x² − 2tx + t² − t²)/2} dx
= e^{t²/2} (1/√(2π)) ∫ e^{−(x−t)²/2} dx = e^{t²/2},

where use is made of the fact that in the last-but-one expression, for every fixed t, the density of a continuous random variable is integrated; hence this integral equals one.

Substituting into the formula for the moment generating function M_{µ+σZ}(t) = e^{µt} M_Z(σt), we obtain for X ~ N(µ, σ²) that

M_X(t) = e^{µt} e^{σ²t²/2},

again a function analytic over the entire ℝ. □

In particular, the moments of Z of all orders exist.
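The sample characteristics in 10.I.6 can be recomputed directly. The sketch below uses the divisor Σn_i − 1 from the definition of the sample variance given in the solution (standard library only):

```python
import math

values = [8, 11, 12, 14, 15, 16, 17, 18, 20, 21]
freqs = [1, 2, 3, 4, 7, 5, 4, 3, 2, 1]

n = sum(freqs)
mean = sum(v * f for v, f in zip(values, freqs)) / n
# sample variance with divisor n - 1, as in the definition above
s2 = sum(f * (v - mean) ** 2 for v, f in zip(values, freqs)) / (n - 1)
s = math.sqrt(s2)
print(n, mean, round(s2, 3), round(s, 2))
```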
Substitute t²/2 into the power series for the exponential function, and calculate them all:

e^{t²/2} = Σ_{k=0}^{∞} (1/k!) (t²/2)^k = Σ_{k=0}^{∞} (1/(2^k k!)) t^{2k} = 1 + 0·t + (1/2) t² + 0·t³ + (3/4!) t⁴ + ⋯

In particular, the expected value of Z is E Z = 0, and its variance is var Z = E Z² − (E Z)² = 1. Further, all moments of odd orders vanish, E Z⁴ = 3, etc.

Hence the sum of independent normal distributions X ~ N(µ, σ²) and Y ~ N(µ′, σ′²) has again the normal distribution, X + Y ~ N(µ + µ′, σ² + σ′²).

Similarly, considering the discrete random variable X ~ Bi(n, p),

M_X(t) = E e^{tX} = Σ_{k=0}^{n} (n choose k) (p e^t)^k (1 − p)^{n−k}
= (p e^t + (1 − p))^n = (p(e^t − 1) + 1)^n
= 1 + npt + (1/2)(n(n − 1)p² + np) t² + ⋯

is computed. Of course, the same can be computed even more easily using proposition 10.2.37, since X is the sum of n independent variables Y ~ A(p) with the Bernoulli distribution. Therefore,

E e^{tX} = (E e^{tY})^n = (p e^t + (1 − p))^n.

All the moments of the variable Y equal p; therefore, E Y = p, while var Y = p(1 − p). From the moment function M_X(t), E X = np and var X = E X² − (E X)² = np(1 − p).

is unknown, was derived at the end of subsection 10.3.4:

µ ∈ (X̄ − (S/√n) t_{n−1}(1 − α/2), X̄ + (S/√n) t_{n−1}(1 − α/2)).

Substitution yields X̄ = 15.3, n = 32, S ≈ 2.8, α = 0.01, and we learn t₃₁(0.995) ≈ 2.75. Thus, the 99% confidence interval is µ ∈ (14.0, 16.7). □

10.I.7. Using the following table of the distribution function of the normal distribution, find the probability that the absolute difference between the numbers of heads and tails in 3600 tosses of a coin is greater than 90.

Standard Normal Distribution Table

Solution. Let X denote the random variable that corresponds to the number of heads. Then, X has binomial distribution Bi(3600, 1/2) (with expected value 1800 and standard deviation 30), so by the de Moivre–Laplace theorem, for large values of n, the distribution function of the variable (X − 1800)/30 can be approximated by the distribution function Φ.
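The moment identities for Bi(n, p) can be checked by numerical differentiation of M_X(t) = (p e^t + 1 − p)^n at t = 0; the sample values n = 10, p = 0.3 are an arbitrary choice (a sketch, standard library only):

```python
import math

n, p = 10, 0.3
M = lambda t: (p * math.exp(t) + 1 - p) ** n   # MGF of Bi(n, p)

h = 1e-4
m1 = (M(h) - M(-h)) / (2 * h)             # central difference for M'(0) = E X
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h**2   # central difference for M''(0) = E X^2
var = m2 - m1 ** 2                        # should be close to n p (1-p) = 2.1
print(round(m1, 4), round(var, 4))
```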

0, lim P( 1 " 71 ^-' ß \ < e) = 1. Proof. By the use Chebyshev's inequality just as at the end of subsection 10.2.32, = 2<2>(-1.5) = 0.1336, where the last value was learned from the table. □ n 1 v-^ I \ varfi E"_i Xi — u) P - VX-/i >e)<-uz"1-1 " M 7i ^-^ 1 ' el i=l = 7j?Er=ivar^ < SL e2 ~ Tie2 751 CHAPTER 10. STATISTICS AND PROBABILITY THEORY 10.1.8. The probability that a newborn baby is a boy is 0.515. Find the probability that there are at least the same number of girls as boys among ten thousand babies. Solution. PIX < 50001 X-5150 -150 V5150- 0.485 ^5150- 0.485 -iV(0,l) = 0.00135 □ 10.1.9. Using the distribution function of the standard normal distribution, find the probability that we get at least 3100 sixes out of 18000 rolls of a six-sided die. Solution. We proceed similarly as in the exercises above. X has binomial distribution Bi(18000,1/6). We find the expected value ((1/6) (18000) = 3000) as well as the standard deviation ^(1/6) (1 - 1/6)18000) = 50. Therefore, the variable x~3q00 can be approximated with the distribution function

Hence

P(X ≥ 3100) = P((X − 3000)/50 ≥ (3100 − 3000)/50) = P((X − 3000)/50 ≥ 2) ≈ 1 − Φ(2) = 0.0228. □

10.I.10. A public opinion agency organizes a survey of preferences of five political parties. How many randomly selected respondents must answer so that the probability that, for each party, the survey result differs from the actual preference by no more than 2 % is at least 0.95?

Solution. Let p_i, i = 1, …, 5, be the actual relative frequency of voters of the i-th political party in the population, and let X_i denote the number of voters of this party among n randomly chosen people. Note that given any five intervals, the events corresponding to X_i/n falling into the corresponding interval may be dependent. If we choose n so that for each i, X_i/n falls into the given interval with probability at least 1 − (1 − 0.95)/5 = 0.99, then the desired condition is sure to hold even in spite of the dependencies. Thus, let us look for n such that P(|X/n − p| ≤ 0.02) ≥ 0.99. First of all, we

Thus, the probability P is bounded from below by

P(|(1/n) Σ_{i=1}^{n} X_i − µ| < ε) ≥ 1 − C/(nε²),

which proves the proposition. □

Thus, existence and uniform boundedness of variances suffices for the means of pairwise uncorrelated variables X_i with zero expected value to converge (in probability) to zero.

10.2.41. Central limit theorem. The next goal is more ambitious. In addition to the law of large numbers, the stochastic properties of the fluctuation of the means X̄ₙ = (1/n) Σ_{i=1}^{n} X_i around the expected value µ need to be understood. We focus first on the simplest case of sequences of independent and identically distributed random variables X_i. Then we formulate a more general version of the theorem and provide only comments on the proofs.

Move to a sequence of normalized random variables X_i. Assume E X_i = 0 and var X_i = 1. Assume further that the moment generating function M_X(t) exists and is shared by all the variables X_i.
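The three normal-approximation answers above (10.I.7, 10.I.8, 10.I.9) can be reproduced with `statistics.NormalDist` (a sketch, standard library only):

```python
from statistics import NormalDist

Phi = NormalDist().cdf

sd_boys = (10000 * 0.515 * 0.485) ** 0.5   # standard deviation in 10.I.8
p_boys = Phi(-150 / sd_boys)               # 10.I.8: at least as many girls as boys
p_sixes = 1 - Phi(2)                       # 10.I.9: at least 3100 sixes
p_coin = 2 * Phi(-1.5)                     # 10.I.7: |heads - tails| > 90
print(round(p_boys, 5), round(p_sixes, 4), round(p_coin, 4))
```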
The arithmetic means (1/n) Σ_{i=1}^{n} X_i are, of course, random variables with zero expected value, yet their variances are n/n² = 1/n. Thus, it is reasonable to renormalize them to

S_n = (1/√n) Σ_{i=1}^{n} X_i,

which are again standardized random variables. Their moment generating functions are (see proposition 10.2.37)

M_{S_n}(t) = E e^{(t/√n) Σ_i X_i} = (M_X(t/√n))^n.

Since it is assumed that the variables X_i are standardized,

M_X(t/√n) = 1 + 0 · (t/√n) + (1/2)(t²/n) + o(1/n),

where again o(G(n)) is written for expressions which, when divided by G(n), approach zero as n → ∞; see subsection 6.1.16. Thus, in the limit,

lim_{n→∞} M_{S_n}(t) = lim_{n→∞} (1 + t²/(2n) + o(1/n))^n = e^{t²/2}.

This is just the moment generating function of the normal distribution Z ~ N(0, 1); see subsection 10.2.38. Thus, the standardized variables S_n asymptotically have the standard normal distribution. We have thus proved a special version of the following fundamental theorem. Although the calculation is merely a manipulation of moment generating functions, many special cases were proved in different ways, providing explicit estimates for the speed of convergence, which of course is useful information in practice. Notice that the following theorem does not require the probability distributions of the variables X_i to coincide!

rearrange the expression:

P(|X/n − p| ≤ 0.02) = P(−0.02 ≤ X/n − p ≤ 0.02) = P(−0.02n ≤ X − pn ≤ 0.02n)
= P(−0.02n/√(np(1 − p)) ≤ (X − pn)/√(np(1 − p)) ≤ 0.02n/√(np(1 − p)))
≈ 2Φ(0.02n/√(np(1 − p))) − 1.
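The limit (1 + t²/(2n) + o(1/n))^n → e^{t²/2} at the heart of the derivation can be observed numerically (a sketch at t = 1, with a few sample values of n):

```python
import math

t = 1.0
for n in (10, 1000, 100000):
    approx = (1 + t**2 / (2 * n)) ** n    # n-th power from the MGF computation
    print(n, approx, math.exp(t**2 / 2))  # compare with the limit e^{t^2/2}
```

Already at moderate n the power is close to e^{1/2} ≈ 1.6487, illustrating why the normal approximation works so well in the exercises of this section.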

$\ge 0.99$, i.e., $\Phi\!\left(\frac{0.02\,n}{\sqrt{np(1-p)}}\right) \ge 0.995$. Since the distribution function is increasing, the last condition is equivalent to
$$\frac{0.02\,n}{\sqrt{np(1-p)}} \ge \Phi^{-1}(0.995) = 2.576,$$
i.e.,
$$\sqrt n \ge 50\cdot 2.576\cdot\sqrt{p(1-p)}, \quad\text{so}\quad n \ge (25\cdot 2.576)^2 \approx 4147.$$
Here, we used the fact that the maximum of the function $p(1-p)$ is $\frac14$, reached at $p = \frac12$. We can see that if, e.g., $p = 0.1$, then $\sqrt{p(1-p)} = 0.3$ and the least sufficient $n$ is lower. This accords with our expectations: for less popular parties, it suffices to have fewer respondents (if the agency estimates the gain of such a party to be around 2% without asking anybody, then the wanted precision is almost guaranteed). □

10.1.11. Two-choice test. Consider random vectors $Y_1$ and $Y_2$ all of whose components are pairwise independent random variables with normal distribution, and suppose that the components of the vector $Y_i$ have expected value $\mu_i$, while the variance $\sigma^2$ is the same for all the components of both vectors.

Central limit theorem

Theorem. Consider a sequence of independent random variables $X_i$ which have the same expected value $E X_i = \mu$, variance $\operatorname{var} X_i = \sigma^2 > 0$, and uniformly bounded third absolute moments $E|X_i|^3 < C$. Then the distribution of the random variables
$$S_n = \frac{1}{\sigma\sqrt n}\sum_{i=1}^n (X_i - \mu)$$
satisfies
$$\lim_{n\to\infty} P(S_n \le x) = \Phi(x)$$
for every real $x$.

A more general result (Lyapunov's central limit theorem) allows the variables $X_i$ to have different expected values $\mu_i$ and variances: writing $s_n^2 = \sum_{i=1}^n \operatorname{var} X_i$, it requires for some $\delta > 0$ that
$$\lim_{n\to\infty} \frac{1}{s_n^{2+\delta}}\sum_{i=1}^n E|X_i - \mu_i|^{2+\delta} = 0.$$
Then $\frac{1}{s_n}\sum_{i=1}^n (X_i - \mu_i)$ converges in distribution to $Z \sim N(0,1)$. The previous version of the central limit theorem is derived by choosing $\delta = 1$. Then $s_n = \sigma\sqrt n$ and the condition of Lyapunov's theorem reads
$$0 \le \lim_{n\to\infty} n^{-3/2}\sigma^{-3}\sum_{j=1}^n E|X_j - \mu|^3 \le C\sigma^{-3}\lim_{n\to\infty} n^{-3/2+1} = 0.$$

10.2.42. De Moivre–Laplace theorem. Historically, the first formulated special case of the central limit theorem was that of variables $Y_n$ with binomial distribution $\mathrm{Bi}(n,p)$. They can be viewed as the sum of $n$ independent variables $X_i$ with Bernoulli distribution $A(p)$, $0 < p < 1$. These variables have moment generating functions $M_X(t) = 1 - p + p\,e^t$ and $E|X_i|^3 = p < 1$.

Use the general linear model to test the hypothesis whether $\mu_1 = \mu_2$. Solution.
We proceed quite similarly as in subsection 10.3.12 of the theoretical part. This time, we can write both vectors $Y_i$ into one column and consider the model
$$\begin{pmatrix} Y_{11} \\ \vdots \\ Y_{1n_1} \\ Y_{21} \\ \vdots \\ Y_{2n_2} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 1 & 1 \\ \vdots & \vdots \\ 1 & 1 \end{pmatrix}\begin{pmatrix}\beta_1 \\ \beta_2\end{pmatrix} + \sigma Z,$$
so that $\mu_1 = \beta_1$ and $\mu_2 = \beta_1 + \beta_2$. Thus, we test the hypothesis $\mu_1 = \mu_2$, which means that we test whether $\beta_2 = 0$. For this, it is suitable to use the statistic
$$T = \frac{\bar Y_2 - \bar Y_1}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},$$
where the standard deviation $S$ is given by the pooled sample variance
$$S^2 = \frac{1}{n_1+n_2-2}\sum_{i=1}^2\sum_{j=1}^{n_i}\left(Y_{ij} - \bar Y_i\right)^2.$$
The distribution of this statistic is $t_{n_1+n_2-2}$, so the null hypothesis $\mu_1 = \mu_2$ is rejected at level $\alpha$ if $|T| > t_{n_1+n_2-2}(\alpha)$. □

10.1.12. In JZD Tempo, the milk yield of their cows was measured during five days, the results being 15, 14, 13, 16 and 17 hectoliters. In JZD Boj, which had the same number of cows, the same measurement was performed during seven days, the results being 12, 16, 13, 15, 13, 11, 18 hectoliters. a) Find the 95% confidence interval for the milk yield of JZD Boj's cows, and the 95% confidence interval for the milk yield of JZD Tempo's cows. b) At the 5% level, test the hypothesis that both farms have cows of the same quality.

Thus, the central limit theorem says in this case that the random variables
$$\frac{X - np}{\sqrt{np(1-p)}}$$
asymptotically have the standard normal distribution. We illustrate the result with a concrete example. Suppose it is desired to know what percentage of students like a given course, with an error of at most 5%. The number of people who like the course among $n$ randomly chosen people should behave as the random variable $X \sim \mathrm{Bi}(n,p)$. Further, suppose the result is desired to be correct with confidence (i.e., probability again) of at least 90%. Thus,
$$P\left(\left|\tfrac{X}{n} - p\right| \le 0.05\right) \ge 0.9$$
is desired, by choosing a high enough number $n$ of students to ask. Approximate:
$$P\left(\left|\tfrac{X}{n} - p\right| \le 0.05\right) = P\left(\frac{-0.05\,n}{\sqrt{np(1-p)}} \le \frac{X - np}{\sqrt{np(1-p)}} \le \frac{0.05\,n}{\sqrt{np(1-p)}}\right) \approx 2\Phi\!\left(\frac{0.05\,n}{\sqrt{np(1-p)}}\right) - 1,$$
where the symmetry of the density function of the normal distribution is exploited. Thus,
$$\Phi\!\left(\frac{0.05\,n}{\sqrt{np(1-p)}}\right) \ge \tfrac12(1 + 0.9) = 0.95$$
is wanted.
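The required sample size can be extracted from the last condition numerically. The following sketch (not part of the text) solves the inequality in the worst case $p = \frac12$; the critical value $z(0.05) \approx 1.64485$ is hardcoded, since the standard library offers no normal quantile function.

```python
import math

Z_005 = 1.64485  # upper 5% critical value of N(0,1), assumed from tables

def min_sample_size(err, z):
    """Smallest n with err*sqrt(n) >= z*sqrt(p(1-p)) in the worst case p = 1/2."""
    return math.ceil((z * 0.5 / err) ** 2)

# error 0.05 at confidence 90% (two-sided), as in the course-satisfaction example:
print(min_sample_size(0.05, Z_005))
```

The same helper reproduces the five-party survey of 10.1.10: `min_sample_size(0.02, 2.576)` gives a bound of roughly 4148 respondents, matching the estimate $n \ge (25\cdot 2.576)^2 \approx 4147$.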
This leads to the choice (recall the definition of the critical values $z(\alpha)$ for a variable $Z$ with standard normal distribution in subsection 10.2.30)
$$\frac{0.05\,n}{\sqrt{np(1-p)}} \ge z(0.05) = 1.64485.$$
Since $p(1-p)$ is at most $\frac14$, the necessary number of students can be bounded by $n > 270$, independently of $p$.

JZD — jednotné zemědělské družstvo — an agricultural cooperative farm, created by forced collectivization in the 1950s in Czechoslovakia.

Suppose that the milk yield of the cows in each day is given by the normal distribution. Solve these problems assuming that there are no data from previous measurements, and then assuming that the previous measurements showed that the standard deviation was $\sigma = 2$ hl.

Solution. First of all, let us compute the results for the known variance. In order to find the confidence interval, we use the statistic
$$Z = \frac{\bar X - \mu}{\sigma/\sqrt n},$$
which has the standardized normal distribution (see 10.2.21). Then, the confidence interval is (see 10.3.4)
$$\left(\bar X - \tfrac{\sigma}{\sqrt n}\,z(\alpha/2),\; \bar X + \tfrac{\sigma}{\sqrt n}\,z(\alpha/2)\right),$$
where $\alpha = 0.05$. Now, it suffices to substitute the specific values. For JZD Tempo, we get the sample mean
$$\bar X_1 = \frac{15+14+13+16+17}{5} = 15,$$
and using appropriate software, we learn that $z(0.025) = 1.96$, which gives the interval
$$\left(15 - \tfrac{2}{\sqrt 5}\,1.96,\; 15 + \tfrac{2}{\sqrt 5}\,1.96\right) = (13.25,\ 16.75).$$
For JZD Boj, we get
$$\bar X_2 = \frac{12+16+13+15+13+11+18}{7} = 14,$$
so the 95% confidence interval for the milk yield of their cows is $(12.52,\ 15.48)$.

If the variance of the measurements is not known, we use the so-called sample variance for the estimate. In order to find the confidence interval, we use the statistic
$$T = \frac{\bar X - \mu}{S/\sqrt n},$$
which has Student's distribution with $n-1$ degrees of freedom (see also 10.3.4). Then, we analogously obtain the 95% confidence interval
$$\left(\bar X - \tfrac{S}{\sqrt n}\,t_{n-1}(\alpha/2),\; \bar X + \tfrac{S}{\sqrt n}\,t_{n-1}(\alpha/2)\right).$$
For the values of JZD Tempo, we get the sample variance
$$S^2 = \tfrac14\left(0^2 + (-1)^2 + (-2)^2 + 1^2 + 2^2\right) = \tfrac{10}{4} = 2.5, \quad\text{i.e., } S = 1.58.$$
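The numbers obtained so far are easy to reproduce with a short computation (a sketch, using only the standard library; the critical value $z(0.025) = 1.96$ is hardcoded):

```python
import math

def ci_known_sigma(xs, sigma, z):
    """Confidence interval for the mean with known standard deviation sigma."""
    n = len(xs)
    m = sum(xs) / n
    d = sigma / math.sqrt(n) * z
    return (round(m - d, 2), round(m + d, 2))

tempo = [15, 14, 13, 16, 17]
boj = [12, 16, 13, 15, 13, 11, 18]

print(ci_known_sigma(tempo, 2.0, 1.96))   # (13.25, 16.75)
print(ci_known_sigma(boj, 2.0, 1.96))     # (12.52, 15.48)

# sample variance of the Tempo data
m = sum(tempo) / len(tempo)
S2 = sum((x - m) ** 2 for x in tempo) / (len(tempo) - 1)
print(S2)                                 # 2.5
```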
Further, we have $t_4(0.025) = 2.78$, so the 95% confidence interval for JZD Tempo is $(13.03,\ 16.97)$.

10.2.43. Important distributions. In the sequel, we return to statistics. It should be of no surprise that we work with characteristics of random vectors similar to the sample mean and variance, as well as relative quotients of such characteristics, etc. We consider several such cases.

Consider a random variable $Z \sim N(0,1)$, and compute the density $f_Y(x)$ of the random variable $Y = Z^2$. Clearly, $f_Y(x) = 0$ for $x \le 0$, while for positive $x$,
$$F_Y(x) = P(Y \le x) = P(-\sqrt x \le Z \le \sqrt x),$$
and differentiation yields a density which is a constant multiple of $x^{-1/2}e^{-x/2}$ for $x > 0$, while $f_X(x) = 0$ for non-positive $x$. More generally, since $\int_0^\infty x^{a-1}e^{-bx}\,dx = \Gamma(a)\,b^{-a}$ for $a, b > 0$, such a function is a density for the constant $c = \frac{b^a}{\Gamma(a)}$. Thus, it is the distribution $\Gamma(a,b)$ with density, for positive $x$,
$$f_X(x) = \frac{b^a}{\Gamma(a)}\,x^{a-1}\,e^{-bx},$$
and the distribution $\chi^2$ of $Y = Z^2$ corresponds to the choice $a = b = 1/2$. This case is already thoroughly discussed as an example in subsection 10.2.20.

In general, the $k$-th moment of such a variable $X$ is easily computed:
$$E X^k = \frac{b^a}{\Gamma(a)}\int_0^\infty x^{a-1+k}\,e^{-bx}\,dx = \frac{b^a}{\Gamma(a)}\cdot\frac{\Gamma(a+k)}{b^{a+k}} = \frac{\Gamma(a+k)}{\Gamma(a)\,b^k},$$
since the integral of the density of $\Gamma(a+k,b)$ in the last expression must be equal to one. In particular,
$$E X = \frac{\Gamma(a+1)}{\Gamma(a)\,b} = \frac{a}{b}, \qquad E X^2 = \frac{\Gamma(a+2)}{\Gamma(a)\,b^2} = \frac{(a+1)a}{b^2},$$
while
$$\operatorname{var} X = \frac{(a+1)a - a^2}{b^2} = \frac{a}{b^2}.$$
Similarly, the moment generating function can be computed for all values $t$, $-b < t < b$:
$$M_X(t) = \int_0^\infty e^{tx}\,\frac{b^a}{\Gamma(a)}\,x^{a-1}\,e^{-bx}\,dx = \left(\frac{b}{b-t}\right)^{\!a}.$$

CHAPTER 10. STATISTICS AND PROBABILITY THEORY

For JZD Boj, we get the sample variance $S_2^2 = 6$, so the wanted confidence interval is $(11.73,\ 16.27)$.

b) If we compare the expected values of milk yield in both farms, then this is a comparison of the expected values of two independent samples from the normal distribution. In the case of unknown variances, we further assume that the variance is the same for both farms. Thus, let us first examine the hypothesis assuming the known variances $\sigma_1^2 = \sigma_2^2 = 4$.
We use the statistic
$$U = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0,1),$$
where $\mu_1$ and $\mu_2$ are the unknown expected values of milk yield in the examined farms, and $n_1$, $n_2$ are the numbers of measurements. This statistic has, as indicated, the standardized normal distribution. We reject the hypothesis at the 5% level if and only if the absolute value of the statistic $U$ is greater than $z(0.025)$, i.e., if and only if 0 does not lie in the 95% confidence interval for the difference of the expected values of milk yield in both farms. For the specific values, we get
$$U = \frac{15 - 14}{\sqrt{\frac45 + \frac47}} = 0.854.$$
Thus, we have $|U| < z(0.025) = 1.96$, so the hypothesis that the expected values of milk yield are the same in both farms is not rejected at the 5% level. The reached p-value of the test (see 10.3.9) is 39.4%, so we did not get much closer to rejecting the hypothesis (the probability that the value of the examined statistic is less than 0.854, provided the null hypothesis holds, is 60.6%).

If we do not know the variances of the measurements but we know that they must be equal in both farms, we use the statistic
$$K = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{S_*\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1+n_2-2},$$
where
$$S_*^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}.$$

Thus, for the sum of independent variables $Y = X_1 + \cdots + X_n$ with distributions $X_i \sim \Gamma(a_i, b)$, the moment generating function (for values $|t| < b$) is obtained as
$$M_Y(t) = \left(\frac{b}{b-t}\right)^{a_1+\cdots+a_n},$$
that is, $Y \sim \Gamma(a_1+\cdots+a_n,\, b)$. It is essential that all of the gamma distributions share the same value of $b$.

As an immediate corollary, the density of the variable $Y = Z_1^2 + \cdots + Z_n^2$ is obtained, where $Z_i \sim N(0,1)$. As just shown, this is the gamma distribution $Y \sim \Gamma(n/2, 1/2)$; hence its density is
$$f_Y(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\,x^{n/2-1}\,e^{-x/2}.$$
This special case of a gamma distribution is called $\chi^2$ with $n$ degrees of freedom. Usually, it is denoted by $Y \sim \chi^2_n$.

10.2.44. The F-distribution. In statistics, it is often wanted
to compare two sample variances, so we need to consider variables which are given as a quotient
$$U = \frac{X/k}{Y/m},$$
where $X \sim \chi^2_k$ and $Y \sim \chi^2_m$.

Suppose $f_X(x)$ and $f_Y(x)$ are the densities of independent random variables $X$ and $Y$, and suppose $f_Y$ is non-zero only for positive values of $x$. Compute the distribution function of the random variable $U = cX/Y$, where $c > 0$ is an arbitrary constant. By Fubini's theorem, the order of integration can be interchanged with respect to the individual variables (we also substitute $x = ty/c$):
$$F_U(u) = P(X \le (u/c)\,Y) = \int_0^\infty\!\!\int_{-\infty}^{uy/c} f_X(x)\,f_Y(y)\,dx\,dy = \int_0^\infty\!\left(\int_{-\infty}^u \tfrac{y}{c}\,f_X(ty/c)\,f_Y(y)\,dt\right)dy = \int_{-\infty}^u\!\left(\int_0^\infty \tfrac{y}{c}\,f_X(ty/c)\,f_Y(y)\,dy\right)dt.$$
This expression for $F_U(u)$ shows that the density $f_U$ of the random variable $U$ equals
$$f_U(u) = \int_0^\infty \tfrac{y}{c}\,f_X(uy/c)\,f_Y(y)\,dy.$$
Substitute the densities of the corresponding special gamma distributions for $X \sim \chi^2_k$ and $Y \sim \chi^2_m$, and set $c = m/k$. The random variable $U = \frac{X/k}{Y/m}$ has density $f_U(u)$ equal to
$$\frac{(k/m)^{k/2}\,u^{k/2-1}}{2^{(k+m)/2}\,\Gamma(k/2)\,\Gamma(m/2)}\int_0^\infty y^{(k+m)/2-1}\,e^{-y(1+ku/m)/2}\,dy.$$
The integrand in the latter integral is, up to the right constant multiple, the density of the distribution of a random variable $Y \sim \Gamma\!\left(\frac{k+m}{2},\, \frac{1+ku/m}{2}\right)$. Hence the multiple can be rescaled (notice $u$ is constant there) in order to get

For the specific values, we get $K = 0.796$ and $|K| < t_{10}(0.025) = 2.2281$, so again, the null hypothesis is not rejected. The reached p-value of the test is 44.6%, which is even greater than in the above test. □

10.1.13. Analysis of variance, one-way sorting. For $k \ge 2$ independent samples $Y_i$ of sizes $n_i$ from normal distributions with equal variance, use a linear model to test the hypothesis that all the expected values of the individual samples are equal.

Solution. The technique is quite similar to that of the above exercise. The hypothesis to be tested is equivalent to the statement that a submodel holds in which all the components of the random vector $Y$, created by joining the given $k$ vectors $Y_i$, have the same expected value.
Thus, the used model is of the form
$$\begin{pmatrix} Y_{11} \\ \vdots \\ Y_{1n_1} \\ Y_{21} \\ \vdots \\ Y_{kn_k} \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}\begin{pmatrix}\mu_1 \\ \vdots \\ \mu_k\end{pmatrix} + \sigma Z.$$
We can easily compute the estimates of the expected values $\mu_i$ using the arithmetic means
$$\hat\mu_i = \bar Y_i = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}.$$
Hence we get the estimates $\hat Y_{ij} = \bar Y_i$, so the residual sum of squares is of the form
$$\sum_{i=1}^k\sum_{j=1}^{n_i}\left(Y_{ij} - \bar Y_i\right)^2.$$
The estimate of the common expected value in the considered submodel is
$$\bar Y = \frac{1}{n}\sum_{i=1}^k\sum_{j=1}^{n_i} Y_{ij} = \frac{1}{n}\sum_{i=1}^k n_i\,\bar Y_i,$$
where $n = n_1 + \cdots + n_k$, and the residual sum of squares in this submodel is
$$\sum_{i=1}^k\sum_{j=1}^{n_i}\left(Y_{ij} - \bar Y\right)^2.$$
In the original model, there are $k$ independent parameters $\mu_i$, while in the submodel, there is a single parameter $\mu$, so

the integral evaluates to one. The density $f_U(u)$ is then expressed as
$$f_U(u) = \frac{\Gamma\!\left(\frac{k+m}{2}\right)}{\Gamma\!\left(\frac k2\right)\Gamma\!\left(\frac m2\right)}\left(\frac km\right)^{k/2} u^{k/2-1}\left(1 + \frac{ku}{m}\right)^{-(k+m)/2}.$$
This distribution is called the Fisher–Snedecor distribution with $k$ and $m$ degrees of freedom, or the F-distribution for short.

10.2.45. The t-distribution. One encounters another useful distribution when examining the quotient of variables $Z \sim N(0,1)$ and $\sqrt{X/n}$, where $X \sim \chi^2_n$. (We are interested in the quotient of $Z$ and the standard deviation of some sample.) Compute first the distribution function of $Y = \sqrt X$ (note that $X$, and hence $Y$ as well, takes only positive values with non-zero probability):
$$F_Y(y) = P(\sqrt X \le y) = F_X(y^2).$$

The experiment of tossing the coin $n$ times allows the adjustment of the distribution within the preferred class. Thus, we build on some assumptions about the distribution and adjust the prior distribution in view of the experiment. This approach is called Bayesian statistics. The first approach is based on the purely mathematical abstraction that probabilities are given by the frequencies of event occurrences in data samples which are so large that they can be approximated by infinite models. The central limit theorem can be used to estimate their confidence. From the statistical point of view, the probability is an idealization of the relative frequencies of the cases when an examined result occurs.
K. Bayesian data analysis

10.K.1. Consider the Bernoulli process described by a random variable $X \sim \mathrm{Bi}(n,\theta)$ with binomial distribution, and assume that the parameter $\theta$ is a random variable with uniform distribution on the interval $(0,1)$. We define the success odds in our process as the variable $\gamma = \frac{\theta}{1-\theta}$. What is the density of this variable $\gamma$?

Solution. Intuitively, we feel that the distribution is not uniform. Denoting the wanted probability density by $f(s)$, we can use the relation between $\theta$ and $\gamma$ to compute $\theta = \frac{\gamma}{1+\gamma}$. In addition, we can immediately see that the probability density of $\gamma$ is non-zero only for positive values of the variable. Now, we can formulate the statement as the requirement
$$(1)\qquad F_\gamma(s) = P(\gamma \le s) = P\!\left(\theta \le \tfrac{s}{1+s}\right).$$
When differentiating, we must use the rule for differentiation of an integral with a variable upper bound. Since $\theta$ is uniform on $(0,1)$, we get $F_\gamma(s) = \frac{s}{1+s}$ for $s > 0$, hence $f(s) = \frac{1}{(1+s)^2}$.

Recall the beta function $B(x,y) = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt$. We can compute
$$\Gamma(x)\,\Gamma(y) = \int_0^\infty\!\!\int_0^\infty e^{-t-s}\,t^{x-1}\,s^{y-1}\,dt\,ds$$
(substitution $t = rq$, $s = r(1-q)$)
$$= \int_{r=0}^\infty\int_{q=0}^1 e^{-r}\,(rq)^{x-1}\,(r(1-q))^{y-1}\,r\,dq\,dr = \int_0^\infty e^{-r}\,r^{x+y-1}\,dr\cdot\int_0^1 q^{x-1}(1-q)^{y-1}\,dq = \Gamma(x+y)\,B(x,y).$$
Thus, we get the general formula
$$B(x,y) = \frac{\Gamma(x)\,\Gamma(y)}{\Gamma(x+y)},$$
and it follows from the properties of the gamma function for positive integers $n$, $k$ that
$$B(n-k+1,\, k+1) = \frac{k!\,(n-k)!}{(n+1)!} = \frac{1}{(n+1)\binom nk}.$$

Since the variables $X_i$ are independent, additivity of variance can be used (derived in subsection 10.2.33). The variance behaves as a quadratic form with respect to multiplication by a scalar. Hence
$$\operatorname{var}\bar X = \frac{1}{n^2}\sum_{i=1}^n\operatorname{var}X_i = \frac{1}{n^2}\,n\sigma^2 = \frac{\sigma^2}{n}.$$

We can directly compute that the expected value of a variable $X \sim \beta(a,b)$ with beta-distribution is (applying $\Gamma(z+1) = z\,\Gamma(z)$)
$$E X = \frac{B(a+1,b)}{B(a,b)} = \frac{a}{a+b}.$$
If $a = b$, then the expected value and the median are $\frac12$. We can also directly calculate the variance
$$\operatorname{var}X = E(X - EX)^2 = \frac{ab}{(a+b)^2(a+b+1)}.$$
Thus, for $a = b$, we get $\operatorname{var}X = \frac{1}{4(2a+1)}$, which shows that the variance decreases as $a = b$ increases. For $a = b = 1$, we get the ordinary uniform distribution on the interval $(0,1)$.
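The beta-distribution moments just derived are easy to verify numerically. The following sketch (not from the book) integrates against the $\beta(a,b)$ density with a simple midpoint rule; the only assumption is the closed form of $B(a,b)$ via `math.gamma`.

```python
import math

def beta_fn(a, b):
    """B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_moment(a, b, k, steps=100000):
    """k-th moment of the beta(a, b) distribution via the midpoint rule."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** k * x ** (a - 1) * (1 - x) ** (b - 1)
    return total * h / beta_fn(a, b)

a, b = 2.0, 5.0
m1 = beta_moment(a, b, 1)        # should be a/(a+b) = 2/7
m2 = beta_moment(a, b, 2)
var = m2 - m1 ** 2               # should be ab/((a+b)^2 (a+b+1)) = 10/392
print(round(m1, 4), round(var, 4))
```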
□

10.K.3. In the situation of the problem before the previous one, assume that the parameter $\theta$ of the Bernoulli process is a random variable with probability distribution $\beta(a,b)$. What is the probability distribution of the variable $\gamma = \frac{\theta}{1-\theta}$? In what sense is it special when $a = b = p$?

Solution. We have already discussed the special case of the uniform distribution $\beta(1,1)$. Thus, we can continue from the equality (1), where the left-hand side now contains, instead of the uniform distribution of $\theta$, the beta density. Differentiating $P\!\left(\theta \le \frac{s}{1+s}\right)$ with respect to $s$, we get for the

The formula
$$\sum_{i=1}^n (X_i - \mu)^2 = \sum_{i=1}^n (X_i - \bar X)^2 + n(\bar X - \mu)^2$$
can be verified simply by expanding the products. Thus:
$$s^2 = \frac1n\sum_{i=1}^n (X_i - \bar X)^2 = \frac1n\sum_{i=1}^n (X_i - \mu)^2 - (\bar X - \mu)^2,$$
hence
$$E\,s^2 = \frac1n\sum_{i=1}^n \operatorname{var}X_i - \operatorname{var}\bar X = \sigma^2 - \frac{\sigma^2}{n} = \left(1 - \frac1n\right)\sigma^2.$$
That is why the variance $s^2$ is multiplied by the coefficient $\frac{n}{n-1}$, which leads just to the sample variance $S^2$, with expected value $\sigma^2$. Of course, this multiplication makes sense only if $n \ne 1$. □

10.3.3. Random sample of the normal distribution. In practice, it is necessary to know not only the numerical characteristics of the sample mean and variance, but also their full probability distributions. Of course, these can be derived only if the particular probability distribution of the $X_i$ is known. As a useful illustration, we calculate the result for a random sample from the normal distribution.

It is already verified, as an example on properties of moment generating functions in 10.2.37, that the sum of random variables with normal distribution has again the normal distribution. Hence the sample mean must also have the normal distribution, and since both its expected value and variance are known, $\bar X \sim N(\mu, \frac{\sigma^2}{n})$. The probability distribution of the sample variance is more complicated. Here, apply the ideas about multivariate normal distributions from subsection 10.2.37. Consider the vector $Z$ of standardized normal variables $Z_i = \frac{X_i - \mu}{\sigma}$. The same property holds for the vector $U = Q^T Z$ with any orthogonal matrix $Q$.
In addition, $\sum_i U_i^2 = \sum_i Z_i^2$. Choose the matrix $Q$ so that the first component $U_1$ equals the sample mean $\bar Z$, up to a multiple. This means that the first column of $Q$ is chosen as $(\sqrt n)^{-1}(1,\dots,1)^T$. Then $U_1^2 = n\bar Z^2$, so we

wanted density that
$$B(a,b)\,f(s) = \left(\frac{s}{s+1}\right)^{\!a-1}\left(1 - \frac{s}{s+1}\right)^{\!b-1}\frac{1}{(s+1)^2} = \frac{s^{a-1}}{(s+1)^{a+b}}.$$
The picture shows the densities for $a = b = p = 2, 5, 15$. This confirms the intuition that equal and not too small values of $a = b = p$ correspond to the most probable value $\theta = \frac12$, so the density of the odds is greatest around one. The higher $p$, the lower the variance of this variable. □

10.K.4. Show that for the Bernoulli experiment, described by a random variable $X \sim \mathrm{Bi}(n,\theta)$, and an a priori probability distribution of the random variable $\theta$ with beta-distribution, the a posteriori probability distribution also has beta-distribution, with suitable parameters which depend on the experiment results. What is the a posteriori expected value of $\theta$ (i.e., the Bayesian point estimate of this random variable)?

Solution. As justified in subsection 10.3.7 of the theoretical part, the a posteriori probability density is, up to an appropriate constant, given as the product of the a priori probability density
$$g(\theta) = \frac{1}{B(a,b)}\,\theta^{a-1}(1-\theta)^{b-1}$$
and the probability of the examined variable $X$ given that the value $\theta$ occurred. Thus, assuming $k$ successes in the Bernoulli experiment, we get the a posteriori density (the sign $\propto$ used instead of equality denotes "proportional to")
$$g(\theta|X = k) \propto P(X = k|\theta)\,g(\theta) \propto \theta^k(1-\theta)^{n-k}\,\theta^{a-1}(1-\theta)^{b-1} = \theta^{a+k-1}(1-\theta)^{b+n-k-1}.$$

can compute:
$$\sum_{i=2}^n U_i^2 = \sum_{i=1}^n Z_i^2 - n\bar Z^2 = \sum_{i=1}^n (Z_i - \bar Z)^2.$$
Therefore, the multiple $\frac{n-1}{\sigma^2}S^2$ of the sample variance is the sum of $n-1$ squares of standardized normal variables, so the following theorem is proved:

Theorem. Let $(X_1,\dots,X_n)$ be a random sample from the $N(\mu,\sigma^2)$ distribution. Then, $\bar X$ and $S^2$ are independent variables, and
$$\bar X \sim N\!\left(\mu, \tfrac{\sigma^2}{n}\right), \qquad \frac{n-1}{\sigma^2}\,S^2 \sim \chi^2_{n-1}.$$
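The identities behind this theorem can be checked on a tiny example (a sketch, not from the book): for $n = 2$, the orthogonal matrix with first column $(1/\sqrt2, 1/\sqrt2)$ plays the role of $Q$, and we verify that the sum of squares is preserved, that $U_1^2 = n\bar Z^2$, and that the remaining square equals $\sum_i (Z_i - \bar Z)^2$.

```python
import math

z = [1.3, -0.7]                       # an arbitrary sample of standardized values
n = len(z)
zbar = sum(z) / n

# U = Q^T z for the orthogonal Q with columns (1/sqrt(2), 1/sqrt(2)), (1/sqrt(2), -1/sqrt(2))
u1 = (z[0] + z[1]) / math.sqrt(2)
u2 = (z[0] - z[1]) / math.sqrt(2)

assert abs(u1**2 + u2**2 - (z[0]**2 + z[1]**2)) < 1e-12   # orthogonality preserves the sum of squares
assert abs(u1**2 - n * zbar**2) < 1e-12                   # U_1^2 = n * zbar^2
assert abs(u2**2 - sum((x - zbar)**2 for x in z)) < 1e-12 # remaining squares = sum (z_i - zbar)^2
print("identities verified")
```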
Hence, it immediately follows that the standardized sample mean
$$T = \frac{\bar X - \mu}{S/\sqrt n}$$
has Student's t-distribution with $n-1$ degrees of freedom.

10.3.4. Point and interval estimates. Now, we have everything needed to estimate the parameter values in the context of frequentist statistics. Here is a simple example. Suppose there are 500 students enrolled in a course, each of whom has a certain degree of satisfaction with the course, expressed as an integer in the range 1 through 10. It may be assumed that the satisfactions $X_i$ of the students are approximated by a random variable with distribution $N(\mu,\sigma^2)$. Further, suppose a detailed earlier survey showed that $\mu = 6$, $\sigma = 2$. In the current semester, 15 students are asked about their opinion of the course, as rumor has it that the evaluation of the new lecturer might be quite different. The results show that 2 students vote 3, 3 vote 4, 3 vote 5, 5 vote 6, and 2 vote 7. Altogether, the sample mean is $\bar X = 5.133$ and the sample variance is $S^2 = 1.695$.

By assumption, $\bar X \sim N(\mu, \sigma^2/n)$, so $Z = \sqrt n\,\frac{\bar X - \mu}{\sigma} \sim N(0,1)$. In order to express the confidence of the estimate, compute the interval which contains the estimated parameter with an a priori fixed probability $100(1-\alpha)\%$. We talk about a confidence level $\alpha$, $0 < \alpha < 1$. Consider $\mu$ to be the unknown parameter, while the variance can be assumed (be it correct or not) to remain unchanged. It follows that
$$P(|Z| < z(\alpha/2)) = P\left(\left|\sqrt n\,\frac{\bar X - \mu}{\sigma}\right| < z(\alpha/2)\right) = P\left(\bar X - \tfrac{\sigma}{\sqrt n}\,z(\alpha/2) < \mu < \bar X + \tfrac{\sigma}{\sqrt n}\,z(\alpha/2)\right),$$
where $z(\alpha/2)$ denotes the critical value, cf. 10.2.30. Thus, an interval is found whose endpoints are random variables and which contains the estimated parameter $\mu$ with an a priori fixed probability. The middle point of this interval is called the point estimate of the parameter $\mu$; the whole interval is called the interval estimate. We can also say that, at the confidence
Thus, we have indeed obtained the density (up to a constant, which we need not evaluate) of the a posteriori distribution of $\theta$, namely the distribution $\beta(a+k,\, b+n-k)$. Its a posteriori expected value is
$$\hat\theta = \frac{a+k}{a+b+n}.$$
For $n$ and $k$ approaching infinity so that $k/n \to p$, our a posteriori estimate also satisfies $\hat\theta \to p$. Thus, we can see that for large values of $n$ and $k$, the observed fraction of successful experiments outweighs the a priori assumption. On the other hand, for small values, the a priori assumption is very important. □

10.K.5. We have data about the accident rates of $N = 20$ drivers in the last $n = 10$ years (the $k$-th item corresponds to the number of years when the $k$-th driver had an accident): 0, 0, 2, 0, 0, 2, 2, 0, 6, 4, 3, 1, 1, 1, 0, 0, 5, 1, 1, 0. We assume that the probabilities $p_j$, $j = 1,\dots,N$, that the $j$-th driver has an accident in a given year are constants. For each driver, estimate the probability that s/he has an accident in the following year (in order to determine the individual insurance fee, for instance).

Solution. We introduce random variables $X_{ji}$ with value 0 if the $j$-th driver has no accident in the $i$-th year, and 1 otherwise. The individual years are considered to be independent. Thus, we can assume that the random variables $S_j = \sum_{i=1}^n X_{ji}$ that correspond to the numbers of accident years among all the $n = 10$ years have distribution $\mathrm{Bi}(n, p_j)$.

Of course, we could estimate the probabilities for all drivers together, i.e., using the arithmetic mean
$$\bar p = \frac{1}{Nn}\sum_{j=1}^N S_j.$$
However, considering the homogeneity of the distribution of the variables $X_j$, the drivers can hardly be accounted equal, so such an estimate would be misleading. On the other hand, the opposite extreme, i.e., the totally independent individual estimates
$$\hat p_j = \frac{S_j}{n},$$
are also inappropriate, since we surely do not want to set a zero insurance fee until the first accident happens.

level $\alpha$, the estimated parameter $\mu$ does or does not differ from another value $\mu_0$.
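The posterior update from 10.K.4 is one line of arithmetic. The sketch below (with a hypothetical uniform prior $a = b = 1$ and made-up experiment counts) shows how the Bayesian point estimate interpolates between the prior mean and the observed frequency:

```python
def posterior_beta(a, b, n, k):
    """Parameters of the a posteriori beta distribution after k successes in n trials."""
    return a + k, b + n - k

def posterior_mean(a, b, n, k):
    a2, b2 = posterior_beta(a, b, n, k)
    return a2 / (a2 + b2)

# Uniform prior beta(1, 1); 3 successes in 10 trials:
print(posterior_beta(1, 1, 10, 3))       # (4, 8)
print(posterior_mean(1, 1, 10, 3))       # 4/12 = 0.333...
# With much more data, the estimate approaches the observed frequency k/n:
print(round(posterior_mean(1, 1, 1000, 300), 3))
```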
Suppose, for instance, the data above and the levels $\alpha = 0.05$ and $\alpha = 0.1$. Respectively, we obtain the intervals
$$\mu \in (4.121,\ 6.145), \qquad \mu \in (4.284,\ 5.983).$$
At the confidence level of 5%, we cannot claim that the opinions of the students are worse than in the previous year, because the corresponding interval also contains the value $\mu_0 = 6$. We can conclude this at the confidence level of 10%, since the value $\mu_0 = 6$ no longer lies in the corresponding interval.

On the other hand, if it is assumed that the other (worse) lecturer causes the variance of the answers to change as well (for instance, the students might agree more on the bad assessment), we proceed differently. Instead of the standardized variable $Z$, we deal in a similar way with the variable $T = \frac{\bar X - \mu}{S/\sqrt n}$.

This problem is taken from the contribution M. Friesl, Bayesovské odhady v některých modelech, published in: Analýza dat 2004/II (K. Kupka, ed.), TriloByte Statistical Software, Pardubice, 2005, pp. 21–33.

As seen above, this random variable has probability distribution $T \sim t_{n-1}$, where $n = 15$ in this case. This leads to the interval estimate
$$\bar X - \tfrac{S}{\sqrt n}\,t_{n-1}(\alpha/2) < \mu < \bar X + \tfrac{S}{\sqrt n}\,t_{n-1}(\alpha/2).$$
Substituting the data at levels $\alpha = 0.05$ and $\alpha = 0.03$, respectively, we obtain
$$\mu \in (4.412,\ 5.854), \qquad \mu \in (4.321,\ 5.945).$$
Therefore, at the confidence level of 3%, the opinion seems to have become worse. This corresponds to our intuition that the sample deviation $S = 1.302$, which is significantly smaller than the $\sigma = 2$ of the previous case, should be essential for our reasoning.

10.3.5. Likelihood of estimates. From the mathematical point of view, interval and point estimates are simple and easy to understand. It is much worse with their practical interpretation, because it is problematic to verify all the assumptions about the randomness of the sample. In more complicated cases, we encounter problems with the "likelihood" of our estimates. As mathematicians, we can avoid the practical problem by defining the missing concept.
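Both interval estimates for the student-satisfaction data can be reproduced directly (a sketch; the critical values $z(0.025) = 1.96$, $z(0.05) = 1.645$ and $t_{14}(0.025) = 2.145$ are hardcoded, since the standard library has no quantile functions):

```python
import math

def interval(center, spread, crit):
    """Confidence interval center +- spread * crit, rounded to 3 decimals."""
    return (round(center - spread * crit, 3), round(center + spread * crit, 3))

n = 15
xbar, sigma = 5.133, 2.0
S = math.sqrt(1.695)

ci_z_05 = interval(xbar, sigma / math.sqrt(n), 1.96)    # known sigma, alpha = 0.05
ci_z_10 = interval(xbar, sigma / math.sqrt(n), 1.645)   # known sigma, alpha = 0.1
ci_t_05 = interval(xbar, S / math.sqrt(n), 2.145)       # unknown sigma, alpha = 0.05
print(ci_z_05, ci_z_10, ci_t_05)
```

Small discrepancies in the last digit against the text are rounding artifacts (the text works with $\bar X = 5.1\overline{3}$ rather than 5.133).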
In general, one works with a random sample of size $n$. Implicitly, it is assumed that there are independent random variables $X_i$ with the same probability distribution which depends on an unknown parameter $\theta$ (a vector, in general). We are trying to find a sample statistic $T$, i.e., a function of the random variables $X_1, X_2, \dots$ which, in a mathematical sense, estimates the actual value of the parameter $\theta$. $T$ is said to be an unbiased estimator of $\theta$ if and only if $E\,T = \theta$. The expected value $E(T - \theta)$ is called the bias of the estimator $T$. The asymptotic behaviour of the estimator, that is, its behaviour as $n$ goes to infinity, is often of interest.

The realistic method is to use the same kind of assumption for the a priori distribution of the probabilities $p_j$ of the accident rates of the individual drivers. In practice, one often uses a model with the Poisson distribution $\mathrm{Po}(\lambda_j)$ for the $j$-th driver, with further assumptions about the distribution of the parameter $\lambda$ among the drivers. We can also assume quite well (and simply) that the distribution is $p_j \sim \beta(a,b)$ with suitable parameters $a$, $b$ which should reflect the cumulated results of all the drivers. Thus, let us go this way. We know from the above exercise that the a posteriori probability distribution will be $(p_j|S_j = k) \sim \beta(a+k,\, b+n-k)$, so the corresponding expected value will be
$$\hat p_j^B = \frac{a+k}{a+b+n}.$$
Let us compare this estimate to the common estimate $\bar p$ mentioned above and the individual estimates $\hat p_j$. We introduce the values $p_0 = \frac{a}{a+b}$, i.e., the expected value of the a priori common distribution for all drivers, and $n_0 = a+b$. We get

$T = T(n)$ is said to be a consistent estimator of the parameter $\theta$ if and only if $T(n)$ converges in probability to $\theta$, i.e., for every $\varepsilon > 0$,
$$\lim_{n\to\infty} P(|T(n) - \theta| < \varepsilon) = 1.$$
By Chebyshev's inequality,
$$P(|T(n) - E\,T(n)| < \varepsilon) \ge 1 - \frac{\operatorname{var}T(n)}{\varepsilon^2}.$$
Assuming $\lim_{n\to\infty} E\,T(n) = \theta$, then, for sufficiently large values of $n$,
$$P(|T(n) - \theta| < 2\varepsilon) \ge P(|T(n) - E\,T(n)| < \varepsilon) \ge 1 - \frac{\operatorname{var}T(n)}{\varepsilon^2}.$$
A useful proposition is thus proved:

Theorem.
Assume that $\lim_{n\to\infty} E\,T(n) = \theta$ and $\lim_{n\to\infty}\operatorname{var}T(n) = 0$. Then, $T(n)$ is a consistent estimator of $\theta$.

As a simple example, we can illustrate this theorem on the variance:
$$\hat\sigma^2 = \frac1n\sum_{i=1}^n (X_i - \bar X)^2 = \frac{n-1}{n}\,S^2.$$
Since it is known from subsection 10.3.2 that $S^2$ is an unbiased estimator, it follows that $\hat\sigma^2$ is not. However, $\lim_{n\to\infty} E\,\hat\sigma^2 = \sigma^2$, and it can be calculated that
$$\lim_{n\to\infty}\operatorname{var}\hat\sigma^2 = \lim_{n\to\infty}\operatorname{var}S^2 = 0$$
(for a random sample from the normal distribution, $\operatorname{var}S^2 = \frac{2\sigma^4}{n-1}$). Therefore, the statistic $\hat\sigma^2$ is a consistent estimator of the variance.

$$\hat p_j^B = \frac{a+k}{a+b+n} = \frac{(a+b)\,a}{(a+b+n)(a+b)} + \frac{n\,k}{(a+b+n)\,n} = \frac{n_0}{n_0+n}\,p_0 + \frac{n}{n_0+n}\,\hat p_j,$$
which is a linear combination of the expected value $p_0$ and the individual estimate $\hat p_j$. Thus, it only remains to estimate the unknown parameters $a$, $b$ reasonably. We know that
$$E\,X_{ji} = E\,E(X_{ji}|p) = E\,p = p_0,$$
and, decomposing the variance, $E\operatorname{var}(X_{ji}|p) = E(p(1-p))$ and $\operatorname{var}E(X_{ji}|p) = \operatorname{var}p$, with
$$n_0 = a+b = \frac{E\operatorname{var}(X_{ji}|p)}{\operatorname{var}E(X_{ji}|p)} = \frac{E(p(1-p))}{\operatorname{var}p}.$$
The left-hand variables can be estimated directly from the data:
$$E\operatorname{var}(X_{ji}|p) \approx \frac1N\sum_{j=1}^N \frac{n}{n-1}\,\hat p_j(1-\hat p_j), \qquad \operatorname{var}E(X_{ji}|p) \approx s_p^2 - \frac1n\cdot\frac1N\sum_{j=1}^N \frac{n}{n-1}\,\hat p_j(1-\hat p_j),$$
where $s_p^2$ denotes the sample variance of the individual estimates $\hat p_j$ (you can verify that the subtraction of the right-most expression guarantees that the last estimate is unbiased). Since, for the given data, we get $n_0 \approx 3.8643$ and $p_0 \approx 0.1450$, the Bayesian estimate of the individual probability of accidents is
$$\hat p_j^B = 0.154\cdot 0.145 + 0.846\cdot\hat p_j.$$

It is apparent that there may be more unbiased estimators for a given parameter. For instance, it is already shown that the arithmetic mean $\bar X$ is an unbiased estimator of the expected value $\theta$ of the random variables $X_i$. The value $X_1$ alone is, of course, an unbiased estimator of $\theta$ as well. We wish to find the best estimator $T$ in the class of considered statistics which are unbiased or consistent. Consider as best the one whose variance is as small as possible. Recall that the variance of a vector statistic $T$ is given by the corresponding covariance matrix, which is, in the case of independent components, a diagonal matrix with the individual variances of the components on the diagonal.
We have already defined inequalities between positive-definite matrices.

10.3.6. Maximum likelihood. Assume that the density function of the components of the sample is given by a function $f(x,\theta)$ which depends on an unknown parameter $\theta$ (a vector, in general). By the assumed independence, the joint density of the vector $(X_1,\dots,X_n)$ is equal to the product
$$f(x_1,\dots,x_n,\theta) = f(x_1,\theta)\cdots f(x_n,\theta),$$
which is called the likelihood function. We are interested in the value $\theta$ which maximizes the likelihood function on the set of all admissible values of the parameter. In the discrete case, this means choosing the parameter for which the obtained sample has the greatest probability.

Thus, it is a combination of the confidence estimate $\bar p = 0.145$ of the collective probability $p_0$ with the individual (frequentist) estimate $\hat p_j$, which is measured from a small number $n = 10$ of observations of one driver. □

L. Processing of multidimensional data

Sometimes, we need to process multidimensional data: for each of $n$ objects, we determine $p$ characteristics. For instance, we can examine the marks of several students in various subjects.

10.L.1. In his experiments, J. G. Mendel examined 10 plants of pea, and each was examined for the numbers of yellow and green seeds. The results of the experiment are summarized in the table below. It follows from the genetic models that the probability of occurrence of a yellow seed should be 0.75 (and 0.25 for a green seed). At the asymptotic significance level 0.05, test the hypothesis that the results of Mendel's experiments are in accordance with the model.

Solution. We test the hypothesis with Pearson's chi-squared test.
We use the statistic
$$K = \sum_{j=1}^r \frac{(n_j - np_j)^2}{np_j},$$
where $r$ is the number of sorting intervals (measurements; we have $r = 10$), $n_j$ is the actually measured frequency in the given sorting interval (we count the number of yellow seeds), and $np_j$ is the expected frequency by the assumed distribution, with $n$ the total number of seeds of the given plant; in our case, $p_j = 0.75$, $j = 1,\dots,10$. If the results of the experiment were really distributed as assumed by our model, we would have $K \approx \chi^2(r - 1 - p)$, where $p$ is the number of estimated parameters of the assumed probability distribution. In our case, it is especially simple, since our model does not have any unknown parameters, so we have $p = 0$ (parameters may occur if, e.g., we assume that the probability distribution in our experiment is normal but with unknown variance and expected value; then we would have $p = 2$). Thus, $K \approx \chi^2(9)$. The statistic is recommended to be used if the expected frequency of the characteristic in each of the sorting intervals is at least 5.

Usually, it is more efficient to work with the log-likelihood function
$$\ell(x_1,\dots,x_n,\theta) = \ln f(x_1,\dots,x_n,\theta) = \sum_{i=1}^n \ln f(x_i,\theta).$$
Since the function $\ln$ is strictly increasing, maximization of the log-likelihood function is equivalent to maximization of the original likelihood function. If, for some input, it happens that $f(x_1,\dots,x_n,\theta) = 0$, set $\ell(x_1,\dots,x_n,\theta) = -\infty$. In the case of discrete random variables, use the same definition with the probability mass function instead of the density, i.e.,
$$\ell(x_1,\dots,x_n,\theta) = \sum_{i=1}^n \ln P(X = x_i|\theta).$$

plant number:   1   2   3   4   5   6   7   8   9  10
yellow seeds:  25  32  14  70  24  20  32  44  50  44
green seeds:   11   7   5  27  13   6  13   9  14  18
total seeds:   36  39  19  97  37  26  45  53  64  62

We can illustrate the principle on a random sample from the normal distribution $N(\mu,\sigma^2)$ of size $n$. The unknown parameters are $\mu$ or $\sigma$, or both. The considered density is
$$f(x,\mu,\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$
Take logarithms of both sides, to obtain
ℓ(x, μ, σ) = −(n/2) ln(2πσ²) − (1/(2σ²)) Σ_{i=1}^{n} (x_i − μ)².
The maximum can be found using differentiation (note that σ² is treated as a symbol for a variable):
∂ℓ/∂μ = (1/σ²) Σ_{i=1}^{n} (x_i − μ),
∂ℓ/∂σ² = −n/(2σ²) + (1/(2(σ²)²)) Σ_{i=1}^{n} (x_i − μ)².
Thus, the only critical point is given by μ = X̄ and σ² = S² = (1/n) Σ_{i=1}^{n} (x_i − X̄)². Substitute these values into the matrix of second derivatives, to obtain the Hessian of ℓ:
( −n/σ²   0
   0     −n/(2(σ²)²) ).
This matrix is negative definite, so this is the required maximum, and since there is only one critical point, it must be the global maximum (think about the details of this argument!). Thus it is verified that the sample mean and the sample variance are the most likely estimates for μ and σ², as already used. 10.3.7. Bayesian estimates. We return to the example from subsection 10.3.4, now from the point of view of Bayesian statistics. This totally reverses the approach: the collected data X₁, ..., X₁₅ (i.e., the points which express how much each student is satisfied, using the scale 1 through 10) are treated as constants. On the other hand, the estimated parameter μ (the expected value of the points of satisfaction) is viewed as a random variable Let us write the data into a table:

j | 1 | 2 | ... | 10
n_j | 25 | 32 | ... | 44
p_j | 0.75 | 0.75 | ... | 0.75
n p_j | 27 | 29.25 | ... | 46.5
(n_j − n p_j)²/(n p_j) | 0.148148 | 0.258547 | ... | 0.134409

The value of the statistic K for the given data is
K = 0.148148 + 0.258547 + ··· + 0.134409 = 1.797495.
This value is less than χ²_{0.95}(9) = 16.9, so we do not reject the null hypothesis at level 0.05 (i.e., we do not refute the known genetic model). □ whose distribution we wish to estimate. Let us look at the general principle first (and come back to this example soon).
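The value of K just computed is easy to reproduce mechanically. A minimal sketch in plain Python, with the frequencies taken from the table of 10.L.1 (the critical value 16.9 is quoted from statistical tables, not computed here):

```python
# Pearson's chi-squared statistic for Mendel's pea data (10.L.1).
# For each plant j: n_j yellow seeds observed out of N_j seeds in total;
# the expected yellow count is N_j * p with p = 0.75.
yellow = [25, 32, 14, 70, 24, 20, 32, 44, 50, 44]
total  = [36, 39, 19, 97, 37, 26, 45, 53, 64, 62]
p = 0.75

K = sum((n - N * p) ** 2 / (N * p) for n, N in zip(yellow, total))
print(round(K, 6))   # 1.797495, matching the value in the text

# K stays far below the critical value chi^2_{0.95}(9) = 16.9,
# so the genetic model is not rejected at level 0.05.
assert K < 16.9
```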
For this purpose, let us interpret Bayes' formula for conditional probability on the level of probability mass functions or probability densities, in the following way: If a vector (X, θ) has joint density f(x, θ), then the conditional probability density of the component θ, given X = x, is defined as
g(θ | x) = f(x | θ) g(θ) / f(x),
where f(x) and g(θ) are the marginal probability densities. (In the above example, x is a 15-dimensional vector coming from the multidimensional normal distribution, while θ = μ is a scalar.) Thus, given the a priori probability density g(θ) of the estimated parameter θ and the probability density f(x | θ) (in the above example, θ = μ is the expected value parameter of the distribution), the formula can be used to compute the a posteriori probability density g(θ | x), based on the collected data. Indeed, we do not need to know f(x), for the following reason: we may view f(x) as a constant independent of θ, and thus the proper density is obtained from f(x | θ) g(θ) by multiplying with a uniquely given constant in the end. Thus, during the computation, it is sufficient to be precise "up to a constant multiple". For this purpose, use the notation Q ∝ R, meaning that there is a constant C such that the expressions Q and R satisfy Q = CR. We shall illustrate this procedure on a more explicit example. In order to be as close as possible to the ideas from subsection 10.3.4, work with normal distributions N(μ, σ²). Suppose that the satisfaction of individual students in particular lectures is a random variable X ~ N(θ, σ²), while the parameter θ reached by the particular lecturers is a random variable θ ~ N(a, b²).
Compute (up to a constant multiple, ignoring all multiplicative factors which do not involve θ):
g(θ | x) ∝ f(x | θ) g(θ) ∝ exp(−(x − θ)²/(2σ²)) · exp(−(θ − a)²/(2b²))
∝ exp( −((b² + σ²)/(2σ²b²)) · (θ − (b²x + σ²a)/(b² + σ²))² ).
This proves already that the a posteriori distribution for θ is
θ ~ N( (b²x + σ²a)/(b² + σ²), σ²b²/(b² + σ²) ).
The expected value is a convex combination of the single observation x and the a priori expected value a. As b grows, the relevance of the single opinion x is still increasing, and this corresponds to a 100% relevance of x in the case σ = 0. This is in accordance with the interpretation that Bayesian statistics is the probability extension of the standard discrete mathematical logic. If the variance σ² is close to zero, then it is almost certain that the opinion of any student precisely describes the opinion of the whole population. In subsection 10.3.4, we worked with the sample mean X̄ of the collected data. This can be used in the previous calculation, since the mean has a normal distribution, too. The expected value is the same, and the only difference is that σ²/n is substituted instead of σ². To facilitate the notation, define the constant
c_n = n b² / (n b² + σ²).
The a posteriori estimate for θ based on the found sample mean X̄ has the distribution
θ ~ N( c_n X̄ + (1 − c_n) a, c_n σ²/n ).
As could be expected, for increasing n, the expected value of the distribution for θ approaches the sample mean, and its variance approaches zero. In other words, the higher the value of n, the closer we get to the point estimate from the frequentist point of view. A contribution of the Bayesian approach is that with the estimated distribution at hand, questions of the kind "What is the probability that the new lecturer is worse than the old one?" can be answered. Use the same data as in 10.3.4 and supplement the necessary a priori data. Assume that the lecturers are assessed quite well (otherwise, they would probably not be teaching at the university at all). For concreteness, select the a priori distribution with parameters a = 7.5, b = 2.5, and the standard deviation σ = 2.
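With these a priori parameters fixed, the posterior update θ ~ N(c_n X̄ + (1 − c_n)a, c_n σ²/n) is a few lines of arithmetic. A small Python sketch, taking n = 15 and the sample mean 5.133 from the example of 10.3.4, and expressing the normal distribution function through the error function:

```python
import math

# A priori: theta ~ N(a, b^2); data: n observations with known sigma.
a, b, sigma, n, xbar = 7.5, 2.5, 2.0, 15, 5.133

c_n = n * b**2 / (n * b**2 + sigma**2)   # weight of the sample mean
mean = c_n * xbar + (1 - c_n) * a        # posterior expected value
var = c_n * sigma**2 / n                 # posterior variance

# P(theta < 6) via the standard normal distribution function
# Phi(z) = (1 + erf(z / sqrt(2))) / 2.
z = (6 - mean) / math.sqrt(var)
prob = (1 + math.erf(z / math.sqrt(2))) / 2

print(round(mean, 3), round(var, 3), round(prob, 3))   # 5.23 0.256 0.936
```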
Continue with n = 15 and the sample mean X̄ = 5.133. Substitute this data, to get the a posteriori estimate for the distribution θ ~ N(5.230, 0.256). We are interested in P(θ < 6). This is computed by evaluating the distribution function of the corresponding normal distribution at the value 6 (Excel is capable of this, too). The answer is approximately 93.6 %. This is similar to the material in subsection 10.3.4, where the known variance is assumed constant. Note the influence of the a priori assumption about the distribution of the parameter θ for all lecturers. To a certain extent, this reflects a faith that the lecturers are rather good. If a statistician has a reason for assuming that the actual expected value a for a specific lecturer is shifted, say a = 6 as in the survey about the previous lecturer (this can be caused, for example, by the fact that the lecture is hard and unpopular), then the probability of his actual parameter being less than 6 would be approximately 95.0 %. (If the expected value is considered to be significantly worse only when below 5.5, then the value would be only approximately 75 %.) When substituting a = 5, the value is already 96.8 %. The variance b² is also important. For instance, the a priori estimate a = 6, b = 3.5 leads to probability 95.2 %. In the above discussion, another very important point is touched on: sensitivity analysis. It would seem desirable that a small change of the a priori assumption has only a small effect on the a posteriori result. It appears that this is so in this example; however, we omit further discussion here. The same model with normal distributions is used in practice when judging the relevance of the output of an IQ test of an individual person. It can also be used for any other similar exam where it is expected that the normal distribution approximates well the probability distribution of the results.
In both cases, there is an a priori assumption about the group to which the person should belong. Other good examples (with different distributions) are practical problems from the insurance industry, where it is purposeful to estimate the parameters so that both the effects of the experiment upon an individual item and the overall expectations over the population are included. 10.3.9. Notes on hypothesis testing. We return to deciding whether a given event does or does not occur, in the context of frequentist statistics. We build on the approach from interval estimates, as presented above. Thus, consider a random vector X = (X₁, ..., Xₙ) (the result of a random sample), whose joint distribution function is F_X(x). A hypothesis is an arbitrary statement about the distribution which is determined by this distribution function. Usually, one formulates two hypotheses, denoted H₀ and H_A. The former is traditionally called the null hypothesis, and the latter is called the alternative hypothesis. The result of the test is then a decision, based on a concrete realization of the random vector X (a test), whether the hypothesis H₀ is to be rejected or not in favor of the hypothesis H_A. During this process, two types of errors may occur. A type I error occurs when H₀ is rejected even though it is true. A type II error occurs when H₀ is not rejected although it is false. The decision procedure of a frequentist statistician consists of selecting the critical region W, i.e., the set of test results for which the hypothesis is rejected. The size of the critical region is chosen so that a true hypothesis is rejected with probability not greater than α. This means that a fixed bound for the probability of the type I error is required: the significance level α. The most common choices are α = 0.05 or α = 0.01. It is also useful in practice to determine the least possible significance level for which the hypothesis is still rejected; this is the p-value of the test.
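For a test statistic with a known distribution under H₀, the p-value is just the distribution function evaluated at the observed value. A hedged sketch for a left-tailed test based on a standard normal statistic (the observed value z = −1.678 is chosen here purely for illustration):

```python
import math

def p_value_left(z):
    """p-value of a left-tailed test with an N(0,1) statistic: Phi(z)."""
    return (1 + math.erf(z / math.sqrt(2))) / 2

z = -1.678                 # illustrative observed value of the statistic
p = p_value_left(z)
print(round(p, 4))         # 0.0467: H0 rejected at alpha = 0.05, not at 0.01
assert p < 0.05
```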
It remains to find a reasonable procedure for choosing the critical region. This should be done so that the type II error occurs as rarely as possible. Usually, it is convenient to consider the likelihood function f(x, θ), defined for a random vector X in subsection 10.3.6. For the sake of simplicity, assume there is a one-dimensional parameter θ, and formulate the null hypothesis as X being governed by the density f(x, θ₀), while the alternative hypothesis is given by the density f(x, θ₁), for fixed distinct values θ₀ and θ₁. Ideas about rejecting or accepting the hypotheses suggest that when substituting the values of a specific test into the likelihood function, the hypothesis can be accepted if f(x, θ₀) is much greater than f(x, θ₁). This suggests considering, for each constant c > 0, the critical region
W_c = { x ; f(x, θ₁) > c f(x, θ₀) }.
Having chosen the significance level α, choose c so that
∫_{W_c} f(x, θ₀) dx = α.
This guarantees that for a test result x ∈ W_c, when H₀ is valid, the type I error occurs with at most the prescribed probability. This can also be guaranteed by other critical regions W which satisfy
∫_{W} f(x, θ₀) dx = α.
On the other hand, type II errors are also of interest. That is, it is desired to maximize the probability of the critical region when H_A holds. Thus, consider the difference
D = ∫_{W_c} f(x, θ₁) dx − ∫_{W} f(x, θ₁) dx
for an arbitrary W as above. The regions over which integration is carried out can be divided into the common part W ∩ W_c and the remaining set differences. The contributions of the common part cancel, and there remains
D = ∫_{W_c \ W} f(x, θ₁) dx − ∫_{W \ W_c} f(x, θ₁) dx.
Using the definition of the critical region W_c (and putting back the same integrals over the common part),
D ≥ c ∫_{W_c \ W} f(x, θ₀) dx − c ∫_{W \ W_c} f(x, θ₀) dx = c ∫_{W_c} f(x, θ₀) dx − c ∫_{W} f(x, θ₀) dx = cα − cα = 0.
Thus is proved an important statement, the Neyman-Pearson lemma: Proposition.
Under the above assumptions, W_c is the optimal critical region, minimizing the occurrence of the type II error at a given significance level. 10.3.10. Example. The interval estimate, as illustrated on an example in subsection 10.3.4, is a special case of hypothesis testing, where H₀ has the form "the expected value of the satisfaction with the course remained μ₀", while H_A says that it is equal to a different value μ₁. The general procedure mentioned above leads in this case to the critical region given by
|X̄ − μ₀| √n / σ ≥ z(α/2).
Note that in the definition of the critical region, the actual value μ₁ is not essential. In the context of classical probability, the decision at a given level α whether or not there is a change to the expected value μ is thus formalized. To test only whether the satisfaction has decreased, assume beforehand that μ₁ < μ₀. We analyze this case thoroughly: The critical region from the Neyman-Pearson lemma is determined by the inequality
f(x, μ₁, σ²) / f(x, μ₀, σ²) ≥ c.
Take logarithms and rearrange to obtain
(n/(2σ²)) ( 2x̄(μ₁ − μ₀) − (μ₁² − μ₀²) ) ≥ ln c.
Since μ₁ < μ₀, it follows that
x̄ ≤ (μ₁ + μ₀)/2 + (σ²/(n(μ₁ − μ₀))) ln c = y.
For a given level α, the constant c, and thereby the decisive parameter y, are determined so that, under the assumption that H₀ is true,
α = P(X̄ ≤ y) = P( (X̄ − μ₀)√n/σ ≤ (y − μ₀)√n/σ ).
Assuming that H₀ is true, the statistic Z = (X̄ − μ₀)√n/σ has the standard normal distribution, so the requirement means choosing the critical region Z ≤ −z(α), which determines uniquely the optimal W_c. Note that this critical region is independent of the chosen value μ₁, and the actual value of y did not have to be expressed at all. It was only essential to assume that μ₁ < μ₀. In the illustrative example from subsection 10.3.4, H₀ : μ = 6, and the alternative hypothesis is H_A : μ < 6. The variance is σ² = 4. The test with n = 15 yielded x̄ = 5.133. Substitute this, to get the value
z = ((5.133 − 6)/2) √15 = −1.678,
while −z(0.05) = −1.645.
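The arithmetic of this substitution is a one-liner; a small sketch repeating it:

```python
import math

# One-sided z-test of H0: mu = 6 against HA: mu < 6 with known sigma = 2.
mu0, sigma, n, xbar = 6.0, 2.0, 15, 5.133

z = (xbar - mu0) / sigma * math.sqrt(n)
print(round(z, 3))   # -1.679 (the text states -1.678, rounding the mean slightly differently)
assert z < -1.645    # z < -z(0.05): H0 is rejected at the level 0.05
```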
Therefore, if we are testing whether the new teacher is even worse than the previous one, we reject the hypothesis at the level of 5 %, deducing that the students' opinions are really worse. If, for the critical region, the union of the critical regions for the cases μ₁ < μ₀ and μ₁ > μ₀ is chosen, the same results as for the interval estimate are obtained, as mentioned above. We remark that in the Bayesian approach, it is also possible to accept or reject hypotheses in a direct connection to the a posteriori probability of events, as was, to a certain extent, indicated in subsection 10.3.8, where our specific example is interpreted. 10.3.11. Linear models. As is usual in the analysis of mathematical problems, either we deal with linear dependencies and objects, or we discuss their linearizations. In statistics, many methods belong to the linear models, too. We consider a quite general scheme of this type. Consider a random vector Y = (Y₁, ..., Yₙ)ᵀ and suppose
Y = X β + σZ,
where X = (x_{ij}) is a constant real-valued matrix with n rows and k < n columns whose rank is k, β is an unknown constant vector of k parameters of the model, Z is a random vector whose n components have distribution N(0, 1), and σ > 0 is an unknown positive parameter of the model. This is a linear model with full rank. In practice, the variables x_{ij} are often known, and the problem is to estimate or predict the value of Y. For instance, x_{ij} can express the grade in maths of the i-th student in the j-th semester (j = 1, 2, 3), and we want to know how this student will fare in the fourth semester. For this purpose, the vector β needs to be known. This can be estimated using complete observations, that is, from the knowledge of Y (from the results of past years, for example). In order to estimate the vector β, the least squares method can often be used.
This means looking for the estimate b ∈ ℝᵏ for which the vector Ŷ = Xb minimizes the squared length of the vector Y − Xβ. This is a simple problem from linear algebra: we look for the orthogonal projection of the vector Y onto the subspace span X ⊂ ℝⁿ generated by the columns of the matrix X. That is, we minimize the function
‖Y − Xβ‖² = Σ_{i=1}^{n} ( Y_i − Σ_{j=1}^{k} x_{ij} β_j )².
Choose an arbitrary orthonormal basis of the vector subspace span X and write it into the columns of a matrix P. For any choice of such a basis, the orthogonal projection is realized as multiplication by the matrix PPᵀ. In the subspace span X, the mapping given by this matrix is the identity. That is,
Ŷ = PPᵀ Y = PPᵀ (Xβ + σZ) = Xβ + σPPᵀ Z.
The matrix PPᵀ is positive-semidefinite. Extend the basis consisting of the columns of P to an orthonormal basis of the whole space ℝⁿ. In other words, create a matrix Q = (P R) by writing the newly added basis vectors into the matrix R with n − k columns and n rows. Denote by V = PᵀZ and U = RᵀZ the random vectors with k and n − k components, respectively. They are orthogonal, and their concatenation in ℝⁿ is the vector (Vᵀ Uᵀ)ᵀ = QᵀZ. Clearly (see subsection 10.2.46), both vectors V and U have multivariate normal distribution with zero expected value and identity covariance matrix. The random vector Y is decomposed into the sum of a constant vector Xβ and two orthogonal projections,
Y = Xβ + σPV + σRU,
and the desired orthogonal projection Ŷ is the sum of the first and second summands. In subsection 10.2.46, the distribution of such random vectors is also derived. The quantity ‖Y − Ŷ‖² is called the residual sum of squares, sometimes denoted by RSS. Also, the residual variance is defined as
s² = ‖Y − Xb‖² / (n − k).
Recall that Ŷ = Xb and that XᵀX is invertible, as the full rank of X is assumed. Thus b = (XᵀX)⁻¹XᵀY can be computed. At the same time, Xᵀ(Y − Ŷ) = σXᵀ(RU) = 0, since the columns of X and R are mutually orthogonal.
Therefore,
(1) b = (XᵀX)⁻¹ Xᵀ Y.
The chosen matrix P can be used with advantage. Since its columns generate the same subspace as the columns of X, there is an invertible square matrix T such that X = PT (its columns are the coefficients of the linear combinations which express the columns of X in the basis given by P). Substitute (using the fact that PᵀP is the identity matrix and T is invertible):
b = (TᵀPᵀPT)⁻¹ TᵀPᵀ Y = T⁻¹(Tᵀ)⁻¹ Tᵀ Pᵀ (PTβ + σZ) = β + σT⁻¹V.
Thus are proved the main properties of the linear model:

Theorem. Consider a linear model Y = Xβ + σZ. Then:
(1) The estimate Ŷ = Xβ + σPV has distribution Ŷ ~ N(Xβ, σ²PPᵀ).
(2) The residue and its normed squared size have distributions
Y − Ŷ ~ N(0, σ²RRᵀ), ‖Y − Ŷ‖²/σ² ~ χ²_{n−k}.
(3) The random variable b = β + σT⁻¹V has distribution b ~ N(β, σ²(XᵀX)⁻¹).

There is another simple application of the general model, which is called the paired t-test. It is appropriate for cases when pairs of random vectors W₁ = (W_{i1}) and W₂ = (W_{i2}) are tested, and the differences Y_i = W_{i1} − W_{i2} of their components have distribution N(β, σ²). In addition, the variables Y_i need to be independent (which does not mean that the individual pairs W_{i1} and W_{i2} have to be independent!). In the context of our illustrative example from 10.3.4, we can imagine the assessment of two lecturers by the same student. To test the hypothesis that E W_{i1} = E W_{i2} for every i, use the statistic
T = √n (W̄₁ − W̄₂) / S,
where S is the sample standard deviation of the differences Y_i. Finally, we consider an example with more parameters. It is the classical case of the regression line. Assume that the variables Y_i, i = 1, ..., n, have distribution N(β₀ + β₁x_i, σ²), where the x_i are given constants. We examine the best approximation Ŷ_i = b₀ + b₁x_i, and the matrix X of the corresponding linear model is
Xᵀ = ( 1  1  ···  1
       x₁ x₂ ··· xₙ ).
Substitute into the general formula (1), and compute the estimate
( b₀ ; b₁ ) = ( n, Σᵢ xᵢ ; Σᵢ xᵢ, Σᵢ xᵢ² )⁻¹ ( Σᵢ Yᵢ ; Σᵢ xᵢYᵢ ).
It follows that
b₁ = Σᵢ (xᵢ − x̄)(Yᵢ − Ȳ) / Σᵢ (xᵢ − x̄)².
Finally, compute b₀ = Ȳ − b₁x̄.
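The closed-form estimates for the regression line just derived can be sketched directly, without any matrix machinery. A small self-contained example with made-up data:

```python
# Least squares fit of Y ~ b0 + b1 * x via the closed-form estimates
# b1 = sum (x_i - xbar)(Y_i - Ybar) / sum (x_i - xbar)^2,  b0 = Ybar - b1 * xbar.
def regression_line(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

# Noiseless data on the line y = 1 + 2x is recovered exactly:
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
b0, b1 = regression_line(x, y)
print(b0, b1)   # 1.0 2.0
```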
From the calculations,
var b₁ = σ² / Σ_{i=1}^{n} (xᵢ − x̄)².
In order to test the hypothesis whether the expected value of the variable Y does not depend on x, that is, whether H₀ has the form β₁ = 0, use the statistic
T = (b₁/s) ( Σ_{i=1}^{n} (xᵢ − x̄)² )^{1/2} ~ t_{n−2}.
The statistical analysis of multiple regression is similar. There are several sets of values x_{ij}, and we evaluate the statistical relevance of the approximation
Ŷᵢ = b₀ + b₁x_{1i} + ··· + b_k x_{ki}.
The individual statistics bⱼ allow for a t-test of the dependence of the regression on the individual parameters. Software packages often provide a parameter which expresses how well the values Yᵢ are approximated. It is called the coefficient of determination:
R² = 1 − RSS / Σ_{i=1}^{n} (Yᵢ − Ȳ)².
10.3.13. In practice, problems are often met where the distributions of the statistical data sets are either completely unknown, or errors are assumed in the model, together with deviations with nonzero expected value and a non-normal distribution. In these cases, application of classical frequentist statistics is very hard or even totally impossible. There are approaches which work directly with the sample set, and then derive statistics of point or interval estimates or probability calculations about the above, including the evaluation of standard errors. One of the pioneering articles on this topic is the brief work of Bradley Efron of Stanford University, published in 1981: Nonparametric estimates of standard error: The jackknife, the bootstrap and other methods⁴. The keywords of this article are: Balanced repeated replications; Bootstrap; Delta method; Half-sampling; Jackknife; Infinitesimal jackknife; Influence function. The bootstrap method uses software resources to create, from a given data sample, new data samples of the same size (sampling with replacement). The desired statistic (sample mean, variance, etc.) is then examined for each of them.
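This resampling step takes only a few lines of code. A minimal sketch of a bootstrap estimate of the standard error of the sample mean, on made-up data and using only the standard library:

```python
import random

random.seed(1)   # reproducible illustration

data = [5, 7, 6, 9, 4, 8, 6, 7, 5, 6]   # hypothetical observed sample
B = 2000                                # number of bootstrap replicates

means = []
for _ in range(B):
    # draw a new sample of the same size, with replacement
    resample = [random.choice(data) for _ in data]
    means.append(sum(resample) / len(resample))

# empirical standard deviation of the replicated statistic
m = sum(means) / B
se = (sum((v - m) ** 2 for v in means) / (B - 1)) ** 0.5
print(round(se, 2))   # bootstrap estimate of the standard error of the mean
```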
After a great number of executions of this procedure, a data sample is obtained which is considered a relevant approximation of the probability distribution of the examined statistic. The characteristics of this data set are considered a good approximation of the characteristics of the examined statistic for point or interval estimates, analysis of variance, etc. There is not enough space here for a more detailed analysis of these techniques, which are the foundation of non-parametric methods in contemporary statistical software tools.

⁴Biometrika (1981), 68, 3, pp. 589-99

CHAPTER 11

Number theory

God created the integers, all else is the work of man. Leopold Kronecker

A. Basic properties of divisibility

Divisibility of natural numbers. Let us recall the basic properties of divisibility, whose proofs follow directly from the definition: the integer 0 is divisible by every integer; the only integer that is divisible by 0 is 0; every integer a satisfies a | a; and every triple of integers a, b, c satisfies the following four implications. In this chapter, we will deal with problems concerning integers, which include mainly divisibility and solving equations whose domain will be the set of integers (or natural numbers); in this chapter, unlike in the other parts of this book, we will not consider zero to be a natural number, as is usual in this field of mathematics. Although the natural numbers and the integers are, from a certain point of view, the simplest mathematical structures, the examination of their properties has yielded a good deal of tough problems for generations of mathematicians. These are often problems which can be formulated quite easily, yet many of them remain unsolved so far.
We will introduce the most popular of them:
- twin primes - the problem is to decide whether there are infinitely many primes p such that p + 2 is also a prime,¹
- Sophie Germain primes - the problem is to decide whether there are infinitely many primes p such that 2p + 1 is also a prime,
- existence of an odd perfect number - i.e., an odd integer such that the sum of its divisors equals twice the integer,
- Goldbach's conjecture - the problem is to decide whether every even integer greater than 2 can be expressed as the sum of two primes,
- a jewel among the problems of number theory, Fermat's Last Theorem - the problem is to decide whether there are natural numbers n, x, y, z such that n > 2 and xⁿ + yⁿ = zⁿ; Pierre de Fermat formulated this problem as early as 1637; much effort of many generations was put into this question, and it was resolved (using results of various fields of mathematics) by Andrew Wiles in 1995.

The four implications mentioned above read:
a | b ∧ b | c ⟹ a | c,
a | b ∧ a | c ⟹ a | b + c ∧ a | b − c,
a | b ∧ b > 0 ⟹ a ≤ b,
a | b ⟺ ac | bc for every c ≠ 0.

Now, n − 1 | 2 implies that n − 1 = 1 or n − 1 = 2, whence n = 2 or n = 3. The wanted property is thus possessed only by the natural numbers 2 and 3. □ 11.A.2. Prove that for any a ∈ ℤ, the following holds: i) a² leaves remainder 0 or 1 when divided by 4; ii) a² leaves remainder 0, 1, or 4 when divided by 8; iii) a⁴ leaves remainder 0 or 1 when divided by 16. Solution. • It follows from the Euclidean division theorem that every integer a can be written uniquely in either the form a = 2k or a = 2k + 1. Squaring this leads to a² = 4k² or a² = 4(k² + k) + 1, which is what we wanted to prove. • Making use of the above result, we immediately obtain the statement for the (even) integers of the form a = 2k. Back then, we arrived at a² = 4k(k + 1) + 1 for odd integers a; we get the proposition easily if we realize that k(k + 1) is surely even. • Again, we utilize the result of the previous parts, i.e., a² = 4ℓ or a² = 8ℓ + 1.
Squaring these equalities once again, we get a⁴ = (a²)² = 16ℓ² for a even, and a⁴ = (a²)² = (8ℓ + 1)² = 64ℓ² + 16ℓ + 1 = 16(4ℓ² + ℓ) + 1 for a odd. □ 11.A.3. Prove that if integers a, b ∈ ℤ leave remainder 1 when divided by an m ∈ ℕ, then so does their product ab. Solution. By the Euclidean division theorem, there are s, t ∈ ℤ such that a = sm + 1, b = tm + 1. Multiplying these equalities leads to the expression ab = (sm + 1)(tm + 1) = (stm + s + t)m + 1, where stm + s + t is the quotient, so the remainder of ab when divided by m is equal to 1. □ It follows from the Euclidean division theorem that the greatest common divisor of any pair of integers a, b exists, is unique, and can be computed efficiently by the Euclidean algorithm. At the same time, the coefficients in Bezout's identity can be determined this way (such integers k, l that ka + lb = (a, b)). It can also be easily proved straight from One of the most important properties of the integers, of which we will often take advantage, is the unique Euclidean division (division with remainder). Theorem. For any integers a ∈ ℤ, m ∈ ℕ, there exists a unique pair of integers q ∈ ℤ, r ∈ {0, 1, ..., m − 1} satisfying a = qm + r. Proof. First, we prove the existence of the integers q, r. Fix a natural number m and prove the statement for any a ∈ ℤ. First, assume that a is non-negative and prove the existence of the integers q, r by induction on a: If 0 ≤ a < m, we can choose q = 0, r = a, and the equality a = qm + r holds trivially. Now, suppose that a ≥ m and that the existence of the integers q, r has been proved for all a' ∈ {0, 1, 2, ..., a − 1}. In particular, for a' = a − m ≥ 0, there are q', r' such that a' = q'm + r' and r' ∈ {0, 1, ..., m − 1}. Therefore, if we select q = q' + 1, r = r', we obtain a = a' + m = (q' + 1)m + r' = qm + r, which is what we wanted to prove.
Now, if a is negative, then we have proved that for the positive integer −a, there are q' ∈ ℤ, r' ∈ {0, 1, ..., m − 1} such that −a = q'm + r'. If r' = 0, we set r = 0, q = −q'; otherwise (i.e., r' > 0), we put r = m − r', q = −q' − 1. In either case, we get a = q · m + r. Therefore, the integers q, r with the wanted properties exist for every a ∈ ℤ, m ∈ ℕ. Now, we prove the uniqueness. Suppose that there are integers q₁, q₂ ∈ ℤ and r₁, r₂ ∈ {0, 1, ..., m − 1} which satisfy a = q₁m + r₁ = q₂m + r₂. Simple rearrangement yields r₁ − r₂ = (q₂ − q₁)m, so m | r₁ − r₂. However, we have 0 ≤ r₁ < m and 0 ≤ r₂ < m, whence it follows that −m < r₁ − r₂ < m. Therefore, r₁ − r₂ = 0, and hence also q₁ = q₂. □ Greatest common divisor. The common divisor d ≥ 0 of a and b which is divisible by every common divisor of the integers a, b is called the greatest common divisor of a and b, and it is denoted by (a, b) (or gcd(a, b) for the sake of clarity). The concept of the least common multiple is defined dually and denoted by [a, b] (or lcm(a, b)). 778 CHAPTER 11. ELEMENTARY NUMBER THEORY the properties of divisibility that integer linear combinations of integers a, b are exactly the multiples of their greatest common divisor. 11.A.4. Find the greatest common divisor of the integers a = 10175, b = 2277 and determine the corresponding coefficients in Bezout's identity. Solution. We will invoke the Euclidean algorithm:
10175 = 4 · 2277 + 1067,
2277 = 2 · 1067 + 143,
1067 = 7 · 143 + 66,
143 = 2 · 66 + 11,
66 = 6 · 11 + 0.
Therefore, 11 is the greatest common divisor. We will express this integer from the particular equalities, resulting in a linear combination of the integers a, b:
11 = 143 − 2 · 66
= 143 − 2 · (1067 − 7 · 143) = −2 · 1067 + 15 · 143
= −2 · 1067 + 15 · (2277 − 2 · 1067) = 15 · 2277 − 32 · 1067
= 15 · 2277 − 32 · (10175 − 4 · 2277) = −32 · 10175 + 143 · 2277.
The wanted expression in the form of Bezout's identity is thus 11 = (−32) · 10175 + 143 · 2277.
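The back-substitution carried out by hand in 11.A.4 is exactly what the extended Euclidean algorithm automates. A short sketch:

```python
def extended_gcd(a, b):
    """Return (g, k, l) with g = gcd(a, b) = k*a + l*b (Bezout's identity)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    # g = x*b + y*(a % b) = x*b + y*(a - (a//b)*b) = y*a + (x - (a//b)*y)*b
    return g, y, x - (a // b) * y

g, k, l = extended_gcd(10175, 2277)
print(g, k, l)   # 11 -32 143, as computed by hand above
assert k * 10175 + l * 2277 == g == 11
```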
□ The computation of the greatest common divisor using the Euclidean algorithm is quite fast even for relatively large integers. In our example, we will try this out with integers A, B, each of which is the product of two 101-digit primes. Let us notice that the computation of the greatest common divisor of even such huge integers takes an immeasurably small amount of time. A noticeable amount of time is taken by the computation of the greatest common divisor in the second example, where the input consists of two integers having more than a million digits. An example in the system SAGE:
sage: p = next_prime(5*10^100)
sage: q = next_prime(3*10^100)
sage: r = next_prime(10^100)
sage: A = p*q; B = q*r;
sage: time G = gcd(A, B); print G
It follows straight from the definition that for any a, b ∈ ℤ, we have (a, b) = (b, a), [a, b] = [b, a], (a, 1) = 1, [a, 1] = |a|, (a, 0) = |a|, [a, 0] = 0. So far, we have not shown that for every pair of integers a, b, their greatest common divisor and least common multiple exist. However, if we assume they exist, then they are unique, because every pair of non-negative integers k, l satisfies (directly from the definition) that k | l and l | k imply k = l. However, in the general case of divisibility in integral domains, the situation is more complicated - see 12.2.9. Even in the case of the so-called Euclidean domains,² which guarantee the existence of greatest common divisors, the result is determined uniquely only up to multiplication by a unit (an element having a multiplicative inverse) - in the case of the integers, the result would be determined uniquely up to sign; the uniqueness is thus guaranteed by the condition that the greatest common divisor be non-negative. Theorem (Euclidean algorithm). Let a₁, a₂ be positive integers. For every n ≥ 3 such that aₙ₋₁ ≠ 0, let aₙ denote the remainder of the division of aₙ₋₂ by aₙ₋₁. Then, after a finite number of steps, we arrive at a_k = 0, and it holds that a_{k−1} = (a₁, a₂). Proof.
By the Euclidean division, we have a₂ > a₃ > a₄ > ···. Since these are non-negative integers, this decreasing sequence cannot be infinite, so we get a_k = 0 after a finite number of steps, with a_{k−1} ≠ 0. From the definition of the integers aₙ, it follows that there are integers q₁, q₂, ..., q_{k−2} such that
a₁ = q₁ · a₂ + a₃,
a₂ = q₂ · a₃ + a₄,
...
a_{k−3} = q_{k−3} · a_{k−2} + a_{k−1},
a_{k−2} = q_{k−2} · a_{k−1}.
It follows from the last equality that a_{k−1} | a_{k−2}. Further, a_{k−1} | a_{k−3}, ..., a_{k−1} | a₂, a_{k−1} | a₁. Therefore, a_{k−1} is a common divisor of the integers a₁, a₂. On the other hand, any common divisor of the given integers a₁, a₂ divides the integer a₃ = a₁ − q₁a₂ as well; hence it also divides a₄ = a₂ − q₂a₃, a₅, ..., and especially a_{k−1} = a_{k−3} − q_{k−3}a_{k−2}. We have thus proved that a_{k−1} is the greatest common divisor of the integers a₁, a₂. □ It follows from the previous statement and the fact that (a, b) = (a, −b) = (−a, b) = (−a, −b) holds for any a, b ∈ ℤ that every pair of integers has a greatest common divisor. Moreover, the Euclidean algorithm provides another interesting statement, which is often used. 11.1.3. Theorem (Bezout). For every pair of integers a, b, there exist integers k, l such that (a, b) = ka + lb. ²Wikipedia, Euclidean domain, http://en.wikipedia.org/wiki/Euclidean_domain (as of July 29, 2017).
Time: CPU 0.00 s, Wall: 0.00 s
300000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000223
sage: time G = gcd(A^10000+1, B^10000+1);
Time: CPU 2.47 s, Wall: 2.48 s
11.A.5. Find the greatest common divisor of the integers 2⁴⁹ − 1 and 2³⁵ − 1, and determine the corresponding coefficients in Bezout's identity. Solution. Again, we use the Euclidean algorithm. We get:
2⁴⁹ − 1 = 2¹⁴(2³⁵ − 1) + 2¹⁴ − 1,
2³⁵ − 1 = (2²¹ + 2⁷)(2¹⁴ − 1) + 2⁷ − 1,
2¹⁴ − 1 = (2⁷ + 1)(2⁷ − 1).
The wanted greatest common divisor is thus 2⁷ − 1 = 127.
Let us notice that 7 = (49, 35); see also the following exercise 11.A.6. Reversing this procedure, we find the coefficients k, l in Bezout's identity 2^7 - 1 = k(2^49 - 1) + l(2^35 - 1):

2^7 - 1 = (2^35 - 1) - (2^21 + 2^7)(2^14 - 1)
        = (2^35 - 1) - (2^21 + 2^7)((2^49 - 1) - 2^14(2^35 - 1))
        = (2^35 + 2^21 + 1)(2^35 - 1) - (2^21 + 2^7)(2^49 - 1).

Therefore, k = -(2^21 + 2^7), l = 2^35 + 2^21 + 1. Let us bear in mind that these coefficients are never determined uniquely. □

11.A.6. Now, let us try to generalize the result of the previous exercise, i.e., prove that for any a, m, n ∈ N, a ≠ 1, it holds that (a^m - 1, a^n - 1) = a^{(m,n)} - 1.

Solution. This statement follows easily from the fact that any pair of natural numbers k, l satisfies a^k - 1 | a^l - 1 if and only if k | l. This can be proved by dividing the integer l by the integer k with remainder, i.e., we set l = kq + r, where q, r ∈ N_0, r < k, and consider that

a^{kq+r} - 1 = (a^k - 1)(a^{k(q-1)+r} + a^{k(q-2)+r} + ⋯ + a^r) + a^r - 1

is the division of the integer a^{kq+r} - 1 by the integer a^k - 1 with remainder (apparently, a^r - 1 < a^k - 1). Hence we can easily see that the remainder r is zero if and only if the remainder a^r - 1 is zero, which is what we wanted to prove. □

Proof. Surely it suffices to prove the theorem for a, b ∈ N. Let us notice that if it is possible to express integers r, s in the form r = r_1·a + r_2·b, s = s_1·a + s_2·b, where r_1, r_2, s_1, s_2 ∈ Z, then we can also express r + s = (r_1 + s_1)a + (r_2 + s_2)b in this way, as well as c·r = (c·r_1)a + (c·r_2)b for any c ∈ Z, and thus also any integer linear combination of the numbers r and s arising in the process of the Euclidean algorithm. It follows from the Euclidean algorithm (for a_1 = a, a_2 = b) that we can thus express a_3 = a_1 - q_1·a_2, a_4 = a_2 - q_2·a_3, ..., and eventually also the greatest common divisor a_{k-1} = (a_1, a_2) itself. □

Prove that an integer n > 1 is prime if and only if n is not divisible by any prime p ≤ √n.

Solution. For n composite, we have n = ab with appropriate a, b > 1. If we admitted both a, b > √n, then we would have n = ab > √n·√n = n.
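(An aside before the conclusion of the argument: this criterion is precisely the classical trial-division primality test. A minimal Python sketch; for simplicity it tries all integers up to √n, not only the primes, which cannot change the outcome.)

```python
def is_prime(n):
    """Trial-division primality test: n > 1 is prime iff it has
    no divisor d with 1 < d <= sqrt(n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:      # only candidates d <= sqrt(n) are needed
        if n % d == 0:
            return False   # a nontrivial divisor was found
        d += 1
    return True


print([p for p in range(2, 40) if is_prime(p)])  # 2, 3, 5, ..., 37
```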
Therefore, n has a divisor (and thus also a prime divisor) not greater than √n. □

The theoretical part contains Euclid's proof of the infinitude of primes and deals in detail with the distribution of primes in the set of natural numbers (in some cases, however, we were forced to leave the mentioned theorems unproved). Now, we give several exercises on this topic.

11.A.10. For any natural number n ≥ 3, there is at least one prime between the integers n and n!.

Solution. Let p denote an arbitrary prime dividing the integer n! - 1 (by the Fundamental theorem of arithmetic (11.2.2), there is such a prime since n! - 1 > 1). If we had p ≤ n, then p would divide n! as well, and hence also the difference n! - (n! - 1) = 1, which is impossible. Therefore, n < p < n!. □

Every natural number n ≥ 2 has at least two positive divisors: 1 and itself. If there are no other divisors, it is called a prime (number). Otherwise (i.e., if there exist other divisors), we talk about a composite (number). In the subsequent paragraphs, we will usually denote primes by the letter p.

The first few primes are 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, ... (in particular, the number 1 is considered to be neither prime nor composite, as it is a unit in the ring of the integers). As we will prove shortly, there are infinitely many primes. However, we have rather limited computational resources when it comes to determining whether a given number is prime: the number 2^82589933 - 1, which is the greatest known prime as of 2018, has only 24 862 048 digits, so its decimal representation would fit into many a prehistoric data storage device. Printing it as a book would, however, (assuming 60 rows on a page and 80 digits in a row) take 5180 pages.

Now, let us introduce a theorem which gives a necessary and sufficient condition for being prime and is thus a fundamental ingredient for the proof of the unique factorization theorem.

11.2.1. Theorem (Euclid's on primes). An integer p ≥ 2 is a prime if and only if the following holds: for every pair of integers a, b, p | ab implies that either p | a or p | b (or both).

Proof. '⇒' Suppose that p is a prime and p | ab, where a, b ∈ Z. Since (p, a) is a positive divisor of p, we have either (p, a) = p or (p, a) = 1. In the former case, we get p | a; in the latter, p | b by part (2) of the previous lemma.

'⇐' If p is not a prime, it has a positive divisor a distinct from both 1 and p. Then b = p/a ∈ N also satisfies 1 < b < p, and we have p = ab, while p divides neither a nor b; hence the condition fails. □

Integers satisfying the condition of the theorem (p | ab implies p | a or p | b) are always irreducible, but the contrary is not generally true. Let us mention at least one example of an ambiguous factorization: in Z[√-5], we have

6 = 2·3 = (1 + √-5)·(1 - √-5).

However, it needs a longer discussion to verify that all of the mentioned factors are really irreducible in Z[√-5].

Proof of the Fundamental theorem of arithmetic. First, we prove by complete induction on n that every natural number n can be expressed as a product of primes. We have already discussed the validity of this statement for n = 1. Now, let us suppose that n ≥ 2 and that we have already proved that all natural numbers less than n can be factored to primes. If n is a prime, the statement is clearly true. If n is not a prime, then it has a divisor d, 1 < d < n. Denoting e = n/d, we also have 1 < e < n. From the induction hypothesis, we get that both d and e can be expressed as products of primes, so their product d·e = n can also be expressed in this way.

To prove uniqueness, let us have an equality of products n = p_1·p_2 ⋯ p_s = q_1·q_2 ⋯ q_t of primes and proceed by induction on s. If s = 1, then n = p_1 is a prime; if we had t > 1, the integer p_1 would have a divisor q_1 such that 1 < q_1 < p_1 (since q_2·q_3 ⋯ q_t > 1), which is impossible. Therefore, we must have t = 1 and p_1 = q_1. Now, let us suppose that s ≥ 2 and that the proposition holds for s - 1. It follows from the equality p_1·p_2 ⋯ p_s = q_1·q_2 ⋯ q_t that p_s divides the product q_1 ⋯ q_t, which is, by Euclid's theorem, possible only if p_s divides some q_j (and after relabeling, we may assume j = t).
Since q_t is a prime, it follows that p_s = q_t. Dividing both sides of the original equality by this integer, we obtain p_1·p_2 ⋯ p_{s-1} = q_1·q_2 ⋯ q_{t-1}, and from the induction hypothesis, we get s - 1 = t - 1 and p_1 = q_1, ..., p_{s-1} = q_{s-1}. Altogether, we have s = t and p_1 = q_1, ..., p_{s-1} = q_{s-1}, p_s = q_s. This proves the uniqueness, and thus the entire theorem as well. □

11.2.3. Prime distribution. There are two facts about the distribution of prime numbers. The first is that, [they are] the most arbitrary and ornery objects studied by mathematicians: they grow like weeds among the natural numbers, seeming to obey no other law than that of chance, and nobody can predict where the next one will sprout. The second fact is even more astonishing, for it states just the opposite: that the prime numbers exhibit stunning regularity, that there are laws governing their behavior, and that they obey these laws with almost military precision.

¹M. Agrawal, N. Kayal, N. Saxena. PRIMES is in P. Annals of Mathematics 160 (2): 781-793. 2004.
²See http://www.rsasecurity.com/rsalabs/node.asp?id=2093.
The symbol Z[√-5] denotes the integers extended by a root of the equation x² = -5, which is defined similarly to the way we obtained the complex numbers by adjoining the number √-1 to the reals.

Every positive divisor of an integer p_1^{α_1} ⋯ p_k^{α_k} is of the form p_1^{β_1} ⋯ p_k^{β_k}, where β_1, ..., β_k ∈ N_0 and β_i ≤ α_i for all i ∈ {1, ..., k}.

By Bertrand's postulate, for every n > 1, there is at least one prime p satisfying n < p < 2n. Primes are distributed quite uniformly in the sense that in any "reasonable" arithmetic sequence (i.e., one whose first term and common difference are coprime), there are infinitely many of them. For instance, considering the remainders upon division by 4, there are infinitely many primes with remainder 1 as well as infinitely many primes with remainder 3 (of course, there is no prime with remainder 0 and only one prime with remainder 2).
The situation is analogous for remainders upon division by other integers, as explained by the following theorem, whose proof is very difficult.

11.2.5. Theorem (Dirichlet's on primes). If a, m are coprime natural numbers, then there are infinitely many natural numbers k such that mk + a is a prime. In other words, there are infinitely many primes among the integers 1·m + a, 2·m + a, 3·m + a, ....

We can at least mention a proof of a special case of this theorem, which is a modification of Euclid's proof of the infinitude of primes.

Proposition. There are infinitely many primes of the form 4k + 3, where k ∈ N_0.

Proof. Suppose the contrary, i.e., that there are only finitely many primes of this form, and let them be denoted by p_1 = 3, p_2 = 7, p_3 = 11, ..., p_n. Further, let us set N = 4·p_2·p_3 ⋯ p_n + 3. Factoring N, the product must contain (according to the result of exercise 11.A.3) at least one prime p of the form 4k + 3; if not, N would be a product of only primes of the form 4k + 1 and would thus leave remainder 1 upon division by 4, which is not the case. However, p cannot be any of the primes p_1, ..., p_n: the prime p_1 = 3 divides 3 but not 4·p_2 ⋯ p_n, and each p_i with i ≥ 2 divides 4·p_2 ⋯ p_n but not 3, so none of them divides N. This contradicts the assumption that p_1, ..., p_n were all the primes of the form 4k + 3. □

See Wikipedia, Proof of Bertrand's postulate, http://en.wikipedia.org/wiki/Proof_of_Bertrand's_postulate (as of July 29, 2017), or M. Aigner, G. Ziegler, Proofs from THE BOOK, Springer, 2009.

Solution. If a = 2^{q-1}·(2^q - 1), where p = 2^q - 1 is a prime, then the previous statement yields σ(a) = (2^q - 1)·(p + 1) = (2^q - 1)·2^q = 2a. Such an integer a is thus a perfect number.

For the opposite direction, consider any even perfect number a, and let us write a = 2^k·m, where m, k ∈ N and 2 ∤ m. Since the function σ is multiplicative (see 11.3.2), we have σ(a) = σ(2^k)·σ(m) = (2^{k+1} - 1)·σ(m). However, it follows from a being perfect that σ(a) = 2a = 2^{k+1}·m, whence 2^{k+1}·m = (2^{k+1} - 1)·σ(m). Since 2^{k+1} - 1 is odd, we must have 2^{k+1} - 1 | m, so we can set m = (2^{k+1} - 1)·n for an appropriate n ∈ N. Rearranging leads to 2^{k+1}·n = σ(m). Both m and n divide m (and since m/n = 2^{k+1} - 1 > 1, these integers are different), hence σ(m) ≥ m + n = 2^{k+1}·n. Since equality occurs, m and n are the only positive divisors of m; therefore n = 1 and m = 2^{k+1} - 1 is a prime.
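The characterization of even perfect numbers just derived can be cross-checked by brute force. A Python sketch with a naive divisor sum (the names sigma and is_perfect are ours):

```python
def sigma(n):
    """sigma(n) = sum of all positive divisors of n (naive)."""
    return sum(d for d in range(1, n + 1) if n % d == 0)


def is_perfect(n):
    return sigma(n) == 2 * n


# The even perfect numbers below 1000 are 6 = 2*(2^2 - 1),
# 28 = 2^2*(2^3 - 1) and 496 = 2^4*(2^5 - 1).
print([n for n in range(2, 1000) if is_perfect(n)])
```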
The following table illustrates how good the asymptotic estimate π(x) ∼ x/ln(x) is in several concrete instances:

x        π(x)    x/ln(x)    relative error
100      25      21.71      0.13
1000     168     144.76     0.13
10000    1229    1085.73    0.11
100000   9592    8685.88    0.09
500000   41538   38102.89   0.08

The density of primes among the natural numbers is also partially described by the following result by Euler.

Proposition. Let P denote the set of all primes. Then Σ_{p∈P} 1/p = ∞.

Remark. On the other hand, Σ_{n∈N} 1/n² = π²/6, which means that the primes are distributed more "densely" in N than the squares.

Proof. Let n be an arbitrary natural number and p_1, ..., p_{π(n)} all the primes less than or equal to n. Let us set λ(n) = Π_{i=1}^{π(n)} (1 - 1/p_i)^{-1}.

(The prizes in the respective categories were paid in 2000 and 2009, respectively, to the GIMPS project in both cases. Apparently, it will take a while before the other prizes are awarded.)

B. Congruences

In this paragraph, we will see in practice how basic operations with congruences can simplify our reasoning about various problems. We would be able to solve them without congruences, using only the basic properties of divisibility; however, with the help of congruences, our proofs will often be much shorter and clearer.

11.B.1. Show that, for any a, b ∈ Z, m ∈ N, the following conditions are equivalent:
i) a ≡ b (mod m),
ii) a = b + mt for an appropriate t ∈ Z,
iii) m | a - b.

Solution. (1) ⇒ (3) If a = q_1·m + r, b = q_2·m + r, then a - b = (q_1 - q_2)·m. (3) ⇒ (2) If m | a - b, then there is a t ∈ Z such that m·t = a - b, i.e., a = b + mt. (2) ⇒ (1) If a = b + mt, then, expressing b = mq + r, it follows that a = m(q + t) + r. Therefore, a and b share the same remainder r upon division by m, i.e., a ≡ b (mod m). □

11.B.2. Prove the fundamental properties of congruences stated in 11.3.1.

Solution. i) If a ≡ b (mod m) and c ≡ d (mod m), then, by the previous lemma, there are integers s, t such that a = b + ms, c = d + mt.
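(An aside: the values of π(x) in the table above can be recomputed with a sieve of Eratosthenes. A short Python sketch:)

```python
import math


def prime_pi(x):
    """pi(x) = number of primes not exceeding x (sieve of Eratosthenes)."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"                  # 0 and 1 are not prime
    for i in range(2, math.isqrt(x) + 1):
        if sieve[i]:
            # cross out all proper multiples of the prime i
            sieve[i * i :: i] = bytearray(len(sieve[i * i :: i]))
    return sum(sieve)


for x in (100, 1000, 10000):
    print(x, prime_pi(x), round(x / math.log(x), 2))
```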
However, then we have a + c = b + d + m(s + t), and, by the lemma again, a + c ≡ b + d (mod m). Adding a congruence a ≡ b (mod m) to mk ≡ 0 (mod m), which is clearly valid, leads to a + mk ≡ b (mod m).

ii) If a ≡ b (mod m) and c ≡ d (mod m), there are integers s, t such that a = b + ms, c = d + mt. Then, ac = (b + ms)(d + mt) = bd + m(bt + ds + mst), whence we get ac ≡ bd (mod m).

iii) Let a ≡ b (mod m) and n be a natural number. Since a^n - b^n = (a - b)(a^{n-1} + a^{n-2}·b + ⋯ + b^{n-1}), it follows that a^n ≡ b^n (mod m) as well.

The particular factors can be perceived as sums of geometric series, hence

λ(n) = Π_{i=1}^{π(n)} Σ_{j=0}^{∞} 1/p_i^j = Σ 1/(p_1^{α_1} ⋯ p_{π(n)}^{α_{π(n)}}),

where we sum over all π(n)-tuples of non-negative integers (α_1, ..., α_{π(n)}). Since every integer not exceeding n factors to only primes in the set {p_1, ..., p_{π(n)}}, all of the fractions 1/1, 1/2, ..., 1/n are included in this sum. Therefore, λ(n) ≥ 1 + 1/2 + ⋯ + 1/n, and since the harmonic series is divergent (see 5.H.3), we also have lim_{n→∞} λ(n) = ∞. Taking into account the expansion of the function ln(1 + x) to a power series (see 6.C.1), we further get

ln λ(n) = -Σ_{i=1}^{π(n)} ln(1 - 1/p_i) = Σ_{i=1}^{π(n)} Σ_{m=1}^{∞} (m·p_i^m)^{-1} = Σ_{i=1}^{π(n)} 1/p_i + Σ_{i=1}^{π(n)} Σ_{m=2}^{∞} (m·p_i^m)^{-1}.

Since the inner sum can be bounded from above by Σ_{m=2}^{∞} p_i^{-m} = (p_i(p_i - 1))^{-1}, the second summand stays bounded as n grows, while ln λ(n) → ∞; hence the sum Σ_{p∈P} 1/p must diverge. □

For a natural number n = p_1^{α_1} ⋯ p_k^{α_k}, we define μ(n) = 0 if α_i > 1 for some i (i.e., if n is divisible by a square), and μ(n) = (-1)^k otherwise. Further, we define μ(1) = 1 (in accordance with the convention that 1 factors to the product of zero primes).

Example. μ(4) = μ(2²) = 0, μ(6) = μ(2·3) = (-1)² = 1, μ(2) = μ(3) = -1.

We already know that 7^30 ≡ -1 (mod 25), and we can easily calculate that 7^30 ≡ (-1)^30 = 1 (mod 4). Since (4, 25) = 1, the wanted pair of digits is 49 (it leaves the desired remainders upon division by both 25 and 4). □

We now show how helpful the notion of congruence can be in solving problems similar to one already solved in 11.A.7 (where induction and the binomial theorem were used).

11.B.4. Prove that for any n ∈ N, the integer 37^{n+2} + 16^{n+1} + 23^n is divisible by 7.

Solution.
We have 37 ≡ 16 ≡ 23 ≡ 2 (mod 7), so by the basic properties of congruences, 37^{n+2} + 16^{n+1} + 23^n ≡ 2^{n+2} + 2^{n+1} + 2^n = 2^n·(4 + 2 + 1) ≡ 0 (mod 7). □

11.B.5. Prove that the integer n = (835^5 + 6)^{18} - 1 is divisible by 112.

Solution. We factor 112 = 7·16. Since (7, 16) = 1, it suffices to show that 7 | n and 16 | n. We have 835 ≡ 2 (mod 7), so n ≡ (2^5 + 6)^{18} - 1 = 38^{18} - 1 ≡ 3^{18} - 1 = 27^6 - 1 ≡ (-1)^6 - 1 = 0 (mod 7), hence 7 | n. Similarly, 835 ≡ 3 (mod 16), so n ≡ (3^5 + 6)^{18} - 1 = (3·81 + 6)^{18} - 1 ≡ (3·1 + 6)^{18} - 1 = 9^{18} - 1 = 81^9 - 1 ≡ 1^9 - 1 = 0 (mod 16), hence 16 | n. Altogether, 112 | n, which was to be proved. □

11.B.6. Prove that the following relations hold for a prime p:
i) If k ∈ {1, ..., p - 1}, then p | (p choose k).
ii) If a, b ∈ Z, then a^p + b^p ≡ (a + b)^p (mod p).

Solution. i) The binomial coefficient satisfies

(p choose k) = p(p - 1) ⋯ (p - k + 1) / k!,

and it is an integer. The prime p divides the numerator, but it cannot divide the denominator k! (all of its factors are less than p), so p | (p choose k).

Now, we prove several important properties of the Möbius function, especially the so-called Möbius inversion formula.

Lemma. For all n ∈ N∖{1}, it holds that Σ_{d|n} μ(d) = 0.

Proof. Writing n = p_1^{α_1} ⋯ p_k^{α_k}, all divisors d of n are of the form d = p_1^{β_1} ⋯ p_k^{β_k}, where 0 ≤ β_i ≤ α_i for all i ∈ {1, ..., k}, and the divisors with μ(d) ≠ 0 are exactly those with all β_i ∈ {0, 1}. Therefore,

Σ_{d|n} μ(d) = Σ_{(β_1,...,β_k)∈{0,1}^k} μ(p_1^{β_1} ⋯ p_k^{β_k}) = Σ_{j=0}^{k} (k choose j)·(-1)^j = (1 - 1)^k = 0. □

For arithmetic functions f, g, let (f ∘ g)(n) = Σ_{d|n} f(d)·g(n/d) denote their Dirichlet convolution, and let us set j(1) = 1, j(n) = 0 for all n > 1, and I(n) = 1 for all n ∈ N. Then, every arithmetic function f satisfies f ∘ j = j ∘ f = f and (f ∘ I)(n) = (I ∘ f)(n) = Σ_{d|n} f(d). Further, I ∘ μ = μ ∘ I = j, since

(I ∘ μ)(n) = Σ_{d|n} I(n/d)·μ(d) = Σ_{d|n} μ(d) = 0 for all n > 1

by the lemma after the definition of the Möbius function (the statement is clearly true for n = 1).

ii) The binomial theorem implies that (a + b)^p = a^p + (p choose 1)·a^{p-1}·b + ⋯ + (p choose p-1)·a·b^{p-1} + b^p. Thanks to the previous item, we have (p choose k) ≡ 0 (mod p) for any k ∈ {1, ..., p - 1}, whence the statement follows easily. □

11.B.7. Prove that for any natural numbers m, n and any integers a, b such that a ≡ b (mod m^n), it is true that a^m ≡ b^m (mod m^{n+1}).

Solution.
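(A quick machine check of 11.B.4 and 11.B.5: with Python's three-argument pow, all reductions happen modulo 7, 16 or 112 at every step, so the huge powers are never written out.)

```python
# 11.B.4: 7 divides 37^(n+2) + 16^(n+1) + 23^n for every natural n
assert all((pow(37, n + 2, 7) + pow(16, n + 1, 7) + pow(23, n, 7)) % 7 == 0
           for n in range(1, 200))

# 11.B.5: (835^5 + 6)^18 leaves remainder 1 modulo 112 = 7 * 16,
# i.e. 112 divides n = (835^5 + 6)^18 - 1
assert pow(835**5 + 6, 18, 112) == 1
assert pow(835**5 + 6, 18, 7) == 1 and pow(835**5 + 6, 18, 16) == 1
print("11.B.4 and 11.B.5 verified")
```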
Since clearly m | m^n, the congruence a ≡ b (mod m) holds as well, invoking property (5) of 11.3.1. Therefore, considering the algebraic identity

a^m - b^m = (a - b)(a^{m-1} + a^{m-2}·b + ⋯ + a·b^{m-2} + b^{m-1}),

all the summands in the second pair of parentheses are congruent to a^{m-1} modulo m, so a^{m-1} + a^{m-2}·b + ⋯ + b^{m-1} ≡ m·a^{m-1} ≡ 0 (mod m). Since m^n divides a - b, and the sum a^{m-1} + a^{m-2}·b + ⋯ + b^{m-1} is divisible by m, we get that m^{n+1} must divide their product, which means that a^m ≡ b^m (mod m^{n+1}). □

11.B.8. Using the result of the previous exercise (see also 11.A.2), prove that:
i) integers a which are not divisible by 3 satisfy a³ ≡ ±1 (mod 9),
ii) odd integers a satisfy a⁴ ≡ 1 (mod 16).

Solution. i) Cubing the congruence a ≡ ±1 (mod 3) (and, again, raising the exponent of the modulus), we get a³ ≡ ±1 (mod 3²).
ii) This statement was proved already in the third part of exercise 11.A.2. Now, we present another proof. Thanks to part (ii) of the mentioned exercise, we know that every odd integer a satisfies a² ≡ 1 (mod 2³). Squaring this (and recalling the above exercise) leads to a⁴ ≡ 1² (mod 2⁴). □

11.B.9. Divisibility rules. We can surely recall the basic rules of divisibility (at least by the numbers 2, 3, 4, 5, 6, ...) stated in terms of the decimal representation of a given integer. However, how can these rules be proved, and can they be extended to other divisors as well?

Theorem (Möbius inversion formula). Let an arithmetic function F be defined in terms of an arithmetic function f by F(n) = Σ_{d|n} f(d). Then, the function f can be expressed as

f(n) = Σ_{d|n} μ(d)·F(n/d).

Proof. The relation F(n) = Σ_{d|n} f(d) can be rewritten as F = f ∘ I. Therefore,

F ∘ μ = (f ∘ I) ∘ μ = f ∘ (I ∘ μ) = f ∘ j = f,

which is the statement of the theorem. □

Definition. A multiplicative function on the natural numbers is an arithmetic function which, for all pairs of coprime natural numbers a, b, satisfies f(a·b) = f(a)·f(b).

Example.
Multiplicative functions include, for instance, σ(n), τ(n), μ(n) (this can be verified easily from the definition) or, as we will prove shortly, the so-called (Euler) totient function φ(n); for instance, φ(72) = 72·(1 - 1/2)·(1 - 1/3) = 24.

Corollary. Let a, b ∈ N, (a, b) = 1. Then φ(a·b) = φ(a)·φ(b).

Now, we will demonstrate the use of the Möbius inversion formula on a more complex example from the theory of finite fields. Let us consider a p-element field F_p (i.e., the ring of residue classes modulo a prime p) and examine the number N_d of monic irreducible polynomials of a given degree d over this field. Let S_d(x) denote the product of all such polynomials. Now, we borrow a (not very hard) theorem from the theory of finite fields which states that for all n ∈ N, we have

x^{p^n} - x = Π_{d|n} S_d(x).

Comparing the degrees of the polynomials on both sides yields

p^n = Σ_{d|n} d·N_d.

Every n ≥ 3 is either divisible by an odd prime p (and then φ(n) is divisible by p - 1, which is an even integer), or n is a (higher-than-first) power of two (and then φ(2^α) = 2^{α-1} is even as well). Altogether, we have found out that φ(n) is odd only for n = 1, 2.

ii) The integer 2n + 1 is odd, so (2, 2n + 1) = 1, and hence φ(4n + 2) = φ(2·(2n + 1)) = φ(2)·φ(2n + 1) = φ(2n + 1). □

11.B.13. Find all natural numbers m for which:
i) φ(m) = 30,
ii) φ(m) = 34,
iii) φ(m) = 20.

Solution. i) The values φ(3) = 2, φ(3²) = φ(7) = 6, φ(11) = 10 all divide 30 with an odd quotient greater than 1. Therefore, if we had, for instance, m = 7·m_1, where 7 ∤ m_1, then we would also have φ(m_1) = 5, which is impossible, as we know from the previous exercise. Writing m = 2^α·3^β·7^γ·11^δ·31^ε, we thus get β = γ = δ = 0, i.e., m = 2^α·31^ε, whence we can easily obtain the solution m ∈ {31, 62}.

In particular, we can see that for any n ∈ N, it holds that

N_n = (1/n)·Σ_{d|n} μ(d)·p^{n/d} = (1/n)·(p^n - ⋯ + μ(n)·p) ≠ 0,

since the expression in the parentheses is a sum of distinct powers of p multiplied by coefficients ±1, so it cannot be equal to 0.
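The counting formula for N_n is a few lines of Python once μ is implemented. The following self-contained sketch (helper names are ours) also sanity-checks the Möbius inversion formula it rests on:

```python
def mobius(n):
    """Moebius function, via trial-division factorization."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0          # square factor: mu(n) = 0
            result = -result      # one more distinct prime factor
        d += 1
    return -result if n > 1 else result


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def count_irreducible(p, n):
    """N_n = (1/n) * sum_{d | n} mu(d) * p^(n/d), the number of monic
    irreducible polynomials of degree n over F_p."""
    return sum(mobius(d) * p ** (n // d) for d in divisors(n)) // n


# Moebius inversion check: recover f from F(n) = sum_{d | n} f(d)
f = lambda n: n * n + 3
F = lambda n: sum(f(d) for d in divisors(n))
assert all(sum(mobius(d) * F(n // d) for d in divisors(n)) == f(n)
           for n in range(1, 200))

print(count_irreducible(2, 5), count_irreducible(3, 4))  # 6 18
```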
Therefore, there exist irreducible polynomials over F_p of an arbitrary degree n, so there are finite fields F_{p^n} (having p^n elements) for any prime p and natural number n (in the theory of field extensions, such a field is constructed as the quotient ring F_p[x]/(f) of the ring of polynomials over F_p modulo the ideal generated by an irreducible polynomial f ∈ F_p[x] of degree n, whose existence has just been proved).

11.3.3. Example. By the formula we have proved, the number of (monic) irreducible polynomials over F_2 of degree 5 is equal to

N_5 = (1/5)·(μ(1)·2^5 + μ(5)·2) = (1/5)·(32 - 2) = 6.

The number of monic irreducible polynomials over F_3 of degree four is then

N_4 = (1/4)·Σ_{d|4} μ(d)·3^{4/d} = (1/4)·(μ(1)·3^4 + μ(2)·3^2 + μ(4)·3) = (1/4)·(81 - 9) = 18.

11.3.4. Fermat's little theorem, Euler's theorem. These theorems belong to the most important results of elementary number theory, and they will often be applied in further theoretical as well as practical problems.

Theorem (Fermat's little). Let a be an integer and p a prime, p ∤ a. Then, a^{p-1} ≡ 1 (mod p).

Proof. The statement will follow as a simple consequence of Euler's theorem (and together with this one, it is a consequence of the more general Lagrange's theorem 12.3.10). However, it can be proved directly (by mathematical induction or by a combinatorial argument, as mentioned in exercise 11.B.15). □

Sometimes, Fermat's little theorem is presented in the following form, which is apparently equivalent to the original statement.

Corollary. Let a be an integer and p a prime. Then, a^p ≡ a (mod p).

Before formulating and proving Euler's theorem, we introduce a few useful concepts.

ii) Similarly to the above, only the primes p ∈ {2, 3} can divide m, and the prime 3 can divide m only in the first power. However, since 34/2 = 17 is odd, the prime 3 cannot divide m at all. The remaining possibility, m = 2^α, leads to 34 = 2^{α-1}, which is also impossible. Therefore, there is no such number m.
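(The searches of 11.B.13 can be confirmed by brute force. Since φ(m) ≥ √(m/2) holds for every m, each solution of φ(m) = k satisfies m ≤ 2k², which bounds the scan. A Python sketch with a naive totient:)

```python
import math


def phi(m):
    """Euler's totient function, by the naive definition."""
    return sum(1 for k in range(1, m + 1) if math.gcd(k, m) == 1)


for target in (30, 34, 20):
    bound = 2 * target * target   # phi(m) >= sqrt(m/2), so m <= 2*phi(m)^2
    print(target, [m for m in range(1, bound + 1) if phi(m) == target])
```

This prints m ∈ {31, 62} for the value 30, no solutions at all for 34, and m ∈ {25, 33, 44, 50, 66} for 20, in agreement with the computations in the text.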
iii) Now, every prime p dividing m must satisfy p - 1 | 20, so p - 1 ∈ {1, 2, 4, 5, 10, 20}, which is satisfied by the primes p ∈ {2, 3, 5, 11}, and only 2 and 5 of those can divide m in a higher power. We thus have m = 2^α·3^β·5^γ·11^δ, where α ∈ {0, 1, 2, 3}, γ ∈ {0, 1, 2}, β, δ ∈ {0, 1}. First, consider δ = 1. Then, φ(2^α·3^β·5^γ) = 2, whence we easily get that γ = 0 and (α, β) ∈ {(2, 0), (1, 1), (0, 1)}, which gives three solutions: m ∈ {44, 66, 33}. Further, let us have δ = 0. If γ = 2, then φ(2^α·3^β) = 1, whence (α, β) ∈ {(1, 0), (0, 0)}. We thus obtain two more solutions: m ∈ {50, 25}. If γ = 1, then we get φ(2^α·3^β) = 5, similarly to the above item. This is an odd integer greater than 1, so we get no solutions in this case. This is also the case for γ = 0, since the equation φ(2^α·3^β) = 20 has no solution, either. Altogether, there are five satisfactory values m ∈ {25, 33, 44, 50, 66}.

iv) This problem is of a different kind than the previous ones, so we must approach it otherwise.

11.B.14. Find all two-digit numbers n for which 9 | φ(n). ○

Residue systems

A complete residue system modulo m is an arbitrary m-tuple of integers which are pairwise incongruent modulo m (the most commonly used m-tuple is 0, 1, ..., m - 1 or, for odd m, its "symmetric" variation -(m-1)/2, ..., -1, 0, 1, ..., (m-1)/2). A reduced residue system modulo m is an arbitrary φ(m)-tuple of integers coprime to m which are pairwise incongruent modulo m.

11.B.15. Prove that for any integer a and any prime p which does not divide a, it holds that a^{p-1} ≡ 1 (mod p).

Solution. First, we prove (by induction on a) the apparently equivalent statement that a^p ≡ a (mod p) holds for any a ∈ Z and prime p. For a = 1, there is nothing to prove. Further, let us assume that the proposition holds for a and prove its validity for a + 1. It follows from the induction hypothesis and exercise 11.B.6 that (a + 1)^p ≡ a^p + 1^p ≡ a + 1 (mod p), which is what we were to prove. The statement holds trivially for a = 0 as well as in the case a < 0, p = 2.
The validity for a < 0 and p odd can be obtained easily from the above: since -a is a positive integer, we get -a^p = (-a)^p ≡ -a (mod p), whence a^p ≡ a (mod p).

The combinatorial proof is a somewhat "cunning" one: similarly to problems using Burnside's lemma (see exercise 12.G.1), we are to determine how many necklaces can be created by stringing a given number of beads, of which there is a given number of types. Having a types of beads, there are clearly a^p strings of length p, a of which consist of a single bead type. From now on, we will be interested only in the other ones, of which there are thus a^p - a. Apparently, each necklace is transformed into itself by rotating by p beads. In general, a necklace can be transformed into itself by rotating by another number of beads, but this number can never be coprime to p (for instance, considering p = 8 and the necklace ABABABAB, rotations by 2, 4, or 6 beads leave it unchanged). However, if p is a prime, all rotations by fewer than p beads lead to different necklaces. Therefore, if we do not distinguish necklaces which differ in rotation only (i.e., in the position of the "knot"), there are exactly

(a^p - a)/p

of them, which especially means that p | a^p - a. As an example, let us consider the case a = 2, p = 5, i.e., necklaces of length 5, consisting of 2 bead types (A, B). There are 2^5 = 32 strings in total, 2 of which consist of a single bead type (AAAAA, BBBBB). Leaving those aside, the remaining 30 strings split into (32 - 2)/5 = 6 necklaces.

The order of an integer modulo m is surely not greater than φ(m). As we will see later, the integers whose order is exactly φ(m) are of special importance.

Proposition. Let m ∈ N, a ∈ Z, (a, m) = 1, and let r be the order of a modulo m. Then, for any t, s ∈ N_0 with t ≥ s, we have a^t ≡ a^s (mod m) if and only if t ≡ s (mod r).

Proof. Dividing the integer t - s by r with remainder, we get t - s = q·r + z, where q, z ∈ N_0, 0 ≤ z < r.

'⇐' Since t ≡ s (mod r), we have z = 0, hence a^{t-s} = a^{qr} = (a^r)^q ≡ 1 (mod m). Multiplying both sides of the congruence by the integer a^s leads to the wanted statement.

'⇒' It follows from a^t ≡ a^s (mod m) that a^s·a^{qr+z} ≡ a^s (mod m). Since a^r ≡ 1 (mod m), we also have a^{qr+z} ≡ a^z (mod m).
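(Back to the necklace argument for a moment: the claimed counts for a = 2, p = 5 can be verified by brute force, grouping all strings into rotation classes. A Python sketch:)

```python
from itertools import product


def necklace_classes(alphabet, p):
    """All strings of length p over the alphabet, grouped by rotation;
    each class is represented by its lexicographically least rotation."""
    return {min("".join(s[i:] + s[:i]) for i in range(p))
            for s in product(alphabet, repeat=p)}


classes = necklace_classes("AB", 5)
mixed = [c for c in classes if len(set(c)) > 1]
print(len(classes), len(mixed))  # 8 classes: 2 monochromatic + (2^5 - 2)/5 = 6
```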
Altogether, after dividing both sides of the first congruence by the integer a^s (which is coprime to the modulus), we get a^z ≡ 1 (mod m). Since z < r, it follows from the definition of the order that z = 0, hence r | t - s. □

The above theorem and Euler's theorem apparently lead to the following corollary (whose second part is only a reformulation of Lagrange's theorem 12.3.10 for our situation):

Corollary. Let m ∈ N, a ∈ Z, (a, m) = 1, and r be the order of a modulo m.
(1) For any n ∈ N ∪ {0}, it holds that a^n ≡ 1 (mod m) ⟺ r | n.
(2) r | φ(m). □

The last statement of this series connects the orders of two integers to the order of their product:

Lemma. Let m ∈ N, a, b ∈ Z, (a, m) = (b, m) = 1. If a has order r and b has order s modulo m, where (r, s) = 1, then the integer a·b has order r·s modulo m.

Proof. Let δ denote the order of a·b. Then, (ab)^δ ≡ 1 (mod m). Raising both sides of this congruence to the r-th power leads to a^{rδ}·b^{rδ} ≡ 1 (mod m). Since r is the order of a, we have a^r ≡ 1 (mod m), i.e., b^{rδ} ≡ 1 (mod m), and so s | rδ. From r being coprime to s, we get s | δ. Analogously, we can get r | δ, so (again utilizing that r, s are coprime) r·s | δ. On the other hand, we clearly have (ab)^{rs} ≡ 1 (mod m), hence δ | rs. Altogether, δ = rs. □

11.3.7. Primitive roots. Among the integers coprime to a modulus m (i.e., the elements of a reduced residue system modulo m), the most important ones are those whose order is equal to φ(m); such integers are called primitive roots modulo m. If g is a primitive root modulo m, then every integer a coprime to m satisfies a ≡ g^{x_a} (mod m) for exactly one exponent 0 ≤ x_a < φ(m); this x_a is called the discrete logarithm or index of the integer a (with respect to a given modulus m and a fixed primitive root g), and a ↦ x_a is a bijection between the sets {a ∈ Z; (a, m) = 1, 0 ≤ a < m} and {x ∈ Z; 0 ≤ x < φ(m)}.

Theorem. A modulus m > 1 has primitive roots if and only if at least one of the following conditions holds:
• m = 2 or m = 4,
• m is a power of an odd prime,
• m is twice a power of an odd prime.

The proof of this theorem will be done in several steps.
We can easily see that 1 is a primitive root modulo 2 and 3 is a primitive root modulo 4. Further, we will show that primitive roots exist modulo any odd prime (in algebraic words, this is another proof of the fact that the group (Z_m^*, ·) of invertible residue classes modulo a prime m is cyclic; see also 12.3.8).

Proposition. Let p be an odd prime. Then there are primitive roots modulo p.

Proof. Let r_1, r_2, ..., r_{p-1} be the orders of the integers 1, 2, ..., p - 1 modulo p. Let S = [r_1, r_2, ..., r_{p-1}] be the least common multiple of these orders. We will show that there is an integer of order S among 1, 2, ..., p - 1 and that S = p - 1.

Let S = q_1^{α_1} ⋯ q_k^{α_k} be the factorization of S to primes. For every s ∈ {1, ..., k}, there is a c ∈ {1, ..., p - 1} such that q_s^{α_s} | r_c (otherwise, there would be a common multiple of the integers r_1, r_2, ..., r_{p-1} less than S). Therefore, there exists an integer b such that r_c = b·q_s^{α_s}. Since c has order r_c, the order of the integer g_s := c^b is equal to q_s^{α_s} (by theorem 11.3.6 on orders of powers). Reasoning analogously for every s ∈ {1, ..., k}, we get integers g_1, ..., g_k, and we can set g := g_1 ⋯ g_k. From the properties of the order of a product, we get that the order of g is equal to the product of the orders of the integers g_1, ..., g_k, i.e., to q_1^{α_1} ⋯ q_k^{α_k} = S.

Now, we prove that S = p - 1. Since the orders of all the integers 1, 2, ..., p - 1 divide S, we get the congruence x^S ≡ 1 (mod p) for any x ∈ {1, 2, ..., p - 1}.

It is thus sufficient to determine the remainder of the exponent modulo 5. We have 14^13 ≡ (-1)^13 = -1 ≡ 4 (mod 5), so the wanted remainder is 4^4 = 2^8 = 256 ≡ 3 (mod 11). Alternatively, we could have finished the calculation as follows: 4^4 ≡ 4^{-1} ≡ 3 (mod 11).
By theorem 11.4.8, there are at most S solutions to a congruence of degree S modulo a prime p (in algebraic words, we are actually looking for roots of a polynomial over a field, and there cannot be more of them than the degree of the polynomial, as we will see in part 12.2.4). On the other hand, we have already shown that this congruence has p - 1 solutions, so necessarily S ≥ p - 1. Still, S is (being the order of g) a divisor of p - 1, whence we finally get the wanted equality S = p - 1. □

Now, we show that there are primitive roots modulo powers of odd primes. First, we prove two helping lemmas.

Lemma. Let p be an odd prime, ℓ ≥ 2 arbitrary. Then, it holds for any a ∈ Z that

(1 + ap)^{p^{ℓ-2}} ≡ 1 + a·p^{ℓ-1} (mod p^ℓ).

Proof. This will follow easily from the binomial theorem using mathematical induction on ℓ.
I. The statement is clearly true for ℓ = 2.
II. Let the statement be true for ℓ, and let us prove it for ℓ + 1.

11.B.25. Determine the last two digits of the decimal expansion of the number 14^{14^{14}}.

Solution. We are interested in the remainder of the number a = 14^{14^{14}} upon division by 100. However, since (14, 100) > 1, we cannot consider the order of 14 modulo 100. Instead, we can factor the modulus to coprime integers: 100 = 4·25. Apparently, 4 | a, so it remains to find the remainder of a modulo 25. By Euler's theorem, we have 14^{φ(25)} = 14^{20} ≡ 1 (mod 25), so we are interested in the remainder of 14^{14} upon division by 20 = 4·5. Again, we clearly have 4 | 14^{14}, and further 14^{14} ≡ (-1)^{14} = 1 (mod 5), so 14^{14} ≡ 16 (mod 20). Altogether,

14^{14^{14}} ≡ 14^{16} = 2^{16}·7^{16} (mod 25).

We can simplify the computation to come a lot if we realize that 7² ≡ -1 (mod 25) and 2^5 ≡ 7 (mod 25). Then,

14^{14^{14}} ≡ 2^{16}·7^{16} = (2^5)^3·2·7^{16} ≡ 7^3·2·7^{16} = 2·7^{19} = 2·(7²)^9·7 ≡ 2·(-1)^9·7 = -14 ≡ 11 (mod 25).

We are thus looking for a non-negative integer which is less than 100, is a multiple of 4, and leaves remainder 11 when divided by 25; the only such number is clearly 36.
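The whole of 11.B.25 can be double-checked in one line; Python's three-argument pow handles the gigantic exponent 14^14 (about 1.1·10^16) without difficulty, and the intermediate claims are just as easy to test:

```python
# final answer of 11.B.25
assert pow(14, 14**14, 100) == 36

# intermediate steps from the solution above
assert pow(14, 14, 20) == 16      # 14^14 = 16 (mod 20), the reduced exponent
assert pow(14, 16, 25) == 11      # hence 14^(14^14) = 11 (mod 25)
assert 36 % 4 == 0 and 36 % 25 == 11

print("last two digits:", pow(14, 14**14, 100))  # 36
```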
□

11.B.26. Determine the last three digits of the number 12^{10^{11}}. ○

11.B.27. Find all natural numbers n for which the integer 5^n - 4^n - 3^n is divisible by eleven.

Solution. The orders of all of the numbers 3, 4, and 5 modulo 11 are equal to five, so it suffices to examine n ∈ {0, 1, 2, 3, 4}. It can be seen from the following table

n mod 5:     0  1  2  3  4
5^n mod 11:  1  5  3  4  9
4^n mod 11:  1  4  5  9  3
3^n mod 11:  1  3  9  5  4

that only the case n ≡ 2 (mod 5) yields 3 - 5 - 9 ≡ 0 (mod 11). The problem is thus satisfied by exactly those natural numbers n which satisfy n ≡ 2 (mod 5). □

Invoking exercise 11.B.7 and raising the statement for ℓ to the p-th power, we obtain

(1 + ap)^{p^{ℓ-1}} ≡ (1 + a·p^{ℓ-1})^p (mod p^{ℓ+1}).

It follows from the binomial theorem that

(1 + a·p^{ℓ-1})^p = 1 + p·a·p^{ℓ-1} + Σ_{k=2}^{p} (p choose k)·a^k·p^{k(ℓ-1)},

and since we have p | (p choose k) for 1 < k < p (by exercise 11.B.6), it suffices to show that p^{ℓ+1} | p^{1+(ℓ-1)k}, which is equivalent to 1 ≤ (k - 1)(ℓ - 1). Thanks to the assumption ℓ ≥ 2, we get that p^{ℓ+1} divides the summand with k = p as well. □

Lemma. Let p be an odd prime, ℓ ≥ 2 arbitrary. Then, it holds for any integer a satisfying p ∤ a that the order of 1 + ap modulo p^ℓ equals p^{ℓ-1}.

Proof. By the previous lemma, we have (1 + ap)^{p^{ℓ-1}} ≡ 1 + a·p^ℓ (mod p^{ℓ+1}), and considering this congruence modulo p^ℓ, we get (1 + ap)^{p^{ℓ-1}} ≡ 1 (mod p^ℓ). At the same time, it follows directly from the previous lemma and p not being a divisor of a that (1 + ap)^{p^{ℓ-2}} ≢ 1 (mod p^ℓ), which gives the wanted proposition. □

Proposition. Let p be an odd prime. Then, for every ℓ ∈ N, there is a primitive root modulo p^ℓ.

Proof. Let g be a primitive root modulo p. We will show that if g^{p-1} ≢ 1 (mod p²), then g is a primitive root even modulo p^ℓ for any ℓ ∈ N. (If we had g^{p-1} ≡ 1 (mod p²), then (g + p)^{p-1} ≡ g^{p-1} + (p - 1)·g^{p-2}·p ≢ 1 (mod p²), so we could choose g + p for the original primitive root instead of the congruent integer g.)

Let g satisfy g^{p-1} ≢ 1 (mod p²). Then, there is an a ∈ Z, p ∤ a, such that g^{p-1} = 1 + p·a. We will show that the order of g modulo p^ℓ is φ(p^ℓ) = (p - 1)·p^{ℓ-1}.
Let n be the least natural number which satisfies g^n ≡ 1 (mod p^ℓ). By the previous lemma, the order of g^{p−1} = 1 + p·a modulo p^ℓ is p^{ℓ−1}. However, then it follows from the corollary of 11.3.5 that

(g^{p−1})^n = (g^n)^{p−1} ≡ 1 (mod p^ℓ) ⟹ p^{ℓ−1} | n.

At the same time, the congruence g^n ≡ 1 (mod p) implies that p − 1 | n. From p − 1 and p^{ℓ−1} being coprime, we get that (p − 1)p^{ℓ−1} | n. Therefore, n = (p − 1)p^{ℓ−1} = φ(p^ℓ). □

Now we will find the least primitive root modulo 41. Since the order of any candidate divides φ(41) = 40 = 2^3 · 5, an integer g coprime to 41 is a primitive root modulo 41 if and only if

g^{20} ≢ 1 (mod 41) and g^8 ≢ 1 (mod 41).

Now we go through the potential primitive roots in ascending order:

g = 2: 2^8 = 2^5 · 2^3 ≡ −9 · 8 ≡ 10 (mod 41), 2^{20} = (2^5)^4 ≡ (−9)^4 = 81^2 ≡ (−1)^2 = 1 (mod 41),
g = 3: 3^8 = (3^4)^2 ≡ (−1)^2 = 1 (mod 41),
g = 4: the order of 4 = 2^2 always divides the order of 2,
g = 5: 5^8 = (5^2)^4 ≡ (−2^4)^4 = (2^8)^2 ≡ 10^2 ≡ 18 (mod 41), 5^{20} = (5^2)^{10} ≡ (−2^4)^{10} = 2^{40} = (2^{20})^2 ≡ 1 (mod 41),
g = 6: 6^8 = 2^8 · 3^8 ≡ 10 · 1 = 10 (mod 41), 6^{20} = 2^{20} · 3^{20} = 2^{20} · (3^8)^2 · 3^4 ≡ 1 · 1 · (−1) = −1 (mod 41).

We have thus proved that 6 is the least positive primitive root modulo 41. (If we were interested in the other primitive roots modulo 41 as well, we would get them as the powers of 6 with exponent taking on values from the range 1 to 40 which are coprime to 40. There are exactly φ(40) = φ(2^3 · 5) = 16 of them, and the resulting remainders modulo 41 are ±6, ±7, ±11, ±12, ±13, ±15, ±17, ±19.)

Now, if we prove that 6^{40} ≢ 1 (mod 41^2), we will know that 6 is a primitive root modulo any power of 41. (If we had "bad luck" and found out that 6^{40} ≡ 1 (mod 41^2), then a primitive root modulo 41^2 would be 47 = 6 + 41.)

The subsequent proposition describes the case of powers of two. We will use similar helping lemmas as in the case of odd primes.

Lemma. Let ℓ ∈ N, ℓ ≥ 3. Then

5^{2^{ℓ−3}} ≡ 1 + 2^{ℓ−1} (mod 2^ℓ).

Proof. Similarly as above, by induction on ℓ using the binomial theorem. □

Lemma. Let ℓ ∈ N, ℓ ≥ 3. Then the order of the integer 5 modulo 2^ℓ is 2^{ℓ−2}.

Proof. Easily from the above lemma. □

Proposition. Let ℓ ∈ N.
There are primitive roots modulo 2^ℓ if and only if ℓ ≤ 2.

Proof. Let ℓ ≥ 3. Then the set

S = {(−1)^a · 5^b : a ∈ {0, 1}, b ∈ Z, 0 ≤ b < 2^{ℓ−2}}

forms a reduced residue system modulo 2^ℓ: it has φ(2^ℓ) elements, and it can be easily verified that they are pairwise incongruent modulo 2^ℓ. At the same time (utilizing the previous lemma), the order of every element of S apparently divides 2^{ℓ−2}. Therefore, this reduced residue system cannot (and nor can any other) contain an element of order φ(2^ℓ) = 2^{ℓ−1}. For ℓ ≤ 2, one checks directly that 1 is a primitive root modulo 2 and 3 is a primitive root modulo 4. □

The last piece of the jigsaw puzzle of propositions which collectively prove theorem 11.3.8 is the statement about the nonexistence of primitive roots for composite numbers which are neither a power of an odd prime nor twice such a power.

Proposition. Let m ∈ N be divisible by at least two primes, and let it not be twice a power of an odd prime. Then there are no primitive roots modulo m.

Proof. Let m factor into primes as 2^α · p_1^{α_1} ··· p_k^{α_k}, where α ∈ N_0, α_i ∈ N, 2 ∤ p_i, and k ≥ 2, or k ≥ 1 and α ≥ 2. Denoting δ = [φ(2^α), φ(p_1^{α_1}), …, φ(p_k^{α_k})], we can easily see that

δ < φ(2^α) · φ(p_1^{α_1}) ··· φ(p_k^{α_k}) = φ(m)

and that for any a ∈ Z, (a, m) = 1, we have a^δ ≡ 1 (mod m). Therefore, there are no primitive roots modulo m. □

In general, it is computationally very hard to find a primitive root for a given modulus. The following theorem describes a necessary and sufficient condition for the examined integer to be a primitive root.

11.3.9. Theorem. Let m be such an integer that there are primitive roots modulo m. Let us write φ(m) = q_1^{α_1} ··· q_k^{α_k}, where q_1, …, q_k are distinct primes and α_1, …, α_k ∈ N. Then, for every g ∈ Z, (g, m) = 1, it holds that g is a primitive root modulo m if and only if none of the following congruences holds:

g^{φ(m)/q_1} ≡ 1 (mod m), …, g^{φ(m)/q_k} ≡ 1 (mod m).

Proof. If any of the congruences were true, it would mean that the order of g is less than φ(m).
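The criterion of theorem 11.3.9 is easy to run by machine for small moduli. The following Python sketch (function names are ours; trial division is enough at this scale) reproduces the search for the least primitive root modulo 41 carried out by hand above:

```python
from math import gcd

def prime_divisors(n):
    """Distinct prime divisors of n by trial division (fine for small n)."""
    ps, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            ps.add(d)
            n //= d
        d += 1
    if n > 1:
        ps.add(n)
    return ps

def is_primitive_root(g, m, phi):
    """Criterion of theorem 11.3.9: no power g^(phi/q) may be 1 modulo m."""
    return gcd(g, m) == 1 and all(pow(g, phi // q, m) != 1
                                  for q in prime_divisors(phi))

# least positive primitive root modulo 41 (phi(41) = 40 = 2^3 * 5)
print(min(g for g in range(2, 41) if is_primitive_root(g, 41, 40)))  # 6
# 6^40 modulo 41^2; a value different from 1 shows 6 generates modulo 41^2 too
print(pow(6, 40, 41**2))                                             # 124
```

The second print confirms the "no bad luck" check performed in the text: 6^40 ≡ 124 ≢ 1 (mod 41^2).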
To avoid manipulating huge numbers when verifying the condition, we will use several tricks. First of all, we calculate the remainder of 6^8 upon division by 41^2; this problem can be further reduced to computing the remainders of the integers 2^8 and 3^8:

2^8 = 256 ≡ 6 · 41 + 10 (mod 41^2),
3^8 = (3^4)^2 = (2 · 41 − 1)^2 ≡ −4 · 41 + 1 (mod 41^2).

Then,

6^8 = 2^8 · 3^8 ≡ (6 · 41 + 10)(−4 · 41 + 1) ≡ −34 · 41 + 10 ≡ 7 · 41 + 10 (mod 41^2)

and

6^{40} = (6^8)^5 ≡ (7 · 41 + 10)^5 ≡ 10^5 + 5 · 7 · 41 · 10^4 = 10^4 · (10 + 35 · 41) ≡ (−2 · 41 − 4)(−6 · 41 + 10) ≡ 4 · 41 − 40 = 124 ≢ 1 (mod 41^2).

In the calculation, we made use of the fact that 10^4 = 6 · 41^2 − 86, i.e., 10^4 ≡ −2 · 41 − 4 (mod 41^2). Therefore, 6 is a primitive root modulo 41^2, and since it is an even integer, we can see that 1687 = 6 + 41^2 is a primitive root modulo 2 · 41^2 (while the least positive primitive root modulo 2 · 41^2 is the integer 7). □

C. Solving congruences

Linear congruences. The following exercise illustrates several possible approaches to solving linear congruences.

On the other hand, if the order δ of g is less than φ(m), then u = φ(m)/δ > 1, so there must be an i ∈ {1, …, k} such that q_i | u. However, then we get g^{φ(m)/q_i} = (g^δ)^{u/q_i} ≡ 1 (mod m). □

4. Solving congruences and systems of them

This part will be devoted to the analogue of solving equations in a numerical domain. We will actually be solving equations (and systems of equations) in the ring of residue classes (Z_m, +, ·); we will, however, talk about solving congruences modulo m and write them in the more transparent way, as usual.

Congruence in one variable

Let m ∈ N, f(x), g(x) ∈ Z[x]. The notation f(x) ≡ g(x) (mod m) is called a congruence in variable x, and it is understood to be the problem of finding the set of solutions, i.e., the set of all such integers c for which f(c) ≡ g(c) (mod m). Two congruences (in one variable) are called equivalent iff they have the same set of solutions. The mentioned congruence is equivalent to the congruence f(x) − g(x) ≡ 0 (mod m).

i) By Fermat's theorem, 39^{46} ≡ 1 (mod 47), so multiplying the congruence 39x ≡ 41 (mod 47) by 39^{45} gives

39^{45} · 39 · x ≡ 39^{45} · 41 (mod 47),
The only method which always leads to a solution is trying out all possible values (however, this would, of course, often take too much time). This procedure is formalized by the following proposition.

11.4.1. Proposition. Let m ∈ N, f(x) ∈ Z[x]. Then it holds for every a, b ∈ Z that

a ≡ b (mod m) ⟹ f(a) ≡ f(b) (mod m).

Proof. Let f(x) = c_n x^n + c_{n−1} x^{n−1} + ··· + c_1 x + c_0, where c_0, c_1, …, c_n ∈ Z. Since a ≡ b (mod m), c_i a^i ≡ c_i b^i (mod m) holds for every i = 0, 1, …, n. Adding up these congruences for i = 0, 1, 2, …, n leads to

c_n a^n + ··· + c_1 a + c_0 ≡ c_n b^n + ··· + c_1 b + c_0 (mod m),

i.e., f(a) ≡ f(b) (mod m). □

Corollary. The set of solutions of an arbitrary congruence modulo m is a union of residue classes modulo m.

Definition. The number of solutions of a congruence in one variable modulo m is the number of residue classes modulo m containing the solutions of the congruence.

Example. The concept of the number of solutions of a congruence, which we have just defined, is a bit counterintuitive in that it depends on the modulus of the congruence. Therefore, equivalent congruences (sharing the same integers as solutions) can have different numbers of solutions.

whence it already follows that x ≡ 39^{45} · 41 (mod 47). To complete the solution, it remains to calculate the remainder of 39^{45} · 41 when divided by 47, which is left as an exercise to the kind reader, leading to the result x ≡ 36 (mod 47).

ii) Another option is to make use of Bezout's theorem. The Euclidean algorithm applied to the pair (39, 47) yields

47 = 1 · 39 + 8,
39 = 4 · 8 + 7,
8 = 1 · 7 + 1.

In the other direction, this leads to

1 = 8 − 7 = 8 − (39 − 4 · 8) = 5 · 8 − 39 = 5 · (47 − 39) − 39 = 5 · 47 − 6 · 39.

Considering this equality modulo 47 and remembering that we are solving the congruence 39x ≡ 41 (mod 47), we obtain

−6 · 39 ≡ 1 (mod 47),
−6 · 39 · x ≡ −6 · 41 (mod 47),
x ≡ −246 (mod 47),
x ≡ 36 (mod 47).
Let us notice that this procedure is the one usually used in the corresponding software tools: it is efficient and can be easily made into an algorithm. It was also important that 39 (the coefficient of the unknown) and the modulus 47 are coprime.

iii) Concerning paper-and-pencil calculations, the most efficient procedure (yet one not easily generalizable into an algorithm) is to gradually modify the congruence so that the set of solutions remains unchanged:

39x ≡ 41 (mod 47),
−8x ≡ −6 (mod 47),     | : (−2)
4x ≡ 3 (mod 47),
4x ≡ −44 (mod 47),     | : 4
x ≡ −11 (mod 47),
x ≡ 36 (mod 47).

(1) The congruence 2x ≡ 3 (mod 3) has exactly one solution (modulo 3).
(2) The congruence 10x ≡ 15 (mod 15) has five solutions (modulo 15).
(3) The congruences from (1) and (2) are equivalent.

11.4.2. Linear congruence in one variable. Just like in the case of ordinary equations, the easiest congruences are the linear ones, for which we are able not only to decide whether they have a solution, but also to find it efficiently (provided one exists). The procedure is described by the following theorem and its proof.

11.4.3. Theorem. Let m ∈ N, a, b ∈ Z, and d = (a, m). Then the congruence (in variable x)

ax ≡ b (mod m)

has a solution if and only if d | b. If d | b, then this congruence has exactly d solutions (modulo m).

Proof. First, we prove that the mentioned condition is necessary. If an integer c is a solution of this congruence, then we must have m | a·c − b. Since d = (a, m), we get d | m and d | a·c − b, so d | a·c − (a·c − b) = b.

Now we prove that if d | b, then the given congruence has exactly d solutions modulo m. Let a_1, b_1 ∈ Z and m_1 ∈ N be such that a = d·a_1, b = d·b_1, and m = d·m_1. The congruence we are trying to solve is thus equivalent to the congruence a_1·x ≡ b_1 (mod m_1), where (a_1, m_1) = 1.
This congruence can be multiplied by a_1^{φ(m_1)−1}, which, by Euler's theorem, leads to

x ≡ b_1 · a_1^{φ(m_1)−1} (mod m_1).

This congruence has a unique solution modulo m_1, thus it has d = m/m_1 solutions modulo m. □

Using the theorem about solutions of linear congruences, we can, among other things, prove Wilson's theorem, an important theorem which gives a necessary and sufficient condition for an integer to be a prime. Such conditions are extremely useful in computational number theory, where one needs to efficiently determine whether a given large integer is a prime. Unfortunately, it is not known at present how to compute the factorial of a large integer modulo another integer quickly, which is why Wilson's theorem is not used for this purpose in practice.

Theorem (Wilson). A natural number n > 1 is a prime if and only if (n − 1)! ≡ −1 (mod n).

Proof. First, we prove that every composite number n > 4 satisfies n | (n − 1)!, i.e., (n − 1)! ≡ 0 (mod n). Let 1 < d < n be a non-trivial divisor of n. If d ≠ n/d, then the inequalities 1 < d, n/d < n − 1 imply what we need: n = d · (n/d) | (n − 1)!. If d = n/d, i.e., n = d^2, then

Systems of congruences. In order to solve systems of (not only linear) congruences, we will often utilize the Chinese remainder theorem, which guarantees uniqueness of the solution provided the moduli of the particular congruences are pairwise coprime.

11.C.2. Solve the system

x ≡ 7 (mod 27),
x ≡ −3 (mod 11).

Solution. As (27, 11) = 1, we are guaranteed by the Chinese remainder theorem that the solution is unique modulo 27 · 11 = 297. There are two major possible approaches to finding the solution.

(a) Using the Euclidean algorithm, we can find the coefficients in Bezout's identity: 1 = 5 · 11 − 2 · 27. Hence, [11]^{−1}_{27} = [5]_{27} and [27]^{−1}_{11} = [−2]_{11}. Therefore, the solution is

x ≡ 7 · 11 · 5 − 3 · 27 · (−2) = 547 ≡ 250 (mod 297).

(b) Using step-by-step substitution, we get x = 11t − 3 from the second congruence.
Substituting this into the first one leads to 11t ≡ 10 (mod 27). Multiplying this by 5 yields 55t ≡ 50, i.e., t ≡ −4 (mod 27). Altogether,

x = 11 · 27 · s − 4 · 11 − 3 = 297s − 47 for s ∈ Z,

i.e., x ≡ −47 ≡ 250 (mod 297), in accordance with (a). □

11.C.3. Solve the following system of congruences:

x ≡ 1 (mod 10),
x ≡ 5 (mod 18),
x ≡ −4 (mod 25).

Solution. The integers x which satisfy the first congruence are exactly those of the form x = 1 + 10t, where t ∈ Z may be arbitrary. We substitute this expression into the second congruence and then solve it (as a congruence in variable t):

1 + 10t ≡ 5 (mod 18),
10t ≡ 4 (mod 18),
5t ≡ 2 (mod 9),
5t ≡ 20 (mod 9),
t ≡ 4 (mod 9),

or t = 4 + 9s, where s ∈ Z is arbitrary. The first two congruences are thus satisfied by exactly those integers x which are of the form

x = 1 + 10t = 1 + 10(4 + 9s) = 41 + 90s.

we have d > 2 (since n > 4) and n | (d · 2d) | (n − 1)!. For n = 4, we easily get (4 − 1)! = 6 ≡ 2 ≢ −1 (mod 4).

Now, let p be a prime. The integers in the set {2, 3, …, p − 2} can be grouped into pairs of those mutually inverse modulo p, i.e., pairs of integers whose product is congruent to 1. By the previous theorem, for every integer a of this set, there is a unique solution of the congruence a·x ≡ 1 (mod p). Since a ≠ 0, 1, p − 1, it is apparent that the solution c of this congruence also satisfies c ≢ 0, 1, −1 (mod p). The integer a cannot be paired with itself, either: if it were, i.e., a·a ≡ 1 (mod p), we would (thanks to p | a^2 − 1 = (a + 1)(a − 1)) get the congruence a ≡ ±1 (mod p). The product of the integers of the mentioned set thus consists of products of (p − 3)/2 pairs (whose product is always congruent to 1 modulo p). Therefore, we have

(p − 1)! ≡ 1^{(p−3)/2} · (p − 1) ≡ −1 (mod p). □

11.4.4. Systems of linear congruences. Having a system of linear congruences in the same variable, we can decide whether each of them is solvable by the previous theorem. If at least one of the congruences does not have a solution, nor does the whole system.
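Theorem 11.4.3, used here to screen the individual congruences for solvability, translates directly into a short solver. The sketch below is ours (it needs Python 3.8+, where pow(a, -1, m) computes a modular inverse):

```python
from math import gcd

def solve_linear_congruence(a, b, m):
    """All solutions of a*x ≡ b (mod m), as residues modulo m (theorem 11.4.3)."""
    d = gcd(a, m)
    if b % d:
        return []                      # no solution unless d | b
    a1, b1, m1 = a // d, b // d, m // d
    x0 = b1 * pow(a1, -1, m1) % m1     # unique solution modulo m1
    return [x0 + k * m1 for k in range(d)]   # exactly d solutions modulo m

print(solve_linear_congruence(39, 41, 47))   # [36], as in the worked example
print(solve_linear_congruence(10, 15, 15))   # [0, 3, 6, 9, 12], five solutions
```

The second call reproduces example (2) above: d = (10, 15) = 5 divides 15, so there are exactly five solutions modulo 15.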
On the other hand, if each of the congruences is solvable, we can rearrange it into the form x ≡ c_i (mod m_i). We thus get a system of congruences

x ≡ c_1 (mod m_1),
…,
x ≡ c_k (mod m_k).

Apparently, it suffices to solve the case k = 2, since the solution of a system of more congruences can be obtained by repeatedly applying the procedure for a system of two congruences.

Proposition. Let c_1, c_2 be integers and m_1, m_2 be natural numbers. Let us denote d = (m_1, m_2). The system of two congruences

x ≡ c_1 (mod m_1),
x ≡ c_2 (mod m_2)

has no solution if c_1 ≢ c_2 (mod d). On the other hand, if c_1 ≡ c_2 (mod d), then there is an integer c such that x ∈ Z satisfies the system if and only if it satisfies the congruence x ≡ c (mod [m_1, m_2]).

Proof. If the given system is to have a solution x ∈ Z, we must have x ≡ c_1 (mod d), x ≡ c_2 (mod d), and thus c_1 ≡ c_2 (mod d) as well. Hence it follows that the system cannot have a solution when c_1 ≢ c_2 (mod d).

From now on, suppose that c_1 ≡ c_2 (mod d). The first congruence of the system is satisfied by those integers x which are of the form x = c_1 + t·m_1, where t ∈ Z is arbitrary. Such an integer x satisfies the second congruence of the system if and only if c_1 + t·m_1 ≡ c_2 (mod m_2), i.e.,

Once again, this can be substituted into the third congruence and then solved:

41 + 90s ≡ −4 (mod 25),
90s ≡ 5 (mod 25),
18s ≡ 1 (mod 5),
3s ≡ 6 (mod 5),
s ≡ 2 (mod 5),

or s = 2 + 5r, where r ∈ Z. Altogether,

x = 41 + 90s = 41 + 90(2 + 5r) = 221 + 450r.

Therefore, the system is satisfied by exactly those integers x with x ≡ 221 (mod 450). □

11.C.4. A group of thirteen pirates managed to steal a chest full of gold coins (there were around two thousand of them). The pirates tried to divide the coins evenly among themselves, but ten coins were left over. They started to fight for the remaining coins, and one of the pirates was stabbed to death during the combat.
So they tried to divide the coins evenly once again, and now three coins were left. Another pirate died in a subsequent battle for the three coins. The remaining pirates then tried to divide the coins evenly for the third time, now successfully. How many coins were there in the chest?

Solution. The problem leads to the following system of congruences:

x ≡ 10 (mod 13),
x ≡ 3 (mod 12),
x ≡ 0 (mod 11).

Its solution is x ≡ 231 (mod 11 · 12 · 13). Since the number x of coins is to be around 2000 and x ≡ 231 (mod 1716), we can easily settle that there were exactly 231 + 1716 = 1947 coins. □

11.C.5. When gymnasts made groups of eight people, three were left over. When they formed circles, each consisting of seventeen people, seven remained; and when they grouped into pyramids (each of which contains 21 = 4^2 + 2^2 + 1 gymnasts), two of the pyramids were incomplete (each missing a person "on the top"). How many gymnasts were there, provided there were at least 2000 and at most 4000?

t·m_1 ≡ c_2 − c_1 (mod m_2).

By the theorem about solutions of linear congruences, this congruence (in variable t) is solvable since d = (m_1, m_2) divides c_2 − c_1, and t satisfies this congruence if and only if

t ≡ ((c_2 − c_1)/d) · (m_1/d)^{φ(m_2/d)−1} (mod m_2/d),

i.e., if and only if

x = c_1 + t·m_1 = c_1 + (c_2 − c_1) · (m_1/d)^{φ(m_2/d)} + r · (m_1·m_2)/d = c + r · [m_1, m_2],

where r ∈ Z is arbitrary and c = c_1 + (c_2 − c_1) · (m_1/d)^{φ(m_2/d)}, as m_1·m_2 equals d · [m_1, m_2]. We have thus found such an integer c that every x ∈ Z satisfies the system if and only if x ≡ c (mod [m_1, m_2]), as wanted. □

We can notice that the proof of this theorem is constructive, i.e., it yields a formula for finding the integer c. This theorem thus gives us a procedure for capturing the condition that an integer x satisfies a given system by a single congruence. This new congruence is then of the same form as the original ones.
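The constructive proof just described (merge two congruences into one modulo the least common multiple, then repeat) can be sketched in Python. The names are ours, and we replace the φ-power inverse from the proof with an equivalent modular inverse via pow(·, -1, ·), which needs Python 3.8+:

```python
from math import gcd

def merge(c1, m1, c2, m2):
    """Combine x ≡ c1 (mod m1) and x ≡ c2 (mod m2) into a single congruence
    x ≡ c (mod lcm(m1, m2)), or return None if the pair is incompatible."""
    d = gcd(m1, m2)
    if (c2 - c1) % d:
        return None                       # no solution when c1 !≡ c2 (mod d)
    lcm = m1 // d * m2
    # solve t*m1 ≡ c2 - c1 (mod m2); m1/d is invertible modulo m2/d
    t = (c2 - c1) // d * pow(m1 // d, -1, m2 // d) % (m2 // d)
    return ((c1 + t * m1) % lcm, lcm)

# the pirates of 11.C.4: x ≡ 10 (mod 13), x ≡ 3 (mod 12), x ≡ 0 (mod 11)
c, m = merge(*merge(10, 13, 3, 12), 0, 11)
print(c, m)   # 231 1716, so 231 + 1716 = 1947 coins
```

Note that the moduli here need not be pairwise coprime; the incompatible case returns None, as happens for the system x ≡ 1 (mod 3), x ≡ −1 (mod 9) of exercise 11.C.6 i).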
Therefore, we can apply this procedure to a system of more congruences: first, we create a single congruence from the first and second congruences of the system (satisfied by exactly those integers x which satisfy the original two); then, we create another congruence from the new one and the third one of the original system, and so on. Each step reduces the number of congruences by one; after a finite number of steps, we thus arrive at a single congruence which describes all solutions of the given system.

It follows from the procedure we have just mentioned that (supposing the coprimality condition below holds) a system of congruences always has a solution, and this solution is unique.

Theorem (Chinese remainder theorem). Let m_1, …, m_k ∈ N be pairwise coprime, a_1, …, a_k ∈ Z. Then the system

x ≡ a_1 (mod m_1),
…,
x ≡ a_k (mod m_k)

has a unique solution modulo m_1 · m_2 ··· m_k.

Remark. The unusual name of this theorem comes from the Chinese mathematician Sun Tzu of the 4th century. In his text, he asked for an integer which leaves remainder 2 when divided by 3, leaves remainder 3 when divided by 5, and again remainder 2 when divided by 7. The answer is rumored to be hidden in the following song:

Solution. We solve the following system of linear congruences in the standard way:

c ≡ 3 (mod 8),
c ≡ 7 (mod 17),
c ≡ −2 (mod 21),

leading to the solution c ≡ 1027 (mod 2856), which, together with the additional information, implies that there were exactly 3883 gymnasts. □

11.C.6. Find which of the following (systems of) linear congruences have a solution.

i) x ≡ 1 (mod 3), x ≡ −1 (mod 9);
ii) 8x ≡ 1 (mod 12345678910111213);
iii) x ≡ 3 (mod 29), x ≡ 5 (mod 47). ○

The Chinese remainder theorem can also be used "in the opposite direction", i.e., to simplify a linear congruence provided we are able to express the modulus as a product of pairwise coprime factors.

11.C.7. Solve the congruence 23941x ≡ 915 (mod 3564).

Solution. Let us factor 3564 = 2^2 · 3^4 · 11.
Since none of the integers 2, 3, 11 divides 23941, we have (23941, 3564) = 1, so the congruence has a solution. Since φ(3564) = 2 · (3^3 · 2) · 10 = 1080, the solution is of the form x ≡ 915 · 23941^{1079} (mod 3564). However, it would take much effort to simplify the right-hand side to a more explicit form. Therefore, we will try to solve the congruence in a different way: we will build an equivalent system of congruences which are easier to solve than the original one. We know that an integer x is a solution of the given congruence if and only if it is a solution of the system

23941x ≡ 915 (mod 2^2),
23941x ≡ 915 (mod 3^4),
23941x ≡ 915 (mod 11).

Solving these congruences separately, we get the following equivalent system:

x ≡ 3 (mod 4),
x ≡ −3 (mod 81),
x ≡ −4 (mod 11).

(the Sunzi Ge, i.e., the Song of Master Sun)

Proof. It is a simple consequence of the previous proposition about the form of the solution of a system of two congruences. However, as we show here, this result can also be proved directly. Let us denote M := m_1 m_2 ··· m_k and n_i = M/m_i for every i, 1 ≤ i ≤ k. Then, for any i, m_i is coprime to n_i, so there is an integer b_i ∈ {1, …, m_i − 1} such that b_i n_i ≡ 1 (mod m_i). Note that b_i n_i is divisible by all the integers m_j, 1 ≤ j ≤ k, j ≠ i. Therefore, the wanted solution of the system is the integer

x = a_1 b_1 n_1 + a_2 b_2 n_2 + ··· + a_k b_k n_k. □

Let us emphasize that this is quite a strong theorem (which is actually valid in much more general algebraic structures): it allows us to guarantee that for any remainders with respect to given (pairwise coprime) moduli, there exists an integer with exactly those remainders.

11.4.5. Higher-order congruences. Now let us get back to the more general case of congruences

f(x) ≡ 0 (mod m),

where f(x) is a polynomial with integer coefficients and m ∈ N. So far, we have only one method at our disposal, which is tedious, yet universal: to try all possible remainders modulo m.
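The direct construction x = Σ a_i b_i n_i from the proof of the Chinese remainder theorem can be transcribed almost verbatim (the function name is ours; pow(·, -1, ·) supplies each b_i and needs Python 3.8+):

```python
from math import prod

def crt(remainders, moduli):
    """Chinese remainder theorem for pairwise coprime moduli, following the
    direct construction in the proof: x = sum of a_i * b_i * n_i."""
    M = prod(moduli)
    x = 0
    for a_i, m_i in zip(remainders, moduli):
        n_i = M // m_i
        b_i = pow(n_i, -1, m_i)       # b_i * n_i ≡ 1 (mod m_i)
        x += a_i * b_i * n_i
    return x % M

# Sun Tzu's classical puzzle: remainder 2 mod 3, 3 mod 5, 2 mod 7
print(crt([2, 3, 2], [3, 5, 7]))   # 23
```

The same call settles the gymnasts of 11.C.5: crt([3, 7, -2], [8, 17, 21]) gives 1027 modulo 2856.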
When solving such a congruence, it is sufficient to find out for which integers a, 0 ≤ a < m, it holds that f(a) ≡ 0 (mod m). The disadvantage of this method is its complexity, which increases as m does. If m is composite, i.e., m = p_1^{α_1} ··· p_k^{α_k}, where p_1, …, p_k are distinct primes and k > 1, we can replace the original congruence with the system of congruences

f(x) ≡ 0 (mod p_1^{α_1}),
…,
f(x) ≡ 0 (mod p_k^{α_k}),

which has the same set of solutions. We can then solve these congruences separately. The advantage of this method is that the moduli of the congruences of the system are less than the modulus of the original congruence.

Example. Consider the congruence

x^3 − 2x + 11 ≡ 0 (mod 105).

If we were to try out all possibilities, we would have to compute the value of f(x) = x^3 − 2x + 11 for the 105 values f(0), f(1), …, f(104). Therefore, we had better factor 105 = 3 · 5 · 7

Now, the procedure for finding a solution of a system of congruences yields x ≡ −1137 (mod 3564), which is the solution of the original congruence as well. □

11.C.8. Solve the congruence 3446x ≡ 8642 (mod 208). ○

11.C.9. Prove that the sequence (2^n − 3)_{n=1}^∞ contains infinitely many multiples of 5 as well as infinitely many multiples of 13, yet there is no multiple of 65 in it. ○

Residue number system. When calculating with large integers, it is often more advantageous to work not with their decimal or binary expansions, but rather with their representation in a so-called residue number system, which allows for easy parallelization of computations with large integers. Such a system is given by a k-tuple of (usually pairwise coprime) moduli, and each integer which is less than their product is then uniquely representable as a k-tuple of remainders (whose values do not exceed the moduli).

11.C.10. The quintuple of moduli 3, 5, 7, 11, 13 can serve to uniquely represent integers which are less than their product (i.e.
less than 15015) and to perform standard arithmetic operations efficiently (and in a distributed manner, if desired). Now we determine the representation of the integers 1234 and 5678 in this residue number system, together with their sum and product.

Solution. Calculating the remainders of the given integers upon division by the particular moduli, we get their RNS representations, which can be written as the tuples (1, 4, 2, 2, 12) and (2, 3, 1, 2, 10). The sum is computed componentwise (reducing the results by the appropriate moduli), leading to the tuple (0, 2, 3, 4, 9). Using the Chinese remainder theorem, this tuple can then be transformed back to the integer 6912. The product is computed analogously, yielding the corresponding tuple (2, 2, 2, 4, 3), which can be transformed back to 9662 (by the Chinese remainder theorem again). This is indeed congruent to 1234 · 5678 modulo 15015. □

11.C.11. In practice, the residue number system is often given by a triple 2^n − 1, 2^n, 2^n + 1 (why are these integers always pairwise coprime?), which can uniquely cover integers of 3n bits at the utmost. Consider the case n = 3 and determine the representation of the integer 118 in this residue number system.

and solve the congruences f(x) ≡ 0 for the moduli 3, 5, and 7. We evaluate the polynomial f(x) at convenient integers:

x:      −3   −2   −1    0    1    2    3
f(x):  −10    7   12   11   10   15   32

The congruence f(x) ≡ 0 (mod 3) thus has the solution x ≡ −1 (mod 3) (only the first one of the integers 12, 11, 10 is a multiple of 3); the congruence f(x) ≡ 0 (mod 5) has the solutions x ≡ 1 and x ≡ 2 (mod 5); finally, the solution of the congruence f(x) ≡ 0 (mod 7) is x ≡ −2 (mod 7). It remains to solve two systems of congruences:

x ≡ −1 (mod 3),         x ≡ −1 (mod 3),
x ≡ 1 (mod 5),    and   x ≡ 2 (mod 5),
x ≡ −2 (mod 7)          x ≡ −2 (mod 7).

Solving these systems, we find that the solutions of the given congruence f(x) ≡ 0 (mod 105) are exactly those integers x which satisfy x ≡ 26 (mod 105) or x ≡ 47 (mod 105).
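The componentwise arithmetic of exercise 11.C.10 above is a few lines of Python (all names here are ours):

```python
MODULI = (3, 5, 7, 11, 13)   # pairwise coprime, product 15015

def to_rns(x):
    """RNS representation of x: its remainders modulo each of the moduli."""
    return tuple(x % m for m in MODULI)

def rns_op(u, v, op):
    """Componentwise arithmetic, each coordinate reduced by its own modulus."""
    return tuple(op(a, b) % m for a, b, m in zip(u, v, MODULI))

u, v = to_rns(1234), to_rns(5678)
print(u, v)                               # (1, 4, 2, 2, 12) (2, 3, 1, 2, 10)
print(rns_op(u, v, lambda a, b: a + b))   # (0, 2, 3, 4, 9), i.e. 6912
print(rns_op(u, v, lambda a, b: a * b))   # (2, 2, 2, 4, 3), i.e. 9662
```

Each coordinate can be processed independently, which is exactly what makes the representation attractive for parallel hardware; converting back is an application of the Chinese remainder theorem.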
It is not always possible to replace the congruence with a system of congruences modulo primes, as in the above example: if the original modulus is a multiple of a higher power of a prime, then we cannot "get rid" of this power. However, even such a congruence modulo a power of a prime need not be solved by examining all possibilities. There is a more efficient tool, which is described by the following theorem.

11.4.6. Theorem (Hensel's lemma). Let p be a prime, f(x) ∈ Z[x], a ∈ Z such that p | f(a), p ∤ f′(a). Then, for every n ∈ N, the system

x ≡ a (mod p),
f(x) ≡ 0 (mod p^n)

has a unique solution modulo p^n.

Proof. We will proceed by induction on n. In the case of n = 1, the congruence f(x) ≡ 0 (mod p^1) is only another formulation of the assumption that the integer a satisfies p | f(a). Further, let n > 1 and suppose the proposition is true for n − 1. If x satisfies the system for n, then it does so for n − 1 as well. Denoting one of the solutions of the system for n − 1 as c_{n−1}, we can look for the solution of the system for n in the form x = c_{n−1} + k · p^{n−1}, where k ∈ Z. We need to find out for which k we have f(c_{n−1} + k · p^{n−1}) ≡ 0 (mod p^n). We know that p^{n−1} | f(c_{n−1} + k · p^{n−1}). Now we use the binomial theorem for f(x) = a_m x^m + ··· + a_1 x + a_0, where a_0, …, a_m ∈ Z. We have

(c_{n−1} + k · p^{n−1})^i ≡ c_{n−1}^i + i · c_{n−1}^{i−1} · k · p^{n−1} (mod p^n),

hence

f(c_{n−1} + k · p^{n−1}) ≡ f(c_{n−1}) + k · p^{n−1} · f′(c_{n−1}) (mod p^n).

Solution. We can directly calculate that 118 ≡ 6 (mod 7), 118 ≡ 6 (mod 8), and 118 ≡ 1 (mod 9). The wanted representation is thus given by the triple (6, 6, 1). In practice, however, it is very important that the RNS representation can be efficiently transformed to binary and vice versa. In our concrete case, the remainder of 118 = (1110110)_2 when divided by 2^3 can be found easily: it is given by the last three bits, (110)_2 = 6. Computing the remainder upon division by 2^3 + 1 = 9 or 2^3 − 1 = 7 is not any more complicated.
We can see (splitting the examined integer into three groups of n = 3 bits each) that

(1110110)_2 ≡ (001)_2 + (110)_2 + (110)_2 ≡ 6 (mod 2^3 − 1),
(1110110)_2 ≡ (001)_2 − (110)_2 + (110)_2 ≡ 1 (mod 2^3 + 1).

A thoughtful reader has surely noticed the similarity with the criteria for divisibility by 9 and 11, which were discussed in paragraph 11.B.9. □

11.C.12. Higher-order congruences. Using the procedure of theorem 11.4.6, solve the congruence

x^4 + 7x + 4 ≡ 0 (mod 27).

Solution. First, we solve this congruence modulo 3 (by substitution, for instance); we can easily find that the solution is x ≡ 1 (mod 3). Now, writing the solution in the form x = 1 + 3t, where t ∈ Z, we solve the congruence modulo 9:

x^4 + 7x + 4 ≡ 0 (mod 9),
(1 + 3t)^4 + 7(1 + 3t) + 4 ≡ 0 (mod 9),
1 + 4 · 3t + 7 + 7 · 3t + 4 ≡ 0 (mod 9),
33t ≡ −12 (mod 9),
11t ≡ −4 (mod 3),
t ≡ 1 (mod 3).

Writing t = 1 + 3s, where s ∈ Z, we get x = 4 + 9s, and substituting this leads to

(4 + 9s)^4 + 7(4 + 9s) + 4 ≡ 0 (mod 27),
4^4 + 4 · 4^3 · 9s + 28 + 63s + 4 ≡ 0 (mod 27),
2304s + 63s ≡ −288 (mod 27),
256s + 7s ≡ −32 (mod 3),
2s ≡ 1 (mod 3),
s ≡ 2 (mod 3).

Therefore, f(c_{n−1} + k · p^{n−1}) ≡ 0 (mod p^n) holds if and only if

f(c_{n−1})/p^{n−1} + k · f′(c_{n−1}) ≡ 0 (mod p).

Since c_{n−1} ≡ a (mod p), we get f′(c_{n−1}) ≡ f′(a) ≢ 0 (mod p), so (f′(c_{n−1}), p) = 1. By the theorem about the solutions of linear congruences, we can hence see that there is (modulo p) a unique solution k of this congruence, and since c_{n−1} was, by the induction hypothesis, the only solution modulo p^{n−1}, the integer c_{n−1} + k·p^{n−1} is the only solution of the given system modulo p^n. □

Example. Consider the congruence 3x^2 + 4 ≡ 0 (mod 49). The congruence can be equivalently transformed (by solving the linear congruence 3y ≡ 1 (mod 49) and multiplying both sides of the congruence by the integer y = 33) to the form x^2 ≡ 15 (mod 7^2). Then we proceed as in the constructive proof of Hensel's lemma. First, we solve the congruence x^2 ≡ 15 ≡ 1 (mod 7), which has at most 2 solutions, and those are x ≡ ±1 (mod 7).
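The lifting step from the proof of theorem 11.4.6 (choose k so that f(c + k·p^{n-1}) vanishes modulo the next power of p) can be sketched as follows; the function name is ours, and the exact division f(c)/p^{n-1} is guaranteed by the construction:

```python
def hensel_lift(f, df, a, p, n):
    """Lift a root a of f modulo p (with df(a) not divisible by p) to the
    unique root modulo p^n, as in the proof of theorem 11.4.6."""
    c, pk = a % p, p
    for _ in range(n - 1):
        # pick k with f(c) / pk + k * f'(c) ≡ 0 (mod p); division is exact here
        k = -(f(c) // pk) * pow(df(c), -1, p) % p
        c, pk = c + k * pk, pk * p
    return c

f = lambda x: x**4 + 7*x + 4
df = lambda x: 4*x**3 + 7
print(hensel_lift(f, df, 1, 3, 3))   # 22, i.e. x ≡ 22 (mod 27) as in 11.C.12
```

Applied to f(x) = x^2 − 15 with the starting roots ±1 modulo 7, the same routine produces the values ±8 modulo 49 obtained in the example above.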
These solutions can be expressed in the form x = ±1 + 7t, where t ∈ Z, and substituted into the congruence modulo 49, whence we get the solutions x ≡ ±8 (mod 49). (If we were interested solely in the number of solutions, we would not even have to finish the calculation, as it follows straight from Hensel's lemma that every solution modulo 7 gives a unique solution modulo 49: for f(x) = x^2 − 15, we have 7 ∤ f′(±1).)

11.4.7. Congruences modulo a prime. The solution of general higher-order congruences has thus been reduced to the solution of congruences modulo a prime. As we will see, this is where the stumbling block is, since no (much) more efficient universal procedure than trying out all possibilities is known. We can at least mention several statements describing the solvability and number of solutions of such congruences. We will then prove some detailed results for special cases in further paragraphs.

Theorem. Let p be a prime, f(x) ∈ Z[x]. Every congruence f(x) ≡ 0 (mod p) is equivalent to a congruence of degree at most p − 1.

Proof. Since it holds for any a ∈ Z that p | a^p − a (a simple consequence of Fermat's little theorem), the congruence x^p − x ≡ 0 (mod p) is satisfied by all integers. Dividing the polynomial f(x) by x^p − x with remainder, we get f(x) = q(x) · (x^p − x) + r(x) for suitable q(x), r(x) ∈ Z[x], where the degree of r(x) is less than that of the divisor, i.e., less than p. We thus get that the congruence r(x) ≡ 0 (mod p) is equivalent to the congruence f(x) ≡ 0 (mod p), yet it is of degree at most p − 1. □

Altogether, we get the solution in the form x = 4 + 9s = 4 + 9(2 + 3r) = 22 + 27r, where r ∈ Z, i.e., x ≡ 22 (mod 27). □

11.C.13. Knowing a primitive root modulo 41 from exercise 11.B.29, solve the congruence 7x^{17} ≡ 11 (mod 41).

Solution. Multiplying the congruence by 6, we get the equivalent congruence 42x^{17} ≡ 66, i.e., x^{17} ≡ 25 (mod 41).
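The passage to indices with respect to a primitive root, used in the solution of 11.C.13, can be cross-checked by brute force for a modulus as small as 41 (the function name is ours; pow(17, -1, 40) needs Python 3.8+):

```python
def dlog(a, g, p):
    """Index (discrete logarithm) of a with respect to the primitive root g
    modulo the prime p, found by brute force."""
    x = 1
    for t in range(p - 1):
        if x == a % p:
            return t
        x = x * g % p
    raise ValueError("a is not coprime to p")

# 11.C.13: 7*x^17 ≡ 11 (mod 41), i.e. x^17 ≡ 25 (mod 41); substitute x = 6^t
ind = dlog(25, 6, 41)            # 25 ≡ 6^4 (mod 41), so ind = 4
t = ind * pow(17, -1, 40) % 40   # solve 17*t ≡ 4 (mod 40): t = 12
print(pow(6, t, 41))             # 4
```

The substitution works because (17, 40) = 1, so the congruence 17t ≡ 4 (mod 40) has a unique solution.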
Since 6 is a primitive root modulo 41, the substitution x ≡ 6^t leads to the congruence 6^{17t} ≡ 25 ≡ 6^4 (mod 41), which is equivalent to 17t ≡ 4 (mod 40), and this holds if and only if t ≡ 12 (mod 40). Therefore, the congruence is satisfied by exactly those integers x with x ≡ 6^{12} ≡ 4 (mod 41). □

11.C.14. Solve the congruence x^5 + 1 ≡ 0 (mod 11).

Solution. Since (5, φ(11)) = (5, 10) = 5, …

… a^{φ(m)/d} ≡ g^{b·φ(m)/d} (mod m) … is true if and only if …

We have seen that any quadratic congruence can be transformed to the (possibly system of congruences of the) binomial form x^2 ≡ a (mod p), and then we can decide about the solvability using the Legendre symbol. Let us illustrate it on several examples.

11.C.23. Determine the number of solutions of the congruence

13x^2 + 7x + 1 ≡ 0 (mod 37).

Solution. First, we need to normalize the polynomial on the left-hand side, i.e., we have to find the inverse of 13 modulo 37. Using the Euclidean algorithm, we find that the inverse is 20; after multiplying both sides of the congruence by it and reducing modulo 37, we obtain the congruence

x^2 + 29x + 20 ≡ 0 (mod 37).

Now we complete the square (the odd coefficient 29 does not cause any trouble, as it can be replaced by −8) and obtain

(x − 4)^2 + 4 ≡ 0 (mod 37).

After substituting y for x − 4, we finally obtain the congruence in binomial form:

y^2 ≡ −4 (mod 37).

The fact that this congruence is solvable can be established either using theorem 11.4.10 or with the use of the Legendre symbol. The former approach leads to the calculation d = (2, φ(37)) = 2 and (−4)^{φ(37)/2} ≡ 1 (mod 37), while the latter one gives

(−4/37) = (−1/37) · (4/37) = 1

by the corollary after theorem 11.4.13 (as 37 ≡ 1 (mod 4)). Either way, we have obtained that the given congruence has d = 2 solutions. □

11.C.24. Solve the congruence 6x^2 + x − 1 ≡ 0 (mod 29).

Solution.
Although we have not presented any special method for finding solutions of quadratic congruences yet (apart from the general method for binomial congruences or going through the complete residue system), we will see that in some cases the set of solutions can be found easily. Let us first proceed in the usual way: multiplying the congruence by 5 (the inverse of 6 modulo 29), we obtain x^2 + 5x − 5 ≡ 0 (mod 29), and after completing the square we have (x − 12)^2 ≡ 4 (mod 29). We immediately see that this congruence is solvable, with the pair of solutions x − 12 ≡ ±2 (mod 29), and thus x ≡ 10, 14 (mod 29). We could also have seen almost immediately that the given polynomial factors as 6x^2 + x − 1 = (3x − 1)(2x + 1), so the prime modulus 29 has to divide either 3x − 1 or 2x + 1. The obtained linear congruences 3x ≡ 1 (mod 29) and 2x ≡ −1 (mod 29) easily yield the same solutions x ≡ 10 (mod 29) and x ≡ 14 (mod 29) as above. □

11.C.25. Find all integers which satisfy the congruence x^2 ≡ 7 (mod 43).

Solution. The Legendre symbol evaluates to (7/43) = −(43/7) = −(1/7) = −1. Hence 7 is a quadratic nonresidue modulo 43, so the given congruence has no solution. □

11.C.26. Find all integers a for which the congruence x^2 ≡ a (mod 43) is solvable.

Solution. This exercise is a follow-up to the previous one, from which we can see that the integer 7 does not meet the requirement. We could test all the remainders modulo 43 in the same way, but there is a simpler method. The congruence is surely solvable if a is a multiple of 43 (then it has a unique solution); if not, a must be a quadratic residue modulo 43. The quadratic residues can be most simply enumerated by calculating the squares of all elements of a reduced residue system modulo 43: they are the integers congruent to (±1)^2, (±2)^2, (±3)^2, ..., (±21)^2 modulo 43. The problem is therefore satisfied by exactly those integers a which are congruent to any one of 1, 4, 6, 9, 10, 11, 13, 14, 15, 16, 17, 21, 23, 24, 25, 31, 35, 36, 38, 40, 41 modulo 43. □

Law of quadratic reciprocity

11.4.13. Theorem. Let p, q be odd primes. Then,
(1) (−1/p) = (−1)^{(p−1)/2},
(2) (2/p) = (−1)^{(p^2−1)/8},
(3) (q/p) = (p/q)·(−1)^{((p−1)/2)·((q−1)/2)}.

The theorem is put this way mainly because we can calculate the value (a/p) for any integer a using these three formulae and the basic rules for the Legendre symbol.

Example. Let us calculate the value (79/101) using the properties of the Legendre symbol:
(79/101) = (101/79), since 101 ≡ 1 (mod 4);
(101/79) = (22/79) = (2/79)·(11/79);
(2/79) = 1, since 79 ≡ −1 (mod 8);
(11/79) = −(79/11), since 11 ≡ 79 ≡ 3 (mod 4);
−(79/11) = −(2/11) = −(−1) = 1, since 11 ≡ 3 (mod 8).

Many proofs of the law of quadratic reciprocity can be found in the literature6. However, many of them (especially the shorter ones) usually make use of deeper knowledge from algebraic number theory. We will present an elementary proof of this theorem here. Let S denote the reduced residue system of the least residues (in absolute value) modulo p, i.e.,
S = {−(p−1)/2, −(p−3)/2, ..., −1, 1, ..., (p−3)/2, (p−1)/2}.
Further, for a ∈ Z, p ∤ a, let μ_p(a) denote the number of negative least residues (in absolute value) of the integers 1·a, 2·a, ..., ((p−1)/2)·a, i.e., we decide for each of these integers to which integer from the set S it is congruent and count the number of the negative ones. If it is clear from context which values a, p we mean, we usually omit the parameters and write only μ instead of μ_p(a).

Example. We determine μ_p(a) for the prime p = 11 and the integer a = 3. The reduced residue system we are interested in is S = {−5, ..., −1, 1, ..., 5}, and for a = 3 we calculate
1·3 ≡ 3 (mod 11),
2·3 ≡ −5 (mod 11),
3·3 ≡ −2 (mod 11),
4·3 ≡ 1 (mod 11),
5·3 ≡ 4 (mod 11),
whence μ₁₁(3) = 2. We will show in the following statement that this integer is tightly connected to the Legendre symbol: the value of the symbol (3/11) can be determined in terms of the μ function as (−1)^{μ₁₁(3)} = (−1)^2 = 1.

11.4.14. Lemma (Gauss). If p is an odd prime, a ∈ Z, p ∤ a, then the value of the Legendre symbol satisfies
(a/p) = (−1)^{μ_p(a)}.

Proof. For each integer i ∈ {1, 2, ..., (p−1)/2}, we set a value m_i ∈ {1, 2, ..., (p−1)/2} so that i·a ≡ ±m_i (mod p). We can easily see that if k, l ∈ {1, 2, ..., (p−1)/2} are different, then the values m_k, m_l are also different (the equality m_k = m_l would imply k·a ≡ ±l·a (mod p), hence k ≡ ±l (mod p), which cannot be satisfied unless k = l). Therefore, the sets {1, 2, ..., (p−1)/2} and {m_1, m_2, ..., m_{(p−1)/2}} coincide, which is also illustrated by the above example. Multiplying the congruences
1·a ≡ ±m_1 (mod p),
2·a ≡ ±m_2 (mod p),
...
((p−1)/2)·a ≡ ±m_{(p−1)/2} (mod p)
leads to
a^{(p−1)/2} · ((p−1)/2)! ≡ (−1)^μ · ((p−1)/2)! (mod p),
since there are exactly μ negative values on the right-hand sides of the congruences. Dividing both sides by the integer ((p−1)/2)!, we get a^{(p−1)/2} ≡ (−1)^μ (mod p), and the wanted statement follows from lemma 11.4.12, by which (a/p) ≡ a^{(p−1)/2} (mod p). □

Now, with the help of Gauss's lemma, we will prove the law of quadratic reciprocity. Using the law 11.4.13, we can calculate the value (a/p) for any integer a and any odd prime p. Moreover, the evaluation of the Legendre symbol is fast enough even for large arguments; therefore, using it is preferable to verifying the criteria of theorem 11.4.10.

Proof of the law of quadratic reciprocity. The first part has already been proven; for the rest, we first derive a lemma which will be utilized in the proofs of both remaining parts. Let a ∈ Z, p ∤ a, k ∈ N, and let [x] and ⟨x⟩ denote the integer part (i.e. floor) and the fractional part, respectively, of a real number x.

6 In 2000, F. Lemmermeyer listed 233 proofs; see F. Lemmermeyer, Reciprocity Laws: From Euler to Eisenstein, Springer, 2000.

11.C.27. Here, we recall the statement of the law in a slightly modified way which is more suitable for direct calculations.
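Gauss's lemma, stated and proved above, is easy to experiment with. The following sketch counts μ_p(a) exactly as in the p = 11, a = 3 example (a least residue in absolute value is negative precisely when the ordinary remainder exceeds (p−1)/2) and compares (−1)^μ with Euler's criterion from lemma 11.4.12:

```python
def mu(a, p):
    # number of k in 1..(p-1)/2 for which the least residue in absolute
    # value of k*a modulo p is negative, i.e. (k*a) mod p > (p-1)/2
    return sum(1 for k in range(1, (p - 1) // 2 + 1)
               if k * a % p > (p - 1) // 2)

def legendre(a, p):
    # Euler's criterion: (a/p) ≡ a^((p-1)/2) (mod p)
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

print(mu(3, 11))                                        # 2, as above
print(all((-1) ** mu(a, p) == legendre(a, p)
          for p in (11, 13, 37) for a in range(1, p)))  # True
```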
i) −1 is a quadratic residue for primes p which satisfy p ≡ 1 (mod 4) and a quadratic nonresidue for primes p satisfying p ≡ 3 (mod 4).
ii) 2 is a quadratic residue for primes p which satisfy p ≡ ±1 (mod 8) and a quadratic nonresidue for primes p satisfying p ≡ ±3 (mod 8).
iii) If p ≡ 1 (mod 4) or q ≡ 1 (mod 4), then (p/q) = (q/p); for other odd primes p, q, we have (p/q) = −(q/p).

Solution. We simply apply the law of quadratic reciprocity in the appropriate cases.
i) The integer (p−1)/2 is even iff 4 | p − 1.
ii) We need to know for which odd primes p the exponent (p^2−1)/8 is even. Odd primes are congruent to ±1 or ±3 modulo 8, so we have (by 11.B.7) either p^2 ≡ 1 (mod 16) or p^2 ≡ 9 (mod 16).
iii) This is clear from the law of quadratic reciprocity. □

11.C.28. Derive by direct calculation from Gauss's lemma 11.4.14 once again the so-called supplementary laws of quadratic reciprocity:
(−1/p) = (−1)^{(p−1)/2} and (2/p) = (−1)^{(p^2−1)/8}.

Solution. To evaluate (−1/p), we should realize that μ counts the least (in absolute value) negative remainders of the integers in the set {−1, −2, ..., −(p−1)/2}. However, those are exactly the desired remainders and they are all negative; hence μ = (p−1)/2 and (−1/p) = (−1)^{(p−1)/2}.
In the latter case, we need to express the number of least (in absolute value) negative remainders of the integers in the set {1·2, 2·2, 3·2, ..., ((p−1)/2)·2}. For any k ∈ {1, 2, ..., (p−1)/2}, the integer 2k leaves a negative remainder if and only if 2k > (p−1)/2, i.e., iff k > (p−1)/4. Now, it remains to determine the number of such integers k. If p ≡ 1 (mod 4), then this number equals (p−1)/2 − (p−1)/4 = (p−1)/4, so
(2/p) = (−1)^{(p−1)/4} = (−1)^{((p−1)/4)·((p+1)/2)} = (−1)^{(p^2−1)/8},
since (p+1)/2 is odd in this case. Similarly, for p ≡ 3 (mod 4), the number of such integers k equals (p−1)/2 − (p−3)/4 = (p+1)/4, so
(2/p) = (−1)^{(p+1)/4} = (−1)^{((p+1)/4)·((p−1)/2)} = (−1)^{(p^2−1)/8},
since (p−1)/2 is odd in this case as well. □

11.C.29. Solve the congruence x^2 − 23 ≡ 0 (mod 77).

Solution. Factoring the modulus, we get the system
x^2 ≡ 1 (mod 11), x^2 ≡ 2 (mod 7).
Clearly, 1 is a quadratic residue modulo 11, so the first congruence of the system has (exactly) two solutions: x ≡ ±1 (mod 11). Further, (2/7) = (9/7) = 1, and it should not take much effort to notice the solution x ≡ ±3 (mod 7). We have thus obtained four simple systems of two linear congruences each. Solving them, we get that the original congruence has the following four solutions: x ≡ 10, 32, 45, or 67 (mod 77). □

11.C.30. Solve the congruence 7x^2 + 112x + 42 ≡ 0 (mod 473). O

Jacobi symbol. The Jacobi symbol (a/b) is a generalization of the Legendre symbol to the case where the "lower" argument b need not be a prime, but can be any odd positive integer. It is defined as the product of the Legendre symbols corresponding to the prime factors of b: if b = p₁^{α₁} ⋯ p_k^{α_k}, then
(a/b) = (a/p₁)^{α₁} ⋯ (a/p_k)^{α_k}.
The primary motivation for introducing the Jacobi symbol is the necessity to evaluate the Legendre symbol (and thus to decide the solvability of quadratic congruences) without having to factor integers into primes. We will illustrate such a calculation in an example, having in mind that the Jacobi symbol shares with the Legendre one not only the notation but also almost all of the (computational) properties.

11.C.31. Decide whether the congruence x^2 ≡ 219 (mod 383) is solvable.

Solution. Since 383 is a prime, the congruence is solvable if and only if the Legendre symbol satisfies (219/383) = 1. We compute (treating the intermediate symbols as Jacobi symbols, which spares us the factoring):
(219/383) = −(383/219) (Jacobi), as 383 ≡ 219 ≡ 3 (mod 4);
−(383/219) = −(164/219) = −(4/219)·(41/219) = −(41/219), as 164 = 2^2·41;
−(41/219) = −(219/41) (Jacobi), as 41 ≡ 1 (mod 4);
−(219/41) = −(14/41) = −(2/41)·(7/41) = −(7/41), as 41 ≡ 1 (mod 8);
−(7/41) = −(41/7), as 41 ≡ 1 (mod 4);
−(41/7) = −(6/7) = −(−1/7) = 1, as 7 ≡ 3 (mod 4).
The congruence is therefore solvable. □

Then we have
[2ak/p] = [2[ak/p] + 2⟨ak/p⟩] = 2·[ak/p] + [2⟨ak/p⟩].
This expression is odd if and only if ⟨ak/p⟩ ≥ 1/2, which holds iff the least residue (in absolute value) of the integer ak modulo p is negative (a watchful reader should notice the return from the calculation of (ostensibly) irrelevant expressions back to conditions close to the Legendre symbol). The integer μ_p(a) thus has the same parity as (is congruent modulo 2 to) Σ_{k=1}^{(p−1)/2} [2ak/p], whence (thanks to Gauss's lemma) we get
(−1)^{μ_p(a)} = (−1)^{Σ_{k=1}^{(p−1)/2} [2ak/p]}.
Furthermore, if a is odd, then a + p is even, and we get
(2a/p) = ((2a + 2p)/p) = ((4·(a+p)/2)/p) = (((a+p)/2)/p) = (−1)^{Σ_{k=1}^{(p−1)/2} [(a+p)k/p]} = (−1)^{Σ_{k=1}^{(p−1)/2} [ak/p] + Σ_{k=1}^{(p−1)/2} k}.
Since the sum of the arithmetic series Σ_{k=1}^{(p−1)/2} k equals (1/2)·((p−1)/2)·((p+1)/2) = (p^2−1)/8, we get (for a odd) the relation
(2a/p) = (−1)^{(p^2−1)/8} · (−1)^{Σ_{k=1}^{(p−1)/2} [ak/p]},
which, for a = 1, gives the wanted statement of item (2). By part (2), which we have already proved, and the previous equality, we now get for odd integers a that
(1)  (a/p) = (−1)^{Σ_{k=1}^{(p−1)/2} [ak/p]}.
Now, let us consider, for given distinct odd primes p, q, the set
T = {q·x ; x ∈ Z, 1 ≤ x ≤ (p−1)/2} × {p·y ; y ∈ Z, 1 ≤ y ≤ (q−1)/2}.
We apparently have |T| = ((p−1)/2)·((q−1)/2). We will show that we also have
(−1)^{|T|} = (−1)^{Σ_{y=1}^{(q−1)/2} [py/q]} · (−1)^{Σ_{x=1}^{(p−1)/2} [qx/p]},
which will be sufficient thanks to the above. Since the equality qx = py can happen for no pair x, y from the permissible domain, the set T can be partitioned into the disjoint subsets T₁ = T ∩ {(u, v); u, v ∈ Z, u < v} and T₂ = T \ T₁. Clearly, |T₁| is the number of pairs (qx, py) for which qx < py. Since y ≤ (q−1)/2 implies py/q < p/2, we have [py/q] ≤ (p−1)/2. For a fixed y, T₁ thus contains exactly those pairs (qx, py) for which 1 ≤ x ≤ [py/q]; hence |T₁| = Σ_{y=1}^{(q−1)/2} [py/q]. Analogously, |T₂| = Σ_{x=1}^{(p−1)/2} [qx/p]. By (1), we thus have (p/q) = (−1)^{|T₁|} and (q/p) = (−1)^{|T₂|}, which finishes the proof of the law of quadratic reciprocity. □

The evaluation of the Legendre symbol (as we saw in the example above) allows us to use the law of quadratic reciprocity only for primes, so it forces us to factor integers into primes, which is a very hard operation from the computational point of view. This can be mended by extending the definition of the Legendre symbol to the so-called Jacobi symbol with similar properties.

Definition. Let a ∈ Z, b ∈ N, 2 ∤ b. Let b factor as b = p₁p₂⋯p_k into (odd) primes (here, we exceptionally do not group equal primes into a power of the prime, but write each one explicitly, e.g., 135 = 3·3·3·5). The symbol
(a/b) = (a/p₁)·(a/p₂)⋯(a/p_k)
is called the Jacobi symbol.

We show in the practical column that the Jacobi symbol has properties similar to the Legendre one. However, there is a substantial aberration: it is not generally true that (a/b) = 1 implies that the congruence x^2 ≡ a (mod b) is solvable.

Example. (2/15) = (2/3)·(2/5) = (−1)·(−1) = 1, but the congruence x^2 ≡ 2 (mod 15) has no solution (the congruence x^2 ≡ 2 is solvable neither modulo 3 nor modulo 5).

Theorem (Law of quadratic reciprocity for the Jacobi symbol). Let a, b ∈ N be odd. Then,
(1) (−1/b) = (−1)^{(b−1)/2},
(2) (2/b) = (−1)^{(b^2−1)/8},
(3) (a/b) = (b/a)·(−1)^{((a−1)/2)·((b−1)/2)}.

Proof. The proof is simple, utilizing the law of quadratic reciprocity for the Legendre symbol; see exercise 11.C.35. □

There is another application of the law of quadratic reciprocity, in a certain sense a converse one: we can consider the question for which primes a given integer a is a quadratic residue. We are already able to answer this question for a = 2, for example. The first step in answering this question is to do so for primes a, since the answer

Now, we introduce several exercises proving that the Jacobi symbol has properties similar to the Legendre one, which relieves us of the necessity to factor the integers that appear when working purely with the Legendre symbol.

11.C.32.
Prove that all odd positive integers b, b′ and all integers a, a₁, a₂ satisfy (all symbols used here are Jacobi symbols):
i) if a₁ ≡ a₂ (mod b), then (a₁/b) = (a₂/b);
ii) (a₁a₂/b) = (a₁/b)·(a₂/b);
iii) (a/(bb′)) = (a/b)·(a/b′). O

11.C.33. Prove that if a, b are odd natural numbers, then
i) (ab − 1)/2 ≡ (a − 1)/2 + (b − 1)/2 (mod 2),
ii) (a^2 b^2 − 1)/8 ≡ (a^2 − 1)/8 + (b^2 − 1)/8 (mod 2).

Solution. i) Since the integer (a − 1)(b − 1) = (ab − 1) − (a − 1) − (b − 1) is a multiple of 4, we get ab − 1 ≡ (a − 1) + (b − 1) (mod 4), which gives what we want when divided by two.
ii) Similarly, (a^2 − 1)(b^2 − 1) = (a^2 b^2 − 1) − (a^2 − 1) − (b^2 − 1) is a multiple of 16. Therefore, a^2 b^2 − 1 ≡ (a^2 − 1) + (b^2 − 1) (mod 16), which gives the wanted statement when divided by eight (see also exercise 11.A.2). □

for composite values of a depends on the factorization of the integer a.

Theorem. Let q be an odd prime.
• If q ≡ 1 (mod 4), then q is a quadratic residue modulo exactly those primes p which satisfy p ≡ r (mod q), where r is a quadratic residue modulo q.
• If q ≡ 3 (mod 4), then q is a quadratic residue modulo exactly those primes p which satisfy p ≡ ±b^2 (mod 4q), where b is odd and coprime to q.

Proof. The first statement follows directly from the law of quadratic reciprocity. Let us consider q ≡ 3 (mod 4), i.e., (q/p) = (−1)^{(p−1)/2}·(p/q). First of all, let p ≡ +b^2 (mod 4q), where b is odd, and hence b^2 ≡ 1 (mod 4). Then p ≡ b^2 ≡ 1 (mod 4) and p ≡ b^2 (mod q). Therefore, (−1)^{(p−1)/2} = 1 and (p/q) = 1, whence (q/p) = 1. Now, if p ≡ −b^2 (mod 4q), then we similarly get that p ≡ −b^2 ≡ 3 (mod 4) and p ≡ −b^2 (mod q). Therefore, (−1)^{(p−1)/2} = −1 and (p/q) = (−b^2/q) = (−1/q) = −1, whence we get again that (q/p) = 1.
For the opposite direction, suppose that (q/p) = 1. There are two possibilities: either (−1)^{(p−1)/2} = 1 and (p/q) = 1, or (−1)^{(p−1)/2} = −1 and (p/q) = −1. In the former case, we have p ≡ 1 (mod 4) and there is a b such that p ≡ b^2 (mod q). Further, we can assume without loss of generality that b is odd (if not, we could have taken b + q instead).
However, then we get b^2 ≡ 1 ≡ p (mod 4), and altogether p ≡ b^2 (mod 4q). In the latter case, we have p ≡ 3 (mod 4) and (−p/q) = (−1/q)·(p/q) = (−1)·(−1) = 1. Therefore, there is a b (which can also be chosen odd) such that −p ≡ b^2 (mod q). We thus get −b^2 ≡ 3 ≡ p (mod 4), and altogether p ≡ −b^2 (mod 4q). □

5. Diophantine equations

It is as early as in the third century AD that Diophantus of Alexandria dealt with miscellaneous equations while admitting only integers as solutions. And no wonder: in many practical problems that lead to equations, non-integer solutions may fail to have a meaningful interpretation. As an example, we can consider the problem of how to pay an exact amount of money with coins of given values. In honor of Diophantus, equations for which we are interested in integer solutions only are called Diophantine equations. Another nice example of a Diophantine equation is Euler's relation v − e + f = 2 from graph theory, connecting the numbers of vertices, edges, and faces of a planar graph. Furthermore, if we restrict ourselves to regular graphs only, we get to the problem of the existence of the so-called Platonic solids, which can be smartly described just as solutions of this Diophantine equation; for more information, see 13.1.22.

Unfortunately, there is no universal method for solving this kind of equations. There is even no method (algorithm) to decide whether a given polynomial Diophantine equation has a solution. This question is well known as Hilbert's tenth problem, and the proof of the algorithmic unsolvability of this problem was given by Yuri Matiyasevich in 1970.7 However, there are cases in which we are able to find the solution of a Diophantine equation, or at least to reduce the problem to solving congruences, which is, besides the already mentioned applications, another motivation for studying them. Now, we will describe several such types of Diophantine equations.

7 See the elementary text M. Davis, Hilbert's Tenth Problem is Unsolvable, The American Mathematical Monthly 80(3): 233-269, 1973.

11.C.34. Prove that if a₁, ..., a_k are odd natural numbers, then
i) (a₁⋯a_k − 1)/2 ≡ Σ_{i=1}^{k} (a_i − 1)/2 (mod 2),
ii) (a₁^2⋯a_k^2 − 1)/8 ≡ Σ_{i=1}^{k} (a_i^2 − 1)/8 (mod 2). O

11.C.35. Prove the law of quadratic reciprocity for the Jacobi symbol, i.e., prove that if a, b are odd natural numbers, then
i) (−1/a) = (−1)^{(a−1)/2},
ii) (2/a) = (−1)^{(a^2−1)/8},
iii) (a/b) = (b/a)·(−1)^{((a−1)/2)·((b−1)/2)}.

Solution. Let (just like in the definition of the Jacobi symbol) a factor into (odd) primes as a = p₁p₂⋯p_k.
i) The properties of the Legendre symbol and the aforementioned statement imply that
(−1/a) = (−1/p₁)·(−1/p₂)⋯(−1/p_k) = (−1)^{(p₁−1)/2}⋯(−1)^{(p_k−1)/2} = (−1)^{Σ_{i=1}^{k} (p_i−1)/2} = (−1)^{(a−1)/2}.
ii) Analogously.
iii) Further, let b factor into (odd) primes as b = q₁q₂⋯q_l. If we have p_i = q_j for some i and j, then the symbols on both sides of the equality are equal to zero. Otherwise, the law of quadratic reciprocity for the Legendre symbol implies that for all pairs (p_i, q_j) we have
(p_i/q_j) = (q_j/p_i)·(−1)^{((p_i−1)/2)·((q_j−1)/2)}.
Therefore,
(a/b) = ∏_{i=1}^{k} ∏_{j=1}^{l} (p_i/q_j) = ∏_{i=1}^{k} ∏_{j=1}^{l} (q_j/p_i) · (−1)^{Σ_{i=1}^{k} Σ_{j=1}^{l} ((p_i−1)/2)·((q_j−1)/2)} = (b/a) · (−1)^{(Σ_{i=1}^{k} (p_i−1)/2)·(Σ_{j=1}^{l} (q_j−1)/2)} = (b/a)·(−1)^{((a−1)/2)·((b−1)/2)}.
We utilized the result of part (i) of the previous exercise in the calculations. □

Linear Diophantine equation. A linear Diophantine equation is an equation of the form
a₁x₁ + a₂x₂ + ⋯ + aₙxₙ = b,
where x₁, ..., xₙ are unknowns and a₁, ..., aₙ, b are given non-zero integers. We can see that the ability to solve Diophantine equations is sometimes important in "practical" life as well, as is proved by Bruce Willis and Samuel Jackson in Die Hard with a Vengeance, where they have to do away with a bomb using 4 gallons of water, having only 3- and 5-gallon containers at their disposal. A mathematician would say that the gentlemen were to find a solution of the Diophantine equation 3x + 5y = 4.

One can use congruences in order to solve these equations. Apparently, it is necessary for the equation to be solvable that the integer d = (a₁, ..., aₙ) divides b. Provided that, dividing both sides of the equation by the number d leads to an equivalent equation a′₁x₁ + a′₂x₂ + ⋯ + a′ₙxₙ = b′, where a′ᵢ = aᵢ/d for i = 1, ..., n and b′ = b/d. Here, we have
d·(a′₁, ..., a′ₙ) = (da′₁, ..., da′ₙ) = (a₁, ..., aₙ) = d,
so (a′₁, ..., a′ₙ) = 1. Further, we will show that the equation
a₁x₁ + a₂x₂ + ⋯ + aₙxₙ = b,
where a₁, a₂, ..., aₙ, b are integers such that (a₁, ..., aₙ) = 1, always has a solution in integers, and that all such solutions can be described in terms of n − 1 integer parameters. We will prove this proposition by mathematical induction on n, the number of unknowns. The situation is trivial for n = 1: there is a unique solution (which does not depend on any parameters). Further, let n ≥ 2 and suppose that the statement holds for equations with n − 1 unknowns. Denoting d = (a₁, ..., a_{n−1}), any n-tuple x₁, ..., xₙ that satisfies the equation must also satisfy the congruence
a₁x₁ + a₂x₂ + ⋯ + aₙxₙ ≡ b (mod d).
Since d is the greatest common divisor of the integers a₁, ..., a_{n−1}, this congruence is of the form aₙxₙ ≡ b (mod d), which (since (d, aₙ) = (a₁, ..., aₙ) = 1) has a unique solution xₙ ≡ c (mod d), where c is a suitable integer, i.e., xₙ = c + d·t, where t ∈ Z is arbitrary. Substituting into the original equation and rearranging leads to the equation
a₁x₁ + ⋯ + a_{n−1}x_{n−1} = b − aₙc − aₙdt
with n − 1 unknowns and one parameter t. However, the number (b − aₙc)/d is an integer, so we can divide the equation by d. This leads to a′₁x₁ + ⋯ + a′_{n−1}x_{n−1} = b′, where a′ᵢ = aᵢ/d for i = 1, ..., n − 1 and b′ = ((b − aₙc)/d) − aₙt, satisfying
(a′₁, ..., a′_{n−1}) = (da′₁, ..., da′_{n−1})/d = (a₁, ..., a_{n−1})/d = 1.
By the induction hypothesis, this equation has, for any t ∈ Z, a solution which can be described in terms of n − 2 integer parameters (different from t), which together with the condition xₙ = c + dt gives what we wanted.

11.5.1. Pythagorean equation. In this section, we will deal with the enumeration of all right triangles with integer side lengths. This is a Diophantine equation for which we will only seldom use the methods described above; nevertheless, we will look at it in detail. The task is to solve the equation
x^2 + y^2 = z^2
in integers.

Solution. Clearly, we can assume that (x, y, z) = 1 (otherwise, we simply divide x, y, z by the integer d = (x, y, z)). Further, we can show that the integers x, y, z are pairwise coprime: if there were a prime p dividing two of them,

In the former case (using the result of the previous item), this means that p ≡ 1 (mod 4) and p ≡ ±1 (mod 12). In the latter case, we must have p ≡ −1 (mod 4) and, at the same time, p ≡ ±5 (mod 12); here we can take, for instance, the set {−5, −1, 1, 5} for a reduced residue system modulo 12, and since (3/p) = 1 for p ≡ ±1 (mod 12), we surely have (3/p) = −1 whenever p ≡ ±5 (mod 12). We have thus obtained four systems of two congruences each. Two of them have no solution, and the remaining two are satisfied by p ≡ 1 (mod 12) and p ≡ −5 (mod 12), respectively.
iii) In this case, (6/p) = (2/p)·(3/p), and once again there are two possibilities: either (2/p) = (3/p) = 1 or (2/p) = (3/p) = −1. The former case occurs if p satisfies p ≡ ±1 (mod 8) as well as p ≡ ±1 (mod 12). Solving the corresponding systems of linear congruences leads to the condition p ≡ ±1 (mod 24). In the latter case, we get p ≡ ±3 (mod 8) as well as p ≡ ±5 (mod 12), which together gives p ≡ ±5 (mod 24).
Let us remark that, thanks to Dirichlet's theorem 11.2.5, the number of primes we were interested in is infinite in each of the three problems. □

11.C.38. The following exercise illustrates that if the modulus of a quadratic congruence is a prime p satisfying p ≡ 3 (mod 4), then we are able not only to decide the solvability of the congruence, but also to describe all of its solutions in a simple way. Consider a prime p ≡ 3 (mod 4) and an integer a such that (a/p) = 1. Prove that the solutions of the congruence x^2 ≡ a (mod p) are x ≡ ±a^{(p+1)/4} (mod p).

Solution. It can be easily verified (using lemma 11.4.12) that
(a^{(p+1)/4})^2 = a^{(p+1)/2} = a·a^{(p−1)/2} ≡ a·(a/p) = a (mod p). □

11.C.39. Determine whether the congruence x^2 ≡ 3 (mod 59) is solvable.
For this task it is sufficient to prove that the equation x4 + y4 = z2 has no solution inN. Solution. We will use the so-called method of infinite descent, which was introduced by Pierre de Fermat. This method utilizes the fact that every non-empty set of natural numbers has a least element (in other words, N is a well-ordered set). Therefore, suppose that the set of solutions of the equation x4 + y4 = z2 is non-empty and let (x, y, z) denote (any) solution with z as small as possible. The integers x, y, z are thus pairwise distinct. Since the equation can be written in the form (x2)2 + (y2)2 = z2, 817 CHAPTER 11. ELEMENTARY NUMBER THEORY D. Diophantine equations Here, we limit ourselves only to the small class of equations which can be solved using divisibility or can be reduced to solving congruences. ll.D.l. Linear Diophantine equations. Decide whether it is possible to use a balance scale to weigh 50 grams of given goods provided we have only (an arbitrary number of) three kinds of masses; their weights are 770, 630, and 330 grams, respectively. If so, how to do that? Solution. Our task is to solve the equation 770a; + 630y + 330z = 50, where x,y,z G Z (a negative value in the solution would mean that we put the corresponding masses on the other scale). Dividing both sides of the equation by (770,630,330) = 10, we get an equivalent equation 77a; + 63y + 33z = 5. Considering this equation modulo (77,63) following linear congruence: 7, we get the 33z = 5 5z = 5 z = 1 (mod 7), (mod 7), (mod 7). This congruence is thus satisfied by those integers z of the form z = 1 + 74, where t is an integer parameter. Substituting the form of z into the original equation, we get 77a;+ 63y = 5 - 33(1 +74), llx + 9y = -4 - 334. Now, we consider this (parametrized) equation modulo 11: 9y = -4 - 334 (mod 11), -2y = -4 (mod 11), y = 2 (mod 11). Therefore, this congruence is satisfied by integers y = 2+lis for any s e Z. 
Now, it only remains to calculate x: Ux = -4 - 334 - 9(2 + lis), llx = -22 - 334 - 9- lis, x = -2-3t- 9s. it follows from the previous exercise that there exist r, s e N such that 2 o 2 2 2 2 i 2 x = 2rs, y = r — s , z = r +s. Hence, y2 + s2 = r2, where (y, s) = 1 (if there were a prime p dividing both y and s, then it would divide x as well as z, which contradicts that they are coprime). Making the Pythagorean substitution once again, we get natural numbers a, b with (y is odd) y = a2-b2, s = 2ab, r = a2 + b2. The inverse substitution leads to a;2 = 2rs = 2-2a&(a2 + &2), and since x is even, we get ^y = ab(a2 + b2). The integers a, b, a2 + b2 are pairwise coprime (which can be derived easily from the fact that y is coprime to s). Therefore, each of them is a square of a natural number: a = c2, b = d2, a2 + b2 = e2, whence c4 + d4 = e2, and since e < a2 + b2 = r < z, we get a contradiction with the minimality of z. □ 6. Applications - calculation with large integers, cryptography 11.6.1. Computational aspects of number theory. In many practical problems which utilize the re- .'•**rf~-1?F"1 suits of number theory, it is necessary to execute '<^^5r_' one or more of the following computations fast: • common arithmetic operations (sum, product, modulo) on integers; • to determine the remainder of a (natural) n-th power of an integer a when divided by a given m; • to determine the multiplicative inverse of an integer a modulo mei; • to determine the greatest common divisor of two integers (and the coefficients of corresponding Bezout's identity); • to decide whether a given integer is a prime or composite number. • to factor a given integer to primes. Basic arithmetic operations are usually executed on large integers in the same way as we were taught at primary school, i.e., we add in linear time and multiply and divide with remainder in quadratic time. 
The multiplication, which is a base for many other operations, can be performed asymptotically more efficiently (there exist algorithms of the type divide and conquer) - for instance, the Karatsuba algorithm (1960), running in time & (nlog2 3) or the Schonhage-Strassen algorithm (1971), which runs in 0(n log nlog log n) and uses Fast Fourier Transforms - see also 7.2.5. Although it is asymptotically much better, in practice, it becomes advantageous for integers of at least ten thousand digits (it is thus 818 CHAPTER 11. ELEMENTARY NUMBER THEORY We have found out that the equation is satisfied if and only if (x, y, z) is in the set {(-2-3f-9s,2 + lls,l + 7f);s,f £ Z}. Particular solutions can be obtained by evaluating the triple at concrete values of t, s. For instance, setting t = s = 0 gives the triple (—2,2,1); putting t = —4, s = 1 leads to (1,13,-27). Of course, the unknowns can be eliminated in any order -the result may seem "syntactically" different, but it must still describe the same set of solutions (that is given by a particular coset of an appropriate subgroup (in our case, it is the subgroup (2, 2,1) + (3,0,7)Z+ (-9,11,0)Z) in the commutative group Z3, which is an apparent analog to the fact that the solution of such an equation over a field forms an affine subspace of the corresponding vector space). □ Other types of Diophantine equations reducible to congruences. Some Diophantine equations are such that one of the unknowns can be expressed explicitly as a function of the other ones. In this case, it makes sense to examine for which integer arguments it holds that the value of the function is also an integer. For instance, having an equation of the form mxn = f(x1,... ,xn-i), where m is a natural number and f(xi,...,xn-i) e Z[xi,..., xn-i] is a polynomial with integer coefficients, an n-tuple of integers x\,..., xn is a solution of it if and only if f{xi, ■ ■ ■ ,Zn-i) = 0 (mod m). 11.D.2. Solve the Diophantine equation x (x + 3) = Ay — 1. Solution. 
The equation can be rewritten as 4y = x^2 + 3x + 1. Now, we will solve the congruence x^2 + 3x + 1 ≡ 0 (mod 4). This congruence has no solution since for any integer x, the polynomial x^2 + 3x + 1 evaluates to an odd integer (the fact that the congruence is not solvable can also be established by trying out all four possible remainders modulo 4). □

11.D.3. Solve the following equation in integers: 379x + 314y + 183y^2 = 210.

used, for example, when looking for large primes in the GIMPS project).

11.6.2. Greatest common divisor and modular inverses. As we have already shown, the computation of the solution of the congruence a·x ≡ 1 (mod m) in the variable x can be easily reduced (thanks to Bezout's identity) to the computation of the greatest common divisor of the integers a and m and looking for the coefficients k, l in Bezout's identity k·a + l·m = 1 (the integer k is then the wanted inverse of a modulo m).

function extended_gcd(a, m)
    if m == 0:
        return (1, 0)
    else
        (q, r) := divide(a, m)
        (k, l) := extended_gcd(m, r)
        return (l, k - q*l)

A thorough analysis shows that the problem of computing the greatest common divisor has quadratic time complexity.

11.6.3. Modular exponentiation. The algorithm for modular exponentiation is based on the idea that when computing, for instance, 2^64 mod 1000, one need not calculate 2^64 and then divide it with remainder by 1000; it is better to multiply the 2's gradually and reduce the temporary result modulo 1000 whenever it exceeds this value. More importantly, there is no need to perform such a huge number of multiplications: in this case, 63 naive multiplications can be replaced with six squarings, as 2^64 = ((((((2)^2)^2)^2)^2)^2)^2.
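The extended_gcd pseudocode of 11.6.2 can be rendered directly in Python; the sketch below (function names are ours) also adds the derived modular inverse:

```python
def egcd(a, m):
    """Return (k, l) with k*a + l*m = gcd(a, m), following the recursion above."""
    if m == 0:
        return (1, 0)
    q, r = divmod(a, m)
    k, l = egcd(m, r)
    return (l, k - q * l)

def mod_inverse(a, m):
    """Multiplicative inverse of a modulo m, assuming (a, m) = 1."""
    k, _ = egcd(a, m)
    return k % m
```

For the pair (183, 379) used in exercise 11.D.3 this recovers the Bezout identity 29·183 - 14·379 = 1.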
function modular_pow(base, exp, mod)
    result := 1
    while exp > 0
        if (exp % 2 == 1):
            result := (result * base) % mod
        exp := exp >> 1
        base := (base * base) % mod
    return result

The algorithm squares the base modulo n for every binary digit of the exponent (which can be done in quadratic time in the worst case), and it performs a multiplication for every one in the binary representation of the exponent. Altogether, we are able to do modular exponentiation in cubic time in the worst case. We can also notice that the complexity depends a good deal on the binary form of the exponent. See, for example, D. Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Addison-Wesley 1997, or Wikipedia, Euclidean algorithm, http://en.wikipedia.org/wiki/Euclidean_algorithm (as of July 29, 2017).

Example. Let us compute 2^560 (mod 561).

Solution. The equation is linear in x, so the other unknown, y, must satisfy the congruence 183y^2 + 314y - 210 ≡ 0 (mod 379). Now, we can complete the left-hand polynomial to a square in order to get rid of the linear term. First of all, we must find a t ∈ Z such that 183·t ≡ 1 (mod 379). (In other words, we need to determine the inverse of the integer 183 modulo 379.) For this purpose, we will use the Euclidean algorithm: 379 = 2·183 + 13, 183 = 14·13 + 1, whence 1 = 183 - 14·13 = 183 - 14·(379 - 2·183) = 29·183 - 14·379. Therefore, we can take, for instance, the integer 29 to be our t. Now, multiplying both sides of the congruence by t = 29 and rearranging it, we get an equivalent congruence: y^2 + 10y - 26 ≡ 0 (mod 379). Completing the left-hand polynomial to a square leads (substituting z = y + 5) to z^2 = (y + 5)^2 ≡ 26 + 25 = 51 (mod 379).
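The modular_pow pseudocode above transcribes directly into Python (the built-in three-argument pow(base, exp, mod) performs the same computation):

```python
def modular_pow(base, exp, mod):
    """Right-to-left binary exponentiation, as in the pseudocode above."""
    result = 1
    base %= mod
    while exp > 0:
        if exp % 2 == 1:              # current binary digit of the exponent is 1
            result = (result * base) % mod
        exp >>= 1
        base = (base * base) % mod    # square for the next binary digit
    return result
```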
Invoking the law of quadratic reciprocity, we calculate the Legendre symbol (51/379) = (3/379)·(17/379). Since 379 ≡ 3 (mod 4), we have (3/379) = -(379/3) = -(1/3) = -1; since 17 ≡ 1 (mod 4), we have (17/379) = (379/17) = (5/17) = (17/5) = (2/5) = -1. Hence (51/379) = (-1)·(-1) = +1, whence it follows that the congruence is solvable, and, in particular, it has two solutions modulo 379. The proposition of exercise 11.C.38 implies that the solutions are of the form z ≡ ±51^((379+1)/4) = ±51^95 (mod 379). Since 51^3 ≡ 1 (mod 379), we get 51^95 = (51^3)^31 · 51^2 ≡ 51^2 ≡ -52 (mod 379). The solution is thus z ≡ ±52 (mod 379), which gives for the original unknown y ≡ 47 (mod 379), y ≡ -57 (mod 379). Therefore, the given Diophantine equation is satisfied by those pairs (x, y) with y ∈ {47 + 379·k; k ∈ Z} ∪ {-57 +

Since 560 = (1000110000)_2, the mentioned algorithm computes as follows:

exp   base   result
560   2      1
280   4      1
140   16     1
70    256    1
35    460    1
17    103    460
8     511    256
4     256    256
2     460    256
1     103    256
0     511    1

Therefore, 2^560 ≡ 1 (mod 561).

11.6.4. Primality testing. Although we have the Fundamental theorem of arithmetic, which guarantees that every natural number can be uniquely factored to a product of primes, this operation is very hard from the computational point of view. In practice, it is usually done in the following steps:
(1) finding all divisors below a given threshold (by trying all primes up to the threshold, which is usually somewhere around 10^6);
(2) testing the remaining factor for compositeness (deciding whether some necessary condition for primality holds);
(a) if the compositeness test did not find the integer to be composite, i.e., it is likely to be a prime, then we test it for primality to verify that it is indeed a prime;
(b) if the compositeness test proved that the integer was composite, then we try to find a non-trivial divisor.
The mentioned steps are executed in this order because the corresponding algorithms are gradually (and strongly) increasing in time complexity.
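The table above can be reproduced by instrumenting the exponentiation loop (a sketch; the row layout and function name are ours):

```python
def modular_pow_trace(base, exp, mod):
    """Return the (exp, base, result) states of binary exponentiation, row by row."""
    result = 1
    rows = [(exp, base, result)]      # the initial state is the first table row
    while exp > 0:
        if exp % 2 == 1:              # odd exponent: multiply the result in
            result = (result * base) % mod
        exp >>= 1
        base = (base * base) % mod
        rows.append((exp, base, result))
    return rows
```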
In 2002, Agrawal, Kayal, and Saxena published an algorithm for primality testing in polynomial time, but it is still more efficient to use the above procedure in practice.

11.6.5. Compositeness tests - how to recognize composite numbers with certainty? The so-called compositeness tests check some necessary condition for primality. The easiest of such conditions is Fermat's little theorem.

Proposition (Fermat's test). Let N be a natural number. If there is an a ≢ 0 (mod N) such that a^(N-1) ≢ 1 (mod N), then N is not a prime.

Unfortunately, for a composite N, it still may not be easy to find such an integer a which reveals the compositeness of N. There are even exceptional integers N for which the only integers a with the mentioned property are those which are not coprime to N. To find them is thus equivalent to finding a divisor, and thus to factoring N to primes.

379·k; k ∈ Z} and x = (210 - 314y - 183y^2)/379; e.g. (-1105, 47) or (-1521, -57) (which are the only solutions with |x| < 10^5). □

11.D.4. Solve the equation 2^x = 1 + 3^y in integers. Solution. If y < 0, then 1 < 1 + 3^y < 2, whence 0 < x < 1, so x could not be an integer. Therefore, y ≥ 0, hence 2^x = 1 + 3^y ≥ 2 and x ≥ 1. We will show that we also must have x ≤ 2. If not (i.e., if x ≥ 3), then we would have 1 + 3^y = 2^x ≡ 0 (mod 8), whence it follows that 3^y ≡ -1 (mod 8). However, this is impossible since the order of 3 modulo 8 equals 2, so the powers of three are congruent to 3 and 1 only. Now, it remains to examine the possibilities x = 1 and x = 2. For x = 1, we get 3^y = 2^1 - 1 = 1, hence y = 0. If x = 2, we have 3^y = 2^2 - 1 = 3, whence y = 1. Thus, the equation has two solutions: x = 1, y = 0; and x = 2, y = 1. □

E. Primality tests

11.E.1. Mersenne primes. The following problems are in deep connection with testing Mersenne numbers for primality. For any q ∈ N, consider the integer M_q = 2^q - 1 and prove:
i) If q is composite, then so is M_q.
ii) If q is a prime, q ≡ 3 (mod 4), then 2q + 1 divides M_q if and only if 2q + 1 is a prime (hence it follows that if, moreover, 2q + 1 is a prime, then M_q is composite for q > 3).

There are indeed such ugly (or extremely nice?) composite numbers N for which every integer a which is coprime to N satisfies a^(N-1) ≡ 1 (mod N). These are called Carmichael numbers, the least of which⁹ is 561 = 3·11·17, and it was no sooner than in 1992 that it was proved¹⁰ that there are even infinitely many of them.

Example. We will prove that 561 is a Carmichael number, i.e., that every a ∈ N which is coprime to 561 = 3·11·17 satisfies a^560 ≡ 1 (mod 561). Thanks to the properties of congruences, we know that it suffices to prove this congruence modulo 3, 11, and 17. However, this can be obtained straight from Fermat's little theorem since such an integer a satisfies a^2 ≡ 1 (mod 3), a^10 ≡ 1 (mod 11), a^16 ≡ 1 (mod 17), where all of 2, 10, and 16 divide 560, hence a^560 ≡ 1 modulo 3, 11 as well as 17 for all integers a coprime to 561 (see also Korselt's criterion mentioned below).

11.6.6. Proposition (Korselt's criterion). A composite number n is a Carmichael number if and only if both of the following conditions hold:
• n is square-free (divisible by the square of no prime),
• p - 1 | n - 1 holds for all primes p which divide n.

Proof. "⟸" We will show that if n satisfies the above two conditions and is composite, then every a ∈ Z which is coprime to n satisfies a^(n-1) ≡ 1 (mod n). Let us thus factor n into the product of distinct odd primes: n = p_1···p_k, where p_i - 1 | n - 1 for all i ∈ {1, ..., k}. Since (a, p_i) = 1, we get from Fermat's little theorem that a^(p_i - 1) ≡ 1 (mod p_i), whence (thanks to the condition p_i - 1 | n - 1) it also follows that a^(n-1) ≡ 1 (mod p_i). This is true for all indices i, hence a^(n-1) ≡ 1 (mod n), so n is indeed a Carmichael number.

"⟹" A Carmichael number n cannot be even since then we would get for a = -1 that a^(n-1) ≡ -1 (mod n), which would (since a^(n-1) ≡ 1 (mod n)) mean that n is equal to 2 (and thus is not composite).
Therefore, let n factor as n = p_1^(α_1)···p_k^(α_k), where p_i are distinct odd primes and α_i ∈ N. Thanks to theorem 11.3.8, we can choose for every i a primitive root g_i modulo p_i^(α_i), and the Chinese remainder theorem then yields an integer a which satisfies a ≡ g_i (mod p_i^(α_i)) for all i and which is apparently coprime to n. Further, we know from the assumption that a^(n-1) ≡ 1 (mod n), so this holds modulo p_i^(α_i), and thus g_i^(n-1) ≡ 1 (mod p_i^(α_i)) as well. Since g_i is a primitive root modulo p_i^(α_i), the integer n - 1 must be a multiple of its order, i.e., a multiple of φ(p_i^(α_i)) = p_i^(α_i - 1)·(p_i - 1). If some α_i were greater than 1, then p_i would divide n - 1, which is impossible since p_i | n. Hence all α_i = 1, so n is square-free, and p_i - 1 | n - 1 for all i. □

ii) Let n = 2q + 1 be a divisor of M_q. We will show that n is a prime by invoking Lucas' theorem 11.6.10. Since n - 1 = 2q has only two prime divisors, it suffices to find witnesses for the integers 2 and q. We have (-2)^((n-1)/q) = (-2)^2 = 4 ≢ 1 (mod n) and (-2)^((n-1)/2) = -2^q ≡ -1 ≢ 1 (mod n), thanks to the assumption n | M_q = 2^q - 1. Further, since (-2)^(n-1) = 2^(2q) and 2^(2q) - 1 = (2^q + 1)·M_q ≡ 0 (mod n), it follows from Lucas' theorem that n is a prime.

Now, let p = 2q + 1 ≡ -1 (mod 8) be a prime. Since (2/p) = 1, there exists an m such that 2 ≡ m^2 (mod p). Hence, 2^q = 2^((p-1)/2) ≡ m^(p-1) ≡ 1 (mod p), so p | 2^q - 1 = M_q.

iii) If p | M_q = 2^q - 1, then the order of 2 modulo p must divide the prime q, hence it equals q. Therefore, q | p - 1, and there exists a k ∈ Z such that 2qk = p - 1. Altogether, we get (2/p) ≡ 2^((p-1)/2) = 2^(qk) ≡ 1 (mod p), i.e., p ≡ ±1 (mod 8). □

11.E.2. For each of the following Mersenne numbers, determine whether it is prime or composite: 2^11 - 1, 2^15 - 1, 2^23 - 1, 2^29 - 1, and 2^83 - 1.

Solution. In the case of the integer 2^15 - 1, the exponent is composite; therefore, the whole integer is composite as well (we even know that it is divisible by 2^3 - 1 and 2^5 - 1). In the other cases, the exponent is always a prime. We can notice that these primes, namely q = 11, 23, 29, and 83, are even Sophie Germain primes (i.e., 2q + 1 is also a prime). It thus follows from part (ii) of the previous exercise that 23 | 2^11 - 1, 47 | 2^23 - 1, and 167 | 2^83 - 1.
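Both Fermat's test and Korselt's criterion (11.6.6) are easy to check mechanically for small numbers. The following is a trial-division sketch, adequate only for modest n; the function names are ours:

```python
def fermat_witness(a, n):
    """True iff the base a proves n composite via Fermat's little theorem."""
    return pow(a, n - 1, n) != 1      # always 1 when n is prime and (a, n) = 1

def prime_factors(n):
    """Return the (prime, exponent) pairs of n, found by trial division."""
    factors, d = [], 2
    while d * d <= n:
        e = 0
        while n % d == 0:
            n //= d
            e += 1
        if e:
            factors.append((d, e))
        d += 1
    if n > 1:
        factors.append((n, 1))
    return factors

def is_carmichael(n):
    """Korselt: n composite, square-free, and p - 1 | n - 1 for every prime p | n."""
    f = prime_factors(n)
    if len(f) < 2:                    # primes are excluded; n must be composite
        return False
    return all(e == 1 and (n - 1) % (p - 1) == 0 for p, e in f)
```

This confirms, for instance, that 341 is exposed by the base 3 although not by the base 2, and that 561, 2465, 2821, and 6601 are Carmichael numbers.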
Fermat's primality test can be slightly improved to Euler's test, or even more with the help of the Jacobi symbol, yet this still does not mend the presented problem completely.

Proposition (Euler's test). Let N be an odd natural number. If there is an integer a ≢ 0 (mod N) such that a^((N-1)/2) ≢ ±1 (mod N), then N is not a prime.

Proof. This follows directly from Fermat's theorem and the fact that for N odd, we have a^(N-1) - 1 = (a^((N-1)/2) - 1)(a^((N-1)/2) + 1). □

Proposition (Euler-Jacobi test). Let N be an odd natural number. If there is an integer a ≢ 0 (mod N) such that a^((N-1)/2) ≢ (a/N) (mod N), then N is not a prime.

Proof. This follows immediately from lemma 11.4.12. □

Example. Let us consider N = 561 = 3·11·17 as before and let a = 5. Then, we have 5^280 ≡ 1 (mod 3) and 5^280 ≡ 1 (mod 11), but 5^280 ≡ -1 (mod 17), so surely 5^280 ≢ ±1 (mod 561). Here, it did not hold that a^((N-1)/2) ≡ ±1 (mod N), so we did not even need to check the value of the Jacobi symbol (5/561). However, the Euler-Jacobi test can often reveal a composite number even in the case when this power is equal to ±1.

Example. Euler's test cannot detect the compositeness of the integer N = 1729 = 7·13·19 since the integer (N-1)/2 = 864 = 2^5·3^3 is divisible by 6, 12, and 18, and so it follows from Fermat's theorem that a^((N-1)/2) ≡ 1 (mod N) holds for all integers a coprime to N. On the other hand, we get already for a = 11 that (11/1729) = -1, so the Euler-Jacobi test is able to recognize the integer 1729 as composite.

Let us notice that the value of the Legendre or Jacobi symbol (a/n) can be computed very efficiently thanks to the law of quadratic reciprocity¹¹, namely in time O((log a)(log n)).

Pseudoprimes

A composite number n is called a pseudoprime if it passes the corresponding test of compositeness without being revealed.
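The reciprocity-based Jacobi symbol computation just mentioned, and with it the Euler-Jacobi test, can be sketched as follows (function names are ours):

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, computed via quadratic reciprocity."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:             # strip factors of two: (2/n) depends on n mod 8
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                   # reciprocity: flip the sign iff both are 3 mod 4
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0    # (0/n) = 0 when the arguments are not coprime

def euler_jacobi_witness(a, n):
    """True iff the base a proves the odd n composite by the Euler-Jacobi test."""
    return pow(a, (n - 1) // 2, n) != jacobi(a, n) % n
```

On the examples above it reports (11/1729) = -1, so the base 11 exposes 1729, while 561 is already exposed by the Euler condition for a = 5.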
We thus have
(1) Fermat pseudoprimes to base a,
(2) Euler (or Euler-Jacobi) pseudoprimes to base a,
(3) strong pseudoprimes to base a, which are composite numbers which pass the following compositeness test.

The subsequent test is simple, yet (as shown in theorem 11.6.8) very efficient. It is a further specification of Fermat's test, which we have introduced at the beginning.

11.6.7. Theorem. Let p be an odd prime. Let us write p - 1 = 2^t·q, where t is a natural number and q is odd. Then, every integer a which is not a multiple of p satisfies a^q ≡ 1 (mod p)

See Wikipedia, Sophie Germain prime, http://en.wikipedia.org/wiki/Sophie_Germain_prime (as of July 28, 2013, 14:43 GMT). See H. Cohen, A Course in Computational Algebraic Number Theory, Springer, 1993.

We cannot use this proposition for the last case since 29 ≢ 3 (mod 4) and, indeed, 59 ∤ 2^29 - 1. Now, however, it follows from part (iii) of the above exercise that if there is a prime p which divides 2^29 - 1, then it must satisfy p ≡ ±1 (mod 8) and p ≡ 1 (mod 29), i.e., p ≡ 1 (mod 232) or p ≡ 175 (mod 232). If we are looking for a prime divisor of the integer n = 2^29 - 1 = 536 870 911, then it suffices to check the primes (of the above form) up to √n ≈ 23170. There are 50 of them, so we are able to decide whether n is a prime quite easily (even with paper and pencil). In this case, fortunately, n is divisible already by the least prime of this form, 233. □

11.E.3. Show that the integer 341 is a Fermat pseudoprime to base 2, yet it is not an Euler-Jacobi pseudoprime to base 2. Further, prove that the integer 561 is an Euler-Jacobi pseudoprime to base 2, but not to base 3. Prove that, on the other hand, the integer 121 is an Euler-Jacobi pseudoprime to base 3, but not to base 2.

Solution. The integer 341 is a Fermat pseudoprime to base 2 since 2^10 ≡ 1 ⟹ 2^340 ≡ 1 (mod 341).
It is not an Euler-Jacobi pseudoprime since 2^170 ≡ 1 (mod 341), but (2/341) = -1, which follows from the fact that 341 ≡ -3 (mod 8). For the integer 561, we have 2^280 ≡ 1 (mod 561) and (2/561) = 1, since 561 ≡ 1 (mod 8). Therefore, it is an Euler-Jacobi pseudoprime to base 2, but not to base 3, since 3 | 561. On the other hand, the integer 121 satisfies 3^5 ≡ 1 (mod 121) ⟹ 3^60 ≡ 1 (mod 121) and (3/121) = 1, but 2^60 ≡ 89 ≢ 1 (mod 121). □

11.E.4. Prove that the integers 2465, 2821, and 6601 are Carmichael numbers, i.e., denoting any of them as n, every integer a coprime to n satisfies a^(n-1) ≡ 1 (mod n).

Solution. We have 2465 = 5·17·29, 2821 = 7·13·31, 6601 = 7·23·41, and the proposition follows from Korselt's criterion 11.6.6 since all of the integers 4, 16, 28 divide 2464 = 2^5·7·11, all of the integers 6, 12, 30 divide 2820 = 2^2·3·5·47, and 6, 22, 40 divide 6600 = 2^3·3·5^2·11. □

or there exists an e ∈ {0, 1, ..., t - 1} such that a^(2^e·q) ≡ -1 (mod p).

Proof. It follows from Fermat's little theorem that p | a^(p-1) - 1 = (a^((p-1)/2) - 1)(a^((p-1)/2) + 1) = (a^((p-1)/4) - 1)(a^((p-1)/4) + 1)(a^((p-1)/2) + 1) = ··· = (a^q - 1)(a^q + 1)(a^(2q) + 1)···(a^(2^(t-1)·q) + 1), whence the statement follows easily since p is a prime. □

Proposition (Miller-Rabin compositeness test). Let N, t, q be natural numbers such that N is odd and N - 1 = 2^t·q, 2 ∤ q. If there is an integer a ≢ 0 (mod N) such that a^q ≢ 1 (mod N) and a^(2^e·q) ≢ -1 (mod N) for all e ∈ {0, 1, ..., t - 1}, then N is not a prime.

Proof. The correctness of the test follows directly from the previous theorem. □

Miscellaneous types of pseudoprimes

In practice, this easy test rapidly increases the ability to recognize composite numbers. The least strong pseudoprime to base 2 is 2047 (while the least Fermat pseudoprime to base 2 was already 341), and considering the bases 2, 3, and 5, the least strong pseudoprime is 25 326 001. In other words, if we are to test integers below 2.5·10^7, then it is sufficient to execute this compositeness test for the bases 2, 3, and 5 only.
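The Miller-Rabin compositeness test from the proposition above translates into a few lines of Python (a sketch; the function name is ours):

```python
def strong_probable_prime(n, a):
    """True iff the odd n > 2 passes the Miller-Rabin test to base a."""
    t, q = 0, n - 1
    while q % 2 == 0:                 # write n - 1 = 2^t * q with q odd
        q //= 2
        t += 1
    x = pow(a, q, n)
    if x in (1, n - 1):               # a^q = 1 or a^q = -1 (the case e = 0)
        return True
    for _ in range(t - 1):            # check a^(2^e * q) for e = 1, ..., t - 1
        x = (x * x) % n
        if x == n - 1:
            return True
    return False                      # a is a witness: n is certainly composite
```

A return value of False certifies compositeness; True only means "no witness found for this base".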
If the tested integer is not revealed to be composite, then it is surely a prime. On the other hand, it has been proved that no finite set of bases is sufficient for testing all natural numbers.

The Miller-Rabin test is a practical application of the previous statement, and we are even able to bound the probability of failure thanks to the following theorem, which we present without a proof¹².

11.6.8. Theorem. Let N > 10 be an odd composite number. Let us write N - 1 = 2^t·q, where t is a natural number and q is odd. Then, at most a quarter of the integers from the

¹²Schoof, René (2004), "Four primality testing algorithms" (PDF), Algorithmic Number Theory: Lattices, Number Fields, Curves and Cryptography, Cambridge University Press, ISBN 978-0-521-80854-5.

11.E.5. Prove that the integer 2047 is a strong pseudoprime to base 2, but not to base 3. Further, prove that the integer 1905 is an Euler-Jacobi pseudoprime to base 2 but not a strong pseudoprime to this base.

Solution. In order to verify whether 2047 is a strong pseudoprime to base 2, we factor 2^2046 - 1 = (2^1023 - 1)(2^1023 + 1). Since 2^1023 ≡ 1 (mod 2047), the statement is true. However, it is not a strong pseudoprime to base 3 as 3^1023 ≡ 1565 ≢ ±1 (mod 2047). Notice that for the integer 2047, the strong pseudoprimality test is identical to the Euler one (this is because the integer 2046 is not divisible by four).

The integer 1905 is an Euler-Jacobi pseudoprime to base 2 since 2^(1904/2) ≡ 1 (mod 1905) and the Jacobi symbol (2/1905) is equal to 1. Since 1904 = 2^4·7·17, 1905 would be a strong pseudoprime to base 2 only if at least one of the following congruences held: 2^952 ≡ -1, 2^476 ≡ -1, 2^238 ≡ -1, or 2^119 ≡ ±1 (mod 1905). However, 2^952 ≡ 2^476 ≡ 1 (mod 1905), 2^238 ≡ 1144 (mod 1905), and 2^119 ≡ 128 (mod 1905). Therefore, 1905 is not a strong pseudoprime to base 2. □

11.E.6.
Applying the Pocklington-Lehmer test 11.6.11, show that 1321 is a prime.

Solution. Let us set N = 1321; then N - 1 = 1320 = 2^3·3·5·11. For the sake of simplicity, we will assume that the trial division is executed only for primes below 10; then F = 2^3·3·5 = 120, U = 11, where (F, U) = (120, 11) = 1. In order to prove the primality of 1321 by the Pocklington-Lehmer test, we need to find a primality witness a_p for each p ∈ {2, 3, 5}. Since (2^440 - 1, 1321) = 1 and (2^264 - 1, 1321) = 1, we can take a_3 = a_5 = 2. However, for p = 2, we have (2^660 - 1, 1321) = 1321, so we have to look for another primality witness. We can take a_2 = 7 since (7^660 - 1, 1321) = 1. In both cases, we have 2^1320 ≡

set {a ∈ Z; 1 ≤ a < N, (a, N) = 1} satisfies the following condition: a^q ≡ 1 (mod N) or there is an e ∈ {0, 1, ..., t - 1} satisfying a^(2^e·q) ≡ -1 (mod N).

In practical implementations, one usually tests about 20 random bases (or the least prime bases). In this case, the above theorem states that the probability of failing to reveal a composite number is less than 2^(-40). The time complexity of the algorithm is the same as in the case of modular exponentiation, i.e., cubic in the worst case. However, we should realize that the test is non-deterministic and the reliability of its deterministic version depends on the so-called generalized Riemann hypothesis (GRH).

11.6.9. Primality tests. Primality tests are usually applied when the used compositeness test claims that the examined integer is likely to be a prime, or they are executed straightaway for special types of integers. Let us first give a list of the best-known tests, which includes historical tests as well as very modern ones.
(1) AKS - a general polynomial primality test discovered by the Indian mathematicians Agrawal, Kayal, and Saxena in 2002.
(2) Pocklington-Lehmer test - a primality test of subexponential complexity.
(3) Lucas-Lehmer test - a primality test for Mersenne numbers.
(4) Pepin's test - a primality test for Fermat numbers from 1877.
(5) ECPP - a primality test based on the so-called elliptic curves.

Now, we will introduce a standard primality test for Mersenne numbers.

Proposition (Lucas-Lehmer test). Let q ≠ 2 be a prime, and let a sequence (s_n), n ≥ 0, be defined recursively by s_0 = 4, s_{n+1} = s_n^2 - 2. Then, the integer M_q = 2^q - 1 is a prime if and only if M_q divides s_{q-2}.

Proof. We will be working in the ring R = Z[√3] = {a + b√3; a, b ∈ Z}, where the division with remainder behaves similarly as in the integers (see also 12.2.5). Let us set α = 2 + √3, β = 2 - √3 and note that α + β = 4, α·β = 1. First, we prove by induction that for all n ∈ N_0,

(1) s_n = α^(2^n) + β^(2^n).

The statement is true for n = 0 since s_0 = 4 = α + β. Now, let us suppose that it is true for n - 1; then s_n = s_{n-1}^2 - 2 is, by the induction hypothesis, equal to (α^(2^(n-1)) + β^(2^(n-1)))^2 - 2 = α^(2^n) + β^(2^n) + 2(αβ)^(2^(n-1)) - 2 = α^(2^n) + β^(2^n).

Wikipedia, Riemann hypothesis, http://en.wikipedia.org/wiki/Riemann_hypothesis (as of July 29, 2017).

7^1320 ≡ 1 (mod 1321). The primality witnesses of the integer 1321 are thus a_2 = 7, a_3 = a_5 = 2. Instead, we could also have chosen for all primes p the same number (e.g. 13), which is a primitive root modulo 1321. □

11.E.7. Factor the integer 221 to primes by Pollard's ρ-method. Use the function f(x) = x^2 + 1 with the initial value x_0 = 2.

Solution. Let us set x = y = 2. The procedure from 11.6.14 gives:

x := f(x)   y := f(f(y))   (|x - y|, 221)
5           26             1
26          197            1
14          104            1
197         145            13

We have thus found a non-trivial divisor, so now it is easy to calculate 221 = 13·17. □

11.E.8. Find a non-trivial divisor of the integer 455459.

Solution. Consider the function f(x) = x^2 + 1 (we silently assume that this function behaves randomly modulo an unknown prime divisor p of the integer n and has the required properties). In the particular iterations, we compute a ← f(a) (mod n), b ← f(f(b)) (mod n) while evaluating d = (a - b, n).
a        b        d
5        26       1
26       2871     1
677      179685   1
2871     155260   1
44380    416250   1
179685   43670    1
121634   164403   1
155260   247944   1
44567    68343    743

We have found a divisor 743, and now we can easily compute that 455459 = 613·743. □

F. Encryption

11.F.1. RSA. We have overheard that the integers 29, 7, 21 were sent by means of RSA with the public key (7, 33). Try to break the cipher and find the messages (integers) that were originally sent.

Further, since M_q ≡ -1 (mod 8), we have (2/M_q) = 1, and it follows from the law of quadratic reciprocity that (3/M_q) = -(M_q/3) = -1, since we have 2^q - 1 ≡ 1 (mod 3) for q odd. Both of these expressions are valid even if M_q is not a prime (in this case, it is the Jacobi symbol). Let us note that in the last part of the proof, we will use the extension of the congruence relation to the elements of the domain Z[√3] = {a + b√3; a, b ∈ Z}; just like in the case of the integers, we write for α, β ∈ Z[√3] that α ≡ β (mod p) if p | α - β. Further, an analog of proposition (ii) from 11.B.6 holds as well: if p is a prime, then (α + β)^p ≡ α^p + β^p (mod p) (the proof is identical to the one for the integers).

"⟹" Suppose that M_q is a prime. We will prove that α^(2^(q-1)) ≡ -1 (mod M_q), which will imply (thanks to (1)) that M_q | s_{q-2}. If we had 2^q | p - 1, then p - 1 ≥ 2^q > 2^q -
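The iteration used in the last two exercises (Pollard's ρ method, described in 11.6.14 below) can be sketched as follows; the function name is ours:

```python
from math import gcd

def pollard_rho(n, x0=2):
    """Look for a non-trivial divisor of n using f(x) = x^2 + 1, as in 11.E.7."""
    f = lambda v: (v * v + 1) % n
    x = y = x0
    while True:
        x = f(x)                      # "tortoise": one step of the iteration
        y = f(f(y))                   # "hare": two steps of the iteration
        d = gcd(abs(x - y), n)
        if d == n:
            return None               # the cycle closed without a factor; retry with another x0
        if d > 1:
            return d                  # non-trivial divisor found
```

On the two exercises above it reproduces the tabulated runs, finding 13 for 221 and 743 for 455459.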
Therefore, it suffices to check whether t = ⌈√n⌉, t = ⌈√n⌉ + 1, t = ⌈√n⌉ + 2, ..., until t^2 - n is a square (this condition can, of course, be checked efficiently).

11.F.2. Now, we will try to factor the integer n = 23104222007 this way. (We anticipate that it is a product of two close primes.)

Solution. We compute √n ≈ 152000.731 and check the candidates for t: For t = 152001, we have √(t^2 - n) ≈ 286.345. For t = 152002, it is √(t^2 - n) ≈ 621.287. For t = 152003, √(t^2 - n) ≈ 830.664. Finally, for t = 152004, we get √(t^2 - n) = 997 ∈ Z. Therefore, s = 997 and we can easily calculate the prime divisors of n: p = t + s = 153001, q = t - s = 151007. □

⁴The symbol ⌈x⌉ denotes the ceiling of a real number x, i.e., the integer which satisfies ⌈x⌉ - 1 < x ≤ ⌈x⌉.

1 = M_q, which contradicts the fact that p is a divisor of M_q. Therefore, we have (3/p) = -1 and α^(p+1) = (2 + √3)(2 + √3)^p ≡ (2 + √3)(2 - √3) = 1 (mod p). The order of α modulo p is 2^q, hence 2^q | p + 1 and especially p ≥ 2^q - 1 = M_q. At the same time, p is a prime divisor of M_q; therefore, M_q = p is a prime. □

Unlike the proof, the implementation of this algorithm is very easy.

Algorithm (Lucas-Lehmer primality test):

function LL_is_prime(q)
    s := 4; M := 2^q - 1
    repeat q - 2 times:
        s := s^2 - 2 (mod M)
    if s = 0 return PRIME, else return COMPOSITE.

The time complexity of this test is asymptotically the same as in the case of the Miller-Rabin test. It is, however, more efficient in concrete instances.

Fermat numbers are integers of the form F_n = 2^(2^n) + 1. Pierre de Fermat conjectured in the 17th century that all of the integers of this form are primes (apparently driven by the effort to generalize the observation for F_0 = 3, F_1 = 5, F_2 = 17, F_3 = 257, and F_4 = 65537). However, in the 18th century, Leonhard Euler found out that F_5 = 641 × 6700417, and we have not been able to discover any other Fermat primes so far.
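The Lucas-Lehmer algorithm above is a direct transcription into Python:

```python
def lucas_lehmer(q):
    """Lucas-Lehmer test: for an odd prime q, decide whether M_q = 2^q - 1 is prime."""
    M = 2 ** q - 1
    s = 4
    for _ in range(q - 2):            # repeat q - 2 times: s := s^2 - 2 (mod M)
        s = (s * s - 2) % M
    return s == 0                     # M_q is prime iff M_q divides s_(q-2)
```

This confirms the conclusions of exercise 11.E.2: the Mersenne numbers for q = 7 and q = 13 are prime, while those for q = 11 and q = 29 are composite.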
Since their size increases rapidly, it takes many resources to compute with them (so the following test is not used much). Nowadays, the least Fermat number which has not been tested is F_33, which has 2 585 827 973 digits and is thus much greater than the largest discovered prime.

Proposition (Pepin's test). A necessary and sufficient condition for the n-th Fermat number F_n to be a prime is 3^((F_n - 1)/2) ≡ -1 (mod F_n).

We can see that this is a very simple test, which is actually a mere part of Euler's compositeness test.

Proof of correctness of Pepin's test. First, suppose that 3^((F_n - 1)/2) ≡ -1 (mod F_n). Then, 3^(F_n - 1) ≡ 1 (mod F_n). Since F_n - 1 is a power of two, F_n - 1 is necessarily the order of 3 modulo F_n. However, the order of every integer modulo F_n is at most φ(F_n) ≤ F_n - 1; hence φ(F_n) = F_n - 1, and F_n is a prime.

11.6.11. Theorem. Let p be a prime which divides N - 1. Further, let us suppose that there is an integer a_p such that a_p^(N-1) ≡ 1 (mod N) and (a_p^((N-1)/p) - 1, N) = 1. Let p^(α_p) be the highest power of p which divides N - 1. Then every positive divisor d of the integer N satisfies d ≡ 1 (mod p^(α_p)).

Proof of the Pocklington-Lehmer theorem. Every positive divisor d of the integer N is a product of prime divisors of N, so it suffices to prove the theorem for prime values of d. The condition a_p^(N-1) ≡ 1 (mod N) implies that the integers a_p, N are coprime (any divisor they have in common must divide the right-hand side of the congruence as well). Then, (a_p, d) = 1 as well, and we have a_p^(d-1) ≡ 1 (mod d) by Fermat's theorem. Since (a_p^((N-1)/p) - 1, N) = 1, we get a_p^((N-1)/p) ≢ 1 (mod d). Let e denote the order of a_p modulo d. Then, e | d - 1, e | N - 1, and e ∤ (N - 1)/p. If p^(α_p) ∤ e, then e | N - 1 would imply that e | (N - 1)/p, which is a contradiction. Therefore, p^(α_p) | e, and so p^(α_p) | d - 1. □

and (-9)^(-1) ≡ 9 (mod 41). Therefore, the decrypted message is the integer M ≡ 9·6 = 13 (mod 41). □

11.F.6. Rabin cryptosystem. Alice has chosen p = 23, q = 31 as her private key in the Rabin cryptosystem. The public key is then n = pq = 713.
Encrypt the message M = 327 for Alice and show how Alice will decrypt it.

Solution. We compute C = 327^2 ≡ 692 (mod 713) and send this cipher to Alice. According to the decryption procedure, we determine r = C^((p+1)/4) = 692^6 ≡ 18 (mod 23), s = C^((q+1)/4) = 692^8 ≡ 14 (mod 31), and further the coefficients a, b in Bezout's identity 23·a + 31·b = 1 (using the Euclidean algorithm). We get a = -4, b = 3; the candidates for the original message are thus the integers ±(-4)·23·14 ± 3·31·18 (mod 713). We thus know that one of the integers 386, 603, 110, 327 is the message that was sent. □

11.F.7. Show how to encrypt and decrypt the message M = 321 in the Rabin cryptosystem with n = 437.

Solution. The encrypted text can be obtained as the square modulo n: C = 321^2 = (-116)^2 = 13456 ≡ 346 (mod 437). On the other hand, when decrypting, we will use the factorization (its knowledge is the private key of the message receiver) n = 437 = 19·23, and we compute r = 346^((19+1)/4) = 346^5 ≡ 17 ≡ -2 (mod 19) and s = 346^((23+1)/4) = 346^6 ≡ 1 (mod 23). Applying the Euclidean algorithm to the pair (19, 23), we determine the coefficients in Bezout's identity 19·(-6) + 23·5 = 1. Then, the message is one of the integers ±6·19·1 ± 5·23·(-2) (mod 437), i.e., M = ±116 or M = ±344. Indeed, M = -116 ≡ 321 (mod 437). □

11.6.12. Theorem. Let N ∈ N, N > 1. Suppose that we can write N - 1 = F·U, where (F, U) = 1 and F > √N, and that we know the prime factorization of F. Then:
• if we can find for every prime p | F an integer a_p ∈ Z as in the above theorem, then N is a prime;
• if N is a prime, then for every prime p | N - 1, there is an integer a_p ∈ Z with the desired properties.

Proof. By theorem 11.6.11, the potential divisor d > 1 of the integer N satisfies d ≡ 1 (mod p^(α_p)) for all prime factors p of F, hence d ≡ 1 (mod F), and so d > √N. If N has no non-trivial divisor less than or equal to √N, then it is necessarily a prime.
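The decryption steps used in the two Rabin examples above (valid for primes p ≡ q ≡ 3 (mod 4)) can be collected into one routine; this is a sketch with names of ours:

```python
def xgcd(a, b):
    """Extended Euclid: return (g, x, y) with x*a + y*b = g = gcd(a, b)."""
    if b == 0:
        return (a, 1, 0)
    g, x, y = xgcd(b, a % b)
    return (g, y, x - (a // b) * y)

def rabin_decrypt(C, p, q):
    """Return the four square roots of C modulo n = p*q for p, q = 3 (mod 4)."""
    n = p * q
    r = pow(C, (p + 1) // 4, p)       # +-r are the square roots of C modulo p
    s = pow(C, (q + 1) // 4, q)       # +-s are the square roots of C modulo q
    _, a, b = xgcd(p, q)              # Bezout coefficients: a*p + b*q = 1
    x = (a * p * s + b * q * r) % n   # CRT: x = r (mod p), x = s (mod q)
    y = (a * p * s - b * q * r) % n   # the mixed-sign combination
    return sorted({x, n - x, y, n - y})
```

The sender then has to recognize the meaningful message among the four candidates, e.g. by padding it with a known pattern.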
On the other hand, it suffices to choose for a_p a primitive root modulo the prime N (independently of p). Then, it follows from Fermat's theorem that a_p^(N-1) ≡ 1 (mod N), and since a_p is a primitive root, we get a_p^((N-1)/p) ≢ 1 (mod N) for any p | N - 1. The integers a_p are again called primality witnesses for the integer N. □

Remark. The previous test also contains Pepin's test as a special case (there, for N = F_n, we have p = 2, which is satisfied by the primality witness a_p = 3).

11.6.13. The polynomial test. (Authors' note, translated: See Radan (ATC2014) for details and, very briefly, McAndrew (Crypto in Sage). A description of the AKS algorithm is to be added here, possibly including the proof (check whether the algebra chapter contains all the prerequisites). Alternatively, the proof of the Miller-Rabin statement from Schoof's article cited above could be added.)
828 CHAPTER 11. ELEMENTARY NUMBER THEORY
11.6.14. Looking for divisors. If one of the compositeness tests verifies that a given integer is indeed composite, we usually want to find one of its non-trivial divisors.
However, this task is much more difficult than merely revealing that the integer is composite; let us recall that the compositeness tests can guarantee the compositeness, yet they provide us with no divisors (which is, on the other hand, advantageous for RSA and similar cryptographic protocols). Therefore, we present here only a short summary of methods used in practice and one sample for inspiration. (1) Trial division (2) Pollard's ρ algorithm (3) Pollard's p − 1 algorithm (4) Elliptic curve method (ECM) (5) Quadratic sieve (QS) (6) Number field sieve (NFS) For illustration, we demonstrate in the exercises (11.E.8) one of these algorithms, Pollard's ρ method, on a concrete instance. This algorithm is especially suitable for finding relatively small divisors (since its expected complexity depends on the size of these divisors), and it is based on the idea that for a random function f : S → S, where S is a finite set with n elements, the sequence (x_k), where x_(k+1) = f(x_k), must eventually loop. The expected length of the tail, as well as of the period, is then √(π·n/8). The algorithm described below is again a straightforward implementation of the mentioned reasoning. Algorithm (Pollard's ρ method):
Input: n, the integer to be factored, and an appropriate function f(x).
a := 2; b := 2; d := 1
While d = 1 do:
  a := f(a)
  b := f(f(b))
  d := gcd(a − b, n)
If d = n, return FAILURE; else return d.
11.6.15. Public-key cryptography. In present-day practice, the most important application of number theory is the so-called public-key cryptography. Its main objectives are to provide • encryption; the message encrypted with the public key of the receiver can be decrypted by no one else (to be precise, by no one who does not know his private key); • signature; the integrity of the message signed with the private key of the sender can be verified by anyone with access to his public key.
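The ρ-method pseudocode of 11.6.14 is only a few lines in practice. A sketch with the customary choice f(x) = x² + c (names are illustrative):

```python
from math import gcd

def pollard_rho(n, c=1):
    """Pollard's rho method (cf. 11.6.14) with Floyd cycle detection,
    using the map f(x) = x^2 + c mod n as the 'random' function."""
    f = lambda x: (x * x + c) % n
    a, b, d = 2, 2, 1
    while d == 1:
        a = f(a)            # tortoise: one step
        b = f(f(b))         # hare: two steps
        d = gcd(abs(a - b), n)
    return None if d == n else d    # None plays the role of FAILURE

print(pollard_rho(8051))  # -> 97 (indeed 8051 = 83 * 97)
```

On failure (d = n), one retries with a different constant c; the expected running time grows with the square root of the smallest prime factor, in line with the √(π·n/8) heuristic above.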
The most basic and most often used protocols in public-key cryptography are: • RSA (encryption) and the derived system for signing messages, • the Digital Signature Algorithm (DSA) and its variant based on elliptic curves (ECDSA), • the Rabin cryptosystem (and signature scheme), • the ElGamal cryptosystem (and signature scheme), • elliptic curve cryptography (ECC), • the Diffie-Hellman key exchange protocol (DH). 11.6.16. Encryption - RSA. First, we describe the best-known public-key cipher, RSA. The principle of the RSA protocol is as follows: • Every participant A needs a pair of keys: a public one (V_A) and a private one (S_A). • Key generating: the user selects two large primes p, q and calculates n = pq and φ(n) = (p − 1)(q − 1). The integer n is public; the idea is that it is too hard to compute φ(n). • Then, the user chooses a public key e and verifies that (e, φ(n)) = 1. 11.G.9. Let f : N → N satisfy (f(a), f(b)) = (f(a), f(|a − b|)) for all a, b ∈ N. Prove that (f(a), f(b)) = f((a, b)). Show that this implies the result of exercise 11.A.6, as well as the fact that (F_a, F_b) = F_(a,b), where F_a denotes the a-th term of the Fibonacci sequence. O 11.G.10. Let the RSA parameters be n = 143 = 11·13, e = 7, d = 103. Sign the message m = 8 and verify this signature. Decide whether s = 42 is the signature of the message m = 26. O Key to the exercises 9.B.6. 4π. 9.B.7. 36π. 9.B.8. ff. 10.C.10. | • | + | • 1 = |. 10.E.17. Simply, a = 3/8. Thus, the distribution function of the random variable X is F_X(t) = t³/8 for t ∈ (0, 2), zero for smaller values of t, and one for greater values. Let Z = X³ denote the random variable corresponding to the volume of the considered cube. It lies in the interval (0, 8). Thus, for t ∈ (0, 8) and the distribution function F_Z of the random variable Z, we can write F_Z(t) = P[Z ≤ t] = P[X³ ≤ t] = P[X ≤ ∛t] = F_X(∛t) = t/8. Then, the density is f_Z(t) = 1/8 on the interval (0, 8) and zero elsewhere.
Since this is the uniform distribution on the given interval, the expected value is equal to 4. 10.F.9. EU = 1·0.6 + 2·0.4 = 1.4, EU² = 0.4 + 4·0.6 = 2.8, EV = 0.4 + 0.6 + 1.2 = 2.1, EV² = 0.3 + 1.2 + 3.6 = 5.1, E(UV) = 2.8, var(U) = 2.8 − 1.4² = 2.8 − 1.96 = 0.84, var(V) = 5.1 − 4.41 = 0.69, cov(U, V) = 2.8 − 1.4·2.1 = −0.14, ρ_(U,V) = −0.14/√(0.84·0.69). 10.F.10. EX = 1/3, var X = 4/45. 10.F.11. ρ_(X,Y) = −1. 10.F.12. ρ_(U,V) = −0.421. 11.B.14. Let us consider the factorization of the number n into primes. If n = p₁^(α₁)···p_k^(α_k), then φ(n) = (p₁ − 1)p₁^(α₁−1)···(p_k − 1)p_k^(α_k−1). And if we want to have 9 | φ(n), one of the following must occur: i) p_i ≡ 1 (mod 9) for some i ∈ {1, ..., k}, ii) p_i = 3 and α_i ≥ 3 for some i, iii) p_i = 3, α_i = 2, and p_j ≡ 1 (mod 3) for some distinct i, j ∈ {1, ..., k}, iv) p_i ≡ 1 (mod 3) and p_j ≡ 1 (mod 3) for some distinct i, j ∈ {1, ..., k}. If we restrict our attention (as the statement of the problem asks) to numbers n < 100, then condition i) is satisfied by the primes 19, 37, and 73 (together with their multiples 38, 57, 76, 95, and 74), ii) is satisfied by 3³ = 27 and 3⁴ = 81 (together with the multiple 54), iii) is matched by the number 3²·7 = 63, and iv) is matched by the number 7·13 = 91. 11.B.22. i) The integer 3 has order 4 modulo 10, so it suffices to determine the remainder of the exponent when divided by 4. This remainder is equal to 1, so the last digit is 3¹ = 3. ii) 37 ≡ −3 (mod 10) is of order 4. Again, it suffices to compute the remainder of the exponent upon division by 4. However, we apparently have 37 ≡ 1 (mod 4), so the wanted remainder upon division by 10 equals (−3)¹ ≡ 7, and the last digit is thus 7. iii) Since (12, 10) > 1, it makes no sense to talk about the order of 12 modulo 10. However, the examined integer is clearly even, so it suffices to find its remainder upon division by 5. The order of 12 ≡ 2 (mod 5) is 4, and the exponent satisfies 13^14 ≡ 1^14 ≡ 1 (mod 4). We thus have 12^(13^14) ≡ 2¹ ≡ 2 (mod 5), and since 2 is an even integer, it is the wanted digit as well. 11.B.23.
Since φ(n) < n, we surely have φ(n) | n!, whence the statement already follows, as odd positive integers n satisfy 2^φ(n) ≡ 1 (mod n). 11.B.26. Similarly to the above exercise, we will examine the remainders upon division by the coprime integers 125 and 8. We know that (12, 125) = 1 and φ(125) = 100, so 12^(10^10) = 12^(100·10^8) = (12^100)^(10^8) ≡ 1 (mod 125). Since 4 | 12, the number 12^10 is divisible even by 4^10, so it is by mere 8 as well, hence 12^(10^10) ≡ 0 (mod 8). The Chinese remainder theorem states that exactly one of the integers 0, 1, ..., 999 leaves remainder 1 upon division by 125 and is divisible by 8. This integer is 376 (it can be found, for instance, by going through the multiples of 125 increased by 1 and examining their divisibility by 8). Therefore, the last three digits of the number 12^(10^10) are 376. 11.C.6. i) The greatest common divisor of the moduli is 3, and 1 ≢ −1 (mod 3), so the system has no solution. ii) The condition for solvability of linear congruences, (8, 12345678910111213) = 1, is clearly true, so this congruence has a unique solution. iii) The moduli are coprime, so by the Chinese remainder theorem, there is a unique solution modulo 29·47. 11.C.8. We have 208 = 2^4·13 and (3446, 208) = 2 | 8642. Therefore, the congruence has two solutions modulo 208, and it is equivalent to the system 3x ≡ 1 (mod 8), x ≡ 10 (mod 13). The solutions of this system are x ≡ 75 and x ≡ 179 (mod 208). 11.C.9. Since 2 is a primitive root modulo both 5 and 13, we get that 2^n ≡ 3 (mod 5) ⟺ 2^n ≡ 2³ (mod 5) ⟺ n ≡ 3 (mod 4), and 2^n ≡ 3 (mod 13) ⟺ 2^n ≡ 2⁴ (mod 13) ⟺ n ≡ 4 (mod 12). This apparently implies the infinitude of the multiples of both 5 and 13 among the integers 2^n − 3 in question. On the other hand, we can see that none of them can be a multiple of 5 and 13 simultaneously, since the system of congruences n ≡ 3 (mod 4), n ≡ 4 (mod 12) has no solution. 11.C.15.
Since the modulus can be written as 105 = 3·5·7, where the factors are pairwise coprime, the congruence in question is equivalent to the following system: x³ − 3x + 5 ≡ 0 (mod 3), x³ − 3x + 5 ≡ 0 (mod 5), x³ − 3x + 5 ≡ 0 (mod 7). Clearly, the first congruence is equivalent to x³ ≡ 1 (mod 3), and that one is equivalent to x ≡ 1 (mod 3), as it follows from Fermat's little theorem that x³ ≡ x (mod 3) holds for all integers x. The second congruence is equivalent to x(x² − 3) ≡ 0 (mod 5), which is satisfied iff x ≡ 0 (mod 5) or x² ≡ 3 (mod 5). However, since 3 is a quadratic nonresidue modulo 5 (the Legendre symbol (3/5) is equal to −1), we get that x ≡ 0 (mod 5) is the only solution of the second congruence of the system. The third congruence can be transformed to the form x³ − 3x − 2 ≡ 0 (mod 7), which is satisfied iff x ≡ −1 (mod 7) or x ≡ 2 (mod 7) (since the left-hand side factors as x³ − 3x − 2 = (x − 2)(x + 1)²). Of course, this can also be found out by examining all possibilities modulo 7. Altogether, there are two solutions of the original congruence modulo 105: x ≡ 55 and x ≡ 100. 11.C.30. The modulus 473 factors as 11·43, thus we have to solve a system of two congruences. The first one leads to (x − 3)² ≡ 3 (mod 11), with the two solutions x − 3 ≡ ±5 (mod 11). The second one can be transformed to (x + 8)² ≡ 15 (mod 43). Now we can proceed either by noticing that 15 ≡ 144 (mod 43), or by calculating (using the result of 11.C.38) ±(x + 8) ≡ 15^((43+1)/4) ≡ 31 (mod 43); in both cases we get the result x ≡ 4, 23 (mod 43). Now we combine the solutions of both congruences of the system, and we obtain x ≡ 152, 195, 262, 305 (mod 473). 11.C.32. All of the results can be proved directly from the definition of the Jacobi symbol and the multiplicativity of the Legendre symbol. 11.C.34. In light of the previous exercise, both statements can be proved easily by mathematical induction. 11.G.1. Let n be a three-digit number with decimal digits a, b, c, where a ≠ 0. Then 11 | n ⟺ 11 | c + a − b ⟺ c + a − b ∈ {0, 11}.
We should have 100a + 10b + c = 11(a² + b² + c²). If c + a − b = 0, i.e. b = a + c, then 100a + 10(a + c) + c = 11(a² + (a + c)² + c²) ⟺ 110a + 11c = 11(2a² + 2ac + 2c²) ⟺ 10a + c = 2a² + 2ac + 2c² ⟺ 2a² + 2ac − 10a + 2c² − c = 0 ⟺ a² + (c − 5)a + c² − c/2 = 0. The discriminant of this quadratic equation is −3c² − 8c + 25 ≥ 0 ⟺ c ∈ {0, 1}. For c = 0, we get a² − 5a = 0, hence a = 5, b = 5. Thus the first solution is n = 550. For c = 1, there are no admissible digits a, b. If c + a − b = 11, i.e. b = a + c − 11, then 100a + 10(a + c − 11) + c = 11(a² + (a + c − 11)² + c²) ⟺ 110a + 11c − 110 = 11(2a² + 2ac + 2c² − 22a − 22c + 121) ⟺ 10a + c − 10 = 2a² + 2ac + 2c² − 22a − 22c + 121 ⟺ 2a² + 2ac − 32a + 2c² − 23c + 131 = 0 ⟺ a² + (c − 16)a + c² − (23/2)c + 131/2 = 0. Now, the discriminant is −3c² + 14c − 6 ≥ 0 ⟺ c ∈ {1, 2, 3, 4}. For c = 1, c = 2, or c = 4, there are no admissible digits a, b. For c = 3, we get a² − 13a + 40 = 0, hence a = 8, b = 0. The other solution is therefore n = 803. 11.G.2. Fermat's little theorem states that a^p ≡ a (mod p), which together with the requirement a^p ≡ 1 (mod p) gives the condition a ≡ 1 (mod p) for the pairs (a, p). For a < 16, we get the following pairs (listing the primes p with p | a − 1): a = 1: every prime p; a = 2: none; a = 3: p = 2; a = 4: p = 3; a = 5: p = 2; a = 6: p = 5; a = 7: p = 2, 3; a = 8: p = 7; a = 9: p = 2; a = 10: p = 3; a = 11: p = 2, 5; a = 12: p = 11; a = 13: p = 2, 3; a = 14: p = 13; a = 15: p = 2, 7. 11.G.3. We prove by induction that 5^n ≡ 5^(n+4) (mod 10000) for n ≥ 4. For n = 4, we have 5⁴ = 625 ≡ 390625 = 5⁸ (mod 10000). Induction step: 5^(k+5) = 5^(k+4)·5 ≡ 5^k·5 = 5^(k+1) (mod 10000). For all a ∈ N, we therefore have 5^(4a) ≡ 625, 5^(4a+1) ≡ 3125, 5^(4a+2) ≡ 5625, and 5^(4a+3) ≡ 8125 (mod 10000), so the last four digits of 5^n form a periodic sequence with period 4. 11.G.4. If n is an arbitrary natural number, then 2^(2^n) ≡ 1 (mod 3), so it suffices to choose for k odd positive integers with k ≡ 2 (mod 3). And there are surely infinitely many of them: they are those which satisfy k ≡ 5 (mod 6). For these values of k, we always have that 2^(2^n) + k is a multiple of 3 and greater than 3, so it is a composite number. 11.G.5.
Let us fix an integer k ∈ Z \ {1} and an arbitrary a ∈ N. We will show that we can find a positive integer n such that 2^(2^n) + k is composite and greater than a. That will complete the proof. Let us fix s ∈ N₀ and h ∈ Z so that k − 1 = 2^s·h with 2 ∤ h, and m ∈ N satisfying 2^(2^m) > a − k. Now, let ℓ satisfy ℓ > s and ℓ > m. If the integer 2^(2^ℓ) + k is composite, then we are done, since 2^(2^ℓ) + k > 2^(2^m) + k > a. Therefore, let us assume that the integer 2^(2^ℓ) + k is a prime and denote it by p. With the help of Euler's theorem, we can find an integer of the desired form which is a multiple of p. We have p − 1 = 2^(2^ℓ) + 2^s·h = 2^s·h₁, where h₁ ∈ N is odd. We thus have 2^φ(h₁) ≡ 1 (mod h₁), whence 2^(s+φ(h₁)) ≡ 2^s (mod p − 1), and since ℓ > s, we also have 2^(ℓ+φ(h₁)) ≡ 2^ℓ (mod p − 1). Therefore, 2^(2^(ℓ+φ(h₁))) ≡ 2^(2^ℓ) ≡ −k (mod p), so p divides 2^(2^(ℓ+φ(h₁))) + k, and since 2^(2^(ℓ+φ(h₁))) + k > 2^(2^ℓ) + k = p > a, we have found a composite number which is of the wanted form and greater than an arbitrarily large value of a. Let us mention that the case of k = 1 is a well-known open problem examining the infinitude of Fermat primes. 11.G.6. We can easily see that 2 | a₁ = 10 and 3 | a₂ = 48. Further, we can show that p | a_(p−2) holds for any prime p > 3. By Fermat's theorem, we have 2^(p−1) ≡ 3^(p−1) ≡ 6^(p−1) ≡ 1 (mod p). Therefore, 6a_(p−2) = 3·2^(p−1) + 2·3^(p−1) + 6^(p−1) − 6 ≡ 3 + 2 + 1 − 6 = 0 (mod p). Let us remark that knowledge of algebra allows us to proceed more directly: for p > 3, we can consider the p-element field F_p, which contains multiplicative inverses of the elements 2, 3, and 6, and their sum is 1/2 + 1/3 + 1/6 = 1. 11.G.7. We could reason about the factorization of n into primes, which is a bit complicated. Instead, we will use a little trick. Suppose that there is an n satisfying the conditions n | 2^n − 1, n > 1, and let us select the least one. Surely, n is odd, hence n | 2^φ(n) − 1. Utilizing the result of exercise 11.A.6, we get that n | 2^d − 1, where d = (n, φ(n)) (which especially implies that 2^d − 1 > 1 and d > 1).
At the same time, d ≤ φ(n) < n and d | n, whence it follows that d | 2^d − 1, which contradicts the assumption that our n is the least one that meets the conditions. 11.G.8. Since 2^(p−1) ≡ 1 (mod p), it suffices to choose appropriate multiples of p − 1 for n, i.e., to find k so that n = k(p − 1) would satisfy the condition n·2^n ≡ −1 (mod p). However, thanks to p − 1 | n, this is equivalent to k ≡ 1 (mod p), and there are clearly infinitely many such values k. 11.G.9. Analyze the Euclidean algorithm for computing the greatest common divisor. 11.G.10. The signature is 8^103 ≡ 83 (mod 143), which can be verified by 83^7 ≡ 8 (mod 143). Finally, 42 is not a valid signature, as 42^7 ≡ 81 ≢ 26 (mod 143). 840 CHAPTER 12 Algebraic structures The more abstraction, the more chaos? - no, it is often the other way round... A. Boolean algebras and lattices 12.A.1. Find the (complete) disjunctive normal form of the proposition (B' ⇒ C) ∧ [(A ∨ C) ∧ B]'. Solution. If the propositional formula contains only a few variables (in our case, three), the most advantageous procedure is to build the truth table of the formula and read the disjunctive normal form off it. The table will consist of 2³ = 8 rows. The examined formula is denoted φ. In this chapter, we begin a seemingly very formal study. But the concepts reflect many properties of things and phenomena surrounding us. This is one of the parts of the book which is not in the prerequisites of any other chapter. Large parts serve as a quick illustration of interesting uses of mathematical tools and models. The simplest properties of real objects are used for encoding in terms of algebraic operations. Thus, "algebra" considers algorithmic manipulations with letters which usually correspond to computations or descriptions of processes.
Strictly speaking, this chapter builds only on the first and sixth parts of chapter one, where abstract views on numbers and relations between objects are introduced. But it is a focal point for abstract versions of many concepts already met. The first two sections aim at direct generalizations of the familiar algebraic structure of numbers. This leads to a discussion of rings of polynomials. Only then do we provide an introduction to group theory, for which there is only a single operation. The last two sections provide some glimpses of direct applications. The construction of (self-correcting) codes often used in data transfer is considered. The last section explains the elementary foundations of computer algebra. This includes solving polynomial equations and algorithmic methods for manipulation and calculations with formal expressions. 1. Posets and Boolean algebras Familiarity with the properties of addition and multiplication of scalars and matrices is assumed, likewise the binary operations of set intersection and union in elementary set theory, as indicated at the end of the first chapter. We proceed to work with symbols which stand for miscellaneous objects, resulting in the universal applicability of the results. This allows the relating of the basic set operations to propositional logic, which formalizes methods for expressing propositions and evaluating truth values. 12.1.1. Algebraic operations. For any set M, there is a set K = 2^M consisting of all subsets of M, together with the operations of union ∨ : K × K → K and intersection ∧ : K × K → K.

A B C | B' ⇒ C | [(A ∨ C) ∧ B]' | φ
0 0 0 |   0    |       1        | 0
0 0 1 |   1    |       1        | 1
0 1 0 |   1    |       1        | 1
0 1 1 |   1    |       0        | 0
1 0 0 |   0    |       1        | 0
1 0 1 |   1    |       1        | 1
1 1 0 |   1    |       0        | 0
1 1 1 |   1    |       0        | 0

The resulting complete disjunctive normal form is the disjunction of the conjunctions that correspond to the rows with a one in the last column (the formula is true for the given valuation of the atomic propositions).
Each such row corresponds to the conjunction of the variables (if the corresponding value is 1) or their negations (if it is 0). In our case, it is the disjunction of the conjunctions corresponding to the second, third, and sixth rows, i.e., the result is (A' ∧ B' ∧ C) ∨ (A' ∧ B ∧ C') ∨ (A ∧ B' ∧ C). We can also rewrite the formula by expanding the connective ⇒ with ∧ and ∨, using the De Morgan laws and distributivity: (B' ⇒ C) ∧ [(A ∨ C) ∧ B]' ⟺ (B ∨ C) ∧ [(A ∨ C)' ∨ B'] ⟺ (B ∨ C) ∧ [(A' ∧ C') ∨ B'] ⟺ [(B ∨ C) ∧ (A' ∧ C')] ∨ [(B ∨ C) ∧ B'] ⟺ [(B ∧ A' ∧ C') ∨ (C ∧ A' ∧ C')] ∨ [(B ∧ B') ∨ (C ∧ B')] ⟺ (A' ∧ B ∧ C') ∨ (B' ∧ C), which is an (incomplete) disjunctive normal form of the given formula. Clearly, it is equivalent to our result above. (The word "complete" means that each disjunct (called a clause in this context) contains each of the three variables or their negations (these are called literals).) □ 12.A.2. Find a disjunctive normal form of the formula ((A ∧ B) ∨ C)' ∧ (A ∨ (B ∧ C ∧ D)). O We know several logical connectives: ∧, ∨, ⇒, ⇔, and the unary '. Any propositional formula with these connectives can be equivalently written using only some of them, for instance ∨ and '. There are also connectives which alone suffice to express any propositional formula. Among binary connectives, these are NAND and NOR (A NAND B = (A ∧ B)', This is an instance of an algebraic structure on the set K with two binary operations. In general, write (K, ∨, ∧). In the special case of sets, these binary operations are denoted rather by ∪ and ∩, respectively. To every set A ∈ K, its complement A' = M \ A can be assigned. This is another operation ' : K → K with only one argument. Such operations are called unary operations. In general, there are algebraic structures with k operations μ₁, ..., μ_k, each of them a mapping μ_j : K × ··· × K → K with i_j arguments, and we write (K, μ₁, ..., μ_k) for such a structure. The number i_j of arguments is called the arity of the operation ("unary", "binary", etc.).
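The truth-table procedure of 12.A.1 is mechanical enough to automate; a small sketch that recovers the three rows giving the clauses of the complete DNF (the helper names are ours):

```python
from itertools import product

def phi(A, B, C):
    # the formula of 12.A.1: (B' => C) and [(A or C) and B]'
    implies = lambda x, y: (not x) or y
    return implies(not B, C) and not ((A or C) and B)

# rows of the truth table (in the order 000, 001, ..., 111) where phi is true
true_rows = [v for v in product([0, 1], repeat=3) if phi(*v)]
print(true_rows)  # -> [(0, 0, 1), (0, 1, 0), (1, 0, 1)]
```

These are exactly the second, third, and sixth rows of the table, i.e., the clauses A' ∧ B' ∧ C, A' ∧ B ∧ C', and A ∧ B' ∧ C.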
If i_j = 0, then the operation has no arguments, which means it is a distinguished element in K. With subsets in K = 2^M, there is the unique "greatest object", i.e. the entire set M, which is neutral for the ∧ operation. Similarly, the empty set ∅ ∈ K is the only neutral element for ∨. Notice that if M is empty, then K contains the only element ∅. 12.1.2. Set algebra. View the algebraic structure on the set K = 2^M from the previous paragraph as (K, ∨, ∧, ', 1, 0), with two binary operations, one unary operation (the complement), and two special elements 1 = M, 0 = ∅. It is easily verified that all elements A, B, C ∈ K satisfy the following properties: Axioms of Boolean algebras (1) A ∧ (B ∧ C) = (A ∧ B) ∧ C, (2) A ∨ (B ∨ C) = (A ∨ B) ∨ C, (3) A ∧ B = B ∧ A, A ∨ B = B ∨ A, (4) A ∧ (B ∨ C) = (A ∧ B) ∨ (A ∧ C), (5) A ∨ (B ∧ C) = (A ∨ B) ∧ (A ∨ C), (6) there is a 0 ∈ K such that A ∨ 0 = A, (7) there is a 1 ∈ K such that A ∧ 1 = A, (8) A ∧ A' = 0, A ∨ A' = 1. Compare these properties with those of the scalars (K, +, ·, 0, 1): Properties (1) and (2) say that both the operations ∧ and ∨ are associative. Property (3) says that both operations are also commutative. So far, this is the same as for the addition and multiplication of scalars, where there are also neutral elements for both operations. However, the properties (4) and (5) are stronger now: they require the distributivity of ∧ over ∨ as well as of ∨ over ∧. This cannot be the case for addition and multiplication of numbers: multiplication distributes over addition, but not vice versa.
Using de Morgan's and distributive laws we express ((A ab)v C)' = (A a B)' a G = A' a G v B a G. Setting 1 for the value True and 0 for False for A,B,C we obtain the table for = aA(bVd'VcA(aVdV c')) A b. □ 12.A.7. Draw a switching circuit representing Boolean proposition (a A ((b A c') V (&' A c))) V (a A & A c) Solution, see fig??? □ 12.A.8. Verify equivalence of the following two Boolean propositions (aVi)A(cVrf) and (a A c) V (a A d) V (b A c) V (b A d) Solution. Distributive laws yield 12.A.9. Determine whether the following two Boolean propositions ((a' A b) V (a A c'))' and (a V &') A (a V c') are equivalent? Solution, setting a = l,b = l,c = 0 implies ((a' A b) V (a a c'))' = 0 and (a V &') A (a V c') = 1. Hence, these propositions are not equivalent. □ 12.A.10. Determine whether the following two Boolean propositions (a A & A c) V (a V c)' and (dAc)V(a'Ac') are equivalent? Solution, setting a = l,b = 0,c=l implies (a A b A c) V (a v c)' = 0 and (a V &') A (a A c) V (a' A c') = 1. Hence, these propositions are not equivalent. □ Further properties of Boolean algebras Proposition. In every Boolean algebra (K, A, V); (1) A AO = 0, 4V1 = 1, (2) A A (A VB) = A, Av(AAB) = A, (3) A A A = A, AV A = A, (4) (A A By = A'vB, (A V B)' = A' /\ B, (5) (AJ = A. for all A, B e K. Proof. By the principle of duality, it suffices to prove only one of the claims in each item. Begin with (3), of course using just the axioms of the Boolean algebras A = A A 1 = A A (A V A') = (A A A) V 0 = A A A. Now, (1) is proved easily: A A 0 = A A (A A A') = (A A A) A A' = A A A' = 0. (2) is also easy (read the second equality from right to left): A A (A V B) = (A V 0) A (A V B) = = Av(0Ay3)=4v0 = A In order to prove the De Morgan laws, it suffices to verify that A' V B has the properties of the complement of A A B. By the above, it must be the complement. Using (1), compute = ((A A B) A A1) V((AaB)aB) = (0 A B) V (A A 0) = 0. (A4AB) V(A4'vB) = (A4v(A4'vB)) A(Bv(A4'vB)) = (1VB) a (IV A') = 1. 
Finally, from the definition, A' ∧ A = 0 and A' ∨ A = 1. Hence, A has the required properties of the complement of A', which means that A = (A')'. □ 12.1.4. Examples of Boolean algebras. The intersection and union on all subsets of a given set M always define a Boolean algebra. The smallest one is built on the set of all subsets of a singleton M. It contains two elements, namely 0 = ∅ and 1 = M, with the obvious equalities 0 ∧ 1 = 0, 0 ∨ 1 = 1, etc. The operations ∧ and ∨ are the same as multiplication and addition in the remainder class ring Z₂ of even and odd numbers. This is called the Boolean algebra Z₂. This is the only case when a Boolean algebra is a field of scalars at the same time! As in the case of rings of scalars or vector spaces, the algebraic structure of a Boolean algebra can be extended to all spaces of functions whose codomain is a Boolean algebra. For the set S = {f : M → K} of all functions from a set M to a Boolean algebra (K, ∧, ∨), the necessary operations and the distinguished elements 0 and 1 on S can be defined (a ∨ b) ∧ (c ∨ d) = ((a ∨ b) ∧ c) ∨ ((a ∨ b) ∧ d) = (a ∧ c) ∨ (b ∧ c) ∨ (a ∧ d) ∨ (b ∧ d). □ 12.A.11. Show that the negation a' cannot be obtained by a combinatorial circuit that consists of ∧ and ∨ operations only. Solution. We prove the claim by induction on the number of operations n in the circuit. For n = 1, it is clear that neither the proposition b ∧ c nor b ∨ c is equivalent to a'. Suppose a' cannot be achieved by a combinatorial circuit with at most n operations containing only ∨ and ∧. Consider a circuit with n + 1 operations and the first occurrence of the input a. It can enter either an ∧ or an ∨. If it is an ∧ with the second argument either a or 1, then its outcome is the same as the input a, and this ∧ can be deleted from the circuit, with the input a going to the next available operation.
If the second argument is 0, then a ∧ 0 = 0, and the operation can again be deleted, with 0 going as input to the next available operation. If a enters the circuit at an ∨ whose second argument is either a or 0, then the ∨ can be deleted, with a entering the next operation. If the second argument equals 1, then the result of the ∨ is 1, and 1 can be entered into the next available operation. In each case, we obtain an equivalent circuit with at most n operations, which by the induction hypothesis cannot compute a'. □ 12.A.12. Simplify the formula ((A ∧ B) ∨ (A ⇒ B)) ∧ ((B' ⇒ C) ∨ (B ∧ C)). Solution. In Boolean algebra notation, we obtain (a·b + a' + b)·(b + c + b·c) = (a' + b)·(b + c) = a'·c + b. This means that the given formula is equivalent to (A' ∧ C) ∨ B. □ 12.A.13. Design a logic proposition in three Boolean variables that takes the value True if and only if the majority of the variables have the value True. Solution. For this purpose, a disjunctive normal form is quite suitable. Consider the proposition (a ∧ b ∧ c') ∨ (a ∧ b' ∧ c) ∨ (a' ∧ b ∧ c) ∨ (a ∧ b ∧ c). □ 12.A.14. Let A be a free Boolean algebra generated by countably many generators a₁, ..., a_n, ..., with operations ∧, ∨, and ', whose elements are built from finitely many of these generators linked by the operations. Prove that this algebra is atomless. Solution. Suppose f is an atom in A. Without loss of generality, f can be expressed in terms of the first n generators a₁, ..., a_n. Then f ∧ a_(n+1) is neither f nor 0, so f cannot be an atom in A. □
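Exhaustive checking over all valuations settles both the equivalence questions (12.A.9, 12.A.10) and the majority design of 12.A.13; a small sketch (helper names are ours):

```python
from itertools import product

def equivalent(f, g, n):
    """Brute-force logical equivalence over all 2^n valuations."""
    return all(bool(f(*v)) == bool(g(*v)) for v in product([0, 1], repeat=n))

# 12.A.9: the counterexample (a, b, c) = (1, 1, 0) shows non-equivalence
f = lambda a, b, c: not ((not a and b) or (a and not c))
g = lambda a, b, c: (a or not b) and (a or not c)
print(equivalent(f, g, 3))  # -> False

# 12.A.13: the proposed DNF really is the majority function
maj = lambda a, b, c: (a and b and not c) or (a and not b and c) \
                      or (not a and b and c) or (a and b and c)
print(equivalent(maj, lambda a, b, c: a + b + c >= 2, 3))  # -> True
```

The same `equivalent` helper confirms the distributive expansion of 12.A.8 over four variables.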
Then, the union and intersection can be defined in the above manner — for instance, evaluating the expression (A V B)(x) for a point x e M, This determines whether it lies in A or whether it lies in B, and whether the join of the results is in Z2. The result is 1 if and only if x lies in the union. 12.1.5. Propositional logic. The latter simple observation brings us close to the calculus of elementary logic. View the notations for operations in a Boolean algebra as creating "words" from the elements A, B,... e K, the operations V, A,' and parentheses, which clarify the desired precedence of the operations. The axioms of the Boolean algebras and their corollaries say how different words may produce the same result in K. This is clear in the case of K = 2M, the set of all subsets of a given set; it is just equality of subsets. Now, another interpretation is mentioned in terms of operations in formal logic. Work with words as above but view them as propositions composed from elementary (atomic) propositions A,B,... and the logical operations AND (the binary operation A), OR (the binary operation V), and the negation NOT (the unary operation ')■ These words are called propositions. They are assigned a truth value depending on the truth values of the individual atomic propositions. The truth value is an element of the trivial Boolean algebra Z2, i.e. either 0 or 1. The truth value of a proposition is completely determined by assigning the truth values for the simplest propositions AA B, A\J B and A'. A A B is denned to be true if and only if both A and B are true. A V B is false if and only if both A and B are false. The value of A' is complementary to A. A proposition with n elementary propositions defines a function (Zy™ —> Z2. Two propositions are called logically equivalent if and only if they define the same function. 
In the previous paragraph, it is already verified that the set of all classes of logically equivalent propositions has the structure of a Boolean algebra. Propositional logic satisfies everything proved for general Boolean algebras. Next, we consider how other usual simple propositions of propositional logic are represented as elements of the Boolean algebra. Expressions always correspond to a class of logically equivalent propositions:

845 CHAPTER 12. ALGEBRAIC STRUCTURES

12.A.15. Prove the monotonicity laws for Boolean algebras: if A ≤ Ā and B ≤ B̄, then
i) A ∨ B ≤ Ā ∨ B̄;
ii) A ∧ B ≤ Ā ∧ B̄;
iii) (Ā)′ ≤ A′.

Solution. Since (A ∨ B) ∨ (Ā ∨ B̄) = (A ∨ Ā) ∨ (B ∨ B̄) = Ā ∨ B̄, the first assertion follows (recall that X ≤ Y if and only if X ∨ Y = Y). The second follows by duality from the fact that A ≤ Ā if and only if A ∧ Ā = A. An application of de Morgan's law, (Ā)′ ∨ A′ = (Ā ∧ A)′ = A′, verifies the third. □

12.A.16. For Boolean algebras prove: i) A ≤ B if and only if B …

of all divisors q of p. For two such divisors q, r, define q ∧ r to be the greatest common divisor of q and r, and q ∨ r to be their least common multiple (cf. the

i) If A = B, then A ≤ B and B ≤ A, and therefore, by the previous exercise, A ∧ B′ = 0 and B ∧ A′ = 0; thus A Δ B = 0. On the other hand, A Δ B = 0 implies that both A ∧ B′ = 0 and B ∧ A′ = 0, which yield A ≤ B and B ≤ A; thus A = B.
ii) Expanding (B Δ C)′ using distributivity and de Morgan's laws, we obtain (B Δ C)′ = (B′ ∨ C) ∧ (C′ ∨ B) = (B ∧ C) ∨ (B′ ∧ C′). Further expanding A Δ (B Δ C), we get A Δ (B Δ C) = A ∧ ((B ∧ C) ∨ (B′ ∧ C′)) ∨ A′ ∧ ((B ∧ C′) ∨ (B′ ∧ C)) = (A ∧ B ∧ C) ∨ (A ∧ B′ ∧ C′) ∨ (A′ ∧ B ∧ C′) ∨ (A′ ∧ B′ ∧ C), which equals (A Δ B) Δ C by the obvious symmetry.
iii) Follows from A ∧ (B Δ C) = A ∧ ((B ∧ C′) ∨ (C ∧ B′)) = (A ∧ B ∧ C′) ∨ (A ∧ C ∧ B′) and (A ∧ B) Δ (A ∧ C) = (A ∧ B ∧ (A′ ∨ C′)) ∨ (A ∧ C ∧ (A′ ∨ B′)) = (A ∧ B ∧ C′) ∨ (A ∧ C ∧ B′). □

12.A.18. The Sheffer stroke is defined as A|B = A′ ∧ B′. The Peirce arrow can be defined as A ↓ B = A′ ∨ B′. Prove that both | and ↓ define A ∨ B, A ∧ B and A′.

Solution.
It is clear from de Morgan's laws, in particular, that

previous chapter for the definitions and the context). The distinguished element 1 ∈ Dp is defined to be p itself. The neutral element 0 for the join on Dp is the integer 1 ∈ N. The unary operation ′ is defined using division as q′ = p/q.

Proposition. The set Dp together with the above operations ∧, ∨, and ′ is a Boolean algebra if and only if the factorization of p contains no squares (i.e., in the unique factorization p = q₁ ⋯ qₙ, the prime numbers qᵢ are pairwise distinct).

Proof. It is easy to verify the axioms of Boolean algebras under the assumptions of the proposition. It might be interesting to see where the squarefree assumption is needed. The greatest common divisor of a finite number of integers is independent of their order. This also holds for the least common multiple. This corresponds to the axioms (1) and (2) in 12.1.2. The commutativity (3) is clear. For any three elements a, b, c, write their factorizations without loss of generality as a = q₁^{m₁} ⋯ q_s^{m_s}, b = q₁^{n₁} ⋯ q_s^{n_s}, c = q₁^{r₁} ⋯ q_s^{r_s}, where zero powers are allowed and all the qᵢ are pairwise coprime. Thus, a ∧ b ∈ Dp corresponds to the element in which each qᵢ occurs with the power that is the minimum of the powers in a and b. Analogously, a ∨ b ∈ Dp corresponds to the maximum. The distributivity laws (4) and (5) of 12.1.2 now follow easily. There is no problem with the existence of the distinguished elements 0 and 1. These are already defined directly and clearly satisfy the axioms (6) and (7). However, if there are squares in the factorization, then this prevents the existence of complements. For instance, in D12 = {1, 2, 3, 4, 6, 12}, 6 ∧ 6′ = 1 cannot be reached, since 6 has a non-trivial divisor in common with every other element of D12 except for 1, while 6 ∨ 1 = 6 ≠ 12. (The number 1 is the potential smallest element in D12; it plays the role of 0 from the axioms.)
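The failure of complements in D12, and their existence in the squarefree case, can also be confirmed by brute force over the divisor lattices, with gcd as meet and lcm as join. A minimal sketch (the function names are ours):

```python
from math import gcd

def divisors(p):
    return [d for d in range(1, p + 1) if p % d == 0]

def has_complement(q, p):
    # a complement q' must satisfy q ^ q' = gcd(q, q') = 1 and q v q' = lcm(q, q') = p
    return any(gcd(q, r) == 1 and q * r // gcd(q, r) == p for r in divisors(p))

print([q for q in divisors(12) if not has_complement(q, 12)])  # → [2, 6]
print(all(has_complement(q, 30) for q in divisors(30)))        # → True
```

For the squarefree p = 30 the complement of q is simply 30/q, exactly as in the proposition.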
Nevertheless, if there are no squares in the factorization of the integer p, then the complement can be defined as q′ = p/q, and it is easily verified that this definition satisfies the axiom 12.1.2(8). □

A′ = A ↓ A, A ∨ B = (A ↓ A) ↓ (B ↓ B). Since A ∧ B = (A′ ∨ B′)′, we obtain that ↓ generates all the Boolean operations. Analogously, A′ = A|A and A ∧ B = A′|B′ = (A|A)|(B|B). De Morgan's law A ∨ B = (A′ ∧ B′)′ proves that | also generates all three Boolean operations. □

12.A.19. The exclusive or binary operation in a Boolean algebra is defined as A ⊕ B = (A ∨ B) ∧ (A′ ∨ B′). Prove that ⊕ and ∧ do not generate all the Boolean operations.

Solution. If we apply only the operations ⊕ and ∧, then inputs in which all values are False always produce an output with the same value False. Thus, A′ cannot be obtained from a False proposition A using only ⊕ and ∧. □

If there are no squares in the decomposition of p, then the number of all divisors is a power of 2. This suggests that these Boolean algebras are very similar to the set algebras we started with. We return to the classification of all finite Boolean algebras below. Before that, we consider structures like the divisors above for a general p.

12.1.8. Partial order. There is a much more fundamental concept, the partial order; see the end of chapter 1. Recall that a partial order is a reflexive, antisymmetric, and transitive relation ≤ on a set K. A set with a partial order (K, ≤) is called a partially ordered set, or a poset for short. The adjective "partial" means that, in general, this relation does not say whether a ≤ b or b ≤ a for every two different elements a, b ∈ K. If it does for each pair, it is called a linear order or a total order.

12.A.20. Translate the following sentence as a logical proposition, considering the ambiguity of the English language: Either school X improves its performance and continues to perform well, or school Y wins the academic competition.

Solution.
Modern English is sometimes vague about the meaning of "or". Therefore, the sentence can be translated using either ∨ or ⊕ in place of the "or". Let A, B, C represent the propositions "School X improves its performance", "School X continues to perform well", and "School Y wins the academic competition". Then the proposition of the sentence can be interpreted as
i) (A ∧ B) ∨ C;
ii) (A ∧ B) ⊕ C.
These two propositions differ when all three propositions A, B, C have the value True. □

12.A.21. Let the logical implication → mean A → B = A′ ∨ B. Prove the equivalence of the following logical propositions:
i) (A′ → (B → C)) = (B → (A ∨ C));
ii) (A → B)′ ∧ (A ∨ B)′ = 0, that is, it is always False;
iii) ((A ∨ B) ∧ (B′ ∨ C)) → (A ∨ C) = 1, that is, it is always True.

12.A.22. Show that the following divisibility relations are partial orders on X. Is any one of them a linear order?
i) X = N; the relation | is defined as m|n if m divides n.
ii) X is the set of all integer divisors of 36.

Solution.
i) As n|n for each n, the relation is reflexive. If m|n and n|m, then m = n, so it is antisymmetric. Since r|m and m|n implies r|n, it is also transitive. The order is not linear since, for example, 4 and 5 are not divisors of each other.
ii) X = {1, 2, 3, 4, 6, 9, 12, 18, 36}. By the previous item, | is reflexive, antisymmetric and transitive. However, this order is not linear either: the divisors 2 and 3 are not related. □

12.A.23. Given a partial order ≤ on a set A, one can also define a corresponding strict order <, which is also useful in various situations. A relation R on A is called irreflexive if (x, x) ∉ R for all x ∈ A. R is called asymmetric if, whenever (x, y) ∈ R, then (y, x) ∉ R. A relation R on A is a strict order if it is transitive and, in addition, irreflexive and asymmetric. Let A be a set

There is always a partial order on the set K = 2^M of all subsets of a given set M: the inclusion. In terms of intersections or joins, the inclusion can be defined as A ⊆ B if and only if A ∧ B = A, or, equivalently, A ⊆ B if and only if A ∨ B = B.
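The two algebraic descriptions of inclusion can be checked directly over all pairs of subsets of a small set; a quick sketch (the choice of M is ours):

```python
from itertools import combinations

M = {0, 1, 2}
# all subsets of M, as frozensets
subsets = [frozenset(c) for r in range(len(M) + 1) for c in combinations(M, r)]

for A in subsets:
    for B in subsets:
        # inclusion, the meet condition, and the join condition all agree
        assert (A <= B) == (A & B == A) == (A | B == B)
print("inclusion = meet condition = join condition on 2^M")
```

The same check works verbatim for any finite M, since the three conditions are equivalent in every Boolean algebra.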
In general, each Boolean algebra is a very special poset:

Lemma. Let (K, ∧, ∨) be a Boolean algebra. Then the relation ≤ defined by A ≤ B if and only if A ∧ B = A is a partial order. Moreover, for all A, B, C ∈ K:
(1) A ∧ B ≤ A,
(2) A ≤ A ∨ B,
(3) if A ≤ C and B ≤ C, then A ∨ B ≤ C,
(4) if A ∧ B′ = 0, then A ≤ B.

A lower bound of a subset L ⊆ K is an element which is ≤ every element of L; the greatest lower bound, if it exists, is called the infimum of L. Reversing the inequalities in the above, the definitions of an upper bound and of the least upper bound (or supremum) of a subset L are obtained. If the suprema and infima exist for all couples A, B, they define the binary operations ∨ and ∧, respectively.

Lattices

Definition. A lattice is a poset (K, ≤) where every two-element set {A, B} has a supremum A ∨ B and an infimum A ∧ B. The poset (K, ≤) is said to be a complete lattice if and only if every subset of K has a supremum and an infimum.

The binary operations ∧ and ∨ on a lattice (K, ≤) are clearly commutative and associative (prove this in detail!). The latter properties (of associativity and commutativity) ensure that all finite non-empty subsets of K possess infima and suprema. Note that any element of a lattice K is an upper bound for the empty set. Thus, in a complete lattice, the supremum of the empty set is the least element 0 of K. Similarly, the infimum of the empty set is the greatest element 1 of K. Of course, a finite lattice (K, ≤) is always complete (with 1 being the supremum of all elements in K and 0 the infimum of all elements in K).

Remark. The poset of real numbers, completed with the greatest and least elements ±∞, is a complete lattice (with the standard ordering). We may view it as a completion of the poset of rational numbers. The classical Dedekind cut construction of this completion of the rational numbers extends

Solution. Let x′ and z both denote complements of x ∈ A. Then

x′ = x′ ∧ 1 = x′ ∧ (x ∨ z) = (x′ ∧ x) ∨ (x′ ∧ z) = 0 ∨ (x′ ∧ z) = (x ∧ z) ∨ (x′ ∧ z) = (x ∨ x′) ∧ z = 1 ∧ z = z. □

12.A.28. Anne, Brenda, Kate, and Dana want to set out on a trip.
Find out which of the girls will go if the following must hold: at least one of Brenda and Dana will go; at most one of Anne and Kate will go; at least one of Anne and Dana will go; at most one of Brenda and Kate will go; Brenda will not go unless Anne goes; and Kate will go if Dana goes.

Solution. Transforming the problem to Boolean algebra, simplifying it, and transforming it back, we find out that either Anne and Brenda will go, or Kate and Dana will go. □

12.A.29. Solve the following problem by transforming it to Boolean algebra: Tom, Paul, Sam, Ralph, and Lucas are suspected of having committed a murder. It is certain that at the crime scene there were: at least one of Tom and Ralph; at most one of Lucas and Paul; and at least one of Lucas and Tom. Sam could be there only if Ralph was. However, if Sam was there, then so was Tom. Paul could never cooperate with Ralph, but Paul and Tom are an inseparable pair. Who committed the murder?

Solution. Transforming into Boolean algebra, using the first letter of each name, we get (t + r)(l′ + p′)(l + t)(r + s′)(s′ + t)(p′ + r′)(pt + p′t′), and thanks to x·x = x, x·x′ = 0, x + x′ = 1, we can rearrange the above to s′r′ptl′ + s′rp′t′l. Thus, the murder was committed either by Tom and Paul or by Ralph and Lucas. □

12.A.30. A vote box for three voters is a box which processes three votes and outputs "yes" if and only if the majority of the voters is for. Design this box using switch circuits.

Solution.

to all posets. Indeed, the so-called Dedekind-MacNeille completion provides the unique smallest complete lattice containing a given poset. We shall not go into any details here.

A lattice is said to be distributive if and only if the operations ∧ and ∨ satisfy the distributivity axioms (4) and (5) of subsection 12.1.2 on page 842. There are lattices which are not distributive; see the Hasse diagrams of two such simple lattices below (and check that in both cases x ∧ (y ∨ z) ≠ (x ∧ y) ∨ (x ∧ z)).
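Puzzles like those in 12.A.28 and 12.A.29 above can also be confirmed by plain enumeration of all truth assignments. A sketch for the murder problem, one Boolean per suspect with the first letters used above (the encoding is ours):

```python
from itertools import product

solutions = []
for t, p, s, r, l in product([0, 1], repeat=5):
    ok = ((t or r) and not (l and p) and (l or t)   # who was at the crime scene
          and (not s or r) and (not s or t)         # Sam only with Ralph, and with Tom
          and not (p and r) and (p == t))           # Paul never with Ralph; Paul iff Tom
    if ok:
        solutions.append((t, p, s, r, l))
print(solutions)  # → [(0, 0, 0, 1, 1), (1, 1, 0, 0, 0)]
```

The two surviving assignments are exactly the two terms s′r′ptl′ and s′rp′t′l of the simplified expression: Tom with Paul, or Ralph with Lucas.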
Now, Boolean algebras can be defined in terms of lattices: a Boolean algebra is a distributive lattice with the least element 0 and the greatest element 1 such that each element has a complement (i.e., the axiom 12.1.2(8) is satisfied). It is already verified that the latter requirement implies that complements are unique (see the ideas at the beginning of subsection 12.1.3), which means that this alternative definition of Boolean algebras is correct.

During the discussion of the divisors of a given integer p, the distributive lattices Dp were encountered. These distributive lattices are Boolean algebras if and only if p is squarefree; see 12.1.7.

12.1.10. Homomorphisms. Dealing with mathematical structures, most information about the objects can be obtained and understood from the homomorphisms. These are the mappings which preserve the corresponding operations. The linear mappings between vector spaces, or the continuous mappings on Rⁿ or on any metric spaces with the given topology of open neighbourhoods, represent very good examples. The concept is particularly simple for posets:

Poset homomorphisms

Let (K, ≤) and (L, ≤) be posets. A mapping f : K → L is called a poset homomorphism (also an order-preserving, monotone, or isotone mapping) if A ≤ B implies f(A) ≤ f(B) for all A, B ∈ K.

Therefore, there are 219 partial orders on a given 4-element set. Note that the condition of the existence of suprema and infima of any pair of elements in a lattice implies (by induction) their existence for any finite non-empty subset. In particular, this means that every non-empty finite lattice has a greatest element as well as a least element. Using this criterion, we can see that only the last two Hasse diagrams may be lattices. Indeed, they are lattices; the first one is even a Boolean algebra. □

12.A.33. Find the number of partial orders on the set {1, 2, 3, 4, 5} such that there are exactly two pairs of incomparable elements. ○

12.A.34.
Draw the Hasse diagram of the lattice of all (positive) divisors of 36. Is this lattice distributive? Is it a Boolean algebra?

Solution. The lattice is distributive (it does not contain a sublattice isomorphic to the diamond or to the pentagon). However, it is not a Boolean algebra: 36 = 2²·3² is not squarefree, so complements are missing (cf. 12.1.7). □

A mapping f : (K, ∧, ∨) → (L, ∧, ∨) is a homomorphism of Boolean algebras if and only if for all A, B ∈ K:

(1) f(A ∧ B) = f(A) ∧ f(B),
(2) f(A ∨ B) = f(A) ∨ f(B),
(3) f(A′) = f(A)′.

Moreover, if f is bijective, it is an isomorphism of Boolean algebras. Similarly, lattice homomorphisms are defined as mappings which satisfy the properties (1) and (2). It is easily verified that if a homomorphism f is bijective, then f⁻¹ is also a homomorphism. It is clear from the definition of the partial order on Boolean algebras or lattices that every homomorphism f : K → L also satisfies f(A) ≤ f(B) for all A, B ∈ K such that A ≤ B, i.e., it is in particular a poset homomorphism. In particular, f(0) = 0 and f(1) = 1. The converse is generally not true; it may happen that a poset homomorphism is not a lattice homomorphism.

12.1.11. Fixed-point theorems. Many practical problems lead to a discussion of the existence and properties of fixed points of a mapping f : K → K on a set K, i.e., of elements x ∈ K such that f(x) = x. The concepts of infima and suprema allow the derivation of very strong propositions of this type surprisingly easily. There follows a classical theorem proved by Knaster and Tarski¹:

Tarski's fixed-point theorem

Theorem. Let (K, ∧, ∨) be a complete lattice and f : K → K a poset homomorphism. Then f has a fixed point, and the set of all fixed points of f, together with the ordering restricted from K, is again a complete lattice.

Proof. Denote M = {x ∈ K; x ≤ f(x)}. Since K has a least element, M is non-empty. Since f is order-preserving, f(M) ⊆ M. Moreover, denote z₁ = sup M. Then, for x ∈ M, x ≤ z₁, which means that f(x) ≤ f(z₁).
At the same time, x ≤ f(x); hence f(z₁) is an upper bound for M. Then z₁ ≤ f(z₁), so z₁ ∈ M, and hence f(z₁) ≤ z₁. It follows that f(z₁) = z₁, so a fixed point is found.

¹Knaster and Tarski proved this in the special case of the Boolean algebra of all subsets of a given set already in 1928, cf. Ann. Soc. Polon. Math. 6: 133–134. Much later, in 1955, Tarski published the general result, cf. Pacific Journal of Mathematics 5:2: 285–309. Alfred Tarski (1901–1983) was a renowned and influential Polish logician, mathematician and philosopher, who spent most of his active career in Berkeley, California. His elder colleague Bronisław Knaster (1893–1980) was also a Polish mathematician.

12.A.35. Draw the Hasse diagram of the lattice of all (positive) divisors of 30. Is this lattice distributive? Is it a Boolean algebra?

Solution. This lattice is a Boolean algebra, and it has 8 elements. All finite Boolean algebras are of size 2ⁿ for an appropriate n, and they are all isomorphic for a fixed n (see 12.1.16). This Boolean algebra is a "cube": its graph can be drawn as the projection of a cube onto the plane. □

12.A.36. Decide whether every lattice on a 3-element set is a chain, i.e., whether each pair of elements is necessarily comparable.

Solution. As noticed in exercise 12.A.32, every finite non-empty lattice must contain a greatest element and a least element. Each of these is thus comparable to any other element.

It is more difficult to verify the last statement of the theorem, namely that the set Z ⊆ K of all fixed points of f is a complete lattice. The greatest element z₁ = max Z is found already. Analogously, using the infimum and the property f(x) ≤ x in the definition of M, we find the least element z₀ = min Z. Consider any non-empty set Q ⊆ Z and denote y = sup Q. This supremum need not lie in Z. However, as seen shortly, the set Q has a supremum in Z with respect to the partial order in K restricted to Z.
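In a finite complete lattice, a fixed point as in the existence part of the proof can be computed by iterating f from the least element 0, i.e., along the chain 0 ≤ f(0) ≤ f(f(0)) ≤ … until it stabilizes. A sketch on the subset lattice 2^M, with a monotone map of our own choosing:

```python
# complete lattice: all subsets of M = {0,...,4}, ordered by inclusion
M = frozenset(range(5))

def f(x):
    # a monotone (order-preserving) map: keep x, always add 0, and add successors
    return frozenset({0} | x | {i + 1 for i in x if i + 1 in M})

x = frozenset()        # the least element 0 of the lattice
while f(x) != x:       # climb the chain 0 <= f(0) <= f(f(0)) <= ...
    x = f(x)
print(sorted(x))       # → [0, 1, 2, 3, 4]
```

Since the lattice is finite and the chain is increasing, the loop must terminate, and the resulting set is the least fixed point of f.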
For that purpose, denote R = {x ∈ K; y ≤ x}. It is clear from the definitions that this set, together with the partial order of K restricted to R, is again a complete lattice, and that the restriction of f to R is again a poset homomorphism f|_R : R → R. By the above, f|_R has a least fixed point ỹ. Of course, ỹ ∈ Z, and ỹ is the supremum of the set Q with respect to the inherited order on Z. Note that it is possible that ỹ > y. Analogously, the infimum of any non-empty subset of Z can be found. Since the least and greatest elements are already found, the proof is finished. □

Remark. In the literature, one may find many variants of fixed-point theorems in various contexts. One very useful variant is Kleene's recursion theorem, which can be derived from the theorem just proved and formulated as follows: Consider a poset homomorphism f and the countable subset of K (in the notation of Tarski's fixed-point theorem) formed by the Kleene chain 0 ≤ f(0) ≤ f(f(0)) ≤ ⋯

… whether each function (Z2)ⁿ → Z2 can be defined in terms of the basic logical operations. Clearly, all such functions form a Boolean algebra, since their values are in the Boolean algebra Z2. Similarly, there is the problem of deciding whether or not two systems of switches can have the same function. Just as for propositions, a system consisting of n switches corresponds to a function (Z2)ⁿ → Z2. There are 2^(2ⁿ) such functions. A Boolean algebra can be naturally defined on these functions (again using the fact that the function values lie in the Boolean algebra Z2). We summarize a few such questions:

Some basic questions

Question 1: Are all finite Boolean algebras (K, ∧, ∨) defined on sets K with 2ⁿ elements?
Question 2: Can each function (Z2)ⁿ → Z2 be the truth function of some logical expression built of n elementary propositions and the logical operators?
Question 3: How to recognize whether two such expressions represent the same function?
Question 4: Can each function (Z2)ⁿ → Z2 be realized by some switchboard with n switches?
Question 5: How to recognize whether two switchboards represent the same function?

All these questions are answered by finding the normal form of every element of a general Boolean algebra. This is achieved by writing it as the join of certain particularly simple elements. By comparing the normal forms of any pair of elements, it is easily determined whether or not they are the same. This helps to classify all finite Boolean algebras, giving the affirmative answer to Question 1.

12.1.13. Atoms and normal forms. First, define the "simplest" elements of a Boolean algebra:

Atoms in a Boolean algebra

Let K be a Boolean algebra. An element A ∈ K, A ≠ 0, is called an atom if and only if for all B ∈ K, A ∧ B = A or A ∧ B = 0. In other words, A ≠ 0 is an atom if and only if there are only two elements B such that B ≤ A, namely B = 0 and B = A.

Note that 0 is not considered an atom, just as the integer 1 is not considered a prime. Let us remark that infinite Boolean algebras may contain no atoms at all.

□ 12.A.40. Give an example of an infinite chain which is a complete lattice.

Solution. We can take the set of real numbers together with −∞, ∞, where −∞ is the least element (and thus the supremum of the empty set) and ∞ is the greatest element (and thus the infimum of the empty set). The lattice suprema and infima are thus defined in accordance with these concepts in the real numbers. Moreover, −∞ is the infimum of subsets which are not bounded from below, and similarly ∞ is the supremum of subsets which are not bounded from above. □

12.A.41. Decide whether the set of all convex subsets of R³ is a lattice (with respect to suitably defined operations of suprema and infima). If so, is this lattice complete? Is it distributive?

Solution. It is a lattice. The infimum is simply the intersection, since the intersection of convex subsets is again convex.
The supremum is the convex hull of the union. It is clear that the lattice axioms are indeed satisfied for these operations (think this out!). The lattice is complete, since the above operations work for infinite subsets as well, and clearly the lattice has both a least element (the empty set) and a greatest element (the entire space). However, the lattice is not distributive. For example, consider three unit balls B₁, B₂, B₃ centered at [3,0,0], [−3,0,0], [0,0,0], respectively. Then B₁ ∨ (B₂ ∧ B₃) = B₁ ≠ (B₁ ∨ B₃) ∧ (B₁ ∨ B₂). □

12.A.42. Decide whether the set of all vector subspaces of R³ is a lattice (with respect to suitably defined operations of suprema and infima). If so, is this lattice complete? Is it distributive?

The situation is very simple in the Boolean algebra of all subsets of a given finite set M. Clearly, the atoms are precisely the singletons A = {x}. For every subset B, either A ∧ B = A (if x ∈ B) or A ∧ B = 0 (if x ∉ B). The requirements fail whenever there is more than one element in A.

Next, consider which elements are atoms in the Boolean algebra of functions of the switchboards with n switches A₁, …, Aₙ. It can be easily verified that there are 2ⁿ atoms, namely those of the form A₁* ∧ ⋯ ∧ Aₙ*, where each Aᵢ* is either Aᵢ or Aᵢ′. The infimum φ ∧ ψ of functions φ and ψ is the function whose values are given by the products of the corresponding values in Z2. Therefore, φ ≤ ψ if φ takes the value 1 ∈ Z2 only on arguments where ψ also has the value 1. Hence, in the Boolean algebra of truth-value functions, a function φ is an atom if and only if φ returns 1 ∈ Z2 for exactly one of the 2ⁿ possible choices of arguments. All these functions can be created in the above-mentioned manner.

Now the promised theorem can be formulated. While this one is called the disjunctive normal form, there is also the dual version with the suprema and infima interchanged (the conjunctive normal form).

Disjunctive normal form

Theorem.
Each element B of a finite Boolean algebra (K, ∧, ∨) can be written as a supremum of atoms, B = A₁ ∨ ⋯ ∨ A_k. This expression is unique up to the order of the atoms.

The proof takes several paragraphs, but the basic idea is quite simple: Consider all the atoms A₁, A₂, …, A_k in K which are less than or equal to B. From the properties of the order on K (see 12.1.8(3)), it follows that Y = A₁ ∨ ⋯ ∨ A_k ≤ B. The main step of the proof is to verify that B ∧ Y′ = 0, which by 12.1.8(4) guarantees that B ≤ Y. This proves the equality B = Y.

12.1.14. Three useful claims. We derive several technical properties of atoms in order to complete the proof of the theorem on the disjunctive normal form. We retain the notation of the previous subsection.

Proposition.
(1) If Y, X₁, …, X_ℓ are atoms in K, then Y ≤ X₁ ∨ ⋯ ∨ X_ℓ if and only if Y = Xᵢ for some i = 1, …, ℓ.
(2) For each Y ∈ K, Y ≠ 0, there is an atom X ∈ K such that X ≤ Y.
(3) If X₁, …, X_r are precisely all the atoms of K, then Y = 0 if and only if Y ∧ Xᵢ = 0 for all i = 1, …, r.

Proof. (1) If the inequality of the proposition holds, then Y ∧ (X₁ ∨ ⋯ ∨ X_ℓ) = Y.

Solution. This is a lattice; infima correspond to intersections and suprema to sums of vector subspaces (it is easy to verify that these operations satisfy the lattice axioms). This lattice is complete (the operations work for infinite subsets as well; the least element is the zero-dimensional subspace, and the greatest element is the entire space). However, it is not distributive (consider three distinct lines in a plane). □

B. Rings

12.B.1. Decide whether the set R with the operations ⊕, ⊙ forms a ring, a commutative ring, an integral domain or a field:
i) R = Z, a ⊕ b = a + b + 3, a ⊙ b = −3,
ii) R = Z, a ⊕ b = a + b − 3, a ⊙ b = a·b − 1,
iii) R = Z, a ⊕ b = a + b − 1, a ⊙ b = a + b − a·b,
iv) R = R.

Clearly, it must satisfy φ(0) = 0 and φ(1) = 1.
Since φ respects addition, we must have for all positive integers n that φ(n) = φ(1 + 1 + ⋯ + 1) = nφ(1) = n, and, since every positive real number is a square, φ(x) = φ(√x)² > 0 for every x > 0. Thus, for any x, y ∈ R such that x < y, we must have φ(x) < φ(y). Now, assume that φ is not the identity, i.e., there exists a z ∈ R such that φ(z) ≠ z. We can assume without loss of generality that φ(z) < z. Since Q is dense in R, there exists an r ∈ Q for which φ(z) < r < z. However, we know that φ(r) = r, which means that r < z implies φ(r) < φ(z), i.e., r < φ(z), a contradiction with φ(z) < r.

… → 2^M by f(X) = f(A₁) ∪ ⋯ ∪ f(A_k) = {A₁, …, A_k}, the union of the singletons Aᵢ ⊆ M that occur in the expression. The uniqueness of the normal form implies that f is a bijection. It remains to show that it is a homomorphism of the Boolean algebras. Let X, Y ∈ K. The normal form of their supremum contains exactly the atoms which occur in at least one of X, Y, while the infimum involves just those atoms which occur in both. This verifies that f preserves the operations ∧ and ∨. As for the complements, note that an atom A occurs in the normal form of X′ if and only if X ∧ A = 0. Hence f preserves complements, which finishes the proof. □

The classification of infinite Boolean algebras is far more complicated. It is not the case that each of them is isomorphic to the Boolean algebra of all subsets of an appropriate set M. However, every Boolean algebra is isomorphic to a Boolean subalgebra of a Boolean algebra 2^M for an appropriate set M. This result is known as Stone's representation theorem³.

2. Polynomial rings

The operations of addition and multiplication are fundamental in the case of scalars as well as vectors. There are other similar structures. Besides the integers Z, the rational numbers Q and the complex numbers C, there are the polynomials over similar scalars K to be considered.

³The American mathematician Marshall Harvey Stone (1903–1989) proved this theorem in 1936 when dealing with the spectral theory of operators on Hilbert spaces, required for analysis and topology.
Nowadays, it belongs to the standard material of advanced textbooks.

Solution. Applying the Euclidean algorithm, we get 131 = 7·17 + 12, 17 = 1·12 + 5, 12 = 2·5 + 2, 5 = 2·2 + 1. Therefore, 1 = 5 − 2·2 = 5 − 2·(12 − 2·5) = 5·5 − 2·12 = 5·(17 − 12) − 2·12 = 5·17 − 7·12 = 5·17 − 7·(131 − 7·17) = 54·17 − 7·131. The inverse of [17] is thus [54]. Similarly, [18]⁻¹ = [51] and [19]⁻¹ = [69]. □

12.B.7. Find the inverse of [49] in Z253. ○
12.B.8. Find the inverse of [37] in Z208. ○
12.B.9. Find the inverse of [57] in Z359. ○
12.B.10. Find the inverse of [17] in Z40. ○

C. Polynomial rings

12.C.1. Eisenstein's irreducibility criterion. This criterion provides a sufficient condition for a polynomial over Z to be irreducible over Q (which is the same as being irreducible over Z): Let f(x) = aₙxⁿ + aₙ₋₁xⁿ⁻¹ + ⋯ + a₁x + a₀ be a polynomial over Z and let p be a prime such that
• p divides aⱼ for j = 0, …, n − 1,
• p does not divide aₙ,
• p² does not divide a₀.
Then f(x) is irreducible over Z (and over Q). Prove this criterion. ○

12.C.2. Factorize the polynomial x⁴ + 2x³ + 3x² + 2x + 1 over C and over R.

Solution. This polynomial can be factorized either by looking for multiple roots or as a reciprocal equation:
• Let us compute the greatest common divisor of the polynomial and its derivative 4x³ + 6x² + 6x + 2, using the Euclidean algorithm. The greatest common divisor is given, in any ring, up to a multiple by a unit, and during the Euclidean algorithm we may multiply the partial results by units of the ring. In the case of a polynomial

Among others, the abstract algebraic theory can in many aspects be viewed as a straightforward generalization of the divisibility properties of the integers.

12.2.1. Rings and fields. Recall that the integers and all the other scalars K have the following properties:

Commutative rings and integral domains

Definition. Let (M, +, ·) be an algebraic structure with two binary operations + and ·.
It is a commutative ring if it satisfies:
• (a + b) + c = a + (b + c) for all a, b, c ∈ M;
• a + b = b + a for all a, b ∈ M;
• there is an element 0 such that 0 + a = a for all a ∈ M;
• for each a ∈ M, there is a unique element −a ∈ M such that a + (−a) = 0;
• (a·b)·c = a·(b·c) for all a, b, c ∈ M;
• a·b = b·a for all a, b ∈ M;
• there is an element 1 such that 1·a = a for all a ∈ M;
• a·(b + c) = a·b + a·c for all a, b, c ∈ M.
If the ring is such that c·d = 0 implies that either c or d is zero, then it is called an integral domain.

The first four properties define the algebraic structure of a commutative group (M, +). Groups are considered in more detail in the next part of this chapter. The last property in the list of ring axioms is called the distributivity of multiplication over addition. There are similar axioms for Boolean algebras, where each of the operations is distributive over the other. If the operation "·" is commutative for all elements, then the ring is called a commutative ring; otherwise, the ring is called a non-commutative ring. In the sequel, rings are commutative unless otherwise stated. Traditionally, the operation "+" is called addition and the operation "·" multiplication, even if they are not the standard operations on one of the known rings of numbers. In the literature, there are also structures without the assumption of the existence of an identity for multiplication. These are not discussed here, so it is always assumed that a ring has a multiplicative identity, denoted by 1. The identity for addition is denoted by 0.

Fields

A non-trivial ring in which all non-zero elements are invertible with respect to multiplication is called a division ring. If the multiplication is commutative, it is called a field. Typical examples of fields are the rational numbers Q, the real numbers R, and the complex numbers C. Furthermore, every remainder class set Zₚ is a commutative ring, while only the Zₚ with prime p are also fields.
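Inverses in the rings Zₙ, such as those asked for in the exercises of the practice column, come from the same extended Euclidean algorithm used in the computation with 131 above. A sketch (in Python 3.8 and newer, pow(a, -1, n) computes the same):

```python
def inverse(a, n):
    # extended Euclidean algorithm, keeping the invariant old_s * a ≡ old_r (mod n)
    old_r, r = a, n
    old_s, s = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:
        raise ValueError("not invertible")   # gcd(a, n) must be 1
    return old_s % n

print(inverse(17, 131), inverse(18, 131), inverse(19, 131))  # → 54 51 69
```

When the loop stops, old_r = gcd(a, n) and old_s·a ≡ old_r (mod n), so for coprime a and n the reduction old_s mod n is the inverse class.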
Recall the useful example of a non-commutative ring, the set Matfc(K) of all k-by-k matrices over a ring K, k > 2. As can be checked for K = Z2 and k = 2, these rings are 857 CHAPTER 12. ALGEBRAIC STRUCTURES ring over a field of scalars, the units are exactly the nonzero scalars. We perform the multiplication in the way to avoid calculations with fractions as much as possible. 2x4 + Ax3 + 6x2 + Ax + 2 : 2x3 + 3x2 + 3x + 1 = x + 2x4 + 3x3 + 3x2 + x x3 + 3x2 + 3x + 2 x° + -X* + -x + - 2 2 2 3 2 3 3 -xl + -X + - 2 2 2 Further, we divide the polynomial 2x3 + 3x2 + 3x + 1 by the remainder |x2 + |a;+| (multiplied by the unit D 2x3 + 3x2 + 3x + 1 : x2 + x + 1 = 2x + 1 2x3 + 2x2 + 2x x2 + x + 1 The roots of the greatest common divisor of the original polynomial and its derivative are exactly the multiple roots of the original polynomial. In this case, the roots of x2 + x + 1 are — | ± iv/3/2, which are thus double roots of the original polynomial. The factorization over C is thus to root factors (this is always the case over C, as stated by the fundamental theorem of algebra): x4 + 2x3 + 3x2 + 2x + 1 = 1 V3 x H---i- 2 2 1 V3 X+2+l^2 The factorization over R can be obtained by multiplying the factors corresponding to pairs of complex-conjugated roots of the polynomial (verify that such a product must always result in a polynomial with real coefficients!): x4 + 2x3 + 3x2 + 2x + 1 = (x2 + x + l)2 . • Let us solve the equation x4 + 2x3 + 3x2 + 2x + 1 = 0. Dividing by x2 and substituting t = x + \, we get the equation t2 + It + 1 = 0 with double root — 1. Now, substituting this into the definition of t, we get the known equation x2 + x + 1 = 0, which was solved above. ^ Remark. Let us remark that the only irreducible polynomials over R are linear polynomials and quadratic polynomials with never an integral domain (see 2.1.5 on page 72 for the full argument). As an example of a division ring which is not a field, consider the ring of quaternions H. 
This is constructed as an extension of the complex numbers by adding another imaginary unit j, i.e. H = C ⊕ jC ≅ R^4, just as the complex numbers are obtained from the reals. Another "new" element ij is usually denoted k. It follows from the construction that ij = −ji. This structure is a division ring. Think out the details as a not completely easy exercise!

12.2.2. Elementary properties of rings. The following lemma collects properties which all seem to be obvious for rings of scalars. But the properties need proof in order to build an abstract theory:

Lemma. In every commutative ring K, the following holds:
(1) 0 · c = c · 0 = 0 for all c ∈ K;
(2) −c = (−1) · c = c · (−1) for all c ∈ K;
(3) −(c · d) = (−c) · d = c · (−d) for all c, d ∈ K;
(4) a · (b − c) = a · b − a · c;
(5) the entire ring K collapses to the trivial set {0} = {1} if and only if 0 = 1.

Proof. All of the propositions are direct consequences of the definition axioms. In the first case, for any c and a, c · a = c · (a + 0) = c · a + c · 0, and since 0 is the only element that is neutral with respect to addition, c · 0 = 0. In the second case, it suffices to compute 0 = c · 0 = c · (1 + (−1)) = c + c · (−1). This means that c · (−1) is the additive inverse of c, as desired. The following two propositions are direct consequences of the second proposition and the axioms. If the ring contains only one element, then 0 = 1. On the other hand, if 1 = 0, then for any c ∈ K, necessarily c = 1 · c = 0 · c = 0. □

12.2.3. Polynomials over rings. The definition of a commutative ring uses precisely the properties that are expected of multiplication and addition. The concept of polynomials can now be extended. A polynomial is any expression that can be built from (known) constant elements of K and an (unknown) variable using a finite number of additions and multiplications. Formally, polynomials are defined as follows:⁴

⁴ It is not by accident that the symbol K is used for the ring: you can imagine, e.g., any of the number rings behind it.
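The quaternion relations just mentioned can be experimented with directly. A minimal sketch (the component formula below is the standard Hamilton product; the function name is illustrative):

```python
# Sketch: quaternion multiplication on H = R^4 with basis 1, i, j, k,
# illustrating ij = -ji, so H is a non-commutative division ring.

def qmul(p, q):
    """Multiply quaternions given as tuples (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,   # real part
        a1*b2 + b1*a2 + c1*d2 - d1*c2,   # i component
        a1*c2 - b1*d2 + c1*a2 + d1*b2,   # j component
        a1*d2 + b1*c2 - c1*b2 + d1*a2,   # k component
    )

i, j = (0, 1, 0, 0), (0, 0, 1, 0)
print(qmul(i, j), qmul(j, i))  # ij = k, ji = -k
```

Checking qmul(k, k) = (−1, 0, 0, 0) confirms that k behaves as a third imaginary unit, as the construction H = C ⊕ jC predicts.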
negative discriminant. This also follows from the reasoning in the above exercise.

12.C.3. Factorize the polynomial x^5 + 3x^3 + 3 into irreducible factors over i) Q, ii) Z_7.
Solution. i) By Eisenstein's criterion (applied with the prime 3), the given polynomial is irreducible over Z and Q.
ii) (x − 1)^2 (x^3 + 2x^2 − x + 3). Using Horner's scheme, for instance, we find the double root 1. After dividing by the polynomial (x − 1)^2, we get x^3 + 2x^2 − x + 3, which has no roots in Z_7. Since it is only of degree 3, this means that it must be irreducible (if it were reducible, one of the factors would have to be linear, which means that the cubic polynomial would have a root). □

12.C.4. Factorize the polynomial x^4 + 1 over • Z_3, • C, • R.
Solution.
• (x^2 + x + 2)(x^2 + 2x + 2).
• The roots are the fourth roots of −1, which lie in the complex plane on the unit circle, and their arguments are π/4, π/4 + π/2, π/4 + π, and π/4 + 3π/2; i.e., they are the numbers ±√2/2 ± i√2/2. Thus, the factorization is
(x − √2/2 − i√2/2)(x − √2/2 + i√2/2)(x + √2/2 − i√2/2)(x + √2/2 + i√2/2).
• Multiplying the root factors of complex-conjugate roots in the factorization over C, we get the factorization over R:
(x^2 − √2 x + 1)(x^2 + √2 x + 1). □

12.C.5. Find a polynomial with rational coefficients of the lowest degree possible which has the number 2^(1/2007) as a root.
Solution. P(x) = x^2007 − 2. Let us show that there is no polynomial of lower degree with the root 2^(1/2007). Let Q(x) be a non-zero polynomial of the lowest degree with this root; then deg Q(x) ≤ 2007. Let us divide P(x) by Q(x) with remainder: P(x) = Q(x)·D(x) + R(x), where D(x) is the quotient and R(x) is the remainder, and either deg R(x) < deg Q(x) or R(x) = 0. Substituting the number 2^(1/2007) into the last equation, we can see that 2^(1/2007) is also a root of R(x). By the definition of Q(x), this means that R(x) must be the zero

Polynomials
Definition. Let K be a commutative ring.
A polynomial over K is a finite expression
f(x) = Σ_{i=0}^{k} a_i x^i,
where a_i ∈ K, i = 0, 1, ..., k, are the coefficients of the polynomial. If a_k ≠ 0, then by definition f(x) has degree k, written deg f = k. The zero polynomial is not assigned a degree. Polynomials of degree zero (called constant polynomials) are exactly the non-zero elements of K. Polynomials f(x) and g(x) are equal if they have the same coefficients. The set of all polynomials over a ring K is denoted K[x].

Every polynomial defines a mapping f : K → K by substituting the argument c for the variable x and evaluating the resulting expression, i.e. f(c) = a_0 + a_1 c + ··· + a_k c^k. Note that constant polynomials define constant mappings in this manner. A root of a polynomial f(x) is an element c ∈ K for which f(c) = 0 ∈ K.

It may happen that different polynomials define the same mapping. For instance, the polynomial x^2 + x ∈ Z_2[x] defines the mapping which is constantly equal to zero. More generally, for every finite ring K = {a_0, a_1, ..., a_k}, the polynomial f(x) = (x − a_0)(x − a_1)···(x − a_k) defines the constant-zero mapping.

Polynomials f(x) = Σ_i a_i x^i and g(x) = Σ_i b_i x^i can be added and multiplied in a natural way (the point is to introduce again the structure of a ring and to obtain the expected distributivity of multiplication over addition):
(f + g)(x) = (a_0 + b_0) + (a_1 + b_1)x + ··· + (a_k + b_k)x^k,
(f · g)(x) = a_0 b_0 + (a_0 b_1 + a_1 b_0)x + ··· + (Σ_{i+j=r} a_i b_j)x^r + ··· + a_k b_ℓ x^{k+ℓ},
where k ≥ ℓ are the degrees of f and g, respectively. Zero coefficients are assumed wherever there is no coefficient in the original expression.⁵ This definition corresponds to the addition and multiplication of the function values of f, g : K → K, by the properties of the "coefficients" in the original ring K.
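The observation that different polynomials can define the same mapping is easy to see experimentally. A sketch with an illustrative evaluation helper (coefficient lists are ascending, our convention):

```python
# Sketch: over a finite ring, different polynomials can define the same
# mapping. Coefficients are stored ascending: [a0, a1, ...].

def evaluate(coeffs, c, n):
    """Evaluate a polynomial at c in the ring Z_n."""
    return sum(a * pow(c, i, n) for i, a in enumerate(coeffs)) % n

# x^2 + x over Z_2 is the zero mapping, like the zero polynomial:
print([evaluate([0, 1, 1], c, 2) for c in range(2)])     # [0, 0]
# (x - 0)(x - 1)(x - 2) = x^3 + 2x over Z_3 is the zero mapping as well:
print([evaluate([0, 2, 0, 1], c, 3) for c in range(3)])  # [0, 0, 0]
```

The second example is the general construction from the text: the product of (x − a) over all elements a of the finite ring vanishes everywhere.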
It follows directly from the definition that the set of polynomials K[x] over a commutative ring K is again a commutative ring, where the multiplicative identity is the element 1 ∈ K, perceived as a polynomial of degree zero. The additive identity is the zero polynomial. You should check all the axioms carefully!

⁵ To avoid this formal hassle, a polynomial can be defined as an infinite expression (like a formal power series over the ring in question) with the condition that only finitely many coefficients are non-zero. Concerning the degree, the zero polynomial is then attributed the degree −∞.

polynomial, which means that Q(x) divides P(x). However, P(x) is irreducible (by Eisenstein's criterion with the prime 2), so its only non-trivial divisor is itself (up to multiplication by a unit of the polynomial ring over Q, i.e., a non-zero rational constant). Thus, we have Q(x) = P(x) up to multiplication by a unit. For instance, the polynomial (1/2)x^2007 − 1 also satisfies the stated conditions. However, if we require the polynomial to be monic (i.e., with leading coefficient 1), then the only solution is the mentioned polynomial P(x). □

Lemma. A polynomial ring over an integral domain is again an integral domain.

Proof. The task is to show that K[x] can contain non-trivial divisors of zero only if they lie in K. However, this is clear from the expression for polynomial multiplication. If f(x) and g(x) are polynomials of degrees k and ℓ as above, then the coefficient of x^{k+ℓ} in the product f(x)·g(x) is the product a_k · b_ℓ, which is non-zero unless there are zero divisors in K. □

12.C.6. Find all irreducible polynomials of degree at most 2 over Z_3.
Solution. By definition, all linear polynomials are irreducible. As for quadratic irreducible polynomials, the easiest way is to simply enumerate them all and leave out the reducible ones, i.e., those which are a product of two linear polynomials.
The reducible monic quadratic polynomials are
(x + 1)^2 = x^2 + 2x + 1, (x + 2)^2 = x^2 + x + 1, (x + 1)(x + 2) = x^2 + 2, x^2, x(x + 1) = x^2 + x, x(x + 2) = x^2 + 2x.
(It suffices to consider monic polynomials, since the non-monic ones can be obtained from them by multiplication by 2.) The remaining quadratic polynomials over Z_3 are irreducible; these are
x^2 + 2x + 2, x^2 + x + 2, x^2 + 1. □

12.C.7. Decide whether the following polynomial is irreducible over Z_3; if not, factorize it: x^4 + x^3 + x + 2.
Solution. Evaluating the polynomial at 0, 1, 2, we find that it has no root in Z_3. This means that it is either irreducible or a product of two quadratic polynomials. Assume it is reducible. Then, we may assume without loss of generality that it is a product of two monic polynomials (the only other option is that it is a product of two polynomials with leading coefficients equal to 2; then both can be multiplied by 2 in order to become monic). Thus, let us look for constants a, b, c, d ∈ Z_3 so that
x^4 + x^3 + x + 2 = (x^2 + ax + b)(x^2 + cx + d) = x^4 + (a + c)x^3 + (ac + b + d)x^2 + (ad + bc)x + bd.

12.2.4. Multivariate polynomials. Some objects can be described using polynomials in more variables. For instance, consider a circle in the plane R^2 whose center is at S = (x_0, y_0) and whose radius is R. This circle can be defined by the equation
(x − x_0)^2 + (y − y_0)^2 − R^2 = 0.
Rings of polynomials in the variables x_1, ..., x_r can be defined similarly as in the case of K[x]. Instead of the powers x^k of a single variable x, consider the monomials x_1^{k_1} ··· x_r^{k_r} and their formal linear combinations with coefficients a_{k_1 ··· k_r} ∈ K. However, it is simpler, both formally and technically, to define them inductively by
K[x_1, ..., x_r] := (K[x_1, ..., x_{r−1}])[x_r].
For instance, K[x, y] = K[x][y]: one considers polynomials in the variable y over the ring K[x]. It can be shown (check this in detail!)
that polynomials in the variables x_1, ..., x_r can be viewed, even with this definition, as expressions created from the variables x_1, ..., x_r and the elements of the ring K by a finite number of (formal) additions and multiplications in a commutative ring. For example, the elements of K[x, y] are of the form
f = a_n(x) y^n + a_{n−1}(x) y^{n−1} + ··· + a_0(x)
  = (a_{mn} x^m + ··· + a_{0n}) y^n + ··· + (a_{p0} x^p + ··· + a_{00})
  = c_{00} + c_{10} x + c_{01} y + c_{20} x^2 + c_{11} xy + c_{02} y^2 + ···
To simplify the notation, we use multi-index notation (as we did with real polynomials and partial derivatives in infinitesimal analysis).

Comparing the coefficients of the individual powers of x, we get the following system of four equations in four unknowns:
1 = a + c, 0 = ac + b + d, 1 = ad + bc, 2 = bd.
From the last equation, we get that one of the numbers b, d is equal to 1 and the other one to 2. Thanks to the symmetry of the system in the pairs (a, b) and (c, d), we can choose b = 1, d = 2. From the second equation, we get ac = 0, i.e., at least one of the numbers a, c is 0. From the first equation, we get that the other one is 1. From the third equation, we get 2a + c = 1, i.e., a = 0, c = 1. Altogether,
x^4 + x^3 + x + 2 = (x^2 + 1)(x^2 + x + 2). □

12.C.8. For any odd prime p, find all roots of the polynomial P(x) = x^{p−2} + x^{p−3} + ··· + x + 2 over the field Z_p.
Solution. Considering the equality
x^{p−1} − 1 = (x − 1)(P(x) − 1),
we can see that all numbers of Z_p, except 0 and 1, are roots of P(x) − 1, so they cannot be roots of P(x). Clearly, 0 is never a root of P(x), and 1 is always a root, which means that it is the only root. □

12.C.9. Factorize the polynomial p(x) = x^2 + x + 1 in Z_5[x] and Z_7[x].
Solution. It is irreducible in Z_5[x]; p(x) = (x − 2)(x − 4) in Z_7[x]. □

12.C.10. Factorize the polynomial p(x) = x^6 − x^4 − 5x^2 − 3 in C[x], R[x], Q[x], Z[x], Z_5[x], Z_7[x], knowing that it has a multiple root.
Solution.
Applying the Euclidean algorithm, we find out that the greatest common divisor of p and its derivative p′ is x^2 + 1. Dividing the polynomial p(x) twice by this factor, we get
p(x) = (x^2 + 1)^2 (x^2 − 3).
Clearly, these factors are irreducible in the rings Q[x] and Z[x]. In C[x], we can always factorize a polynomial into linear factors. In this case, it suffices to factorize x^2 + 1, which is

Multi-indices
A multi-index α of length r is an r-tuple of non-negative integers (α_1, ..., α_r). The integer |α| = α_1 + ··· + α_r is called the size of the multi-index α. Monomials are written shortly as x^α instead of x_1^{α_1} x_2^{α_2} ··· x_r^{α_r}.

Polynomials in r variables can be symbolically expressed in a similar way as univariate polynomials:
f = Σ_α a_α x^α,  g = Σ_β b_β x^β ∈ K[x_1, ..., x_r].
Writing f = Σ_k a_k(x_1, ..., x_{r−1}) x_r^k and g = Σ_k b_k(x_1, ..., x_{r−1}) x_r^k as polynomials in x_r with coefficients in K[x_1, ..., x_{r−1}], the inductive definition yields
f + g = (a_0(x_1, ..., x_{r−1}) + b_0(x_1, ..., x_{r−1})) + (a_1(x_1, ..., x_{r−1}) + b_1(x_1, ..., x_{r−1})) x_r + ···

easy: x^2 + 1 = (x + i)(x − i). The factor x^2 − 3 is equal to (x − √3)(x + √3) even in R[x]. Thus, in C[x], we have
p(x) = (x + i)^2 (x − i)^2 (x − √3)(x + √3),
while in R[x], we have
p(x) = (x^2 + 1)^2 (x − √3)(x + √3).
In Z_5[x], the polynomial x^2 + 1 has roots ±2, and the polynomial x^2 − 3 has no roots, which means that
p(x) = (x − 2)^2 (x + 2)^2 (x^2 − 3).
In Z_7[x], neither polynomial has a root, so the factorization into irreducible factors is identical to that in Q[x] and Z[x]:
p(x) = (x^2 + 1)^2 (x^2 − 3). □

12.C.11. Knowing that the polynomial p = x^6 + x^5 + 4x^4 + 2x^3 + 5x^2 + x + 2 has the multiple root x = i, factorize it into irreducible polynomials over C[x], R[x], Z_2[x], Z_5[x], and Z_7[x]. Divide the polynomial g = x^2y^2 + y^2 + xy + x^2y + 2y + 1 by the irreducible factors of p in R[x], and use the result to solve the system of polynomial equations p = g = 0 over C.
Solution. p = (x^2 + 1)^2 (x^2 + x + 2); in Z_2: p = x(x + 1)^5; in Z_5: p = (x − 2)^2 (x + 2)^2 (x^2 + x + 2); in Z_7: p = (x^2 + 1)^2 (x + 4)^2.
For the second polynomial, we get
g = (y^2 + y)(x^2 + x + 2) − y^2(x + 1) + 1  and  g = (y^2 + y)(x^2 + 1) + y(x + 1) + 1.
Thus, if x = α is a root of x^2 + x + 2, i.e., α = −1/2 ± i√7/2, then y^2 = 1/(α + 1), i.e., y = ±(α + 1)^{−1/2}. If x = β is a root of x^2 + 1, i.e., β = ±i, then y = −1/(β + 1). □

12.C.12. Factorize the following polynomial into irreducible polynomials in R[x] and in C[x]: 4x^5 − 8x^4 + 9x^3 − 7x^2 + 3x − 1. ○

12.C.13. Factorize the following polynomial into irreducible polynomials in R[x] and in C[x]: x^5 + 3x^4 + 7x^3 + 9x^2 + 8x + 4. ○

12.C.14. Factorize x^4 − 4x^3 + 10x^2 − 12x + 9 into irreducible polynomials in R[x] and in C[x]. ○

12.C.15. Decide whether the following polynomial over Z_3 is irreducible; if not, factorize it into irreducible polynomials: x^5 + x^2 + 2x + 1. ○

= Σ_k (Σ_γ (a_{k,γ} + b_{k,γ}) x_1^{γ_1} ··· x_{r−1}^{γ_{r−1}}) x_r^k + ···, which is exactly the coefficient-wise multi-index formula for the sum. The proof for multiplication is similar (do it yourselves!). □

The definition, or the above formulae for polynomials over general commutative rings, yields the following corollary:

Corollary. If a ring K is an integral domain, then the ring K[x_1, ..., x_r] is also an integral domain.

Proof. Proceed by induction on the number r of variables.⁶ We know the result for univariate polynomials already. In particular, the product of non-zero polynomials is again non-zero. If the proposition holds for r − 1 variables, then the result follows from the inductive definition of the polynomials. □

Let us notice that each polynomial f ∈ K[x_1, ..., x_n] represents a mapping K × ··· × K → K in its n arguments.

12.2.5. Divisibility and irreducibility. The next goal is to understand how polynomials over a general integral domain can be expressed as products of simpler polynomials. In the case of univariate polynomials, this means finding the roots of a polynomial. Since multivariate polynomials can be defined inductively, it suffices to consider univariate polynomials over a general integral domain.
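Root-free enumeration, as used in exercises 12.C.6 and 12.C.15 above, is easy to automate. A sketch over Z_p (the helper name is ours): a quadratic with no root in Z_p cannot split into linear factors, so it is irreducible.

```python
# Brute-force check of the Z_3 enumeration: a monic quadratic over Z_p is
# irreducible iff it has no root in Z_p. Illustrative helper name.

def monic_quadratics_without_roots(p):
    """Pairs (b, c) such that x^2 + bx + c is irreducible over Z_p."""
    return [
        (b, c)
        for b in range(p)
        for c in range(p)
        if all((x * x + b * x + c) % p != 0 for x in range(p))
    ]

print(monic_quadratics_without_roots(3))  # x^2 + 1, x^2 + x + 2, x^2 + 2x + 2
```

For p = 3 this reproduces the three irreducible monic quadratics listed in 12.C.6; for p = 5 the same search returns the ten polynomials of 12.C.17.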
This leads to a generalization of the concept of divisibility, which forms the basis of the elementary number theory in chapter eleven. Consider an integral domain K (for instance, the integers Z or the ring Z_p for prime p).

Divisibility in rings
For a, c ∈ K, we say that a divides c in K if and only if there is some b ∈ K such that a·b = c. This is written a | c. The divisors of 1 (the multiplicative identity), i.e. the invertible elements of K, are called units.

The units of a commutative ring always form a commutative group in the sense used for the properties of addition in the definition of rings. This is called the group of units of K. The group of units in Z is {−1, 1}, while all non-zero elements in a field are units there.

In an integral domain, divisors are determined uniquely. If b = a·c and b ≠ 0, then c is determined by the choice of a and b, since if b = a·c = a·c′, then 0 = a·(c − c′), and since there are no zero divisors, c = c′.

⁶ Alternatively, proceed directly using the multi-index formulae for the product, provided an appropriate ordering on monomials is defined; see the last part of this chapter.

12.C.16. Decide whether the following polynomial over Z_3 is irreducible; if not, factorize it into irreducible polynomials: x^4 + 2x^3 + 2. ○

12.C.17. Find all monic quadratic irreducible polynomials over Z_5.
Solution. We write out all monic quadratic polynomials over Z_5 and exclude those which are not irreducible, i.e., those which have a root:
x^2 ± 2, x^2 ± x + 1, x^2 ± x + 2, x^2 ± 2x − 2, x^2 ± 2x − 1. □

D. Rings of multivariate polynomials

12.D.1. Find the remainder of the polynomial x^3y + x + yz + yz^4 with respect to the basis (x^2y + z, y + z) and the given orderings. ○

The multiplication and addition are defined as with polynomials, assuming the standard behaviour of rational powers of x. Then, the only units in K are ±1, and all elements with a_0 = 0 are reducible, but the expression x, for example, cannot be expressed as a product of irreducible elements.
There are simply very few irreducible elements in K.

12.2.6. Euclidean division and roots of polynomials. The fundamental tool for the discussion of divisibility, common divisors, etc., in the ring of integers Z is the procedure of division with remainder, together with the Euclidean algorithm for the greatest common divisor. These procedures can be generalized. Consider univariate polynomials a_k x^k + ··· + a_0 over a general integral domain K; a_k x^k is called the leading monomial, while a_k is the leading coefficient.

Lemma (An algorithm for division with remainder). Let K be an integral domain and f, g ∈ K[x] polynomials, g ≠ 0.

Figure 1. V(x^3 + x^2 − y^2)

Figure 2. V(2x^4 − 3x^2y + y^2 − 2y^3 + y^4)

means that we substitute tx for y and express x in terms of t from the equation:
x^3 + x^2 − t^2x^2 = x^2(x + 1 − t^2)  ⟹  x = t^2 − 1 or x = 0.
Then, y = t(t^2 − 1), or, for x = 0, the only satisfying point on the curve is y = 0. The point [0, 0] can be obtained by choosing t = 1 in the mentioned parametrization, so it suffices to consider only this parametrization. □

Then, there exists an a ∈ K, a ≠ 0, and polynomials q and r such that af = qg + r, where either r = 0 or deg r < deg g. Moreover, if K is a field or if the leading coefficient of g is 1, then the choice a = 1 can be made, and the polynomials q and r are unique.

Proof. If deg f < deg g or f = 0, then the choice a = 1, q = 0, r = f satisfies all the conditions. If g is constant, set a = g, q = f, r = 0. Continue by induction on the degree of f. Suppose deg f ≥ deg g > 0, and write f = a_0 + ··· + a_n x^n, g = b_0 + ··· + b_m x^m. Either b_m f − a_n x^{n−m} g = 0 or deg(b_m f − a_n x^{n−m} g) < deg f. In the former case, the proof is finished. In the latter case, it follows by the induction hypothesis that there exist a′, q′, r′ satisfying a′(b_m f − a_n x^{n−m} g) = q′g + r′ and either r′ = 0 or deg r′ < deg g.
This means that a′b_m f = (q′ + a′a_n x^{n−m}) g + r′. If b_m = 1 or if K is a field, then the induction hypothesis can be used to choose a′ = 1, and then q′, r′ are unique. In this case, b_m f = (q′ + a_n x^{n−m}) g + r′. If K is a field, then this equation can be multiplied by b_m^{−1}. Assume that there is another solution, f = q_1 g + r_1. Then, 0 = f − f = (q − q_1) g + (r − r_1), and either r = r_1 or deg(r − r_1) < deg g. In the former case, it follows that q = q_1 as well, since there are no zero divisors in K[x]. In the latter case, let a x^s be the term of the highest degree in q − q_1 ≠ 0 (it must exist). Then, its product with the term of the highest degree of g must be zero (since the term of the highest degree of (q − q_1)g is just the product of the terms of the highest degrees, and there are no other terms of this degree there). However, this means that a = 0. Since a x^s is the non-zero term of the highest degree, q − q_1 contains no non-zero monomials, so it equals zero. But then r = r_1. □

The procedure of Euclidean division can be used to discuss the roots of a polynomial. Consider a polynomial f ∈ K[x], deg f > 0, and divide it by the polynomial x − b, b ∈ K. Since the leading coefficient is 1, the algorithm produces a unique result. It follows that there are unique polynomials q and r which satisfy f = q(x − b) + r, where r = 0 or deg r = 0, i.e. r ∈ K. This means that the value of the polynomial f at b ∈ K equals f(b) = r. It follows that the element b ∈ K is a root of the polynomial f if and only if (x − b) | f. Since division by a polynomial of degree one decreases the degree of the original polynomial by at least one, the following proposition is proved:

Corollary. Every polynomial f ∈ K[x] has at most deg f roots. In particular, polynomials over an infinite integral domain define the same mapping K → K if and only if they are the same polynomial.

We obtain more curves if we consider quotients of polynomials in the parametrization, i.
e., the coordinates are given as quotients of polynomials in the parameter. Then, we talk about a rational parametrization.

12.D.6. Derive the parametrization of a circle using stereographic projection (see the picture).
Solution. Substituting the equation of the line through the pole [0, 1] into the equation of the circle x^2 + y^2 = 1, we get x = 0 or the parametric expression
x = 2t/(1 + t^2),  y = (t^2 − 1)/(1 + t^2),
which does not include the point [0, 1], however. □

Remark. Note that in this case, the inclusion of the real line gives only "almost all points" of the parametrized variety, since one of them (i.e., the point from which we project) is not reachable for any value of the parameter t. This is not our fault: it follows from the different topological properties of the line and the circle that there exists no global parametrization.

Remark. Since R is not an algebraically closed field, we have problems with the existence of roots of polynomials. As a result, a mere perturbation of the coefficients of the defining equation may drastically change the resulting variety. It is possible to work with complex polynomials C[x, y] and with the subsets they define in C^2. We need not be scared by that; on the contrary, our originally real curves are contained in their "complexifications" (real polynomials are simply viewed as complex ones which happen to have real coefficients), and we just obtain richer tools for the description of their properties (imaginary tangent lines, etc.).

If two polynomials over an integral domain define the same mapping K → K, then their difference has every element of K as a root. This means that if their difference is not the zero polynomial, then K has at most as many elements as the maximum of the degrees of the polynomials in question.

12.2.7. Multiple roots and derivatives. We shall now work over infinite integral domains K, and so we may identify the algebraic expressions for the polynomials with the mappings.
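The Euclidean division of 12.2.6 also yields greatest common divisors, and, as in the worked examples above, gcd(f, f′) isolates the multiple roots. A hedged sketch over Q (the helper names and the ascending-coefficient representation are ours):

```python
# Sketch of the polynomial Euclidean algorithm over Q: gcd(f, f') collects
# exactly the multiple-root factors of f. Coefficients ascending: [a0, a1, ...].
from fractions import Fraction

def deriv(f):
    """Formal derivative of a polynomial."""
    return [Fraction(i) * c for i, c in enumerate(f)][1:]

def polydiv(f, g):
    """Division with remainder over Q: returns (q, r) with f = q*g + r."""
    f = [Fraction(c) for c in f]
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    while len(f) >= len(g) and any(f):
        c = f[-1] / g[-1]
        d = len(f) - len(g)
        q[d] = c
        for i, gc in enumerate(g):
            f[i + d] -= c * gc
        while f and f[-1] == 0:  # drop trailing zero coefficients
            f.pop()
    return q, f

def polygcd(f, g):
    """Monic greatest common divisor via repeated remainders."""
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    while any(g):
        f, g = g, polydiv(f, g)[1]
    return [c / f[-1] for c in f]

# x^4 + 2x^3 + 3x^2 + 2x + 1 from the worked example:
p = [1, 2, 3, 2, 1]
print(polygcd(p, deriv(p)))  # the double-root factor x^2 + x + 1
```

Exact rational arithmetic via Fraction avoids the rounding problems a floating-point version of the same algorithm would have.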
The differentiation of polynomials over the real or complex numbers is an algebraic operation which makes sense for all K, and it still satisfies the Leibniz rule:

Derivative of polynomials
Let f(x) = a_0 + a_1 x + ··· + a_n x^n and g(x) = b_0 + b_1 x + ··· + b_m x^m be polynomials of degrees n and m over a commutative ring K. The derivative
′ : f(x) ↦ f′(x) = a_1 + 2a_2 x + ··· + n a_n x^{n−1}
respects the addition of polynomials and their multiplication by the elements of K. Moreover, it satisfies the Leibniz rule
(1) (f(x)g(x))′ = f′(x)g(x) + f(x)g′(x).

While the claim on the additive structure is obvious, let us check the Leibniz rule:
f(x)·g(x) = Σ_{k=0}^{m+n} c_k x^k,  c_k = Σ_{i+j=k} a_i b_j,
and thus, expanding f′(x)·g(x) + f(x)·g′(x) yields exactly the expression for the derivative of the product.

In particular, the derivative is not a homomorphism K[x] → K[x] of the ring of polynomials, in view of (1). In a much more general context, the homomorphisms of the additive structure of a ring satisfying the Leibniz rule are called derivations. For polynomial rings, we see inductively that the only derivation there is our operation ′.

Differentiation can be used for discussing the multiple roots of polynomials. Consider a polynomial f(x) ∈ K[x] over an infinite integral domain K, with a root c ∈ K of multiplicity k. Thus, in view of the division of polynomials discussed in the previous paragraph 12.2.6,
f(x) = (x − c)^k g(x),
with a unique polynomial g, g(c) ≠ 0. Differentiating f(x) and applying the Leibniz rule, we obtain
f′(x) = k(x − c)^{k−1} g(x) + (x − c)^k g′(x) = (x − c)^{k−1} (k g(x) + (x − c) g′(x)).
Clearly, the polynomial h(x) = k g(x) + (x − c) g′(x) does not admit c as a root, i.e. h(c) = k g(c) ≠ 0. Thus we have arrived at the following very useful claim:

Moreover, we are missing "improper points". For instance, when parametrizing a circle, we can describe the missing points as the image of the only improper point of the real line, i.e. the point "at infinity".
These problems can be best avoided by working in the so-called projective extension of the (real or complex) plane. The projective extension is advantageous in various problems; we will also use it when defining the group operation on the points of an elliptic curve (see 1).

12.D.7. (The complex circle). Consider the sets of points X_ε = V(z_1^2 + z_2^2 − ε) ⊂ C^2 for any ε ∈ R \ {0}. The corresponding real curve is the circle with radius √ε for ε > 0, and it is empty for ε < 0. We write z_j = x_j + i·y_j. Therefore, X_ε is given as a subset of R^4 by a system of two real equations:
Re(z_1^2 + z_2^2 − ε) = x_1^2 + x_2^2 − y_1^2 − y_2^2 − ε = 0,
Im(z_1^2 + z_2^2 − ε) = 2(x_1y_1 + x_2y_2) = 0.
Thus, we can expect that X_ε will be a "two-dimensional surface" in R^4. We will try to imagine it as a surface in R^3 under a suitable projection R^4 → R^3. For this purpose, we choose the mapping
φ_+ : (x_1, x_2, y_1, y_2) ↦ (x_1, x_2, (x_1y_2 − x_2y_1)/√(x_1^2 + x_2^2)).
Denote by V the subset of R^4 which is given by our second equation, i.e.,
V = {(x_1, x_2, y_1, y_2) : x_1y_1 + x_2y_2 = 0, (x_1, x_2) ≠ (0, 0)}.
The restriction of φ_+ to V is invertible, and its inverse is given by
ψ_+ : (u, v, w) ↦ (u, v, −vw/√(u^2 + v^2), uw/√(u^2 + v^2)).
Now, note that on V,
((x_1y_2 − x_2y_1)/√(x_1^2 + x_2^2))^2 = y_1^2 + y_2^2,
and hence it follows that
φ_+(V ∩ X_ε) = H_ε = {(u, v, w) : u^2 + v^2 − w^2 − |ε| = 0}.
Now, we can compose the constructed mappings,
φ_ε : X_ε → V → R^3 \ {(0, 0, 0)} ⊃ H_ε,

Proposition. A polynomial f(x) over an infinite integral domain K admits the root c ∈ K of multiplicity k if and only if c is a root of f′(x) of multiplicity k − 1.

12.2.8. The fundamental theorem of algebra. While it may happen that a polynomial over the real numbers has no roots, every polynomial over the complex numbers of degree at least one has a root. This is the statement of the so-called fundamental theorem of algebra, which is presented here with an (almost) complete proof. By this result, every polynomial in C[x] has as many roots (counted with multiplicity) as its degree deg f = k.
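The proposition on multiple roots can be tested numerically over Z_p, taking care that p does not divide the relevant multiplicities. A sketch with illustrative names (coefficient lists ascending, our convention):

```python
# Sketch: c is a root of f of multiplicity k iff it is a root of f' of
# multiplicity k - 1 (here over Z_p, with p not dividing the multiplicities).

def formal_derivative(f, p):
    """Formal derivative of a polynomial over Z_p."""
    return [(i * a) % p for i, a in enumerate(f)][1:]

def multiplicity(f, c, p):
    """How many times (x - c) divides f in Z_p[x] (direct computation)."""
    k = 0
    while f and sum(a * pow(c, i, p) for i, a in enumerate(f)) % p == 0:
        q, acc = [], 0
        for a in reversed(f):   # Horner's scheme: quotient plus remainder
            acc = (acc * c + a) % p
            q.append(acc)
        q.pop()                 # drop the remainder f(c) = 0
        f = list(reversed(q))
        k += 1
    return k

f = [3, 0, 1, 1]                # (x - 1)^2 (x - 2) over Z_5
print(multiplicity(f, 1, 5), multiplicity(formal_derivative(f, 5), 1, 5))  # 2 1
```

The Horner loop is exactly the division by (x − b) discussed in 12.2.6: the intermediate accumulator values are the quotient's coefficients and the final one is f(c).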
Hence it always admits a factorization of the form
f(x) = b(x − a_1)(x − a_2) ··· (x − a_k)
for the complex roots a_i and an appropriate leading coefficient b.

Theorem. The field C is algebraically closed, i.e. every polynomial of degree at least one has a root.

Proof. We are going to provide an elementary proof based on simple real analysis, in particular the concept of continuity (the reader should be familiar with the techniques developed in Chapters 5 and 6). Suppose that f ∈ C[z] is a non-zero polynomial with no root, i.e. f(z) ≠ 0 for all z ∈ C. Consider the mapping
p : z ↦ f(z)/|f(z)|
into the unit circle K_1 = {e^{it} : t ∈ R} ⊂ C. Then p maps the entire complex plane into K_1. By the assumption that f(z) is never zero, this mapping is well-defined. Next, we shall consider the restrictions of p to the individual circles K_r ⊂ C with center at zero and radius r > 0. We can parameterize these circles by the mappings
ψ_r : R → K_r,  ψ_r(t) = r e^{it}.
For all r, the composition κ : (0, ∞) × R → K_1, κ(r, t) = p ∘ ψ_r(t), is continuous in both t and r. Thus, for each r, there exists a mapping α_r : R → R which

and for every ε > 0, we get a bijection φ_ε : X_ε → H_ε. The real part of this variety is the "thinnest circle" on the one-sheeted rotational hyperboloid H_ε; see the picture. For ε < 0, we can repeat the above reasoning, merely interchanging x and y and the signs in the definition of φ_+:
φ_− : (x_1, x_2, y_1, y_2) ↦ (−y_1, −y_2, (−y_1x_2 + y_2x_1)/√(y_1^2 + y_2^2)),
which changes the inverse to
ψ_− : (u, v, w) ↦ (−vw/√(u^2 + v^2), uw/√(u^2 + v^2), −u, −v).
Now, H_ε is again a one-sheeted rotational hyperboloid, but its real part is empty. In the complex case, we can observe that when the coefficients are changed continuously, the resulting variety changes only a little, except for certain "catastrophic" points, where a qualitative leap may occur. This is called the principle of permanence. In the real case, this principle does not hold at all.

12.D.8. The projective extension of the line and the plane.
The real projective space P_1(R) is defined as the set of all directions in R^2, i.e., its points are the one-dimensional subspaces of R^2. The complex projective space P_1(C) is defined as the set of all directions in C^2, i.e., its points are the one-dimensional subspaces of C^2.

is uniquely given by the conditions 0 ≤ α_r(0) < 2π and κ(r, t) = e^{i α_r(t)}. Again, the obtained mapping α_r depends continuously on r. Altogether, there is a continuous mapping
α : (0, ∞) × R → R,  (r, t) ↦ α_r(t).
It follows from its construction that for every r,
(1/2π)(α_r(2π) − α_r(0)) = n_r ∈ Z.
Since α is continuous in r, it means that n_r is an integer constant which is independent of r. In order to complete the proof, it suffices to note that if f = a_0 + ··· + a_d z^d and a_d ≠ 0, then for small values of r, α_r behaves nearly as a constant mapping, while for large values of r, it behaves almost as if f = z^d. First, calculate n_r for f = z^d; then make this statement more precise. This completes the proof. □

The complex functions z ↦ z^d and z ↦ z^d/|z^d| can be expressed easily using the trigonometric form of the complex numbers z = r(cos θ + i sin θ):
z^d = r^d(cos dθ + i sin dθ) = r^d e^{i dθ},  z^d/|z^d| = cos dθ + i sin dθ = e^{i dθ}.
In this case, the mapping

v) a ∗ b = a + b + a·b, vi) a ∗ b = a + b − a·b, vii) a ∗ b = a + (−1)^a · b. ○

12.E.6. In how many ways can we fill in the remaining cells of the following table so that ({a, b, c}, ∗) would be
i) a groupoid, ii) a commutative groupoid, iii) a groupoid with a neutral element, iv) a monoid, v) a group?

∗ | a  b  c
a | c  b  a
b |
c |       c

Solution. i) 3^5, ii) 9, iii) 9, iv) 1, v) 0. □

It is easily verified that this definition is correct and that the resulting structure satisfies all field axioms. In particular, 0/1 is the additive identity, and 1/1 is the multiplicative identity. If a ≠ 0 and b ≠ 0, then (a/b)·(b/a) = 1/1, so a/b is invertible. All the details of the arguments are in fact identical with the discussion of the rational numbers in 1.6.6. The field of fractions of a ring K[x_1, ..., x_r] is called the field of rational functions (in r variables) and denoted K(x_1, ..., x_r). In software systems like Maple or Mathematica, all algebraic operations with polynomials are performed in the corresponding field of fractions, i.e. in the field of rational functions, usually with K = Q.

12.2.12. Completion of the proof. It remains to prove that if a polynomial f = f_1 f_2 is divisible by an irreducible polynomial h, then h divides either f_1 or f_2 or both. This statement is proved in the following three lemmas.

Lemma. Let K be a unique factorization domain. Then:
(1) If a, b, c ∈ K, a is irreducible and a | bc, then either a | b or a | c.
(2) If a constant polynomial a ∈ K[x] divides f ∈ K[x], then a divides all coefficients of f.
(3) If a is an irreducible constant polynomial in K[x] and a | fg, f, g ∈ K[x], then a | f or a | g.

Proof. (1) By the assumption, bc = ad for a suitable d ∈ K. Let d = d_1 ··· d_r, b = b_1 ··· b_s, c = c_1 ··· c_q be the factorizations into irreducible factors. This means that a·d_1 ··· d_r = b_1 ··· b_s · c_1 ··· c_q. Since ad factors in a unique way, it follows that a = e·b_j or a = e·c_i for a suitable unit e.
(2) Let f = b_0 + b_1 x + ··· + b_n x^n. Since a | f, there must exist a polynomial g = c_0 + c_1 x + ··· + c_k x^k such that f = a·g.
Hence it immediately follows that k = n, a c_0 = b_0, …, a c_n = b_n.
(3) Consider f, g ∈ K[x] as above and suppose that a divides neither f nor g. By the previous claim, there exists an i such that a does not divide b_i, and there exists a j such that a does not divide c_j. Choose the least such i and j. The coefficient at x^{i+j} of the polynomial fg is b_0 c_{i+j} + b_1 c_{i+j−1} + ··· + b_{i+j} c_0. By the choice of i and j, a divides all of the summands b_0 c_{i+j}, …, b_{i−1} c_{j+1}, b_{i+1} c_{j−1}, …, b_{i+j} c_0. At the same time, it does not divide b_i c_j. Therefore, it cannot divide the coefficient. □

12.2.13. Lemma. Consider the field of fractions L of a unique factorization domain K. If a polynomial f is irreducible in K[x], then it is irreducible in L[x], too.

Proof. Each coefficient a ∈ K can be considered as an element a/1 ∈ L. Therefore, every non-zero polynomial f ∈ K[x] can be considered a polynomial in L[x]. Suppose that f = g′h′ for some g′, h′ ∈ L[x], where the polynomials g′, h′ are not units in L[x] (i.e. they are not constant polynomials, since L is a field). Let a be a common

870 CHAPTER 12. ALGEBRAIC STRUCTURES

12.E.7. Find the number of groupoids on a given three-element set.

Solution. Since the set is given, it remains to define the binary operation. In a groupoid, there is no restriction except that the result of the operation must be an element of the underlying set. Thus, for any ordered pair of elements, there are three possibilities for the result. By the product rule, this gives 3^{3·3} = 3^9 = 19683 groupoids. □

12.E.8. Decide whether the set G = (R ∖ {0}) × R together with the operation Δ defined by (x, y) Δ (u, v) = (xu, xv + y) for all (x, y), (u, v) ∈ G is a groupoid, semigroup, monoid, group, and whether Δ is commutative. ○

12.E.9. Let M be a set with the multiplication operation defined by ab = a for all a, b ∈ M. Prove that this multiplication is associative.

Solution. For a, b, c ∈ M, we have a(bc) = ab = a = ac = (ab)c. □

12.E.10. Suppose that abc = e for some a, b, c in a group G. Show that bca = e as well.
Solution. Set x = bc. Then ax = e, i.e. x^{−1} = a, since x is invertible. Hence also xa = (bc)a = e. □

F. Groups

We begin by recalling permutations and their properties. We have already met permutations in chapter two, see ??, where we used them to define the determinant of a matrix.

12.F.1. For each of the following conditions, find all permutations π ∈ S_7 which satisfy it:
i) π^4 = (1,2,3,4,5,6,7),
ii) π^2 = (1,2,3) ∘ (4,5,6),
iii) π^2 = (1,2,3,4). ○

12.F.2. Find the signature (parity) of each of the following permutations:
i) (1 2 3 4 5 6 … 3n−2 3n−1 3n; 2 3 1 5 6 4 … 3n−1 3n 3n−2),
ii) (1 2 3 … n n+1 n+2 … 2n; 2 4 6 … 2n 1 3 … 2n−1).

multiple of the denominators of the coefficients in g′ and b be a common multiple of the denominators of the coefficients in h′. Then a g′, b h′ ∈ K[x], and so ab f = (b h′)(a g′). Let c be an irreducible factor in the factorization of ab. Then c divides (b h′)(a g′), and hence c divides b h′ or a g′ (by the previous lemma). This means that c can be canceled out. After a finite number of such cancellations, the conclusion is that f = gh for polynomials g, h ∈ K[x]. Since the degrees of the polynomials are not changed, neither g nor h is constant. Thus if f is reducible in L[x], then it is also reducible in K[x], contradicting the implication to be proved. □

12.2.14. Lemma. Let K be a unique factorization domain and f, g, h ∈ K[x]. Suppose that f is irreducible and f | gh. Then either f | g or f | h.

Proof. This statement is already proved in one of the previous lemmas for the case that f is a constant polynomial (i.e. an element of K). Suppose that deg f > 0. Then f is irreducible in L[x] as well, where L is the field of fractions of the ring K. Suppose first that K itself is a field (and as such equals its field of fractions). Moreover, suppose that f | gh and f does not divide g. The greatest common divisor of the polynomials g and f must then be a constant polynomial in K. Therefore, there are A, B ∈ K[x] such that 1 = Af + Bg. Hence, h = Afh + Bgh.
Since f | gh, it follows that f | h as well. Return to the general case. It follows from the assumptions that f | g or f | h in the polynomial ring L[x] over the field of fractions L of the ring K. For instance, let h = kf in L[x], and choose an a ∈ K so that ak ∈ K[x]. Then ah = (ak)f, and every irreducible factor c of a must divide ak, because f is irreducible and not constant. It follows that c can be canceled. After a finite number of such cancellations, a becomes a unit, i.e. h = k′f for an appropriate k′ ∈ K[x]. □

The proof of this lemma completes the proof of theorem 12.2.10.

3. Groups

As an illustration of the most abstract approach to an algebraic theory, we consider concepts involving just one operation. The focus is on objects and situations where equations of the form a · x = b always have a unique solution (as usual with linear equations, the objects a and b are given, while x is what is sought). This is group theory. Note that nothing is known about the "nature" of the objects, or even what the dot stands for. The only assumption is that any two objects a and x are assigned an object a · x. In a previous part of this chapter, such operations appeared as addition or multiplication in rings. The concepts and vocabulary concerning such operations are now extended. First, we meet such "group" objects among numbers and transformations of the plane and space. Then follow the foundations of a general theory.

Solution. The parity of a permutation corresponds to the number of transpositions from which it is built or, equivalently, to the number of its inversions, see 2.2.2. The number of inversions can be read easily from the two-row representation of the permutation: for each number of the second row, we count the numbers that are less than it and lie further to the right.
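This counting rule is easy to check mechanically. A small Python sketch (my own illustration, not from the text) computes the signature by counting inversions and applies it to the two families from 12.F.2; for the second family the count of inversions works out to n(n+1)/2:

```python
def sgn_by_inversions(bottom):
    # Signature via the number of inversions in the second row.
    n = len(bottom)
    inversions = sum(bottom[i] > bottom[j]
                     for i in range(n) for j in range(i + 1, n))
    return (-1) ** inversions

# i) each block (3k+1, 3k+2, 3k+3) is sent to (3k+2, 3k+3, 3k+1):
for n in range(1, 6):
    row = []
    for k in range(n):
        row += [3 * k + 2, 3 * k + 3, 3 * k + 1]
    assert sgn_by_inversions(row) == 1  # two inversions per block, always even

# ii) (1, ..., n, n+1, ..., 2n) -> (2, 4, ..., 2n, 1, 3, ..., 2n-1):
for n in range(1, 8):
    row = ([2 * i for i in range(1, n + 1)]
           + [2 * i - 1 for i in range(1, n + 1)])
    assert sgn_by_inversions(row) == (-1) ** (n * (n + 1) // 2)
```

The assertions confirm that the first permutation is always even, while the parity of the second one alternates with n.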
Thus, the first permutation is even (the signature is 1), and in the second case, the signature depends on n and is equal to (−1)^{n(n+1)/2}. □

12.3.1. Examples and concepts. Let A be a set. A binary operation on A is defined to be any mapping A × A → A. The result of such an operation is often denoted (a, b) ↦ a · b and called the product of a and b. A set together with a binary operation is called a groupoid or a magma. Further properties of the operations need to be assumed in order to be able to say something interesting.

Binary operations and semigroups

12.F.6. Find a^{−1} and a^{2013}, where
(a) a = (1 2 3 4 5 6 7; 4 5 7 6 1 2 3) in the symmetric group (S_7, ∘),
(b) a = [4] in the group (Z*_11, ·).

Solution. (b) a^{−1} = [4]^{−1} = [3], since 4 · 3 = 12 ≡ 1 (mod 11). Further, a^{2013} = [4]^{2013} = [4]^3 = [9], since the order of [4] in Z*_11 is 5 and 2013 ≡ 3 (mod 5). □

12.F.7. Prove that every group with an even number of elements contains a non-trivial (i.e. different from the identity) element which is self-inverse.

Solution. Since each element which is not self-inverse can be paired with its inverse, there is an even number of elements which are not self-inverse. Thus, there remains an even number of elements which are self-inverse, and one of them is the identity, so there must be at least one more such element. □

12.F.8. Prove that there exists no non-commutative group of order 4.

Solution. By Lagrange's theorem (see 12.3.10), the non-trivial elements of a 4-element group are of order 2 or 4. If there is an element of order 4, then the group is cyclic, and thus commutative. So the only remaining case is that there are (besides the identity e) three elements of order 2, call them a, b, c. We show that we must have ab = c: It cannot be that ab = e, since the inverse of a is a itself, and not b. It cannot be that ab = a, since this would mean that b = e, and similarly, it cannot be that ab = b, since this would mean that a = e.
Therefore, the only remaining possibility is that indeed ab = c, and it can be shown analogously that the product of any two non-trivial elements, regardless of the order, must be equal to the third one, so this group is commutative, too. Altogether, we have shown that there are exactly two groups of order 4, up to isomorphism. The non-cyclic one is called the Klein group, and one instance of it is the group Z_2 × Z_2. □

12.F.9. Show that there exists no non-commutative group of order 5.

Solution. By Lagrange's theorem (see 12.3.10), the non-trivial elements of a 5-element group are of order 5, so the group must be cyclic, and thus commutative. □

Remark. The same argumentation shows that each group of prime order must be cyclic, and thus commutative. In particular, there are neither 2-element nor 3-element non-commutative groups. As we have shown (see 12.F.8), there

Similarly, in a monoid, an element x cannot have both a left inverse a and a different right inverse b, since if a · x = x · b = e, then a = a · (x · b) = (a · x) · b = b. Note that the associativity of the operation is needed here. It follows that if x has an inverse, then it is unique. It is usually denoted by x^{−1}. As an example, consider again subtraction on the integers. This operation is not associative. There is a right identity (zero), i.e. a − 0 = a for any integer a, but it is not a left identity; subtraction has no left identity at all. The integers are a semigroup with respect to either addition or multiplication. They form a group only with addition, since with respect to multiplication, only the integers ±1 have an inverse. If (A, ·) is a group, then any subset B ⊆ A which is closed with respect to the restriction of · (i.e. a · b ∈ B for any a, b ∈ B) and forms a group with this operation is called a subgroup. Both conditions are essential. For instance, consider the integers as a subset of the rational numbers with multiplication. Let G be a group and M ⊆ G.
The subgroup generated by M is the smallest (with respect to set inclusion) subgroup of G which contains all the elements of M. Clearly, this is the intersection of all subgroups containing M. Here are a few very well known examples of groups. The rational numbers Q are a commutative group with respect to addition. The integers are one of their subgroups. The non-zero rational numbers are a commutative group with respect to multiplication. For every positive integer k, the set of all k-th roots of unity, i.e. the set {z ∈ C : z^k = 1}, is a finite commutative group with respect to multiplication of complex numbers. For k = 2, this is the two-element group {−1, 1}, both of whose elements are self-inverse. For k = 4, this is the group G = {1, i, −1, −i}. The set Mat_n, n ≥ 1, of all square matrices of order n is a (non-commutative) monoid with respect to multiplication and a commutative group with respect to addition (see subsections 2.1.2–2.1.5). The set of all linear mappings Hom(V, V) on a vector space V is a monoid with respect to mapping composition and a commutative group with respect to addition (see subsection 2.3.12). In every monoid, the subset of all invertible elements forms a group. In the former of the above examples, it is the group of invertible matrices. In the latter case, it is the group of linear isomorphisms of the corresponding vector space. In previous chapters, there are several (semi)group structures, sometimes met quite unexpectedly. For example, recall the various subgroups of the group of matrices or the group structure on elliptic curves.

is even no 4-element non-commutative group. Therefore, the smallest non-commutative group may be of order 6. As we have seen (see 12.E.1(vii)), this is indeed the case.

12.F.10. Prove that any group G where each element is self-inverse must be commutative.

Solution. Let a, b ∈ G. Since each of ba, b, a is assumed to be self-inverse, we get ab = ab((ba)(ba)) = a(bb)aba = (aa)ba = ba. □

12.F.11.
Prove that every group G of order 6 is isomorphic to Z_6 or S_3.

Solution. By Lagrange's theorem (see 12.3.10), the non-trivial elements of a 6-element group are of order 2, 3, or 6. If there is an element of order 6, then G is cyclic, and thus isomorphic to Z_6. Therefore, assume from now on that the order of each non-trivial element is 2 or 3. Since an element a of order 3 is not self-inverse (we have a^{−1} = a^2, since a · a^2 = a^3 = e), we get from exercise 12.F.7 that there must be at least one element of order 2. As we are going to show, there must also be an element of order 3. For the sake of contradiction, assume that each element of G is self-inverse, and let a ≠ b be any two elements different from the identity e. The same argumentation as in 12.F.8 shows that the product ab cannot be any of e, a, b. Thus, H = {e, a, b, ab} is a 4-element subset of G. Thanks to the self-inverseness, we can see that H is closed under the operation, with the possible exception of the products b · a, b · (ab), and (ab) · a. However, we get from exercise 12.F.10 that G is commutative, so these three products also lie in H, and it follows that H is actually a subgroup of G. However, this contradicts theorem 12.3.10, by which a 6-element group cannot have a 4-element subgroup. The only remaining case is that there is an element of order 2 (call it a) as well as an element of order 3 (call it b). Then b^2 is also of order 3 (and different from b), so G contains the four elements e, a, b, b^2. Furthermore, G must also contain ab, ba, ab^2, b^2a, and by the uniqueness of inverses, none of these is equal to e. Moreover, none of these may be equal

12.3.2. Permutation groups. Groups and semigroups often arise as sets of mappings on a fixed set M which are closed with respect to mapping composition. This is easily seen on finite non-empty sets M, where every subset of invertible mappings generates a group with respect to composition.
Such a set M consisting of m = |M| ∈ N elements allows for m^m possible mappings (each of the m elements can be sent to an arbitrary element of M), and all of these mappings can be composed. Since mapping composition is associative, this yields a semigroup. If a mapping σ : M → M is required to have an inverse σ^{−1}, then σ must be a bijection. The composition of two bijections is again a bijection; hence the set Σ_m of all bijections on an m-element set M is a group. This is called the symmetric group (on m elements). It is an example of a finite group. The name of the group Σ_m brings another connection: instead of bijections on a finite set, permutations can be viewed as rearrangements of distinguished objects. Permutations are encountered in this sense when studying determinants, for example; see subsection 2.2.1 on page 81 for a few elementary results. Let us briefly recollect them now in view of the general concepts of groups and their homomorphisms. What the operation in this group looks like needs more thought. In the case of a (small) finite group, we can build a complete table of the operation results for all pairs of operands. Considering the group Σ_3 on the numbers {1, 2, 3} and denoting the particular permutations by the ordering of the images (not to be confused with the notation for cycles!)

a = (1,2,3), b = (2,3,1), c = (3,1,2), d = (1,3,2), e = (3,2,1), f = (2,1,3),

the composition is given by the following table:

 ·  | a  b  c  d  e  f
 a  | a  b  c  d  e  f
 b  | b  c  a  f  d  e
 c  | c  a  b  e  f  d
 d  | d  e  f  a  b  c
 e  | e  f  d  c  a  b
 f  | f  d  e  b  c  a

Note that there is a fundamental difference between the permutations a, b, c and the other three. The former three form a cycle, generated by either b or c: b^2 = c, b^3 = a, c^2 = b, c^3 = a. It follows that these three permutations form a commutative subgroup. Here (as well as in the whole group), a is the neutral element, and b and c are inverses of each other.
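The table can be regenerated mechanically. A short Python sketch (an illustration of mine, not from the text) composes the six orderings, with the convention (p ∘ q)(x) = p(q(x)) matching the table above:

```python
# The six permutations of {1, 2, 3}, named by the ordering of the images.
names = {(1, 2, 3): 'a', (2, 3, 1): 'b', (3, 1, 2): 'c',
         (1, 3, 2): 'd', (3, 2, 1): 'e', (2, 1, 3): 'f'}

def compose(p, q):
    # (p o q)(x) = p(q(x)); the tuple lists the images of 1, 2, 3.
    return tuple(p[q[x] - 1] for x in range(3))

elems = sorted(names, key=names.get)
table = {names[p]: ''.join(names[compose(p, q)] for q in elems)
         for p in elems}

# The rows reproduce the composition table in the text.
assert table == {'a': 'abcdef', 'b': 'bcafde', 'c': 'cabefd',
                 'd': 'defabc', 'e': 'efdcab', 'f': 'fdebca'}
```

In particular, the row of b confirms b^2 = c and b^3 = a.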
Therefore, this subgroup is the same as the group Z_3 of residue classes modulo 3, or as the group of third roots of unity. It can be proved that every finite group is a subgroup of an appropriate finite symmetric group. This can be interpreted as saying that the groups Σ_m are as non-commutative and complex as possible.

to any of a, b, b^2 (e.g. if we had a = ab, then multiplication by a^{−1} from the left would yield e = b; the other equalities can be refuted similarly). Since G contains only 6 elements, the set {ab, ba, ab^2, b^2a} has at most two elements. Again, we can have neither ab = ab^2 nor ba = b^2a. If ab = ba, then (ab)^2 = a^2b^2 = b^2 ≠ e and (ab)^3 = a^3b^3 = a ≠ e, so the order of ab is greater than 3, which contradicts our assumption. Therefore, it must be that ab = b^2a and ba = ab^2, so G is indeed isomorphic to S_3 (a corresponds to a transposition and b to a cycle of length 3). This group can also be viewed as the group of symmetries of an equilateral triangle (a corresponds to a reflection and b to a rotation by 120°), see also 12.3.3. We have discussed all possibilities, so the proof is finished. □

12.F.12. Find all commutative groups of order 8 (up to isomorphism). Then, for each of the following groups, decide to which of the found ones it is isomorphic (the operation is always multiplication):
• Z*_15,
• Z*_16,
• Z*_17/{[1], [−1]},
• the complex roots of the polynomial z^8 − 1.

Solution. By theorem 12.3.8, every commutative group is a product of cyclic groups. By 12.3.10, their orders divide 8. This means that there are only 3 possibilities: Z_8, Z_2 × Z_4, and Z_2 × Z_2 × Z_2.
• The group Z*_15 contains the residue classes which are coprime to 15. There are φ(15) = (5 − 1)(3 − 1) = 8 of them, so indeed |Z*_15| = 8. In particular, these are 1, 2, 4, 7, 8, 11, 13, 14. Their orders are either 2 (for 4, 11, 14) or 4 (for 2, 7, 8, 13), which means that Z*_15 is isomorphic to Z_2 × Z_4.
• Z*_16 = {1, 3, 5, 7, 9, 11, 13, 15}.
Again, this group contains 8 elements, and their orders are either 2 (for 7, 9, 15) or 4 (for 3, 5, 11, 13), which means that Z*_16 is also isomorphic to Z_2 × Z_4.
• Z*_17 = {±1, ±2, …, ±8}. Thus, the quotient Z*_17/{±1} = {1, 2, …, 8} has 8 elements. We can easily calculate that the order of 3 is 8. Therefore, 3 generates the entire group, which means that Z*_17/{±1} ≅ Z_8.

The other three permutations are self-inverse, which means that any one of them, together with the identity a, creates a two-element subgroup isomorphic to Z_2. Further, b and c are elements of order 3, i.e. the third power is the first one equal to the identity a, while d, e, and f are of order 2. Since the table is not symmetric with respect to the main diagonal, the composition · is not commutative. Other permutation groups Σ_m of finite m-element sets behave similarly. Each permutation σ partitions the set M into a disjoint union of maximal invariant subsets, which are obtained by taking unprocessed elements x ∈ M step by step and putting all the iteration results σ^k(x), k = 1, 2, …, into the class M_x until σ^k(x) = x. Each permutation is obtained as a composition of the corresponding cycles, which behave as the identity outside M_x and as σ on M_x. If the elements of M_x are numbered as (1, 2, …, |M_x|) so that i corresponds to σ^i(x), then the permutation is simply a one-place shift in the cycle (i.e. the last element is mapped back to the first one). Hence the name cycle. These cycles commute, so it does not matter in which order the permutation σ is composed from them. (Of course, if we pick two arbitrary cycles on M, they do not have to commute.) The simplest cycles are one-element fixed points of σ and two-element subsets (x, σ(x)) with σ(σ(x)) = x. The latter are called transpositions. Since every cycle can be composed from transpositions of adjacent elements (just let the last element "bubble" back to the beginning), every permutation can be written as a composition of transpositions of adjacent elements.
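The cycle decomposition procedure just described is easy to implement. A Python sketch (my own illustration), applied to the permutation from 12.F.6(a):

```python
def cycle_decomposition(perm):
    # perm: dict mapping each element to its image; collect the orbits
    # by iterating from each not-yet-processed element.
    seen, cycles = set(), []
    for start in perm:
        if start not in seen:
            cycle, x = [], start
            while x not in seen:
                seen.add(x)
                cycle.append(x)
                x = perm[x]
            cycles.append(tuple(cycle))
    return cycles

# The permutation (1..7 -> 4 5 7 6 1 2 3) from 12.F.6(a):
sigma = dict(zip(range(1, 8), [4, 5, 7, 6, 1, 2, 3]))
cycles = cycle_decomposition(sigma)
assert sorted(len(c) for c in cycles) == [2, 5]  # one transposition, one 5-cycle

# A cycle of length l contributes parity (-1)**(l - 1), and the
# contributions multiply, so sigma is odd.
parity = 1
for c in cycles:
    parity *= (-1) ** (len(c) - 1)
assert parity == -1
```

The same decomposition also answers 12.F.6(a): powers of sigma act independently on the 5-cycle and the transposition.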
Return to the case of Σ_3. Two elements, b and c, represent cycles which include all three elements; each of them generates the subgroup {a, b, c} ≅ Z_3. Besides those, d, e, f are composed of cycles of length 2 and 1; finally, a is composed of three cycles of length one. There are no more possibilities. However, it is clear from the procedure that for more elements, there are very many possibilities. In general, there are many ways of expressing a permutation as a composition of transpositions. However, for a given permutation, the parity of the number of transpositions is fixed and independent of the choice of the particular transpositions. This can be seen from the number of inversions of the permutation, since each transposition changes the number of inversions by an odd number (see the discussion in subsection 2.2.2 on page 82). It follows that there is a well-defined mapping sgn : Σ_m → Z_2 = {±1}, the permutation parity. This recovers the proposition crucial for building the determinants (see 2.2.1 and on):

Theorem. Every permutation of a finite set can be written as a composition of cycles. A cycle of length ℓ can be expressed as a composition of ℓ − 1 transpositions. The parity of such a cycle is (−1)^{ℓ−1}. The parity of the composition σ ∘ τ is equal to the product of the parities of the composed permutations σ and τ.

• The complex roots of the polynomial z^8 − 1 are e^{2πin/8}, where n = 1, 2, …, 8. Clearly, these form a cyclic group of order 8, isomorphic to Z_8. □

12.F.13. Let G be a commutative group and denote H = {g ∈ G | g^2 = e}, where e is the identity of G. Prove that H is a subgroup of G.

Solution. Clearly, e ∈ H. If a ∈ H, then we also have a^{−1} ∈ H, because a = a^{−1} (since a^2 = e). Moreover, if b ∈ H, then (ab)^2 = a^2b^2 = e (this is where we use the commutativity of G), which means that ab ∈ H. Thus, H is closed under the operation and inverses, and it is indeed a subgroup. □

12.F.14.
Let GL_n(R) denote the set of all regular n-by-n matrices with real coefficients. Prove that G = GL_2(R) with matrix multiplication is a group, and decide for each of the following subsets H of G whether it is a subgroup of G (rows of a matrix are separated by a semicolon):
i) H = GL_2(Q),
ii) H = GL_2(Z),
iii) H = {A ∈ GL_2(Z) : |A| = 1},
iv) H = {(1, 0; a, 1) ∈ G : a ∈ Z},
v) H = {(1, 0; a, 1) ∈ G : a ∈ Q},
vi) H = {(1, a; a, 1) ∈ G : a ∈ R},
vii) H = {(0, a; b, c) ∈ G : a, b, c ∈ R},
viii) H = {(1, a; b, c) ∈ G : a, b, c ∈ Q}. ○

12.F.15. i) Decide whether the set H = {a ∈ R : …} is a subgroup of the group (R*, ·). ii) Decide whether the set H = {a ∈ R : …} is a subgroup of the group (R, +). ○

12.F.16. Find all positive integers m ≠ 5 such that the group Z*_m is isomorphic to Z*_5. ○

12.F.17. How many cycles of length p (1 < p < n) are there in S_n?

Solution. The elements of the cycle (i.e. the non-fixed points of the permutation) can be selected in (n over p) ways. Now,

The last proposition suggests that the mapping sgn transforms the permutation composition σ ∘ τ to the product sgn σ · sgn τ in the commutative group Z_2.

(Semi)group homomorphisms

In general, a mapping f : G_1 → G_2 is a (semi)group homomorphism if and only if it respects the operation, i.e. f(a · b) = f(a) · f(b). In particular, the permutation parity is a homomorphism sgn : Σ_m → Z_2. In a moment, we shall see that group inverses and units are also preserved by homomorphisms. Before discussing the theory, let us look at more examples of groups.

12.3.3. Symmetries of plane figures. In the fifth part of chapter one, the connections between invertible 2-by-2 matrices and linear transformations in the plane are thoroughly considered. A matrix in Mat_2(R) defines a linear mapping R^2 → R^2 that preserves standard distances if and only if its columns form an orthonormal basis of R^2 (which is a simple condition on the matrix entries, see subsection 1.5.7 on page 33). Combining the orthogonal linear mappings with translations, we arrive at the group of all Euclidean transformations of the plane.
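The homomorphism property of sgn can be verified exhaustively for a small symmetric group. A Python sketch (my own check, using the inversion-based signature as in 2.2.2):

```python
from itertools import permutations

def sgn(p):
    # Parity via the number of inversions of the sequence p.
    inv = sum(p[i] > p[j]
              for i in range(len(p)) for j in range(i + 1, len(p)))
    return (-1) ** inv

def compose(p, q):
    # (p o q)(x) = p(q(x)); permutations of {0, ..., n-1} as tuples.
    return tuple(p[q[x]] for x in range(len(p)))

# sgn is a homomorphism: sgn(p o q) = sgn(p) * sgn(q), on all of S_4.
S4 = list(permutations(range(4)))
assert all(sgn(compose(p, q)) == sgn(p) * sgn(q)
           for p in S4 for q in S4)
```

The check runs over all 24 · 24 pairs in S_4; the same loop works for any small m.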
In fact, it is possible to prove that every mapping of the plane into itself which preserves distances is such an affine Euclidean transformation. As observed, the linear part of this mapping is orthogonal. Thus, all these mappings form the group of all orthogonal transformations (also called Euclidean transformations) of the plane. Moreover, it can be shown that besides the translations T_a by a vector a, these are only the rotations around the origin by any angle φ and the reflections F_ℓ with respect to any line ℓ that goes through the origin (also note that the central inversion is the same as the rotation by π). Now, general group concepts are illustrated on the problem of symmetries of plane figures. For example, consider tiles. First, consider them individually, in the form of a bounded diagram in the plane. Then consider them with the condition of tiling a band, and then the entire plane.

If a mapping F : R^2 → R^2 preserves distances, then this must also hold for the mapped vectors of velocity, i.e. the Jacobian matrix DF(x, y) must be orthogonal at every point. Expanding this condition for the given mapping F = (f(x, y), g(x, y)) : R^2 → R^2 leads to a system of differential equations which has only affine solutions, since all second derivatives of F must be zero (and then, the proposition is an immediate consequence of Taylor's remainder theorem). Try to think out the details! The same procedure leads to the result for Euclidean spaces of arbitrary dimension. Note that the condition to be proved is independent of the choice of affine coordinates. Composing F with a linear mapping does not change the result. Hence, for a fixed point (x, y), compose (DF)^{−1} ∘ F and assume, without loss of generality, that DF(x, y) is the identity matrix. Differentiation of the equations then yields the desired proposition.
without loss of generality, we can declare one of the p elements to be the first element in the cycle representation (for instance the least one, if we are working with numbers). This element can be mapped to any of the p − 1 remaining elements, that one to any of the p − 2 remaining elements, and so on. Altogether, we get by the product rule that there are (n over p) · (p − 1)! cycles of length p. □

12.F.18. Let G be the set of real 3-by-3 matrices with zeros above the diagonal and ones on it. Prove that G with matrix multiplication forms a group, i.e. a subgroup of GL(3, R), and find the center of G (i.e. the subgroup defined by Z(G) = {z ∈ G | ∀g ∈ G : zg = gz}).

Solution. We can either verify all the group axioms or make use of the known fact that GL(3, R) is a group and verify only that G is closed with respect to multiplication and inverses. Clearly, the neutral element (the identity matrix) lies in G. The product of two elements of G is

(1, 0, 0; a, 1, 0; b, c, 1) · (1, 0, 0; a₁, 1, 0; b₁, c₁, 1) = (1, 0, 0; a + a₁, 1, 0; b + c·a₁ + b₁, c + c₁, 1).

It follows from the form of the products in G that the center consists precisely of the matrices of the form (1, 0, 0; 0, 1, 0; b, 0, 1), b ∈ R. □

12.F.19. For any subset X ⊆ G, we define its centralizer as C_G(X) = {y ∈ G | xy = yx for all x ∈ X}. Prove that if X ⊆ Y, then C_G(Y) ⊆ C_G(X). Further, prove that X ⊆ C_G(C_G(X)) and C_G(X) = C_G(C_G(C_G(X))).

Solution. The first proposition is clear: the elements of G which commute with everything from Y also commute with everything from X. We have from the definition that C_G(C_G(X)) = {y ∈ G | xy = yx for all x ∈ C_G(X)}, and this is in particular satisfied by the elements y ∈ X. The last statement follows simply from the two above: substituting X := C_G(X) into the second one, we get C_G(X) ⊆ C_G(C_G(C_G(X))), and applying the first one to the second one, we obtain C_G(X) ⊇ C_G(C_G(C_G(X))). □

12.F.20. Suppose that a group G has a non-trivial subgroup H which is contained in every non-trivial subgroup of G. Prove that H is contained in the center of G.
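The counting formula of 12.F.17 can be confirmed by brute force for small n. A Python sketch (my own check) enumerates all permutations of S_n and counts those consisting of a single p-cycle, everything else fixed:

```python
from itertools import permutations
from math import comb, factorial

def cycle_lengths(perm):
    # perm is a tuple acting on {0, ..., n-1}; return the sorted orbit sizes.
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start not in seen:
            length, x = 0, start
            while x not in seen:
                seen.add(x)
                x = perm[x]
                length += 1
            lengths.append(length)
    return sorted(lengths)

# Count permutations that are a single p-cycle (all other points fixed)
# and compare with C(n, p) * (p - 1)!.
for n in range(2, 7):
    for p in range(2, n + 1):
        count = sum(cycle_lengths(perm) == [1] * (n - p) + [p]
                    for perm in permutations(range(n)))
        assert count == comb(n, p) * factorial(p - 1)
```

For example, for n = 3 and p = 2 this counts the three transpositions, matching C(3, 2) · 1! = 3.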
As an example, consider a line segment and an equilateral triangle. It is of interest how symmetric these objects are; that is, with respect to which distance-preserving transformations they are invariant. In other words, we want the image of the figure to be identical to the original one (unless some significant points are labeled, for example the vertices of the triangle A, B, C or the endpoints of the line segment). It is clear that all symmetries of a fixed object form a group (often with only one element, the identity).

In the case of the line segment, the situation is very simple: the only non-trivial symmetries are the rotation R_π by π around the center of the segment, the reflection F_H with respect to the axis of the segment, and the reflection F_V with respect to the line containing the segment itself. All these symmetries are self-inverse. Hence the group of symmetries has four elements, and its table looks as follows:

     | R_0  R_π  F_H  F_V
 R_0 | R_0  R_π  F_H  F_V
 R_π | R_π  R_0  F_V  F_H
 F_H | F_H  F_V  R_0  R_π
 F_V | F_V  F_H  R_π  R_0

This group is commutative. For the equilateral triangle, there are more symmetries: one can rotate by 2π/3, or one can mirror with respect to the axes of the sides. In order to obtain the entire group, all compositions of these transformations must be added in. In 1.5.9 it is shown that the composition of two reflections is always a rotation. At the same time, it is clear that changing the order of composition of two fixed reflections leads to a rotation by the same angle but with the other orientation. It follows that the reflections with respect to two of the axes generate all the symmetries, of which there are six altogether. Placing the triangle with its center at the origin and one vertex on the positive x-axis, the six transformations are given by the matrices (rows separated by semicolons)

a = (1, 0; 0, 1), b = (−1/2, −√3/2; √3/2, −1/2), c = (−1/2, √3/2; −√3/2, −1/2),
d = (1, 0; 0, −1), e = (−1/2, −√3/2; −√3/2, 1/2), f = (−1/2, √3/2; √3/2, 1/2),

where b, c are the rotations by ±2π/3 and d, e, f are the reflections. A comparison of the table of this operation with that of the permutation group Σ_3 shows that it is the same.
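The composition rules for reflections used here can be spot-checked numerically. A Python sketch (my own check) with plain 2-by-2 matrices, where the reflection across the line at angle θ through the origin has matrix (cos 2θ, sin 2θ; sin 2θ, −cos 2θ):

```python
from math import cos, sin, isclose

def reflection(theta):
    # Reflection across the line through the origin at angle theta.
    return [[cos(2 * theta), sin(2 * theta)],
            [sin(2 * theta), -cos(2 * theta)]]

def rotation(phi):
    return [[cos(phi), -sin(phi)], [sin(phi), cos(phi)]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Composing reflections across axes at angles t1 and t2 gives the
# rotation by 2*(t1 - t2); swapping the order reverses the orientation.
t1, t2 = 0.9, 0.2
P = matmul(reflection(t1), reflection(t2))
R = rotation(2 * (t1 - t2))
assert all(isclose(P[i][j], R[i][j], abs_tol=1e-12)
           for i in range(2) for j in range(2))
```

Swapping t1 and t2 in the product yields the rotation by 2·(t2 − t1), i.e. the same angle with the opposite orientation, as stated above.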
For the sake of clarity, the vertices can be labeled with numbers, so that the corresponding permutations are easily understood. Similarly, there are groups of symmetries with k rotations and k reflections; it suffices to consider a regular k-gon. These groups are usually denoted D_k and are called the

Solution. For each g ∈ G, the centralizer C_G(g) = {x ∈ G | xg = gx} is a non-trivial subgroup, since g ∈ C_G(g) and C_G(e) = G. Thus, the group H is contained in every C_G(g). Therefore, it is contained in their intersection (over all g ∈ G), which is exactly the center of G. □

12.F.21. Let G be a finite group. The conjugation class of a ∈ G is the set Cl(a) = {x a x^{−1} | x ∈ G}. Prove that:
i) the set of conjugation classes of all elements of G is a partition of G,
ii) the size of each conjugation class divides the order of G,
iii) if G has only two conjugation classes, then its order is 2.

Solution. (i) It suffices to show that for any a, b ∈ G, either Cl(a) = Cl(b) or Cl(a) ∩ Cl(b) = ∅. Thus, assume that the intersection of Cl(a) and Cl(b) is non-empty. Then, by definition, there are x, y ∈ G such that x a x^{−1} = y b y^{−1}. Multiplying this equality by y^{−1} from the left and by y from the right leads to y^{−1} x a x^{−1} y = b. However, (y^{−1} x)^{−1} = x^{−1} y, which means that b is of the form z a z^{−1} for z = y^{−1} x and thus lies in Cl(a). Analogously, we get a ∈ Cl(b), so the two conjugation classes coincide.
(ii) Note that the elements of Cl(a) are in one-to-one correspondence with the cosets of the centralizer C_G(a) = {x ∈ G | x a x^{−1} = a}. Indeed, if elements b and c lie in the same coset (i.e. they satisfy b = cz for some z ∈ C_G(a)), then b a b^{−1} = cz a (cz)^{−1} = c z a z^{−1} c^{−1} = c z z^{−1} a c^{−1} = c a c^{−1}. By 10.2.1, we have |G| = |C_G(a)| · |G/C_G(a)|, which means that |Cl(a)| = |G/C_G(a)| divides |G|.
(iii) The neutral element always forms its own conjugation class Cl(e) = {e}.
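The claims of 12.F.21 can be illustrated on the group S_3, whose conjugation classes are the identity, the three transpositions, and the two 3-cycles. A Python sketch (my own check):

```python
from itertools import permutations

def compose(p, q):
    # (p o q)(i) = p(q(i)) on {0, 1, 2}.
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

G = list(permutations(range(3)))  # S_3, |G| = 6

def conj_class(a):
    return frozenset(compose(compose(x, a), inverse(x)) for x in G)

classes = {conj_class(a) for a in G}
sizes = sorted(len(c) for c in classes)
assert sizes == [1, 2, 3]                   # identity / 3-cycles / transpositions
assert all(len(G) % s == 0 for s in sizes)  # (ii): class sizes divide |G|
assert sum(sizes) == len(G)                 # (i): the classes partition G
```

The class sizes 1, 2, 3 each divide 6, and they sum to 6, in accordance with (i) and (ii).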
Therefore, if there are only two conjugation classes, then all the other elements u ≠ e must lie in one class. Thus, its size is |G| − 1, and by (ii), this integer must divide |G|, which means that |G| = 2. □

12.F.22. Let G be a commutative group. Suppose that the order r of an element a ∈ G and the order s of an element b ∈ G are coprime. Prove that the order of ab is rs.

Solution. We have (ab)^{rs} = a^{rs} b^{rs} = (a^r)^s (b^s)^r = e^s e^r = e, so the order is at most rs. For the sake of contradiction, assume that (ab)^q = e for some q < rs. Since q is less than the least common multiple of r and s (recall that r, s are

dihedral groups of order 2k. They are not commutative for k > 3 (D_2 is commutative). The name comes from the fact that D_2 is the group of symmetries of the hydrogen molecule H_2, which contains two hydrogen atoms and can be imagined as a line segment. Similarly, there are figures whose only symmetries are rotations, and hence the corresponding groups are commutative. They are denoted C_k and called the cyclic groups of order k. For that, it suffices to consider a regular polygon whose sides are changed non-symmetrically, but all in the same manner (see the extension of the triangle in the diagram). Note that the group C_2 can be realized in two ways: either using the rotation by π or a single reflection. As the first illustration of the power of abstraction, we prove the following theorem. A figure is said to have a discrete group of symmetries if and only if the set of images of an arbitrary point over all the symmetries is a discrete subset of the plane (i.e. each of its points has a neighbourhood in which there is no other point of the set). Note that every discrete group of symmetries of a bounded figure is necessarily finite.

Theorem. Let M be a bounded set in the plane R^2 with a discrete group of symmetries G. Then G is either trivial or one of the groups C_k, D_k for k ≥ 1.

Proof. If a set M had a translation as one of its symmetries, then it could not be bounded.
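The statement of 12.F.22 can be spot-checked in a small commutative group, for instance (Z_35, +), where the element 7 has order 5, the element 5 has order 7, and 5 and 7 are coprime. A Python sketch (my own check):

```python
# Orders in the additive group Z_35: the order of a is the least k > 0
# with k * a = 0 (mod 35).
n = 35

def add_order(a, n):
    k, x = 1, a % n
    while x != 0:
        x = (x + a) % n
        k += 1
    return k

assert add_order(7, n) == 5
assert add_order(5, n) == 7
assert add_order(7 + 5, n) == 35   # = 5 * 7, as the exercise predicts
```

In multiplicative notation the product ab corresponds to the sum 7 + 5 = 12 here, and its order is indeed rs = 35.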
If M had, as one of its symmetries, a rotation by an angle which is an irrational multiple of 2π, then iterating this rotation would lead to a dense subset of images on the corresponding circle. It follows that the group is not discrete.

If M had non-trivial rotations with different centers among its symmetries, then again it could not be bounded. To see this, write the corresponding rotations in the complex plane as R : z ↦ ζ(z − a) + a and Q : z ↦ ηz for complex units ζ = e^{2πi/k}, η = e^{2πi/ℓ} and an arbitrary a ≠ 0 in C. Then it is immediate (a straightforward computation with complex numbers) that Q ∘ R ∘ Q⁻¹ ∘ R⁻¹ : z ↦ z + a(−1 + ζ + η − ζη), which is a translation by a non-trivial vector unless the angle of one of the rotations is zero. It follows that M is not bounded. The same holds for the case of a rotation and a reflection with respect to a line which does not go through the center of the rotation. Check this case yourself!

Hence the only symmetries available are rotations with a common center and reflections with respect to lines which pass through this center. It remains to prove that the entire group

The argument is subtle but straightforward: if there were an interval of length ε on the circle not hit by the orbit of a point under the rotation, then all points in the orbit would have to be at mutual distance at least ε. Thus there could be only finitely many of them, and this contradicts the irrationality of the angle.

coprime), at least one of them does not divide q. Assume that it is r (the other case is refuted analogously). Taking the s-th power of the equality (ab)^q = e, we get e = ((ab)^q)^s = (ab)^{qs} = a^{qs} b^{qs} = a^{qs} (b^s)^q = a^{qs} e^q = a^{qs}. Since r does not divide q and is coprime to s, we get that r (the order of a) does not divide qs; but a^{qs} = e, which is a contradiction. □

12.F.23. Prove that every finite group G whose order is greater than 2 has a non-trivial automorphism.

Solution.
If G is not commutative and a is an element that does not lie in the center, then the conjugation x ↦ axa⁻¹ defines a non-trivial automorphism. For a cyclic group of order m, we have, for any n coprime to m, the automorphism x ↦ xⁿ. If G is commutative, then it is a product of cyclic groups (see 10.1.8). If the order of at least one of the factors is greater than 2, then we can use the above automorphism for cyclic groups. If the order of each factor is 2, then permuting any pair of factors is a non-trivial automorphism. □

12.F.24. Consider the group (Q, +) of the rational numbers with addition and the group (Q⁺, ·) of the positive rational numbers with multiplication. Find all homomorphisms (Q, +) → (Q⁺, ·).

Solution. There is only one homomorphism, the trivial one. For the sake of contradiction, assume that there exists a non-trivial homomorphism p, i.e., p(a) = b ≠ 1 for some a, b ∈ Q. Then, for all n ∈ N, we have b = p(a) = p(n · (a/n)) = p(a/n)ⁿ. This is a contradiction, since only for some n are the n-th roots of b rational (cf. 1.G.1). □

12.F.25. Let G be the group of matrices of the form ( a 0 ; b a⁻¹ ) (rows separated by a semicolon), where a, b ∈ R and a > 0, and let N be the set of matrices of the form ( 1 0 ; b 1 ), where b ∈ R. Show that N is a normal subgroup of G and prove that G/N is isomorphic to R.

Solution. The key to the proof is the formula for multiplication in G:

( a 0 ; b a⁻¹ ) · ( a₁ 0 ; b₁ a₁⁻¹ ) = ( aa₁ 0 ; ba₁ + a⁻¹b₁ (aa₁)⁻¹ ).

is composed either only of rotations, or of the same number of rotations and reflections. Recall that the composition of two different reflections yields a rotation whose angle is twice the angle enclosed by the corresponding axes (see 1.5.9). Therefore, composing a reflection with respect to a line p with a rotation by an angle φ is again a reflection, namely with respect to the line which is at angle φ/2 from p (draw a diagram!). The proof is almost complete.
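The interplay of rotations and reflections described here can be watched concretely by generating the symmetry group of the square from one rotation and one reflection. The sketch below is our own illustration (the labelling of vertices and all names are our choices):

```python
# Symmetries of the square with vertices 0, 1, 2, 3; a map is a tuple p
# with p[i] the image of vertex i.
r = (1, 2, 3, 0)   # rotation by 90 degrees
s = (3, 2, 1, 0)   # the reflection i -> 3 - i

def compose(p, q):
    # (p ∘ q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(4))

# Generate the group <r, s> by repeatedly closing under composition.
G = {(0, 1, 2, 3), r, s}
while True:
    new = {compose(p, q) for p in G for q in G} - G
    if not new:
        break
    G |= new

# The rotations are the powers of r; composing the single reflection s
# with each rotation produces all the reflections, one per rotation.
rotations, p = set(), (0, 1, 2, 3)
for _ in range(4):
    rotations.add(p)
    p = compose(r, p)
reflections = {compose(p, s) for p in rotations}

assert len(G) == 8                     # D4: 4 rotations and 4 reflections
assert G == rotations | reflections
assert all(compose(t, t) == (0, 1, 2, 3) for t in reflections)
```

This matches the count in the proof: one fixed reflection composed with the k rotations of C_k yields exactly the k reflections of D_k.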
Observe that the subgroup of all rotations in the group of symmetries contains a rotation by the smallest positive angle φ₀ (there are only finitely many rotations there). But then no rotation R_φ with φ not a multiple of φ₀ can occur: otherwise φ ∈ (kφ₀, (k+1)φ₀) for some k, and the composition R_{−kφ₀} ∘ R_φ would be a rotation by a smaller positive angle than φ₀. Hence this subgroup coincides with one of the C_k. Next, adding one reflection produces exactly one further reflection for each non-trivial element of C_k, as seen above. □

12.3.4. Symmetries of plane tilings. There is more complicated behaviour in the case of plane figures in bands or in the entire plane (for example, symmetries of various tilings). Consider first the set containing the points that lie between two fixed parallel lines. Suppose that this band is covered with disjoint images of a bounded subset M under some translation. Of course, this translation is a symmetry of the chosen tiling of the band, so the group of symmetries is necessarily infinite. Such a set allows no rotation symmetries other than rotations by π, and the only possible reflections are either horizontal, with respect to the axis of the band, or vertical, with respect to a line perpendicular to the boundary lines. In addition, there are translations given by vectors parallel to the axis of the band. A not-too-complicated discussion leads to a description of all discrete groups of symmetries of these bands. Such a group is generated by some of the following symmetries: a translation T, a shifted reflection G (i.e. the composition of the horizontal reflection and a translation, also known as a glide reflection), a vertical reflection V, the horizontal reflection H, and the rotation R by π.

Theorem.
Every discrete group of symmetries of a band in the plane is isomorphic to one of the groups generated by the following symmetries:
(1) a single translation T,
(2) a single shifted reflection G,
(3) a single translation T and a vertical reflection V,
(4) a single translation T and the rotation R,
(5) a single shifted reflection G and the rotation R,
(6) a single translation T and the horizontal reflection H,
(7) a single translation T, the horizontal reflection H and a vertical reflection V.

The proof is not presented here. The following diagram shows examples of schematic patterns with the corresponding symmetries:

Hence we can see that the mapping ( a 0 ; b a⁻¹ ) ↦ a is a homomorphism with kernel N. Thus, N is a normal subgroup of G. Moreover, G/N is isomorphic to the multiplicative group R⁺, which is isomorphic to the additive group R. □

12.F.26. Let G be the group, under matrix multiplication, of 2 × 2 matrices ( a b ; c d ), where ad − bc ≠ 0 and a, b, c, d are integers from Z₃. Show that i) |G| = 48, ii) the matrices with ad − bc = 1 form a subgroup of order 24.

Solution. i) For the first row (a, b) of a matrix M ∈ G, the values a, b can be arbitrary, except for a = 0, b = 0 in Z₃. Hence, there are 3 · 3 − 1 = 8 possibilities to fill the first row. The second row must not be a multiple of the first row, and the multiplication factor can be 0, 1 or 2. Hence, once the first row is filled, there are 3 · 3 − 3 = 6 possibilities for the second row. Thus, |G| = 8 · 6 = 48.

ii) The set of matrices ( a b ; c d ) such that ad − bc = 1 forms a normal subgroup of G. Note that for every g ∈ G, the value det g ∈ Z₃ is either 1 or 2; e.g. det g = 2 for g = ( 2 0 ; 0 1 ), and det g = 1 for the identity matrix. Multiplication by the matrix ( 2 0 ; 0 1 ) maps the matrices of determinant 1 bijectively onto those of determinant 2, so each of the two classes has 48/2 = 24 elements. □

A mapping f : G → H from a group G to a group H is called a group homomorphism if and only if it respects the operation, i.e. f(a · b) = f(a) · f(b) for all a, b ∈ G.
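As a quick concrete instance of this definition (our illustration, not the book's), the parity of a permutation gives a homomorphism S₃ → Z₂, and the defining identity can be verified exhaustively:

```python
from itertools import permutations

S3 = list(permutations(range(3)))

def compose(p, q):
    # (p ∘ q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(3))

def parity(p):
    # number of inversions mod 2: 0 for even, 1 for odd permutations
    inv = sum(1 for i in range(3) for j in range(i + 1, 3) if p[i] > p[j])
    return inv % 2

# f(a · b) = f(a) + f(b) in Z2, for all pairs a, b
assert all(parity(compose(a, b)) == (parity(a) + parity(b)) % 2
           for a in S3 for b in S3)

# the preimage of 0 is the three even permutations (the subgroup A3)
kernel = {p for p in S3 if parity(p) == 0}
assert len(kernel) == 3
```

The set `kernel` computed here is exactly the kernel of the homomorphism in the sense defined below.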
Note that the operation on the left-hand side is the operation in G, before f is applied, while the operation on the right-hand side is the operation in H, after f is applied. The following properties of homomorphisms follow easily from the definition:

Proposition. Every group homomorphism f : G → H satisfies:
(1) the identity of G is mapped to the identity of H,
(2) the inverse of an element of G is mapped to the inverse of its image, i.e. f(a⁻¹) = f(a)⁻¹,
(3) the image of a subgroup K ⊆ G is a subgroup f(K) ⊆ H,
(4) the preimage f⁻¹(K) ⊆ G of a subgroup K ⊆ H is again a subgroup,
(5) if f is also a bijection, then the inverse mapping f⁻¹ is also a homomorphism,
(6) f is injective if and only if f⁻¹(e_H) = {e_G}.

Proof. (3), (2), and (1). If K ⊆ G is a subgroup, then for each y = f(a), z = f(b) in H with a, b ∈ K, the product y · z = f(a · b) also lies in the image. In particular, f(b) = f(e · b) = f(e) · f(b) and similarly f(b) = f(b) · f(e). Thus f(e) is the unique identity in the image f(K). Finally, f(e) = f(a · a⁻¹) = f(a) · f(a⁻¹), so that f(a⁻¹) is a right inverse of f(a). Similarly, it is a left inverse, too, and we have proved the first three claims.

Solution. On the one hand, mnm⁻¹n⁻¹ = m(nm⁻¹n⁻¹) = mm′ ∈ M. On the other hand, mnm⁻¹n⁻¹ = (mnm⁻¹)n⁻¹ = n′n⁻¹ ∈ N. Hence mnm⁻¹n⁻¹ ∈ M ∩ N. So mnm⁻¹n⁻¹ = e, and thus mn = nm. □

12.F.29. Show that the intersection of two normal subgroups of G is also a normal subgroup of G.

Solution. It is clear that the intersection of two subgroups of G is also a subgroup of G. Let N and M be normal subgroups of G. Then for any g ∈ G and any n ∈ N and any m ∈ M, we have gng⁻¹ ∈ N and gmg⁻¹ ∈ M. Let w ∈ N ∩ M. Since w ∈ N, we have gwg⁻¹ ∈ N, and since w ∈ M, we have gwg⁻¹ ∈ M. Hence gwg⁻¹ ∈ N ∩ M for any g ∈ G, i.e. N ∩ M is a normal subgroup of G. □

12.F.30. Prove that if G is a group and H ⊆ G is a subgroup of index 2, then H is a normal subgroup of G.

Solution. Let H and aH
be the left cosets of H in G, and H and Hb be the right cosets of H in G. Since there are only two cosets, aH = G \ H and Hb = G \ H, and hence aH = Hb. In order to show that H is normal in G, we need xH = Hx for every x ∈ G. Let x ∈ G. If x ∈ H, then obviously xH = H = Hx. If x ∈ G \ H = aH = Hb, then there exist h₁, h₂ ∈ H such that x = ah₁ = h₂b. Then xH = (ah₁)H = aH = Hb = H(h₂b) = Hx. Thus xH = Hx for every x ∈ G, so g⁻¹Hg = H. □

12.F.31. Show that the intersection of two normal subgroups in G is a normal subgroup of G.

Solution. It is clear that the intersection of two subgroups of G is a subgroup of G. Let N and M be normal subgroups of G. Then for any g ∈ G and any n ∈ N and any m ∈ M, we have gng⁻¹ ∈ N and gmg⁻¹ ∈ M. Let w ∈ N ∩ M. Since w ∈ N, we have gwg⁻¹ ∈ N, and since w ∈ M, we have gwg⁻¹ ∈ M. Hence gwg⁻¹ ∈ N ∩ M for any g ∈ G, i.e. N ∩ M is a normal subgroup in G. □

12.F.32. If N and M are normal subgroups of G, show that NM is also a normal subgroup of G.

Solution. Let nm ∈ NM. Then gnmg⁻¹ = gng⁻¹ · gmg⁻¹ ∈ NM, since gng⁻¹ ∈ N and gmg⁻¹ ∈ M. □

Proceed similarly in the case of preimages: if a, b ∈ G satisfy f(a), f(b) ∈ K ⊆ H, then also f(a · b) ∈ K. Suppose there exists an inverse mapping g = f⁻¹. Fix arbitrary y = f(a), z = f(b) ∈ H. Then f(a · b) = y · z = f(a) · f(b), which is equivalent to the expression g(y) · g(z) = a · b = g(y · z). Thus the inverse mapping is also a homomorphism. If f(a) = f(b), then f(a · b⁻¹) = e_H. Therefore, if the only element that is mapped to e_H is e_G, then a · b⁻¹ = e_G, i.e. a = b. The other implication is trivial. □

The subgroup f⁻¹(e_H) (the preimage of the identity in H) is called the kernel of the homomorphism f and is denoted ker f. A bijective group homomorphism is called a group isomorphism. It follows directly from the above ideas that a homomorphism f : G → H with a trivial kernel is an isomorphism onto the image f(G).

12.3.6. Examples.
The additive group Z_k of residue classes modulo k is isomorphic to the group of k-th roots of unity, and also to the group of rotations by integer multiples of 2π/k. Draw a diagram; calculation with the complex units e^{2πi/k} is very efficient.

The mapping exp : R → R⁺ is an isomorphism of the additive group of the real numbers onto the multiplicative group of the positive real numbers. This isomorphism extends naturally to a homomorphism exp : C → C \ {0} of the additive group of the complex numbers onto the multiplicative group of the non-zero complex numbers. However, this homomorphism has a non-trivial kernel. The restriction of exp to the purely imaginary numbers (which form a subgroup isomorphic to R) is the homomorphism it ↦ e^{it} = cos t + i sin t. This means that the numbers 2kπi, k ∈ Z, lie in the kernel. It can be shown that nothing else is in the kernel: if e^{s+it} = e^s · e^{it} = 1 for real numbers s and t, then e^s = 1, i.e. s = 0, and then t = 2kπ for an integer k.

The determinant of a matrix is a mapping which assigns, to each square matrix of scalars in K, a scalar in K (the cases K = Z, Q, R, C have already been worked with). The Cauchy theorem about the determinant of the product of square matrices, det(A · B) = (det A) · (det B), can also be seen as the fact that for the group G = GL(n, K) of invertible matrices, the mapping det : G → K \ {0} is a group homomorphism.

12.3.7. Group product. Given any two groups, a more complicated group can be constructed using the following construction:

12.F.33. If N is a normal subgroup of the finite group G such that gcd(|G : N|, |N|) = 1, show that N must contain every element x ∈ G satisfying x^{|N|} = e.

Solution. Let x be an element in G such that x^{|N|} = e. Since gcd(|G : N|, |N|) = 1, there exist m, n ∈ Z such that m|G : N| + n|N| = 1. Then x = x^{m|G:N| + n|N|} = x^{m|G:N|} · (x^{|N|})ⁿ = x^{m|G:N|}. Consider the element xN ∈ G/N.
Since (xN)^k = x^k N for any k ∈ Z, we have (xN)^{|G/N|} = x^{|G:N|} N = N, the identity element in G/N. This means x^{|G:N|} ∈ N, and so x = (x^{|G:N|})^m ∈ N. □

12.F.34. If H is a subgroup of G such that the product of any two right cosets of H in G is again a right coset of H in G, show that H is normal in G.

Solution. Let Ha and Hb be two right cosets of H in G. By assumption, HaHb is a right coset of H in G. The set HaHb contains the element eaeb = ab. Since there is only one right coset of H in G containing ab, namely Hab, we have HaHb = Hab for all a, b ∈ G. Hence HaH = Ha for all a ∈ G, i.e. h₁ah₂ is equal to h₃a for some h₃ ∈ H. But h₁ah₂ = h₃a implies ah₂a⁻¹ = h₁⁻¹h₃ = h′ ∈ H for any h₂ ∈ H and a ∈ G. So, H is normal in G. □

12.F.35. Let Z(G) be the center of the group G and let xy = z ∈ Z(G) for some x, y ∈ G. Show that x and y commute.

Solution. As zx = xz, we have xyx = zx = xz = xxy. Multiplying by x⁻¹ on the left, we obtain yx = xy. □

12.F.36. Let G be the group of 2 × 2 matrices ( a b ; c d ) such that ad − bc ≠ 0 and a, b, c, d ∈ R. Let H = { ( 1 0 ; x 1 ) | x ∈ Z } ⊆ G and g = ( 1 0 ; 0 2 ). Show that gHg⁻¹ ⊆ H while gHg⁻¹ ≠ H.

Solution. Obviously, every matrix ( 1 0 ; x 1 ) with x ∈ Z lies in H. However,

( 1 0 ; 0 2 ) · ( 1 0 ; x 1 ) · ( 1 0 ; 0 1/2 ) = ( 1 0 ; 2x 1 ).

Thus, gHg⁻¹ = { ( 1 0 ; 2x 1 ) | x ∈ Z } ⊆ H, and this set does not contain, e.g., the matrix ( 1 0 ; 1 1 ) ∈ H. □

For any groups G, H, the group product G × H is defined as follows: the underlying set is the Cartesian product G × H, and the operation is defined componentwise. That is, (a, x) · (b, y) = (a · b, x · y), where the left-hand operation is the one being defined, while the right-hand operations are respectively those in G and H.

The projections onto the components G and H of the product, p_G : G × H ∋ (a, b) ↦ a ∈ G and p_H : G × H ∋ (a, b) ↦ b ∈ H, are surjective homomorphisms, whose kernels are ker p_G = {(e_G, b); b ∈ H} ≅ H and ker p_H = {(a, e_H); a ∈ G} ≅ G.

The group Z₆ is isomorphic to the product Z₂ × Z₃. This can be seen easily in the multiplicative realization of the groups as the complex k-th roots of unity.
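The componentwise operation and the claim Z₆ ≅ Z₂ × Z₃ are easy to verify by machine. In the sketch below (our own code), the product group turns out to be generated by the single element (1, 1), so it is cyclic of order 6:

```python
# Elements of Z2 x Z3 with the componentwise addition.
def add(p, q):
    return ((p[0] + q[0]) % 2, (p[1] + q[1]) % 3)

product_group = [(a, b) for a in range(2) for b in range(3)]

# Iterate the generator (1, 1): its multiples must sweep out the whole group.
g = (1, 1)
powers, x = [], (0, 0)
for _ in range(6):
    x = add(x, g)
    powers.append(x)

assert set(powers) == set(product_group)  # (1, 1) generates everything
assert powers[-1] == (0, 0)               # and has order exactly 6
```

A cyclic group of order 6 is isomorphic to Z₆, so this confirms the isomorphism claimed in the text.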
Z₆ consists of the points of the unit circle that form the vertices of a regular hexagon. Then Z₂ corresponds to ±1, while Z₃ corresponds to the equilateral triangle one of whose vertices is the number 1. If each point is identified with the rotation that maps 1 to that point, then the composition of such rotations is always commutative. Composing a rotation from Z₂ with a rotation from Z₃ yields exactly all rotations from Z₆. Draw a diagram! This leads to the following isomorphism (using additive notation, as is common for the residue classes):

[0]₆ ↦ ([0]₂, [0]₃), [1]₆ ↦ ([1]₂, [2]₃), [2]₆ ↦ ([0]₂, [1]₃), [3]₆ ↦ ([1]₂, [0]₃), [4]₆ ↦ ([0]₂, [2]₃), [5]₆ ↦ ([1]₂, [1]₃).

Similar constructions are available for finite commutative groups in complete generality.

12.3.8. Commutative groups. Any element a of a group G is contained in the minimal subgroup {..., a⁻², a⁻¹, e, a, a², a³, ...} which contains it. It is clear that this subgroup is commutative. If G is finite, then it must happen at some point that a^k = e. The least positive integer k with this property is called the order of the element a in G. A cyclic group G is one which is generated by one of its elements a in the above manner. If the order k of the generator is finite, then the group is one of the groups C_k, known from the discussion of symmetries of plane figures. It follows directly from the definition that every cyclic group is isomorphic either to the group of integers Z (if it is infinite) or to one of the groups of residue classes Z_k (if it is finite). These simple building stones are sufficient to create all finite commutative groups.

12.F.37. Let G be a group of even order. Prove that it contains a subgroup of order 2.

Solution. Consider the set of pairs Ω = {g, g⁻¹}, where g ≠ g⁻¹. These pairs engage an even number of elements of G. The elements g ∈ G that are not in any such pair are exactly the ones satisfying g = g⁻¹, i.e., g² = e.
Therefore, if we count |G| modulo 2, we can ignore the pairs {g, g⁻¹} ∈ Ω and obtain |G| ≡ |{g ∈ G | g² = e}| (mod 2). One solution to g² = e is e itself. If it were the only solution, then |G| ≡ 1 (mod 2), which is false. Therefore, some g ≠ e satisfies g² = e, which provides an element of order 2 and thus also a subgroup {e, g} of order 2 in G. □

12.F.38. Let G be a finite abelian group whose order n is divisible by a prime factor p. Show that G contains an element of order p and, hence, a cyclic subgroup of that order. This claim is known as the abelian case of Cauchy's theorem.

Solution. We use induction on n. The case n = p is trivial. Let n > p with p | n, and suppose the proposition is true for all abelian groups of order divisible by p and less than n. Assume no element of G has order p. Then no element has order divisible by p, because if g ∈ G had order r with p | r, then g^{r/p} would have order p. Let G = {g₁, g₂, ..., g_n} and let g_i have order m_i, so that m_i is not divisible by p. Set m to be the least common multiple of all the m_i, i = 1, 2, ..., n, so m is not divisible by p and g_i^m = e for all i. Because G is abelian, the function f : (Z/m)ⁿ → G given by f(a₁, ..., a_n) = g₁^{a₁} ⋯ g_n^{a_n} is a homomorphism, f(a₁, ..., a_n) f(b₁, ..., b_n) = f(a₁ + b₁, ..., a_n + b_n), since, by the commutativity of G, g₁^{a₁} ⋯ g_n^{a_n} · g₁^{b₁} ⋯ g_n^{b_n} = g₁^{a₁+b₁} ⋯ g_n^{a_n+b_n}. This homomorphism is surjective (each element g_i ∈ G equals f(a₁, ..., a_n) with a_i = 1 and a_j = 0 for j ≠ i), and the set where f takes the particular value g_i is a coset of ker f. So |G| equals the number of cosets of ker f, which is a factor of |(Z/m)ⁿ| = mⁿ. Since p | |G| and mⁿ is not divisible by p, this is a contradiction. □

12.F.39. Let G be a finite non-abelian group whose order n is divisible by a prime factor p. Show that G contains an element of order p and, hence, a cyclic subgroup of that order. This is the non-abelian case of Cauchy's theorem.

Solution. If a proper subgroup H of G has order divisible by p, then by induction there is an element of order p in H,

Theorem.
Every finite commutative group G is isomorphic to a product of cyclic groups C_k. The orders of the components C_k are always powers of the prime divisors of the number of elements n = |G|. This product decomposition is unique, up to the order of the factors. If n = p₁^{k₁} ⋯ p_r^{k_r} is the prime factorization of n, then the group C_n is isomorphic to the product C_n ≅ C_{p₁^{k₁}} × ⋯ × C_{p_r^{k_r}}.

Incomplete proof. We are going to prove only the second claim now, and we return to the first claim later, see 12.3.12, 12.3.13. For the simpler case, suppose n = pq with p coprime to q. Fix a generator a of the group C_n, a generator b of C_p, and a generator c of C_q. Define the mapping f : C_n → C_p × C_q by f(a^k) = (b^k, c^k). Since a^k · a^ℓ = a^{k+ℓ}, and similarly for b and c, it follows that f(a^k · a^ℓ) = (b^{k+ℓ}, c^{k+ℓ}) = (b^k, c^k) · (b^ℓ, c^ℓ), so the mapping f is a homomorphism. If the image of a^k is the identity, then k must be a multiple of p as well as of q. Since p and q are coprime, k is a multiple of n, so f is injective. Moreover, the group C_n has the same number of elements as C_p × C_q, so f is an isomorphism. Finally, the proposition about the decomposition of cyclic groups of order n into smaller cyclic groups follows by induction on the number of different primes p_i in the factorization of n. □

Notice that C_{p²} is never isomorphic to the product C_p × C_p: while C_{p²} is generated by an element of order p², the highest order of an element in C_p × C_p is only p.

Since every finite commutative group is isomorphic to a product of cyclic groups, it is possible, for a given number of elements, to enumerate all commutative groups of that order up to isomorphism. For instance, there are only two commutative groups of order 12: C₁₂ ≅ C₄ × C₃ and C₂ × C₂ × C₃ ≅ C₂ × C₆.

Notice similarly that if all elements (except the identity) of a finite commutative group G have order 2, then G has the form (C₂)ⁿ for an integer n. In particular, such a group G has 2ⁿ elements.
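The remark that C_{p²} and C_p × C_p are never isomorphic can be watched at p = 2 by comparing the multisets of element orders, which any isomorphism would have to preserve. A small check of ours:

```python
def order_mod(a, n):
    # order of a in the additive group Z_n
    k, s = 1, a % n
    while s != 0:
        s = (s + a) % n
        k += 1
    return k

orders_Z4 = sorted(order_mod(a, 4) for a in range(4))
# The order of (a, b) in Z2 x Z2 is the lcm of the component orders;
# since both divide 2, the lcm equals the maximum here.
orders_Z2xZ2 = sorted(max(order_mod(a, 2), order_mod(b, 2))
                      for a in range(2) for b in range(2))

assert orders_Z4 == [1, 2, 4, 4]       # Z4 has elements of order 4
assert orders_Z2xZ2 == [1, 2, 2, 2]    # every non-identity element has order 2
```

Since Z₄ has an element of order 4 and Z₂ × Z₂ does not, the two groups of order 4 cannot be isomorphic, exactly as the text argues for general p.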
If the decomposition of G into cyclic groups contains a factor C_p with p > 2, then the group contains elements of order higher than 2.

12.3.9. Subgroups and cosets. Selecting any subgroup H of a group G gives further information about the structure of the whole group. A binary relation ∼_H on G can be defined as follows: a ∼_H b if and only if b⁻¹ · a ∈ H. This relation expresses when two elements of G are "the same" up to multiplication by an element of H from the right. It is easily verified that this relation is an equivalence: Clearly, a⁻¹ · a = e ∈ H, so it is reflexive. If b⁻¹ · a = h ∈ H, then a⁻¹ · b = (b⁻¹ · a)⁻¹ = h⁻¹ ∈ H,

which gives us an element of order p in G. Thus we may assume no proper subgroup of G has order divisible by p. For any proper subgroup H ⊆ G, we have |G| = |H| [G : H] and |H| is not divisible by p, so p | [G : H] for every proper subgroup H. Denote the conjugacy class of an element a ∈ G by C(a) = {b ∈ G | b = g⁻¹ag for some g ∈ G}. Let the conjugacy classes in G with size greater than 1 be represented by g₁, g₂, ..., g_k. The conjugacy classes of size 1 are the elements of the center Z(G). Since the conjugacy classes form a partition of G, counting G by conjugacy classes implies

|G| = |Z(G)| + Σ_{i=1}^{k} |C(g_i)| = |Z(G)| + Σ_{i=1}^{k} [G : Z(g_i)],

where Z(g_i) is the centralizer of g_i. Every [G : Z(g_i)] > 1, since each |C(g_i)| is greater than 1. Since each Z(g_i) is then a proper subgroup and |G| = |Z(g_i)| [G : Z(g_i)], it follows that p | [G : Z(g_i)]. Thus, |Z(G)| is divisible by p, and so Z(G) is non-trivial. Since proper subgroups of G do not have order divisible by p, the center Z(G) has to be all of G. That means G is abelian, which is a contradiction. □

12.F.40. Prove that any group of order 15 is cyclic.

Solution. Let G be a group of order 15. By Cauchy's theorem, there exist a subgroup M of order 5 and a subgroup N of order 3, with [G : M] = 15/5 = 3. Both M and N are cyclic subgroups. Let x be a generator of M and y a generator of N. Let us prove that xy generates the entire G.
Distinct subgroups of prime order intersect trivially. The conjugates gMg⁻¹ of M are subgroups of order 5, and their number equals the index of the subgroup N_G(M) = {g ∈ G | gMg⁻¹ = M}, which contains M; hence this number is 1 or 3. Similarly, the number of conjugates of N is 1 or 5. Three conjugates of M would yield 3 · 4 = 12 elements of order 5, and five conjugates of N would yield 5 · 2 = 10 elements of order 3; since 12 + 10 > 14, at least one of M, N coincides with all of its conjugates, i.e. is normal. Suppose it is M (the case of N is analogous). Conjugation then defines a homomorphism G → Aut(M), whose image has order dividing both |G| = 15 and |Aut(M)| = 4 (the automorphisms of M ≅ C₅ are x ↦ xⁿ for n = 1, 2, 3, 4); hence the image is trivial, and every element of G commutes with every element of M. In particular, x and y commute, and the computation of 12.F.22 shows that xy has order 5 · 3 = 15. Hence G = ⟨xy⟩ is cyclic. □

12.F.41. Let G be a group of order 14 which has a normal subgroup N of order 2. Prove that G is commutative.

Solution. Clearly, the order of the group G/N is |G/N| = |G| / |N| = 7. By Lagrange's theorem 12.3.10, the orders of its elements are 1 or 7. Since only the identity has order 1, this means that there is an element of order 7, so that the group G/N is cyclic. Let N = {e, n}, where e is the identity of G, and let [a] be a generator of G/N. Since N is normal, we have ana⁻¹ ∈ N; and ana⁻¹ = e would imply n = e, so we must have ana⁻¹ = n, i.e., na = an. Since [a] generates G/N, we get that each element of G/N is of the form [a]^k, k = 0, ..., 6, i.e., [a^k]. Then, each element of G

so it is symmetric as well. Finally, if c⁻¹ · b ∈ H and b⁻¹ · a ∈ H, then c⁻¹ · a = c⁻¹ · b · b⁻¹ · a ∈ H, so it is transitive, too. It follows that G partitions into the left cosets of mutually equivalent elements, with respect to the subgroup H. The coset corresponding to an element a is denoted a · H, and a · H = {a · h; h ∈ H}, since an element b is equivalent to a if and only if it can be expressed this way. The corresponding partition of G (i.e. the set of all left cosets) with respect to H is denoted G/H. Similarly, right cosets H · a can be defined. The corresponding equivalence relation is given by a ∼ b if and only if a · b⁻¹ ∈ H. Hence, H \ G = {H · a; a ∈ G}.

Proposition. Let G be a group and H a subgroup of G. Then:
(1) The left cosets with respect to H coincide with the right cosets with respect to H if and only if a · h · a⁻¹ ∈ H for each a ∈ G, h ∈ H.
(2) Each coset (left or right) has the same cardinality as the subgroup H.

Proof. Both properties are direct consequences of the definition. In the first case, for any a ∈ G, h ∈ H, an element h′ ∈ H is required so that h · a = a · h′. This occurs if and only if a⁻¹ · h · a = h′ ∈ H.
In the second case, if a · h = a · h′, then multiplication by a⁻¹ from the left yields h = h′. □

As an immediate corollary of the above statement, there are the following extremely useful results:

12.3.10. Theorem. Let G be a finite group with n elements and H a subgroup of G. Then:
(1) the cardinality n = |G| is the product of the cardinality of H and the cardinality of G/H, i.e. |G| = |G/H| · |H|,
(2) the integer |H| divides n,
(3) if a ∈ G is of order k, then k divides n,
(4) for each a ∈ G, aⁿ = e,
(5) if n is prime, then G is isomorphic to the cyclic group Z_n.

The second proposition is called Lagrange's theorem, and the fourth proposition is called Fermat's little theorem. Special cases are discussed in the previous chapter on number theory.

Proof. Each left coset has exactly |H| elements, and different cosets are disjoint. Hence the first proposition follows. The second proposition is a direct corollary of the first one.

is of the form a^k or a^k n, and since a and n commute, we get that actually all elements of G commute. □

12.F.42. Decide whether the following holds: if the quotient G/N of a group G by a normal subgroup N is commutative, then G itself is commutative. ○

12.F.43. Prove that any subgroup H of the symmetric group S_n contains either only even permutations or the same number of even and odd permutations.

Solution. Consider the homomorphism p : H → Z₂ which maps each permutation to its parity (0 for even and 1 for odd). Then, p⁻¹(0) = Ker(p) is a normal subgroup of H: for h ∈ Ker(p), p(ghg⁻¹) = p(g)p(h)p(g⁻¹) = p(g)p(g⁻¹) = p(gg⁻¹) = p(e) = 0, which means that ghg⁻¹ ∈ Ker(p), i.e., Ker(p) is normal. Since Z₂ has only two elements, it follows that H/Ker(p) has either only one coset (i.e., all permutations are even) or two cosets, which must be of equal size (i.e., there are the same number of even and odd permutations). □

12.F.44.
Describe the group of symmetries of a regular tetrahedron and find all of its subgroups.

Solution. Let us denote the vertices of the tetrahedron by a, b, c, d. Each symmetry can be described as a permutation of the vertices (recording to which vertex each one goes). Thus, the group of symmetries of the tetrahedron is isomorphic to a certain subgroup of the symmetric group S₄. Given any pair of vertices, there exists a symmetry which swaps this pair and keeps the other two vertices fixed (this is the reflection with respect to the plane that is perpendicular to the line segment joining the pair and goes through its center). Thus, the wanted subgroup is generated by all transpositions in S₄. However, this is the group S₄ itself.

Thus, let us describe all subgroups of the group S₄. This group has 24 elements, which means that the order of any subgroup must be one of 1, 2, 3, 4, 6, 8, 12, 24 (see 12.3.10). Clearly, the only subgroup of order 1 is the trivial subgroup {id}. Similarly, the only subgroup of order 24 is the entire group S₄. Now, let us look at the remaining orders of a potential subgroup H ⊆ S₄.

(i) |H| = 2. H must consist of the identity and one other self-inverse element (x² = id). These are transpositions and

Each element a ∈ G generates the cyclic subgroup {a, a², ..., a^k = e}, and the order of this subgroup is exactly the order of a. Therefore, the order of a must divide the number of elements of G. Since the order k of any element a divides n and a^k = e, we also have aⁿ = (a^k)^s = e for the integer s = n/k. If n is a prime greater than 1, then there exists an element a ∈ G that is different from e. Its order k is an integer greater than one, and it divides n. Therefore, k must be equal to n. This means that all the elements of G are of the form a^ℓ for ℓ = 1, ..., n. □

12.3.11. Normal subgroups and quotient groups. A subgroup H which satisfies a · h · a⁻¹ ∈ H for all a ∈ G, h ∈ H, is called a normal subgroup.
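The defining condition of normality can be checked mechanically; in the sketch below (our own illustration), the alternating subgroup A₃ of S₃ is normal, while the two-element subgroup generated by a transposition is not:

```python
from itertools import permutations

S3 = list(permutations(range(3)))

def compose(p, q):
    # (p ∘ q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(3))

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def is_normal(H):
    # a · h · a⁻¹ ∈ H for all a in the ambient group and h in H
    return all(compose(compose(a, h), inverse(a)) in H
               for a in S3 for h in H)

A3 = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}   # identity and the two 3-cycles
T = {(0, 1, 2), (1, 0, 2)}               # identity and one transposition

assert is_normal(A3)
assert not is_normal(T)
```

Conjugating the transposition in `T` by a 3-cycle produces a different transposition outside `T`, which is exactly why the test fails there.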
For normal subgroups, the operation on G/H can be defined by (a · H) · (b · H) = (a · b) · H. Choosing other representatives a · h, b · h′ leads to the same result: (a · h · b · h′) · H = ((a · b) · (b⁻¹ · h · b) · h′) · H = (a · b) · H. Moreover, cosets can be written as H · a · H, and the equation (H · a) · (b · H) = H · (a · b) · H is straightforward. On the other hand, the definition of the product on the cosets fails if H is not normal.

Clearly, this new operation on G/H satisfies all group axioms: the identity is the group H itself (formally it is the coset e · H that corresponds to the identity e of G), the inverse of a · H is a⁻¹ · H, and the associativity is clear from the definition. This is called the quotient group G/H of G by the normal subgroup H. Of course, in commutative groups, every subgroup is normal. The subset nZ = {na; a ∈ Z} ⊆ Z is a subgroup of the integers, and the corresponding quotient group is the (additive) group Z_n of residue classes.

It is clear from the definition that the kernel of every homomorphism is a normal subgroup. On the other hand, if a subgroup H ⊆ G is normal, then the mapping p : G → G/H, a ↦ a · H, is a surjective homomorphism whose kernel is H. It can be seen directly from the definition of the operation on G/H that p is a homomorphism, and it is clearly surjective. It follows that normal subgroups are precisely the kernels of homomorphisms. Moreover, for any group homomorphism f : G → K with kernel H = ker f, there is a well-defined homomorphism f̄ : G/H → K, f̄(a · H) = f(a), which is injective. If H is any normal subgroup in G contained in ker f, the latter homomorphism is still well defined, but not necessarily injective.

double transpositions (compositions of two disjoint transpositions). Geometrically, double transpositions correspond to rotations by 180° around an axis that goes through the centers of opposite edges.
Thus, we get nine subgroups:
{id, (a, b)}, {id, (a, c)}, {id, (a, d)}, {id, (b, c)}, {id, (b, d)}, {id, (c, d)},
{id, (a, b) ∘ (c, d)}, {id, (a, c) ∘ (b, d)}, {id, (a, d) ∘ (b, c)}.

(ii) |H| = 3. By Lagrange's theorem, such a subgroup must be cyclic, i.e., it must be of the form {id, p, p²} with p³ = id. Thus, the factorization of p into independent cycles must contain a cycle of length 3, which means that p cannot contain anything else. By 12.F.17, there are 4 · 2 cycles of length 3, which give rise to the following four subgroups:
{id, (a, b, c), (a, c, b)}, {id, (a, c, d), (a, d, c)}, {id, (a, b, d), (a, d, b)}, {id, (b, c, d), (b, d, c)}.
The cycles of length 3 correspond to rotations by 120° around an axis that goes through a vertex and the center of the opposite face.

(iii) |H| = 4. Such a subgroup must be isomorphic to Z₄ or Z₂ × Z₂. Considering the factorization into independent cycles, we find out that the only permutations of order 4 are the cycles of length 4. Thus, a cyclic subgroup of order 4 must contain a cycle of length 4, namely exactly two of them, since if p has order 4, then p⁻¹ = p³ is also of order 4, i.e., a cycle of length 4. Then, the permutation p² has order 2, so it must be a double transposition (it is not a single transposition, since p² clearly has no fixed point). There are six cycles of length 4 (see 12.F.17), and they pair up to the following three subgroups of this type:
{id, (a, b, c, d), (a, c) ∘ (b, d), (a, d, c, b)},
{id, (a, c, b, d), (a, b) ∘ (c, d), (a, d, b, c)},
{id, (a, b, d, c), (a, d) ∘ (b, c), (a, c, d, b)}.
As for subgroups isomorphic to Z₂ × Z₂, they must contain (besides the identity) only elements of order 2, which are transpositions and double transpositions. By 12.F.43, the subgroup must contain either no or exactly two transpositions. Moreover, it cannot contain two dependent transpositions, since their composition is a cycle of length 3.
Thus, the subgroup contains (besides the identity) either two independent transpositions and the double transposition which is their composition (this gives rise to three subgroups), or the three double transpositions. Altogether, we have found: {id, (a,b), (a,b)∘(c,d), (c,d)}, {id, (a,c), (a,c)∘(b,d), (b,d)}, {id, (a,d), (a,d)∘(b,c), (b,c)} and {id, (a,b)∘(c,d), (a,c)∘(b,d), (a,d)∘(b,c)}.

There is a seemingly paradoxical example of a group homomorphism C* → C*, defined on the non-zero complex numbers by z ↦ z^k, where k is a fixed positive integer. Clearly, this is a surjective homomorphism, and its kernel is the set of k-th roots of unity, i.e., the cyclic subgroup Zk. Reasoning as above, there is an isomorphism f′ : C*/Zk → C* for any positive integer k. This example illustrates that in the case of infinite groups, calculations with cardinalities are not as intuitive as in the case of finite groups and theorem 12.3.10.

12.3.12. Exact sequences. A normal subgroup H of a group G yields the short exact sequence of groups e → H → G → G/H → e, where the arrows respectively correspond to the only homomorphism of the trivial group {e} into the group H, the inclusion ι of the subgroup H ⊂ G, the projection ν onto the quotient group G/H, and the only homomorphism of the group G/H onto the trivial group {e}. In each case, the image of one arrow is precisely the kernel of the following one. This is the definition of exactness of a sequence of homomorphisms. If there exists a homomorphism σ : G/H → G such that ν∘σ = id on G/H, it is said that the exact sequence splits.

Lemma. Every split short exact sequence of commutative groups defines an isomorphism G ≅ H × G/H.

Proof. Define a mapping f : H × G/H → G by f(a, b) = a·σ(b). Since the groups are commutative, f is a homomorphism: f(aa′, bb′) = aa′σ(b)σ(b′) = (aσ(b))(a′σ(b′)). If f(a, b) = e, then σ(b) = a⁻¹ ∈ H, i.e., b = ν(σ(b)) is the identity in G/H.
However, its image is then σ(b) = e, so a = e as well. Since the left and right cosets of commutative groups coincide, the mapping f is surjective. Hence f is an isomorphism. □

Now, the main idea of the proof of theorem 12.3.8 can be indicated. If it is known that every short exact sequence created by choosing cyclic subgroups H of a finite commutative group G splits, then it is easy to proceed with the proof by induction. If G is a group of order n which is not cyclic, then an element a of order p, p < n, can be selected. The cyclic subgroup H generated by a can be found, as well as the splitting of the corresponding short exact sequence. This expresses the group G as the product of the selected cyclic subgroup H and the group G/H of order n/p. The main technical point of the proof is the verification that in each finite commutative group there are elements of order p^r with appropriate powers of the primes p, and that the short exact sequences for these groups really split.

(iv) |H| = 6. By 12.F.11, this subgroup is isomorphic to S3 (it cannot be isomorphic to Z6, since there is no element of order 6 in S4), so it contains (besides the identity) two elements x, x⁻¹ of order 3 and three elements of order 2. Thus, x and x⁻¹ are cycles of length 3 which fix the same vertex (say a). What are the other three elements? There cannot be a double transposition, since its composition with x yields another cycle of length 3. There cannot be a transposition which does not fix a, since its composition with x yields a cycle of length 4. Thus, the only possibility is that there are the three transpositions which also fix a. Since there are four possibilities for the fixed vertex, we obtain four subgroups of order 6.

(v) |H| = 8. The group cannot be a subgroup of the group A4 of even permutations (since there are 12 of them, and 8 does not divide 12). Thus, by 12.F.43, H must contain four even and four odd permutations.
The even permutations must form a subgroup of A4, and we saw in (iii) that the only such 4-element subgroup is {id, (a,b)∘(c,d), (a,c)∘(b,d), (a,d)∘(b,c)}, which is normal. Considering any odd permutation and the coset (with respect to the above normal subgroup) which contains it, we can see that this coset together with the above 4 elements forms a subgroup of S4. We thus get three subgroups of S4. It is not hard to realize that each of them is isomorphic to the group of symmetries of a square (the so-called dihedral group D4). From the geometrical point of view, we can describe it as follows: Consider the orthogonal projection of the tetrahedron onto the plane that is perpendicular to the line that goes through the centers of opposite edges. The boundary of this projection is a square. Out of all the symmetries of the tetrahedron, we take only those which induce a symmetry of this square (for instance, it will not be a symmetry which only swaps adjacent vertices of the resulting square). Since there are three pairs of opposite edges in the tetrahedron, we get three 8-element subgroups, isomorphic to the dihedral group D4.

(vi) |H| = 12. By 12.F.43, such a subgroup contains either only even permutations, or six even and six odd permutations, and the six even permutations must form a subgroup of S4. However, we saw in (iv) that there is no 6-element subgroup of S4 consisting only of even permutations. Thus, the only possibility is the alternating group A4 of all even permutations in S4.

12.3.13. Return to finite Abelian groups. Below is a brief exposition of the complete proof of the classification theorem, broken into several steps. The following lemma suggests that cyclic subgroups with prime orders are required.

Lemma (Claim 1). Let G be a finite Abelian group with n elements. If p is a prime which divides n, then there is an element g ∈ G of order p.¹

Proof. The claim is obvious if n is prime, i.e., G = Zp (as proved above).
If n is not prime, proceed by induction on n. Clearly, G must have a proper subgroup H if n is not prime, |H| = m < n. Either p | m or p | (n/m). In the former case, the claim follows from the induction hypothesis directly. Otherwise, assume p | (n/m) (recall that n/m is the order of G/H). Then there is an element g ∈ G such that the order of g·H in the quotient group G/H is p. Thus g^p ∈ H, and therefore the order of g in G divides p|H|. Since p is a prime, the order of g is ℓp for some integer ℓ. Hence the element g^ℓ has the required order p. □

For a prime number p, G is called a p-group if each of its elements has order p^k for some power k. Claim 1 has an obvious corollary:

Lemma (Claim 2). A finite Abelian group G is a p-group if and only if its number of elements n is a power of p.

Proof. One implication follows straight from Lagrange's theorem, since all proper divisors of a power of a prime p are just smaller powers of p. On the other hand, if n is not a power of a prime, it has another prime divisor q, and so there is an element of order q by Claim 1. □

Now it can be shown that a given finite Abelian group G can always be decomposed into a product of p-groups.

Lemma (Claim 3). If G is a finite Abelian group, then it is isomorphic to a product of p-groups. This decomposition is unique up to order.

Proof. Consider a prime p dividing n = |G|. Define Gp to be the subgroup of all elements whose orders are powers of p, while G′p is the subgroup of all elements whose orders are not divisible by p (check yourself that subgroups are obtained in this way). By the above Claim 1, the subgroup Gp is not trivial. Next, consider an element g of order qp^ℓ with q not divisible by p. Then g^(p^ℓ) has order q, so this element belongs to G′p, while g^q ∈ Gp. The Bezout equality guarantees that there are integers r and s with rp^ℓ + sq = 1. Hence g = g^(rp^ℓ)·g^(sq) is a decomposition of g into a product of elements of G′p and Gp. This verifies G ≅ Gp × G′p, and Gp is a p-group.
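Claim 3 can be illustrated numerically. The sketch below (an illustration, not the book's proof) decomposes (Z60, +) into its p-subgroups for p = 2, 3, 5 and checks that adding one element from each subgroup produces every element of the group exactly once, i.e., that the decomposition map is a bijection.

```python
from itertools import product
from math import gcd

# G = (Z_60, +); G_p collects the elements whose order is a power of p
n = 60

def order(x):
    # order of x in (Z_n, +) is n / gcd(x, n)
    return n // gcd(x, n)

def is_p_power(m, p):
    while m % p == 0:
        m //= p
    return m == 1

Gp = {p: [x for x in range(n) if is_p_power(order(x), p)] for p in (2, 3, 5)}
sums = {(a + b + c) % n for a, b, c in product(Gp[2], Gp[3], Gp[5])}

assert [len(Gp[p]) for p in (2, 3, 5)] == [4, 3, 5]   # 4 * 3 * 5 = 60 = |G|
assert sums == set(range(n))   # (a, b, c) -> a + b + c is a bijection onto G
```

Since the three subgroups have 4, 3 and 5 elements and the 60 sums exhaust Z60, the map is a bijection, matching Z60 ≅ Z4 × Z3 × Z5.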
¹ This is a special version of a more general result valid for all finite groups, called Cauchy's theorem. The formulation remains the same, with the word Abelian omitted.

From the geometric point of view, these are the so-called direct symmetries, which are realized by rotations (not reflections), and thus can be performed in space. □

Remark. In general, the group of symmetries of a solid with n vertices is a subgroup of the symmetric group Sn.

12.F.45. Which subgroups of the group S4 are normal?

Solution. By definition, a subgroup H ⊂ S4 is normal iff it is closed under conjugation, i.e., ghg⁻¹ ∈ H for any g ∈ S4, h ∈ H. Since conjugation in symmetric groups only renames the permuted items but preserves the permutation structure (i.e., the cycle lengths in the factorization into independent cycles), we can see that H is normal if and only if it contains either no or all permutations of each type. Examining all the subgroups which we found in the previous exercise, we find that the normal ones are the trivial group {id}, the so-called Klein group (consisting of the identity and the three double transpositions, which we already met in 12.F.8), the alternating group A4 of all even permutations, and the entire group S4. □

12.F.46. Find the group of symmetries of a cube (describe all symmetries). Is this group commutative?

Solution. The group has 48 elements; 24 of them are generated by rotations (these are the so-called direct symmetries), the other 24 are compositions of a direct symmetry and a reflection. The group is not commutative (consider the composition of a reflection with respect to the plane containing the centers of four parallel edges and a rotation by 90° around the axis that lies in that plane and goes through the centers of two opposite faces). The group is isomorphic to S4 × Z2. □

12.F.47.
In the group of symmetries of a cube, find the subgroup generated by the reflection with respect to the plane containing the centers of four parallel edges and the rotation by 180° around the axis that lies in that plane and goes through the centers of two opposite faces. Is this subgroup normal? ○

12.F.48. For each of the following permutations, decide whether the subgroup it generates is normal in the corresponding group:
• (1,2,3) in S3,
• (1,2,3,4) in S4,
• (1,2,3) in A4.

This process can be repeated for the subgroup G′p and the remaining primes in the decomposition in order to complete the proof. The uniqueness claim is obvious. □

It remains to consider the p-groups only. The next claim shows that the p-groups which are not cyclic must have more than one subgroup of order p.

Lemma (Claim 4). If a finite Abelian p-group G has just one subgroup H with |H| = p, then it is cyclic.

Proof. The case p = n = |G| is obvious. Proceed by induction on n. Assume H is the only subgroup of order p, consider σ : G → G, σ(g) = g^p, and write K = ker(σ). Then H ⊂ K, and since p is prime, all non-trivial elements of K have order p. For any g ∈ K, g ≠ e, the cyclic group generated by g has order p and so coincides with H; consequently H = K. If G ≠ K, then σ(G) is a non-trivial subgroup of G which must be isomorphic to G/K. By Claims 1 and 2, there is a subgroup of σ(G) of order p. This yields a subgroup of G, and by the assumption it is again H. Finally, apply the induction hypothesis to the group σ(G) ≅ G/H, which therefore has to be cyclic. Choosing a generator g·H of the latter group, even in G the cyclic subgroup generated by g must have a subgroup of order p (again by Claim 1). The uniqueness assumption ensures that this is again the subgroup H. Clearly, |G/H| = |G|/p is the smallest exponent with g^(|G|/p) ∈ H; at the same time, this power is not equal to the unit, since H ⊂ ⟨g⟩. Consequently, the order of g in G is bigger than |G|/p, and so the group G is cyclic.
□

Finally, a splitting condition for the p-groups is proved, which provides the property discussed at the end of the previous paragraph on exact sequences. This completes the entire proof of the classification theorem.

Lemma (Claim 5). Let G be a finite Abelian p-group and let C be a cyclic subgroup of maximal order in G. Then G = C × L for some subgroup L.

Proof. If G is cyclic, set C = G; of course, G = C × L with L = {e}. Proceed by induction on n = |G|. Assume G is not cyclic. Then it contains more than one cyclic subgroup of order p, and the subgroup C contains one of them. Choose H to be another subgroup of order p which is not a subgroup of C. Since p is prime, the intersection of H and C is trivial. Consequently, the quotient group (C × H)/H ⊂ G/H is isomorphic to C.

Now consider the induction step. The order of the cyclic subgroup (C × H)/H in G/H must be maximal, since the orders of the elements g·H in the quotient group are divisors of the orders of the generators g in the group G. By the induction hypothesis, G/H = (C × H)/H × K for some subgroup K ⊂ G/H. Clearly, the preimage of K under the quotient projection is a group L satisfying H ⊂ L ⊂

For the last case, find the right cosets of A4 by the considered subgroup. Find all n ≥ 3 for which the subset of all cycles of length n together with the identity is a subgroup of Sn. Show that if this is so, then it is even a normal subgroup.

Solution.
• It is a normal subgroup (the generated subgroup is the alternating group A3).
• It is not a normal subgroup: (1,2) ∘ [(1,3)∘(2,4)] ∘ (1,2) = (1,4)∘(2,3).
• It is not a normal subgroup. The right cosets are {(1,2,4), (2,4,3), (1,3)∘(2,4)}, {(1,4,2), (1,4,3), (1,4)∘(2,3)}, {(2,3,4), (1,2)∘(3,4), (1,3,4)}, {id, (1,2,3), (1,3,2)}.
The mentioned subset is a subgroup only for n = 3. In this case, it is the alternating group A3 of all even permutations in S3, which is a normal subgroup.
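Normality claims like these can also be verified mechanically: a subgroup is normal iff it is closed under conjugation by every group element. A small sketch (labels shifted to {0, 1, 2, 3}; not from the text):

```python
from itertools import permutations

def compose(p, q):
    # right-to-left composition of permutations of {0, 1, 2, 3}
    return tuple(p[q[i]] for i in range(4))

def inverse(p):
    inv = [0] * 4
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

S4 = list(permutations(range(4)))
identity = tuple(range(4))

def is_normal(H, G):
    # H is normal in G iff g h g^-1 stays in H for all g, h
    return all(compose(compose(g, h), inverse(g)) in H for g in G for h in H)

# Klein group: identity plus the three double transpositions
klein = {identity, (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)}
# cyclic group generated by the 4-cycle (1,2,3,4)
c4 = {identity, (1, 2, 3, 0), (2, 3, 0, 1), (3, 0, 1, 2)}

assert is_normal(klein, S4)   # closed under conjugation
assert not is_normal(c4, S4)  # some conjugate falls outside the subgroup
```

This reproduces 12.F.45: the Klein group is normal in S4, while the cyclic group generated by a 4-cycle is not.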
(For greater values of n, we can find two cycles of length n whose composition is neither a cycle of length n nor the identity.) □

12.F.49. Find the subgroup of S6 that is generated by the permutations (1,2)∘(3,4)∘(5,6), (1,2,3,4), and (5,6). Is this subgroup normal? If so, describe the set of (two-sided) cosets S6/H.

Solution. First of all, note that all of the generating permutations lie in the subgroup S4 × S2 ⊂ S6 (considering the natural inclusion of S4 × S2, i.e., for s ∈ S4 × S2, the restriction of s to {1,2,3,4} is a permutation of this set, and so is the restriction of s to {5,6}). This means that the group they generate is also a subgroup of S4 × S2. Moreover, since (5,6) is among the generators, we can see that the subgroup is of the form H × S2, where H ⊂ S4. Thus, it suffices to describe H. This group is generated by the elements (1,2)∘(3,4) and (1,2,3,4) (the projections of the generators onto S4). We have
(1,2,3,4)² = (1,3)∘(2,4),
(1,2,3,4)³ = (4,3,2,1),
(1,2,3,4)⁴ = id,
[(1,2)∘(3,4)]² = id,
[(1,2)∘(3,4)] ∘ (1,2,3,4) = (2,4),
(1,2,3,4) ∘ [(1,2)∘(3,4)] = (1,3),
[(1,2)∘(3,4)] ∘ (4,3,2,1) = (1,3),
(4,3,2,1) ∘ [(1,2)∘(3,4)] = (2,4),

G. Now, the latter identification of G/H with the product implies G = (C·H)·L = C·(H·L) = C·L. At the same time, L ∩ (C·H) = H, and so L ∩ C = {e}. So G = C × L. □

The proof is complete up to the uniqueness claim. It is known already that the decomposition into p-groups is unique. Assume that a p-group G decomposes into two products of cyclic groups H1 × … × Hk and H′1 × … × H′ℓ with non-increasing orders of Hi and H′j. Then the orders of H1 and H′1 coincide, since these are the maximal orders in G. By induction, all the orders coincide, and the work is complete.

The classification theorem is a special case of a more general result on finitely generated Abelian groups.
In additive notation, if g1, …, gt are generators of the entire G, then all elements of G are of the form a1g1 + ⋯ + atgt with integer coefficients ai. The general theorem provides a severe restriction on the possible relations between such combinations. In fact, it says that all finitely generated Abelian groups are products of cyclic groups, hence G ≅ Z^ℓ × Zp1 × ⋯ × Zpk, where the subscripts are prime powers. This means there is always a finite number of completely independent generators of G, and each of them generates a cyclic subgroup of G. (Compare this to the description of finite-dimensional vector spaces via their bases, as discussed in chapter 2.)

12.3.14. Group actions. Groups can be considered as sets of transformations of a fixed set. All the transformations are invertible, and the set of transformations must be closed under composition. The idea is to work with a fixed group whose elements are represented as mappings on a fixed set, but the mappings corresponding to different elements of the group need not be different. For instance, the rotations around the origin by all possible angles correspond to the group of real numbers. On the other hand, the rotation by 2π is the identity as a mapping. Formally, this situation can be described as follows:

[(1,2)∘(3,4)] ∘ [(1,3)∘(2,4)] = (1,4)∘(2,3),
[(1,3)∘(2,4)] ∘ [(1,2)∘(3,4)] = (1,4)∘(2,3),
[(1,2)∘(3,4)] ∘ (4,2) = (1,2,3,4),
(1,3) ∘ (4,2) = (1,3)∘(2,4).

Now, we note that the generating permutations (1,2,3,4) and (1,2)∘(3,4) are symmetries of a square with vertices 1, 2, 3, 4. Therefore, they generate at most the 8-element dihedral group D4, and we have already obtained 8 elements. This means that no more permutations can be obtained by further compositions. Thus, the subgroup H ⊂ S4 has 8 elements (which is consistent with Lagrange's theorem, since 8 divides 24):

H = {id, (1,2,3,4), (1,3)∘(2,4), (4,3,2,1), (1,2)∘(3,4), (1,3), (2,4), (1,4)∘(2,3)}.
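The closure computation of 12.F.49 can be replayed in a few lines. The sketch below (0-based labels, not the book's code) builds the subgroup generated by the two permutations and confirms that it has exactly 8 elements:

```python
def compose(p, q):
    # right-to-left composition of permutations of {0, 1, 2, 3}
    return tuple(p[q[i]] for i in range(4))

def generated(gens):
    # repeatedly compose until nothing new appears
    elems = {tuple(range(4))} | set(gens)
    while True:
        new = {compose(a, b) for a in elems for b in elems} - elems
        if not new:
            return elems
        elems |= new

# the generators (1,2)∘(3,4) and (1,2,3,4) of 12.F.49, relabelled 0-based
H = generated([(1, 0, 3, 2), (1, 2, 3, 0)])

assert len(H) == 8        # the dihedral group D4
assert (2, 3, 0, 1) in H  # (1,3)∘(2,4), the square of the 4-cycle
```

The closure stabilizes at 8 elements, in agreement with the hand computation above.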
Altogether, the examined subgroup of S6 has 16 elements: for each h ∈ H, it contains (h, id) and (h, (5,6)). □

12.F.50. Find the subgroup of S4 that is generated by the permutations (1,2)∘(3,4) and (1,2,3).

Solution. Since both generating permutations are even, they can generate only even permutations. Thus, the examined group is a subgroup of the alternating group A4 of all even permutations. We have
[(1,2)∘(3,4)]² = id,
(1,2,3)² = (3,2,1),
[(1,2)∘(3,4)] ∘ (1,2,3) = (2,4,3),
(1,2,3) ∘ [(1,2)∘(3,4)] = (1,3,4),
[(1,2)∘(3,4)] ∘ (3,2,1) = (3,1,4),
(3,2,1) ∘ [(1,2)∘(3,4)] = (2,3,4),
and now we already have seven elements of the examined subgroup of A4. Since A4 has 12 elements and the order of a subgroup must divide that of the group, it is clear that the subgroup is the whole A4. □

12.F.51. Find all subgroups of the group of invertible 2-by-2 matrices over Z2 (with matrix multiplication). Is any of them normal?

Solution. In exercise 12.E.1, we built the table of the operation of this group. By Lagrange's theorem (12.3.10), the order of any subgroup must divide the order of the group, which is six. Thus, besides the trivial subgroup {A} and the entire group, each subgroup must have two or three elements. In

Group actions

A left action of a group G on a set S is a homomorphism of the group G to the subgroup of invertible mappings in the monoid S^S of all mappings S → S. Such a homomorphism can be viewed as a mapping φ : G × S → S which satisfies φ(a·b, x) = φ(a, φ(b, x)), hence the name "left action".

Often, the notation a·x is used to refer to the result of an element a ∈ G applied to a point x ∈ S (although this is a different dot than the operation inside the group). Then, the defining property can be expressed as (a·b)·x = a·(b·x). The image of a point x ∈ S under the action of the entire group G is called the orbit S_x of x, i.e., S_x = {y = φ(a, x); a ∈ G}.
For each point x ∈ S, define the isotropy subgroup G_x ⊂ G of the action φ (also called the stabilizer subgroup): G_x = {a ∈ G; φ(a, x) = x}. If for every two points x, y ∈ S there is an a ∈ G such that φ(a, x) = y, then the action φ is said to be transitive. Choosing any two points x, y ∈ S and an element g ∈ G which maps x to y = g·x, the set {ghg⁻¹; h ∈ G_x} is clearly the isotropy subgroup G_y. In addition, the mapping h ↦ ghg⁻¹ is a group homomorphism G_x → G_y. In the case of transitive actions, the entire space forms a single orbit and all isotropy subgroups have the same cardinality.

As an example of a transitive action of a finite group, consider the apparent action of the symmetric group of a fixed set X on X itself. The natural action of all invertible linear transformations on the non-zero elements of a vector space V is also transitive. However, if the entire space V is considered, then the zero vector forms its own orbit. The mentioned example of the action of the additive group of real numbers acting as rotations around a fixed center O in the plane is not transitive. The orbit of each point M is the circle which is centered at O and goes through M.

A typical example of a transitive action of a group G is the natural action on the set G/H of left cosets for any subgroup H. It is defined by g·(aH) = (ga)H. This is the form of every transitive group action. For any transitive action G × S → S and a fixed point x ∈ S, S can be identified with the set G/G_x of left cosets by gG_x ↦ g·x. Clearly, this mapping is surjective, and the images g·x = h·x coincide if and only if h⁻¹g ∈ G_x, which is equivalent to gG_x = hG_x. Finally, note that this identification transforms the original action of G on S just to the mentioned action of G on G/G_x.

12.3.15. Theorem. Let an action of a finite group G on a finite set S be given. Then:
a 2-element subgroup, the non-trivial element must be self-inverse, which is also sufficient for the subset to be a subgroup. We thus get the subgroups {A, B}, {A, C}, {A, F}, which are not normal, as can be easily verified. (The identity is A.) Since B, C, F have order 2, they cannot lie in a 3-element subgroup. Thus, the only remaining possibility is P = {A, D, E}, which is indeed a subgroup. Moreover, checking the conjugations BDB = E, CDC = E, FDF = E (whence it follows that BEB = D, CEC = D, FEF = D), we find that this subgroup is normal. □

12.F.52. Find all subgroups of the group (Z10, +).

Solution. The subgroups are isomorphic to (Zd, +), where d | 10, i.e., {0} ≅ Z1, {0, 5} ≅ Z2, {0, 2, 4, 6, 8} ≅ Z5, and Z10 itself. □

12.F.53. Find the orders of the elements 2, 4, 5 in (Z*35, ·) and in (Z35, +).

Solution. By definition, the order of x in the group (Z*35, ·) is the least positive integer k such that x^k ≡ 1 (mod 35). By Euler's theorem, the order of x = 2 and of x = 4 satisfies k ≤ φ(35) = 24. Computing the corresponding modular powers, we find that the order of x = 2 is 12. Hence it immediately follows that the order of x = 4 is 6. The number x = 5 does not lie in the group (Z*35, ·). Specifically, the successive powers are (modulo 35):
2, 4, 8, 16, 32, 29, 23, 11, 22, 9, 18, 1 for x = 2, and
4, 16, 29, 11, 9, 1 for x = 4.
In the group (Z35, +), the order of x is the least positive integer k such that k·x ≡ 0 (mod 35). This can be calculated simply as k = 35/gcd(x, 35). Therefore, the order of 2 and of 4 is 35, while the order of 5 is 7. □

12.F.54. Find all finite subgroups of the group (R*, ·).

Solution. If a given subgroup of the group (R*, ·) contains an element a with |a| ≠ 1, then the elements a, a², a³, … form an infinite geometric progression of pairwise distinct elements, all of which must lie in the considered subgroup, so the subgroup is infinite. Thus, a finite subgroup may contain only the numbers 1 and −1, which means that there are two finite subgroups: {1}, {−1, 1}.
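The orders computed in 12.F.53 can be double-checked with a short script (a sketch, not part of the text):

```python
from math import gcd

def mult_order(x, n):
    # least k >= 1 with x^k ≡ 1 (mod n); requires gcd(x, n) = 1
    k, y = 1, x % n
    while y != 1:
        y = y * x % n
        k += 1
    return k

assert mult_order(2, 35) == 12
assert mult_order(4, 35) == 6
assert gcd(5, 35) != 1  # 5 is not a unit modulo 35, so it has no multiplicative order

# additive orders in (Z_35, +): n / gcd(x, n)
assert [35 // gcd(x, 35) for x in (2, 4, 5)] == [35, 35, 7]
```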
□

(1) For each point x ∈ S, |G| = |G_x| · |S_x|.
(2) (Burnside's lemma) If N denotes the number of orbits, then
N = (1/|G|) Σ_{g∈G} |S^g|,
where S^g = {x ∈ S; g·x = x} denotes the set of fixed points of the action corresponding to an element g.

Proof. Consider any point x ∈ S and its isotropy subgroup G_x ⊂ G. The same argument as the one at the end of the previous paragraph for transitive group actions can be applied to each action of the group G. This gives the mapping G/G_x → S_x, g·G_x ↦ g·x. If g·x = h·x, then clearly g⁻¹h ∈ G_x, so this mapping is injective. Clearly, it is also surjective, which means that the cardinalities of the finite sets must satisfy |G/G_x| = |S_x|. The first proposition follows, because |G| = |G/G_x| · |G_x|.

The second proposition is proved by counting the cardinality of the set of fixed points of individual group elements in two different ways: F = {(x, g) ∈ S × G; g·x = x} ⊂ S × G. Since these are finite sets, the elements of the Cartesian product S × G can be considered as entries of a matrix (columns are indexed by elements of S, rows are indexed by elements of G). Summing this matrix up, either by rows or by columns, yields |F| = Σ_{g∈G} |S^g| = Σ_{x∈S} |G_x|.

12.F.55. For each of the following formulas, decide whether it correctly defines a mapping φ. If so, decide whether it is a homomorphism, and if so, find its kernel:
i) Z4 × Z3 → Z12, φ(([a]4, [b]3)) = [a − b]12,
ii) Z4 × Z3 → Z12, φ(([a]4, [b]3)) = [6a + 4b]12,
iii) Z4 × Z3 → Z12, φ(([a]4, [b]3)) = [0]12.

Solution.
i) Not a mapping. For instance, taking the two representatives ([6]4, [1]3) = ([2]4, [1]3) of the same element of Z4 × Z3, we get φ(([6]4, [1]3)) = [5]12 while φ(([2]4, [1]3)) = [1]12, so this is not a correct definition of a mapping.
ii) A homomorphism, neither injective nor surjective. Its kernel Ker(φ) is the set {([2]4, [0]3), ([0]4, [0]3)}.
iii) A homomorphism, neither injective nor surjective. Its kernel is the entire group Z4 × Z3. □

12.F.56. For each of the following formulas, decide whether it correctly defines a mapping φ. If so, decide whether it is a homomorphism, and if so, find its kernel.
Moreover, decide whether it is surjective and injective:
i) φ : Z4 → C*, φ([a]4) = i^a,
ii) φ : Z5 → C*, φ([a]5) = i^a,
iii) φ : Z4 → R*, φ([a]4) = (−1)^a,
iv) φ : Z → C*, φ(a) = i^a.

Solution. i) We have φ([a]4 + [b]4) = i^(a+b) = φ([a])·φ([b]) and i^4 = 1, so the value does not depend on the choice of the representative: if [c]4 = [d]4, then i^c = i^d. Thus φ is a correctly defined homomorphism; it is injective, but not surjective. ii) Not a mapping: [0]5 = [5]5, but i^0 = 1 ≠ i = i^5. iii) A correctly defined homomorphism, neither injective (its kernel is {[0]4, [2]4}) nor surjective. iv) A homomorphism of Z onto the subgroup {1, i, −1, −i} ⊂ C*, with kernel Ker(φ) = 4Z = {4k | k ∈ Z}. □

12.F.57. For each of the following formulas, decide whether it correctly defines a mapping φ. If so, decide whether it is a homomorphism. Moreover, decide whether it is surjective and injective:

Notice that coding is quite different from encrypting. If no one but the addressee is meant to be able to read the message, then it should be encrypted. This topic is discussed briefly at the end of the previous chapter.

12.4.1. Codes. Data transfer is usually prone to errors. Since any information may be encoded as a sequence of bits (zeros or ones), we work with Z2, although the theory may be developed even for a general finite field. Furthermore, the length of the message to be transferred is assumed to be known in advance. Thus, one transfers k-bit words, where k ∈ N is fixed. It is desired to detect potential errors and, if possible, to recover the original data. For this reason, further (n − k) bits are added to the k-bit word, where n is also fixed (and of course n > k). These are called (n, k)-codes. There are 2^k binary words of length k, and each should be mapped to one of the 2^n possible codewords. For an (n, k)-code, there remain 2^n − 2^k = 2^k(2^(n−k) − 1) words which are not codewords (if such a word is received, then an error has occurred). Thus, even for a large value of k, only a few added bits provide much redundant information.

The simplest example is the parity check code. Given a message of length k, the codeword is created by adding one bit whose value is determined so that the total number of ones is even. This is an example of a (k+1, k)-code. If an odd number of errors occurs during the transfer, then it can be detected with this simple code.
Every two codewords differ in at least two bits, but an error word differs from at least two codewords in only one bit. Therefore, this code is unable to recover the original message, even under the assumption that only one bit was changed. The following diagram illustrates all 2-bit words with the parity bit added; the codewords are marked with a bold dot.

[Diagram: the cube of all 3-bit words 000, 001, …, 111; the codewords 000, 011, 101, 110 are marked with bold dots.]

Moreover, the parity check code is unable to detect the error of interchanging a pair of adjacent bits, which often happens.

12.4.2. Word distance. In the diagram of the parity check (3,2)-code, each error word is at the same distance from three codewords, namely those which differ in exactly one bit. The other words are farther. Formally, this observation can be described by the following definition of distance:

i) φ : Q* → Q*, φ(…) = …,
ii) φ : Q* → Q*, φ(…) = …,
iii) φ : Q* → Q*, φ(…) = …. ○

12.F.58. For each of the following formulas, decide whether it correctly defines a mapping φ. If so, decide whether it is a homomorphism. Moreover, decide whether it is surjective and injective:
i) φ : C → R, φ(a + bi) = a + b,
ii) φ : C → R, φ(a + bi) = a,
iii) φ : C* → R*, φ(a + bi) = a² + b²,
iv) φ : C* → R*, φ(c) = 2|c|,
v) φ : C* → R*, φ(c) = |c|³,
vi) φ : C* → R*, φ(c) = 1/|c|. ○

12.F.59. For each of the following formulas, decide whether it correctly defines a mapping φ. If so, decide whether it is a homomorphism. Moreover, decide whether it is surjective and injective:
i) φ : GL2(R) → R*, φ(A) = |A|,
ii) φ : GL2(R) → R*, φ((a, b; c, d)) = a² + b²,
iii) φ : GL2(R) → R*, φ(…) = …. ○

12.F.60. For each of the following formulas, decide whether it correctly defines a mapping φ. If so, decide whether it is a homomorphism:
i) φ : Z3 → A4, φ([a]3) = (1,2,4) ∘ (1,3,2)^a ∘ (1,4,2),
ii) φ : Z3 → A4, φ([a]3) = (1,2) ∘ (1,3,2)^a. ○

12.F.61. For each of the following formulas, decide whether it correctly defines a mapping φ. If so, decide whether it is a homomorphism. Moreover, decide whether it is surjective and injective:

Word distance

The Hamming distance of a pair of words (of equal length) is the number of bits in which they differ.
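The Hamming distance is immediate to compute. The sketch below (not from the text) also confirms that the parity check (3,2)-code has minimum distance 2:

```python
from itertools import product

def hamming(u, v):
    # number of positions in which the two words differ
    return sum(a != b for a, b in zip(u, v))

# parity-check (3,2)-code: the 3-bit words of even weight
codewords = [w for w in product((0, 1), repeat=3) if sum(w) % 2 == 0]
dmin = min(hamming(u, v) for u in codewords for v in codewords if u != v)

assert sorted(codewords) == [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
assert dmin == 2   # detects single errors but cannot correct them
```

A minimum distance of 2 matches the theorem below: the code detects r = 1 error (r + 1 = 2) but falls short of the 2r + 1 = 3 needed to correct it.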
Consider words x, y, z such that x and y differ in r bits, and y and z differ in s bits. Then x and z differ in at most r + s bits, which verifies the triangle inequality for this distance. If the code is to detect errors in r bits, then the minimum distance between each pair of codewords must be at least r + 1. If the code is to recover errors in r bits, then there must exist only one codeword whose distance from the received word is at most r. Thus, the following propositions are verified:

Theorem. (1) A code reliably detects at most r errors if and only if the minimum Hamming distance of the codewords is r + 1.
(2) A code reliably detects and recovers at most r errors if and only if the minimum Hamming distance of the codewords is 2r + 1.

12.4.3. Construction of polynomial codes. For practical applications, the codewords should be constructed efficiently, so that they can be easily recognized among all the words. The parity check code is one example; another trivial possibility is to simply repeat the bits. For instance, the (3,1)-code can be considered which triplicates each bit. A systematic way of code construction is to use division of polynomials. A message b0 b1 … b(k−1) is understood as the polynomial m(x) = b0 + b1x + ⋯ + b(k−1)x^(k−1) over the field Z2. The encoded message should be another polynomial v(x) of degree at most n − 1.

Polynomial codes

Let p(x) = a0 + ⋯ + a(n−k)x^(n−k) ∈ Z2[x] be a polynomial with coefficients a0 = 1, a(n−k) = 1. The polynomial code generated by the polynomial p(x) is the (n, k)-code whose codewords are the polynomials of degree less than n divisible by p(x). A message m(x) is encoded as v(x) = r(x) + x^(n−k)m(x), where r(x) is the remainder in the division of the polynomial x^(n−k)m(x) by p(x).

Check the claimed properties first. By the very definition of the codeword v(x) of a message m(x): v(x) = r(x) + x^(n−k)m(x) = r(x) + q(x)p(x) + r(x) = q(x)p(x), since the sum of two identical polynomials is always zero over Z2.
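The encoding rule v(x) = r(x) + x^(n−k)·m(x) can be sketched over Z2 as follows. This is an illustration with hypothetical helper names (gf2_rem, encode), not the book's implementation; polynomials are coefficient lists indexed by degree.

```python
def gf2_rem(num, den):
    # remainder of num(x) modulo den(x) over Z2; lists of bits, index = degree
    num = num[:]
    d = len(den) - 1
    for i in range(len(num) - 1, d - 1, -1):
        if num[i]:
            for j in range(len(den)):
                num[i - d + j] ^= den[j]
    return num[:d]

def encode(msg, p):
    # v(x) = r(x) + x^(n-k) m(x), with r the remainder of x^(n-k) m(x) mod p
    nk = len(p) - 1                  # n - k = deg p
    shifted = [0] * nk + msg         # the polynomial x^(n-k) m(x)
    return gf2_rem(shifted, p) + msg

# p(x) = 1 + x generates the parity-check code
cw = encode([1, 0, 1, 1], [1, 1])
assert cw == [1, 1, 0, 1, 1]       # parity bit 1: the message has odd weight
assert gf2_rem(cw, [1, 1]) == [0]  # every codeword is divisible by p(x)
```

The last assertion is exactly the divisibility property being verified in the text: the constructed codeword leaves remainder zero on division by p(x).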
Therefore, all codewords are divisible by p(x). On the other hand, if v(x) is divisible by p(x), the above calculation can be read from right to left (setting r(x) = x^(n−k)m(x) − q(x)p(x)), and so it is a codeword created by the above procedure.

i) φ : Z → Z, φ(a) = 2a,
ii) φ : Z → Z, φ(a) = 3|a|,
iii) φ : Z → Z, φ(a) = a + 1,
iv) φ : Z → Z, φ(a) = 1. ○

12.F.62. For each of the following formulas, decide whether it correctly defines a mapping φ. If so, decide whether it is a homomorphism. Moreover, decide whether it is surjective and injective:

i) φ : Z × Z × Z → Q*, φ((a, b, c)) = 2^a 3^b 12^c,
ii) φ : Z5 × Z5 → Z5, φ((a, b)) = b·a,
iii) φ : Z2 × Z → Z, φ(([a]2, b)) = b. ○

12.F.63. Prove that there exists no isomorphism of the multiplicative group of non-zero complex numbers onto the multiplicative group of non-zero real numbers.

Solution. Every homomorphism must map the identity of the domain to the identity of the codomain (see 12.3.5). Thus, 1 must be mapped to itself. And what about −1? We know that f(−1)² = f((−1)²) = f(1) = 1. Therefore, the image of −1 is a square root of 1. Since we are interested in bijective homomorphisms only, we must have f(−1) = −1. However, then f(i)² = f(i²) = f(−1) = −1, so f(i) would be a square root of −1 in R; no such real number exists. Therefore, no bijective homomorphism can exist. □

Remark. The mapping which assigns to each non-zero complex number its absolute value is a surjective homomorphism of C* onto R+.

G. Burnside's lemma

12.G.1. How many necklaces can be created from 3 black and 6 white beads? Beads of one color are indistinguishable, and two necklaces are considered the same if they can be transformed into each other by rotation and/or reflection.

Solution. Let us regard a necklace as a coloring of the vertices of a regular 9-gon, and let S denote the set of all such colorings. Since each coloring is determined by the positions of the 3 black beads, S has (9 choose 3) = 84 elements. We know that the group of symmetries is D9, which contains 9 rotations (including the identity) and 9 reflections. Two colorings are the same if they lie in the same orbit of

From the definition, the codeword is created by adding the n − k bits given by r(x) at the beginning of the word (the message is simply shifted to the right by that amount). It follows that the original message is contained in the polynomial v(x), and the decoding is easy.

Consider the two simple examples already mentioned. First, note that p(x) = 1 + x divides v(x) if and only if v(1) = 0.
This occurs if and only if v(x) has an even number of non-zero coefficients. So the polynomial p(x) = 1 + x generates the parity check (n, n-1)-code for any n ≥ 2. Similarly, it is easily verified that the polynomial p(x) = 1 + x + ... + x^{n-1} generates the (n,1)-code of n-fold bit repetition. Dividing the polynomial b_0 x^{n-1} by p(x) gives the remainder b_0 (1 + x + ... + x^{n-2}), so the corresponding codeword is b_0 p(x).

12.4.4. Error detection. Let e(x) denote the error vector, that is, the difference between the original message v ∈ (Z_2)^n and the received data u: u(x) = v(x) + e(x). The error is detected if and only if the generator of the code (i.e. the polynomial p(x)) does not divide e(x). Therefore, polynomials over Z_2[x] which do not happen to be divisors too often are of interest.

Definition. An irreducible polynomial p(x) ∈ Z_2[x] of degree m is said to be primitive if and only if p(x) divides 1 + x^k for k = 2^m - 1, but not for any smaller value of k.

Theorem. Let p(x) be a primitive polynomial of degree m and n ≤ 2^m - 1. Then the polynomial (n, n-m)-code generated by p(x) detects all simple and double errors.

Proof. If exactly one error occurs, then e(x) = x^i for some i, 0 ≤ i < n. Since p(x) is irreducible, it cannot have a root in Z_2. In particular, it cannot divide x^i, since the factorization of x^i is unique. It follows that every simple error is detected. If exactly two errors occur, then e(x) = x^i + x^j = x^i (1 + x^{j-i}) for some 0 ≤ i < j < n. p(x) does not divide any x^i, and since it is primitive, it does not divide 1 + x^{j-i} either, provided j - i < 2^m - 1. At the same time, p(x) is irreducible, which means that it does not divide the product e(x) = x^i (1 + x^{j-i}), which completes the proof. □

12.4.5. Corollary. Let q(x) be a primitive polynomial of degree m and n ≤ 2^m - 1. Then the polynomial (n, n-m-1)-code generated by the polynomial p(x) = q(x)(1 + x) detects all double errors as well as all errors with an odd number of bit flips.

Proof. The codewords generated by the chosen polynomial p(x) are divisible by both x + 1 and the primitive polynomial q(x). As verified, the factor x + 1 is responsible for the parity checking, i.e. all codewords have an even number of non-zero bits. This detects any odd number of errors. By the above theorem, the second factor is able to detect all double errors. □

the action of D_9 on the set S. Thus, we are interested in the number of orbits (let us denote it N). In order to find N, it suffices to compute the sizes of S_g for all elements g of D_9:

The identity is the only element of order 1; we have |S_id| = 84, so the contribution to the sum is 84. There are 9 reflections g, each of order 2. Clearly, |S_g| = 4, so the total contribution is 4 · 9 = 36. There are 2 rotations g by 2π/3 or 4π/3, both of order 3, and |S_g| = 3. Their contribution is 6. Finally, there are 6 rotations of order 9, and no coloring is kept unchanged by them, so they do not contribute to the sum. Altogether, we get by the formula of Burnside's lemma that

N = (1/|D_9|) Σ_{g ∈ D_9} |S_g| = (84 + 36 + 6)/18 = 7.

Draw the seven necklaces! □

The following table illustrates the power of the above theorems for several primitive polynomials of low degrees. For instance, the last row says that by adding only 11 redundant bits to a message of length 1012, and employing the polynomial (x+1)p(x), all single, double, triple, and odd-numbered errors in the transfer can be detected. These are already quite large numbers, with over 300 decimal digits.

12.G.2. Find the number of colorings of a 3-by-3 grid with three colors. Two colorings are considered the same if they can be transformed to each other by rotation and/or reflection.

Solution. The group of symmetries of the table is the same as for a square, i.e., it is the dihedral group D_4. Without any identification, there are clearly 3^9 colorings of the table. Now, the group G = D_4 acts on these colorings.
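Returning to the detection theorems above: they can be spot-checked by brute force. The sketch below (our own bitmask representation, not the book's) verifies that the primitive polynomial 1 + x + x^3 of degree 3 detects every single and double error at the maximal length n = 2^3 - 1 = 7:

```python
# Brute-force check that p(x) = 1 + x + x^3 (primitive of degree m = 3)
# divides no error polynomial e(x) with one or two non-zero bits, n = 7.

def poly_mod(a, p):
    """Remainder of a(x) mod p(x) over Z_2, bitmask representation."""
    dp = p.bit_length() - 1
    while a and a.bit_length() - 1 >= dp:
        a ^= p << (a.bit_length() - 1 - dp)
    return a

p = 0b1011                  # 1 + x + x^3
n = 7                       # n = 2^3 - 1
singles = [1 << i for i in range(n)]
doubles = [(1 << i) ^ (1 << j) for i in range(n) for j in range(i + 1, n)]
# a non-zero remainder means the error is detected:
assert all(poly_mod(e, p) != 0 for e in singles + doubles)
```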
For each symmetry g ∈ G, we find the number of colorings that g keeps unchanged:

• g = id: |S_g| = 3^9.
• g is a rotation by 90° or 270° (= -90°): In this rotation, every corner tile is sent to an adjacent corner tile. This means that if the coloring is to be unchanged, all the corner tiles must be of the same color. Similarly, all the edge tiles must be of the same color. Then, the center tile may be of any color. Altogether, there are 3^3 colorings which are not changed by the considered rotations.
• g is the rotation by 180°: There are four pairs of tiles that are sent to each other by this symmetry, which means that the two tiles of each pair must be of the same color. Then, the center tile may be of any color. Altogether, we have |S_g| = 3^5.
• g is one of the four reflections: There are three pairs of tiles that are sent to each other by the reflection, so again the tiles within each pair must be of one color. The three tiles that are fixed by the reflection may each be of an arbitrary color. Altogether, we get |S_g| = 3^6.

primitive polynomial p(x)       redundant bits    codeword length
1 + x                           1                 1
1 + x + x^2                     2                 3
1 + x + x^3                     3                 7
1 + x + x^4                     4                 15
1 + x^2 + x^5                   5                 31
1 + x + x^6                     6                 63
1 + x^3 + x^7                   7                 127
1 + x^2 + x^3 + x^4 + x^8       8                 255
1 + x^4 + x^9                   9                 511
1 + x^3 + x^10                  10                1023

Note that quite strong results on the divisibility and decomposition of polynomials, derived in the second part of this chapter, are used here. But tools which would assist in constructing primitive polynomials are not mentioned. Such tools come from the theory of finite fields. The name "primitive" reflects the connection to the primitive elements in the Galois fields GF(2^m). This theory also provides a convenient way of applying the Euclidean division, that is, of verifying whether or not the received word is a codeword, using shift registers. This is a simple circuit with as many elements as is the degree of the polynomial.(12)

12.4.6. Linear codes. Polynomial codes can also be described using elementary matrix calculus.
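The fixed-coloring counts just listed can be cross-checked by enumerating the orbits directly. In this brute-force sketch (our own helper names), tile (r, c) of the grid is stored at index 3r + c, and the orbit count agrees with Burnside's value (1/8)(3^9 + 2·3^3 + 3^5 + 4·3^6) = 2862:

```python
# Brute-force orbit count for 3-colorings of the 3x3 grid under D_4.
from itertools import product

def rot(g):
    """Rotate the grid by 90 degrees: new (r,c) gets old (2-c, r)."""
    return tuple(g[3 * (2 - c) + r] for r in range(3) for c in range(3))

def refl(g):
    """Reflect the grid horizontally: new (r,c) gets old (r, 2-c)."""
    return tuple(g[3 * r + (2 - c)] for r in range(3) for c in range(3))

def canonical(g):
    """Lexicographically least element of the D_4-orbit of g."""
    images = []
    for _ in range(4):
        g = rot(g)                 # after 4 steps, g is back to itself
        images += [g, refl(g)]     # the 4 rotations and 4 reflections
    return min(images)

orbits = {canonical(g) for g in product(range(3), repeat=9)}
assert len(orbits) == 2862
```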
Recall that when working over the field Z_2, caution is required when applying the results of elementary linear algebra, since the property that v = -v implies v = 0 is often used there. This is not the case now. However, the basic definitions of vector spaces, the existence of bases, and the descriptions of linear mappings by matrices are still valid. It is useful to recall the general theory and its applicability. Start with a more general definition of codes, which only requires linear dependency of the codeword on the original message:

(12) More about the beautiful theory and its connection with codes can be found in the book Gilbert, W., Nicholson, K., Modern Algebra and its Applications, John Wiley & Sons, 2nd edition, 2003, 330+xvii pp., ISBN 0-471-41451-4.

By Burnside's lemma, the wanted number of colorings is equal to

(1/8)(3^9 + 2 · 3^3 + 3^5 + 4 · 3^6) = 2862. □

12.G.3. a) Find all rotational symmetries of a regular octahedron. b) Find the number of colorings of its sides. Two colorings are considered the same if they can be transformed to each other by rotation.

Solution. a) Place the octahedron into the Cartesian coordinate system so that pairs of opposite vertices are on the axes and the center of the octahedron lies in the origin. Then every rotational symmetry is given by which of the six vertices is on the positive x-semiaxis and which of the four adjacent vertices is on the positive y-semiaxis. Thus, the group has 24 elements. These are (besides the identity) rotations by ±90° and 180° around axes going through opposite vertices, rotations by 180° around axes going through the centers of opposite edges, and finally rotations by ±120° around axes going through the centers of opposite sides.

b) Without any identifications, there are 3^8 colorings. For each rotational symmetry g, we compute the number of colorings that are kept unchanged by it:

- g is a rotation by ±90° around an axis going through opposite vertices. Then g fixes 3^2 colorings, and there are 6 such rotations.
- g is a rotation by 180° around an axis going through opposite vertices or the centers of opposite edges. Then g fixes 3^4 colorings. There are 3 + 6 = 9 of these.
- g is a rotation by ±120°. Then g also fixes 3^4 colorings, and there are 8 such rotations.

Together with 3^8 for the identity, we get that the number of colorings is

(1/24)(3^8 + 6 · 3^2 + 17 · 3^4) = 333. □

12.G.4. How many necklaces can be created from 9 white, 6 red, and 3 black beads? Beads of one color are indistinguishable, and two necklaces are considered the same if they can be transformed to each other by rotation and/or reflection.

Linear codes

Any injective linear mapping g : (Z_2)^k → (Z_2)^n is a linear code. The n-by-k matrix G that corresponds to this mapping (in the canonical bases) is called the generating matrix of the code. For each message v, the corresponding codeword is given by u = G · v.

Theorem. Every polynomial (n,k)-code is a linear code.

Proof. Use elementary properties of Euclidean division. Apply the assignment of the polynomial v(x) = r(x) + x^{n-k} m(x), determined by the original message m(x), to the sum of two messages m(x) = m_1(x) + m_2(x). The remainder in the division of x^{n-k}(m_1(x) + m_2(x)) is, by uniqueness, given as the sum r_1(x) + r_2(x) of the remainders for the individual messages. It follows that v(x) = r_1(x) + r_2(x) + x^{n-k}(m_1(x) + m_2(x)), which is the desired additivity. Since the only non-zero scalar in Z_2 is 1, the linearity of the mapping of the message m(x) to the longer codeword v(x) is proved. Moreover, this mapping is clearly injective, since the original message m(x) is simply copied beyond the redundant bits. □

For instance, consider the (6,3)-code generated by the polynomial p(x) = 1 + x + x^3, for encoding 3-bit words. Evaluate it on the individual basis vectors m_i(x) = x^{i-1}, i = 1, 2, 3, to get

v_0 = (1 + x) + x^3,  v_1 = (x + x^2) + x^4,  v_2 = (1 + x + x^2) + x^5.
It follows that the generating matrix of this (6,3)-code is

G = ( 1 0 1 )
    ( 1 1 1 )
    ( 0 1 1 )
    ( 1 0 0 )
    ( 0 1 0 )
    ( 0 0 1 )

Polynomial codes always copy the original message beyond the redundant bits. So the generating matrix can be split into two blocks P and Q, consisting respectively of the first n - k and the last k rows. Then Q equals the identity matrix E_k.

12.4.7. Theorem. Let g : (Z_2)^k → (Z_2)^n be a linear code with generating matrix G, written in blocks as

G = ( P   )
    ( E_k )

Then the mapping h : (Z_2)^n → (Z_2)^{n-k} with the matrix H = (E_{n-k}  P) has the following properties:
(1) Ker h = Im g,

CHAPTER 12. ALGEBRAIC STRUCTURES

Solution. The group of symmetries of the necklace is the dihedral group D_18, which has 36 elements. It acts on the set of necklaces; numbering the places 1 through 18, there are 18!/(9! 6! 3!) = 4084080 necklaces (without any identification). The only symmetries that fix a non-zero number of necklaces are the rotations by 120° and 240° (each fixing 6!/(3! 2! 1!) = 60 necklaces), the 9 reflections whose axis passes through two opposite beads (each fixing 2 · 8!/(4! 3! 1!) = 560 necklaces), and of course the identity. By Burnside's lemma, the wanted number of necklaces is equal to

(1/36)(4084080 + 2 · 60 + 9 · 560) = 113590. □

12.G.5. How many necklaces can be created from 6 white, 6 red, and 6 black beads? Beads of one color are indistinguishable, and two necklaces are considered the same if they can be transformed to each other by rotation and/or reflection. ○

12.G.6. How many necklaces can be created from 8 white, 8 red, and 8 black beads? Beads of one color are indistinguishable, and two necklaces are considered the same if they can be transformed to each other by rotation and/or reflection. ○

12.G.7. How many necklaces can be created from 3 white and 6 black beads? Beads of one color are indistinguishable, and two necklaces are considered the same if they can be transformed to each other by rotation and/or reflection. ○

H. Codes

12.H.1. Consider the (5,3)-code over Z_2 generated by the polynomial x^2 + x + 1. Find all codewords as well as the generating matrix and the check matrix.
(2) a received word u is a codeword if and only if H · u = 0.

Proof. The composition h ∘ g : (Z_2)^k → (Z_2)^{n-k} is given by the product of matrices (computing over Z_2):

H · G = E_{n-k} · P + P · E_k = P + P = 0.

Hence it is proved that Im g ⊆ Ker h. Since the first n - k columns of H are the basis vectors of (Z_2)^{n-k}, the image Im h has the maximum dimension n - k, which means that this image contains 2^{n-k} vectors. Vector spaces over Z_2 are finite commutative groups, so the formula relating the orders of subgroups and quotient groups from subsection 12.3.10 can be used, thus obtaining |Ker h| · |Im h| = |(Z_2)^n| = 2^n. Therefore, the number of vectors in Ker h is equal to 2^n · 2^{k-n} = 2^k. In order to complete the proof of the first proposition, it suffices to note that the image Im g also has 2^k elements. The second proposition is a trivial corollary of the first one. □

The matrix H from the theorem is called the parity check matrix of the corresponding linear (n,k)-code. For instance, the matrix H = (1 1 1) is the parity check matrix for the parity check (3,2)-code, encoding 2-bit words. It is easily obtained from the matrix

G = ( 1 1 )
    ( 1 0 )
    ( 0 1 )

that generates this code. For the (6,3)-code mentioned above, the parity check matrix is

H = ( 1 0 0 1 0 1 )
    ( 0 1 0 1 1 1 )
    ( 0 0 1 0 1 1 )

12.4.8. Error-correcting codes. As seen, transferring a message u gives the result v = u + e. Over Z_2, this is equivalent to e = u + v. It follows that if the error e to be detected is fixed, all the received words determined by the correct codewords u fill one of the cosets in the quotient space (Z_2)^n / V, where V ⊆ (Z_2)^n is the vector subspace of the codewords. The mapping h : (Z_2)^n → (Z_2)^{n-k} corresponding to the parity check matrix has V as its kernel. Therefore, it induces an injective linear mapping h̄ : (Z_2)^n / V → (Z_2)^{n-k}. Clearly, the value h̄(v + V) on the coset generated by v is determined uniquely by the value H · v.

Syndromes

The expression H · v, where H is the parity check matrix for the considered linear code, is called the syndrome of v.
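For the (6,3)-code, the block structure and the identity H · G = P + P = 0 can be verified directly. This is a small sketch with our own helper names; the matrices are exactly those computed above:

```python
# G = (P; E_3) and H = (E_3 | P) for the (6,3)-code with p(x) = 1 + x + x^3.

P = [[1, 0, 1],
     [1, 1, 1],
     [0, 1, 1]]          # columns are the remainders of x^3, x^4, x^5 mod p
E3 = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
G = P + E3                                  # 6x3: blocks stacked vertically
H = [E3[i] + P[i] for i in range(3)]        # 3x6: blocks side by side

def mat_vec(M, v):
    """Matrix-vector product over Z_2."""
    return [sum(m * x for m, x in zip(row, v)) % 2 for row in M]

# every codeword has zero syndrome, i.e. H annihilates the image of G:
basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
assert all(mat_vec(H, mat_vec(G, m)) == [0, 0, 0] for m in basis)

codeword = mat_vec(G, [1, 0, 0])            # encode the message 100
assert codeword == [1, 1, 0, 1, 0, 0]       # v_0 = 1 + x + x^3
```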
Solution. p(x) = x^2 + x + 1. The codewords are precisely the multiples of the generating polynomial:

0·p, 1·p, x·p, (x+1)·p, x^2·p, (x^2+1)·p, (x^2+x)·p, (x^2+x+1)·p,

that is,

0, x^2 + x + 1, x^3 + x^2 + x, x^3 + 1, x^4 + x^3 + x^2, x^4 + x^3 + x + 1, x^4 + x, x^4 + x^2 + 1,

or, written as binary words,

00000, 11100, 01110, 10010, 00111, 11011, 01001, 10101.

The basis vectors multiplied by x^{5-3} = x^2 yield, mod p(x):

x^2 ≡ x + 1,  x^3 = x · x^2 ≡ x(x + 1) = x^2 + x ≡ 1,  x^4 = x^2 · x^2 ≡ (x + 1)^2 = x^2 + 1 ≡ x.

This means that the basis vectors are encoded as follows:

100 ↦ 11100,  010 ↦ 10010,  001 ↦ 01001.

Thus, the generating matrix is

G = ( 1 1 0 )
    ( 1 0 1 )
    ( 1 0 0 )
    ( 0 1 0 )
    ( 0 0 1 )

and the check matrix is

H = ( 1 0 1 1 0 )
    ( 0 1 1 0 1 )

□

The following claim is a direct corollary of the construction and the above observations:

Theorem. Two words are in the same class u + V if and only if they have the same syndromes.

It follows that self-correcting codes can be constructed by choosing, for every syndrome, the element of the corresponding coset which is most likely to be the sent codeword. Naturally, when choosing the code, it is desirable to maximize the probability that it can correct single errors (and possibly even more errors).

Try it on the example of the (6,3)-code for which the matrices G and H are already computed. Build the table of all syndromes and the corresponding words. The syndrome 000 is possessed exactly by the codewords. All words with a given different syndrome are obtained by choosing one of them and adding all the proper codewords. The following two tables display the syndromes in the first rows; the second rows then display the vector which has the least number of ones among the vectors of the corresponding coset. In almost all cases, there is just one 1 in it; only in the last column, there are two ones, and the element is chosen where the ones are adjacent (because, for instance, multiple errors are more likely to be adjacent).

12.H.2. Find the generating matrix and the check matrix of the (7,4)-code over Z_2 generated by the polynomial x^3 + x + 1. ○

12.H.3. A 7-bit message a_0 a_1 ... a_6, considered as the polynomial a_0 + a_1 x + ... + a_6 x^6, is encoded using the polynomial code generated by x^4 + x + 1.
i) Encode the message 1100011.
ii) You have received the word 10111010001. What was the message if you assume that at most one bit was flipped?
iii) What was the message in ii) if you assume that exactly two bits were flipped?

000     100     010     001
000000  100000  010000  001000
110100  010100  100100  111100
011010  111010  001010  010010
111001  011001  101001  110001
101110  001110  111110  100110
001101  101101  011101  000101
100011  000011  110011  101011
010111  110111  000111  011111

110     011     111     101
000100  000010  000001  000110
110000  110110  110101  110010
011110  011000  011011  011100
111101  111011  111000  111111
101010  101100  101111  101000
001001  001111  001100  001011
100111  100001  100010  100101
010011  010101  010110  010001

All the columns in the tables are affine subspaces whose modelling vector space is always the first column of the first table. This is because the code is linear, so that the set of all codewords forms a vector space, and the individual cosets of the quotient space are consequently affine subspaces. In particular, the difference of each pair of words in the same column is a codeword. The words in the first rows below the syndromes are the leading representatives of the cosets (affine subspaces) that correspond to the given syndromes. These are the words with the least number of ones in their columns. They correspond to the least number of bit flips which must be made to any word in the given column in order to get a codeword.

Solution. i) Modulo x^4 + x + 1, we compute x^4 ≡ x + 1, x^5 ≡ x^2 + x, x^9 ≡ x^3 + x, x^10 ≡ x^2 + x + 1.

For instance, if the word 111101 is received, compute that its syndrome is 110. The leading representative of the coset of this syndrome is 000100.
Subtract it from the received word to obtain the codeword 111001. This is the codeword with the least Hamming distance from the received word. So the original message is most likely to be 001.

whence

1 + x + x^5 + x^6 ↦ x^4 + x^5 + x^9 + x^10 + (x + 1) + (x^2 + x) + (x^3 + x) + (x^2 + x + 1) = x^3 + x^4 + x^5 + x^9 + x^10.

Thus, the code is 00011100011.

ii) Dividing 1 + x^2 + x^3 + x^4 + x^6 + x^10 by x^4 + x + 1 gives the remainder x^2 + 1 ≡ x^8. Thus, the ninth bit was flipped, and the original message was 1010101.

iii) Either the first and the third bits were flipped (e(x) = 1 + x^2), or the fifth and the sixth were (e(x) = x^4 + x^5 ≡ x^2 + 1). In the first case, the message was 1010001, while in the second case, it was 0110001. □

5. Systems of polynomial equations

In practical problems, objects or actions are often encountered which are described in terms of polynomials or systems of polynomial equations. For instance, the set of points in R^3 defined by the two equations x^2 + y^2 - 1 = 0 and z = 0 is the circle which is centered at (0,0,0), has radius 1, and lies in the plane xy. Similarly, the equations xz = 0 and yz = 0 considered in R^3 define the union of the line x = 0, y = 0 and the plane z = 0. Notice we have to specify the space carefully, since x^2 + y^2 = 1 defines a circle in R^2, but it is a cylinder if viewed in R^3.

12.H.4. A 7-bit message a_0 a_1 ... a_6, considered as the polynomial a_0 + a_1 x + ... + a_6 x^6, is encoded using the polynomial code generated by x^4 + x^3 + 1.
i) Encode the message 1101011.
ii) You have received the word 01001011101. What was the message if you assume that at most one bit was flipped?
iii) What was the message in ii) if you assume that exactly two bits were flipped?

Solution. i) Modulo x^4 + x^3 + 1, we have

x^4 ≡ x^3 + 1,  x^5 ≡ x^3 + x + 1,  x^7 ≡ x^2 + x + 1,  x^9 ≡ x^2 + 1,  x^10 ≡ x^3 + x,

thus we get

1 + x + x^3 + x^5 + x^6 ↦ x^4 + x^5 + x^7 + x^9 + x^10 + (x^3 + 1) + (x^3 + x + 1) + (x^2 + x + 1) + (x^2 + 1) + (x^3 + x) = x + x^3 + x^4 + x^5 + x^7 + x^9 + x^10.

Therefore, the code is 0101 1101011 (the first four bits are the redundancy, the remaining seven are the message).
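The single-error correction used in the exercises above can be sketched mechanically: compute the remainder of the received word, then look for the unique power x^i congruent to it. The bitmask representation and helper names are our own; the data replays part ii) of the exercise with p(x) = 1 + x + x^4:

```python
# Single-error correction for the polynomial (11,7)-code with p = 1 + x + x^4.
# Bit i of an integer is the coefficient a_i of x^i.

def poly_mod(a, p):
    dp = p.bit_length() - 1
    while a and a.bit_length() - 1 >= dp:
        a ^= p << (a.bit_length() - 1 - dp)
    return a

p = 0b10011                              # 1 + x + x^4
received = int("10111010001"[::-1], 2)   # a_0 is listed first in the text
s = poly_mod(received, p)                # non-zero syndrome -> error occurred
# the flipped bit i is the one with x^i = s mod p:
i = next(i for i in range(11) if poly_mod(1 << i, p) == s)
corrected = received ^ (1 << i)
message = corrected >> 4                 # drop the 4 redundant bits
```

Here the remainder is x^2 + 1 ≡ x^8, so bit 8 (the ninth bit) is flipped and the message decodes to 1010101, as in the text.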
Deciding whether or not a given point lies within a given body, finding extrema over algebraically defined subsets of multidimensional spaces, analyzing movements of parts of some machine, etc., are examples of such problems.

12.5.1. Affine varieties. For the sake of simplicity (existence of roots of polynomials), the work is mainly with the field of complex numbers. Some ideas are extended to the case of a general field K.

An affine n-dimensional space over a field K is understood to be K^n = K × ... × K (n times) with the standard affine structure (see the beginning of Chapter 4). As seen, a polynomial f = Σ_α a_α x^α ∈ K[x_1, ..., x_n] can be viewed naturally as a mapping f : K^n → K, defined by f(u) := Σ_α a_α u^α, where u^α = u_1^{α_1} ··· u_n^{α_n}.

In dimension n = 1, the equality f(x) = 0 describes only finitely many points of K. In higher dimensions, the similar equality f(x_1, ..., x_n) = 0 describes subsets similar to curves in the plane or surfaces in the space. However, they may be of quite complicated and self-intersecting shapes. For instance, the set given by the equation (x^2 + y^2)^3 - 4x^2y^2 = 0 looks like a quatrefoil (see the illustration in the beginning of part 1 in the other column). Another illustration of a two-dimensional surface is given by Whitney's umbrella x^2 - y^2 z = 0, which, besides the part shown in the diagram, also includes the line {x = 0, y = 0}.

ii) Dividing x + x^4 + x^6 + x^7 + x^8 + x^10 by x^4 + x^3 + 1 gives the remainder x^2 + x + 1 ≡ x^7. Thus, the eighth bit was flipped, and the original message was 1010101.

iii) Either the second and the tenth bits were flipped (x + x^9 ≡ x^2 + x + 1), or the fourth and the seventh (x^3 + x^6 ≡ x^2 + x + 1), or the fifth and the ninth (x^4 + x^8 ≡ x^2 + x + 1). The respective corrected words are 00001011111, 01011001101, and 01000011001. □

12.H.5. Consider the (15,11)-code generated by the polynomial 1 + x^3 + x^4. We have received the word 011101110111001. Find the original 11-bit message, provided exactly one bit was flipped.

Solution. The word is a codeword if and only if it is divisible by the generating polynomial 1 + x^3 + x^4. The received word corresponds to the polynomial

x + x^2 + x^3 + x^5 + x^6 + x^7 + x^9 + x^10 + x^11 + x^14.

When divided by 1 + x^3 + x^4, it leaves the remainder x + 1. This means that an error has occurred. If we assume that only one bit was flipped, then there must be a power of x which is equal to this remainder modulo 1 + x^3 + x^4. Thus, we compute x^4 = x^3 + 1, x^5 = x^3 + x + 1, ..., x^12 = x + 1, and find out that the thirteenth bit was flipped, and the original message was 01110111101.

Let us look at the exercise more thoroughly. Computing all powers of x, we obtain

x^4 = x^3 + 1,
x^5 = x^3 + x + 1,
x^6 = x^3 + x^2 + x + 1,
x^7 = x^2 + x + 1,
x^8 = x^3 + x^2 + x,
x^9 = x^2 + 1,
x^10 = x^3 + x,
x^11 = x^3 + x^2 + 1,
x^12 = x + 1,
x^13 = x^2 + x,
x^14 = x^3 + x^2.

The diagram was drawn using the parametric description x = uv, y = v, z = u^2, whence the implicit description x^2 - y^2 z = 0 is easily guessed. In the following illustration, there is the Enneper surface with the parametrization x = 3u + 3uv^2 - u^3, y = 3v + 3u^2 v - v^3, z = 3u^2 - 3v^2. It is hard to imagine how to obtain the implicit description from this parametrization by hand. Nevertheless, there are algorithmic tools for such tasks, and they rest on the concept of ideals generated by polynomials. For polynomials f_1, ..., f_s ∈ K[x_1, ..., x_n], the ideal generated by them is

(f_1, ..., f_s) := {h_1 f_1 + ... + h_s f_s ; h_1, ..., h_s ∈ K[x_1, ..., x_n]},

where only finite sums are considered. (Check that this is the intersection of all ideals containing the set of the generators!) The set of generators may be infinite, too. If there are only finitely many generators, the ideal is said to be finitely generated.

It is easy to verify that each variety defines an ideal in the ring of polynomials in the following way:

The ideal of a variety

For a variety V = V(f_1, ..., f_s), set

I(V) := {f ∈ K[x_1, ..., x_n] ; f(a_1, ..., a_n) = 0 for all (a_1, ..., a_n) ∈ V}.

Lemma. Let f_1, ..., f_s, g_1, ..., g_t ∈ K[x_1, ..., x_n] be polynomials. Then:
(1) if (f_1, ..., f_s) = (g_1, ..., g_t), then V(f_1, ..., f_s) = V(g_1, ..., g_t);
(2) I(V) is an ideal, and (f_1, ..., f_s) ⊆ I(V).

Proof. If a point a = (a_1, ..., a_n) lies in the variety V(f_1, ..., f_s), then any polynomial of the form f = h_1 f_1 + ... + h_s f_s (i.e. any member of the ideal I = (f_1, ..., f_s)) takes zero at a. In particular, this means that all the polynomials g_i take zero at a. Hence V(f_1, ..., f_s) ⊆ V(g_1, ..., g_t). The other inclusion can be proved similarly.

12.H.9. Consider the linear (7,4)-code (i.e., the message has length 4) over Z_2 defined by the matrix

G = ( 1 1 0 1 )
    ( 0 0 1 1 )
    ( 1 0 1 0 )
    ( 1 0 0 0 )
    ( 0 1 0 0 )
    ( 0 0 1 0 )
    ( 0 0 0 1 )

Decode the received word 1101001 (i.e., find the sent message) assuming that the least number of errors occurred.

Solution. The syndrome is 101, the leading representative is 0001000, and the sent message is (110)0001. □

12.H.10. Consider the linear (7,4)-code (i.e., the message has length 4) over Z_2 defined by the matrix

G = ( 1 0 1 1 )
    ( 0 1 0 1 )
    ( 0 1 1 0 )
    ( 1 0 0 0 )
    ( 0 1 0 0 )
    ( 0 0 1 0 )
    ( 0 0 0 1 )

Decode the received word 0000011 (i.e., find the sent message) assuming that the least number of errors occurred.

Solution. Syndrome 011, leading representative 0000100, sent message (000)0111. □

12.H.11. Consider the linear (7,4)-code (i.e., the message has length 4) over Z_2 defined by the matrix

G = ( 0 1 1 1 )
    ( 1 0 1 0 )
    ( 1 1 0 0 )
    ( 1 0 0 0 )
    ( 0 1 0 0 )
    ( 0 0 1 0 )
    ( 0 0 0 1 )

Decode the received word 0001100 (i.e., find the sent message) assuming that the least number of errors occurred.

Solution. Syndrome 110, leading representative 0000010, sent message (000)1110. □

12.H.12. We want to transfer one of four possible messages with a binary code which should be able to correct all single errors. What is the minimum possible length of the codewords (all codewords have to be of the same length)? Why?

In order to verify the second proposition, choose g, g' ∈ I(V) and h ∈ K[x_1, ..., x_n]. Then, for any point a ∈ V, we have (g + g')(a) = g(a) + g'(a) = 0 and (gh)(a) = g(a)h(a) = 0, so g + g' ∈ I(V) and gh ∈ I(V). Hence I(V) is an ideal in K[x_1, ..., x_n]. For any polynomial f = h_1 f_1 + ... + h_s f_s ∈ (f_1, ..., f_s) and a point a ∈ V, f(a) = 0, which proves the desired inclusion. □
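The syndrome decoding in exercises of this kind can be replayed mechanically: read off H = (E_3 | P) from the given generating matrix as in 12.4.7, and match the syndrome against the columns of H. A sketch with our own helper names, using the data of 12.H.9:

```python
# Replaying 12.H.9: syndrome decoding of the received word 1101001.

P = [[1, 1, 0, 1],
     [0, 0, 1, 1],
     [1, 0, 1, 0]]
# H = (E_3 | P), read off from G = (P; E_4):
H = [[1 if i == j else 0 for j in range(3)] + P[i] for i in range(3)]

received = [1, 1, 0, 1, 0, 0, 1]
syndrome = [sum(h * r for h, r in zip(row, received)) % 2 for row in H]

# a single error in position j has syndrome equal to the j-th column of H:
j = next(j for j in range(7) if [H[i][j] for i in range(3)] == syndrome)
codeword = received[:]
codeword[j] ^= 1                       # flip the offending bit
```

The syndrome comes out as 101, the error sits in the fourth position (the leading representative 0001000), and the corrected codeword is (110)0001, in agreement with the printed solution.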
The simplest examples are trivial varieties - a single point and the entire affine space:

I({(0, 0, ..., 0)}) = (x_1, ..., x_n),  I(K^n) = {0} for any infinite field K.

The other inclusion of the second part of the lemma does not hold in general. For instance, the variety V(x^2, y^2) contains only the single point (0,0). This means that I(V) = (x, y) ⊋ (x^2, y^2). If V, W ⊆ K^n are varieties, then V ⊆ W ⟹ I(V) ⊇ I(W). In other words, a polynomial which takes zero at each point of a given variety clearly takes zero at each point of any of the variety's subsets.

Now, further natural problems can be formulated:

Q6. Is every ideal I ⊆ K[x_1, ..., x_n] finitely generated?
Q7. Is there an algorithm which decides whether f ∈ I?
Q8. What is the precise relation between (f_1, ..., f_s) and I(V(f_1, ..., f_s))?

12.5.4. Dimension 1. Consider univariate polynomials first, f = a_0 x^n + a_1 x^{n-1} + ... + a_n, where a_0 ≠ 0. The leading term of a polynomial is defined to be LT(f) := a_0 x^n. Clearly, deg f = n.

Solution. Codewords that are to correct all single errors must have pairwise Hamming distance at least 3, so the balls of radius 1 around the four codewords are mutually disjoint. Each such ball contains 1 + n words of length n, whence 4(1 + n) ≤ 2^n, which gives n ≥ 5. Thus, the codewords must be at least 5 bits long. And indeed, there are four codewords of length 5 with minimum Hamming distance 3, for instance 00111, 01001, 10100, 11010. □

12.H.13. We want to transfer 4-bit messages with a binary code which should be able to correct all single and double errors. What is the minimum possible length of the codewords (all codewords have to be of the same length)? Why?

Solution. We proceed similarly as in the above exercise. If the code is to correct double errors as well, then the minimum Hamming distance of any two codewords must be at least five. This means that if we take two different codewords and flip up to two bits in each, the resulting words must all be different. Denoting by n the length of the words, we get the inequality

2^4 (1 + n + (n choose 2)) ≤ 2^n.

The least value of n for which it is satisfied is n = 10, so the codewords must be at least 10 bits long. □
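The sphere-packing estimate in the first of these exercises is easy to verify mechanically, together with the minimality of the sample code (our own sketch):

```python
# Verify: the least n with 4*(1+n) <= 2^n is 5, and the four sample words
# of length 5 have pairwise Hamming distance >= 3.
from itertools import combinations

n = next(n for n in range(1, 20) if 4 * (1 + n) <= 2 ** n)
assert n == 5

code = ["00111", "01001", "10100", "11010"]

def dist(u, v):
    """Hamming distance of two equal-length bit strings."""
    return sum(a != b for a, b in zip(u, v))

assert min(dist(u, v) for u, v in combinations(code, 2)) == 3
```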
I. Extension of the stereographic projection

Let us try to extend the definition of the stereographic projection so that the circle would be parametrized by the points of P_1(R). Let us look at the corresponding mapping P_1(R) → P_2(R). The points in projective extensions will be defined in the so-called homogeneous coordinates, which are given up to a multiple. For instance, the points of P_2(R) will be (x : y : z). A circle in the plane z = 1 is given as the intersection of the cone of directions defined by x^2 + y^2 - z^2 = 0 with this plane.

Proof. Consider an ideal I ⊆ K[x]. If I = {0}, then the ideal is generated by the zero polynomial. If I contains a non-zero polynomial f, then choose one with the lowest degree. Clearly (f) ⊆ I. For any polynomial g ∈ I, consider the Euclidean division of g by f, i.e. g = qf + r. Clearly, qf ∈ I, which means that r ∈ I as well. However, the degree of f is as small as possible, so r = 0. Therefore, g is a multiple of f, and I = (f). □

The greatest common divisor GCD(f_1, f_2) of two polynomials can be obtained by the Euclidean algorithm; for s > 2, set

GCD(f_1, ..., f_s) := GCD(f_1, GCD(f_2, ..., f_s)).

Lemma. Let f_1, ..., f_s be polynomials. Then,

(GCD(f_1, ..., f_s)) = (f_1, ..., f_s).

Proof. GCD(f_1, ..., f_s) divides all the polynomials f_i. Hence the principal ideal (GCD(f_1, ..., f_s)) contains the ideal (f_1, ..., f_s). The other inclusion follows immediately from Bezout's identity. □

Earlier, eight questions were formulated. Here are some answers for dimension 1:

• Since V(f_1, ..., f_s) = V(GCD(f_1, ..., f_s)), the problem of emptiness of a given variety reduces to the problem of existence of a root of a single polynomial.
• For the same reason, each variety is a finite set of isolated points - the roots of the polynomial GCD(f_1, ..., f_s) - except for the case when GCD(f_1, ..., f_s) = 0. This can happen only if f_1 = f_2 = ... = f_s = 0, and then the variety is the entire K.

The inversion of the stereographic projection (i.e., our parametrization of the circle) can be described as follows: For t ≠ 0, we have (t : 1) = (2t^2 : 2t), and the original stereographic projection (i.e., the inverse of the above mapping) can be written linearly as (x : y : z) ↦ (y + z : x), which extends our parametrization to the improper point (1 : 0) ↦ (0 : 1 : 1). Then, the mapping of P_1(R) onto the circle has the "linear" form

P_1(R) ∋ (x : y) ↦ (2xy : x^2 - y^2 : x^2 + y^2) ∈ P_2(R).

Now, let us look at how simple it is to calculate the formula for the stereographic projection in the projective extensions directly (see 4.3.1): We include P_1(R) as the points with homogeneous coordinates (t : 0 : 1), and among the linear combinations of the point (0 : 1 : -1) (i.e., the pole from which we project) and (x : y : z) (a general point of the circle), we must find the one whose coordinates are (u : 0 : v). The only possibility is the point (x : 0 : z + y), which is our previous formula.

J. Elliptic curves

A singular point of a hypersurface in P_n, defined by a homogeneous polynomial F(x_0, x_1, ..., x_n) = 0, is a point at which ∂F/∂x_i = 0 for i = 0, 1, ..., n. From the geometric point of view, "something weird" happens at such a point. In the case of a curve in the projective space P_2(R), the condition that the partial derivatives must be zero means that there is no tangent line to the curve at the given point. This means that the curve has a so-called cusp there, or it intersects itself. A "nice" singularity can be seen in the "quatrefoil", i.e., the variety given by the zero points of the polynomial (x^2 + y^2)^3 - 4x^2y^2 in R^2:

• The concept of dimension is not of much interest in this case; each variety has dimension zero, being a discrete set of points.
• Each ideal can be generated by a single polynomial.
• f ∈ (f_1, ..., f_s) ⟺ GCD(f_1, ..., f_s) | f.
• Denoting (f̃) := I(V(f_1, ..., f_s)), the polynomials f̃ and GCD(f_1, ..., f_s) may differ only in root multiplicities.

12.5.5. Monomial ordering. In order to generalize the Euclidean division of polynomials to more variables, one must first find an appropriate analogy of the degree of a polynomial and its leading term.
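The dimension-1 membership criterion from the bullet list above — f ∈ (f_1, ..., f_s) if and only if GCD(f_1, ..., f_s) divides f — can be sketched over Z_2[x], reusing the bitmask representation of polynomials (our own, not the book's):

```python
# Ideal membership in Z_2[x] via the Euclidean algorithm.
from functools import reduce

def poly_divmod(a, b):
    """Quotient and remainder of a(x) / b(x) over Z_2 (bitmask polynomials)."""
    q, db = 0, b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        shift = a.bit_length() - 1 - db
        q |= 1 << shift
        a ^= b << shift
    return q, a

def poly_gcd(a, b):
    while b:
        a, b = b, poly_divmod(a, b)[1]
    return a

# over Z_2: x^2 + 1 = (x+1)^2 and x^3 + 1 = (x+1)(x^2+x+1), so the GCD
# of the two generators is x + 1
g = reduce(poly_gcd, [0b101, 0b1001])
assert g == 0b11                            # x + 1
# x^2 + x = x(x+1) is divisible by the GCD, hence lies in the ideal:
assert poly_divmod(0b110, g)[1] == 0
```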
The Euclidean division of a polynomial f ∈ K[x_1, ..., x_n] by polynomials g_1, ..., g_s is to be an expression of the form

f = a_1 g_1 + ... + a_s g_s + r,

where no term of the remainder r is divisible by the leading term of any of the polynomials g_i. Try this with f = x^2 y + x y^2 + y^2, g_1 = xy - 1, and g_2 = y^2 - 1. The first division yields f = (x + y) · g_1 + (x + y^2 + y). LT(y^2 - 1) does not divide x (the leading term of the remainder), so, theoretically, continuation is not possible. However, x can be moved into the remainder, thus obtaining the result

f = (x + y) · g_1 + g_2 + (x + y + 1).

No term of the remainder is divisible by either LT(g_1) or LT(g_2). How are the leading terms determined?

Monomial ordering

A monomial ordering on K[x_1, ..., x_n] is a well-ordering (every non-empty subset has a least element) ≤ on N^n which satisfies

for all α, β, γ ∈ N^n : α ≤ β ⟹ α + γ ≤ β + γ.

An ordering on N^n induces an ordering on monomials as soon as the order of the variables x_1, x_2, ..., x_n is fixed. Each polynomial can be rearranged as a decreasing sequence of monomials (ignoring the coefficients for now). The following three definitions introduce the most common monomial orderings. Each ordering assumes that the order of the variables is fixed, usually x_1 > x_2 > ... > x_n.

Definition. Let α, β ∈ N^n. The lexicographic ordering: α >_lex β if and only if the left-most non-zero entry of α - β is positive. The graded lexicographic ordering: α >_grlex β if and only if |α| > |β|, or |α| = |β| and α >_lex β.

The cusp can be found on the curve in R^2 given by x^3 - y^2 = 0.

An elliptic curve C is the set of points in K^2, where K is a given field, which satisfy an equation of the form

y^2 = x^3 + ax + b, where a, b ∈ K.

In addition, we require that there are no singularities, which means, over the field of real numbers, that

Δ = -16(4a^3 + 27b^2) ≠ 0.

This expression Δ is called the discriminant of the equation. Note that the right-hand side contains a cubic polynomial without the quadratic term.
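A numeric spot-check of the discriminant condition (our own sketch, with the hypothetical choice a = -3, b = 2, for which 4a^3 + 27b^2 = 0): writing F = y^2 z - x^3 - a x z^2 - b z^3 for the homogenized equation, the point (γ, 0, 1) with γ = -3b/(2a) is then a singular point of the curve.

```python
# When the discriminant vanishes (and a != 0), the curve y^2 = x^3 + ax + b
# has a singular point at (gamma, 0) with gamma = -3b/(2a).

a, b = -3.0, 2.0
assert 4 * a**3 + 27 * b**2 == 0        # discriminant Delta = 0

gamma = -3 * b / (2 * a)                # = 1 for this choice
assert gamma**3 + a * gamma + b == 0    # the point lies on the curve

# partial derivatives of F = y^2 z - x^3 - a x z^2 - b z^3 at (gamma, 0, 1):
x, y, z = gamma, 0.0, 1.0
dFdx = -3 * x**2 - a * z**2
dFdy = 2 * y * z
dFdz = y**2 - 2 * a * x * z - 3 * b * z**2
assert dFdx == dFdy == dFdz == 0        # all vanish: a singular point
```

This mirrors the computation carried out in full generality in exercise 12.J.1 below.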
This form of the equation is called the Weierstrass equation of an elliptic curve.

The graded reverse lexicographic ordering: α >grevlex β ⟺ |α| > |β|, or |α| = |β| and the right-most non-zero entry of α − β is negative. If x > y > z, then x >grevlex y >grevlex z, but x²yz² >grlex xy³z while x²yz² <grevlex xy³z. All of >lex, >grlex, >grevlex are monomial orderings.

12.5.6. Multivariate division with remainder. Consider a non-zero polynomial f = Σ_{α∈ℕⁿ} a_α xᵅ in K[x₁, …, xₙ] and a monomial ordering <. Then define the degree, leading coefficient, leading monomial, and leading term of f as follows:
• multideg f := max{α ∈ ℕⁿ : a_α ≠ 0},
• LC f := a_{multideg f},
• LM f := x^{multideg f},
• LT f := LC f · LM f.
Of course, these concepts depend on the underlying monomial ordering.

Lemma. Let f, g ∈ K[x₁, …, xₙ] and < be a monomial ordering. Then,
(1) multideg(f·g) = multideg f + multideg g,
(2) f + g ≠ 0 ⟹ multideg(f + g) ≤ max{multideg f, multideg g}.

Proof. Both claims are straightforward corollaries of the definitions. □

Theorem. Let < be a monomial ordering and F = (f₁, …, f_s) be an s-tuple of polynomials in K[x₁, …, xₙ]. Then, every polynomial f ∈ K[x₁, …, xₙ] can be expressed as f = a₁f₁ + ⋯ + a_s f_s + r, where aᵢ, r ∈ K[x₁, …, xₙ] for all i = 1, 2, …, s. Moreover, either r = 0 or r is a linear combination of monomials none of which is divisible by any of LT f₁, …, LT f_s, and if aᵢfᵢ ≠ 0, then multideg f ≥ multideg(aᵢfᵢ) for each i. The polynomial r is called the remainder of the multivariate division f/F.

Proof. The theorem says nothing about uniqueness of the result. The following algorithm produces a possible solution and thus proves the theorem. In the sequel, consider the output of this algorithm to be the result of the division.
(1) a₁ := 0, …, a_s := 0, r := 0, p := f
(2) while p ≠ 0
  (a) i := 1
  (b) d := false
  (c) while i ≤ s and not d

12.J.1. Prove that the curve y² = x³ + ax + b in ℝ² has a singularity if and only if 4a³ + 27b² = 0.

Solution.
The equation of the curve in homogeneous coordinates (see 4.3.1) is F(x, y, z) = 0, where

(1) F(x, y, z) = y²z − x³ − axz² − bz³.

We have

∂F/∂x = −3x² − az²,  ∂F/∂y = 2yz,  ∂F/∂z = y² − 2axz − 3bz².

Let [x, y, z] be a singular point of the given curve. If z = 0, then since the partial derivatives of F with respect to x and z must be zero, we get x = 0 and y = 0, respectively. However, this is impossible, because the point [0, 0, 0] does not lie in the considered projective space P₂(ℝ). Thus, a singular point has z ≠ 0, so that ∂F/∂y = 0 implies y = 0. Denoting γ = x/z, then −3x² − az² = 0 implies 3γ² = −a, and y² − 2axz − 3bz² = 0 implies 2aγ = −3b. We can see that the equality a = 0 implies that b = 0, i.e., the equality 4a³ + 27b² = 0 is satisfied trivially. If a ≠ 0, then we can express γ from the two obtained equations. From the second one, we have γ = −3b/(2a), and from the first one, γ² = −a/3. Altogether, 9b²/(4a²) = −a/3, i.e., 4a³ + 27b² = 0. Thus, we have proved one of the implications. On the other hand, if 4a³ + 27b² = 0, then defining γ = −3b/(2a), we can see that the point [γ, 0, 1] satisfies the equation of the elliptic curve:

γ³ + aγ + b = (−3b/(2a))³ + a·(−3b/(2a)) + b = −27b³/(8a³) − b/2 = b/2 − b/2 = 0,

where the last step uses 27b² = −4a³. Thanks to the choice of γ, all three partial derivatives of F at the point [γ, 0, 1] are zero. □

In order to define a group operation on the points of an elliptic curve, it is useful to consider the curve in the projective extension of the plane (see 4.3.1), and we define a point O ∈ C as the direction (0, 1) (which is the point [0, 1, 0] in homogeneous coordinates). Then, the addition of two points A, B ∈ C is geometrically defined as the point −C, where C is the third intersection point of the line AB with the elliptic

  (i) if LT fᵢ | LT p then aᵢ := aᵢ + LT p / LT fᵢ; p := p − (LT p / LT fᵢ)·fᵢ; d := true
  (ii) else i := i + 1
(d) if not d
  (i) r := r + LT p
  (ii) p := p − LT p

In every iteration of the outer loop, exactly one of the commands 2(c)i, 2(d)ii is executed, so the degree of p decreases.
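Both the singularity criterion from 12.J.1 and the chord–tangent addition just defined can be sanity-checked in a few lines of Python. The helper names below are ours; the addition formulas used are the standard affine ones for y² = x³ + ax + b (they reappear algebraically in 12.J.3), and exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction as F

def on_curve(P, a, b):
    """Check that an affine point P = (X, Y) satisfies Y^2 = X^3 + a*X + b."""
    X, Y = P
    return Y**2 == X**3 + a*X + b

def add(P, Q, a):
    """Chord-tangent addition of affine points; assumes P != -Q."""
    (x1, y1), (x2, y2) = P, Q
    if P == Q:
        k = (3*x1**2 + a) / (2*y1)   # slope of the tangent line at P
    else:
        k = (y1 - y2) / (x1 - x2)    # slope of the chord through P, Q
    x3 = k**2 - x1 - x2              # tau = k^2 - alpha - gamma
    y3 = -y1 + k*(x1 - x3)           # rho = -beta + k*(alpha - tau)
    return (x3, y3)

# The curve y^2 = x^3 + 1 (a = 0, b = 1) is non-singular:
a, b = 0, 1
assert -16*(4*a**3 + 27*b**2) != 0

P, Q = (F(0), F(1)), (F(2), F(3))
R = add(P, Q, a)           # chord case
assert on_curve(R, a, b)
assert on_curve(add(Q, Q, a), a, b)   # tangent (doubling) case

# By contrast, for 4a^3 + 27b^2 = 0, e.g. a = -3, b = 2, the gradient of
# y^2 - x^3 + 3x - 2 vanishes at the singular point (1, 0).
```

The assertions only confirm closure of the operation on a sample; associativity, as the text remarks, is far harder.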
Therefore, the algorithm eventually terminates. When checking the loop condition, the invariant f = a₁f₁ + ⋯ + a_s f_s + p + r holds, and each term of each aᵢ is a quotient LT p / LT fᵢ from some moment. The degrees of these terms are less than the current degree of p, which is at most the degree of f. Altogether, the degree of each aᵢfᵢ is at most the degree of f. □

In the ring K[x₁, …, xₙ], the following implication clearly holds: f = a₁f₁ + ⋯ + a_s f_s + 0 ⟹ f ∈ (f₁, …, f_s). However, the converse is generally not true for multivariate division: Consider f = xy² − x, f₁ = xy + 1, f₂ = y² − 1. The algorithm outputs f = y·(xy + 1) + 0·(y² − 1) + (−x − y), but f = x·(y² − 1), so that f ∈ (f₁, f₂).

The next goal is to find some distinguished generators of the ideals I = (f₁, …, f_s) which behave better. In a certain sense, this is a procedure similar to the Gaussian elimination of variables for systems of linear equations. Begin with some special assumptions about the ideals.

12.5.7. Monomial ideals. An ideal I ⊆ K[x₁, …, xₙ] is called monomial if and only if there is a set of multi-indices A ⊆ ℕⁿ such that I is generated by the monomials xᵅ with α ∈ A. This means that all polynomials in I are of the form Σ_{α∈A} h_α xᵅ, where h_α ∈ K[x₁, …, xₙ]. Clearly, for a monomial ideal I, we have x^β ∈ I if and only if there exists an α ∈ A such that xᵅ divides x^β.

Lemma. Let I ⊆ K[x₁, …, xₙ] be a monomial ideal and f ∈ K[x₁, …, xₙ] a polynomial. Then, the following propositions are equivalent:
(1) f ∈ I;
(2) each term of f lies in I;
(3) the polynomial f is a linear combination of monomials from I with coefficients from K.

Proof. The implications (3) ⟹ (2) ⟹ (1) are obvious. It remains to prove (1) ⟹ (3). Write the polynomial f as f = Σ_α a_α xᵅ, where a_α ∈ K. It follows from the assumption f ∈ I that f = Σ_β h_β x^β, where x^β ∈ I and h_β ∈ K[x₁, …, xₙ]. Each
curve.
If A = B, then the result is given by the other intersection point of the elliptic curve with its tangent line at A.

12.J.2. Prove that the above definition correctly defines an operation on the points of an elliptic curve.

Solution. The intersections of a line with the elliptic curve are obtained as the roots of a cubic equation. If it has two real roots, corresponding to the points A and B, then it must have a third real root as well, i.e., the line AB must have another intersection point with the curve. In the case of a tangent line, the point A corresponds to a double root, so there also exists another intersection point. As for improper points (the last homogeneous coordinate is zero; they correspond to directions in the plane), the only improper point that belongs to the curve given by the equation (1) is the point O = [0, 1, 0]. Addition with the point O means looking for a second intersection of the elliptic curve (besides the point A itself) with the line which goes through A and is parallel to the y-axis. The improper line z = 0 has a triple intersection point O with the curve, i.e., O + O = O. □

Remark. Thus, the operation is well-defined. Moreover, it follows directly from the definition that it is commutative. It even follows from the above that O is a neutral element of the operation. However, the proof of associativity is far from trivial.

term a_α xᵅ must equal some term of the other expression. Hence each term a_α xᵅ of the polynomial f can be expressed as the sum of expressions d·x^{β+δ}, where d ∈ K and x^β ∈ I. However, this means that xᵅ ∈ I, so that (3) holds. □

Corollary. Two monomial ideals coincide if and only if they contain the same monomials.

The following theorem goes much further. It says that every monomial ideal is finitely generated and, moreover, the finite set of generators may be chosen from any given set of generators.

12.5.8. Theorem (Dickson's lemma). Every monomial ideal I = (xᵅ, α ∈ A) ⊆ K[x₁, …
, xₙ] can be written in the form I = (x^{α₁}, …, x^{α_s}), where α₁, …, α_s ∈ A.

Proof. Proceed by induction on the number of variables. If n = 1, then I ⊆ K[x], I = (xᵃ, a ∈ A ⊆ ℕ). The set of exponents A has a minimum, so denote it by β := min A. Then, x^β divides each monomial xᵃ with a ∈ A, so I = (x^β).

Now suppose n > 1 and assume that the proposition is true for fewer variables. Denote the variables x₁, …, x_{n−1}, y, and write monomials in the form xᵅyᵐ, where α ∈ ℕ^{n−1}, m ∈ ℕ. Suppose that I ⊆ K[x₁, …, x_{n−1}, y] is monomial, and define J ⊆ K[x₁, …, x_{n−1}] by J := (xᵅ ; ∃m ∈ ℕ: xᵅyᵐ ∈ I). Clearly, J is a monomial ideal in n − 1 variables, so by the induction hypothesis, J = (x^{α₁}, …, x^{α_s}). It follows from the definition of J that there are minimal integers mᵢ ∈ ℕ such that x^{αᵢ}y^{mᵢ} ∈ I. Denote m := max{mᵢ} and define an analogous system of ideals J_k ⊆ K[x₁, …, x_{n−1}] for 0 ≤ k ≤ m − 1: J_k := (x^β ; x^β yᵏ ∈ I). Again, all the ideals J_k satisfy the induction hypothesis, so they can be expressed as J_k = (x^{α_{k,1}}, …, x^{α_{k,s_k}}). It remains to show that I is generated by the finite set of monomials x^{α₁}yᵐ, …, x^{α_s}yᵐ together with the monomials x^{α_{k,j}}yᵏ for 0 ≤ k ≤ m − 1.

12.J.3. Define the above operation algebraically.

Solution. For any point A ∈ C, we define A + O = O + A = A.

Consider a monomial xᵅy^p ∈ I. One of the following cases occurs:
• p ≥ m. Then, xᵅ ∈ J, so one of x^{α₁}yᵐ, …, x^{α_s}yᵐ divides xᵅy^p.
• p < m. Then, analogously, xᵅ ∈ J_p, and one of x^{α_{p,1}}y^p, …, x^{α_{p,s_p}}y^p divides xᵅy^p.

For a point A ∈ C, A = [α, β, 1], we clearly have B = [α, −β, 1] ∈ C, and we define A + B = O, i.e., A = −B.

For points A ≠ −B, A = [α, β, 1] and B ∈ C, B = [γ, δ, 1], we set

k = (β − δ)/(α − γ) for A ≠ B,  k = (3α² + a)/(2β) for A = B,

τ = k² − α − γ,  ρ = −β + k(α − τ).

Then, we define A + B = [τ, ρ, 1]. We leave it to the reader to verify that this is indeed the operation that we defined geometrically. □

K. Gröbner bases

12.K.1. Is the basis g₁ = x², g₂ = xy + y² a Gröbner basis for the lexicographic ordering x > y? If not, find one.
Solution. Clearly, the leading terms are LT(g₁) = x², LT(g₂) = xy, so the S-polynomial is equal to S(g₁, g₂) = y·g₁ − x·g₂ = −xy². By theorem 12.5.12, g₁, g₂ is a Gröbner basis if and only if the remainder in the multivariate division of this S-polynomial by the basis polynomials is zero. Performing this division (see 12.5.6), we obtain S(g₁, g₂) = 0·g₁ − y·g₂ + y³. The remainder y³ shows that g₁, g₂ do not form a Gröbner basis. By 12.5.13, in order to get one, we must add the remainder polynomial g₃ = y³ to g₁, g₂. Now, we calculate that S(g₁, g₃) = y³·g₁ − x²·g₃ = 0 and S(g₂, g₃) = y²·g₂ − x·g₃ = y⁴ = y·g₃. Hence it follows by theorem 12.5.12 that g₁, g₂, g₃ is already a Gröbner basis. □

12.K.2. Is the basis g₁ = xy − 2y, g₂ = y² − x² a Gröbner basis for the lexicographic ordering y > x? If not, find one.

Solution. Since LT(g₁) = xy and LT(g₂) = y², the corresponding S-polynomial is S(g₁, g₂) = y·g₁ − x·g₂ = x³ − 2y² = −2g₂ + x³ − 2x². The leading term x³ is a multiple of neither xy nor y², which means that g₁, g₂ do not form a Gröbner basis. We can obtain one by adding the polynomial g₃ = x³ − 2x². Then, we have S(g₁, g₃) = x²·g₁ − y·g₃ = 0 and

By the previous lemma, each polynomial f ∈ I can be expressed as a linear combination of monomials from I. Each of these is divisible by one of the generators; hence f lies in the ideal generated by them. Therefore, I is a subset of that ideal. The other inclusion is trivial, which completes the proof of Dickson's lemma. □

12.5.9. Hilbert's theorem. Everything is now at hand for the discussion of ideal bases in polynomial rings. The main idea is the maximal utilization of the information about the leading terms among the generating polynomials and in the ideal. For a nonzero ideal I ⊆ K[x₁, …, xₙ], denote LT I := {a·xᵅ ; ∃f ∈ I: LT f = a·xᵅ}. Clearly, (LT I) is a monomial ideal, so by Dickson's lemma, (LT I) = (LT g₁, …, LT g_s) for appropriate g₁, …, g_s ∈ I.

Theorem. Every ideal I ⊆ K[x₁, …, xₙ] is finitely generated.

Proof.
The statement is trivial for I = {0}. So suppose I ≠ {0}. By Dickson's lemma and the above note, there are g₁, …, g_s ∈ I such that (LT I) = (LT g₁, …, LT g_s). Clearly, (g₁, …, g_s) ⊆ I. Choose any polynomial f ∈ I and divide it by the s-tuple g₁, …, g_s. Then f = a₁g₁ + ⋯ + a_s g_s + r is obtained, where no term of r is divisible by any of LT g₁, …, LT g_s. Since r = f − a₁g₁ − ⋯ − a_s g_s, we have r ∈ I, and also LT r ∈ LT I. This means that LT r ∈ (LT I). Suppose that r ≠ 0. Since (LT I) is monomial, LT r must be divisible by one of its generators, i.e. one of LT g₁, …, LT g_s. This contradicts the result of the multivariate division algorithm. Therefore, r = 0, and I is generated by g₁, …, g_s. □

12.5.10. Gröbner bases. The basis used in the proof of Hilbert's theorem has the properties stated in the following definition:

Gröbner bases of ideals

Definition. A finite set of generators g₁, …, g_s of an ideal I ⊆ K[x₁, …, xₙ] is called a Gröbner basis if and only if (LT I) = (LT g₁, …, LT g_s).

Corollary. Every ideal I ⊆ K[x₁, …, xₙ] has a Gröbner basis. Every set of polynomials g₁, …, g_s ∈ I such that (LT I) = (LT g₁, …, LT g_s) is a Gröbner basis of I.

Example. Return to the remark on the similarity with the Gaussian variable elimination for systems of linear equations. That is, illustrate the general results above on the simplest case of polynomials of degree one with the lexicographic ordering.

S(g₂, g₃) = x³·g₂ − y²·g₃ = 2y²x² − x⁵ = (4y + 2xy)·g₁ − (x² + 2x + 4)·g₃ + 8g₂. □

12.K.3. Eliminate variables in the ideal I = (x² + y² + z² − 1, x² + y² + z² − 2x, 2x − y − z).

Solution. The variable elimination is obtained by finding a Gröbner basis with respect to the lexicographic monomial ordering. Let us denote the generating polynomials of I by g₁, g₂, g₃, respectively. The reduction g₂ = g₁ + 1 − 2x yields the reduced polynomial f₁ = 2x − 1. Now, we use this polynomial to reduce g₃ = f₁ + 1 − y − z to f₂ = y + z − 1.
Now, we reduce g₁, dividing it by f₁ and f₂, which leads to g₁ = (½x + ¼)·f₁ + y² + z² − ¾ and y² + z² − ¾ = (y − z + 1)·f₂ + 2z² − 2z + ¼. Hence, f₃ = 8z² − 8z + 1. We can see that we could make do with polynomial reduction, and we did not have to add any other polynomials. The basis of I with eliminated variables is I = (2x − 1, y + z − 1, 8z² − 8z + 1). □

12.K.4. Solve the following system of polynomial equations:

x²y − z³ = 0,  2xy − 4z = 1,  z − y² = 0,  x³ − 4yz = 0.

Solution. Using appropriate software, we can find out that the corresponding ideal (x²y − z³, 2xy − 4z − 1, z − y², x³ − 4yz) has Gröbner basis (1) with respect to the lexicographic monomial ordering, which means that the system has no solution. □

12.K.5. Find the implicit equation of the variety in ℝ³ defined parametrically as x = 3u + 3uv² − u³, y = 3v + 3u²v − v³, z = 3u² − 3v². This is the so-called Enneper surface, and it is depicted in the picture on page 900.

Denote the generators fᵢ = Σ_j a_{ij}x_j + a_{i0}. Consider the matrix A = (a_{ij}), where i = 1, …, s and j = 0, …, n, and apply Gaussian elimination to it. This gives a matrix B = (b_{ij}) in echelon form. Zero rows can be omitted from it. Hence there is a new basis g₁, …, g_t, where t ≤ s. Due to the performed steps, each fᵢ can be expressed as a linear combination of g₁, …, g_t, which means that (f₁, …, f_s) = (g₁, …, g_t).

Now, verify that these polynomials g₁, …, g_t form a Gröbner basis. Without loss of generality, assume that the variables are labeled so that LM gᵢ = xᵢ for i = 1, …, t. Any polynomial f ∈ I can be written as f = h₁f₁ + ⋯ + h_s f_s = h′₁g₁ + ⋯ + h′_t g_t. It is required that LT f ∈ (LT g₁, …, LT g_t), that is, that LT f is divisible by one of x₁, …, x_t. Suppose that f contains only the variables x_{t+1}, …, xₙ. However, then h′₁ = 0, since x₁ occurs only in g₁ by the echelon form of B. Analogously, h′₂ = ⋯ = h′_t = 0, and so f = 0. The existence of these very special bases is now proved. However, they cannot yet be constructed algorithmically.
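The inconsistency claim in 12.K.4 (the ideal collapses to (1), so 1 lies in the ideal and the system has no solution) is exactly the kind of statement a computer algebra system verifies instantly. A sketch with Python's sympy (`groebner` is sympy's function, not the text's notation):

```python
from sympy import symbols, groebner

x, y, z = symbols('x y z')

# The ideal from 12.K.4. A lex Groebner basis equal to [1] means that
# 1 belongs to the ideal, i.e. the polynomial system is inconsistent.
gb = groebner([x**2*y - z**3, 2*x*y - 4*z - 1, z - y**2, x**3 - 4*y*z],
              x, y, z, order='lex')
print(list(gb))   # [1]
```

The same call with the generators of 12.K.3 instead would return the eliminated basis containing a univariate polynomial in z, mirroring the hand computation above.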
This is the goal of the following subsections.

12.5.11. Theorem. Let G = {g₁, …, g_t} be a Gröbner basis of an ideal I ⊆ K[x₁, …, xₙ] and f a polynomial in K[x₁, …, xₙ]. Then, there is a unique r = Σ_α a_α xᵅ ∈ K[x₁, …, xₙ] such that:
(1) no term of r is divisible by any of LT g₁, …, LT g_t, i.e. ∀α ∀i: LT gᵢ ∤ a_α xᵅ;
(2) ∃g ∈ I: f = g + r.

Proof. The algorithm for multivariate division produces f = a₁g₁ + ⋯ + a_t g_t + r, where r satisfies condition (1). Select g as a₁g₁ + ⋯ + a_t g_t, which of course lies in I. It remains to prove uniqueness. Suppose that f = g + r = g′ + r′, where r ≠ r′. Clearly, r − r′ = g′ − g ∈ I. Since G is a Gröbner basis, LT(r − r′) is divisible by one of LT g₁, …, LT g_t. There are two possibilities:
• LM r ≠ LM r′. The one with the higher degree must be divisible by one of LT g₁, …, LT g_t, which contradicts condition (1).
• LM r = LM r′ and LC r ≠ LC r′. Then both the monomials LM r and LM r′ must be divisible by one of LT g₁, …, LT g_t, which is again a contradiction.
It follows that LT r = LT r′, and the inductive argument shows that r = r′. □

Solution. Applying the elimination procedure (e.g. in the Maple system, using gbasis with the plex ordering), we obtain the corresponding implicit representation, i.e., an equation given by a single polynomial of degree nine in x, y, z with large integer coefficients (among them −59049x, −104976z², −6561y², −72900z³, −18954y²z). □

The previous theorem generalizes the Euclidean division, with an ideal in place of a divisor. In the univariate case, this is no generalization, since every ideal is generated by a single polynomial. If it is only the remainder which is of interest, the order of the polynomials in the Gröbner basis does not matter.
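The membership test provided by this unique remainder can be illustrated on the counterexample from 12.5.6, where naive division failed to detect that f = xy² − x lies in (xy + 1, y² − 1). Dividing by a Gröbner basis of the same ideal instead gives remainder zero. A sympy sketch (function names are sympy's):

```python
from sympy import symbols, groebner, reduced

x, y = symbols('x y')

f = x*y**2 - x
gens = [x*y + 1, y**2 - 1]
gb = groebner(gens, x, y, order='lex')

# Naive division by the original generators leaves a non-zero remainder,
# while the remainder modulo the Groebner basis decides membership exactly.
_, r_naive = reduced(f, gens, x, y, order='lex')
_, r_gb = reduced(f, list(gb), x, y, order='lex')
print(r_naive, r_gb)
```

Here `r_naive` is a non-zero polynomial (the algorithm's output −x − y from 12.5.6), while `r_gb` is 0, certifying f ∈ I.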
Hence it makes sense to define the notation f̄^G for the remainder in the division f/G, provided G = (g₁, …, g_s) is a Gröbner basis.

Corollary. Let G = {g₁, …, g_t} be a Gröbner basis of an ideal I ⊆ K[x₁, …, xₙ] and f a polynomial in K[x₁, …, xₙ]. Then, f ∈ I if and only if the remainder f̄^G is zero. □

As we illustrate in the following simple exercise, Gröbner bases can be used for solving integer optimization problems.

12.K.6. What is the minimum number of banknotes that are necessary to pay 77700 CZK? Solve this problem for three scenarios: First, assume that the banknotes at disposal are of values 100 CZK, 200 CZK, 500 CZK, 1000 CZK. Then, assume that there are also banknotes of value 2000 CZK. Finally, assume that there are no banknotes of value 2000 CZK, but there are banknotes of value 5000 CZK.

12.5.12. Syzygies. The next step is to find a sufficient "testing set" of polynomials of a given ideal which allows us to verify whether the considered system is a Gröbner basis. Again, we wish to test this by means of multivariate division only. For α = multideg f and β = multideg g, consider γ := (γ₁, …, γₙ), where γᵢ = max{αᵢ, βᵢ}. The monomial x^γ is called the least common multiple of the monomials LM f and LM g and is denoted LCM(LM f, LM g) := x^γ. The expression

Solution. Let us denote the respective banknotes by the variables s, d, p, t, D, P. The banknotes to be used are represented as a monomial in these variables, so that the exponent of each variable determines the number of the corresponding banknotes. For instance, if we decide to use only the 100 CZK banknotes, then the monomial will be s⁷⁷⁷. If we pay with ten 1000 CZK banknotes, ten 500 CZK banknotes, and the remaining amount with 100 CZK banknotes, then the monomial will be q = t¹⁰p¹⁰s⁶²⁷. In the former case, the number of banknotes will be 777.
In the latter case, it will be 10 + 10 + 627 = 647. If we have only the banknotes s, d, p, t, then the ideal that describes the relations among the individual banknotes is I₁ = (s² − d, s⁵ − p, s¹⁰ − t). In order to minimize the number of used banknotes, we compute the Gröbner basis with respect to the graded reverse lexicographic ordering (we want to eliminate the small banknotes): G₁ = (p² − t, s² − d, d³ − sp, sd² − p). Now, we take any monomial that represents a given choice of banknotes. Reducing this monomial with respect to the basis G₁, we get a monomial whose degree is minimal for our

S(f, g) := (x^γ / LT f)·f − (x^γ / LT g)·g

is called the S-polynomial (also syzygy, or pair) of the polynomials f, g. This is a tool for the elimination of leading terms. The Gaussian elimination is a special case of this procedure for polynomials of degree one. However, during the general procedure, it may happen that the degrees of the resulting polynomials are higher even though the original leading terms are removed. For instance, consider the polynomials f = x³y² − x²y³ + x and g = 3x⁴y + y² in K[x, y] with the grlex ordering. Here γ = (4, 2), and

S(f, g) = x·f − (1/3)·y·g = −x³y³ + x² − (1/3)y³,

which is a polynomial of degree 6.

Theorem. Let I ⊆ K[x₁, …, xₙ] be an ideal. Then, G = {g₁, …, g_t} is a Gröbner basis of I if and only if, for each i ≠ j, the remainder of the division S(gᵢ, gⱼ)/G is zero.

Proof. Begin with a technical lemma which describes which cancellations may occur when expressing polynomials in terms of generators. More precisely, they can always be expressed in terms of S-polynomials.

Lemma. Consider a polynomial f = Σᵢ₌₁ᵗ cᵢ x^{αᵢ} gᵢ, where c₁, …, c_t ∈ K and αᵢ + multideg gᵢ = δ for a fixed δ whenever cᵢ ≠ 0. If multideg f < δ, then there are c_{jk} ∈ K such that

monomial ordering, and it is easy to show that it is the monomial corresponding to the optimal choice. For instance, take q = s⁷⁷⁷. Reduction with respect to G₁ yields t⁷⁷pd.
This means that the optimal choice is seventy-seven 1000 CZK banknotes, one 500 CZK banknote, and one 200 CZK banknote. Altogether, it is 79 banknotes. In the second scenario, when we also have the banknote D, the ideal is I₂ = (s² − d, s⁵ − p, s¹⁰ − t, s²⁰ − D) and its Gröbner basis is G₂ = (t² − D, p² − t, s² − d, d³ − sp, sd² − p). Reduction of q with respect to G₂ gives D³⁸tpd, so this time we pay with 41 banknotes. In the third scenario, we have I₃ = (s² − d, s⁵ − p, s¹⁰ − t, s⁵⁰ − P) and G₃ = (t⁵ − P, p² − t, s² − d, d³ − sp, sd² − p), and the reduction is equal to P¹⁵t²pd. In this case, we need only 19 banknotes. Of course, this simple problem can be solved quickly with common sense. However, the presented method of Gröbner bases gives a universal algorithm which can be applied automatically to higher amounts and other, more complicated cases. □

Gröbner bases have applications in robotics as well. In particular, they appear in inverse kinematics, where one must find how to set the individual joints of a robot so that it reaches a given position. This problem often leads to a system of nonlinear equations which can be solved by finding a Gröbner basis, as in the following problem.

12.K.7. Consider a simple robot, as shown in the picture, which consists of three straight parts of length 1 which are connected with independent joints that enable arbitrary angles α, β, γ. We want this robot to grasp, from above, an object which lies on the ground at distance x. What values should the angles α, β, γ be set to? Draw the configuration of the robot for x = 1, 1.5, √3.

Solution. Consider a natural coordinate system, where the initial end of the robotic arm lies in the origin and the ground corresponds to the x-axis. It follows from elementary trigonometry that the total x-range of the robot at angles α, β, γ is equal to x = sin α + sin(α + β) + sin(α + β + γ). Similarly, the range of the robot in the vertical direction is y = cos α + cos(α + β) + cos(α + β + γ).
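The reductions used in 12.K.6 can be reproduced with sympy; the grevlex order with s > d > p > t below mirrors the text. Since the denominations are 100, 200, 500, 1000 CZK, substituting s ↦ λ, d ↦ λ², p ↦ λ⁵, t ↦ λ¹⁰ kills every generator, so the reduced monomial must still pay exactly 777 hundreds; the test below checks precisely that. This is a sketch; the variable names follow the exercise, `lam` is our auxiliary symbol.

```python
from sympy import symbols, groebner, reduced

s, d, p, t, lam = symbols('s d p t lam')

# Relations between the banknotes: 2x100 = 200, 5x100 = 500, 10x100 = 1000.
G1 = groebner([s**2 - d, s**5 - p, s**10 - t], s, d, p, t, order='grevlex')

# 77700 CZK paid entirely in 100 CZK banknotes:
q = s**777
_, r = reduced(q, list(G1), s, d, p, t, order='grevlex')
print(r)   # an equivalent payment with far fewer banknotes

# The reduction preserves the amount paid: substituting each banknote
# variable by lam^(its value in hundreds) must give lam^777 again.
assert r.subs({s: lam, d: lam**2, p: lam**5, t: lam**10}) == lam**777
```

According to the text, the normal form is t⁷⁷pd, i.e. 79 banknotes; the substitution check above verifies at least that whatever monomial comes out pays the right amount.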
f = Σ_{j,k} c_{jk} x^{δ−γ_{jk}} S(g_j, g_k),

where x^{γ_{jk}} = LCM(LM g_j, LM g_k), and the degree of each monomial x^{δ−γ_{jk}} S(g_j, g_k) is less than δ.

Proof. Let dᵢ := LC gᵢ and pᵢ := x^{αᵢ}gᵢ/dᵢ. Clearly, cᵢdᵢ = LC(cᵢx^{αᵢ}gᵢ) and LC pᵢ = 1. Since multideg(cᵢx^{αᵢ}gᵢ) = δ and multideg f < δ, it follows that Σᵢ₌₁ᵗ cᵢdᵢ = 0. Express f as a combination of the differences pⱼ − pₖ:

f = Σᵢ₌₁ᵗ cᵢdᵢpᵢ = c₁d₁(p₁ − p₂) + (c₁d₁ + c₂d₂)(p₂ − p₃) + ⋯ + (c₁d₁ + ⋯ + c_{t−1}d_{t−1})(p_{t−1} − p_t) + (c₁d₁ + ⋯ + c_t d_t)p_t,

where the last summand vanishes. Each difference p_j − p_k can be expressed in terms of S-polynomials:

p_j − p_k = (x^δ / LT g_j)·g_j − (x^δ / LT g_k)·g_k = x^{δ−γ_{jk}} S(g_j, g_k).

Now the individual coefficients c_{jk} can be derived easily from these equalities. □

Now follows the proof of the theorem. The "⟹" implication follows directly from the corollary of subsection 12.5.11. For the reverse implication, consider a non-zero polynomial f ∈ I. It must be shown that, under the assumption of the theorem, LT f ∈ (LT g₁, …, LT g_t). If it is known that the polynomial can be expressed as f = Σᵢ₌₁ᵗ hᵢgᵢ with the property that multideg f = max{multideg(hᵢgᵢ)}, then LT f is necessarily divisible by one of the leading terms LT gᵢ, which means that G is a Gröbner basis. Denote mᵢ := multideg(hᵢgᵢ), δ := max{m₁, …, m_t}. Clearly, multideg f ≤ δ. Let the polynomials h₁, …, h_t be chosen so that δ is as small as possible. Since a monomial ordering is, in particular, a well-ordering, such a δ exists. It is necessary to prove that multideg f = δ. Write f = Σ_{mᵢ=δ} hᵢgᵢ + Σ_{mᵢ<δ} hᵢgᵢ.

Applying Buchberger's algorithm with respect to the lexicographic ordering s₁ > c₁ > s₂ > c₂, we get the basis (2c₂ + 1 − x², 2c₁(1 + x²) − 2s₂x − 1 − x², 2s₁(1 + x²) + 2s₂ − x − x³, 4s₂² − 3 − 2x² + x⁴), and hence it is easy to calculate the values of the variables in dependence on x. For example, we can immediately see that c₂ = (x² − 1)/2, i.e., β = arccos((x² − 1)/2). In particular, it is clear that the problem has no solution for |x| > √3.
Specifically, for |x| < √3, there are two solutions, and for |x| = √3, there is one solution (α = π/3, β = 0, γ = 2π/3 for positive x; α = −π/3, β = 0, γ = 4π/3 for negative x). For x = 1, we get the solution α = 0, β = π/2, γ = π/2 and the degenerate solution α = π/2, β = −π/2, γ = π. The case x = −1 is similar. It is good to realize that for |x| < 1, one of the solutions will always correspond to a configuration of the robot where some parts intersect. For these values of x, there is only one realizable configuration.

Denote cᵢx^{αᵢ} := LT hᵢ and apply the lemma:

Σ_{mᵢ=δ} (LT hᵢ)gᵢ = Σ_{mᵢ=δ} cᵢx^{αᵢ}gᵢ = Σ_{j,k} c_{jk} x^{δ−γ_{jk}} S(g_j, g_k).

It follows from the assumption of the theorem and the multivariate division algorithm that S(g_j, g_k) = Σᵢ₌₁ᵗ p_{ijk} gᵢ and, moreover, multideg(p_{ijk}gᵢ) ≤ multideg S(g_j, g_k). Denote q_{ijk} := x^{δ−γ_{jk}} p_{ijk}, to obtain x^{δ−γ_{jk}} S(g_j, g_k) = Σᵢ₌₁ᵗ q_{ijk} gᵢ. By the second part of the lemma, multideg(q_{ijk}gᵢ) ≤ multideg(x^{δ−γ_{jk}} S(g_j, g_k)) < δ. Substitution yields

Σ_{mᵢ=δ} (LT hᵢ)gᵢ = Σ_{j,k} c_{jk} Σᵢ q_{ijk}gᵢ = Σᵢ (Σ_{j,k} c_{jk} q_{ijk}) gᵢ.

At the same time, multideg(Σ_{j,k} c_{jk} q_{ijk} gᵢ) < δ for i = 1, …, t. Substitute this into the original equality, to get f expressed as a combination of g₁, …, g_t, where the degrees of all terms are less than δ. This contradicts the minimality of δ, and so multideg f = δ, whence LT f ∈ (LT g₁, …, LT g_t). So G is a Gröbner basis. □

12.5.13. A naive algorithm for Gröbner bases. The theorem just proved provides an efficient method for deciding whether a given basis is a Gröbner basis. For example, consider I = (x + y, y − z). The only relevant S-polynomial is S(x + y, y − z) = y·(x + y) − x·(y − z) = xz + y². The division yields xz + y² = z·(x + y) + y·(y − z), so it is a Gröbner basis. The following algorithm utilizes just this method to find a Gröbner basis of an ideal that is generated by a given s-tuple of polynomials F = (f₁, …, f_s).

(1) G := F, G′ := ∅
(2) while G ≠ G′
  (a) G′ := G
  (b) for all p, q ∈ G′, p ≠ q, do
    (i) s := the remainder of the division S(p, q)/G′
    (ii) if s ≠ 0 then G := G ∪ {s}
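The toy computation for I = (x + y, y − z) is easy to confirm; sympy's `groebner` runs a refined version of exactly this algorithm. Note that sympy returns the reduced basis, in which x + y is replaced by its normal form with respect to y − z:

```python
from sympy import symbols, groebner

x, y, z = symbols('x y z')

# (x + y, y - z) is already a Groebner basis for lex x > y > z; the
# reduced form rewrites x + y as x + z using the generator y - z.
gb = groebner([x + y, y - z], x, y, z, order='lex')
print(list(gb))
```

The output basis {x + z, y − z} generates the same ideal, illustrating the remark below that the algorithm's raw output depends on the input generators, while the reduced basis (12.5.15) does not.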
□

Gröbner bases can also be used in software engineering when looking for loop invariants, which are needed for the verification of algorithms, as in the following problem.

12.K.8. Verify the correctness of the following algorithm for the product of two integers a, b.

(x, y, z) := (a, b, 0);
while not (y = 0) do
  if y mod 2 = 0 then
    (x, y, z) := (2*x, y/2, z)
  else
    (x, y, z) := (2*x, (y-1)/2, x+z)
  end if
end while
return z

Solution. Let X, Y, Z denote the initial values of the variables x, y, z, respectively. Then, by definition, a polynomial p is an invariant of the loop if and only if we have p(x, y, z, X, Y, Z) = 0 after each iteration. Such a polynomial can be found using Gröbner bases as follows: Let f₁, f₂ denote the assignments of the then- and else-branches, respectively, i.e.,

If the algorithm ever terminates, then G contains a Gröbner basis. Thus, it suffices to verify that it terminates. However, in each iteration of the loop (2), i.e. when a non-trivial remainder is added, either the monomial ideal generated by LT G extends or it remains unchanged. Consequently, there is a non-decreasing chain of (monomial) ideals I₁ = (LT F) ⊆ I₂ ⊆ ⋯ ⊆ I_k ⊆ ⋯. Denoting I = ∪_{k=1}^∞ I_k, then I is an ideal, and by Hilbert's theorem, it is finitely generated. However, this means that all the generators of I already lie in one of the I_k. Therefore, from this k onwards, I_k = I_{k+1} = ⋯. Clearly, the stabilization of this chain of monomial ideals of leading terms is equivalent to the termination of the algorithm.

This algorithm is far from ideal. There are quite trivial inputs for which it returns wild results. Moreover, the output basis directly depends on the input, so the outputs for the same ideal defined by different bases may vary.

12.5.14. Reduction of bases. In order to recognize the generators which are needed in a Gröbner basis, it suffices to follow the leading terms.
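Before hunting for the invariant of 12.K.8 algebraically, it is worth running the loop. The Python transcription below (function name ours, non-negative b assumed) checks after every iteration the invariant xy + z = ab, i.e. the vanishing of the polynomial xy − XY + z − Z found in the solution that follows:

```python
def multiply(a, b):
    """The doubling/halving multiplication loop from 12.K.8; assumes b >= 0."""
    x, y, z = a, b, 0
    while y != 0:
        if y % 2 == 0:
            x, y, z = 2*x, y // 2, z
        else:
            # Note: z gets the OLD x plus z (simultaneous assignment).
            x, y, z = 2*x, (y - 1) // 2, x + z
        # Loop invariant: the polynomial x*y - a*b + z vanishes.
        assert x*y + z == a*b
    return z

print(multiply(37, 41))   # 1517
```

When the loop exits, y = 0, so the invariant collapses to z = ab, which is exactly the correctness argument completed algebraically below.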
The first step of the discussion is simply to remove all elements which are not needed in this sense.

Lemma. Let G be a Gröbner basis of an ideal I and p ∈ G such that LT p ∈ (LT(G \ {p})). Then, G \ {p} is also a Gröbner basis of I.

Proof. From the definition of the Gröbner basis, (LT I) = (LT G). But LT p ∈ (LT(G \ {p})), so (LT(G \ {p})) = (LT G). Hence the proposition follows immediately. □

Definition. A Gröbner basis G of an ideal I is said to be minimal if and only if LC p = 1 and LT p ∉ (LT(G \ {p})) for all p ∈ G.

f₁(x, y, z) = (2x, y/2, z) and f₂(x, y, z) = (2x, (y − 1)/2, x + z).

For instance, consider K[x, y] and the ideal (x³ − 2xy, x²y − 2y² + x).

Consider the polynomial function F₁: x ↦ ux, y ↦ vy, z ↦ z, where the new variables satisfy uv = 1. Clearly, the invariant polynomial must lie in the ideal I₁ = (ux − X, vy − Y, z − Z, uv − 1). In order to find such a polynomial, it suffices to eliminate the variables u and v, which can be done just with the Gröbner basis with respect to the graded reverse lexicographic ordering with u > v > x > y > z. This basis is equal to (xy − XY, z − Z, x − vX, y − uY). Hence F₁(xy − XY) = xy − XY and F₁(z − Z) = z − Z, and all other polynomials are invariant with respect to any number n of applications of f₁ and are given by a polynomial in (the polynomials) xy − XY and z − Z.

Now, we proceed similarly for f₂. For n iterations, we derive the formula f₂ⁿ(x, y, z) = (2ⁿx, (1/2ⁿ)(y + 1) − 1, (2ⁿ − 1)x + z), and introducing the variables u and v, we get an equivalent polynomial function F₂: x ↦ ux, y ↦ v(y + 1) − 1, z ↦ (u − 1)x + z. The invariant polynomial for F₂ can be obtained similarly as above, thanks to the Gröbner basis of the corresponding ideal. However, we are interested in those polynomials which are invariant for both F₁ and F₂. Clearly, these must lie in the ideal I₂ = (F₂(xy − XY), F₂(z − Z), uv − 1).
Substituting for F₂, we obtain I₂ = (uxv(y + 1) − ux − XY, (u − 1)x + z − Z, uv − 1), and with the Gröbner basis of this ideal, we eliminate the variables u and v and thus find the polynomial xy − XY + z − Z, which is invariant for both F₁ and F₂, so it is an invariant of the given cycle. Since at the beginning we have X = a, Y = b, Z = 0, we can see that it holds in every step of the algorithm that xy − ab + z = 0. Since the loop terminates only if y = 0, we get that indeed z = ab. □

Reduced Gröbner basis

Let G be a Gröbner basis of an ideal I. A polynomial g ∈ G is said to be reduced for the basis G if and only if none of its monomials lies in (LT(G \ {g})).

With the lexicographic ordering x > y > z, divide the first polynomial by the second one and the third one, and use the result in order to solve the system in the real numbers.

Solution. x²y² + x²yz − xyz² + yz² − z² = (y² + z²)(x²y + z) − y(xyz + z + 1) − z³ + …,

as each element is reduced. Hence there is an algorithm for the construction of the reduced Gröbner basis. It remains to prove uniqueness. Let there be two reduced Gröbner bases G, G̃ of a non-zero ideal I. Then (LT G) = (LT I) = (LT G̃). Since this ideal is monomial, Dickson's lemma can be applied. Recalling the construction of the basis in the proof of Dickson's lemma, there exists a unique monomial basis of a monomial ideal such that the coefficients of its elements equal 1 and no element of the basis divides another one. By the definition of minimality, both LT G and LT G̃ must be such. This means that LT G = LT G̃. Consequently, for each g ∈ G, there is a unique g̃ ∈ G̃ such that LT g = LT g̃. Moreover, g − g̃ ∈ I. Since G is a Gröbner basis, the leading terms LT g, LT g̃ cancel out in g − g̃. Since both the bases are reduced, none of the remaining terms of g − g̃ may be divisible by any of LT G = LT G̃. Therefore, each would have to appear in the remainder, which means that g − g̃ = 0. This proves the uniqueness. □

12.5.16. Remarks.
Several of the previous questions are now answered: It can be decided efficiently whether or not a given polynomial lies in a given ideal, by means of multivariate division and the Gröbner basis. Because of the reduced Gröbner bases, it can be decided whether or not two ideals coincide: they simply need to have the same reduced Gröbner basis. This means that it can be decided whether or not a polynomial equation lies in the ideal generated by a given system. Moreover, it can be decided efficiently whether or not two given systems generate the same ideal of consequences. The above algorithmic construction depends on an appropriate monomial ordering; the answers to the questions are, of course, independent of such an ordering. As mentioned at the beginning of this chapter, the technique of Gröbner bases is one of the fundamentals of computer algebra. Of course, this algorithm is usually implemented using various tricks to make it faster. One can use the reduction technique as early as when creating the Gröbner basis in the fundamental algorithm from subsection 12.5.13, etc. In the literature, one may find miscellaneous variations for non-commutative algebraic objects (e.g. for formal manipulations with differential operators). The algorithm for finding a Gröbner basis can be viewed as a special case of the Knuth-Bendix algorithm for rewriting rules, which solves the problem of word equivalence in monoids that are given by generators and a set of equalities. Last, but not least, the technique of Gröbner bases can be used in a much more sophisticated way in commutative

CHAPTER 12. ALGEBRAIC STRUCTURES

Hence z = 0, ±1. Then, e.g., 0 = z(x²y + z) − x(xyz + z + 1) = z² − zx − x. Hence x = z²/(z + 1), and we get from the third equation that y = −(1 + z)²/z³. This is satisfied by the sole point (1/2, −4, 1). □

12.K.11. Using Gröbner bases, solve the polynomial system x² + y + z = 1, x + y² + z = 1, x + y + z² = 1.

Solution.
Let us denote f1 := x + y + z² − 1. The division of x + y² + z − 1 by f1 gives f2 = y² − y − z² + z. The division of x² + y + z − 1 by f1 yields y² + 2yz² − y + z⁴ − 2z² + z, and further division by f2 produces the remainder f3 = 2yz² + z⁴ − z². However, (f1, f2, f3) is not a Gröbner basis yet. One is constructed by the choice g1 := f1, g2 := f2, g3 := f3, processing the S-polynomial 2z²f2 − yf3 = −yz⁴ − yz² − 2z⁴ + 2z³. Then, the division by the previous polynomials leads to g4 = z⁶ − 4z⁴ + 4z³ − z² = z²(z − 1)²(z² + 2z − 1). Now, (g1, g2, g3, g4) is a Gröbner basis, so we can solve the system by elimination. We get from g4 = 0 that z = 0, 1, −1 ± √2. Substituting this into

Theorem. Let I ⊆ K[x1, ..., xn] be an ideal and let G be its Gröbner basis with respect to the lexicographic ordering x1 >lex x2 >lex ... >lex xn. Then, for each p = 0, ..., n, Gp := G ∩ K[x_{p+1}, ..., xn] is a Gröbner basis for the ideal Ip = I ∩ K[x_{p+1}, ..., xn]. If G is minimal or reduced, then Gp is again minimal or reduced, respectively.

Proof. Without loss of generality, assume that Gp = {g1, ..., gr}. Since G ⊆ I, it follows that Gp ⊆ Ip. The inclusion ⟨Gp⟩ ⊆ Ip is trivial. It needs to be verified for each polynomial f ∈ Ip that f = h1g1 + ... + hr gr. To do this, perform multivariate division by the original Gröbner basis G. Since f ∈ I, the remainder is zero, i.e. f = h1g1 + ... + hr gr + h_{r+1} g_{r+1} + ... + hm gm. Each of the polynomials g_{r+1}, ..., gm must contain at least one of the variables x1, ..., xp, otherwise it would lie in Gp. By the properties of the lexicographic ordering, this variable must also be contained in LT g_{r+1}, ..., LT gm. Recall the individual steps of the algorithm for multivariate division and the fact that f contains no monomial with any of x1, ..., xp. Then h_{r+1} = ... = hm = 0, thus verifying that f ∈ ⟨Gp⟩.

x > y > z. Using Maple, we can find the basis 144z⁵ + 35z⁷ + 12z⁹, 23z⁶ + 12z⁸ + 44yz⁴, yz³ + 3z⁵ + 4y²z, 9z⁴ + 4y³, −8y² − 6z⁴ + 3xz³, 2xy² − 3z³, x² − 2xz − 4. Since the discriminant of the first polynomial of the basis (divided by z⁵) satisfies 35² − 4 · 12 · 144 < 0, we must have z = 0.
Substituting this into the other polynomials, we immediately obtain y = 0, x = ±2. □

12.K.13. Solve the following system of polynomial equations in ℝ: xy + yz − 1, yz + zw − 1, zw + wx − 1, wx + xy − 1.

Solution. In this case, it is a good idea to take the graded lexicographic ordering with w > x > y > z. Using the algorithm 12.5.13 or appropriate software, we find the corresponding Gröbner basis (x − z, w − y, 2yz − 1). Thus, the system is satisfied by exactly the points (w, x, y, z) = (1/(2t), t, 1/(2t), t) for an arbitrary t ∈ ℝ except zero. □

12.K.14. Solve the following system of polynomial equations in ℝ: x² + yz + x, z² + xy + z, y² + xz + y.

Solution. Using the algorithm 12.5.13 or appropriate software, we find the corresponding Gröbner basis for the lexicographic monomial ordering with x > y > z, consisting of six polynomials: z² + 3z³ + 2z⁴, z² + z³ + 2yz² + 2yz³, y² + y − yz − z − z² − 2yz², yz + z + z² + 2yz² + xz, z² + xy + z, x² + yz + x. The roots of the first polynomial are z = 0, −1, −1/2. Discussing the individual cases, we find out that the system is satisfied exactly by the points

Not only the desired inclusion but also the fact that the division f/G on Ip gives the same result as f/Gp is proved. For 1 ≤ i < j ≤ r, consider the S-polynomials S(gi, gj). The remainder of S(gi, gj) upon division by Gp equals its remainder upon division by G, which is 0, so Gp is a Gröbner basis of Ip. It is clear that the property, for the basis, of being either minimal or reduced, is preserved. □

The only property of the lexicographic ordering used in the proof is that if a variable occurs in the polynomial f, then it occurs in its leading term as well. However, this condition is much weaker than that of the lexicographic ordering. Therefore, in actual implementations, one may use any ordering with the mentioned property. This usually leads to more efficient computations, since the pure lexicographic ordering usually leads to an unpleasant increase of the polynomials' degrees.

12.5.18. Back to parametrized varieties.
The above theorem suggests an algorithm for finding an implicit representation of a variety defined in terms of a polynomial parametrization. The tools necessary for working with the smallest varieties that contain the points defined by the parametrization are not available here, so a detailed discussion is omitted. When the parametrization of a variety is given by polynomial relations x1 = f1(u1, ..., uk), ..., xn = fn(u1, ..., uk), the reduced Gröbner basis of the ideal ⟨x1 − f1, ..., xn − fn⟩ can be computed in the lexicographic ordering where ui > xj for all i, j. From this basis, the reduced Gröbner basis of the elimination ideal Ik is obtained. This is precisely the required ideal and its implicit representation. It suffices to use an ordering which guarantees that each ui comes before each xj, so that the computation of the Gröbner basis eliminates the ui; otherwise the ordering may be arbitrary. There is a chance that there is a more efficient computation than with the pure lexicographic ordering. When the parametrization is rational, i.e. xi = fi(t1, ..., tm)/gi(t1, ..., tm), it is perhaps natural to think of substituting the ideal ⟨x1g1 − f1, ..., xngn − fn⟩ into the above theorem. However, the result of this is usually not good. For instance, consider x = u²/v, y = v²/u, z = u. Here, I = ⟨vx − u², uy − v², z − u⟩, and the elimination yields I2 = ⟨z(x²y − z³)⟩. However, the correct result is V(x²y − z³). The computation has added an entire plane. The problem is that the entire variety of zero points of the denominators in the parametrizations of individual variables

(0, 0, 0), (−1, 0, 0), (0, −1, 0), (0, 0, −1) and (−1/2, −1/2, −1/2). □

is included in W = V(g1 ··· gn). Instead, perceive the parametrization F as a mapping F : (Kᵐ \ W) → Kⁿ. For the implicit representation, use the ideal I = ⟨g1x1 − f1, ..., gnxn − fn, 1 − g1 ··· gn y⟩ ⊆ K[y, t1, ..., tm, x1, ...
, xn], where the additional variable y enables avoidance of the zero set of the denominators. It can be shown that V(I_{m+1}) is the minimal affine variety which contains F(Kᵐ \ W).

Key to the exercises

12.A.2. A' A C.
12.A.3. (A NAND (B NAND B)).
12.A.31. E.g., {1, 2, 3, 12, 18}.
12.A.33. There are six isomorphism classes. In three of them ("Y, dual Y, and pentagon"), there is an element incomparable with two other ones, yielding 5! partial orders. In the other three ("X, house, and dual house"), there are two pairs of different incomparable elements, thus yielding only 5!/4 partial orders. Altogether, there are 450 partial orders.
12.B.7. 31.
12.B.8. 45.
12.B.9. 63.
12.B.10. 33.
12.C.1. Suppose the contrary, i.e., that the polynomial is a product of two polynomials with integer coefficients. Use induction to prove that all the coefficients of one of these polynomials are divisible by p (begin with the absolute term). However, then the leading coefficient of f(x) is also divisible by p.
12.C.12. Over ℝ: (x − 1)(2x² − x + 1)²; over ℂ: (x − 1)(x − ...).
12.C.13. Over ℝ: (x + 1)(x² + x + 2)²; over ℂ: (x + 1)(x + ...).
12.C.14. Over ℝ: (x² − 2x + 3)²; over ℂ: (x − 1 + √2 i)²(x − 1 − √2 i)².
12.C.15. x⁵ + 3x³ + x² + 2x + 1 = (x² + 1)(x³ + 2x + 1).
12.C.16. x⁴ + 2x³ + 2 is irreducible. It has no roots and it cannot be written as a product of two quadratic polynomials (this must be verified!).
12.E.3. i) not even a groupoid (the operation is not closed on the given set), ii) a non-commutative monoid, iii) a commutative group, iv) a non-commutative group, v) a commutative group, vi) a commutative monoid.
12.E.4. Since multiplication of complex numbers is commutative, it must remain so on any subset as well. The particular cases are: i) a monoid, ii) a group, iii) a group.
12.E.5.
i) a commutative semigroup, but not a monoid; ii) a non-commutative groupoid, but not a semigroup; iii) a non-commutative groupoid, but not a semigroup; iv) a non-commutative semigroup, but not a monoid; v) a commutative monoid, but not a group; vi) a commutative monoid, but not a group; vii) a non-commutative group.
12.E.8. It is a non-commutative group.
12.F.1. i) (1, 3, 5, 7, 2, 4, 6); ii) (1, 3, 2) ∘ (4, 6, 5), (1, 4, 2, 5, 3, 6), (1, 5, 2, 6, 3, 4), (1, 6, 2, 4, 3, 5); iii) none exists.
12.F.4. Due to the parity, no such permutation exists.
12.F.14. i) Yes ii) No iii) Yes iv) No v) Yes vi) No vii) No viii) No
12.F.15. i) Yes ii) No
12.F.16. m = 10. (Note that for m = 8 and m = 12, the resulting groups have the desired number of elements but are not isomorphic to Z*.)
12.F.42. This is generally not true. Consider e.g. Sn/An ≅ Z2, n ≥ 3.
12.F.47. The subgroup has four elements; the remaining one is the reflection with respect to the plane that is perpendicular to the former one and contains the axis of the rotation (it is isomorphic to the Klein group Z2 × Z2). It is not normal.
12.F.57. i) An isomorphism. ii) A homomorphism, neither surjective nor injective. iii) Not a homomorphism.
12.F.58. i) A surjective homomorphism, not injective. ii) A surjective homomorphism, not injective. iii) A homomorphism, neither surjective nor injective. iv) Not a homomorphism. v) A homomorphism, neither surjective nor injective. vi) A homomorphism, neither surjective nor injective.
12.F.59. i) A surjective homomorphism, not injective. ii) Not a homomorphism. iii) Not a homomorphism.
12.F.60. i) An injective homomorphism, not surjective. ii) Not a correct definition, since the result does not lie in the specified codomain A4.
12.F.61. i) An injective homomorphism, not surjective. ii) Not a homomorphism. iii) Not a homomorphism. iv) Not a homomorphism.
12.F.62. i) A homomorphism, neither injective nor surjective. ii) Not a mapping.
iii) A surjective homomorphism, not injective.
12.G.5. 36.
12.G.6. 477368.
12.G.7. 7.
12.H.2. 197216213.
12.H.7. 110.

CHAPTER 13

Combinatorial methods, graphs, and algorithms

Do we often prefer thinking in pictures? Yes, but we can compute discrete things only...

A. Fundamental concepts

One of the motives for creating graph theory was the visualization of certain problems concerning relations. A human brain likes thinking about entities it can imagine. Therefore, we like representing a binary relation with a graph whose vertices correspond to the elements and whose edges (lines between the elements) correspond to the fact that the given pair is related. Optionally, we can encode a relation in a more complicated way; we may use a Hasse diagram (see 12.1.8), for instance. Partially ordered sets are almost always depicted this way. The relation of friendship or acquaintance between people can also be translated to graphs. This gives rise to a good deal of "relaxing" problems.

13.A.1. Prove that the number of odd-degree vertices in an undirected graph is always even.

In this chapter, we return to problems concerning properties or mutual relations of (mainly) finite sets of objects. Combinatorial problems are already introduced in the second and third parts of chapter one. Like number theory, combinatorics is a field of mathematics where the problems can often be formulated very easily. On the other hand, solutions can be much more difficult to find. We begin with graph theory and display a collection of useful algorithms based on this theory. At the end of the chapter, methods of combinatorial computations are considered.

1. Elements of graph theory

13.1.1. Two examples. Several people come to a party; some pairs of people know each other, while other people know nobody. (Acquaintance is assumed to be symmetric.)
How many people must there be in order to guarantee that there are either three people who all know each other, or three people with no mutual acquaintance? Such situations can be aptly illustrated by a diagram. The points (or vertices) stand for the particular people at the party, the full lines represent pairs who know one another, while the dashed lines stand for pairs who do not know one another. Note that every pair of vertices is connected by either a full or a dashed line. The question is now reformulated as: how many vertices must there be in order that either there is a triangle whose sides are all full, or a triangle whose sides are all dashed? There is no such triangle in the left-hand diagram with four vertices. The example of a regular pentagon, in which all its outside edges are full, while all its diagonals are dashed (draw a picture!), shows that at least six vertices are required. Such a triangle always exists if the number of vertices is at least six. To show this, consider a set of six vertices, each pair of which is joined by either a dashed line or a full line.

Solution. Let G = (V, E) be an arbitrary undirected graph with degree d(v) at every vertex v ∈ V. Adding up the degrees of all the vertices, we count every edge twice, and therefore Σ_{v∈V} d(v) = 2|E|. Let V1 ⊆ V be the set of vertices with odd degrees and V2 ⊆ V the set of vertices with even degrees. Then 2|E| = Σ_{v∈V} d(v) = Σ_{v∈V1} d(v) + Σ_{v∈V2} d(v), and Σ_{v∈V1} d(v) = 2|E| − Σ_{v∈V2} d(v), which is an even number. Since every summand over V1 is odd, |V1| is even. □

13.A.2. Does there exist a graph with degree sequence (3, 3, 2, 2, 2, 1)?

Solution. In this sequence, the number of odd-degree vertices equals the odd number 3, which is impossible. □

13.A.3. Let G be a graph with minimum degree d(G) > 1. Prove that G contains a cycle of length at least d(G) + 1.

Solution. Let v1 ... vk be a maximal path in G, i.e., a path that cannot be extended.
Then any neighbour of v1 must lie on the path, since otherwise we could extend it. Since v1 has at least d(G) neighbours, the set {v2, ..., vk} must contain at least d(G) elements. Hence k ≥ d(G) + 1, so the path has length at least d(G). Now, the neighbour of v1 that is furthest from v1 along the path must be some vi with i ≥ d(G) + 1. Then v1 ... vi v1 forms a cycle of length at least d(G) + 1. □

13.A.4. Show that any graph with |V| ≥ 2 contains at least two vertices with equal degrees.

Solution. We prove this using the pigeonhole principle. Let |V| = n. The degree d(v) of any vertex v ∈ V can take values from 0 to n − 1, so there are n possible distinct values for d(v). However, if one of the degrees is d(v) = n − 1, which means that the vertex v is connected to every other vertex, then no other degree can be 0. Thus, d(v) can take no more than n − 1 distinct values, and by the pigeonhole principle, among the n values of d(v), at least two coincide. □

13.A.5. Show that if n people attend a party and some shake hands with others, then at the end, there are at least two people who have shaken hands with the same number of people.

Solution. Let G be a graph whose set of vertices V is the set of people. If two people shake hands, this is represented by an edge between the two corresponding vertices. By the previous problem, there are at least two vertices with equal degrees. □

If v is one of the vertices, then it is joined by five outgoing lines. At least three of these lines are of one type, without loss of generality full, joining v to vertices va, vb, vc. Then either the triangle formed by the vertices va, vb, vc contains only dashed lines, which is then the desired triangle, or one of its edges is full, in which case there is a full triangle. As another example, consider a black box which consumes one bit after another and shines in blue or in red according to whether the last bit is zero or one. Imagine this could be a light over the toilet door recognizing whether the last person came out (0) or in (1).
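The black box just described is a two-state finite automaton whose state is decided entirely by the last bit consumed; a minimal sketch (the state names and the function are mine, not the book's):

```python
def last_bit_light(bits):
    """Track the light over the door: 'blue' after a 0, 'red' after a 1.
    The initial state 'start' plays the role of the third vertex of the
    diagram, before the first bit is sent."""
    state = 'start'
    transitions = {0: 'blue', 1: 'red'}   # the last bit alone decides the state
    for b in bits:
        state = transitions[b]
    return state

assert last_bit_light([1, 0, 0]) == 'blue'   # last person came out
assert last_bit_light([0, 1]) == 'red'       # last person came in
assert last_bit_light([]) == 'start'         # nobody passed the door yet
```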
Again, this scheme can be illustrated by a diagram: The third vertex, which has only two outgoing arrows, represents the beginning of the system (before the first bit is sent). Both situations share the same scheme: there is a finite set of objects represented by vertices, and there is a set of their properties represented by connecting lines between particular vertices. The scheme can be modified by distinguishing the directions of the connecting lines by arrows. Such a situation can be described in terms of relations; see the text from subsection 1.6.1 on in the sixth part of chapter one. But this is a complicated terminology for describing simple situations: In the first case, there is one set of people with two complementary symmetric and non-reflexive relations. In the second case, there are two antisymmetric relations on three elements.

13.1.2. Fundamental concepts of graphs. We use the terminology which corresponds to the latter diagrams.

Graphs and directed graphs

Definition. A graph (also an undirected graph) is a pair G = (V, E), where V is the set of its vertices and E is a subset of the set of all 2-element subsets of V. The elements of E are called edges of the graph. The vertices of an edge e = {v, w}, v ≠ w, are called the endpoints of e. An endpoint of an edge is said to be incident to that edge. Two edges which share a vertex are called adjacent. Any two vertices which are the endpoints of an edge are called adjacent.

13.A.6. Show that if every connected component of a graph is bipartite, then the graph is bipartite.

Solution. A graph is bipartite if we can divide its vertices into two subsets A and B such that every edge of the graph connects a vertex of A to a vertex of B. If the vertex sets of the components are divided into sets Ai and Bi, we set A = ∪i Ai and B = ∪i Bi. The subsets A and B endow the graph with a bipartite structure. □

13.A.7.
Show that a graph is bipartite if and only if it contains no cycles of odd length.

Solution. Suppose there are no cycles of odd length in the graph G. Choose any vertex of the graph and put it in set A. Follow every edge from that vertex and put all vertices at the other end in set B. Erase all the vertices that have already been used. Now, follow all edges from each vertex in B and put the vertices on the other end in A, erasing all used vertices. Alternate back and forth in this manner until we cannot proceed. This happens when either we exhaust all the vertices, or we encounter a vertex that is already in one set and needs to be moved to the other. If the latter occurs, there is a walk of odd length from this vertex to itself, so there would be a cycle of odd length. If the graph is not connected, there may still be vertices that have not been assigned; repeat the same process until all vertices are assigned either to set A or to set B. Thus, the graph G is bipartite.

Suppose now that the graph G is bipartite. Let c be a cycle v_{i_1} ... v_{i_k} of length k. The vertices along this cycle alternate between the subsets A and B of V. Moreover, the last vertex v_{i_k} of c is adjacent to the first vertex v_{i_1}, so they lie in different sets. Therefore, there is the same number of A-vertices and B-vertices on c, and so k is even. □

13.A.8. Show that any tree with at least two vertices is bipartite.

Solution. Since a tree does not contain any cycles, it contains no odd-length cycles in particular, and therefore it is bipartite. □

13.A.9. Prove that if u is a vertex of odd degree in a graph, then there exists a path from u to some other vertex v of odd degree.

Solution. We build a path that does not reuse any edges. As we build the path, we erase the edges we have used in order not to use them again. Begin at vertex u and select an arbitrary path emanating from it. If, at any point, the path reaches a vertex of

A directed graph is a pair G = (V, E), where V is as above, but now E ⊆ V × V.
The first of the vertices that define an edge e = (v, w) is called the tail of the edge, and the other vertex is called its head. From the vertices' point of view, e is an outgoing edge of v and an ingoing edge of w. The directed edges are also called arcs or arrows. The head and the tail of a directed edge may be the same vertex; such an edge is called a loop. Two directed edges are called consecutive if the tail of one of them is the head of the other one. Similarly, two vertices which are the head and the tail of an edge are called consecutive. To every directed graph G = (V, E), its symmetrization can be assigned. This is an undirected graph with the same set of vertices as G. It contains an edge e = {v, w} if and only if at least one of the edges e' = (v, w) and e'' = (w, v) belongs to E. Graph theory provides an extraordinarily good language for thinking about procedures and deriving properties that concern finite sets of objects. Graphs are a good example of a compromise between the tendency to "think in diagrams" and precise mathematical formulations. The language of graph theory allows the adding of information about the vertices or edges in particular problems. For instance, the vertices of a graph can be "coloured" according to the membership of the corresponding objects in several (pairwise disjoint) classes. Or the edges can be labeled with several values, and so on. The existence of an edge between differently coloured vertices can indicate a "conflict". For example, if the vertices are coloured red and blue according to membership in two groups of people with different interests, and the edges represent adjacency at a dining table, then an edge connecting two differently coloured vertices can mean a potential conflict. Our first example from the previous subsection can thus be perceived as a graph with coloured edges.
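The six-vertex claim from the party example can also be confirmed by exhaustive search over all two-colourings of the edges of K5 and K6; a brute-force sketch (the function names and the colour labels 'f'/'d' for full/dashed are mine):

```python
from itertools import combinations, product

def has_mono_triangle(colouring, n):
    """Is there a triangle whose three edges all carry the same colour?"""
    return any(
        colouring[frozenset({a, b})]
        == colouring[frozenset({b, c})]
        == colouring[frozenset({a, c})]
        for a, b, c in combinations(range(n), 3)
    )

def ramsey_holds(n):
    """Does every 2-colouring of the edges of K_n contain a monochromatic
    triangle?  Checked by enumerating all 2^(n choose 2) colourings."""
    edges = [frozenset(e) for e in combinations(range(n), 2)]
    return all(
        has_mono_triangle(dict(zip(edges, colours)), n)
        for colours in product('fd', repeat=len(edges))
    )

assert not ramsey_holds(5)   # the pentagon colouring is a counterexample
assert ramsey_holds(6)       # six vertices always force a one-colour triangle
```

The enumeration for K6 covers 2^15 = 32768 colourings, which is still instantaneous; this brute force obviously does not scale, which is why the pigeonhole argument in the text is the right proof.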
The statement we have checked there reads as follows in the language of graph theory: The complete graph Kn with n ≥ 6 vertices, with all its edges labeled with two colours, always contains a triangle whose sides all have the same colour. The directed graph in the second example above, whose edges are labeled with zero or one, represents a simple finite automaton. This name reflects the idea that the graph describes a process which is, at any moment, in a state represented by the corresponding vertex. It changes to another state in a step represented by one of the outgoing edges of that vertex. The theory of finite automata is not considered here.

13.1.3. Examples of useful graphs. The simplest case of graphs are those which contain no edges. There is no special notation for them. At the other extreme is a graph which contains all possible edges. This is called a complete graph, denoted by Kn, where n is the number of vertices of the graph. The graphs K4 and

odd degree, we are done; but each time we arrive at a vertex t of even degree, we are guaranteed that there is another neighbouring vertex to which we can move further. By passing through t, we erase two edges at the vertex t: coming in and going out reduces its degree by two, so it remains even. In this way, there is always a way to continue when we arrive at a vertex of even degree. Since there are only a finite number of edges, the tour must end eventually, and the only way it can end is by arriving at a vertex of odd degree. □

13.A.10. If the distance d(u, v) between two vertices u and v that can be connected by a path in a graph is defined to be the length of the shortest path connecting them, then show that the distance function satisfies the triangle inequality: d(u, v) + d(v, w) ≥ d(u, w).

Solution.
If one simply connects a shortest path from u to v with a shortest path from v to w, one obtains a walk from u to w of length d(u, v) + d(v, w). Since d(u, w) is the length of a shortest path from u to w, it is at most this value, so the triangle inequality is satisfied. □

13.A.11. Show that in a directed graph where every vertex has the same number of incoming as outgoing edges, there exists an Eulerian path for the graph.

Solution. Suppose a path starts at some vertex v. The first edge on the path can be any edge emanating from v, which we delete from the set of edges available for further use. Whenever, following the path, we arrive at any other vertex u, we choose an edge that is still in the set of available edges going out of u. We stop when we can no longer choose such an edge. This can happen only if we have arrived again at v and all outgoing edges of v have been previously engaged in the path. Indeed, at any other vertex w, every arrival at w engages one incoming edge, which can be followed by an outgoing edge, as their numbers are the same. Hence, our path starts and ends at v and contains all edges of the graph. □

13.A.12. An n-cube Qn is a cube in n dimensions; it consists of vertices with coordinates 0 and 1 and of edges connecting neighbouring vertices, i.e., vertices that differ from each other in exactly one coordinate. Show that Qn possesses a Hamiltonian circuit.

Solution. We prove this by induction. The (k + 1)-dimensional cube Q_{k+1} can be considered as two copies of Qk,

K6 are presented in the introductory subsection. The graph K3 is called a triangle. An important type of graph is a path. This is a graph whose vertices can be ordered as (v0, ..., vn) so that E = {e1, ..., en}, where ei = {v_{i−1}, vi} for all i = 1, ..., n. A path graph of length n is denoted by Pn. If the first and last vertices of the path graph coincide (n ≥ 3), it is called a cycle graph of length n, denoted by Cn. The graphs K3 = C3, C5, and P5 are shown in the following diagram.
Another type of graph is the complete bipartite graph. Its vertices can be coloured with two (distinct) colours. All possible edges between vertices of different colours are present, but no other edges. Such a graph is denoted by K_{m,n}, where m and n are the numbers of vertices of the particular colours. The diagram below illustrates the graphs K_{1,3}, K_{2,3}, and K_{3,3}.

Another interesting example of a graph is the hypercube Hn in dimension n, whose vertices are the integers 0, ..., 2ⁿ − 1 and whose edges join those pairs of vertices whose binary expansions differ in exactly one bit. The following diagram depicts the hypercube H4, with labels of the vertices indicated. From the definition it follows that a hypercube of a given dimension can always be composed from two hypercubes of dimension one lower, connecting them with edges in an appropriate way. These new edges between the two disjoint copies of H3 are the dashed ones in the diagram. Obviously, the hypercube H4 can be similarly decomposed in several ways (just looking at one fixed bit position, as is done with the very first position in the diagram).

Here are two more examples. The first is the cycle ladder graph CLn with 2n vertices. This consists of two cycle graphs Cn whose vertices are connected by edges according to their order in the cycles. The second is the Petersen graph.

with every vertex of the first copy of Qk having the first coordinate 0, and 1 for the second copy of Qk. Consider some Hamiltonian cycle v_{i_1} ... v_{i_k} v_{i_1} in Qk. In the first copy of Qk, it gives the cycle 0v_{i_1} ... 0v_{i_k} 0v_{i_1}, and in the second copy, 1v_{i_1} ... 1v_{i_k} 1v_{i_1}. Consider the path 0v_{i_1} ... 0v_{i_k}, 1v_{i_k}, 1v_{i_{k−1}}, ..., 1v_{i_1}, 0v_{i_1}, which forms a Hamiltonian cycle on Q_{k+1}. □

13.A.13. Show that a tree with n vertices has exactly n − 1 edges.

Solution. We proceed by induction. Suppose that every tree with k vertices has precisely k − 1 edges.
If the tree T contains k + 1 vertices, we show that it contains a vertex with a single edge connected to it. If not, start at any vertex and follow edges, marking each vertex as we pass it. If we ever come to a marked vertex, there is a cycle among the edges, which is impossible. But since each vertex is assumed to have more than one edge incident to it, there is never a reason to stop at one, so we eventually encounter a marked vertex, which is a contradiction. Take the vertex with a single edge connecting to it, and delete it and its edge from the tree T. The new graph T0 has k vertices. It must be connected, since the only thing we removed was a vertex that was not connected to anything else, and all other vertices remain connected. If there were no cycles before, removing an edge certainly cannot produce a cycle, so T0 is a tree. By the induction hypothesis, T0 has k − 1 edges. But to convert T0 into T, we need to add one edge and one vertex, so T also satisfies the formula for the number of edges. □

13.A.14. If u and v are two vertices of a tree T, show that there is a unique path connecting them.

Solution. Since T is a tree, it is connected, and therefore there has to be at least one path connecting u and v. Suppose there are two different paths P and Q connecting u to v. Reverse Q to make a path Q' leading from v to u; then the walk made by concatenating P and Q' leads from u back to itself. Now, this walk PQ' is not necessarily a cycle, since it may use some of the same edges in both directions, but we assumed that there are some differences between P and Q. We can generate a simple cycle from PQ'. Begin at u and continue one vertex at a time until the paths P and Q' differ. At this point, the paths split.
Continue along both paths beyond the bifurcation point until they join again for the first time, and this must occur eventually, since we know

This is somewhat similar to CL5, yet it is actually the simplest counterexample for many propositions about graphs.

13.1.4. Morphisms of graphs. Mappings between the sets of vertices or edges which respect the considered structure are of great importance in graph theory. It is enough to consider mappings between the vertices only.

Morphisms of graphs

Definition. Let G = (V, E) and G' = (V', E') be two given graphs. A morphism (or homomorphism) f : G → G' is a mapping fV : V → V' between the sets of vertices such that if e = {v, w} is an edge in E, then e' = {fV(v), fV(w)} is an edge in E'.

In practice, there is no need to distinguish between the morphism f and the mapping fV. The definition is the same for directed graphs, using ordered pairs e = (v, w) as edges. In the case of undirected graphs, the definition implies that if f(v) = f(w) for distinct v, w ∈ V, then they are not connected by an edge. On the other hand, such an edge is admissible for directed graphs, provided the common image of the vertices has a loop. An important special case of a morphism of a graph G is one whose codomain is Km. Such a morphism is equivalent to a labeling of the vertices of G with m colours (or any other names of the vertices of Km) so that vertices of one colour are not adjacent. In this case, it is a (vertex) colouring of the graph G with m colours. If a morphism f : G → G' is a bijection between the sets of vertices such that the inverse mapping f⁻¹ is also a morphism, then f is called an isomorphism of graphs. Two graphs are isomorphic if they differ only in the labeling of the vertices. Every morphism of directed graphs is also a morphism of their symmetrizations. The converse is not true in general.
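The definition of a morphism, and the observation that a vertex colouring with m colours is exactly a morphism to Km, can be made concrete in a few lines; a dependency-free sketch (the function and the example graphs are mine, not the book's):

```python
def is_morphism(f, G_edges, H_edges):
    """Check that the vertex map f (a dict) is a graph morphism G -> H:
    every edge of G must be sent onto an edge of H."""
    return all(frozenset({f[u], f[v]}) in H_edges for (u, v) in G_edges)

# A proper 2-colouring of the 4-cycle C4 is exactly a morphism C4 -> K2.
C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
K2 = {frozenset({'red', 'blue'})}   # K2 has a single edge and no loops
colouring = {0: 'red', 1: 'blue', 2: 'red', 3: 'blue'}
assert is_morphism(colouring, C4, K2)

# Since K2 has no loop, a map sending two adjacent vertices to the same
# colour is not a morphism, i.e. not a proper colouring.
bad = {0: 'red', 1: 'red', 2: 'blue', 3: 'blue'}
assert not is_morphism(bad, C4, K2)
```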
There are simple and extraordinarily useful examples of graph morphisms: namely a path, a walk, and a cycle in a graph: 927 CHAPTER 13. COMBINATORIAL METHODS, GRAPHS, AND ALGORITHMS they are joined at the end. The two fragments of P and Q′ form a smaller cycle in the tree, which is impossible, since T is a tree. □ 13.A.15. If G is a connected graph and k ≥ 2 is the maximum path length, then any two paths in G with length k share at least one common vertex. Solution. Suppose not. Let P = v_{i_1} … v_{i_k} and Q = v_{j_1} … v_{j_k} be two paths of maximal length k that do not share any vertex. The vertices v_{i_1} and v_{j_1} are connected by a path R = u_1 … u_s, where u_1 = v_{i_1} and u_s = v_{j_1}. The path R coincides with P on some vertices, then it deviates from P; it may return to P again, then it deviates from P for the last time and connects to Q. Denote this last fragment of R by v_{i_r} = u_m u_{m+1} … u_{m+t} = v_{j_l}, where t ≥ 1. Let P′ be the longer of the two parts of P ending at v_{i_r}, let T′ = u_m … u_{m+t}, and let Q′ be the longer of the two parts of Q starting at v_{j_l} (going forward or backward along Q). Since P′ and Q′ each have length at least k/2, the concatenation of P′, T′ and Q′ is a path of length at least k + 1, a contradiction. □ 13.A.16. Solution. □ 13.A.17. In a dormitory, there is a party held every night. Every time, the organizer of the party invites all of his/her acquaintances so that at the end of the party, all of the guests know each other. Suppose that each member of the dormitory has organized a party at least once, yet there are still two students who do not know each other. Show that they will not meet at the next party. Solution. Consider the acquaintance graph of the students at the beginning (the vertices correspond to the students, and the edges to the acquaintances). We are going to show that if two students lie in the same connected component of this graph (i. e., there exists a chain of acquaintances beginning with one of the considered students and ending with the other one), see 13.1.10, then they will know each other as soon as each member of the dormitory has held a party.
Indeed, consider the shortest path (acquaintance chain) between two students that lie in the same connected component. Every time someone from this path organizes a party, the path becomes shorter by one (the organizer falls out, since his or her neighbours on the path now know each other). Since we assume that each of the students on the path has organized a party, the students at the ends of the path must know each other as well. Therefore, if there are two students who do not know each other even after everyone has held a party, then they lie in different connected components Walks, Paths, Trails, and Cycles A walk of length n in a graph G is a morphism s : P_n → G. Both vertices and edges may repeat in the image. A trail is a walk where vertices are allowed to repeat, but edges are not allowed to repeat. A path of length n in a graph G is any morphism p : P_n → G such that p is an injective mapping; the images of the vertices v_0, …, v_n of P_n are pairwise distinct. A cycle of length n in a graph G is any morphism c : C_n → G such that c is an injective mapping on the vertices. For simplicity, the morphism is often identified with its image. Walks are often written explicitly in the form (v_0, e_1, v_1, …, e_n, v_n), where e_i = {v_{i−1}, v_i} for i = 1, …, n. A walk can be thought of as the trajectory of a "pilgrim" moving from the vertex f(v_0) to the vertex f(v_n), not stopping at any vertex of an (undirected) graph. P_n always contains an edge connecting the adjacent vertices v_{i−1} and v_i, while loops are not admitted in undirected graphs. The pilgrim can enter a vertex more than once or even go along an edge already visited. The pilgrim making a "trail" is a little wiser: he does not go along an edge already visited for the second time on his walk from the initial vertex f(v_0) to the terminal vertex f(v_n). 13.1.5. Subgraphs. The images of paths, walks, and cycles are examples of subgraphs, but not in the same way. Subgraphs Definition. A graph G′ = (V′, E′) is a subgraph of a graph G = (V, E) if and only if V′ ⊆ V and E′ ⊆ E.
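The walk/trail/path hierarchy defined above can be sketched in code. The following is our own minimal illustration (not from the book), classifying a vertex sequence by the strongest notion it satisfies, for an undirected graph given as a set of edges:

```python
# A sketch of the walk/trail/path distinction for undirected graphs.
# A path has pairwise distinct vertices; a trail may repeat vertices
# but not edges; a walk may repeat both.

def classify(seq, edges):
    """Return 'path', 'trail', 'walk', or None for a vertex sequence."""
    used = [frozenset((u, v)) for u, v in zip(seq, seq[1:])]
    if any(e not in edges for e in used):
        return None                      # consecutive vertices not adjacent
    if len(set(seq)) == len(seq):
        return "path"                    # all vertices pairwise distinct
    if len(set(used)) == len(used):
        return "trail"                   # vertices repeat, edges do not
    return "walk"                        # even edges repeat

E = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4), (4, 1), (2, 4)]}
print(classify([1, 2, 3, 4], E))     # path
print(classify([1, 2, 4, 3, 2], E))  # trail (vertex 2 repeats)
print(classify([1, 2, 1], E))        # walk (edge {1,2} used twice)
```

A cycle would correspond to a closed sequence whose edges and inner vertices are distinct; the classifier above reports it as a trail, since its first and last vertices coincide.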
Consider a graph G = (V, E). Choose a subset V′ ⊆ V. The largest subgraph (with respect to the number of edges) with V′ as its set of vertices is called an induced subgraph. It is the graph G′ = (V′, E′) where an edge e ∈ E belongs to E′ if and only if both of its endpoints lie in V′. Therefore, the set E′ of G′'s edges is given as the intersection E ∩ \binom{V′}{2}. A spanning subgraph (also called a factor) of a graph G = (V, E) is any graph G′ = (V, E′) with the same vertex set as G, while the set E′ ⊆ E of edges may be arbitrary. A clique is a subgraph of the graph G which is isomorphic to a complete graph. Every subgraph can be constructed by a step-by-step application of these two cases: first, select V′ ⊆ V, then choose the target edge set E′ in the subgraph induced on V′. Every image of a homomorphism (vertices as well as edges) forms a subgraph. of the graph, so they will never meet at a party (in particular, not at the upcoming one). □ 13.1.6. How many non-isomorphic graphs are there? Now, we are going to practice the fundamental concepts of graph theory on simple combinatorial problems. 13.A.18. Determine the number of edges of each of the graphs K6, K5,6, C8. Solution. The complete graph K6 on 6 vertices has \binom{6}{2} = 15 edges. The complete bipartite graph K5,6 (see 13.1.3) has 5 · 6 = 30 edges. Finally, the cycle graph C8 has 8 edges. □ 13.A.19. Degree sequence. Verify whether each of the following sequences is the degree sequence (see 13.1.7) of some graph. If so, draw one of the corresponding graphs. i) (1,2,3,4,5,6,7,8,9), ii) (1,1,1,2,2,3,4,5,5). Solution. First of all, we should check the necessary condition from (1). In the former case, we have 1 + ⋯ + 9 = ½ · 9 · 10 = 45, which is odd, so the condition is not satisfied. Therefore, the first sequence does not correspond to any graph. As for the latter sequence, the sum of the wanted degrees equals 24, so the necessary condition is satisfied.
Now, we proceed by the Havel–Hakimi theorem from subsection 13.1.7, sorting the sequence after each reduction step: (1,1,1,2,2,3,4,5,5) → (1,1,1,1,1,2,3,4) → (1,1,1,0,0,1,2) → (0,0,1,1,1,1,2) → (0,0,1,1,0,0) → (0,0,0,0,1,1) → (0,0,0,0,0). Of course, it was not necessary to execute the procedure to the very end. We could have finished as soon as we saw that the obtained sequence indeed is the degree sequence of some graph. Now, we construct the corresponding graph "backwards" (however, we must take care to always add edges to vertices of appropriate degrees; it is at this place where we have options and can obtain non-isomorphic graphs with the same degree sequence). One of the possible outcomes is the following graph (the order in which each vertex was selected is written inside it): It is easy to draw all graphs (up to isomorphism) with a predetermined small number of vertices (three or four, for instance). Generally, this is a complicated combinatorial problem. It is often difficult to decide whether or not two given graphs are isomorphic. Remark. This problem, known as the Graph isomorphism problem, is a somewhat peculiar member of the class NP¹: it is known neither whether it is NP-complete nor whether it can be solved in polynomial time. It is a special case of the problem of deciding whether or not a given graph is isomorphic to a subgraph of another graph. This Subgraph isomorphism problem is known to be NP-complete. It is difficult to answer precisely the question at the beginning of this subsection. There are the same number of graphs on a given set of n vertices as the number of subsets of the edge set, and a k-element set has 2^k subsets. There are at most n! graphs isomorphic to a given one, since this is the number of bijections between n-element sets. It follows that there are at least k(n) = 2^{\binom{n}{2}}/n! pairwise non-isomorphic graphs. From this, log₂ k(n) = \binom{n}{2} − log₂ n! ≥ (n²/2)(1 − 1/n − (2 log₂ n)/n) follows, since log₂ n! ≤ n log₂ n.
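The lower bound above can be compared with exact counts for very small n. The following brute-force enumeration is our own check (not from the book): it lists all 2^6 graphs on 4 labelled vertices and identifies isomorphic ones by taking a canonical form over all 4! vertex permutations.

```python
# Count pairwise non-isomorphic graphs on n = 4 vertices by brute force.
from itertools import combinations, permutations

n = 4
pairs = list(combinations(range(n), 2))          # the 6 possible edges

def canonical(edges):
    """Lexicographically smallest relabelling of the edge set."""
    best = None
    for p in permutations(range(n)):
        img = tuple(sorted(tuple(sorted((p[u], p[v]))) for u, v in edges))
        if best is None or img < best:
            best = img
    return best

classes = set()
for k in range(len(pairs) + 1):
    for edges in combinations(pairs, k):
        classes.add(canonical(edges))

print(len(classes))    # 11 non-isomorphic graphs on 4 vertices
print(2 ** 6 // 24)    # the crude bound 2^C(4,2)/4! gives only 2
```

For n = 4 the estimate 2^{\binom{n}{2}}/n! is very weak; it only becomes sharp in the logarithmic sense as n grows, as the text explains.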
(See the notation for asymptotic bounds in subsection 6.1.16 on page 389.) Thus the logarithm of the number of non-isomorphic graphs grows at least as fast as n². 13.1.7. Vertex degree and degree sequence. It is relatively easy to verify that two given graphs are not isomorphic. Since isomorphic graphs differ in the relabeling of the vertices only, they share all numerical and other characteristics which are not changed by the relabeling. Simple data of this type includes, for instance, the number of edges incident to particular vertices. ¹Wikipedia, NP (complexity), http://en.wikipedia.org/wiki/NP_(complexity) (as of Aug. 7, 2013, 13:44 GMT). 13.A.20. Find the number of pairwise non-isomorphic complete bipartite graphs with 1001 edges. Solution. A complete bipartite graph K_{m,n} has m · n edges. Therefore, the problem can be stated as follows: in how many ways can we write the integer 1001 as the product of two integers? Since 1001 = 7 · 11 · 13, we get 1001 = 1 · 1001 = 7 · (11 · 13) = 11 · (7 · 13) = 13 · (7 · 11). Thus, there are four non-isomorphic complete bipartite graphs having 1001 edges: K_{1,1001}, K_{7,143}, K_{11,91} and K_{13,77}. □ 13.A.21. Find the number of graph homomorphisms (see 13.1.4) a) from P2 to K5, b) from K3 to K5. Solution. We can see from the definition of a graph homomorphism that the only condition which must be satisfied is that adjacent vertices must not be mapped to the same vertex. a) 5 · 4 · 4 = 80. b) 5 · 4 · 3 = 60. □ 13.A.22. Number of walks. Using the adjacency matrix (see 13.1.8), find the number of walks of length 4 from vertex 1 to vertex 2 in the following graph: Vertex degree and degree sequence The degree of a vertex v ∈ V in a graph G = (V, E) is the number of edges from E incident to v. It is denoted by deg v. The degree sequence of a graph G with vertices V = (v_1, …, v_n) is the sequence (deg v_1, deg v_2, …, deg v_n).
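The Havel–Hakimi reduction used in 13.A.19 can be sketched as follows. This is our own illustration of the procedure (not the book's code); it repeatedly removes the largest degree d and decrements the d rightmost remaining values, exactly as in the worked example:

```python
# Havel-Hakimi test: is a given sequence the degree sequence of some graph?

def is_degree_sequence(seq):
    seq = sorted(seq)                 # keep the sequence in ascending order
    while seq:
        d = seq.pop()                 # remove the largest value d
        if d == 0:
            return True               # all remaining degrees are zero
        if d > len(seq):
            return False              # not enough vertices to attach to
        for i in range(len(seq) - d, len(seq)):
            seq[i] -= 1               # subtract one from the d rightmost values
            if seq[i] < 0:
                return False
        seq.sort()                    # re-sort before the next step
    return True

print(is_degree_sequence([1, 1, 1, 2, 2, 3, 4, 5, 5]))   # True, as in 13.A.19
print(is_degree_sequence([1, 2, 3, 4, 5, 6, 7, 8, 9]))   # False (the sum is odd)
```

Reversing the successful reductions, adding one vertex per step, reconstructs one of the possibly many graphs with the given degree sequence.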
Sometimes, it is required that the sequence be sorted in ascending or descending order rather than correspond to the selected order of vertices. In the case of directed graphs, we distinguish between the indegree deg₊ v of a vertex v and its outdegree deg₋ v. A directed graph is said to be balanced if and only if deg₋ v = deg₊ v for all vertices v. The degree sequence of a graph (and of its isomorphic copies) is unique up to permutation. Therefore, if the degree sequences of two graphs differ by more than a permutation, then the graphs are not isomorphic. The converse statement is not true in general. Two non-isomorphic graphs with the same degree sequence are the graph G = C3 ∪ C3, which has degree sequence (2,2,2,2,2,2), and C6. However, C6 contains a path of length 5, but C3 ∪ C3 does not contain a path of length 5. Therefore, these two graphs cannot be isomorphic. Since every edge has two endpoints, it is counted twice in the sum of the degree sequence (this observation is sometimes known as the handshaking lemma). It follows that (1) Σ_{v∈V} deg v = 2|E|. In particular, the sum of the degree sequence must be even. The following theorem² of Havel and Hakimi is a first result about operations with graphs. The proof is constructive. It describes an algorithm for constructing a graph with a given degree sequence if there is one, or shows that there is no such graph. Deciding about a given degree sequence Theorem. For any natural numbers 0 ≤ d_1 ≤ ⋯ ≤ d_n, there exists a graph G on n vertices with the above values as its degree sequence if and only if there exists a graph on n − 1 vertices with degree sequence (d_1, d_2, …, d_{n−d_n−1}, d_{n−d_n} − 1, …, d_{n−1} − 1). Proof. If there exists a graph G′ on n − 1 vertices with degree sequence as stated in the theorem, then a new vertex v_n can be added to G′. Connect v_n with edges to the last d_n vertices of G′, thereby obtaining a graph G with the desired degree sequence. ²Proved independently by Václav J.
Havel in 1955 in the Časopis pro pěstování matematiky (in Czech) and S. L. Hakimi in 1962 in the Journal of the Society for Industrial and Applied Mathematics. Solution. Write down the adjacency matrix A_G of the given graph. The number of walks of length 4 from vertex 1 to vertex 2 is the entry at position [1, 2] in the matrix A_G^4. Computing A_G^2 and taking the scalar product of its first row with its second column, we have (A_G^4)_{1,2} = (2,1,1,2,2) · (1,4,3,2,2)ᵀ = 17. Therefore, there are 17 walks of length 4 between the vertices 1 and 2. □ 13.A.23. A cut edge (also called a bridge) in a graph is an edge whose removal increases the number of connected components of the graph. Similarly, a cut vertex (also called an articulation point) is a vertex with this property, i. e., when it is removed (with the edges incident to it, of course), the graph splits up into more connected components. Find all cut edges and cut vertices of the following graph: ○ 13.A.24. Prove that a Hamiltonian graph (see 13.1.13) must be 2-vertex-connected. Give an example of a graph which is 2-vertex-connected yet does not contain a Hamiltonian cycle. Solution. Considering any pair of vertices in a Hamiltonian graph, there are two disjoint (except for the two vertices) paths between them (the "arcs" of the Hamiltonian cycle). Therefore, if we remove one of the vertices, the graph clearly remains connected (the removed vertex lies on only one of the two paths). As for the example of a non-Hamiltonian graph which is 2-vertex-connected, we can recall the Petersen graph (see the picture at the beginning of this chapter). □ The reverse implication is more difficult. The following needs to be proved. Suppose a fixed degree sequence (d_1, …, d_n) with 0 ≤ d_1 ≤ ⋯ ≤ d_n is given. Then there exists a graph whose vertex v_n is adjacent to exactly the last d_n vertices v_{n−d_n}, …, v_{n−1}. The idea is simple: if any of the last d_n vertices is not adjacent to v_n, then v_n must be adjacent to one of the prior vertices.
The idea is to interchange the endpoints of two edges so that v_n and the missing vertex become adjacent while the degree sequence remains unchanged. Technically, this can be done as follows: Consider all graphs G with the given degree sequence and let, for each G, ν(G) denote the greatest index of a vertex which is not adjacent to the vertex v_n. Fix G to be such that ν(G) is as small as possible. Then, either ν(G) = n − d_n − 1 (and the desired graph is obtained) or ν(G) ≥ n − d_n. If the latter is true, then v_n is adjacent to one of the vertices v_i with i < ν(G). Since deg v_{ν(G)} ≥ deg v_i, there exists a vertex v_ℓ which is adjacent to v_{ν(G)} but not to v_i. Replace the edge {v_ℓ, v_{ν(G)}} by {v_ℓ, v_i} as well as {v_i, v_n} by {v_{ν(G)}, v_n}, to get a graph G′ with the same degree sequence, but with ν(G′) < ν(G), which contradicts the choice of G. (Draw a diagram!) Therefore, the former possibility is true. So the graph is created by adding the last vertex and connecting it to the last d_n vertices with edges. □ The procedure reveals that the degree sequence of a graph falls far short of determining the graph. The theorem describes an exact procedure for constructing a graph with a given degree sequence. If there is no such graph, the algorithm so indicates during the computation. Begin with the degree sequence in (say) ascending order. Then delete the largest value d_n and subtract one from the d_n remaining values on the very right. Then sort the obtained degree sequence and continue with the above step until either there is an evident example of a graph with the current degree sequence or the degree sequence does not correspond to any graph. If, eventually, a graph is constructed after a number of steps, then one can reverse the procedure, adding one vertex in each step, connected to those vertices where ones were subtracted during the procedure. (Try examples by yourself!) The algorithm constructs only one of the many graphs which may share the same degree sequence. 13.1.8. Matrix representation.
The efficiency of graph representations is of importance for running algorithms. One of them is useful in theoretical considerations: 13.A.25. Determine the number of cycles (see 13.1.3) in the graph K5. Solution. We sort the cycles by their lengths, i. e., we count separately the numbers of cycles upon three, four, and five vertices. A cycle of length three is determined uniquely by its three vertices, and there are \binom{5}{3} ways to choose them. A cycle of length four is determined by its four vertices (which can be chosen in \binom{5}{4} ways) and by the pair of neighbors of a fixed vertex (the pair can be selected from the remaining three vertices in \binom{3}{2} ways). Finally, a cycle of length five is given by the pair of neighbors of a fixed vertex as well as by the other neighbor (from the two remaining vertices) of a fixed vertex of this pair. Altogether, there are \binom{5}{3} + \binom{5}{4} · \binom{3}{2} + \binom{4}{2} · 2 = 10 + 15 + 12 = 37 cycles. □ 13.A.26. Determine the number of subgraphs (see 13.1.4) of the graph K5. Solution. Again, we count the number of subgraphs separately by the number v of their vertices: • v = 0. There is a unique graph on 0 vertices, the empty graph. • v = 1. There are 5 ways of selecting 1 vertex, resulting in 5 subgraphs. • v = 2. Two vertices can be chosen in \binom{5}{2} ways. Further, there may or may not be an edge between them. Altogether, we get \binom{5}{2} · 2 such subgraphs. • v = 3. Three vertices can be selected in \binom{5}{3} ways. For each pair of them, there may or may not be an edge. This results in \binom{5}{3} · 2^{\binom{3}{2}} subgraphs. • v = 4. Here, we calculate \binom{5}{4} · 2^{\binom{4}{2}} subgraphs. • v = 5. Finally, in this case, there are \binom{5}{5} · 2^{\binom{5}{2}} subgraphs. Altogether, we have found 1 + 5 + 20 + 80 + 320 + 1024 = 1450 subgraphs of the graph K5. □ 13.A.27. Determine the number of paths between a fixed pair of different vertices in the graph K7. Solution. We sort the paths by their lengths. There is a unique path of length one (it consists of the edge that connects the selected vertices).
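The count in 13.A.26 can be cross-checked by brute force. This is our own verification sketch (not from the book); it evaluates the closed formula and also enumerates the (vertex set, edge set) pairs explicitly:

```python
# Two independent counts of the subgraphs of K5: a subgraph is a vertex
# subset W together with an arbitrary subset of the C(|W|,2) edges inside W.
from itertools import combinations
from math import comb

total = sum(comb(5, v) * 2 ** comb(v, 2) for v in range(6))
print(total)    # 1450

count = 0
for v in range(6):
    for W in combinations(range(5), v):
        inner_pairs = list(combinations(W, 2))
        count += 2 ** len(inner_pairs)   # any edge subset inside W
print(count)    # 1450
```

Both computations agree with the term-by-term sum 1 + 5 + 20 + 80 + 320 + 1024 in the solution.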
There are five paths of length two (such a path may lead through any of the five remaining vertices). There are 5 · 4 paths of length three (we select the two vertices through which it leads, and their order matters). Similarly, there are 5 · 4 · 3 paths of length four, 5 · 4 · 3 · 2 paths of length five, and Adjacency matrix The adjacency matrix of the (undirected) graph G = (V, E) is defined as follows. Fix a (total) ordering of its vertices, V = (v_1, …, v_n). Define the matrix A_G = (a_{ij}) over ℤ₂ (the entries are zeros and ones) by a_{ij} = 1 if the edge e_{ij} = {v_i, v_j} lies in E, and a_{ij} = 0 if {v_i, v_j} ∉ E. It is recommended to write explicitly the adjacency matrices of the graphs mentioned at the beginning of this chapter! By definition, adjacency matrices are symmetric. There are straightforward generalizations of this concept for more general graphs. For oriented edges, the direction may be indicated by the sign; multiple edges might be encoded by appropriate integers, etc. If the matrix is stored in a two-dimensional array, then this method of graph representation is very inefficient. It consumes O(n²) memory. However, if the graph is rather sparse, i.e. there are only a few edges, then almost all of the entries of the matrix are zeros. There are many methods of storing such matrices more efficiently. The matrix representation of graphs is suggestive of linear algebra considerations. For example, there is the following beautiful theorem: Theorem. Let G = (V, E) be a graph with vertices ordered as V = (v_1, …, v_n), and let A_G = (a_{ij}) be its adjacency matrix. Further, let A_G^k = (a_{ij}^{(k)}) denote the k-th power of the matrix A_G. Then a_{ij}^{(k)} is the number of walks of length k between the vertices v_i and v_j. Proof. The proof is by induction on the length of the walks. For k = 1, the statement is simply a reformulation of the definition of the adjacency matrix. Suppose the proposition holds for a fixed positive integer k.
Examine the number of walks of length k + 1 between the vertices v_i and v_j for some fixed indices i and j. Each such walk can be obtained by attaching an edge from v_i to a vertex v_l to a walk of length k between v_l and v_j. Further, each walk of length k + 1 can be obtained uniquely in this way. Therefore, if a_{lj}^{(k)} denotes the number of walks of length k from v_l to v_j, then the number of walks of length k + 1 is a_{ij}^{(k+1)} = Σ_l a_{il} · a_{lj}^{(k)}. This is exactly the formula for the product of the matrix A_G and the power A_G^k. It follows that the entries of the matrix A_G^{k+1} are the integers a_{ij}^{(k+1)}. □ Corollary. If G = (V, E) and A_G are as above, then each pair of vertices in G is connected by a path if and only if the matrix (A_G + E_n)^{n−1} has only positive entries (E_n is the n-by-n identity matrix). also 5! paths of length six. Clearly, there are no longer paths in K7. Altogether, we have 1 + 5 + 5 · 4 + 5 · 4 · 3 + 5 · 4 · 3 · 2 + 5! = 326 paths. □ At the end of this subsection, we present one more amusing problem. 13.A.28. The towns of a certain country are connected with roads. Each town is directly connected to exactly three other towns. Prove that there exists a town from which we can make a sightseeing tour (returning to the starting town) such that the number of roads we use is not divisible by three. Solution. First of all, we reformulate this problem in the language of graph theory. Our task is to prove that every 3-regular graph (i. e., one where the degree of every vertex equals three) contains a cycle whose length is not divisible by three. We will proceed by induction, and actually, we will prove a stronger proposition: every graph each of whose vertices has degree at least three contains a cycle whose length is not divisible by three. In fact, the original proposition could not be proved by induction directly, since the induction hypothesis would be too weak. The induction is carried on the number k of vertices of the graph. Clearly, the statement holds for k = 4.
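The walk-counting theorem and its corollary can be checked numerically. The example graph below is our own assumption (a triangle with a pendant vertex), not the graph of 13.A.22, and the code is an illustration rather than the book's:

```python
# Check that (A^k)[i][j] counts walks of length k, and that the graph is
# connected iff (A + E)^(n-1) has only positive entries.

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(A, k):
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]   # identity
    for _ in range(k):
        R = matmul(R, A)
    return R

# Triangle 0-1-2 with an extra vertex 3 attached to vertex 2.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]

print(matpow(A, 2)[0][1])   # 1: the only walk of length 2 from 0 to 1 is 0-2-1

n = len(A)
M = [[A[i][j] + int(i == j) for j in range(n)] for i in range(n)]
P = matpow(M, n - 1)
print(all(P[i][j] > 0 for i in range(n) for j in range(n)))   # True: connected
```

Removing the edges of vertex 3 and re-running the last check would make some entries of (A + E)^{n−1} vanish, signalling the disconnection.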
Now, consider a graph where the degree of each vertex is at least three, and suppose that the statement is true for any such graph on fewer vertices. The reader should be able to prove that there exists a cycle in the graph. If its length is not divisible by three, we are done. Thus, suppose from now on that the cycle is v_1 v_2 … v_{3n}. Each vertex of this cycle is connected to at least one more vertex (different from its neighbors on the cycle) in the graph. If there is a vertex v_i on the cycle that is connected to a vertex v_j on the cycle (j > i + 1), then the lengths of the cycles v_1 v_2 … v_i v_j v_{j+1} … v_{3n} and v_i v_{i+1} … v_j total up to 3n + 2, so the length of at least one of them is not divisible by three, as wanted. The situation is similar if there are two vertices v_i and v_j, 1 ≤ i < j ≤ 3n, which are connected to the same vertex outside the cycle. Proof. (A_G + E_n)^{n−1} = A_G^{n−1} + \binom{n−1}{1} A_G^{n−2} + ⋯ + \binom{n−1}{n−2} A_G + E_n. The entries of the resulting matrix are (using the notation as above) a_{ij}^{(n−1)} + \binom{n−1}{1} a_{ij}^{(n−2)} + ⋯ + (n−1) a_{ij} + δ_{ij}, where δ_{ii} = 1 for all i, and δ_{ij} = 0 for i ≠ j. This gives the sum of the numbers of walks of length 0, …, n − 1 between the vertices v_i and v_j, multiplied by positive constants. Therefore, it is non-zero if and only if there is a path between these vertices. □ 13.1.9. Remark. Observe how permuting the vertices of V affects the adjacency matrix of the corresponding graph. It is not hard to see that each such permutation permutes both the rows and the columns of the matrix A_G in the same way. Such a permutation can be given uniquely by a permutation matrix, each of whose rows and columns contains only zeros except for one entry, which is 1. If P is a permutation matrix, then the new adjacency matrix of the isomorphic graph G′ is A_{G′} = P · A_G · Pᵀ (the dots stand for matrix multiplication). The transposed matrix Pᵀ is also the inverse matrix to P, since permutation matrices are orthogonal.
Every permutation can be written as a composition of transpositions; hence every permutation matrix can be obtained as the product of the matrices corresponding to the transpositions. Of course, this is exactly how the matrices of linear mappings change under a change of basis. Understanding the adjacency matrix as a linear mapping is often useful. For example, the adjacency matrix may be thought of as acting on vectors of zeros and ones (imagine the ones indicating the active vertices of interest) and yielding vectors of integers (showing how many times the given vertices are arrived at from all active vertices along the edges in one step). This observation also shows that the question whether two adjacency matrices describe isomorphic graphs is equivalent to asking for the equivalence of the matrices via a permutation matrix P. 13.1.10. Connected components of a graph. Every graph G = (V, E) naturally partitions into disjoint subgraphs G_i such that two vertices v ∈ G_i and w ∈ G_j are connected by a path if and only if i = j. This procedure can be formalized as follows: Let G = (V, E) be an undirected graph. Define a relation ∼ on the set V: set v ∼ w for vertices v, w ∈ V if and only if there exists a path from v to w in G. This relation is clearly a well-defined equivalence relation. Every class [v] of this equivalence determines the induced subgraph G_{[v]} ⊆ G, and the (disjoint) union of these subgraphs actually gives the original graph G. According to the definition of an equivalence Therefore, suppose that each vertex of the cycle is connected to some vertices outside the cycle and no two vertices of the cycle are adjacent to the same vertex outside it. Then, we can consider the graph which is obtained from the original one by replacing the vertices v_1, v_2, …, v_{3n} with a single vertex V.
In this new graph, there are also at least three edges leading from each vertex (including V), so we can apply the induction hypothesis to it. Therefore, there is a cycle w_1 w_2 … w_k with 3 ∤ k. If it does not contain the vertex V, then it is a cycle in the original graph as well. If it does, then we proceed analogously as above: we consider two cycles whose lengths sum up to 3n + 2k, so the length of at least one of them is not divisible by three. We have found the wanted cycle in every case, which finishes the proof. □ B. Fundamental algorithms Let us begin with breadth-first search and depth-first search, which serve as a basis for more sophisticated algorithms. Their actual implementations may differ; therefore, the answers to the following problems may be ambiguous. 13.B.1. Consider a graph on six vertices which are labeled 1, 2, …, 6. A pair of vertices is connected with an edge if and only if the sum of their labels is odd. Describe the run of the breadth-first search algorithm on this graph. Which of the edges is visited at the end, provided the search is initiated at vertex 5 and the neighbors of a given vertex are visited in ascending order? Solution. The algorithm starts at vertex 5 and goes along the edges (5,2), (5,4), (5,6), thereby visiting the vertices 2, 4, 6 (the queue of vertices to be processed is 2, 4, 6). The first vertex to have been visited is 2, so the algorithm continues the search from there, i. e., vertex 5 is processed and vertex 2 becomes active. The algorithm goes along the edges (2,1), (2,3), (2,5) (the last one has already been used), thereby visiting the vertices 1 and 3 (the queue of vertices to be processed is 4, 6, 1, 3). Now, vertex 2 becomes processed and the first unprocessed vertex to have been visited becomes active. That is vertex 4. The algorithm discovers the edges (4,1) and (4,3), yet no new vertices.
Vertex 4 becomes processed and relation, no edge of the original graph can connect vertices that belong to different components. The subgraphs G_{[v]} are called the connected components of the graph G. A graph G = (V, E) is said to be connected if and only if it has exactly one connected component. If the graph G is directed, then the definition is analogous to the case of undirected graphs; it is only required that there exist both a path from v to w and a path from w to v in order for the pair (v, w) to be related. Using this definition, strongly connected components can be discussed. On the other hand, it may only be required that the symmetrization of the graph be connected (in the undirected sense); then weak connectedness can be discussed. 13.1.11. Multiply connected graphs. It is useful to consider the concept of connectedness in a much stronger sense, i.e. to enforce a certain redundancy in the number of paths between vertices. Definition. An (undirected) graph G = (V, E) is said to be • k-vertex-connected if and only if it has at least k + 1 vertices and remains connected whenever any k − 1 vertices are removed; • k-edge-connected if and only if it has at least k edges and remains connected whenever any k − 1 edges are removed. In the case k = 1, the definition simply says that the graph is connected (in both cases), since the condition is vacuously true. Stronger graph connectedness is desirable in networks supporting various services (roads, pipelines, internet connection, etc.) where the clients prefer considerable redundancy of the provided service for the case that several connections in the network (i.e. edges in a graph) or nodes in the network (vertices in a graph) break down. In general, Menger's theorem³ holds. It says that for every pair of vertices v and w, the number of pairwise edge-disjoint paths from v to w equals the minimum number of edges that must be removed so as to leave v and w in different components of the new graph.
Similarly, the number of pairwise vertex-disjoint paths from v to w equals the minimum number of vertices that must be removed in order to disconnect v from w. We return to this topic in subsection 13.2.13. Right now, we consider the simplest interesting case in detail. These are graphs (on at least three vertices) such that deleting any one vertex does not destroy the connectedness. Theorem. If G = (V, E) has at least three vertices, then the following conditions are equivalent: • G is 2-vertex-connected; • every pair of vertices v and w in G lies on a common cycle; • the graph G can be constructed from the triangle K3 by repeatedly adding and splitting edges. ³Karl Menger proved this as early as 1927; that is, before graph theory came into being. vertex 6 becomes active. This leads to the discovery of the edges (6,1) and (6,3). If the algorithm knows the number of edges in the graph, it terminates at this moment. Otherwise, it goes through the vertices 1 and 3, finding out that there are no new edges or vertices, and then it terminates. In either case, the last edge to have been discovered is (3,6). □ 13.B.2. Consider a graph on six vertices which are labeled 1, 2, …, 6. A pair of vertices is connected with an edge if and only if the sum of their labels is odd. Describe the run of the depth-first search algorithm on this graph. Which of the edges is visited at the end, provided the search is initiated at vertex 5 and the neighbors of a given vertex are visited in ascending order? Solution. The algorithm starts at vertex 5 and goes along the edges (5,2), (5,4), (5,6), thereby visiting the vertices 2, 4, 6 in this order (the stack of vertices to be processed is 6, 4, 2). Vertex 5 becomes processed and the last vertex to have been visited (i. e., vertex 6) becomes active.
The algorithm goes along the edges (6,1) and (6,3) (the edge (6,5) has already been used), thereby visiting the vertices 1 and 3 (the stack of vertices to be processed is 3, 1, 4, 2). Now, vertex 6 becomes processed and the last unprocessed vertex to have been visited becomes active. This is vertex 3. The algorithm discovers the edges (3,2) and (3,4), so the stack becomes 4, 2, 1, 4, 2. Vertex 3 becomes processed and vertex 4 becomes active. This leads to the discovery of the edge (4,1), leaving the stack at 1, 2, 1, 2. The algorithm continues the search from vertex 1, visiting the last edge (1,2). (Note: only unprocessed vertices are pushed onto the stack.) □ Remark. If we had chosen the opposite edge priority, the edges would have been visited in the following order: (5,2), (2,1), (1,4), (4,3), (3,2), (3,6), (6,1), (6,5), (4,5). Intuitively, the depth-first search can be perceived so that the algorithm examines the first undiscovered edge in each step. 13.B.3. Let the vertices of the graph K6 be labeled 1, 2, …, 6. Write the order of the edges of K6 in which they are visited by the depth-first search algorithm, supposing the search is initiated from vertex 3 and the neighbors of a given vertex are visited in ascending order. ○ 13.B.4. Let the vertices of the graph K6 be labeled 1, 2, …, 6. Write the order of the edges of K6 in which they are visited by the breadth-first search algorithm, supposing the search is Proof. If the second proposition is true, there are at least two different paths between any two vertices. So deleting a vertex cannot destroy the connectedness, and the first proposition follows. Conversely, suppose the first proposition is true. Proceed by induction on the minimal length of a path between v and w. Suppose first that the vertices are the endpoints of an edge e, and that the shortest path is of length 1.
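The two searches of 13.B.1 and 13.B.2 can be sketched compactly. The code below is our own illustration on the graph of those problems (vertices 1 to 6, an edge exactly when the label sum is odd); note that the recursive depth-first search here does its bookkeeping differently from the explicit stack in 13.B.2, so the visiting orders of the two solutions need not coincide, as the introductory remark warns:

```python
# Breadth-first and depth-first search on the graph of 13.B.1/13.B.2,
# visiting neighbours in ascending order, starting from vertex 5.
from collections import deque

adj = {v: sorted(w for w in range(1, 7) if w != v and (v + w) % 2 == 1)
       for v in range(1, 7)}

def bfs(start):
    seen, order, queue = {start}, [], deque([start])
    while queue:
        v = queue.popleft()               # first discovered, first processed
        order.append(v)
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order

def dfs(v, seen=None, order=None):
    seen = set() if seen is None else seen
    order = [] if order is None else order
    seen.add(v)
    order.append(v)
    for w in adj[v]:                      # descend immediately into new vertices
        if w not in seen:
            dfs(w, seen, order)
    return order

print(bfs(5))   # [5, 2, 4, 6, 1, 3], the visiting order of 13.B.1
print(dfs(5))   # [5, 2, 1, 4, 3, 6], recursive DFS with ascending neighbours
```

The breadth-first order agrees with the run described in 13.B.1; the depth-first order differs from the stack-based run in 13.B.2 precisely because of the different bookkeeping.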
If removing the edge e splits the graph into two components, then this would also occur if the vertex v were removed or if the vertex w were removed. Therefore, the graph is connected even without the edge e, so there is a path between v and w avoiding e. This path, together with the edge e, forms a cycle. For the induction hypothesis, assume that such a shared cycle is constructed for all pairs of vertices connected by a path whose length does not exceed k. Consider vertices v, w and one of the shortest paths between them: (v = v0, e1, v1, e2, ..., vk+1 = w) of length k + 1. Then, v1 and w can be connected by a path of length at most k, hence they lie on a common cycle. Denote by P1 and P2 the corresponding two disjoint paths between v1 and w. Now, the graph G \ {v1} is also connected, so there exists a path P from v to w which does not go through the vertex v1, and this path must at some point meet one of the paths P1, P2. Without loss of generality, suppose that this occurs on the path P1, at a vertex z. Now, the cycle can be built: it consists of the part of P from v to z, the part of P1 from z to w, and P2 (directed the other way) from w to v (draw a diagram!). It follows that the second proposition is a consequence of the first proposition, and hence the first condition is equivalent to the second one. Suppose the third proposition is true. Neither splitting an edge nor adding a new one in a 2-vertex-connected graph destroys the 2-connectedness. So the first proposition follows from the third proposition. It remains to prove that the third proposition follows from the first proposition. From the first proposition, G is 2-connected, so there exists a cycle, which can be obtained from K3 by splitting edges. Consider the subgraph G′ = (V′, E′) determined by this cycle, and consider an edge e = {v, w} ∈ E such that one of its endpoints lies in V′.
If both of its endpoints lie there, a new edge can simply be added to the graph G′, which leads to the subgraph (V′, E′ ∪ {e}) of G, which contains more edges than the graph G′. Consider the remaining possibility, i.e. v ∈ V′ while w ∉ V′. Since G is 2-connected, it remains connected even if the vertex v is removed, and it contains a shortest path P between the vertex w and some vertex of G′ (denote it v′, apart from the removed vertex v) containing no other vertex from V′. Add this path to the graph G′, together with the edge e (which can be done by adding the edge {v, v′} and splitting it into the desired number of "new" vertices and edges). A new subgraph is obtained which satisfies the requirements initiated from vertex 3 and the neighbors of a given vertex are visited in ascending order. O 13.B.5. Apply Dijkstra's algorithm to find the shortest path from vertex number 9 to each of the other vertices. picture skipped O 13.B.6. Give an example of i) a graph on at least 4 vertices which does not contain a negative cycle, yet Dijkstra's algorithm fails on it; ii) a graph on at least 4 vertices which contains a negative edge, yet Dijkstra's algorithm succeeds on it. Solution. In both cases, we must be well aware of how Dijkstra's algorithm works. Then, it is easy to find the wanted examples (apparently, there are many more possibilities). As for the first problem, we can consider the following graph (where S is the initial vertex): picture skipped If Dijkstra's algorithm is run from S, then it visits the vertex A and fixes its distance from S to 1. However, there is a shorter path, namely the path (S, B, C, A) of length 0. As for the second problem, consider the following: picture skipped □ Bellman-Ford algorithm. This algorithm is based on the same principle as Dijkstra's.
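The failure described in 13.B.6 (i) is easy to reproduce. The concrete weights below (the edge S-A of length 1, and the path S, B, C, A of total length 0) are an assumed instance of the example, since the picture is missing; the point is that the classical algorithm never revises a settled vertex.

```python
import heapq

def dijkstra(graph, source):
    """Textbook Dijkstra: a settled vertex is never revised."""
    dist = {source: 0}
    settled = set()
    queue = [(0, source)]
    while queue:
        d, v = heapq.heappop(queue)
        if v in settled:
            continue
        settled.add(v)                    # the distance of v is fixed now,
        for w, weight in graph[v]:        # which is wrong with negative edges
            if w in settled:
                continue
            if d + weight < dist.get(w, float("inf")):
                dist[w] = d + weight
                heapq.heappush(queue, (dist[w], w))
    return dist

graph = {"S": [("A", 1), ("B", 2)], "B": [("C", -1)], "C": [("A", -1)], "A": []}
# A is settled at distance 1, although the path S, B, C, A has length 0
print(dijkstra(graph, "S")["A"])
```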
However, instead of going through particular vertices, it processes them "simultaneously": the relaxation loop (i.e., finding out whether the temporary distances of the vertices can be improved using a given edge) is iterated (|V| − 1) times over all edges. The advantage and contains more vertices than the considered graph G′. After a finite number of these steps, the entire graph G is built from the triangle K3, as desired. The proof is complete. □ 13.1.12. Eulerian graphs. There are problems of the type "draw a graph without removing the pencil from the paper". In the language of graph theory, this can be stated as follows: Eulerian trails Definition. A trail which visits every edge exactly once and whose initial and terminal vertices are the same is called an Eulerian trail. Connected graphs that admit such a trail are called Eulerian graphs. Of course, an Eulerian trail goes through every vertex at least once, but it can visit a vertex more than once. To draw a graph without removing the pencil from the paper, while ending at the same point where one started, means to find an Eulerian trail. The terminology refers to the classical story about the seven bridges in Königsberg. There, the task was to go for a walk and visit each of the bridges exactly once. The first proof that this is impossible is by Leonhard Euler, in 1736. The situation is depicted in the diagram. On the left, there is a sketch of the river with the islands and bridges. The corresponding multigraph is shown in the right-hand diagram. The vertices of this graph correspond to the "connected land", while the edges correspond to the bridges. If it is desired to do without the multiple edges (which have not been admitted so far), it would suffice to place another vertex inside each bridge (i.e. to split the edges with new vertices). Surprisingly, the general solution of this problem is quite simple, as shown by the following theorem.
Of course, this also shows that Euler could not design the desired walk. Eulerian graphs Theorem. A graph G is Eulerian if and only if it is connected and all vertices of G have even degree. Proof. If a graph is Eulerian, then for every vertex entered there is an exit. Therefore, the degree of every vertex is even. More formally: consider a trail that begins and ends at a vertex v0 and passes through all edges. Every vertex occurs once or more on this trail, and its degree equals twice the number of its occurrences. Now suppose that all vertices of a graph G have even degree. Consider the longest possible trail (v0, e1, ..., vk) in G is that this approach works even with negative edges, and it is able to detect negative cycles (if another iteration of the relaxation loop leads to a change, then there must be a negative cycle in the graph). However, we pay for that with increased time complexity. 13.B.7. Use the Bellman-Ford algorithm to find the shortest paths from the vertex S to all other vertices. Assume that the edges are ordered by the number of the tail (or head) and the initial vertex is the least one. Then, change the value of the edge (8,6) from 18 to −18, execute the algorithm on this new graph, and show the detection of negative cycles. picture skipped Solution. According to the conditions, the edges are visited in the following order: (S,4), (S,7), (1,2), (1,5), (2,1), (2,3), (2,6), (3,7), (4,7), (4,8), (5,1), (5,6), (6,2), (6,5), (7,8), (8,6). The vertex distances develop as follows (potential higher values computed earlier during the same iteration are written in parentheses):

     S   1        2    3    4   5    6    7     8
1    0   ∞        ∞    ∞    1   ∞    22   3(6)  4
2    0   ∞        23   ∞    1   24   22   3     4
3    0   25(30)   23   26   1   24   22   3     3
4    0   25       23   26   1   24   22   3     3

Since the fourth iteration does not lead to any change, we can terminate the algorithm at this moment.
In the changed graph, the execution is as follows (for the sake of clarity, we omit the values of vertices that are untouched by the change):

     S   1         2     3     4   5     6     7     8
1    0   ∞         ∞     ∞     1   ∞     -14   3(6)  4
2                  -13             -12
3        -11(-6)         -10             -19   -2    -1
4                  -18             -17
5        -16             -15             -24   -7    -6
6                  -23             -22
7        -21             -20             -29   -12   -11
8                  -28             -27
9        -26             -25             -34   -17   -16

The graph has 9 vertices, and since the ninth iteration changed the distance of some vertex, there is a negative cycle. Of course, we could have terminated the algorithm much earlier if we had noticed exactly what changes took place between the particular steps. Clearly, the values of the vertices 1, 2, 3, 5, 6, 7, 8 keep decreasing below all bounds. The algorithm can also be implemented so that it produces the tree of shortest paths and also finds the vertices lying on a negative cycle if there is one. □ Paths between all pairs of vertices. We often need to know the shortest paths between all pairs of vertices. Of course, we where no edge occurs twice or more. First, suppose for a moment that vk ≠ v0. This would mean that the number of edges of the trail that enter or leave the vertex v0 is odd, so there must be an edge which is incident to v0 and not contained in the trail. However, then the trail can be prolonged while still using every edge of the graph at most once, which is a contradiction. Therefore, v0 = vk. Define a subgraph G′ = (V′, E′) of G as follows: it contains the vertices and edges of our fixed trail and nothing else. If V′ ≠ V, then (since the graph G is connected) there exists an edge e = {v, w} such that v ∈ V′ and w ∉ V′. However, then the trail can be "rotated" so that it begins and ends at the vertex v. It can be prolonged with the edge e, which contradicts the assumption of the greatest possible length. Therefore, V′ = V. It remains to show that E′ = E. So suppose there is an edge e = {v, w} ∈ E \ E′.
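The relaxation scheme just described can be sketched compactly. The sample graph at the end is the small assumed instance from 13.B.6 (not the larger one from 13.B.7, whose picture is missing); adding one more negative edge creates a negative cycle that the extra sweep detects.

```python
def bellman_ford(vertices, edges, source):
    """edges: list of (tail, head, weight); returns (dist, negative_cycle)."""
    dist = {v: float("inf") for v in vertices}
    dist[source] = 0
    for _ in range(len(vertices) - 1):    # |V| - 1 relaxation sweeps
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # one extra sweep: any further improvement signals a negative cycle
    negative = any(dist[u] + w < dist[v] for u, v, w in edges)
    return dist, negative

verts = ["S", "A", "B", "C"]
edges = [("S", "A", 1), ("S", "B", 2), ("B", "C", -1), ("C", "A", -1)]
dist, neg = bellman_ford(verts, edges, "S")
print(dist["A"], neg)    # the path S, B, C, A of length 0 is found; no cycle
```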
As above, the trail can be rotated so that it begins and ends at the vertex v and then goes along the edge e, a contradiction. □ Corollary. A graph can be drawn without removing the pencil from the paper if and only if there are either no vertices of odd degree or exactly two of them. Proof. Let G be a graph with exactly two odd-degree vertices. Construct a new graph G′ by attaching a new vertex w to the original graph G and connecting it to both the odd-degree vertices. This graph is Eulerian, and an Eulerian trail in it leads to the desired result. On the other hand, if a graph G can be drawn in the desired way, then the graph G′ is necessarily Eulerian, so the degrees of the vertices in G are as stated. □ The situation for directed graphs is similar. A directed graph is called balanced if and only if the out-degree and the in-degree coincide at every vertex, i.e. deg+(v) = deg−(v) for all vertices v. Proposition. A directed graph G is Eulerian if and only if it is balanced and its symmetrization is connected (i.e. the graph G is weakly connected). Proof. The proof is analogous to the undirected case. (Work out the details yourself!) □ 13.1.13. Hamiltonian cycles. Find a walk or cycle that visits every vertex of a graph G exactly once. Of necessity, such a walk can visit every edge at most once. Such a cycle is called a Hamiltonian cycle in the graph G. A graph is called Hamiltonian if and only if it contains a Hamiltonian cycle. This problem seems to be very similar to the above one of visiting every edge exactly once. But while the problem of finding an Eulerian trail is trivial, the problem of deciding whether a graph is Hamiltonian is NP-complete. Of course, this problem can be solved by "brute force". Given a graph on n vertices, generate all n! possible orders of the n vertices, and for each of them, verify whether it is a cycle in G. could apply the above algorithms to all initial vertices.
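The brute force just mentioned fits in a few lines; the sample graphs (a 4-cycle and a star) are ours, chosen so that one is Hamiltonian and the other is not.

```python
from itertools import permutations

def is_hamiltonian(vertices, edge_set):
    """Try all n! orders of the vertices; exponential, but fine for tiny graphs."""
    for order in permutations(vertices):
        # consecutive vertices (cyclically) must all be adjacent
        if all(frozenset((order[i], order[(i + 1) % len(order)])) in edge_set
               for i in range(len(order))):
            return True
    return False

cycle = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4), (4, 1)]}   # C_4
star = {frozenset(e) for e in [(1, 2), (1, 3), (1, 4)]}            # K_{1,3}
print(is_hamiltonian([1, 2, 3, 4], cycle), is_hamiltonian([1, 2, 3, 4], star))
```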
However, there is a more effective method to do this. One of the possibilities is to use the similarity with matrix multiplication, which is the basis of the Floyd-Warshall algorithm (the best-known among algorithms of the all pairs shortest paths type), which: • computes the distances between all pairs of vertices in time O(n^3); • starts with the matrix U0 = A = (aij) of edge lengths (setting aii = 0 for each vertex i) and then iteratively computes the matrices U0, U1, ..., U|V|, where uk(i,j) is the length of the shortest path from i to j such that all of its inner vertices are among {1, 2, ..., k}; • the matrices are computed using the formula uk(i,j) = min{uk−1(i,j), uk−1(i,k) + uk−1(k,j)}. In other words, considering the shortest path from i to j which can go only through the vertices 1, ..., k, we can ask whether it uses the vertex k. If so, then this path consists of the shortest path from i to k and the shortest path from k to j (and these two paths use only the vertices 1, ..., k − 1). Otherwise, the wanted path is also the shortest path from i to j which can go only through the vertices 1, ..., k − 1. Clearly, for k = |V|, we get the shortest paths between all pairs of vertices without any restrictions. Moreover, we can maintain the so-called predecessor matrix (i.e., the predecessor of each vertex on the shortest path from each vertex) and update it as follows: • Initialization: (P0)ij = i for i ≠ j and aij < ∞. • In the k-th step, we set (Pk)ij = (Pk−1)kj if the path through k is better, and (Pk)ij = (Pk−1)ij otherwise. As soon as the algorithm terminates, we can easily construct the shortest path between any pair of vertices u, v: we read it from the matrix P = Pn = (pij) in the reverse order as v, then puv, then the predecessor of that vertex, and so on, until u is reached. 13.B.8. Apply the Floyd-Warshall algorithm to the graph in the picture. Write the intermediate results into matrices. Show the detection of negative cycles. Maintain all information necessary for the construction of the shortest paths.
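A minimal sketch of the update rules above, with the predecessor matrix. The weight matrix is the one that can be read off as U0 in the solution of 13.B.8 below, so the expected answers (distance 6 and path 3, 4, 2, 1 from vertex 3 to vertex 1) are known in advance.

```python
INF = float("inf")

def floyd_warshall(weight):
    """Return (dist, pred) for a square weight matrix with weight[i][i] == 0."""
    n = len(weight)
    dist = [row[:] for row in weight]
    pred = [[i if i != j and weight[i][j] < INF else None for j in range(n)]
            for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    pred[i][j] = pred[k][j]     # the path through k is better
    return dist, pred

U0 = [[0, 4, -3, INF],      # vertices 1..4 become indices 0..3
      [-3, 0, -7, INF],
      [INF, 10, 0, 3],
      [5, 6, 6, 0]]
dist, pred = floyd_warshall(U0)
path = [0]                  # the path from vertex 3 to vertex 1, backwards
while path[-1] != 2:
    path.append(pred[2][path[-1]])
path.reverse()
print(dist[2][0], [v + 1 for v in path])   # 6 and the path 3, 4, 2, 1
```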
picture skipped This problem forms a vital field of research. For instance, in 2010, A. Björklund published a randomized algorithm based on the Monte Carlo method, which detects a Hamiltonian cycle in a graph on n vertices in time O(1.657^n).4 Finding Hamiltonian cycles is desired in many problems related to logistics, for example, finding optimal paths for goods delivery. 13.1.14. Trees. Sometimes it is desired to minimize the number of edges in a graph while keeping it connected. Of course, an edge can be removed without destroying the connectedness if and only if there is at least one cycle in the graph. The graphs without cycles are extremely important, as we shall see below. Forests, Trees, Leaves A connected graph which does not contain a cycle is called a tree. A graph which does not contain a cycle is called a forest (a forest is not required to be connected). Every vertex of degree one in any graph is called a leaf. The definition suggests an easily memorable "theorem": A tree is a connected forest. Lemma. Every tree with at least two vertices contains at least two leaves. For any graph G with a leaf v, the following propositions are equivalent: • G is a tree; • G \ v is a tree. Proof. Let P = (v0, ..., vk) be a longest possible path in a tree G. If the vertex v0 is not a leaf, then there is an edge e incident to it whose other endpoint is not in P, since otherwise a cycle would be formed in the tree. But then the path P could be prolonged with this edge, which contradicts the choice of P as a longest path. So the vertex v0 is a leaf. The proof for the vertex vk is similar. Thus, if the longest path is not trivial, then it contains two leaves v0 and vk. Next, consider any leaf v of a tree G. Consider any two other vertices w, z in G. There exists a path between them, and no inner vertex on this path has degree one. Therefore, this path remains the same in G \ v. Hence the graph remains connected even after removing the vertex v. There is no cycle in G \ v, since it is obtained by removing a vertex from a tree.
Conversely, if G \ v is a tree, then adding a vertex with degree 1 cannot create a cycle. The resulting graph is evidently connected. □ Trees can be characterized by many equivalent and useful properties. Some of them appear in the following theorem, which is more difficult to formulate than to prove. 4 Björklund, Andreas (2010), "Determinant sums for undirected Hamiltonicity", Proc. 51st Annual Symposium on Foundations of Computer Science (FOCS '10), pp. 173-182, arXiv:1008.0541, doi:10.1109/FOCS.2010.24. Solution. We proceed according to the algorithm, obtaining the following shortest-length matrices and predecessor matrices:

U0 = ( 0   4  -3   ∞ )      P0 = ( -  1  1  - )
     (-3   0  -7   ∞ )           ( 2  -  2  - )
     ( ∞  10   0   3 )           ( -  3  -  3 )
     ( 5   6   6   0 )           ( 4  4  4  - )

U1 = ( 0   4  -3   ∞ )      P1 = ( -  1  1  - )
     (-3   0  -7   ∞ )           ( 2  -  2  - )
     ( ∞  10   0   3 )           ( -  3  -  3 )
     ( 5   6   2   0 )           ( 4  4  1  - )

U2 = ( 0   4  -3   ∞ )      P2 = ( -  1  1  - )
     (-3   0  -7   ∞ )           ( 2  -  2  - )
     ( 7  10   0   3 )           ( 2  3  -  3 )
     ( 3   6  -1   0 )           ( 2  4  2  - )

U3 = ( 0   4  -3   0 )      P3 = ( -  1  1  3 )
     (-3   0  -7  -4 )           ( 2  -  2  3 )
     ( 7  10   0   3 )           ( 2  3  -  3 )
     ( 3   6  -1   0 )           ( 2  4  2  - )

U4 = ( 0   4  -3   0 )      P4 = ( -  1  1  3 )
     (-3   0  -7  -4 )           ( 2  -  2  3 )
     ( 6   9   0   3 )           ( 2  4  -  3 )
     ( 3   6  -1   0 )           ( 2  4  2  - )

Since there is no negative number on the diagonal of U4, there is no negative cycle in the graph. Suppose we would like to find the shortest path from vertex 3 to vertex 1, for instance: the predecessor of 1 is P4[3,1] = 2 and the predecessor of 2 is P4[3,2] = 4. Therefore, the wanted path is 3, 4, 2, 1 and its length is U4[3,1] = 6. □ Hamiltonian graphs. To decide whether a given graph is Hamiltonian is an NP-complete problem. Therefore, it might be useful to have some simpler necessary or sufficient conditions for this property at our disposal. We mention three sufficient conditions: Dirac's, Ore's, and the Bondy-Chvátal theorem. Dirac: Let a graph G with n ≥ 3 vertices be given. If each vertex of G has degree at least n/2, then G is Hamiltonian. Ore: Let a graph G with n ≥ 3 vertices be given.
If the sum of the degrees of each pair of non-adjacent vertices is at least n, then G is Hamiltonian. The closure of a graph G is the graph cl(G) obtained from G by repeatedly adding an edge {u, v} such that u, v have 13.1.15. Theorem. Let G = (V, E) be a graph. The following conditions are equivalent: (1) G is a tree. (2) For every pair of vertices v, w, there is exactly one path from v to w. (3) G is connected but ceases to be such if any edge is removed. (4) G does not contain a cycle, but the addition of any edge creates one. (5) G is a connected graph, and the number of its vertices and edges satisfies |V| = |E| + 1. Proof. The properties (2)-(5) are satisfied in every tree. Indeed, by the previous lemma, every tree which has at least two vertices has a leaf v, and it continues to be a tree when this leaf v is removed. Therefore, it suffices to show that if any of the statements (2)-(5) is true for a given tree, then it remains true when a leaf is added to the tree. This is clear. In the case of properties (2) and (3), the graph is connected, and their formulation directly excludes the existence of cycles. As for the fourth property, it suffices to verify that G is connected. However, any two vertices v, w in G are either connected with an edge, or adding this edge to the graph creates a cycle, so there exists a path between them even without this edge. The last implication can be proved by induction on the number of vertices. Suppose that all connected graphs on n vertices and n − 1 edges are trees. The sum of the vertex degrees of any connected graph on n + 1 vertices and n edges is 2n, so the graph must contain a leaf. It follows from the induction hypothesis that this graph can be constructed by attaching a leaf to a tree; hence it is also a tree. □ 13.1.16. Rooted trees, binary trees, and heaps. Trees are often suitable structures for data storage. They permit basic database operations (e.g. finding a particular piece of information) to be performed efficiently.
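Characterization (5) of the theorem gives a quick tree test: connectedness plus the edge count |V| = |E| + 1. A sketch (the labeling of vertices by 0, ..., n−1 is an implementation choice of ours):

```python
def is_tree(n, edges):
    """Vertices 0..n-1; a tree iff connected with exactly n - 1 edges."""
    if len(edges) != n - 1:
        return False
    adjacency = {v: [] for v in range(n)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen = {0}
    stack = [0]
    while stack:                       # check connectedness by a traversal
        for w in adjacency[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

print(is_tree(4, [(0, 1), (1, 2), (1, 3)]),     # a star: a tree
      is_tree(4, [(0, 1), (1, 2), (2, 0)]))     # a triangle plus a lone vertex
```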
Since there is no cycle in a tree, fixing one vertex vr defines the orientation of all edges. For every vertex v, there is exactly one path from vr to v, so the orientation can be defined accordingly. Since there are no cycles, it is impossible for two such paths to force both orientations of a particular edge. If one of the vertices of a tree is fixed, the situation is similar to a real tree in nature: there is a distinctive vertex which "grows from the ground". Trees with a fixed distinguished vertex vr are called rooted trees, and vr is said to be the root of the tree. In a rooted tree, the terms successor and predecessor of a vertex are defined as follows: a vertex w is a successor of v (or v is a predecessor of w) if and only if the path from the root of the tree to the vertex w goes through v and v ≠ w. If the vertices are directly connected with an edge, we talk about a direct successor and a direct predecessor. More not been adjacent and deg(u) + deg(v) ≥ n, until no such pair of vertices u, v exists. Bondy, Chvátal: A graph G is Hamiltonian if and only if cl(G) is. 13.B.9. Prove the Bondy-Chvátal theorem. Solution. Clearly, it suffices to prove that if G is Hamiltonian after the addition of an edge {u, v} such that u, v have not been adjacent and deg(u) + deg(v) ≥ n, then it is already Hamiltonian without this edge. Suppose that G + {u, v} is Hamiltonian, but G is not. Then, there exists a Hamiltonian path from u to v in G. It must hold for each vertex adjacent to u that its predecessor on this path is not adjacent to v (otherwise, there would be a Hamiltonian cycle in G). Therefore, deg(u) + deg(v) ≤ n − 1. □ 13.B.10. i) Prove that the Bondy-Chvátal theorem implies Ore's and Ore's implies Dirac's. ii) Give an example of a Hamiltonian graph which satisfies Ore's condition but not Dirac's. iii) Give an example of a Hamiltonian graph whose closure is not a complete graph. Solution.
i) If a graph G satisfies Ore's condition, then its closure is a complete graph, which is Hamiltonian, of course. By the Bondy-Chvátal theorem, the original graph is Hamiltonian as well. Further, if G satisfies Dirac's condition, then it clearly satisfies Ore's as well and thus is Hamiltonian. ii) Consider the following example: picture skipped The degree of vertex 5 is 2, which is less than n/2. The sum of the degrees of any pair of (not only non-adjacent) vertices is at least 5. iii) The wanted conditions are satisfied by the cycle graphs Cn, n ≥ 4, for which cl(Cn) = Cn. □ Planar graphs. often, they are called a child and a parent (motivated by genealogical trees). The most common data structures are the binary trees, which are special cases of rooted trees: there, every vertex has at most two children (sometimes, the term binary tree implies that every vertex is either a leaf or has exactly two children; to avoid ambiguity, such trees are often called full binary trees). Such trees are very useful in search procedures. If the vertices are associated with keys from a totally ordered set (e.g. the integers), the search for the vertex with a given key is performed by following a path from the root of the tree towards that vertex. At every vertex, compare its key to the desired one. This decides whether one continues to the left or to the right, or stops the search if the key is found. If this algorithm is to be correct, one of the children with all its successors must have lower keys than the other child and all its successors. In order for the search to be efficient, some effort must be made to keep the binary trees balanced, with the lengths of the paths from the root to the leaves differing by at most one.
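The key-comparison search just described, as a sketch (the Node class and the sample tree are ours):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def search(node, key):
    """Walk from the root, comparing keys; O(log n) in a balanced tree."""
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False

# the keys 1..7 in a balanced binary search tree rooted at 4
root = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(search(root, 5), search(root, 8))
```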
The most unfortunate example of a binary tree on n vertices is the path graph Pn (which may be formally considered a binary tree), while the most desired case is the perfect complete binary tree, where every vertex that is not a leaf has exactly two children, and all leaves are at the same level. Such a tree can be constructed only when the number of vertices is of the form n = 2^k − 1, k = 1, 2, .... Therefore, in a balanced tree, finding the vertex with a given key can be done in O(log2 n) steps. Such trees are often called binary search trees. Think out as an exercise how to efficiently perform the basic operations over binary search trees (additions and removals of the vertex with a given key, as well as how to keep the tree balanced). An extraordinarily useful example of binary trees is the structure of a heap. It is a full balanced binary tree, where the keys are either strictly decreasing along each path from the root (the so-called max heap), or strictly increasing (the min heap). Because of this ordering along the paths in a max heap, the maximum key of the heap can be found in constant time and removed in logarithmic time (similarly with the minimum in the min heap). The desired maximum is just at the root, and after removing it, we need to re-balance the shape of the heap. Prove that this is possible in logarithmic time yourself! The left-hand diagram shows a binary search tree. In the right-hand diagram, there is a max heap. Much literature is devoted to trees, their applications and miscellaneous variations. 13.B.11. Decide whether the graph in the picture is planar. Solution. By the Kuratowski theorem (see page 945), this graph is not planar, since one of its subgraphs is a subdivision of K3,3. □ 13.B.12. Decide whether there is a graph with degree sequence (6, 6, 6, 7, 7, 7, 7, 8, 8, 8). If so, is there a planar graph with this degree sequence? O 13.B.13.
What is the minimum number of edges of a hexahedron? Solution. In any polyhedron, every face is bounded by at least three edges. At the same time, every edge belongs to exactly two faces. If f is the number of faces and e the number of edges of the polyhedron, then we have 3f ≤ 2e (see also 13.1.20). For a hexahedron, this bound yields 18 ≤ 2e, i.e., e ≥ 9. Indeed, there exists a hexahedron with nine edges. It can be obtained by "gluing" two identical regular tetrahedra together along one face. Therefore, the minimum number of edges of a hexahedron is nine. □ 13.1.17. Remarks on sorting. Suppose it is required to distinguish all different sortings of n elements, thus distinguishing among n! different objects. If there is no information other than comparing the order of two single elements, then the tree of all possible decision paths can be written down. The sorting provides a path through this binary tree. As seen, any binary tree of depth h (i.e. h − 1 is the length of the longest path from the root to a leaf) has at most 2^(h−1) leaves. It follows that a tree of depth h satisfying 2^(h−1) ≥ n! is needed. Consequently, the depth h satisfies h log 2 ≥ log n!. Now, log n! = log 1 + log 2 + ⋯ + log n ≥ ∫₁ⁿ log x dx = n log n − (n − 1), and hence h ≥ (log n!)/(log 2) ≥ (n log n − n)/(log 2). It is proved that the depth of the necessary binary tree is bounded from below by an expression of size n log n. Hence no algorithm based only on the comparison of two elements of the ordered set can have a better worst case run than O(n log n). The latter claim is not true if there is further relevant information. For example, if it is known that only a small number k of values may appear among our n elements, then one may simply run through the list, counting how many occurrences of the individual values there are, and hence write the correctly ordered list from scratch. This all happens in linear time! 13.B.14. Decide whether the given planar graph is maximal.
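The counting idea from the remark in 13.1.17 above, as a sketch (assuming the values are integers from range(k)):

```python
def counting_sort(items, k):
    """Sort integers from range(k) in O(n + k) time, with no comparisons."""
    counts = [0] * k
    for x in items:                 # tally the occurrences of each value
        counts[x] += 1
    result = []
    for value in range(k):          # rewrite the sorted list from scratch
        result.extend([value] * counts[value])
    return result

print(counting_sort([3, 1, 0, 3, 2, 1, 3], 4))
```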
Add as many edges as possible while keeping the graph planar. Solution. The graph has 14 vertices and 20 edges, hence 3|V| − 6 − |E| = 16. Therefore, it is not maximal, and 16 edges can be added so that it is still planar. picture skipped 13.1.18. Tree isomorphisms. Simple features of trees are exploited in order to illustrate the (generally difficult) problem of graph isomorphisms on this special class of graphs. First, strengthen the structure to be preserved by the isomorphisms. Then show that the obtained procedure is also applicable to the most general trees. In order to keep more information about the structure of rooted trees, remember the parent-child relations. Also, have the children of every vertex sorted in a specific order (for instance, from left to right if drawn in a diagram). Such trees are called ordered trees or plane trees. They are formally defined as a tuple T = (V, E, vr, ν), where ν is a partial order on the edges such that a pair of edges is comparable if and only if they have the same tail (i.e. they all go from one parent vertex to all its children). A homomorphism of rooted trees T = (V, E, vr) and T′ = (V′, E′, v′r) is a graph morphism φ : T → T′ such that vr is mapped to v′r; similarly for isomorphisms. For plane trees, it is further required that the morphism preserves the partial orders ν and ν′.
viii) Every Eulerian graph is planar, ix) No Eulerian graph is planar. o Trees. 13.B.16. Determine the code of the following graph as a i) plane tree, ii) tree. Solution. i) Using the procedure from 13.1.18, we get the following code of the plane tree: 0 0 0001100100101111 10 01010000101011111 1 The highlighted vertex in the graph is indeed the appropriate candidate to be the root since it is the only element of the center of the tree. ii) As for the unique construction of a plane tree, we sort the descendants lexicographically in ascending order. Thus, the wanted code is 9 00000010101111010110000010110110011111. Coding the plane trees Given the plane tree T = (V, E, vr, v). It has a code W by strings of ones and zeros, denned recursively as follows: Start with the word 01 for the root v0 and write W = QW\.. .Wil, where Wi axe the I still unknown words for the subtrees rooted by the children of v0. In particular the code of the tree with just one vertex is W = 01. Applying the same procedure recursively over the children and concatenating the results defines the code. The tree in the left-hand diagram above is encoded as follows (the children of a vertex are ordered from left to right, Wr is for the code of the child with key r): OWWsl OOWiWslOW-gll ^ OOOlOWWellOOWiilll m- 000100101110001111. Imagine drawing the entire plane tree with one move of the pencil, starting with an arrow ending in the root and going downwards with arrows towards the leaves and then upwards to the predecessors, reaching consecutively all the leafs from the left to the right and writing 0 when going down and 1 when going up. The very last arrow is then leaving the root upwards. Theorem. Two plane trees are isomorphic if and only if their codes are the same. Proof. By construction, two isomorphic trees are assigned the same code. It remains to show that different codes lead to non-isomorphic plane trees. This is proved by induction on the length of the code (i.e. 
the number of zeros and ones). This length is 2(|E| + 1), i.e., twice the number of vertices; therefore, the proof can be viewed as an induction on the number of vertices of the tree T. The shortest code corresponds to the smallest tree, on one vertex. Assume that the proposition holds for all trees up to n vertices, i.e. for codes of length up to k = 2n, and consider a code of the form 0W1, where W is of length 2n. Find the shortest prefix of W which contains the same number of zeros and ones (when drawing a diagram of the tree, this is the first moment when we return to the root of the tree that corresponds to the code 0W1). Similarly, find the next part of the code W that contains the same number of zeros and ones, etc. Hence the code W can be written as W = W1 W2 ⋯ Wl. By the induction hypothesis, the codes Wi correspond uniquely (up to isomorphism) to plane trees, and the order of their roots, being the children of the root of our tree T, is given uniquely by the order in the code. Therefore, the tree T is determined uniquely by the code 0W1 up to isomorphism. □ Use the encoding of plane trees to encode any tree. Deal first with the case of rooted trees. Determine the order of the children of every vertex uniquely up to isomorphism. The order is unimportant if and only if the subgraphs determined by the respective children are isomorphic. □ 13.B.17. For each of the following codes, decide whether there exists a tree with this code. If so, draw the corresponding tree. • 00011001111001, • 00000110010010111110010100001010111111. O Huffman coding. We are working with plane binary trees where every edge is colored with a symbol of an output alphabet A (we often have A = {0,1}). The codewords C are those words over the alphabet A to which we translate the symbols of the input alphabet. Our task is to represent a given text using suitable codewords over the output alphabet.
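The recursive plane-tree code defined above is a two-line function once a plane tree is represented, say, as a nested list of its ordered subtrees (an assumed ad-hoc encoding of ours):

```python
def code(tree):
    """A plane tree as a nested list of ordered subtrees; a leaf is []."""
    return "0" + "".join(code(child) for child in tree) + "1"

leaf = []
# the 9-vertex shape from the worked example above: the root has one child
# with subtrees (leaf, vertex with two leaves) and one child whose single
# successor has one leaf child
t = [[leaf, [leaf, leaf]], [[leaf]]]
print(code(t))   # 000100101110001111
```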
We can easily see that it makes sense to require that the set of codewords be prefix-free (i.e., no codeword is a prefix of another one); otherwise, we could get into trouble when decoding. We use binary trees for the construction of binary prefix codes (i.e. over the alphabet A = {0, 1}). We label the edges going from each vertex by 0 and 1. Further, we label the leaves of our tree with the symbols of the input alphabet. This results in a code over A for these symbols by concatenating the edge labels along the path from the root to the corresponding leaf. Clearly, this code is prefix-free. Moreover, if we take into account the frequencies of the particular symbols of the input alphabet in the input text, we obtain a lossless data compression.

Let M be the list of frequencies of the symbols of the input alphabet in the input text. The algorithm constructs the optimal binary tree (the so-called minimum-weight binary tree) and the assignment of the symbols to the leaves.
• Select the two least frequencies w_1, w_2 from M. Create a tree with two leaves labeled by the corresponding symbols and root labeled by w_1 + w_2, then replace the values w_1, w_2 with the new value w_1 + w_2 in M.
• Repeat the above step; if the selected value from M is a sum, then simply "connect" the existing subtree.
• The code of each symbol is determined by the path from the root to the corresponding leaf (left edge = "0", right edge = "1", for instance).

The same construction can be used as for the plane trees, ordering the vertices lexicographically with respect to their codes (see 12.5.5 for the related concepts). This means that codes W_1, W_2 satisfy W_1 > W_2 if and only if W_1 contains a one at an earlier position than W_2, or W_2 is a prefix of W_1. The rooted tree as a whole is described by the same recursive procedure: if the children of a vertex v are coded by W_1, ..., W_l, then the code of the vertex v is 0W_1...W_l 1, where the order is selected so that W_1 ≤ W_2 ≤ ... ≤ W_l.
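The three-step construction of the minimum-weight binary tree can be sketched with a priority queue. The function `huffman_codes` and its tuple representation of subtrees are choices of this sketch, not notation from the text; with the frequencies of exercise 13.B.18 it reproduces an optimal code of total length 224 bits (the individual codewords may differ from those in the sample solution by swapping left and right subtrees, but the lengths agree).

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """freqs: dict mapping symbols to frequencies; returns a dict
    symbol -> binary codeword built by the greedy merging above."""
    tick = count()  # tie-breaker, so heap tuples never compare trees
    heap = [(w, next(tick), ("leaf", s)) for s, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # the two least frequencies
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tick), ("node", t1, t2)))
    codes = {}
    def walk(t, prefix):
        if t[0] == "leaf":
            codes[t[1]] = prefix or "0"   # one-symbol alphabet edge case
        else:
            walk(t[1], prefix + "0")      # left edge = "0"
            walk(t[2], prefix + "1")      # right edge = "1"
    walk(heap[0][2], "")
    return codes

freqs = {"A": 16, "B": 13, "C": 9, "D": 12, "E": 45, "F": 5}
codes = huffman_codes(freqs)
cost = sum(freqs[s] * len(codes[s]) for s in freqs)
print(cost)  # -> 224
```

The resulting code is prefix-free by construction, since symbols sit only at the leaves of the merge tree.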
If no vertex is designated to be the root of a tree, the root can be chosen so that it lies almost "in the middle" of the tree. This can be realized by assigning to every vertex of the tree an integer which describes its eccentricity. The eccentricity ex_T(v) of a vertex v in a graph T is defined to be the greatest possible distance between v and some vertex w in T. This concept is meaningful for all graphs; however, thanks to the absence of cycles in trees, it is guaranteed that there are at most two vertices with the minimal eccentricity.

Lemma. Let C(T) be the set of those vertices of a tree T whose eccentricity is minimal. Then C(T) contains either a single vertex, or exactly two vertices, which are connected by an edge in T.

Proof. The claim is proved by induction, using the trivial fact that the vertex most distant from any vertex v must be a leaf. Therefore, the center of T coincides with the center of the tree T' which is created from the tree T by removing all its leaves and the corresponding edges. After a finite number of such steps, there remains either just one vertex, or a subtree with two vertices. □

The set C(T) determined by the latter lemma is called the center of the graph, and the minimal eccentricity is called the radius of the graph.

A unique (up to isomorphism) code can now be assigned to every tree. If the center of T contains only one vertex, use it as the root. Otherwise, create the codes for the two rooted subtrees of the original tree without the edge that connects the vertices of the center, and the code of T is the code of the rooted tree (T, x), where x is the vertex of the center whose subtree has the lexicographically smaller code.

Corollary. Trees T and T' are isomorphic if and only if they are assigned the same code.

The above ideas imply that the algorithm for verifying tree isomorphism can be implemented in linear time with respect to the number of vertices of the trees.

The trees form a special class of graphs.
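Both the leaf-stripping computation of the center and the canonical code built from it can be sketched as follows; the adjacency-dictionary representation and the function names are this example's own (the sorting here uses plain string comparison, which on 0/1 words agrees with the lexicographic order described above).

```python
def center(adj):
    """Center C(T) of a tree: repeatedly strip all current leaves,
    as in the proof of the lemma; adj maps a vertex to its neighbours."""
    n = len(adj)
    if n <= 2:
        return set(adj)
    deg = {v: len(adj[v]) for v in adj}
    leaves = [v for v in adj if deg[v] == 1]
    removed = 0
    while n - removed > 2:
        removed += len(leaves)
        nxt = []
        for v in leaves:
            for w in adj[v]:
                deg[w] -= 1
                if deg[w] == 1:
                    nxt.append(w)
        leaves = nxt
    return set(leaves)

def rooted_code(adj, v, parent=None):
    """Code of the tree rooted at v, children's codes sorted."""
    subs = sorted(rooted_code(adj, w, v) for w in adj[v] if w != parent)
    return "0" + "".join(subs) + "1"

def tree_code(adj):
    """Canonical code of an unrooted tree: root it in its center; with
    two center vertices, root at the one whose half has the smaller code."""
    c = center(adj)
    if len(c) == 1:
        (r,) = c
        return rooted_code(adj, r)
    x, y = c
    root = x if rooted_code(adj, x, y) <= rooted_code(adj, y, x) else y
    return rooted_code(adj, root)
```

Two trees are then isomorphic exactly when their `tree_code` strings coincide, independently of how the vertices are labeled.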
They are often used in miscellaneous variations and with additional requirements. We return to them later, in connection with practical applications. Now follows another extraordinarily important class of graphs.

13.B.18. Find the Huffman code for the input alphabet with the frequencies ['A': 16, 'B': 13, 'C': 9, 'D': 12, 'E': 45, 'F': 5].

Solution. If we naively assign a 3-bit code to each letter of the alphabet, then this message of length 100 consumes 300 bits. We show that the Huffman code is more succinct. We build the tree according to the algorithm. (picture: the tree obtained by successively merging F:5 with C:9 into FC:14, D:12 with B:13 into BD:25, FC:14 with A:16 into AFC:30, BD:25 with AFC:30 into ABCDF:55, and finally E:45 with ABCDF:55)

We have thus obtained the codes A: 111, B: 100, C: 1101, D: 101, E: 0, F: 1100. Multiplying the code lengths by the frequencies, we can see that a 100-letter message with the given distribution of letters is encoded into only 3·16 + 3·13 + 4·9 + 3·12 + 1·45 + 4·5 = 224 bits. □

C. Minimum spanning tree

13.C.1. How many spanning trees (see 13.2.6) of the graph K_5 are there? And how many are there if we do not distinguish isomorphic ones?

Solution. There are three pairwise non-isomorphic spanning trees (with degree sequences (1, 2, 2, 2, 1), (1, 2, 3, 1, 1), and (4, 1, 1, 1, 1)). The corresponding classes of isomorphic spanning trees have 5·C(4,2)·2 = 60, 5·4·3 = 60, and 5 elements, respectively. Altogether, there are 125 = 5³ spanning trees, which is in accordance with Cayley's formula for the number of spanning trees of a complete graph (see 13.4.11). □

13.C.2. Let the vertices of K_6 be labeled 1, 2, ..., 6 and let every edge {i, j} be assigned the weight ((i + j) mod 3) + 1. How many minimum spanning trees are there in this graph?

Solution. There are five edges whose weight is 1: four of them lie on the cycle 1-2-4-5-1 and the remaining one is the edge {3, 6}. Therefore, they form a disconnected subgraph of the complete

13.1.19. Planar graphs.
Some graphs can be drawn in the plane in such a way that their edges do not "cross" one another. This means that every vertex of the graph is identified with a point of the plane, and an edge between vertices v, w corresponds to a continuous curve c: [0, 1] → R² that connects the vertices, c(0) = v and c(1) = w. Furthermore, suppose that edges may intersect only at their endpoints. This describes a planar graph G.

The question whether a given graph admits a realization as a planar graph often emerges in practical problems. Here is an example: Providers of water, electricity, and gas have their connection spots near three houses (each provider has one spot). Each house wants to be connected to each resource so that the connections do not cross (they might not want to dig too deep, for instance). Is it possible to do this?

The answer is "no". In this particular case, it is clear from the diagram. There is a complete bipartite graph K_{3,3}, where three of the vertices correspond to the connection spots, while the other three represent the houses. The edges are the connections between the spots and the houses. All edges can be placed except the last one - see the diagram, where the dashed edge cannot be drawn without a crossing any more:

For a complete proof, more mathematical tools are needed. A complete explanation is not provided here, but an indication of the reasoning follows. One of the basic results from the topology of the plane (the Jordan curve theorem) states that every closed continuous curve c in the plane that is not self-intersecting (i.e. it is a "crooked circle") divides the plane into two distinct parts. In other words, every other continuous curve which connects a point inside the curve c and a point outside c must intersect c. If the edges are realized as piecewise linear curves (every edge composed of finitely many adjacent line segments), then it is quite easy to prove the Jordan curve theorem (you might do it yourself!).
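A quick numerical check already rules out both K_5 and K_{3,3}, using the edge-count bounds that are proved below as a corollary to Euler's formula (e ≤ 3n - 6 in general, e ≤ 2n - 4 for triangle-free graphs). The helper below is this example's own, a sketch of that counting argument:

```python
def edge_bounds_ok(n, e, triangle_free=False):
    """Necessary conditions for planarity of a graph with n >= 3
    vertices and e edges: e <= 3n - 6 in general, and e <= 2n - 4
    when the graph contains no triangle."""
    return e <= (2 * n - 4 if triangle_free else 3 * n - 6)

print(edge_bounds_ok(5, 10))                     # K_5: 10 > 3*5 - 6 = 9
print(edge_bounds_ok(6, 9, triangle_free=True))  # K_{3,3}: 9 > 2*6 - 4 = 8
```

Note that K_{3,3} passes the general bound (9 ≤ 12); it is the triangle-free refinement that detects its non-planarity, which matches the discussion above.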
The general theorem can be proved by approximating the continuous curves by piecewise linear ones (quite difficult to do, but much easier if the curve is assumed to be piecewise differentiable).

Consider the graph K_{3,3}. The triples of vertices that are not connected with edges are indistinguishable up to order. Therefore, the thick cycle can be considered the general case of a cycle with four points in the graph. The position of the remaining two vertices can then be discussed. In order for the graph to be planar, either both of the vertices must lie inside the cycle, or both outside. Again, these possibilities are equivalent, so it can be assumed without loss of generality

graph, so the spanning tree must contain at least one edge of weight 2. Thus, the total weight of a minimum spanning tree is at least 4·1 + 2 = 6. And indeed, there exist spanning trees with this weight: we select all the edges of weight 1 except for one lying on the mentioned cycle, and connect the resulting components {1, 2, 4, 5} and {3, 6} with any edge of weight 2. There are four such edges. Altogether, there are 4·4 = 16 minimum spanning trees. □

13.C.3. Find a minimum spanning tree of the following graph using i) Kruskal's algorithm, ii) Jarník's (Prim's) algorithm. Explain why we cannot apply Borůvka's algorithm directly. (picture skipped)

Solution. The spanning tree is (picture skipped). Borůvka's algorithm cannot be applied directly since the mapping of the weights to the edges is not injective. However, this can be fixed easily by slight modifications of the weights. □

13.C.4. Consider the following procedure for finding the shortest path between a given pair of vertices in an undirected weighted graph: First, we find a minimum spanning tree. Then, we proclaim that the only path between the pair of vertices in the obtained spanning tree is the shortest one.
Prove the correctness of this method, or disprove it by providing a counterexample. O

13.C.5. We are given the following table of distances between world metropolises: London, Mexico City, New York, Paris,

that they are on opposite sides, as are the black vertices on the diagram. Now, their position with respect to a suitable cycle with two thick edges and two thin black edges can be discussed (i.e. through three gray vertices and one black one). Then, we can discuss the position of the remaining black vertex with respect to this cycle. This leads to the impossibility of drawing the last (dashed) edge without crossing the thick cycle.

It can be shown similarly that the complete graph K_5 is not planar either. We provide a purely combinatorial argument why K_5 and K_{3,3} cannot be planar graphs below, see the corollary at the end of the next subsection.

Notice that if a graph G is "expanded" by dividing some of its edges (i.e. adding new vertices inside the edges), then the new graph is planar if and only if G is planar. The new graph is called a subdivision of G. Planar graphs cannot contain any subdivision of K_{3,3} or K_5. The reverse implication is also true:

Kuratowski theorem

Theorem. A graph G is planar if and only if none of its subgraphs is isomorphic to a subdivision of K_{3,3} or K_5.

The proof is complicated, so it is not discussed here. Much attention is devoted to planar graphs both in research and in practical applications. There are algorithms which are capable of deciding whether or not a given graph is planar in linear time. Direct application of the Kuratowski theorem would lead to a worse time complexity.

13.1.20. Faces of planar graphs. Consider a planar graph G embedded in the plane R², and the set of those points x ∈ R² which neither belong to any edge of the graph nor are vertices. In this way, the set R² \ G is partitioned into connected subsets S_i, called the faces of the planar graph G. Since the graphs are finite, there is exactly one unbounded face S_0.
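Kruskal's algorithm, as requested in 13.C.3, can be sketched with a union-find structure: scan the edges by increasing weight and keep an edge exactly when it joins two different components. Since the exercise's picture is not reproduced here, the usage example runs on a small weighted graph of this sketch's own choosing.

```python
def kruskal(n, edges):
    """Minimum spanning tree by Kruskal's algorithm.
    edges: list of (weight, u, v) with vertices 0..n-1;
    returns the list of chosen edges."""
    parent = list(range(n))
    def find(v):                      # union-find with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    tree = []
    for w, u, v in sorted(edges):     # edges in order of increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                  # different components: no cycle arises
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

# small illustrative graph (not the one from 13.C.3)
edges = [(1, 0, 1), (3, 0, 2), (2, 1, 2), (4, 1, 3), (5, 2, 3)]
mst = kruskal(4, edges)
print(sum(w for w, _, _ in mst))  # -> 7
```

With distinct weights the minimum spanning tree is unique, which is exactly why Borůvka's algorithm in 13.C.3 first needs the weights perturbed to be injective.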
The set of all faces is denoted by S = {S_0, S_1, ..., S_k}, and the planar graph by G = (V, E, S). The simplest case of a planar graph is a tree. Every tree is a planar graph, since it can be constructed by step-by-step addition of leaves, starting with a single vertex. Of course, the Kuratowski theorem can also be applied: when there is no cycle in a graph G, then there cannot be a subdivision of K_{3,3} or K_5 either. Since a tree G cannot contain a cycle, there is only one face S_0 there (the unbounded face). Since the number of edges of a tree is related to the number of its vertices, cf. the formula 13.1.15(5), it follows that |V| = |E| + |S| for all trees. Surprisingly, the latter formula linking the number of edges, faces, and vertices can be derived for all planar graphs. The formula is named after Leonhard Euler. Especially, the

Peking, and Tokyo:

      L     MC    NY    P     Pe    T
L     -    5558  3469   214  5074  5959
MC          -    2090  5725  7753  7035
NY                -    3636  6844  6757
P                       -    5120  6053
Pe                            -    1307
T                                   -

What is the least total length of wire used for interconnecting these cities (assuming the length necessary to connect a given pair of cities is equal to the distance in the table)? O

13.C.6. Using the matrix variation of the Jarník-Prim algorithm, find a minimum spanning tree of the graph given by the following matrix:

  -  12   -  16   -   -   -  13
 12   -  16   -   -   -  14   -
  -  16   -  12   -  14   -   -
 16   -  12   -  13   -   -   -
  -   -   -  13   -  14   -  15
  -   -  14   -  14   -  15   -
  -  14   -   -   -  15   -  14
 13   -   -   -  15   -  14   -

O

D. Flow networks

13.D.1. An example of bad behavior of the Ford-Fulkerson algorithm. The worst-case time complexity of the Ford-Fulkerson algorithm is O(|E|·|f|), where |f| is the size of a maximum flow. Consider the following network: (picture: four outer edges of capacity 100 and a middle edge of capacity 1, all flows initially 0) The bad behavior of the algorithm is due to the fact that it uses depth-first search to find unsaturated paths.

Solution.
We proceed strictly by depth-first search (examining the vertices from left to right and then top down): (pictures: after each pair of augmenting paths, the flow on the four outer edges of capacity 100 grows by one unit, while the unit flow on the middle edge is alternately added and cancelled)

number of faces is independent of the particular embedding of the graph in the plane:

Euler's formula

Theorem. Let G = (V, E, S) be a connected planar graph. Then |V| - |E| + |S| = 2.

Proof. The proof is by induction on the number of edges. A graph with zero or one edge satisfies the formula. Consider a graph G with |E| > 1. If G does not contain a cycle, then it is a tree, and the formula is already proved for this case. Otherwise, there is an edge e of G that is contained in a cycle. Then the graph G' = G \ e is connected, and it follows from the induction hypothesis that G' satisfies Euler's formula: |V| - (|E| - 1) + (|S| - 1) = 2, since removing an edge of a cycle necessarily merges two faces of G into one face of G'. Hence Euler's formula is valid for the graph G. □

Corollary. Let G = (V, E, S) be a planar graph with |V| = n ≥ 3 and |E| = e. Then:
• e ≤ 3n - 6, with equality if and only if G is a maximal planar graph (adding any edge to G would violate planarity);
• if G does not contain a triangle (i.e. the graph K_3 is not a subgraph), then e ≤ 2n - 4.

Proof. Continue adding edges to the given graph until it is maximal. If the obtained maximal graph G satisfies the inequality with equality, then the inequality holds for the original graph as well. Similarly, if the graph G is not connected, two of its components can be connected with a new edge, so such a graph cannot be maximal. Even if it were connected but not 2-connected, there would exist a vertex v ∈ V such that removing it makes the graph G collapse into several components G_1, ..., G_k, k ≥ 2. However, then an edge can be added between these components without destroying the planarity of the original graph G (draw a diagram!).
Therefore, it can be assumed from the beginning that the original graph G is a maximal planar 2-connected graph. As shown in theorem 13.1.11, every 2-connected graph can be constructed from the triangle K_3 by splitting edges and attaching new ones. It is easily proved by induction that every face of a planar graph must be bounded by a cycle (which seems intuitively apparent). However, if there is a face of our maximal planar graph G that is not bounded by a triangle, then this face can be split with another edge (a "diagonal" in geometrical terminology), so G would not be maximal. It follows that all faces of G are bounded by triangles K_3. Hence 3|S| = 2|E|.

(pictures skipped) We can see that 200 iterations were needed in order to find the maximum flow. □

It suffices to substitute |S| = (2/3)|E| for the number of faces in Euler's formula. The second proposition is analogous; now, the faces of a maximal planar graph without triangles are bounded by four or five edges, whence it follows that 4|S| ≤ 2|E|, with equality if and only if all the faces are quadrangles. □

13.D.2. Find the size of a maximum flow in the network given by the following matrix A, where vertex 1 is the source and vertex 8 is the sink. Further, find the corresponding minimum cut.

The corollary implies (even without the Kuratowski theorem) that neither K_5 nor K_{3,3} is planar: in the former case, |V| = 5 and |E| = 10 > 3|V| - 6 = 9, while in the latter, |V| = 6 and |E| = 9 > 2|V| - 4 = 8, which is again a contradiction since K_{3,3} does not contain a triangle.

A =
  -  16  24  12   -   -   -   -
  -   -   -   -  30   -   -   -
  -   -   -   -   9   6  12   -
  -   -   -   -   -   -  21   -
  -   -   -   -   -   9   -  15
  -   -   -   -   -   -   -   9
  -   -   -   -   -   -   -  18
  -   -   -   -   -   -   -   -

Solution. The following augmenting semipaths are found:
1-2-5-8 with residual capacity 15,
1-2-5-6-8 with residual capacity 1,
1-3-5-6-8 with residual capacity 8,
1-4-7-8 with residual capacity 12,
1-3-7-8 with residual capacity 6.
The total size of the found flow is 42. We can see that it is indeed of maximum size from the fact that the cut consisting of the edges (5,8), (6,8), and (7,8) also has size 42 (and it is thus a minimum cut). □

13.D.3. The following picture depicts a flow network (the numbers f/c give the actual flow and the capacity of the given edge, respectively). Decide whether the given flow is maximum. If so, justify your answer. If not, find a maximum flow and describe the used procedure in detail. Find a minimum cut in the network. picture skipped

Solution. In the given network, there exists an augmenting (semi)path 1-2-3-4-8 with residual capacity 4. Its saturation results in a flow of size 32. Since the cut (3,8), (5,8), (2,4), (6,4) is of the same size, we have found a maximum flow. □

13.1.21. Convex polyhedra in the space. Planar graphs can be imagined as drawn on the sphere instead of in the plane. The sphere can be constructed from the plane by attaching one point "at infinity". Again, faces of such graphs can be discussed, and the faces are now equivalent to one another (even the face S_0 is bounded). On the contrary, every convex polyhedron P ⊂ R³ can be imagined as a graph drawn on the sphere (project the vertices and edges of the polyhedron onto a sufficiently large sphere from any point inside P). Removing a point inside one of the faces (that face becomes the unbounded face S_0) then leads to a planar graph as above - the sphere with the hole is spread out in the plane.

The planar graphs formed from convex polyhedra are clearly 2-connected, since every pair of vertices of a convex polyhedron lies on a common cycle. Moreover, every face is interior to its boundary cycle, and the graphs of convex polyhedra are always 3-connected. In fact, they are exactly the graphs described by the following Steinitz theorem (we omit the proof):

Steinitz's polyhedra theorem

Theorem. A graph G is the graph of a convex polyhedron if and only if it is planar and 3-vertex-connected.

13.1.22.
Platonic solids. As an illustration of the combinatorial approach to polyhedral graphs, we classify all the regular polyhedra. These are the polyhedra built up from a single type of regular polygon so that the same number of them meet at every vertex. It was known as early as in the epoch of the ancient philosopher Plato that there are only five of them:

13.D.4. Find a maximum flow and a minimum cut in the following flow network (source = 1, sink = 14). (picture skipped)

Solution. The paths are saturated in the following order:
1-2-7-10-5-8-11-13-14 with residual capacity 2,
1-2-7-10-5-8-13-14 with residual capacity 8.
We have found a flow of size 50. And indeed, it is a maximum flow since there is no further unsaturated path. If we look for the reachable vertices, we can also find a cut with capacity 50, consisting of the edges [2,4]: 4, [7,9]: 2, [7,12]: 4, [10,12]: 2, [10,14]: 6, [13,14]: 32. □

13.D.5. Find a maximum flow in the following network on the vertex set {1, 2, ..., 9} with source 1 and sink 9 using the Ford-Fulkerson algorithm (during the depth-first search, choose the vertices in ascending order). Find a minimum cut in this network. Describe the steps of the procedure in detail. The edges e ∈ E as well as the lower and upper bounds on the flow (l(e) and u(e)) and the current flow f(e) are given in the table:

Translate the condition of regularity to the properties of the corresponding graphs: every vertex needs the same degree d ≥ 3, and the boundary of every face must contain the same number k ≥ 3 of vertices. Let n, e, s denote the total numbers of vertices, edges, and faces, respectively. Firstly, the relation between the vertex degrees and the number of edges requires dn = 2e. Secondly, every edge lies in the boundary of exactly two faces, so 2e = ks.
Thirdly, Euler's formula states that 2 = n - e + s = 2e/d - e + 2e/k. Putting this together, the constants d and k must satisfy 1/d + 1/k = 1/2 + 1/e. Since d, k, e, n are positive integers (in particular, 1/e > 0), this equality restricts the possibilities. Especially, the left-hand side is maximal for d = 3. Substituting this value, we obtain the equality 1/k - 1/6 = 1/e > 0. It follows that k ∈ {3, 4, 5} for a general d. The roles of k and d are symmetric in the original equality, so also d ∈ {3, 4, 5}. Checking each of the remaining possibilities yields all the solutions:

d  k  n   e   s
3  3  4   6   4
3  4  8   12  6
4  3  6   12  8
3  5  20  30  12
5  3  12  30  20

It remains to show that the corresponding regular polyhedra exist. This is already seen in the above diagrams, but that is not a mathematical proof. The existence of the first three is apparent. Concentrate on the geometrical construction of the regular dodecahedron (draw a diagram!). Begin with a cube, building "A-tents" on all its sides simultaneously. The upper horizontal poles are set on the level of the cube's sides so that those of adjacent sides are perpendicular to each other. Their length is chosen so that the trapezoids of the lateral sides have three sides of the same length. Now, simultaneously

e      l(e)  u(e)  f(e)
(1,2)  0     6     0
(1,3)  0     6     0
(1,6)  0     4     0
(2,3)  0     2     0
(2,4)  0     3     0
(3,4)  0     4     0
(3,5)  0     4     0
(4,5)  3     5     4
(4,8)  0     3     0
(5,1)  0     3     0
(5,6)  0     6     0
(5,7)  0     5     4
(5,8)  0     5     0
(6,9)  0     5     0
(7,4)  1     6     4
(7,9)  0     3     0
(8,9)  0     9     0

O

13.D.6. A cut in a network (V, E, s, t, w) can also be viewed as a set C ⊆ E of edges such that in the network (V, E \ C, s, t, w) there is no path from the source s to the sink t, but if any edge e is removed from C, then the resulting set no longer has this property, i.e., there is a path from s to t in (V, (E \ C) ∪ {e}, s, t, w). Find all cuts (and their sizes) in the following network: (picture skipped)

Solution.
Let us fix the following edge labeling: (picture skipped) Then, there are the cuts {f, i}, {f, h, j, a}, {f, j, c, a, d, e}, {f, j, c, a, d, …}, {b, j, c}, {b, j, h}, {b, i}. Their capacities are 12, 9, 20, 18, 15, 10, and 15, respectively. □

raise all tents while keeping the ratio of the three sides of the trapezoids. There is a position at which the adjacent trapezoid and triangle sides are coplanar. At that position, the regular dodecahedron is created.

Now, the regular icosahedron can be constructed via the so-called dual graph construction. The dual graph G' to a planar graph G = (V, E, S) has vertices defined as the faces in S, and there is an edge between faces S_1 and S_2 if and only if they share an edge (i.e. are neighbours) in G. Clearly, the dual graph of the dodecahedron is the icosahedron, exactly as the cube and the octahedron are dual, while the tetrahedron is dual to itself.

2. A few graph algorithms

In this part, we consider several applications of graph concepts and the algorithms built upon them.

13.2.1. Algorithms and graph representations. As already indicated, algorithms are often formulated in the language of graphs. The concept of an algorithm can be formalized as a procedure dealing with a (directed) graph whose vertices and/or edges are equipped with further information. The procedure consists in walking through the graph along its edges, while processing the information associated with the visited vertices and edges. Of course, processing the information also includes the decision which outgoing edges are to be investigated in a further walk, and in which order. In the case of undirected graphs, each (undirected) edge can be replaced with a pair of directed edges. The graph may also be changed during the run of the algorithm, i.e. vertices and/or edges may be added or removed.

In order to execute such algorithms efficiently (usually on a computer), it is necessary to represent the graph in a suitable way. The adjacency matrix representation is one possibility, cf. 13.1.8.
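Working directly with the adjacency-matrix representation, the maximum flow of exercise 13.D.2 can be recomputed with an augmenting-path method. The sketch below uses breadth-first search for the unsaturated semipaths (the Edmonds-Karp variant, not the depth-first version discussed in 13.D.1); the capacity matrix is transcribed from 13.D.2 with 0 in place of the dashes, and the function name is this example's own.

```python
from collections import deque

def max_flow(cap, s, t):
    """Maximum flow via shortest augmenting paths (Edmonds-Karp);
    cap is an n x n capacity matrix, s the source, t the sink."""
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    total = 0
    while True:
        prev = [None] * n
        prev[s] = s
        q = deque([s])
        while q and prev[t] is None:           # breadth-first search
            u = q.popleft()
            for v in range(n):
                if prev[v] is None and cap[u][v] - flow[u][v] > 0:
                    prev[v] = u
                    q.append(v)
        if prev[t] is None:                    # no augmenting semipath left
            return total
        r, v = float("inf"), t                 # residual capacity of the path
        while v != s:
            r = min(r, cap[prev[v]][v] - flow[prev[v]][v])
            v = prev[v]
        v = t
        while v != s:
            flow[prev[v]][v] += r
            flow[v][prev[v]] -= r              # allow cancelling along back edges
            v = prev[v]
        total += r

# capacity matrix of 13.D.2; source 1 and sink 8 become indices 0 and 7
A = [[0, 16, 24, 12, 0, 0, 0, 0],
     [0, 0, 0, 0, 30, 0, 0, 0],
     [0, 0, 0, 0, 9, 6, 12, 0],
     [0, 0, 0, 0, 0, 0, 21, 0],
     [0, 0, 0, 0, 0, 9, 0, 15],
     [0, 0, 0, 0, 0, 0, 0, 9],
     [0, 0, 0, 0, 0, 0, 0, 18],
     [0, 0, 0, 0, 0, 0, 0, 0]]
print(max_flow(A, 0, 7))  # -> 42
```

The result 42 matches the size of the cut (5,8), (6,8), (7,8) found in the sample solution, illustrating the max-flow min-cut duality.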
There are many other options based on various lists with suitable pointers. The edge list (also the adjacency list) of the graph G = (V, E) consists of two lists V and E that are interconnected by pointers so that every vertex points to the edges it is incident with, and every edge points to its endpoints. The memory necessary to represent the graph as an edge list is O(|V| + |E|), since every edge is pointed at twice and every vertex is pointed at d times, where d is its degree, and the sum of the degrees of all vertices equals twice the number of edges. Therefore, up to a constant multiple, this is an optimal way of representing a graph in memory.

It is of interest how the basic graph operations are processed in both representations. By the basic operations, we mean:
• removal of an edge,
• addition of an edge,
• removal of a vertex,
• addition of a vertex,
• splitting an edge with a new vertex.

13.D.7. Find a maximum flow in the network given in the above exercise. O

Further exercises on maximum flows and minimum cuts can be found on page 1003.

E. Classical probability and combinatorics

In this section, we recall the methods we learned as early as in the first chapter.

13.E.1. We throw n dice. What is the probability that none of the values 1, 3, 6 is cast?

Solution. We can also view the problem as throwing one die n times. The probability that none of the values 1, 3, 6 is cast in the first throw is 1/2. The probability that they are cast neither in the first throw nor in the second one is clearly 1/4 (the result of the first throw has no impact on the result of the second one). Since this holds generally, i.e., the results of different throws are (stochastically) independent, the wanted probability is 1/2^n. □

13.E.2. We have a pack of ten playing cards, exactly one of which is an Ace. Each time, we randomly draw one of the ten cards and then put it back.
How many times do we have to repeat this experiment if we require the probability of getting the Ace at least once to be at least 0.9?

Solution. Let A_i be the event "the Ace is picked in the i-th draw". The events A_i are (stochastically) independent. Hence, we know that P(A_1 ∪ ... ∪ A_n) = 1 - (1 - P(A_1))·(1 - P(A_2)) ... (1 - P(A_n)) for every n ∈ N. We are looking for an n ∈ N such that 1 - (1 - P(A_1))·(1 - P(A_2)) ... (1 - P(A_n)) ≥ 0.9. Apparently, we have P(A_i) = 1/10 for every i ∈ N. Therefore, it suffices to solve the inequality 1 - (9/10)^n ≥ 0.9, whence n ≥ log 0.1 / log 0.9 ≈ 21.85. Evaluating this, we find out that we have to repeat the experiment at least 22 times. □

13.E.3. We randomly draw six cards from a pack of 32 cards (containing four Kings). Calculate the probability that the sixth card is a King and, at the same time, it is the only King drawn.

It is apparent that if the matrix is represented by an array of zeros and ones, then the first and second operations can be executed in O(1) (constant time), while the others in O(n) (linear time). In the case of the adjacency list, the implementation of the data structures is crucial for the time complexity. However, all of the operations should be proportional to the number of edited data units, provided the corresponding item(s) have already been found. For instance, if a vertex is removed, then all of the edges incident with it must also be removed.

The matrix representation is also useful in theoretical discussions about graphs, using matrix calculus.

13.2.2. Searching in graphs. Many useful algorithms are based on going through all vertices of a given graph step by step. Usually, the vertex to start with is given, or it is selected at the beginning of the procedure. At every stage of the search, each vertex is in exactly one of the following states:
• processed - it has been visited and completely processed;
• active - it has been visited and is prepared to be processed;
• sleeping - it has not been visited yet.
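The three states above already determine the generic search; the only choice left open is whether the active vertices are kept in a queue or in a stack, which gives the two standard variants discussed next. A minimal sketch (the function names are this example's own):

```python
from collections import deque

def search(adj, start, breadth_first=True):
    """Generic graph search: the active vertices sit in a queue
    (breadth-first) or a stack (depth-first, one common variant);
    returns the vertices in the order they become processed."""
    active = deque([start])
    seen = {start}                    # vertices that are no longer sleeping
    order = []
    while active:
        v = active.popleft() if breadth_first else active.pop()
        order.append(v)               # v becomes processed
        for w in adj[v]:
            if w not in seen:         # wake a sleeping neighbour
                seen.add(w)
                active.append(w)
    return order

def components(adj):
    """Number of connected components: whenever no active vertex
    remains, restart the search from an arbitrary sleeping vertex."""
    seen, k = set(), 0
    for v in adj:
        if v not in seen:
            k += 1
            seen.update(search(adj, v))
    return k
```

One run of `search` visits exactly one component, which is the component-counting algorithm described at the end of this subsection.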
At the same time, information about the processed edges is retained. At every stage, the sets of vertices and/or edges in these groups must form a partition of the sets V and E, while one of the active vertices is being processed.

The general principle of searching through the vertices is illustrated first. In the subsequent subsections, such procedures are used to build algorithms solving particular problems. At the beginning of the algorithm, there is just one active vertex, and all the others are sleeping. In the first step, traverse all edges incident with the active vertex and change the status of their other endpoints from sleeping to active. Then the active vertex we started with may be marked as processed, and another active vertex may be chosen. In the following steps, always go through those adjacent edges that have not been met yet, marking their other endpoints as active. This algorithm can be applied to both directed and undirected graphs. In practical problems, the search is often restricted to only some of the edges going from the current vertex. This is an insignificant change to the algorithm.

To specify the algorithm completely, a decision must be made in which order to process the active vertices and in which order to process the edges going from the current vertex. In general, the two simplest possibilities of processing the vertices are:
(1) they are processed in the same order as they were visited (queue),
(2) they are processed in the reverse order than they were visited (stack).
The former case is called a breadth-first search, the latter a depth-first search.

Solution. By the theorem on the product of probabilities, the result is (28/32)·(27/31)·(26/30)·(25/29)·(24/28)·(4/27) ≈ 0.0723. □

13.E.4. We randomly draw two cards from a pack of 32 cards (containing four Aces). Calculate the probability that the second card drawn is an Ace if: a) the first card is put back; b) the first card is not put back.
Solution. If the first card is put back in the pack, then we clearly repeat an experiment with 32 possible outcomes (all with the same probability), 4 of which are favorable. Therefore, the wanted probability is 1/8. However, even if we do not put the first card back, the probability is the same. Clearly, the probability of a given card being drawn the first time is the same as for the second time. Of course, we can also apply conditional probability. This leads to (4/32)·(3/31) + (28/32)·(4/31) = 1/8. □

13.E.5. Combinatorial identities. Use combinatorial means to derive the following important identities (in particular, do not use induction):
• Arithmetic series: ∑_{k=0}^{n} k = n(n+1)/2.
• Geometric series: ∑_{k=0}^{n} x^k = (x^{n+1} - 1)/(x - 1).
• Binomial theorem: (x + y)^n = ∑_{k=0}^{n} C(n, k) x^k y^{n-k}.
• Upper binomial theorem¹: ∑_{k=0}^{m} C(k, n) = C(m+1, n+1).
• Vandermonde's convolution: C(m+n, l) = ∑_{k=0}^{l} C(m, k)·C(n, l-k). O

13.E.6. An urn contains 30 red balls and 70 green balls. What is the probability of getting exactly k red balls in a sample of size 20, 0 ≤ k ≤ 20, if the sampling is done with replacement (repetition allowed)?

Solution. Any time we take a ball from the urn, we put it back before the next draw (sampling with replacement). Thus, in this experiment, each time we sample, the probability of choosing a red ball is 3/10, and we repeat this in 20 independent trials. Thus, using the binomial formula, we obtain P(k red balls) = C(20, k)·(0.3)^k·(0.7)^{20-k}. □

13.E.7. An urn contains 30 red balls and 70 green balls. What is the probability of getting exactly k red balls in a sample of size 20, 0 ≤ k ≤ 20, if the sampling is done without replacement (repetition not allowed)?

The role of the data structures used for representing the graph is immediately apparent: the adjacency list allows passing through all edges going from a given vertex in time proportional to their number. Each edge is visited at most twice since it has only two endpoints. Hence the following result:

Theorem.
Both the breadth-first and depth-first searches run in O((n + m)K) time, where n and m are the numbers of vertices and edges of the graph, respectively, and K is the time needed for processing an edge or a vertex.

The following diagram illustrates the breadth-first search through the Petersen graph:

[figure: the first 8 steps of the breadth-first search on the Petersen graph]

The first 8 steps are shown here. The circled vertex is the one to be processed, the bold vertices are the already processed ones, the dashed edges are those that have been processed, and the small vertices adjacent to some dashed edges are the active ones. At the given vertex, the edges are processed counterclockwise, beginning with the direction "straight down". The diagram below illustrates the depth-first search applied to the same graph. Note that the first step is the same as above.

[figure: the depth-first search on the Petersen graph]

¹ Also called the hockey identity.

As a simple example of graph searching, consider an algorithm for finding all connected components of a given graph. The only information that must be processed during the search (no matter whether breadth-first or depth-first) is which component is being examined. The search, as presented here, passes through exactly the vertices of a single component. Hence, one can start with all vertices in the sleeping state and choose any one of them. During the search, whenever there are no more active vertices to be processed, the search of one component is finished. One can then choose an arbitrary sleeping vertex and continue likewise. The algorithm terminates as soon as there are no sleeping vertices remaining.

Solution. Let A be the event of getting exactly k red balls. To find P(A), we need to find |A| and the total number of possibilities |S|. Here $|S| = \binom{100}{20}$. Next, to find |A|, we need to find out in how many ways one can choose k red balls and 20 − k green balls. Thus, $|A| = \binom{30}{k}\binom{70}{20-k}$.
Thus,
$$P(A) = \frac{|A|}{|S|} = \frac{\binom{30}{k}\binom{70}{20-k}}{\binom{100}{20}}.$$
□

13.E.8. Assume that there are k people in a room and we know that
• k = 5 with probability 1/2;
• k = 10 with probability 1/4;
• k = 15 with probability 1/4.
i) What is the probability that at least two of them were born in the same month? Assume that all months are equally likely.
ii) Given that we already know there are at least two people who celebrate their birthday in the same month, what is the probability that k = 10?

Solution. Let $A_k$ be the event that at least 2 people out of k have birthdays in the same month. The complementary event $B_k$ is that no two of the k people have their birthday in the same month, i.e., there are exactly k distinct months in which these people were born. Thus,
$$P(A_k) = 1 - \frac{12!}{(12-k)!\,12^k}, \quad \text{for } 2 \le k \le 12.$$
For k > 12, obviously, $P(A_k) = 1$. Therefore, the required probability is
$$P = \tfrac{1}{2}P(A_5) + \tfrac{1}{4}P(A_{10}) + \tfrac{1}{4}P(A_{15}) = \tfrac{1}{2}\Big(1 - \tfrac{12!}{7!\,12^5}\Big) + \tfrac{1}{4}\Big(1 - \tfrac{12!}{2!\,12^{10}}\Big) + \tfrac{1}{4}.$$
The second part of the question asks for the conditional probability P(k = 10 | A). According to Bayes' rule:

13.2.3. Natural metrics on graphs. The concept of "path length" was used earlier. This recalls the general idea of distance. The concept of distance in graphs can be built mathematically in this manner. For an (undirected) graph G, define the distance between vertices v and w to be the number d_G(v, w), the number of edges on the shortest path from v to w. If there is no such path, write d_G(v, w) = ∞. For the sake of simplicity, consider only connected graphs G. The function d_G : V × V → ℕ defined in this way satisfies the usual three properties of a distance (it is recommended to compare this to the issues from the relevant part of chapter seven, see 7.3.1, page 482):
• d_G(v, w) ≥ 0, and d_G(v, w) = 0 if and only if v = w;
• the distance is symmetric, i.e., d_G(v, w) = d_G(w, v);
• the triangle inequality holds, i.e., for every triple of vertices v, w, z, d_G(v, z) ≤ d_G(v, w) + d_G(w, z).
Thus d_G is a metric on the graph G.
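The three metric axioms for d_G can be checked numerically by computing all-pairs distances with a breadth-first search (an illustrative sketch; the function name and the example graph, a 4-cycle, are our own):

```python
from collections import deque
from itertools import product

def distances(adj):
    """All-pairs graph distances d_G, computed by a breadth-first
    search from every vertex; unreachable pairs are simply absent."""
    d = {}
    for s in adj:
        d[s, s] = 0
        q = deque([s])
        while q:
            v = q.popleft()
            for w in adj[v]:
                if (s, w) not in d:
                    d[s, w] = d[s, v] + 1
                    q.append(w)
    return d

adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # the 4-cycle
d = distances(adj)
# verify the triangle inequality on all triples of vertices
ok = all(d[u, w] <= d[u, v] + d[v, w]
         for u, v, w in product(adj, repeat=3))
```

On the 4-cycle, opposite vertices are at distance 2 and the check `ok` succeeds, as the theory predicts for any connected graph.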
Besides these three properties, every metric on a graph apparently satisfies the following:
• d_G(v, w) is always a non-negative integer;
• if d_G(v, w) > 1, then there exists a vertex z distinct from v and w such that d_G(v, w) = d_G(v, z) + d_G(z, w).
The following is true: every function d_G on V × V (for a finite set V) satisfying the five properties listed above allows one to define the edges E so that G = (V, E) is a graph with metric d_G. Prove this yourself as an exercise! (It is quite clear how to consecutively construct the corresponding graph. It remains "merely" to show that the given function d_G is achieved as the metric on the constructed graph.)

13.2.4. Dijkstra's shortest-path algorithm. One may suspect that the shortest path between a given vertex v and another given vertex w can be found by breadth-first searching the graph. With this approach, discuss first the vertices which are reachable by one edge from the initial vertex v, then those which are two edges distant, and so on. This is the fundamental idea of one of the most often used graph algorithms, Dijkstra's algorithm². This algorithm is able to find the shortest paths even in problems from practice, where each edge e is assigned a weight w(e), which is a positive real number. When looking for shortest paths, the weights are to represent the lengths of the edges.

$$P(k = 10 \mid A) = \frac{P(A \mid k = 10)\,P(k = 10)}{P(A)} = \frac{P(A_{10})}{4P(A)} = \frac{1 - \frac{12!}{2!\,12^{10}}}{4P(A)}.$$
□

13.E.9. Hat matching problem. N guests arrive at a party. Each person is wearing a hat. All hats are collected and then randomly redistributed at the departure. What is the probability that at least one person receives his/her own hat?

² Edsger Wybe Dijkstra (1930–2002) was a famous Dutch computer scientist, one of the fathers of this discipline. Among other things, he is credited as one of the founders of concurrent computing. He published the above algorithm in 1959.

Solution.
Let $A_i$ be the event that person i receives his own hat. Then the task is to find P(E), where $E = \bigcup_{i=1}^{N} A_i$. To find P(E), we use the inclusion-exclusion principle:
$$P(E) = P\Big(\bigcup_{i=1}^{N} A_i\Big) = \sum_i P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \cdots + (-1)^{N+1} P(A_1 \cap \cdots \cap A_N) = 1 - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{N+1}\frac{1}{N!} \to 1 - e^{-1} \text{ as } N \to \infty.$$
□

13.E.10. Texas Hold 'em Poker. Now, we solve several simple problems about one of the most popular card games, Texas Hold 'em Poker. We do not present its rules; they can easily be found on the Internet. What is the probability that:
i) we are dealt a pair?
ii) we are dealt an Ace?
iii) we have one of the six best poker combinations at the end?

Set d(v) = 0 for v = v₀ and d(v) = ∞ for v ≠ v₀. Set Z = V, W = ∅.
(2) Cycle condition: If every vertex y ∈ Z is assigned ∞, the algorithm terminates; otherwise, the algorithm continues with another iteration. (In particular, the algorithm terminates if Z = ∅.)
(3) Update of the vertex statuses:
• Find the set N of those vertices v ∈ Z for which d(v) = δ is as small as possible: δ = min{d(y); y ∈ Z}.
• All vertices which have been in W are removed and marked as processed; the new set of active vertices is W = N, while all these vertices are removed from Z, i.e., they are no longer sleeping.
(4) Cycle body: For each edge e ∈ E_WZ (i.e., whose tail is an active vertex v and head is a sleeping vertex y):
• if d(v) + w(e) < d(y), then update d(y) to d(v) + w(e).
Move back to check the cycle condition (step 2).

953 CHAPTER 13. COMBINATORIAL METHODS, GRAPHS, AND ALGORITHMS

iv) we win if we are holding an Ace and a Three and there are three Twos and a differently-suited Ace on the table (the river has not been dealt yet)?

Solution.
i) There are 4 cards of each of the 13 ranks. Therefore, there are $13\binom{4}{2} = 78$ pairs. The total number of two-card hands is $\binom{52}{2} = 1326$. Thus, the wanted probability is 78/1326 ≈ 0.06.
ii) One of the cards is an Ace (there are four possibilities) and the other one is arbitrary (51 possibilities). However, this counts the $\binom{4}{2} = 6$ pairs of Aces twice.
Therefore, the number of favorable cases is only 4·51 − 6 = 198, and the wanted probability is 198/1326 ≈ 0.15.
iii) We compute the probabilities of the particular combinations when dealt five cards at random:
ROYAL FLUSH: There is exactly one such combination for each suit, four in total. Further, there are $\binom{52}{5} = 2598960$ possibilities for a hand of five cards. Thus, the probability is approximately 1.5·10⁻⁶, very low indeed.
STRAIGHT FLUSH: The highest card of the straight must be between 5 and K, i. e., there are 9 possibilities for each suit. Altogether, the probability is 36/2598960 ≈ 1.4·10⁻⁵.
POKER (FOUR OF A KIND): There are 13 possibilities for the quad, and the fifth card can be arbitrary (48 possibilities). Hence: 624/2598960 ≈ 2.4·10⁻⁴.
FULL HOUSE: There are $13\binom{4}{3} = 52$ possibilities for the triple and $12\binom{4}{2} = 72$ possibilities for the remaining pair. Altogether, 52·72/2598960 ≈ 1.4·10⁻³.
FLUSH: There are 4 suits and $\binom{13}{5} = 1287$ hands for each suit, i. e., 5148 possibilities in total. However, we must not count the straight flushes again. There are 40 of them, so the resulting probability is 5108/2598960 ≈ 2·10⁻³.
STRAIGHT: The highest card of the straight is between 5 and A, so there are 10 possibilities. Selecting the suit of each card arbitrarily, this gives 10·4⁵ = 10240 possibilities. However, we must exclude flushes, so the total probability is 10200/2598960 ≈ 3.9·10⁻³.
Altogether, the probability of one of the best six combinations is approximately 3.9·10⁻³ + 2·10⁻³ + 1.4·10⁻³ + 0.24·10⁻³ + … ≈ 7.54·10⁻³, i. e., about 0.75%.
In the Texas Hold 'em variation, the best 5-card hand of the seven cards is always considered. We have computed the number of favorable 5-card hands, and there are

13.2.5. Theorem. For a given vertex v₀, Dijkstra's algorithm finds the distance $d_w(v)$ of each vertex v in G that lies in the connected component of the vertex v₀. For the vertices v of the other connected components, d(v) = ∞ remains.
The algorithm can be implemented in such a way that it terminates in time O(n log n + m), where n is the number of vertices and m is the number of edges in G.

Proof. The algorithm is correct, since
• it terminates after a finite number of steps;
• when it does, its output has the desired properties.
The cycle condition guarantees that in each iteration, the number of sleeping vertices decreases by at least one, since N is always non-empty. Therefore, the algorithm necessarily terminates after a finite number of steps. After going through the initialization cycle,
(1) $d_w(v) \le d(v)$
for all vertices v of the graph. Now assume that this property holds when the algorithm enters the main cycle, and show that it holds when it leaves the cycle as well. Indeed, if d(y) is changed during step 4, then this is caused by finding a vertex v such that
$$d_w(y) \le d_w(v) + w(\{v, y\}) \le d(v) + w(\{v, y\}) = d(y),$$
where the new value is on the right-hand side. So the inequality (1) is satisfied when the algorithm terminates. It remains to verify that the other inequality holds as well. For this purpose, consider what is actually done in steps 3 and 4 of the algorithm. Let $0 = d_0 < \cdots < d_k$ denote all the (distinct) finite distances $d_w(y)$ of the vertices of G from the initial vertex v₀. At the same time, this partitions the vertex set of the graph G into clusters $V_i$ of vertices whose distance from v₀ is exactly $d_i$. During the first iteration of the main cycle, $N = V_0 = \{v_0\}$, the number δ is just $d_1$, and the set of sleeping vertices is changed to $V \setminus V_0$. Suppose this holds up to the j-th iteration (inclusive), i.e., the algorithm enters the cycle with $N = V_j$, $\delta = d_j$, and the set of sleeping vertices equal to $V \setminus \bigcup_{i=0}^{j} V_i$. Consider a vertex $y \in V_{j+1}$, i.e., $d_w(y) = d_{j+1} < \infty$, and there exists a path $(v_0, e_1, v_1, \ldots, v_\ell, e_{\ell+1}, y)$ with total length $d_{j+1}$. However, then
(2) $d_w(v_\ell) \le d_{j+1} - w(\{v_\ell, y\}) < d_{j+1}$.
It follows from the assumption that the vertex $v_\ell$ was active during an earlier iteration of the main cycle, and $d_w(v_\ell) = d(v_\ell) = d_i$ for some i ≤ j at that point. Therefore, after the current iteration of the main cycle has finished, $d(y) = d_w(v_\ell) + w(\{v_\ell, y\}) = d_{j+1}$, and this does not change any more. It follows that the inequality (1) holds with equality when the algorithm terminates.

$\binom{47}{2}$ possibilities for the remaining two cards. The total number of 7-card hands is $\binom{52}{7}$. We can thus approximate the probability for Texas Hold 'em from the classic Poker by multiplying by the coefficient
$$\frac{\binom{47}{2}\binom{52}{5}}{\binom{52}{7}} = 21.$$
However, note that this is indeed only an approximation of the actual probability, since some favorable combinations are counted more than once this way. For instance, if we have a full house in the considered 5-card hand and the arbitrary pair contains the fourth card of the triple, then we actually have a poker (four of a kind), so this combination has been counted more than once. On the other hand, the true result differs only slightly from the computed approximation, so the probability of one of the best six poker combinations is about twenty times higher than in classic Poker. This may be the reason why this variation is so popular.
iv) Clearly, our situation is very good. Hence, it is easier to count the unfavorable cases when the other player has a better combination. We have Twos full of Aces, so we lose only if the opponent has Aces full of Twos or a poker of Twos, i. e., he must hold the remaining Two or a pair of Aces. In the former case, he surely wins, and this happens in 45 of all $\binom{46}{2} = 1035$ cases (his other card can be any of the 45 remaining unseen cards; we can see the three Twos, a Three, and two Aces), so the probability of this loss is about 0.043. In the latter case, there are more possibilities. If he holds a pair of Aces and the river card is not the remaining Two, then we lose; otherwise (i.
e., if he has only one Ace or the Two appears on the river), it is a tie. Thus, we lose in this case with probability about 10⁻³. Altogether, the probability that we win or draw is almost 96 %. □

13.E.11. Four players are each given two cards from a pack consisting of four Aces and four Kings. What is the probability that at least one of the players is given a pair of Aces? Express the result as a ratio of two-digit integers. ○

13.E.12. Alex owns two special dice: one of them has only 6's on its sides. The other one has two 4's, two 5's, and two 6's. Martin has two ordinary dice. Each of the players throws his dice, and the one whose sum is higher wins. What is the probability that Alex wins? Express the result as a ratio of two-digit integers. ○

The analysis of the main cycle just made also determines a bound for the running time of the algorithm (i.e., the number of elementary operations with the graph and other corresponding objects). The main cycle is iterated as many times as there are (distinct) distances $d_i$ in the graph. Every vertex, when processed during step 3, is considered exactly once. The vertices that are still sleeping must be sorted. This gives the bound O(n log n) for this part of the algorithm, provided the graph is stored as a list of vertices and weighted edges, and the sleeping vertices are kept in a suitable data structure that allows finding the set N of active vertices in time O(log n + |N|). This can be achieved if a heap is used. Every edge is processed exactly once in step 4, since the vertices are active during only one iteration of the cycle. □

Note that the inequality (2), essential for the analysis of the algorithm, need not hold if the weights of the edges are allowed to be negative. In practice, many heuristic improvements of the algorithm are applied. For instance, it is not necessary to compute the distance between all vertices if only the distance between a given pair of vertices is of interest.
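The heap-based variant just discussed can be sketched as follows (an illustrative implementation, not the book's own code; the adjacency representation and names are our choices, and stale heap entries are skipped lazily instead of being removed):

```python
import heapq

def dijkstra(adj, v0):
    """Shortest weighted distances from v0.
    adj maps each vertex to a list of (neighbour, weight) pairs."""
    dist = {v0: 0}
    heap = [(0, v0)]
    done = set()                    # processed vertices
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:
            continue                # stale entry; v already processed
        done.add(v)
        for y, w in adj[v]:
            nd = d + w
            if nd < dist.get(y, float("inf")):
                dist[y] = nd        # the update of step (4)
                heapq.heappush(heap, (nd, y))
    return dist

adj = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
dist = dijkstra(adj, "a")           # {'a': 0, 'b': 1, 'c': 3}
```

Here the heap plays the role of the sorted sleeping vertices, matching the O(n log n + m) bound from the proof.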
When a vertex is excluded from the active ones, its distance is final. Further, it is not necessary to initialize the distances with the value of infinity. Of course, this is technically impossible anyway, and a sufficiently large constant would be needed in the implementation. However, there is a better solution than that. For instance, if the shortest path in a road network is required, the known air distances can be used as the initialization values. Then bounds $d^0_w(v)$ for the distances between the vertices v and v₀ can be used, such that for any edge e = {v, y},
$$|d^0_w(v) - d^0_w(y)| \le w(e).$$

Consider n (n ≥ 3) fortresses positioned on a circle, numbered 1 through n. At a given moment, every fortress shoots at one of its neighbors (i. e., fortress 1 shoots at n or 2, fortress 2 shoots at 1 or 3, etc.). We will refer to the set of hit fortresses as a result (i. e., we are only interested in whether each fortress was hit or not; it does

Spanning forest algorithm
Input: A graph G = (V, E).
Output: A forest T = (V, E′) consisting of spanning trees of the components of G.
(1) Sort all edges $e_1, \ldots, e_m \in E$ in any order.
(2) Start with $E_0 = \emptyset$ and gradually build the sets of edges $E_i$ so that in the i-th step the edge $e_i$ is added to $E_{i-1}$ unless this creates a cycle in the graph $G_i = (V, E_{i-1} \cup \{e_i\})$. If this edge creates a cycle, leave $E_i = E_{i-1}$ unchanged.
(3) The algorithm terminates if the graph $G_i = (V, E_i)$ has exactly n − 1 edges at some step i, or if i = m; it produces the graph $T = (V, E_i)$.
If the algorithm terminates for the latter reason, then the graph is not connected, and no spanning tree exists (but there are still the spanning trees of all individual components).

Proof. It follows from the rules of the algorithm that the resulting subgraph T of G never contains a cycle. Therefore, it is a forest. If the resulting number of edges is n − 1, then it must be a tree; see theorem 13.1.15.
It remains to show that the connected components of the graph T have the same sets of vertices as the connected components of the original graph G. Every path in T is also a path in G; therefore, all vertices that lie in one tree of T must lie in the same component of G. Suppose there is a path in G from v to w whose endpoints lie in different trees of T. Then one of its vertices $v_i$ is the last one that lies in the component determined by v (in particular, $v_{i+1}$ does not lie in this component). The corresponding edge $\{v_i, v_{i+1}\}$ must have created a cycle when examined by the algorithm, since otherwise it would be in T. But since edges are never removed from T, there is then a path between $v_i$ and $v_{i+1}$ in T, which contradicts the assumption. Therefore, v and w cannot lie in different trees of T. The number of components of T is given by the fact that the number of vertices and edges differs by one in every tree. The difference increases by one with every component, so if there are n vertices and k edges in the forest, then there are n − k components. □

Remark. As always, the time complexity of the algorithm is of interest. The addition of an edge creates a cycle if and only if its endpoints lie in the same connected component of the forest T under construction. Hence knowledge of the connected components of the current forest T is helpful. To implement the algorithm, one needs to unite two equivalence classes of a given equivalence relation on a given set (the vertex set) and to find out whether two vertices are in the same class or not. The union requires time O(k), where k is the number of elements to be united; k can be bounded from above by n, the total number of vertices. However, for each equivalence class, the number of vertices it contains can be recorded. If, for each vertex, the information about the class to which it belongs is kept, then the union operation

not matter whether it was hit once or twice).
Let P(n) denote the number of possible results. Prove that the integers P(n) and P(n + 1) are coprime.

means to relabel the vertices of one of the united classes. If the smaller class is always selected for relabeling, then the total number of operations of the algorithm is O(n log n + m). (As an exercise, complete the details of these considerations yourself!) The above reasoning shows that a slightly better running time might be achieved if only the spanning tree of the connected component of a given starting vertex is of interest:

Another spanning tree algorithm

Solution. First of all, note that a set of hit fortresses is a possible result if and only if no pair of adjacent-but-one fortresses (i. e., fortresses whose numbers differ by 2 modulo n) is simultaneously unhit. Therefore, if n is odd, then P(n) is equal to the number K(n) of results where no pair of adjacent fortresses is unhit (consider the order 1, 3, 5, …, n, 2, 4, …, n − 1). If n is even, then P(n) = K(n/2)², since the fortresses at even positions and those at odd positions can be considered independently. We can easily derive the following recurrence for K(n):
$$K(n) = K(n-1) + K(n-2).$$
(Well, on the other hand, it is not so trivial; it is left as an exercise for the reader.) Further, we can easily calculate that K(2) = 3, K(3) = 4, K(4) = 7, so K(2) = F(4) − F(0), K(3) = F(5) − F(1), K(4) = F(6) − F(2), and a simple induction argument shows that K(n) = F(n + 2) − F(n − 2), where F(n) denotes the n-th term of the Fibonacci sequence (F(0) = 0, F(1) = F(2) = 1). Moreover, since (K(2), K(3)) = 1, we have for n > 3 (similarly as with the Fibonacci sequence) that
$$(K(n), K(n-1)) = (K(n) - K(n-1), K(n-1)) = (K(n-2), K(n-1)) = \cdots = 1.$$
Now, we show that, for every n = 2a, P(n) = K(a)² is coprime to both P(n + 1) = K(2a + 1) and P(n − 1) = K(2a − 1). It suffices to realize that for a > 2, we have
$$(K(a), K(2a+1)) = (K(a), F(2)K(2a) + F(1)K(2a-1)) = (K(a), F(3)K(2a-1) + F(2)K(2a-2)) = \cdots$$
$$= (K(a), F(a+1)K(a+1) + F(a)K(a)) = (K(a), F(a+1)) = (F(a+2) - F(a-2), F(a+1))$$

Input: G = (V, E) with n vertices and m edges, and a vertex v ∈ V.
Output: The tree T spanning the connected component of v.
(1) Initialize $T_0 = (\{v\}, \emptyset)$.
(2) In the i-th step, build the tree $T_i$ as follows. Look for edges e which are not in $T_{i-1}$ but whose tail vertex $v_e$ is. Take one of them and add it to $T_{i-1}$, i.e., add the head vertex to $V_{i-1}$ and e to $E_{i-1}$.
(3) The algorithm terminates as soon as no such edge exists.
Apparently, the resulting graph T is connected. Counting its vertices and edges shows that it is a tree.

Proof. The vertices of T coincide with the vertices of the connected component of the graph G containing the starting vertex v. Indeed, suppose there is a path from v to a given vertex w. If w does not lie in T, then let $v_i$ be the last of the path's vertices that lies in T (just as in the proof of the previous lemma). However, the subsequent edge of the path would have to be added to T by the algorithm before it terminated, which is a contradiction. Consequently, this algorithm finds a spanning tree of the connected component that contains a given initial vertex v in time O(n + m). □

13.2.7. Minimum spanning tree. All spanning trees of a given graph G have the same number of edges, since this is a general property of trees. Just as the shortest path was found in graphs with weighted edges, a spanning tree with the minimum sum of its edges' weights is now desired.

Definition. Let G = (V, E, w) be a connected graph whose edges e are labeled by non-negative weights w(e). A minimum spanning tree of G is a spanning tree whose total weight does not exceed that of any other spanning tree.

This problem has many applications in practice; for instance, in building networks of electricity, gas, water, etc.
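The spanning forest procedure of 13.2.6, together with the class-relabelling trick from the remark (always relabel the smaller class), can be sketched as follows (an illustrative sketch only; the function and variable names are our own):

```python
def spanning_forest(n, edges):
    """Add each edge unless it closes a cycle.  Cycle detection keeps,
    for every vertex, the label of its equivalence class, and a union
    relabels the smaller of the two classes."""
    comp = list(range(n))              # class label of each vertex
    members = {i: [i] for i in range(n)}
    forest = []
    for u, v in edges:
        a, b = comp[u], comp[v]
        if a == b:
            continue                   # the edge would create a cycle
        if len(members[a]) < len(members[b]):
            a, b = b, a                # make a the larger class
        for x in members[b]:           # relabel the smaller class
            comp[x] = a
        members[a] += members.pop(b)
        forest.append((u, v))
    return forest

edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
f = spanning_forest(5, edges)          # (0, 2) is skipped: it closes a cycle
```

The resulting forest has two trees, {0, 1, 2} and {3, 4}, matching the "n − k components" count from the proof above.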
Surprisingly, it is quite simple to find a minimum spanning tree (supposing all edge weights w(e) of G are non-negative) by the following procedure³:

³ Joseph Bernard Kruskal (1928–2010) was a famous American mathematician, statistician, computer scientist, and psychometrician. There are other famous mathematicians of the same surname: his two brothers and one

$$= (F(a+2) - F(a+1) - F(a-2), F(a+1)) = (F(a) - F(a-2), F(a+1)) = (F(a-1), F(a+1)) = (F(a-1), F(a)) = 1.$$
Similarly,
$$(K(a), K(2a-1)) = (K(a), F(2)K(2a-2) + F(1)K(2a-3)) = (K(a), F(3)K(2a-3) + F(2)K(2a-4)) = \cdots = (K(a), F(a)K(a) + F(a-1)K(a-1)) = (K(a), F(a-1)) = (F(a+2) - F(a-2), F(a-1)) = (F(a+2) - F(a), F(a-1)) = (F(a+2) - F(a+1), F(a-1)) = (F(a), F(a-1)) = 1.$$
This proves the proposition. □

G. Probability in combinatorics

Classical probability is tightly connected to combinatorics, as we have already seen in the first chapter. Now, we present another example, which is a bit more complicated. Combinatorics is hidden even in the following "probabilistic" problem.

13.G.1. There are 100 prisoners in a prison, numbered 1 through 100. The chief guard has placed 100 chests (also numbered 1 through 100) into a closed room and randomly put balls numbered 1 through 100 into the chests so that each chest contains exactly one ball. He has decided to play the following game with the prisoners: He calls them one by one into the room, and the invited prisoner is allowed to gradually open 50 chests. Then he leaves without any possibility of talking to the other prisoners, the guard closes all the chests, and another prisoner is let in. The guard has promised to free all the prisoners provided each of them finds the ball with his number in one of the 50 opened chests. However, if any of the prisoners fails to find his ball, all will be executed. Before the game begins, the prisoners are allowed to agree on a strategy.
Does there exist a strategy that gives the prisoners a "reasonable" chance of winning?

Solution. Clearly, if the prisoners choose to open the chests randomly (with the choices of the particular prisoners independent), the chance for one prisoner to find his ball is 1/2, so the total probability of success is a mere 1/2¹⁰⁰. Therefore, it is necessary to look for a strategy where the successes of the prisoners are as dependent as possible. First of all, we should realize that the invited prisoner has no information from other prisoners and does not know the positions

Kruskal's algorithm
Input: A graph G = (V, E, w) with non-negative weights on the edges.
Output: The minimum spanning trees for all components of G.
(1) Sort the m edges in E so that $w(e_1) \le w(e_2) \le \cdots \le w(e_m)$.
(2) For this order of the edges, call the "Spanning forest algorithm" from the previous subsection.

This is a typical example of the "greedy approach", where maximizing profits (or minimizing expenses) is attempted by always choosing the option which is the most advantageous at each stage. In many problems, this approach fails, since low expenses at the beginning may cause much higher ones at the end. Therefore, greedy algorithms often serve as a basis for very useful heuristics but seldom yield optimal solutions. However, in the case of the minimum spanning tree, this approach works:

Theorem. Kruskal's algorithm finds a minimum spanning tree for every connected graph G with non-negative edge weights. The algorithm runs in O(m log m) time, where m is the number of edges of G.

Proof. Let T = (V, E(T)) denote the spanning tree generated by Kruskal's algorithm and, further, let T̃ = (V, E(T̃)) be an arbitrary minimum spanning tree. The minimality implies that
$$\sum_{e \in E(\tilde{T})} w(e) \le \sum_{e \in E(T)} w(e),$$
so the goal is to show that also $\sum_{e \in E(T)} w(e) = \sum_{e \in E(\tilde{T})} w(e)$. If E(T) = E(T̃), then nothing further is needed. So assume there exists an edge e ∈ E(T) such that e ∉ E(T̃).
From all such edges, choose one, call it e, with weight w(e) as small as possible. The addition of e into T̃ creates a cycle $e e_1 e_2 \cdots e_k$ in T̃, and at least one of its edges $e_i$ is not in E(T). The choice of the edge e implies that if $w(e_i) < w(e)$, then the edge $e_i$ would have been among the candidate edges in Kruskal's algorithm after a certain subtree T′ ⊆ T ∩ T̃ had been created, and its addition to the gradually constructed tree T would not have created a cycle. Therefore, if $w(e_i) < w(e)$, the edge $e_i$ would have been chosen by the algorithm. It follows that $w(e_i) \ge w(e)$. However, now the edge $e_i$ can be replaced with e in T̃ (by the choice of $e_i$, this results in a spanning tree again) without increasing the total weight. So the resulting T̃ is again a minimum spanning tree, and it differs from T in fewer edges than before. Therefore, in a finite number of steps, T̃ is changed into T without increasing the total weight. □

nephew: Martin David co-invented solitons and surreal numbers, William was active in statistics, and Clyde was a computer scientist, too. The above algorithm dates from 1956.

of particular balls in the chests. However, once he opens a chest, he knows the ball number it contains and may choose the next chest accordingly. This suggests the following simple strategy: every prisoner starts with the chest that bears his number. If it contains the corresponding ball, the prisoner succeeds and can open the remaining chests at random. If not, he opens the chest with the number of the found ball. He continues this way until he eventually finds his ball or opens the fiftieth chest. Since every chest "points" at another chest according to the described procedure, let us call this strategy the pointer strategy.

Probability of success. The guard's possible placements of the balls bijectively correspond to permutations of the numbers 1 through 100.
In order to find the probability of success, we must realize for which permutations the pointer strategy works. Recall that every permutation can be expressed as the composition of pairwise disjoint cycles. If each prisoner were allowed to open an arbitrary number of chests, he would find his ball as the last one of the corresponding cycle, since he begins with the chest bearing his own number, which is pointed at precisely by the chest containing his ball. It follows that the strategy fails if and only if there is a cycle of length greater than 50, because then no prisoner of this cycle finds his ball in time. Thus, we must count the number of such permutations. In general, the probability that a random permutation of length n contains a cycle of length r > n/2 (there may be more occurrences of shorter cycles; however, there can be at most one cycle of length greater than n/2, which simplifies the calculation) is as follows: we must choose the r elements of the cycle, order them cyclically, and then choose an arbitrary permutation of the remaining n − r numbers. This leads to
$$\frac{\binom{n}{r}(r-1)!\,(n-r)!}{n!} = \frac{1}{r}.$$

13.2.8. Two more algorithms. The second algorithm for finding a spanning tree, presented in 13.2.6, also leads to a minimum spanning tree:

Jarník-Prim's algorithm⁷
Input: A connected graph G = (V, E, w) with n vertices and m edges, with non-negative weights on the edges.
Output: A minimum spanning tree T of G.
(1) Initialize $T_0 = (\{v\}, \emptyset)$ with some vertex v ∈ V.
(2) In the i-th step, look for all edges e which are not in $T_{i-1}$ but whose tail vertex $v_e$ is. Take the one with minimal weight and add it to $T_{i-1}$, i.e., add the head vertex to $V_{i-1}$ and e to $E_{i-1}$.
(3) The algorithm terminates when the number of added edges reaches n − 1.

Borůvka's algorithm is similar. It constructs as many connected components as possible simultaneously: it begins with the singleton components in the graph $T_0 = (V, \emptyset)$. In each step, it connects every component to another component by the shortest possible edge.
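The Jarník-Prim procedure above can be sketched with a lazy heap of candidate edges (an illustrative sketch, not the book's code; for an undirected graph, each edge is listed in both adjacency lists):

```python
import heapq

def prim(adj, v0):
    """Grow a tree from v0, always adding the cheapest edge that
    leaves the tree; adj maps a vertex to (neighbour, weight) pairs."""
    tree, seen = [], {v0}
    heap = [(w, v0, y) for y, w in adj[v0]]
    heapq.heapify(heap)
    while heap:
        w, v, y = heapq.heappop(heap)
        if y in seen:
            continue               # both endpoints already in the tree
        seen.add(y)
        tree.append((v, y, w))
        for z, wz in adj[y]:
            if z not in seen:
                heapq.heappush(heap, (wz, y, z))
    return tree

# triangle with weights 0-1: 1, 1-2: 1, 0-2: 3
adj = {0: [(1, 1), (2, 3)], 1: [(0, 1), (2, 1)], 2: [(0, 3), (1, 1)]}
t = prim(adj, 0)                   # picks the two edges of weight 1
```

On the weighted triangle above, the expensive edge of weight 3 is skipped, and the tree has total weight 2.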
It is easy to prove that (provided the edge weights are pairwise distinct) this results in a minimum spanning tree.

Borůvka's algorithm
Input: A connected graph G = (V, E, w) with non-negative weights on the edges.
Output: The minimum spanning tree for G.
(1) Initialization: Create the graph S with the same vertex set as G and no edges, and an empty edge set E′.
(2) The main loop: While S contains more than one component, do:
• for every tree T in S, find the shortest edge that connects T to G \ T, and add this edge to E′;
• add all edges of E′ to the graph S and clear E′.

Note that Borůvka's algorithm can be executed using parallel computation, which is why it is used in many practical modifications. The proofs that both of these algorithms are correct are similar to that of Kruskal's. The details are omitted.

13.2.9. Traveling salesman problem.

Recall that $\sum_{k=1}^{n} \frac{1}{k} \approx \ln(n) + \gamma$, where γ is Euler's constant. Thus, we have
$$p = \sum_{k=n/2+1}^{n} \frac{1}{k} = \sum_{k=1}^{n} \frac{1}{k} - \sum_{k=1}^{n/2} \frac{1}{k} \to \ln(n) + \gamma - \ln\frac{n}{2} - \gamma = \ln 2, \quad \text{for } n \to \infty.$$
Hence it follows that, for large values of n, the probability of success approaches 1 − p ≈ 1 − ln 2 = 0.30685…. Now, we are going to show that the pointer strategy is optimal.

Optimality of the pointer strategy. In order to prove the optimality of the pointer strategy, we merely modify the rules of the game and further define another game. Consider the following rules: every prisoner keeps opening the chests until he finds his ball. The prisoners win iff each opens at most 50 chests. Clearly, this modification does not change the probability of success, but it will help us prove the optimality. We refer to this game as game A. Now, consider another game (game B) with the following rules: first, prisoner number 1 enters the chest room and keeps opening the chests until he finds his ball. Then, the guard leaves the opened chests as they are and immediately invites the prisoner with the least undiscovered number. The game proceeds this way until all chests are opened.
The prisoners win iff none of them opened more than 50 (n/2 in the general case) chests. Suppose that the guard notes the ball numbers in the order in which they were discovered by the prisoners. This results in a permutation of the numbers 1 through 100, from which he can see whether the prisoners won or not. The probability

few cases, the contrary is true; mostly there are no algorithms running in polynomial time, so one needs to use algorithms which do not always find the optimal solution but give one which is as good as possible. This is called a heuristic approach. One of the most important combinatorial problems of this class is the problem of finding a minimum Hamiltonian cycle. This is a Hamiltonian cycle with the minimum sum of the weights of its edges among all Hamiltonian cycles. This problem arises in many practical applications. For instance:
• goods or post delivery (via a given network),
• network maintenance (electricity, water pipelines, IT, etc.),
• request processing (parallel requests for reading from a hard disk, for instance),
• measuring several parts of a system (for example, when studying the structure of a protein crystal using X-rays, the main expenses are due to the movements and focusing for particular measurements),
• material division (for instance, when covering a wall with wallpaper, one tries to keep the pattern continuous while minimizing the amount of unused material).
The greedy approach can be applied in the case of looking for a minimum Hamiltonian cycle as well. The algorithm begins in an arbitrary vertex $v_1$, which is set active, and the other vertices are labeled as sleeping. In each step, it examines the sleeping vertices adjacent to the active one and selects the one which is connected by the shortest edge. The active vertex is labeled as processed, and the selected vertex becomes active.
The algorithm terminates either with a failure, when there is no edge going from the active vertex to a sleeping one, or it successfully finds a Hamiltonian path. In the latter case, if there exists an edge from the last vertex $v_n$ to $v_1$, a Hamiltonian cycle is obtained. This algorithm seldom produces a minimal Hamiltonian cycle. At least, it always finds some (and relatively small) Hamiltonian cycle in a complete graph.

13.2.10. Flow networks. Another group of applications of the language of graph theory concerns moving some amount of a measurable material in a fixed network. The vertices of a directed graph represent places between which one transports material up to predetermined limits which are given as assessments of the edges (called capacities). There are two important types of vertices: the sources and the sinks of the network. A network is a directed graph with valued edges, where some of the vertices are labeled as sources or sinks. Without loss of generality, assume that the graph is directed and has only one source and one sink: in the general case, an artificial source and an artificial sink can always be added, connected with directed edges to the original sources and sinks.

960 CHAPTER 13. COMBINATORIAL METHODS, GRAPHS, AND ALGORITHMS

of discovering a particular ball is at every moment independent of the selected strategy. There are 100! permutations which correspond to some strategies (no matter whether they are random or sophisticated) since they are merely the orders in which the ball numbers are discovered. In order to compute the probability of success in game B, we should note that any order can be written as the composition of cycles where each cycle contains the ball numbers discovered by a given prisoner. For the sake of clarity, consider a game with 8 prisoners.
If the guard has noted the permutation (2, 5, 7, 1, 6, 8, 3, 4), then we can see that the prisoners win, since prisoner 1 discovered the numbers (2, 5, 7, 1), then prisoner 3 discovered (6, 8, 3), and finally prisoner 4 discovered only his number (4). In this case, we can write: (2, 5, 7, 1, 6, 8, 3, 4) → (2, 5, 7, 1)(6, 8, 3)(4). Further, any such permutation corresponds to a unique order of the numbers 1 through 8. Having any permutation in the cyclic notation, we first rearrange each cycle so that its least number is the last one and then sort the cycles by their last numbers in ascending order. For instance, we have: (7, 5, 8)(2, 4)(1, 6, 3) → (6, 3, 1)(4, 2)(8, 7, 5) → (6, 3, 1, 4, 2, 8, 7, 5). We have thus constructed a bijection between the winning orders of discovered numbers and the permutations of the numbers 1 through 8 that do not contain a cycle of length greater than 4. It follows that the probability of success in game B is the same as the probability that a random permutation does not contain a cycle of length greater than 4 (n/2 in the general case). This corresponds to the probability of success in the original game using the pointer strategy. Now, this implies an important conclusion for game A. Indeed, the prisoners may apply any strategy from game A to game B as follows: each prisoner behaves like in game A, but he considers open chests to be closed, i.e., if he wants to open a chest which has already been opened, he just passes this move and further behaves as if he had just discovered the ball number in the considered chest. Therefore, any strategy that succeeds for a given placement of the balls in game A must succeed for the same placement in game B as well. Therefore, if there existed a better strategy for game A, we could apply it to game B and obtain a higher chance of winning there. However, this is impossible since all strategies in game B lead to the same

Then the capacities of the added edges would cover all maximum capacities of the particular sources and sinks.
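The canonical rearrangement of cycles used in the bijection above (least element last in each cycle, cycles sorted by their last elements) can be sketched as follows; the function name is ours:

```python
def canonical_word(cycles):
    """Flatten a permutation given in cycle notation into the unique
    word of discovered numbers: rotate each cycle so that its least
    element comes last, then sort the cycles by their last elements."""
    rotated = []
    for c in cycles:
        i = c.index(min(c))
        rotated.append(c[i + 1:] + c[:i + 1])   # least element moved last
    rotated.sort(key=lambda c: c[-1])           # ascending last elements
    return [x for c in rotated for x in c]
```

On the example from the text, canonical_word([(7, 5, 8), (2, 4), (1, 6, 3)]) returns [6, 3, 1, 4, 2, 8, 7, 5].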
The situation is depicted in the diagram. There, the black vertices on the left correspond to the given sources, while the black vertices on the right stand for the given sinks. On the left, there is an artificial source (a white vertex), and there is an artificial sink on the right. The edge values are not shown in the diagram.

Flow networks
A network is a directed graph G = (V, E) with a distinctive vertex z, called the source, and another distinctive vertex s, called the sink, together with a non-negative assessment of the edges w : E → R, which represents their capacities. A flow in a network S = (V, E, z, s, w) is an assessment of the edges f : E → R such that, for each vertex v except for the source and the sink, the total input is equal to the total output, i.e.,
$$\sum_{e \in \mathrm{IN}(v)} f(e) = \sum_{e \in \mathrm{OUT}(v)} f(e).$$
This rule is often called Kirchhoff's law (referring to the terminology used in physics). The size of a flow f is given by the total balance of the source values
$$\sum_{e \in \mathrm{OUT}(z)} f(e) - \sum_{e \in \mathrm{IN}(z)} f(e).$$
It follows directly from the definition that the size of a flow f can also be computed as
$$\sum_{e \in \mathrm{IN}(s)} f(e) - \sum_{e \in \mathrm{OUT}(s)} f(e).$$

The left-hand part of the following diagram shows a simple network with the source in the white circled vertex and the sink in the black bold vertex. The labels over the edges determine the maximal capacities. Looking at the sum of the capacities that enter the sink, the maximum flow in this network is 5 (the sum of the capacities leaving the source is larger).

probability of success. Therefore, the pointer strategy is better than or equally good as any other strategy. □

13.G.2. In a competition, there are m contestants and n officials, where n is an odd integer greater than two. Each official judges each contestant as either good or bad. Suppose that any pair of officials agree on at most k contestants. Prove that
$$\frac{k}{m} \ge \frac{n-1}{2n}.$$
Let us look at two possible approaches to this problem.

Solution. Let us count the number N of pairs ({official, official}, contestant) where the officials are distinct and agree on the contestant. Altogether, there are $\binom{n}{2}$ pairs of officials, and each pair can agree on at most k contestants. Therefore, $N \le k\binom{n}{2}$. On the other hand, for each contestant, the number of pairs of officials that agree on him is at least $\left(\frac{n-1}{2}\right)^2$ (this estimate is verified in the alternative solution below), so $N \ge m\frac{(n-1)^2}{4}$. Combining these two inequalities together, we get
$$\frac{k}{m} \ge \frac{n-1}{2n}.$$

An alternative solution, using probabilities. Let us choose a pair of officials at random. Let X be the random variable which tells the number of contestants on which this pair agrees. We are going to prove the contrapositive implication, i.e., if $\frac{k}{m} < \frac{n-1}{2n}$, then X is greater than k with probability greater than zero, which is denoted $P(X > k) > 0$. Consider the random variables $X_i$ for i = 1, 2, ..., m with values in {0, 1}, denoting whether the pair agrees on the i-th contestant. Let $X_i = 1$ when they agree, and let $X_i = 0$ otherwise. Hence we have:
$$X = X_1 + X_2 + \cdots + X_m.$$
Using the linearity of expectation, we obtain:
$$E[X] = E[X_1] + E[X_2] + \cdots + E[X_m].$$

13.2.11. Maximum flow problem. The next task is to find the maximum possible flow for a given network on a graph G. The right-hand side of the above diagram shows a flow of size five, and the size of any flow cannot exceed this. The fundamental principle is to add up the capacities of a set of edges through which every path from z to s must go. In the diagram, there are three such choices, providing the limits 12, 8, 5 (from left to right). At the same time, in such a simple case, the flow that realizes the maximal possible value is easily found. This idea can be formalized as follows:

Cut in a network
A cut in a network S = (V, E, z, s, w) is a set of edges C ⊆ E such that when these edges are removed, there remains no path from the source z to the sink s in the graph G = (V, E \ C). The number
$$|C| = \sum_{e \in C} w(e)$$
is called the capacity of the cut C. Clearly, there is no flow whose size is greater than the capacity of a cut.
We present the Ford-Fulkerson algorithm8, which finds a cut with the minimum possible capacity as well as a flow which realizes this value. This proves the following theorem:

Theorem. In any network S = (V, E, z, s, w), the maximum size of a flow equals the minimum capacity of a cut in S.

The idea of the algorithm is quite simple. It looks for paths between the vertices of the graph, trying to "saturate" them with the flow. For this purpose, define the following terminology: An undirected path from the vertex v to the vertex v' in a network S = (V, E, z, s, w) is called unsaturated if and only if all edges e directed along the path from v to v' satisfy f(e) < w(e), and the edges e in the other direction satisfy f(e) > 0 (sometimes, one tries to saturate the flow in the other direction, yielding a semipath, or the augmenting semipath). The residual capacity of an edge e is the number w(e) − f(e) if the edge is directed along the path from v to v', and it is the number f(e) otherwise. The residual capacity of a path is defined to be the minimum residual capacity of its edges. For the sake of simplicity, assume that all the edge capacities are rational numbers.

Ford, L. R.; Fulkerson, D. R. (1956). "Maximal flow through a network". Canadian Journal of Mathematics 8: 399-404.

Now, let us calculate $E[X_i] = \sum_{x \in \{0,1\}} x \cdot P(X_i = x)$. Since $X_i$ can be only 0 or 1, we have directly $E[X_i] = P(X_i = 1)$. Let us examine the probability $P(X_i = 1)$, i.e., that the officials agree on the i-th contestant. There are $\binom{n}{2}$ pairs of officials. Let $t_i$ denote the number of officials who say the i-th contestant is good and $n - t_i$ the number of those who do not. Then, there are $\binom{t_i}{2}$ pairs who agree that the i-th contestant is good and $\binom{n-t_i}{2}$ pairs who agree on the contrary. Altogether, there are $\binom{t_i}{2} + \binom{n-t_i}{2}$ pairs that agree on the i-th contestant.
Therefore,
$$E[X_i] = P(X_i = 1) = \frac{\binom{t_i}{2} + \binom{n-t_i}{2}}{\binom{n}{2}}.$$
Hence,
$$E[X] = \sum_{i=1}^{m} \frac{\binom{t_i}{2} + \binom{n-t_i}{2}}{\binom{n}{2}}.$$
We are going to show that for odd values of n, we have
$$\binom{t_i}{2} + \binom{n-t_i}{2} \ge \left(\frac{n-1}{2}\right)^2.$$
Rearranging this leads to $(n - 2t_i)^2 \ge 1$, i.e., $t_i \le \frac{n-1}{2}$ or $t_i \ge \frac{n+1}{2}$, which is clearly true since $\frac{n-1}{2}$ and $\frac{n+1}{2}$ are adjacent integers. Using this inequality, we obtain:
$$E[X] \ge m \cdot \frac{\left(\frac{n-1}{2}\right)^2}{\binom{n}{2}} = \frac{m(n-1)}{2n}.$$
Thanks to the assumption $\frac{m(n-1)}{2n} > k$, we have E[X] > k, and thus P(X > k) > 0, which finishes the proof. □

Ford-Fulkerson algorithm
Input: A network S = (V, E, z, s, w).
Output: A maximal possible flow f : E → R and a minimal cut C, which is given by those edges which lead from U to V \ U.
(1) Initialization: Set f(e) = 0 for each edge e ∈ E, and using depth-first search from z, find the set U ⊆ V of those vertices to which there exists an unsaturated path.
(2) The main loop: While s ∈ U, do:
• select an unsaturated path P from the source z to the sink s; then increase the flow f along the forward edges of P (and decrease it along the backward ones) by the value of the residual capacity of P;
• update U.

Proof. As seen, the size of any flow cannot exceed the capacity of any cut. Therefore, it suffices to show that when the algorithm terminates, the capacity of the generated cut equals the size of the constructed flow. The algorithm terminates at the first moment when there is no unsaturated path from the source z to the sink s. This means that U does not contain s, and all edges e starting in U and ending outside of U satisfy f(e) = w(e) (otherwise, the other endpoint of e would be added to U). For the same reason, all edges e leading from V \ U to U must have f(e) = 0. Clearly, the total size of the flow satisfies

Further, we demonstrate an application of probabilities to an interesting problem.

13.G.3. Let S be a finite set of points in the plane which are in general position (i.e., no three of them lie on a straight line). For any convex polygon P all of whose vertices lie in S, let a(P) denote the number of its vertices and b(P) the number of points from S which are outside P.
Prove that for any real number x, we have
$$\sum_{P} x^{a(P)} (1-x)^{b(P)} = 1,$$
where the sum runs over all convex polygons P with vertices in S. (A line segment, a singleton, and the empty set are considered to be a convex 2-gon, 1-gon, and 0-gon, respectively.)

Solution. First of all, we prove the wanted equality for x ∈ [0, 1]. Let us color each point from S so that it is white with probability x and black with probability 1 − x (in other words, we consider a random choice of size |S| with the binomial

$$|f| = \sum_{e \text{ from } U \text{ to } V \setminus U} f(e) \; - \sum_{e \text{ from } V \setminus U \text{ to } U} f(e).$$
However, when the algorithm terminates, this expression equals
$$|C| = \sum_{e \text{ from } U \text{ to } V \setminus U} w(e),$$
which is the desired result. It remains to show that the algorithm always terminates. Since the edges are assumed to be assessed with rational numbers, it can be assumed by rescaling that the capacities are integers. Then every flow constructed during the run of the algorithm has integer size. In addition, every iteration of the main loop increases the size of the flow. However, since any cut bounds the maximum size of any flow from above, the algorithm must terminate after a finite number of steps. □

probability distribution Bi(n, x), and let us say that success corresponds to white and failure corresponds to black). We can note that for any such coloring, there must exist a polygon such that all of its vertices are white and all points outside are black (this polygon is the convex hull of the white points). The above suggests that the probability that the random choice realizes a polygon with all vertices white and all exterior points black is equal to one. However, we can compute this probability in a different way.
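A minimal executable sketch of the Ford-Fulkerson loop, with the search for an unsaturated path done by depth-first search on residual capacities. The dict-of-dicts representation and all names are our choices, not the book's:

```python
def max_flow(capacity, source, sink):
    """Ford-Fulkerson with depth-first search for unsaturated paths.
    capacity: dict of dicts, capacity[u][v] = w(u, v) on edge u -> v."""
    # Residual capacities; reverse edges start at 0 so that flow can
    # later be pushed back along them (the "other direction" case).
    residual = {u: dict(vs) for u, vs in capacity.items()}
    for u, vs in capacity.items():
        for v in vs:
            residual.setdefault(v, {}).setdefault(u, 0)

    def dfs(u, limit, seen):
        """Push up to `limit` units along some unsaturated path u -> sink."""
        if u == sink:
            return limit
        seen.add(u)
        for v, r in residual[u].items():
            if r > 0 and v not in seen:
                pushed = dfs(v, min(limit, r), seen)
                if pushed:
                    residual[u][v] -= pushed
                    residual[v][u] += pushed
                    return pushed
        return 0

    flow = 0
    while True:
        pushed = dfs(source, float('inf'), set())
        if not pushed:          # no unsaturated path remains: U lost the sink
            break
        flow += pushed
    return flow
```

For instance, in the network z → a (3), z → b (2), a → s (2), b → s (3), the bottleneck edges give a maximum flow of 4.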
The event of a polygon having this property is the union of k disjoint events, where k is the number of convex polygons, namely that a given polygon has the desired property (note that the property cannot be shared by different convex polygons). For every given convex polygon P, the probability that its vertices are white and the points outside it are black is equal to $x^{a(P)}(1-x)^{b(P)}$, where a(P) is the number of vertices of P and b(P) is the number of points from S outside P. Since the probability of a union of disjoint events is equal to the sum of the particular events' probabilities, we get
$$\sum_{P} x^{a(P)} (1-x)^{b(P)} = 1.$$
This proves the equality for all numbers in the interval [0, 1]. However, we can also perceive this fact as follows: any real number from the interval [0, 1] is a root of the polynomial $\sum_{P} x^{a(P)}(1-x)^{b(P)} - 1$. As we know, a nonzero polynomial over the (infinite) field of real numbers can have only finitely many roots (see 12.2.6). Therefore, $\sum_{P} x^{a(P)}(1-x)^{b(P)} - 1$ is the zero polynomial, and the equality $\sum_{P} x^{a(P)}(1-x)^{b(P)} = 1$ thus holds for all real numbers x. □

Remark. This equality holds even if we define the numbers a(P) and b(P) in another way: The definition of a(P) is the same, but now let b(P) denote the number of points from S which are not the vertices of P. (Thus, we always have a(P) + b(P) = |S|.) Then, the given equality is a corollary of the binomial theorem for $(x + (1-x))^{|S|}$.

13.G.4. A competition with n players is called an (n, k)-tournament iff it has k rounds and satisfies the following:
i) every player competes in every round, and any pair of players competes at most once,
ii) if A plays with B in the i-th round, C plays with D in the i-th round, and A plays with C in the j-th round, then B plays with D in the j-th round.

The run of the algorithm is illustrated in two diagrams. On the left, there are two shortest unsaturated paths from the source to the sink drawn in gray (the upper one has two edges, while the lower one has three).
On the right, another path is saturated (taking the first turn in the upper path), also drawn in gray. Now, it is apparent that there can be no other unsaturated path from the source to the sink. Therefore, the algorithm terminates at this moment.

13.2.12. Further remarks. The algorithm allows for further conditions to be incorporated in the problem. For instance, capacity limits can be set for the vertices of the network as well. There may be not only upper limits for the flows along particular edges or through vertices, but also lower ones. It is easy to add vertex capacities: just double every vertex (one copy for the incoming edges, the other for the outgoing edges), connecting each pair with an edge of the corresponding capacity. The lower limits for the flow can be included in the initialization part of our algorithm. However, one needs to check whether such a flow exists at all. Many other variations can be found in the literature. On the other hand, the algorithm does not necessarily terminate if the edge capacities are irrational. Moreover, the flows that are constructed during the run may not even converge to the optimal solution in such a case. However, it still holds that if the algorithm terminates, then a maximum flow is found. If the capacities are integers (equivalently, rational numbers), the running time of the algorithm can be bounded by $O(f|E|)$, where f is the size of a maximum flow in the network and |E| is the number of edges. The worst case occurs if every iteration increases the size of the flow by one. In the proof of correctness, no explicit way of searching the graph when looking for an unsaturated path is used. Another variation of the Ford-Fulkerson algorithm is to use breadth-first search.
The resulting algorithm is called Edmonds-Karp, and its running time is $O(|V||E|^2)$.9 We mention Dinic's algorithm, which simplifies the search for an unsaturated path by constructing the level graph, where augmenting edges are considered only if they lead between

Edmonds, Jack; Karp, Richard M. (1972). "Theoretical improvements in algorithmic efficiency for network flow problems". Journal of the ACM (Association for Computing Machinery) 19 (2): 248-264. doi:10.1145/321694.321699.

Find all pairs (n, k) for which there exists an (n, k)-tournament.

Solution. There exists an (n, k)-tournament if and only if $2^{\lceil \log_2(k+1) \rceil}$ divides the integer n. First of all, we are going to show the if-part. We construct a $(2^t, k)$-tournament where $k \le 2^t - 1$ (the general case $2^t \mid n$ can then be easily derived from that). There are thus $2^t$ players in the tournament, so we assign to each player a (unique) number from the set $\{0, 1, \ldots, 2^t - 1\}$. In the i-th round, player $\alpha$ competes with player $\alpha \oplus i$ (where $\oplus$ is the binary XOR operation, i.e., the j-th bit of $a \oplus b$ is one if and only if the j-th bit of a is different from the j-th bit of b). This schedule is correct since every player is engaged in every round and different players have different opponents (for $\alpha \neq \beta$, we have $\alpha \oplus i \neq \beta \oplus i$). Further, the opponent of the opponent of $\alpha$ is indeed $\alpha$ (since $(\alpha \oplus i) \oplus i = \alpha$). Moreover, the second tournament rule is also satisfied: if $\alpha$ plays with $\beta$ and $\gamma$ plays with $\delta$ in the i-th round (i.e., $\beta = \alpha \oplus i$ and $\delta = \gamma \oplus i$) and if j is such that $\alpha$ plays with $\gamma$ in the j-th round, then we have $\beta \oplus j = (\alpha \oplus i) \oplus j = (\alpha \oplus j) \oplus i = \gamma \oplus i = \delta$, so $\beta$ indeed plays with $\delta$ in the j-th round. Any $(2^t \cdot s, k)$-tournament where s is odd can be obtained as s parallelized $(2^t, k)$-tournaments. Now, we are going to show that the condition $2^{\lceil \log_2(k+1) \rceil} \mid n$ is necessary as well.
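The XOR construction can be checked mechanically for small t. A sketch with our own names, pairing player a with a ⊕ i in round i:

```python
def xor_tournament(t, k):
    """Round schedule of the (2**t, k)-tournament: in round i,
    player a meets a ^ i.  Assumes 1 <= k <= 2**t - 1."""
    n = 2 ** t
    return [[(a, a ^ i) for a in range(n) if a < a ^ i]
            for i in range(1, k + 1)]

rounds = xor_tournament(3, 7)
# each of the 7 rounds pairs up all 8 players, and no pair meets twice
```

One can verify directly that every player appears once per round and that the 7 rounds together exhaust all 28 pairs of the 8 players.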
Consider the graph $G_i$ whose vertices correspond to the players and whose edges connect the pairs who have played in or before the i-th round. Consider players A and B who play together in round i+1. We want to show that $|\Gamma| = |\Delta|$, where $\Gamma$ is the component of A and $\Delta$ is the component of B. Actually, we show that any player of $\Gamma$ competes with a player of $\Delta$ in round i+1. Thus, let $C \in \Gamma$, i.e., in $G_i$, there exists a path $A = X_1, X_2, \ldots, X_m = C$ such that $X_j$ has played with $X_{j+1}$, $j = 1, \ldots, m-1$, in or before the i-th round. Consider the sequence $Y_1, Y_2, \ldots, Y_m$, where $Y_k$ is the opponent of $X_k$ in round i+1, $k = 1, \ldots, m$ (thus $Y_1 = B$). Then, for any $1 \le j \le m-1$, we have that $X_j$ competes with $Y_j$ and $X_{j+1}$ competes with $Y_{j+1}$ in round i+1 (by the definition of the sequence $Y_1, \ldots, Y_m$), and in a certain r-th round ($1 \le r \le i$), $X_j$ played with $X_{j+1}$ (by the definition of the sequence $X_1, \ldots, X_m$). However, by the second tournament rule, this means that $Y_j$ also played with $Y_{j+1}$ in the r-th round, so the edge $Y_j Y_{j+1}$ is contained in $G_i$ for any $1 \le j \le m-1$. Thus, $Y_1, Y_2, \ldots, Y_m$ is a

vertices whose distances from the source differ. The time complexity of this algorithm is $O(|V|^2|E|)$, which is much better for dense graphs than the Edmonds-Karp algorithm.

13.2.13. Problems related to flow networks. A good application of flow networks is the problem of bipartite matching. The task is to find a maximum matching in a bipartite graph, i.e., a set of as many edges as possible such that each vertex of the graph is the endpoint of at most one of the selected edges. This is an abstract variation of a quite common problem. For instance, it may be needed to match boys and girls in dancing lessons, provided information about which pairs would be willing to dance together is given. This problem is easily reduced to the problem of maximum flow.
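With unit capacities, the maximum-flow search specializes to the classical augmenting alternating-path algorithm for matchings. A minimal sketch; the representation and all names are ours:

```python
def max_bipartite_matching(adj, n_left, n_right):
    """Size of a maximum matching in a bipartite graph.
    adj[u] lists the right-side vertices willing to pair with left vertex u.
    Each successful call to try_augment pushes one more unit of flow."""
    match_of = [None] * n_right   # right vertex -> its matched left vertex

    def try_augment(u, visited):
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            # v is free, or its current partner can be re-matched elsewhere
            if match_of[v] is None or try_augment(match_of[v], visited):
                match_of[v] = u
                return True
        return False

    return sum(try_augment(u, set()) for u in range(n_left))
```

For example, with left vertices 0, 1, 2 and willing pairs {0-0, 0-1, 1-0, 2-1}, two couples can be formed but not three, since left vertices 1 and 2 both need partners also wanted by vertex 0.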
Add an artificial source to the graph and connect it with edges to all vertices of one part of the bipartite graph, while the vertices of the other part are connected to an artificial sink. The capacity of each edge is set to one, and the resulting graph is searched for the maximum flow. Then, the edges that are used in the flow correspond to the selected pairs. Of course, information on which pairs to put together by leaving some of them out may be included. Another important application of flow networks is the proof of Menger's theorem (mentioned as a theorem in 13.1.10). It can be understood as follows: Given a directed graph, set the capacity of each edge as well as each vertex to one. Further, select an arbitrary pair of vertices v and w, which are considered to be the source and the sink, respectively. Then, the size of a maximum flow in this graph equals the maximum number of disjoint paths from v to w (the paths may share only the source and the sink). Every cut divides v and w into different connected components of the remaining graph (since they are chosen to be the source and sink). The desired statements then follow from the fact that the size of a maximum flow equals the capacity of a minimum cut.

13.2.14. Game trees. We turn our attention to a very broadly used application of tree structures when analyzing possible strategies or procedures. They can be encountered in the theory of artificial intelligence as well as in game theory. They play an important role in economics and many other social fields. This is about games. In the mathematical sense, game theory examines models in which one or more players take turns in playing moves according to predetermined and generally known rules. Usually, the moves are assessed with profits or losses for the given player. Then, the task is to find a strategy for each player, i.e., an algorithmic procedure which maximizes the profits or minimizes the losses.
We use an extensive description of the games. This means that a complete and finite analysis of all possible states of the game is given, and the resulting analysis gives an exact account of the profits and losses, supposing that the other players also play the moves that are best for them.

path in $G_i$, so $B = Y_1$ and $Y_m$ lie in the same component ($\Delta$). It can be deduced analogously that any player from $\Delta$ competes with a player of $\Gamma$ in round i+1, and since every player plays exactly once in a given round, we must have $|\Gamma| = |\Delta|$. By the definition of a component, the component of A in $G_{i+1}$ is equal to $\Gamma \cup \Delta$. Then, we have either $\Gamma = \Delta$ (then, the component of A in $G_{i+1}$ is $\Gamma$), or $\Gamma \cap \Delta = \emptyset$ (in this case, the component of A in $G_{i+1}$ is the disjoint union $\Gamma \cup \Delta$). Altogether, the component of A in $G_{i+1}$ is either the same size or twice as great as in $G_i$. Now, consider the components $\Gamma_1, \Gamma_2, \ldots, \Gamma_k$ of A in the respective graphs $G_1, G_2, \ldots, G_k$. We have $|\Gamma_1| = 2$ (since A had exactly one opponent in the first round), and for $1 \le i \le k-1$, we have either $|\Gamma_i| = |\Gamma_{i+1}|$ or $2|\Gamma_i| = |\Gamma_{i+1}|$. Therefore, the number of vertices (players) of every component is a power of two, i.e., $|\Gamma_k| = 2^l$ for some l, and $|\Gamma_k| \ge k+1$ (A had different opponents in the k rounds). Hence, $2^l \ge k+1$, i.e., $2^l$ is at least $2^{\lceil \log_2(k+1) \rceil}$, so the number of players in each component is divisible by $2^{\lceil \log_2(k+1) \rceil}$. Thus, so must be the total number n. □

H. Combinatorial games

13.H.1. Consider the following game for two players: On the table, there are four piles of 9, 10, 11, and 14 tokens, respectively. Players alternate moves, where a move consists of selecting one of the piles and removing an arbitrary (positive) number of tokens from that pile. The player who takes the last token wins. Is there a winning strategy for one of the players?

Solution.
Note that this game is the sum of four games which correspond to one-pile games where an arbitrary (positive) number of tokens can be removed (the sum of combinatorial games is both commutative and associative, so we can talk about the sum of those games without having to specify the order). A simple induction argument shows that the value of the Sprague-Grundy function (the SG-value) of such a one-pile game is equal to the number of tokens: Suppose that a natural number n is such that for all k < n, the SG-value of the game with k tokens is k. According to the rules of the game, we can remove an arbitrary (positive) number of tokens, i.e., we can leave an arbitrary number from 0 to n − 1. By the induction hypothesis, this means that for any number k < n, we can reach a position whose SG-value is k, and we cannot reach a position whose SG-value would be n.

A game tree is a rooted tree whose vertices are the possible states of the game, labeled according to whose turn it is. The outgoing edges of a vertex correspond to the possible moves of the player from that state. This complete description of a game using the game tree may be used for common games like chess, noughts and crosses (known also as tic-tac-toe), etc. As a simple example, consider a simple variation of the game known as Nim.10 There are k tokens on the table (the tokens may be sticks or matches), where k > 1 is an integer, and players take turns at removing one or two tokens. The player who manages to take the last token(s) wins. There is a variation of the game in which the player who is forced to take the last token loses. The tree of this game, including all necessary information about the game, can be constructed as follows:
• The state with $\ell$ tokens on the table and the first player to move corresponds to the subtree rooted at $F_\ell$. The state with the same number of tokens but the second player to move is represented by the subtree rooted at $S_\ell$.
• The vertex $F_\ell$ has $S_{\ell-1}$ as its left-hand son and $S_{\ell-2}$ as its right-hand son. Similarly, the sons of the vertex $S_\ell$ are $F_{\ell-1}$ and $F_{\ell-2}$.
• The leaves are always $F_0$ or $S_0$. (In the variation when the player to take the last token loses, these would be the states $F_1$ and $S_1$.)

Every run of the game starting at the root $F_k$ corresponds to exactly one leaf of the resulting tree. Therefore, the total number p(k) of possible runs for $F_k$ is equal to
$$p(k) = p(k-1) + p(k-2)$$
for $k \ge 3$, and clearly p(1) = 1 and p(2) = 2. This difference equation has already been considered. It is satisfied by the Fibonacci numbers, which can be computed by an explicit formula (see the subsection on generating functions at the end of this chapter, or the corresponding part about difference equations in chapter three, cf. 3.B.1). A formula is thus known for the number of possible runs of the game. The number of possible states equals the number of all vertices of the tree. The game always ends in a win for one of the players. We can also consider games where a tie is possible.

13.2.15. Game analysis. The tree structure allows an analysis of the game so that an algorithmic strategy for each player can be built. This is done with a simple recursive procedure for assessing the root of a subtree. Each vertex is given a label: W for vertices where the first player can force a win, L for those where the first player loses if the other one plays optimally, and, optionally, T for vertices where optimal play of both players results in a tie. The procedure is as follows:

The game was given this name by Charles Bouton in his analysis of this type of games from 1901. It refers to the German word "Nimm!", meaning "Take!".

By the definition of the SG-function, the value of the game with n tokens is n.
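The induction just given can be confirmed by brute force, using the mex recursion that defines the SG-value; the function name is ours:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def sg_one_pile(n):
    """SG-value of a single pile from which any positive number of
    tokens may be removed: the least non-negative integer (the mex)
    not attained among the SG-values of the positions 0, ..., n-1."""
    reachable = {sg_one_pile(k) for k in range(n)}
    value = 0
    while value in reachable:
        value += 1
    return value
```

The recursion returns sg_one_pile(n) == n for every n, in accordance with the induction argument above.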
It follows from the theorem of subsection 13.2.16 that the SG-value of the initial position of our game is equal to the XOR of the SG-values of the initial positions of the particular games, namely $9 \oplus 10 \oplus 11 \oplus 14 = 6$. Since this value is non-zero, there exists a winning strategy for the first player: he always moves to a position whose SG-value is zero; such a position must exist by the definition of the SG-function. For instance, the first move would be to remove 6 tokens from the pile containing 14. (We look at the highest one in the binary expansion of the SG-value and find a pile where the corresponding bit is also one. Then, we set this bit to zero, thereby surely decreasing the number of tokens, and adjust the lower bits so that there is an even number of ones in each position, resulting in a zero SG-value.) □

13.H.2. Consider the following game for two players: On the table, there is a pile of tokens. Players alternate moves, where a move consists of either splitting one pile into two (non-empty) piles or removing an arbitrary (positive) number of tokens from a pile. The player who takes the last token wins. Find the SG-value of the initial position of this game if the pile contains n tokens.

Solution. We are going to prove by induction that any non-negative integer k satisfies:
$$g(4k+1) = 4k+1, \quad g(4k+2) = 4k+2, \quad g(4k+3) = 4k+4, \quad g(4k+4) = 4k+3.$$
Clearly, we have g(0) = 0. The following picture shows how we can deduce the value of the SG-function for one-, two-, and three-token piles. However, it is apparent that this would be much harder for a general number of tokens.

(1) The leaves are labeled directly according to the rules of the game (in the case of our Nim, the leaves $S_0$ are labeled by W, and the leaves $F_0$ by L).
(2) Consider the vertex $F_\ell$. Label it W if it has a son who is labeled by W. If there is no such son, but there is a son labeled by T, then $F_\ell$ is given the label T. Otherwise, i.e., if all sons are labeled by L, then $F_\ell$ also gets L.
(3) Similarly, a vertex S_e is labeled L if it has a son labeled by L. Otherwise, if it has a son labeled by T, it receives T. Otherwise (i.e. if it has only W-sons), it is labeled by W.
Calling this procedure on the root of the tree gives the labeling of each vertex as well as an optimal strategy for each player:
• The first player tries to move to a vertex labeled by W; if this cannot be achieved, he moves to a T-vertex at least.
• Similarly, the second player tries to move to a vertex labeled by L; if this cannot be achieved, he moves to a T-vertex at least.
The depth of the recursion is given by the depth of the tree. For instance, for a Nim game with k tokens, the depth is k.
This analysis is not very useful yet. In order to use it in the form just described, the entire game tree must be at our disposal. This can be a great amount of data (for instance, in the case of noughts and crosses on a 3 × 3 playground, the corresponding tree has tens of thousands of vertices). Usually, the analysis with a game tree is used when only a small part of the whole tree is examined, applying appropriate heuristic methods, and the corresponding part is created dynamically during the game. This is a fascinating field of the modern theory of artificial intelligence. The details are omitted here.
There is a more compact representation of the tree structure for our purposes of complete formal analysis. If the game tree for Nim is drawn, then one state of the game is represented by many vertices which correspond to different histories of the game. However, the strategies depend only on the actual state (i.e. the number of tokens and the player to move) rather than on the history of the game.
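Before passing to this more compact representation, the recursive labelling just described can be sketched in code. This is a minimal sketch for our take-one-or-two variation of Nim (no ties occur, so only the labels W and L appear); the function name and the state encoding are mine, not the book's.

```python
# Label a state from the first player's viewpoint: 'W' if the first
# player can force a win, 'L' if the second player can.
def label(tokens, player):
    if tokens == 0:
        # the player to move cannot move and so has lost the game;
        # the first player wins exactly if it is the second player's turn
        return 'W' if player == 2 else 'L'
    sons = [label(tokens - t, 3 - player) for t in (1, 2) if t <= tokens]
    if player == 1:
        return 'W' if 'W' in sons else 'L'   # the first player picks a W-son
    return 'L' if 'L' in sons else 'W'       # the second player picks an L-son

print([label(n, 1) for n in range(1, 8)])    # 'L' appears at 3 and 6 tokens
```

The recursion visits every run of the game, so its cost is governed by the exponentially growing game tree; this is exactly the inefficiency that the state-based representation removes.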
Therefore, the same game can be described by a graph where, for each number of tokens, there is only one vertex, and the whole strategy is determined by identifying who is winning (whether this is the player to move or the other one). Directed edges are used for the description of the possible moves, and the resulting graph is always acyclic.

Now, assume that the above is satisfied for all positive integers below 4k + 1, and let us prove that we indeed have g(4k + 1) = 4k + 1. By the definition, the SG-value is the least non-negative integer ℓ such that there is no move to a position with SG-value ℓ. Moreover, this property (including that the terminal positions have zero value) determines the Sprague-Grundy function uniquely. Therefore, it suffices to prove that, for each ℓ < 4k + 1, we can move to a position with SG-value ℓ, and that we cannot get into a position with SG-value 4k + 1. The former is clear, since by the induction hypothesis, the SG-values of the one-pile games of 0, 1, ..., 4k tokens take all the integers 0, 1, ..., 4k (although not in this order), so we can just remove the corresponding number of tokens from the pile. Now, we show that we cannot reach a position with SG-value 4k + 1. We already know that the only moves that could possibly lead to this SG-value are those splitting the pile into two. If we examine the resulting amounts modulo 4, there are two possibilities: either the number of tokens in one of the resulting piles is divisible by 4 (equal to 4a) and the other one leaves remainder 1 (equal to 4b + 1), or the numbers leave remainders 2 and 3, respectively. In the former case, the SG-values of the resulting piles are, by the induction hypothesis, 4a − 1 and 4b + 1 (the numbers of tokens in the particular piles are non-zero and less than 4k + 1, so we may indeed use the induction hypothesis). In the latter case, i.e., if we split the pile into 4a + 2 and 4b + 3 tokens, we get that their SG-values are 4a + 2 and 4b + 4. Furthermore, a two-pile game is the sum of the two corresponding one-pile games, so the SG-value of the two-pile game is the xor (nim-sum) of the SG-values of the piles. In both cases, the SG-value leaves remainder 2 upon division

The example of the game Nim is displayed in the diagram. On the left, there is a complete game tree corresponding to three tokens. The directed graph on the right represents the game with seven tokens. A complete tree for this game would already have 21 leaves, and the number of leaves grows exponentially with the number of tokens. The individual vertices in the directed acyclic graph on the right-hand side of the diagram indicate the number of tokens left, together with the information whether the game at that state is won by the player who is to move (letter N as "next") or by the other one (letter P as "previous"). Altogether, for a game with k tokens, this graph always has only k + 1 vertices. At the same time, the complete strategy is encoded in the graph: the players always try to move from the current state into a vertex labeled by P, if such a vertex exists.
In fact, every directed acyclic graph can be seen as a description of a game. The initial situations are represented by those vertices which have no incoming edges (there can be one or more of them), and the game ends in the leaves, i.e. the vertices with no outgoing edges (again, there can be one or more of them). The strategy for each player can be obtained by a simple recursive procedure as above (for the sake of simplicity, it is assumed that there is no tie):
• The leaves are labeled by P (the player who is to move from a leaf loses).
• A non-leaf vertex of the graph is labeled by N if there is an edge leading to a P-vertex. Otherwise, it is labeled by P.
In the case of our variation of Nim, the situation is very simple.
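The bottom-up N/P labelling is equally short in code; a sketch for the one-pile take-one-or-two game (the dictionary encoding is mine):

```python
# Label the token counts 0..k by N ("next player, i.e. the player to
# move, wins") or P ("previous player wins").
def np_labels(k):
    lab = {0: 'P'}                       # a leaf: the player to move loses
    for n in range(1, k + 1):
        sons = [n - t for t in (1, 2) if t <= n]
        lab[n] = 'N' if any(lab[m] == 'P' for m in sons) else 'P'
    return lab

print(np_labels(7))
```

In contrast to the traversal of the game tree, this costs only a single pass over the k + 1 vertices of the graph.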
It follows from the strategy just described that the player who is to move loses if and only if the number of tokens is divisible by three.
The games that can be represented by a directed acyclic graph are called impartial. These are exactly the games which satisfy:
• in every state, both players choose from the same set of moves;
• the number of possible states is finite;
• the game has "zero sum", i.e. the better the outcome for one of the players, the worse for the other one.
An example of an impartial game is tic-tac-toe. Although the players use different symbols in this game, they can place them in any of the unoccupied squares. On the other hand, chess is not an impartial game in this sense, since the set of possible moves in every situation depends on the number of pieces the players have at their disposal.
13.2.16. Sum of combinatorial games. The rules of the real classical game Nim are somewhat more complicated: There are three piles of tokens. A move consists of selecting one of the piles and removing an arbitrary (positive) number of tokens from that pile. The player who manages to take the last token wins. There is a variation of the game in which the

by 4 (consider the last two bits). In particular, it is surely not equal to 4k + 1. This proves the induction step for positive integers of the form 4k + 1. The proof for integers of the form 4k + 2 is analogous. The situation is more interesting in the case 4k + 3: Similarly as above, it follows from the induction hypothesis that the SG-values of the one-pile positions we can move to exhaust all the non-negative integers up to 4k + 2. However, note that if we split the pile into two piles containing 1 and 4k + 2 tokens, respectively, then their SG-values are also 1 and 4k + 2 by the induction hypothesis, and the xor of these integers is 4k + 3.
It remains to prove that there is no move into a position with SG-value 4k + 4: Again, the only remaining possibility is to split the existing pile. Then, the resulting remainders modulo 4 are either 0 and 3, or 1 and 2. By the induction hypothesis, the remainders of the corresponding SG-values are respectively 3 and 0 in the former case, and 1 and 2 in the latter. In either case, the xor of these integers (and thus the SG-value of the resulting position) leaves remainder 3 upon division by 4, so it is not equal to 4k + 4. This proves the induction step for positive integers of the form 4k + 3. The proof for integers of the form 4k + 4 is analogous. □
I. Generating functions
13.1.1. In how many ways can we buy 12 packs of coffee if we can choose from 5 kinds? Further, solve this problem with the following modifications: i) we want to buy at least 2 packs of each kind; ii) we want to buy an even number of packs of each kind; iii) there are only 3 packs of one of the kinds.
Solution. The basic problem is a classical example of a combinatorial problem on the number of 5-combinations with repetition; the answer is \binom{12+5-1}{12} = \binom{16}{4}. The modifications can also be solved by combinatorial reasoning with a bit of invention. However, we want to demonstrate how these problems can be solved (almost with no need to think) using generating functions. The wanted number corresponds to the coefficient at x^{12} in the expansion of the function
(1 + x + x^2 + ...)^5 = (1 + x + x^2 + ...) · · · (1 + x + x^2 + ...)

player who is forced to take the last token loses. If this game is considered with one pile, the situation is easy: the first player takes all the tokens and wins immediately. However, it is not that easy with three piles. Whether the analysis of the one-pile game is of any use for this more complicated game is a good question.
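The Sprague-Grundy theory developed next answers this question affirmatively: the classical game is decided by the nim-sum (bitwise xor) of the pile sizes, and a winning move, when one exists, can be read off the binary expansions exactly as in 13.H.1. A sketch of that rule in code (the function name is mine):

```python
# Classical Nim: the player to move loses (under optimal play) iff the
# xor of the pile sizes is zero; otherwise some pile can be reduced so
# that the xor of the new position becomes zero.
from functools import reduce
from operator import xor

def winning_move(piles):
    s = reduce(xor, piles, 0)
    if s == 0:
        return None                  # a losing position: no good move exists
    for i, p in enumerate(piles):
        if p ^ s < p:                # the highest set bit of s occurs in p
            return i, p - (p ^ s)    # remove this many tokens from pile i

print(winning_move([9, 10, 11, 14]))   # (3, 6): take 6 tokens from the 14-pile
```

The condition p ^ s < p holds exactly for the piles whose binary expansion contains the highest one of s, so the returned move decreases the pile and zeroes the nim-sum.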
For this purpose, we introduce a new concept, the sum of impartial games: A situation in the game composed of two simpler games is a pair of possible situations in the particular games. A move then consists of selecting one of the two games and performing a move in that game (the other game is left unchanged).
Therefore, the sum of impartial games is an operation which assigns to a pair of directed acyclic graphs a new one. For graphs G_1 = (V_1, E_1) and G_2 = (V_2, E_2), their sum G_1 + G_2 is the graph G = (V, E), where V = V_1 × V_2 and
E = {(v_1v_2, w_1v_2); (v_1, w_1) ∈ E_1} ∪ {(v_1v_2, v_1w_2); (v_2, w_2) ∈ E_2}.
In the case of one game, the vertices can be labeled step-by-step by the letters N and P in an upwards manner, according to whether one can get to a P-vertex along some of the edges. However, in the sum of games, the movements along the edges combine in a much more complicated way. Therefore, finer tools are needed for expressing the reachability of vertices labeled by P from other vertices. This needs some preparation which might seem like strange magic (but the proof of the theorem below shows that all this is quite natural).
Define the Sprague-Grundy function g : V → N recursively on a directed acyclic graph G = (V, E) as follows:
(1) for a leaf v, set g(v) = 0;
(2) for a non-leaf vertex v ∈ V, define g(v) = mex{g(w); (v, w) is an edge}, where the minimum excluded value function mex is defined on subsets S of the natural numbers N = {0, 1, ...} by mex S = min(N \ S).
The value g(v) is thus the mex of the set S of the values g(w) over those vertices w which can be reached from v along an edge. Note that this definition is correct, since the formula clearly defines a unique function assigning a natural number to each vertex of the acyclic graph in question. Yet another operation on the natural numbers is needed.
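The recursive definition translates directly into code; a small sketch (the encoding of the graph as successor lists and the names are mine):

```python
def mex(s):
    """The least natural number not contained in the set s."""
    n = 0
    while n in s:
        n += 1
    return n

def sg(succ, v, memo=None):
    """Sprague-Grundy value of the vertex v; succ maps a vertex to its sons."""
    if memo is None:
        memo = {}
    if v not in memo:
        memo[v] = mex({sg(succ, w, memo) for w in succ[v]})
    return memo[v]

# the one-pile take-1-or-2 game on 0..7 tokens: g(n) = n mod 3
game = {n: [n - t for t in (1, 2) if t <= n] for n in range(8)}
print([sg(game, n) for n in range(8)])   # [0, 1, 2, 0, 1, 2, 0, 1]
```

Note that g(v) = 0 exactly at the P-vertices 0, 3, 6 found earlier, in accord with the first part of the theorem below.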
It is the binary XOR (exclusive or) operation (a, b) ↦ a ⊕ b.
We are presenting the theory which was developed in combinatorial game theory independently by R. P. Sprague in 1935 and P. M. Grundy in 1939.

into a power series. The number of packs of the first kind determines which term is selected from the first parenthesis, and similarly for the other kinds. (Note that we need not pay special attention to the fact that there cannot be more than 12 packs of a given kind; it turns out that infinite series are usually simpler to work with than finite polynomials.) Since
1/(1 − x) = 1 + x + x^2 + ...
(see 13.4.3), the function we are considering is (1 − x)^{-5}. Our task is thus to expand (1 − x)^{-5} into a power series. By the generalized binomial theorem from 13.4.3, the coefficient at x^k is the number \binom{k+5-1}{k}, which is \binom{16}{4} in our case. Note that using generating functions, we have answered the question not only for 12, but for an arbitrary number of packs of coffee. The modifications can be solved analogously:
i) The generating function is
(x^2 + x^3 + ...)^5 = x^{10}/(1 − x)^5,
hence the coefficient at x^{12} equals \binom{2+5-1}{2} = \binom{6}{2}.
ii) An even number of packs of each kind corresponds to the generating function
(1 + x^2 + x^4 + ...)^5 = 1/(1 − x^2)^5.
The coefficient at x^{12} can be found by many means; the easiest one seems to be the substitution y = x^2 and looking for the coefficient at y^6 (which can be perceived as joining the packs into pairs in the shop). This leads to the answer \binom{6+5-1}{6} = \binom{10}{6}.
iii) In this case, the generating function equals
(1 + x + x^2 + x^3)(1 + x + x^2 + ...)^4 = (1 − x^4)/(1 − x)^5,
and the wanted result is thus \binom{16}{4} − \binom{12}{4}. □
13.1.2. In how many ways can we use the coins of values 1, 2, 5, 10, 20, and 50 crowns to pay exactly 100 crowns?
Solution. We are looking for non-negative integers a_1, a_2, a_5, a_{10}, a_{20}, and a_{50} such that a_i is a multiple of i for all i ∈ {1, 2, 5, 10, 20, 50} and, at the same time,
a_1 + a_2 + a_5 + a_{10} + a_{20} + a_{50} = 100.
We can see that the wanted number of ways can be obtained as the coefficient

This operation performs the exclusive-or bit-wise on the binary expansions of a and b. It can also be considered from the following point of view: take the binary expansions of a and b to be vectors in the vector space (Z_2)^k over Z_2 (for a sufficiently large k), and add them there. The resulting vector is the binary expansion of a ⊕ b. Now the main result can be formulated:
Sprague-Grundy theorem
13.2.17. Theorem. Consider a directed acyclic graph G = (V, E). Its vertices v are labeled by P if and only if g(v) = 0, where g is the Sprague-Grundy function.
For any two directed acyclic graphs G_1 = (V_1, E_1), G_2 = (V_2, E_2) and their Sprague-Grundy functions g_1, g_2, the Sprague-Grundy function g of their sum is given by g(v_1v_2) = g_1(v_1) ⊕ g_2(v_2).
Proof. The first proposition of the theorem follows directly by induction from the definition of the Sprague-Grundy function g. The proof of the second part is more complicated. Let v_1v_2 be a position of the game G_1 + G_2 = (V, E), and consider any a ∈ N_0 such that a < g_1(v_1) ⊕ g_2(v_2). We shall show that there exists a state x_1x_2 of the game G_1 + G_2 such that g_1(x_1) ⊕ g_2(x_2) = a and (v_1v_2, x_1x_2) ∈ E, and, at the same time, that there is no edge (v_1v_2, y_1y_2) ∈ E with g_1(y_1) ⊕ g_2(y_2) = g_1(v_1) ⊕ g_2(v_2). This justifies the recursive definition of the Sprague-Grundy function and proves the rest of the theorem.
To show the first claim, we find a vertex x_1x_2 with the given value a < g_1(v_1) ⊕ g_2(v_2) of the Sprague-Grundy function. Consider the integer b := a ⊕ g_1(v_1) ⊕ g_2(v_2), and refer to the bit of value 2^i as the i-th bit of an integer. Clearly, b ≠ 0. Let k be the position of the highest one in the binary expansion of b, i.e. 2^k ≤ b < 2^{k+1}. This means that the k-th bit of exactly one of the integers a, g_1(v_1) ⊕ g_2(v_2) is one, and that these integers do not differ in the higher bits.
It follows from the assumption a < g_1(v_1) ⊕ g_2(v_2) that it is the integer g_1(v_1) ⊕ g_2(v_2) whose k-th bit is one. Therefore, the k-th bit of exactly one of the integers g_1(v_1), g_2(v_2) is one. Assume without loss of generality that it is the integer g_1(v_1). Further, consider the integer c := g_1(v_1) ⊕ b. Recall that the highest one of b is at position k, so the integers c, g_1(v_1) do not differ in the higher bits, and the k-th bit of c is zero. Therefore, c < g_1(v_1). Then, by the definition of the value g_1(v_1), there must exist a state w_1 of the game G_1 such that (v_1, w_1) ∈ E_1 and g_1(w_1) = c. Now, (v_1v_2, w_1v_2) ∈ E and
g_1(w_1) ⊕ g_2(v_2) = c ⊕ g_2(v_2) = g_1(v_1) ⊕ b ⊕ g_2(v_2) = g_1(v_1) ⊕ a ⊕ g_1(v_1) ⊕ g_2(v_2) ⊕ g_2(v_2) = a.
This fulfills the first part of our plan.

at x^{100} in the product
(1 + x + x^2 + ...)(1 + x^2 + x^4 + ...)(1 + x^5 + x^{10} + ...)(1 + x^{10} + x^{20} + ...)(1 + x^{20} + x^{40} + ...)(1 + x^{50} + x^{100} + ...) =
1/(1 − x) · 1/(1 − x^2) · 1/(1 − x^5) · 1/(1 − x^{10}) · 1/(1 − x^{20}) · 1/(1 − x^{50}).
The result can be obtained using the software SAGE, for instance (the names of the used commands are pretty self-descriptive, aren't they?):
sage: f = 1/(1-x) * 1/(1-x^2) * 1/(1-x^5) \
          * 1/(1-x^10) * 1/(1-x^20) * 1/(1-x^50)
sage: r = taylor(f, x, 0, 100)
sage: r.coeff(x, 100)
4562 □
13.1.3. Expand the following functions into power series:
i) x/(x + 2),
ii) (x^2 + x + 1)/(2x^3 − 3x^2 + 1).
Solution.
i) x/(x + 2) = (x/2) · 1/(1 − (−x/2)) = x/2 − x^2/4 + x^3/8 − ... = \sum_{n=1}^{∞} (−1)^{n+1} x^n/2^n.
ii) We perform partial fraction decomposition:
(x^2 + x + 1)/(2x^3 − 3x^2 + 1) = (x^2 + x + 1)/((x − 1)^2(2x + 1)) = A/(2x + 1) + B/(x − 1) + C/(x − 1)^2,
finding out that A = B = 1/3 and C = 1; hence
(x^2 + x + 1)/(2x^3 − 3x^2 + 1) = (1/3) · 1/(1 + 2x) − (1/3) · 1/(1 − x) + 1/(1 − x)^2 = \sum_{n=0}^{∞} ((1/3)((−2)^n − 1) + (n + 1)) x^n. □
13.1.4. Find the generating functions of the following sequences:
i) (1, 2, 3, 4, 5, ...),
ii) (1, 4, 9, 16, ...),
iii) (1, 1, 2, 2, 4, 4, 8, 8, ...),

Further, consider any edge (v_1v_2, y_1y_2) ∈ E in G, where (v_1, y_1) ∈ E_1,
and hence v_2 = y_2. Suppose that g_1(y_1) ⊕ g_2(y_2) = g_1(v_1) ⊕ g_2(v_2). Then g_1(y_1) ⊕ g_2(v_2) = g_1(v_1) ⊕ g_2(v_2). Clearly, the terms g_2(v_2) can be canceled (it is an operation in a vector space), leading to g_1(y_1) = g_1(v_1). This contradicts the properties of the Sprague-Grundy function g_1 of the game G_1. This proves the second part of the theorem. □
The following useful result is a direct corollary of this theorem:
Corollary. A vertex v_1v_2 in the sum of games is labeled by P if and only if g_1(v_1) = g_2(v_2).
For example, consider three piles of tokens combined in the simplified Nim game (where it is only allowed to take one or two tokens): if all three piles have the same number of tokens, not divisible by three, then the first player always wins. The individual functions g_i(k) for the individual piles equal the remainder after dividing k by 3. It follows that, when summing the first two pile games, the value g(v) = 0 is obtained for the initial position. Summing again with the third pile game gives g(v) ≠ 0. In the original game, the individual piles are described by g(k) = k (any number of tokens can be chosen, hence the function g grows in this way). The losing positions are those where the binary sum of the numbers of tokens is zero. For example, if two of the initial piles are of equal size, then a simple winning strategy is to remove the third one completely and always make the remaining two piles equal again after the opponent's move.
Remark. Further details are omitted in this text. It can be proved that every finite directed acyclic graph is isomorphic to a finite sum of suitably generalized games of Nim. In particular, the analysis of this simple game and the construction of the function g basically (at least implicitly) gives a complete analysis of all impartial games.
3. Remarks on Computational Geometry
A large amount of practical problems consists in constructing or analyzing some finite geometrical objects in Euclidean spaces, mainly in 2D or 3D.
This is a very busy field of both applications and research. At the same time, most of the algorithms and their complexity analysis are based on graph-theoretical and further combinatorial concepts. We provide several glimpses into this beautiful topic. We discuss convex hulls, triangulations, and Voronoi diagrams, and focus on a few basic approaches only.
The beautiful book Computational Geometry, Algorithms and Applications by de Berg, M., Cheong, O., van Kreveld, M., Overmars, M., published by Springer (1997), can be warmly recommended, http://www.springer.com/us/book/9783540779735

iv) (9, 0, 0, 2·16, 0, 0, 4·25, 0, 0, 8·36, ...),
v) (9, 1, −9, 32, 1, −32, 100, 1, −100, ...). ○
13.1.5. In how many ways can we buy n pieces of the following five kinds of fruit if we do not distinguish between particular pieces of a given kind, we need not buy all kinds, and:
• there is no restriction on the number of apples we buy,
• we want to buy an even number of bananas,
• the number of pears we buy must be a multiple of 4,
• we can buy at most 3 oranges, and
• we can buy at most 1 pomelo.
13.3.1. Convex hulls. We start with a simple and practical problem. In the plane R^2, suppose n points X = {v_1, ..., v_n} are given, and the task is to find their convex hull CH(X). As learned in chapter 4, CH(X) is a convex polygon, and it is desired to find it effectively. First, we have to decide how CH(X) should be encoded as a data structure. We choose the connected list of edges: there is the cyclic list of the vertices v_i of the polygon, sorted in the counter-clockwise order, together with pointers towards the oriented segments between the consecutive vertices (the edges). Moreover, there is the list of edges pointing to their tail and head vertices. There is a simple way to get CH(X).
Namely, create the oriented edges e = (v_i, v_j) for all pairs of points in X, and decide whether e belongs to CH(X) by testing whether all the other points of X are on the left of e (in the obvious sense). It is known already from chapter one that this is tested in constant time by means of a determinant. Clearly, e belongs to CH(X) if and only if all the latter tests are positive. In the end, the order in which to sort the edges and vertices in the output is found. This does not look like a good algorithm, since O(n^2) edges need to be tested against O(n) points. Hence, cubic time complexity is expected. But there is a simple and strong improvement available: Consider the lexicographic order on the points v_i with respect to their coordinates. Then build the convex hull consecutively, and run the tests only for the edges having the last added vertex as their tail.

Solution. The generating function for the sequence (a_n), where a_n is the wanted number of ways to buy n pieces of fruit, is
(1 + x + x^2 + ...)(1 + x^2 + x^4 + ...)(1 + x^4 + x^8 + ...)(1 + x + x^2 + x^3)(1 + x) =
1/(1 − x) · 1/(1 − x^2) · 1/(1 − x^4) · (1 − x^4)/(1 − x) · (1 + x) = 1/(1 − x)^3.
By the generalized binomial theorem, we have (1 − x)^{-3} = \sum_{n=0}^{∞} \binom{n+2}{n} x^n. Therefore, the wanted number of ways satisfies a_n = \binom{n+2}{2}. □
13.1.6. Using the generalized binomial theorem, prove again the following combinatorial identities:
• \sum_{k=0}^{n} \binom{n}{k} = 2^n,
• \sum_{k=0}^{n} (−1)^k \binom{n}{k} = 0,
• \sum_{k=0}^{n} k \binom{n}{k} = n 2^{n−1}.

Gift Wrapping Convex Hull Algorithm
Input: A set of points X = {v_1, ..., v_n} in the plane.
Output: The requested edge list for CH(X).
(1) Initialization. Take the smallest vertex v_0 in the lexicographic order with respect to the coordinates, and set v_active = v_0.
(2) Main cycle.
• Test the edges with tail v_active, until an edge e belonging to CH(X) is found.
• Add e to CH(X) and set its head to be the new v_active.
• If v_active ≠ v_0, then repeat the cycle.
Obviously, the left-most and lowest vertex v_0 of X is in CH(X). Since CH(X) is a cycle (as a directed graph), the algorithm works correctly.
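The whole procedure fits in a few lines; a sketch with the orientation test via the determinant (the function names are mine, and collinear points are ignored for simplicity):

```python
def cross(o, a, b):
    """Positive iff the turn o -> a -> b is counter-clockwise (b left of oa)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def gift_wrap(points):
    hull = [min(points)]                 # the lexicographically smallest point
    while True:
        tail = hull[-1]
        cand = next(p for p in points if p != tail)
        for p in points:
            if p != tail and cross(tail, cand, p) < 0:
                cand = p                 # p lies right of tail->cand, so replace
        if cand == hull[0]:
            return hull                  # the cycle has closed
        hull.append(cand)

print(gift_wrap([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

Each hull edge costs one linear scan over the points, which is the source of the O(ns) bound discussed next.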
It is necessary to be careful about possible collinear edges in CH(X), and about the lack of robustness of the test for nearly collinear ones.

Solution. Substituting the numbers x = 1 and x = −1 into the binomial theorem
(1 + x)^n = \binom{n}{0} + \binom{n}{1} x + \binom{n}{2} x^2 + ... + \binom{n}{n} x^n,
we obtain the first and second identities, respectively. The third one can then be obtained by viewing both sides of the binomial theorem "continuously" and using the properties of derivatives. □
13.1.7. In a box, there are 30 red, 40 blue, and 50 white balls. Balls of one color are indistinguishable. In how many ways can we select 70 balls?
Solution. The wanted number is equal to the coefficient at x^{70} in the product
(1 + x + ... + x^{30})(1 + x + ... + x^{40})(1 + x + ... + x^{50}).
This product can be rearranged to (1 − x)^{-3}(1 − x^{31})(1 − x^{41})(1 − x^{51}), whence, using the generalized binomial theorem, we obtain
(\binom{2}{2} + \binom{3}{2} x + \binom{4}{2} x^2 + ...)(1 − x^{31} − x^{41} − x^{51} + x^{72} + ...).
Hence, the coefficient at x^{70} is clearly \binom{72}{2} − \binom{41}{2} − \binom{31}{2} − \binom{21}{2} = 1061. □
13.1.8. Prove that
\sum_{k=1}^{n} H_k = (n + 1)(H_{n+1} − 1).
Solution. The necessary convolution can be obtained as the product of the series 1/(1 − x) and 1/(1 − x) · ln 1/(1 − x), the latter being the generating function of the harmonic numbers. Hence, \sum_{k=1}^{n} H_k is the coefficient at x^n in 1/(1 − x)^2 · ln 1/(1 − x), whence the wanted identity follows easily. □
13.1.9. Solve the recurrence a_0 = a_1 = 1, a_n = a_{n−1} + 2a_{n−2} + (−1)^n.
Solution. As always, it may be a good idea to write out a few terms of the sequence (however, this will not help us much in this case; still, it can serve as a verification of the result).
Step 1: a_n = a_{n−1} + 2a_{n−2} + (−1)^n [n ≥ 0] + [n = 1].
Step 2: A(x) = x A(x) + 2x^2 A(x) + 1/(1 + x) + x.
Despite the statement in Concrete Mathematics, this sequence can already be found in The On-Line Encyclopedia of Integer Sequences.

This simple improvement reduces the worst running time of the algorithm to O(n^2). The worst case occurs if all the points v_i lie on one circle, and unluckily the right next point is always found as the very last one in the partial tests.
But the actual running time is much better, at most O(ns), where s is the size of CH(X). For example, if the distribution of the points in the plane is random with normal distribution (see chapter 10 for what this means), then it is known that the expected size of the hull is logarithmic. At the same time, finding CH(X) for X distributed on a circle is equivalent to sorting the points along the circle. So the worst-case running time cannot be better than O(n log n) for any algorithm (cf. 13.1.17).
13.3.2. The sweep line paradigm. We illustrate several main approaches to computational geometry algorithms on the same convex hull problem. The latter algorithm is close to the idea of having a special object L running through all the objects in the input X consecutively, taking care of constructing the relevant partial output of the algorithm on the run. There is the event structure describing all the events needing consideration, and the sweep line structure carrying all the information needed to deal with the individual events. The procedure is similar to the search over a graph discussed earlier. As a shortcut, this may reduce the dimension of the problem (e.g. from 2D to 1D) at the cost of implementing dynamical structures.
To start, initialize the queue of the events and begin dealing with them. At each step, there are the still sleeping events (those further in the queue, not yet treated), the current active events (those under consideration), and the already processed ones.
With CH(X), this idea can be implemented as follows. Initialize the lexicographically ordered points v_i in X. Notice that the first and the last of them necessarily appear in CH(X). This way, CH(X) splits into two disjoint chains of edges between them. We call them the upper and the lower convex hull. Hence the entire construction can be split into constructing the upper convex hull and the lower convex hull.
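The sweep over the lexicographically sorted points can be sketched as follows (a minimal sketch of the upper convex hull; the helper names are mine):

```python
def turns_right(a, b, c):
    # negative determinant: the path a -> b -> c makes a clockwise (right) turn
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]) < 0

def upper_hull(points):
    hull = []
    for p in sorted(points):             # the event queue: lexicographic order
        while len(hull) >= 2 and not turns_right(hull[-2], hull[-1], p):
            hull.pop()                   # the last vertex cannot stay on the upper hull
        hull.append(p)
    return hull

print(upper_hull([(0, 0), (1, 2), (2, 1), (3, 3), (4, 0)]))
```

Each point is appended once and popped at most once, so the main cycle is linear and the initial sorting dominates the running time.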
Step 3: A(x) = (1 + x + x^2)/((1 − 2x)(1 + x)^2).
Step 4: a_n = (7/9) 2^n + (n/3 + 2/9)(−1)^n. □
13.1.10. Quicksort - analysis of the average case. Our task is to determine the expected number of comparisons made by Quicksort, a well-known algorithm for sorting a (finite) sequence of elements. An example of a simple divide-and-conquer implementation:

def qsort(L):
    if L == []:
        return []
    return (qsort([x for x in L[1:] if x < L[0]]) + [L[0]]
            + qsort([x for x in L[1:] if x >= L[0]]))

It is not too difficult to construct a formula for the number of comparisons (we assume that the particular orders of the sequence to be sorted are distributed uniformly). The following parameters are important for the analysis of the algorithm:
i) The number of comparisons in the divide phase: n − 1.
ii) The uniformness assumption: the probability that L[0] is the k-th greatest element of the sequence is 1/n.
iii) The sizes of the sorted subsequences in the conquer phase: k − 1 and n − k.
We thus get the following recurrent formula for the expected number of comparisons:
C_n = n − 1 + \sum_{k=1}^{n} (1/n)(C_{k−1} + C_{n−k}).
It is possible to solve this recurrence (using certain tricks which can be learned to some extent) even without using generating functions. By the symmetry of the two sums,
C_n = n − 1 + (2/n) \sum_{k=1}^{n} C_{k−1}.
Multiplying by n,
n C_n = n(n − 1) + 2 \sum_{k=1}^{n} C_{k−1},
and, for n − 1 in place of n,
(n − 1) C_{n−1} = (n − 1)(n − 2) + 2 \sum_{k=1}^{n−1} C_{k−1}.

As in the diagram, moving through the events one by one, it is only needed to check whether the edge joining the last active vertex with the current one makes a left or a right turn (as usual, right means the clockwise orientation). If right, then add the edge to the list; if left, omit the recent edges one by one, until the right turn is obtained.

Sweep Line Upper Convex Hull
Input: A set of points X = {v_1, ..., v_n} in the plane.
Output: The directed path UCH(X).
(1) Initialization.
• Set the event structure to be the lexicographically ordered list of points v_first, ..., v_last. There is no special sweep line structure except for the indicator distinguishing the stage of each event.
• Set the active event to be v_active = v_first, and initiate UCH(X) as the trivial path with the single vertex v_first (this is the current last vertex of the path in construction).
(2) Main cycle.
• Set the active event to the next point v in the queue, and consider the potential edge e having v_active as its tail and the last vertex of UCH(X) as its head.
• Check whether UCH(X) is to the left of e (it suffices to check this against the last edge in the current UCH(X)). If so, add e and v_active to UCH(X). If not, remove the edges of UCH(X) one by one, until the test turns positive.
• Repeat the cycle until the next event is v_last.

Subtracting the equation for n − 1 from the one for n, the sums cancel up to the single term 2 C_{n−1}, and we arrive at
n C_n = (n + 1) C_{n−1} + 2(n − 1).
We have thus obtained a much simpler recurrence: n C_n = (n + 1) C_{n−1} + 2(n − 1). On the other hand, this equation contains non-constant coefficients as well.

It is easy to check that the algorithm is correct. Exactly n events are considered, and at each of them, up to O(n) vertices can be removed from the current UCH(X). But in fact, none of the vertices is added again to UCH(X) after its removal. It follows that the asymptotic estimate for the main cycle run is O(n) in total, and it is the ordering in the initialization that dominates, with its O(n log n) time. Clearly, linear O(n) memory is sufficient, and so the optimal solution is achieved again for the convex hull problem.

We can also note that the recurrence has been simplified to the extent that the values C_n can be computed iteratively. Nevertheless, it is advantageous to express these values explicitly as a function of n (or at least to approximate them).
First, we use a slight trick: dividing both sides by n(n + 1) gives
C_n/(n + 1) = C_{n−1}/n + 2(n − 1)/(n(n + 1)).
Now, we "expand" this expression (telescoping; we can also use the substitution B_n = C_n/(n + 1)):
C_n/(n + 1) = 2(n − 1)/(n(n + 1)) + 2(n − 2)/((n − 1)n) + ... + 2·1/(2·3) + C_1/2, where C_1 = 0.
Hence
C_n/(n + 1) = \sum_{k=1}^{n−1} 2k/((k + 1)(k + 2)).
This can be summed using partial fraction decomposition, for instance:
2k/((k + 1)(k + 2)) = 4/(k + 2) − 2/(k + 1),
which leads to
C_n/(n + 1) = 2(H_{n+1} − 2 + 1/(n + 1)),
whence C_n = 2(n + 1) H_{n+1} − 4(n + 1) + 2 (here H_n = \sum_{k=1}^{n} 1/k is the sum of the first n terms of the harmonic progression). At the same time, we have the estimate H_n ≈ ln n + γ, whence C_n ≈ 2(n + 1)(ln(n + 1) + γ − 2) + 2.
13.1.11. Using the generating function F(x) = x/(1 − x − x^2) for the Fibonacci sequence, find the generating function for the "semi-Fibonacci" sequence (F_0, F_2, F_4, ...). ○
13.1.12. The fan of order n is a graph on n + 1 vertices, which are labeled 0, 1, ..., n, with the following edges: vertex 0 is connected to all other vertices, and for each k satisfying 1 ≤ k < n, vertex k is connected to vertex k + 1. How many spanning trees does this graph have?
Solution. Denoting by v_n the wanted number of spanning trees, we clearly have v_1 = 1, and since the fan of order 2 is the triangle graph K_3, we have v_2 = 3. Further, we are going to show that for n ≥ 1, the following recurrence holds:
v_n = v_{n−1} + \sum_{k=0}^{n−1} v_k + 1, v_0 = 0.
Using this recurrence to calculate more values v_n, we find that v_3 = 8, v_4 = 21, which suggests a hypothesis about a connection with the Fibonacci sequence in the form v_n = F_{2n}. This can be proved easily by induction.
13.3.3. The divide and conquer paradigm. Another very standard idea is to divide the entire problem into pieces, apply the same procedure recursively to them, and merge the partial results together. These are the two phases of the divide and conquer approach. This paradigm is common in many areas, cf. 13.1.10.
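As a quick aside, the fan recurrence and the Fibonacci hypothesis of 13.1.12 are easy to check numerically (the script is mine):

```python
# first values of v_n from v_n = v_{n-1} + (v_0 + ... + v_{n-1}) + 1
v = [0]
for n in range(1, 9):
    v.append(v[-1] + sum(v) + 1)

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(v[1:6], [fib(2 * n) for n in range(1, 6)])   # both equal [1, 3, 8, 21, 55]
```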
With convex hulls, adopt the gift wrapping approach for the conquer phase. The idea is to split the task recursively, producing disjoint "left CH(X)" and "right CH(X)", and to merge them by finding the upper and lower "tangent edge" of those two parts.

Divide and Conquer Convex Hull
Input: Set of points $X = \{v_1,\dots,v_n\}$ in the plane, ordered lexicographically.
Output: The directed path CH(X).
(1) Divide. If $n \le 3$, return the CH(X). Otherwise, split $X = X_1 \cup X_2$ into two subsets of (roughly) the same sizes, respecting the order (i.e. all vertices in $X_1$ smaller than those in $X_2$).
(2) Merge.
• Start with the edge joining the largest point in CH($X_1$) and the smallest in CH($X_2$), and iteratively balance it to the lower tangent segment $e_l$ to CH($X_1$) and CH($X_2$).
• Proceed similarly to get the upper tangent $e_u$.
• Merge the relevant parts of the CH($X_1$) and CH($X_2$) with the help of $e_l$ and $e_u$.

Perhaps the merge step requires some more explanation. The situation is illustrated in the diagram. For the upper tangent, first fix the right-hand vertex of the initial edge joining the two convex polygons. Then find the tangent to the left polygon from this vertex. Then fix the head of the moving edge and find the right-hand touch point of the potential tangent. After a finite number of exchanges like this, the edge stabilizes. This is the upper tangent edge $e_u$. Observe that during the balancing, we move only clockwise on the right-hand polygon and counter-clockwise on the other one. Notice also that it is the smart divide strategy which prevents any of the points of the input $X_1$ from appearing inside of the CH($X_2$), and vice versa.

Again the analysis of the algorithm is easy. The typical merge time is asymptotically linear, and then the recursive call

For a fixed spanning tree of the fan of order $n$, let $k$ be the greatest integer of the set $\{1,\dots,n-1\}$ such that the spanning tree contains all edges of the path $(0,1,2,3,\dots,k)$.
This spanning tree cannot contain the edges $\{0,2\},\dots,\{0,k\},\{k,k+1\}$; therefore, there are the same number of spanning trees for a fixed $k$ as in the fan of order $n-k$ with vertices $0, k+1, k+2, \dots, n$, i.e. $v_{n-k}$. Further, we must count one spanning tree for $k = n$ and those spanning trees which do not contain the edge $\{0,1\}$ (thus they must contain the edge $\{1,2\}$) - they are obtained from fans of order $n-1$ on vertices $0, 2, \dots, n$. We have thus obtained the wanted recurrence
$$v_n = v_{n-1} + v_{n-1} + v_{n-2} + \cdots + v_0 + 1.$$
Now, we have the general formula
$$v_n = v_{n-1} + \sum_{k=0}^{n-1} v_k + 1 - [n=0],$$
whence the usual procedure for finding the generating function $V(x)$ of this sequence yields
$$V(x) = xV(x) + \sum_{n\ge 0}\Bigl(\sum_{k<n} v_k\Bigr)x^n + \frac{1}{1-x} - 1 = xV(x) + \sum_{k\ge 0} v_k \sum_{n>k} x^n + \frac{x}{1-x} = xV(x) + \frac{x}{1-x}\,V(x) + \frac{x}{1-x}.$$

yields $O(\log n)$ runs of the procedures. The total estimated time is again $O(n\log n)$. Notice, there is no initialization in the procedure itself; just assume that the points in $X$ are already ordered. Hence another $O(n\log n)$ time must be added to prepare for the very first call. The memory necessary for running the algorithm is estimated by $O(n)$ if the recursions are implemented properly.

The solution of the equation
$$V(x) = xV(x) + \frac{x}{1-x}\,V(x) + \frac{x}{1-x}$$
is
$$V(x) = \frac{x}{1-3x+x^2},$$
whence using the standard method (partial fraction decomposition) or the previous problem leads to the result $v_n = F_{2n}$. □

Recursively connected sequences. Sometimes, we are able to express the wanted number of ways or events only in terms of several mutually connected sequences.

13.1.13. In how many ways can we cover a $3\times n$ rectangle with $1\times 2$ domino pieces? Evaluate this value for $n = 20$.

Solution. We can easily find out that $c_1 = 0$, $c_2 = 3$, $c_3 = 0$, and it is reasonable to set $c_0 = 1$ (this is not merely a convention; there is indeed a unique empty covering).

13.3.4. The incremental paradigm.
This approach consists in taking the input objects one by one and consecutively building the required resulting structure. This is particularly useful if the application does not allow having all the points available in the beginning. Imagine incrementally building the convex hull of shots into a target, as they occur. Another good use is in randomized algorithms, where all the data is known in the beginning, but treated in a random order. Typically the expected running time is then very good, while there might be much less effective, but improbable, worst time runs.

The former case is easy to illustrate on the convex hull problem. In each step, employ the merge step of the very degenerate version of the divide and conquer algorithm, merging the hull CH($X_1$) of the previously known points $X_1$ with $X_2 = \mathrm{CH}(X_2) = \{v_k\}$, which is just the new point. But an extra step is needed to check whether $v_k$ is inside of CH($X_1$) or not. If it is, then skip the new point and wait for the next one. If not, then merge. The worst time of this algorithm is $O(n^2)$, but as with the gift wrapping method, it depends on the actual size of the output as well as on the quality of the algorithm checking whether $v_k$ is inside of the CH($X_1$).

We illustrate the second case with a more elaborate convex hull algorithm. The main idea is to keep track of the position of all points in $X$ with respect to the convex hull CH($X_k$) of the first $k$ of them (in the fixed order chosen at the beginning). With this goal in mind, keep the dynamical structure of a bipartite graph $G$ whose first group of vertices consists of those points which have not been processed yet, while the other group contains all the faces of the current convex polygon $S = \mathrm{CH}(X_k)$ (call them faces, not to be confused with the edges of the graph $G$). Remember the faces in $S$ are oriented. Such a face $e$ is in conflict with the point $v$ if the face is "visible" from $v$, i.e. $v$ is in the right-hand halfplane determined by $e$.
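The conflict test is a plain orientation check. In the sketch below (our names, assuming points represented as coordinate pairs), the oriented face from `tail` to `head` is in conflict with `v` exactly when `v` lies strictly in its right half-plane:

```python
def in_conflict(tail, head, v):
    # z-component of (head - tail) x (v - tail); a negative value means v
    # is strictly to the right of the directed edge tail -> head,
    # i.e. the face is "visible" from v
    cross = ((head[0] - tail[0]) * (v[1] - tail[1])
             - (head[1] - tail[1]) * (v[0] - tail[0]))
    return cross < 0
```

Collinear points give a zero cross product and are reported as not in conflict, matching the strict half-plane condition.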
Keep all points joined to each of their faces in conflict in the bipartite graph. Call $G$ the graph of conflicts. The algorithm can now be formulated:

We are looking for a recursive formula. Discussing the behavior "on the edge", we find out that
$$c_n = 2r_{n-1} + c_{n-2}, \qquad r_n = c_{n-1} + r_{n-2}, \qquad r_0 = 0,\ r_1 = 1,$$
where $r_n$ is the number of coverings of the $3\times n$ rectangle without one of the corner tiles. The values of $c_n$ and $r_n$ for the first few non-negative integers $n$ are:

n:    0  1  2  3  4  5  6  7
c_n:  1  0  3  0 11  0 41  0
r_n:  0  1  0  4  0 15  0 56

• Step 1: $c_n = 2r_{n-1} + c_{n-2} + [n=0]$, $r_n = c_{n-1} + r_{n-2}$.
• Step 2: $C(x) = 2xR(x) + x^2C(x) + 1$, $R(x) = xC(x) + x^2R(x)$.
• Step 3: $C(x) = \dfrac{1-x^2}{1-4x^2+x^4}$, $R(x) = \dfrac{x}{1-4x^2+x^4}$.
• Step 4: We can see that both are functions of $x^2$. We can thus save much work if we consider the function $D(z) = 1/(1-4z+z^2)$. Then, we have $C(x) = (1-x^2)D(x^2)$, i.e., $[x^{2n}]C(x) = [x^{2n}](1-x^2)D(x^2) = [x^n](1-x)D(x)$, so $c_{2n} = d_n - d_{n-1}$. The roots of $1-4x+x^2$ are $2+\sqrt3$ and $2-\sqrt3$, whence the standard procedure yields
$$c_{2n} = \frac{(2+\sqrt3)^n}{3-\sqrt3} + \frac{(2-\sqrt3)^n}{3+\sqrt3}.$$
Just like with the Fibonacci sequence, the second term is negligible for large values of $n$ and is always between 0 and 1. Therefore,
$$c_{2n} = \left\lceil \frac{(2+\sqrt3)^n}{3-\sqrt3} \right\rceil.$$
For instance, $c_{20} = 413403$. □

13.1.14. Using generating functions, find the number of ones in a random bit string.

Solution. Let $B$ be the set of bit strings, and for $b \in B$, let $|b|$ denote the length of $b$ and $j(b)$ the number of ones in it. The generating function is of the form
$$B(x) = \sum_{b\in B} x^{|b|} = \sum_{n\ge 0} 2^n x^n = \frac{1}{1-2x}.$$
The generating function for the number of ones is
$$C(x) = \sum_{b\in B} j(b)\,x^{|b|}.$$

Randomized Incremental Convex Hull Algorithm
Input: A set $X = \{v_1,\dots,v_n\}$, $n > 3$, of at least four points in the plane.
Output: The edge list $R$ of the convex hull CH(X).
(1) Initialization. Fix a random order on $X$. Choose the first three points as $X_0$, create the list of conflicts for the edge list $R = \mathrm{CH}(X_0)$ (i.e.
state which of the three faces are seen from which points) and remove the three points from $X$.
(2) Main cycle. Repeat until the list $X$ is empty:
• choose the first point $v \in X$;
• if there are some conflicts of $v$ in $G$, then
- remove all the faces in conflict with $v$ from both $R$ and $G$,
- find the two new faces (the upper and lower tangents from the new point $v$ to the existing CH(X) - they are easily found by taking care of the "vertices without missing edges"),
- add the two new faces to both $R$ and $G$ and find all their conflicts;
• remove the point $v$ from the list $X$, and from the graph $G$.

The complete analysis of this algorithm is omitted. Notice that finding the newly emerging conflicts is easy, since it is only necessary to check the potential conflicts of the points which were in conflict with the faces incident, before the update, to those two vertices in $G$ to which the two new faces are attached.

It can be proved that the expected time for this algorithm is $O(n\log n)$, while the worst time is $O(n^2)$. The complete framework for the analysis of randomized algorithms is nicely explained in the book mentioned in the very beginning of this part, see page 972.

13.3.5. Convex hull in 3D. Many applications need convex hulls of finite sets of points in higher dimensions, in particular in 3D. There are several ways of adjusting 2D algorithms to their 3D versions. First, it needs to be stated what the right structure is for the CH(X). As seen in 13.1.21, the convex polyhedra in $\mathbb{R}^3$ can be perfectly described by planar graphs. In order to modify the algorithms into 3D, some good encoding for them is needed. We want to find all vertices with edges or faces which are incident or neighbouring in time proportional to the output.

A string $b$ can be obtained from the one bit shorter string $b'$ by adding either a zero or a one; summing over these two extensions of $b'$, they contribute $j(b')$ and $j(b') + 1$ ones, respectively.
Therefore,
$$C(x) = \sum_{b'\in B}\bigl(1 + 2j(b')\bigr)x^{|b'|+1} = \sum_{b'\in B} x^{|b'|+1} + 2\sum_{b'\in B} j(b')\,x^{|b'|+1} = xB(x) + 2xC(x).$$
Hence
$$C(x) = \frac{xB(x)}{1-2x} = \frac{x}{(1-2x)^2},$$
and the $n$-th coefficient is $c_n = [x^{n-1}](1-2x)^{-2} = n\,2^{n-1}$. This number gives the total number of ones over all strings of length $n$, and there are $b_n = 2^n$ such strings. Therefore, the expected number of ones in such a string is $\frac{n2^{n-1}}{2^n} = \frac n2$, which is, of course, what we have anticipated. □

13.1.15. Find the generating function and an explicit formula for the $n$-th term of the sequence $\{a_n\}$ defined by the recurrent formula
$$a_0 = 1,\ a_1 = 2, \qquad a_n = 4a_{n-1} - 3a_{n-2} + 1 \text{ for } n \ge 2.$$

Solution. The universal formula which holds for all $n \in \mathbb{Z}$ is
$$a_n = 4a_{n-1} - 3a_{n-2} + 1 - 3[n=1].$$
Multiplying by $x^n$ and summing over all $n$, we get an equation for the generating function $A(x)$, whence we can express
$$A(x) = \frac{3x^2 - 3x + 1}{(1-x)^2(1-3x)}.$$

This is nicely achieved by the double connected edge lists.

Double connected edge list - DCEL
Let $G = (V, E, S)$ be a planar graph. The double connected edge list is the list $E$ such that each edge $e$ is equipped with the pointers
• V1, V2 to the tail and head of $e$,
• F1, F2 to the two incident faces (the left one with respect to the directed edge $e$ first),
• P1 and P2 to the edges following along the face F1 and along the face F2, respectively (in the counter-clockwise directions).
At the same time, keep the list of vertices and the list of faces, always with just one pointer towards some of the incident edges.

Next, look at the 2D incremental algorithm above and try to imagine what needs changing there to get it to work in 3D. First, we have to deal with the 2D faces $S$ of the DCEL of the convex hull, and instead of their boundary vertices, deal with boundary edges. Again, all the faces in conflict with the just processed point have to be removed (see the picture). This leads to a directed cycle of edges with one of the pointers $F_i$ missing (call them "unsaturated edges").
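The count $c_n = n2^{n-1}$ from 13.1.14 can be confirmed by brute force over all bit strings of small lengths; a quick check (ours, not from the book):

```python
def total_ones(n):
    # sum of the numbers of 1-bits over all 2**n bit strings of length n
    return sum(bin(b).count("1") for b in range(1 << n))
```

Dividing by $2^n$ reproduces the expected value $n/2$ stated above.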
Finally, instead of adding two new faces as in 2D, the tangent cone of faces joining the point $v$ with the unsaturated edges must be added. Of course, the graph of conflicts must also be updated.

Therefore, the coefficient at $x^n$ is
$$a_n = \frac34 - \frac{n+1}{2} + \frac34\cdot 3^n.$$

13.1.16. Solve the following recurrence using generating functions:
$$a_0 = 1,\ a_1 = 2, \qquad a_n = 5a_{n-1} - 4a_{n-2},\ n \ge 2.$$

Solution. The universal formula is of the form
$$a_n = 5a_{n-1} - 4a_{n-2} - 3[n=1] + [n=0].$$
Multiplying both sides by $x^n$ and summing over all $n$, we obtain
$$A(x) = 5xA(x) - 4x^2A(x) - 3x + 1.$$

Randomized Incremental 3D Convex Hull Algorithm
Input: A set $X = \{v_1,\dots,v_n\}$, $n > 4$, of at least five points in the space $\mathbb{R}^3$.
Output: The DCEL $R$ for the convex hull CH(X).
(1) Initialization. Fix a random order on $X$. Choose the first four points as $X_0$, create the list of conflicts $G$ and the DCEL $R$ for CH($X_0$) (i.e. tell which of the four faces are seen from which points) and remove the four points from $X$.

Hence
$$A(x) = \frac{1-3x}{(1-4x)(1-x)} = \frac23\cdot\frac{1}{1-x} + \frac13\cdot\frac{1}{1-4x},$$
and
$$a_n = \frac23 + \frac13\,4^n = \frac{4^n+2}{3}. \quad □$$

13.1.17. A cash dispenser can provide us with banknotes of values 200, 500, and 1,000 crowns. In how many ways can we pick 7,000 crowns? Use generating functions to find the solution.

Solution. The problem can be reformulated as looking for the number of integer solutions of the equation
$$2a + 5b + 10c = 70; \qquad a, b, c \ge 0.$$
This number is equal to the coefficient at $x^{70}$ in the function
$$G(x) = (1+x^2+x^4+\cdots)(1+x^5+x^{10}+\cdots)(1+x^{10}+x^{20}+\cdots) = \frac{1}{1-x^2}\cdot\frac{1}{1-x^5}\cdot\frac{1}{1-x^{10}},$$
and since
$$\frac{1-x^{10}}{1-x^5} = 1+x^5 \quad\text{and}\quad \frac{1-x^{10}}{1-x^2} = 1+x^2+x^4+x^6+x^8,$$
we can transform it into the form
$$G(x) = \frac{(1+x^2+x^4+x^6+x^8)(1+x^5)}{(1-x^{10})^3}.$$
By the binomial theorem, we have
$$\frac{1}{(1-x^{10})^3} = \sum_{k=0}^{\infty}\binom{k+2}{2}x^{10k}.$$
Therefore, $G(x)$ equals
$$\bigl(1+x^2+x^4+x^5+x^6+x^7+x^8+x^9+x^{11}+x^{13}\bigr)\sum_{k=0}^{\infty}\binom{k+2}{2}x^{10k}.$$
The term $x^{70}$ can be obtained only as $7\cdot 10 + 0$, i.
e., the coefficient at $x^{70}$ is equal to $[x^{70}]G(x) = \binom{7+2}{2} = \binom92 = 36$. □

(2) Main cycle. Until the list $X$ is empty, repeat:
• take the first point $v \in X$;
• if there are some conflicts of $v$ in $G$, then
- remove all the faces of $R$ in conflict with $v$ (from both $R$ and $G$), and take care of the edges $e$ in $R$ left without one of the incident faces,
- build the "tangent cone" from the new point $v$ to the current $R$ by connecting $v$ to the latter "unsaturated" edges,
- add the new faces to both $R$ and $G$ and find all their conflicts (again, note that the check for new conflicts can be restricted to the points which were in conflict with the faces incident to those edges where the cone has been attached to the previous $R$);
• remove the point $v$ from the list $X$ and from the graph $G$.

A detailed analysis is omitted. As with the 2D case, the expected running time for this algorithm is $O(n\log n)$. With the very well adapted DCEL data structure for the convex hull, it is a very good algorithm.

The divide and conquer algorithm from the 2D case can be easily adapted, too. Skipping details, the initial lexicographic ordering of the input points allows us to recursively call the same procedure, producing two DCELs of disjoint convex polytopes. This allows us to apply a more sophisticated "gift wrapping" approach when merging the results. A sort of "tubular collar" wrapping of the two polytopes to create their convex hull is desired. Imagine rotating the hyperplanes, similarly as with the lines in 2D, in order to get the first edge in the tubular set of faces to be added. Then the first plane containing one of the missing new faces is obtained. Continue breaking the plane along the new edges, until both directed cycles, along which the collar is attached to the previous two polytopes, are closed. All that is done by bending the planes by the smallest possible angle in each step and checking to arrive at the right position.
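The answer 36 of 13.1.17 is small enough to confirm by direct enumeration of the solutions of $2a + 5b + 10c = 70$; a quick sketch (names ours):

```python
def withdrawals(total=70):
    # number of solutions of 2a + 5b + 10c = total with a, b, c >= 0;
    # a is determined whenever the remainder after b and c is even
    return sum(1
               for c in range(total // 10 + 1)
               for b in range((total - 10 * c) // 5 + 1)
               if (total - 10 * c - 5 * b) % 2 == 0)
```

This brute force agrees with the coefficient extracted from the generating function.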
Of course, the DCEL structure is essential to update all the data properly in time proportional to the size of the changes. With a reasonably smart implementation, this algorithm achieves the optimal $O(n\log n)$ running time. Both of the latter algorithms can be generalized to all higher dimensions, too.

13.1.18. Find the probability of getting exactly $k$ heads after $n$ tosses of a coin.

Solution. Represent the outcome H as $x$ and T as $y$. All possible outcomes of $n$ tosses are represented by the expansion of $f(x) = (x+y)^n$. The coefficient at $x^ky^{n-k}$ is the number of outcomes with exactly $k$ heads. The required probability is, therefore, $\binom{n}{k}/2^n$. □

13.3.6. Voronoi diagrams. The next stop is at one of the most popular and useful planar divisions (and searching in them). For a finite set of points $X = \{v_1,\dots,v_n\}$ in the plane $\mathbb{R}^2$, there is the obvious equivalence relation $\sim$ on $\mathbb{R}^2$: define $v \sim w$ if and only if they share the same uniquely given closest point in $X$. Write VR($v_i$) for the equivalence class corresponding to $v_i$. Define:

13.1.19. Find the probability of getting exactly $k$ heads after $n$ tosses of a coin if the probabilities of a head and a tail equal $p$ and $q$, respectively, $p + q = 1$.

Solution. Represent the outcome H as $px$ and T as $q$. All possible outcomes of $n$ tosses are represented by the expansion of $f(x) = (px + q)^n$. The coefficient at $x^k$ is the probability of the outcomes with exactly $k$ heads, namely $\binom{n}{k}p^kq^{n-k}$. □

13.1.20. There are coins $C_1,\dots,C_n$. For each $k$, the coin $C_k$ is biased so that, when tossed, the probability that it falls heads is $\frac{1}{2k+1}$. If the $n$ coins are tossed, what is the probability of an odd number of heads?

Solution. Let $p_k = \frac{1}{2k+1}$ and $q_k = 1 - p_k$. The generating function can be written as
$$f(x) = \prod_{i=1}^{n}(q_i + p_ix) = \sum_{m=0}^{n} a_mx^m,$$
with $a_m$ the probability of getting exactly $m$ heads.
Observe that $f(1) = \sum_m a_m = \sum_k a_{2k} + \sum_k a_{2k+1}$ and $f(-1) = \sum_k a_{2k} - \sum_k a_{2k+1}$.

VORONOI DIAGRAM
For a given set of points $X = \{v_1,\dots,v_n\}$ (not all collinear), the Voronoi regions are
$$\mathrm{VR}(v_i) = \{x \in \mathbb{R}^2;\ \|x - v_i\| < \|x - v_k\| \text{ for all } v_k \in X,\ k \ne i\}.$$
This is an intersection of $n-1$ open half-planes bounded by lines, so it is an open convex set. Its boundary is a convex polygon. The Voronoi diagram VD(X) is the planar graph whose faces are the open regions VR($v_i$), while the boundaries of the VR($v_i$) yield the edges and vertices.

Hence,
$$\sum_k a_{2k+1} = \tfrac12\bigl(f(1) - f(-1)\bigr) = \tfrac12\bigl(1 - f(-1)\bigr),$$
as $f(1) = 1$ and
$$f(-1) = \prod_{k=1}^{n}(q_k - p_k) = \frac13\cdot\frac35\cdot\frac57\cdots\frac{2n-1}{2n+1} = \frac{1}{2n+1}.$$
Therefore, the required probability equals $\frac12\bigl(1 - \frac{1}{2n+1}\bigr) = \frac{n}{2n+1}$. □

13.1.21. Let $n$ be a positive integer. Find the number of polynomials $P(x)$ with coefficients from the set $\{0,1,2,3\}$ such that $P(2) = n$.

Solution. Let $P(x) = a_0 + a_1x + a_2x^2 + \cdots + a_kx^k + \cdots$ be such a polynomial. Then its coefficients should satisfy the equation
$$a_0 + 2a_1 + 4a_2 + \cdots + 2^ka_k + \cdots = n$$
with integer coefficients $0 \le a_k \le 3$. The number of such solutions is given by the coefficient at $x^n$ in
$$f(x) = (1+x+x^2+x^3)(1+x^2+x^4+x^6)(1+x^4+x^8+x^{12})\cdots = \prod_{k=0}^{\infty}\bigl(1+x^{2^k}+x^{2\cdot 2^k}+x^{3\cdot 2^k}\bigr) = \prod_{k=0}^{\infty}\frac{1-x^{2^{k+2}}}{1-x^{2^k}} = \frac{1}{(1-x)(1-x^2)},$$
with partial fraction decomposition
$$\frac{1}{(1-x)(1-x^2)} = \frac{1}{4(1-x)} + \frac{1}{2(1-x)^2} + \frac{1}{4(1+x)}.$$

Care is needed about collinearity, since if all the points $v_i$ are on the same line in $\mathbb{R}^2$, then their Voronoi regions are strips in the plane bounded by parallel lines. Under all other circumstances, the planar graph VD(X) from the latter definition is well defined and connected. By definition, the vertices $p$ of VD(X) are the points in the plane such that at least three points $v, w, u \in X$ are at the same distance from $p$ and no more points of $X$ are inside the circle through $v, w, u$. If there are no more points of $X$ on the latter circle, then the degree of this vertex $p$ is 3. The most degenerate situation occurs if all the points of $X$ are on one circle. Then, obviously, the Voronoi regions are delimited by two half-lines, all emanating from the center of the circle and cutting the angles properly.
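The closed answer $n/(2n+1)$ of 13.1.20 can be replayed by multiplying out $f(x) = \prod_k(q_k + p_kx)$ in exact rational arithmetic; a sketch (names ours, not the book's):

```python
from fractions import Fraction

def odd_heads_probability(n):
    # coeffs[m] = probability of exactly m heads after tossing C_1, ..., C_n
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        p = Fraction(1, 2 * k + 1)          # P(C_k falls heads)
        new = [Fraction(0)] * (len(coeffs) + 1)
        for m, a in enumerate(coeffs):
            new[m] += (1 - p) * a           # tails: count of heads unchanged
            new[m + 1] += p * a             # heads: count of heads grows by 1
        coeffs = new
    return sum(coeffs[1::2])                # odd-index coefficients
```

Summing the odd coefficients is exactly the $\tfrac12(f(1) - f(-1))$ trick in disguise.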
The construction of the VD(X) is then equivalent to the ordering of the points by angles. At least $O(n\log n)$ time is needed for the worst case estimate in any algorithm building the Voronoi diagrams.

Some of the Voronoi regions are unbounded, others bounded. If just two points $v$ and $w$ are considered, then the axis of the segment $vw$ is the boundary for the regions VR($v$) and VR($w$). In particular, the region VR($v$) must be bounded for each $v$ in the interior of the convex hull of $X$. On the contrary, consider an edge in the CH(X) with incident vertices $v$ and $w$, and the "outer" half-axis of the segment $vw$. If one considers any interior point $u$ in the CH(X), then it is in the other halfplane with respect to the segment $vw$. Sooner or later, the points on the latter half-axis are closer to $v$ and $w$ than to $u$. It follows that both VR($v$) and VR($w$) are unbounded. Summarizing:

Lemma. Each Voronoi region VR($v$) of the Voronoi diagram VD(X) is an open convex polygonal region. It is unbounded if and only if $v$ belongs to the convex hull of $X$.

13.3.7. An incremental algorithm. Each Voronoi diagram represents a planar division, and when adding a new point $p$ as a vertex, it is quite obvious how to update the diagram. First assume we know which VR($v$) the point $p$ hits. Then choose the center $v$ and split the region VR($v$) by the relevant part of the axis of the segment $pv$. Add this new edge $e$ into the updated VD(X), simultaneously creating the two new faces and removing the old one. The new edge $e$ hits the boundary of the current
Let f(x) = J2anXn. We are looking for the sum A5 = >j =exp(2f),j = 0,...,4bethe a0 + a5 + a10 +. Leto; 5-th roots of unity with o;0 1 :exp( 5 1. Then (/(wo) + /M + /(o,2) + /(o,3) + /(o,4)). VR(y) in either two points or one point (if the new edge is unbounded). These hits show what is the next region of the updated diagram to be split. "Walk" further with the new hit at the boundary revealing the next center of region playing the role of v above. Ultimately this walk consecutively splits the visited old regions and creates the new directed cycle of edges bounding the new region, or it has an unbounded path of boundary edges, if the new region is unbounded. See the diagram for an illustration. Obviuosly f(u0) = /(l) = 22017. For 1 < j < 4 fiuj) = ((1+a, Xl+^-Xl+a; )(l+o, )(l+o, ^(l+o^l+o, ) = 2 Therefore, ^r(o,J) = 2403(4-l-l-l) = 2' 403 J = l Therefore, the answer is | (22017 + 2403), which is an integer as 2 is the last digit in 22017 and 8 is the last digit in 2403. □ 13.1.23. tion Using generating functions, solve recursive equa- ak+1 = 2ak + Ak, ax = 3. Solution. From a1 = 2a0 + 4° follows a0 = 1. Multiplyu-ing both sides of the recursion by xk by summation on k we obtain E fc=0 ak+1x kxk. k=0 k=0 Therefore, for generating function j(x) = akXk we have fc=0 ^ oo ak+1x k+l 2f(x) + k=0 k=0 that is -(f(x)-\) = 2f{x) + T Therefore, i + Ax + l-2a; (1 - 2x)(l - Ax) 1-22 1-42 If the new point p is on the boundary, i.e. hitting one of the edges or vertices in VD(X), then the same algorithm works. Just start with one of the incident regions. So far this looks easy, but how does one find the relevant region hit by the new point? An efficient structure to search for it on the run of the algorithm is desired. Build an acyclic directed graph G for that purpose. The vertices in G are all the temporary Voronoi regions as they were created in the individual incremental steps. Whenever a region is split by the above procedure, new leafs in G are created. 
Draw edges towards these leafs from all the earlier regions which have some nontrivial overlap. Of course, care must be taken how the old regions are overlap with the new ones, but this is not difficult. We illustrate the procedure on the diagram, updating from one point to four points. 981 CHAPTER 13. COMBINATORIAL METHODS, GRAPHS, AND ALGORITHMS Expanding f(x), we obtain: 1 oo 1 oo k=0 k=0 and so, ak = \2k + §4fe. □ 13.1.24. In how many ways can n balls be distributed in 4 boxes if the first box has at least two balls. Solution. The generating function for the first box is bi(x) = x2 + x3 + x4 + ... = The generating function for each other box is b(x) = 1 + x + x2 + x3 + ... = For all 4 boxes it has to be f(x) = bi(x)(b(x)) 3 _ _£l (l-x rz. The coefficient at xn inj(x) that indicates the number of options to distribute n balls, equals -2 + 4-1,4-1) l)+i'3)- □ 13.1.25. How many sequences of {0,1,2,3} of length n have at least 2 zeroes? Solution. Exponential generating function for 0- entry is bo(x) = ^ + ^ + ... = ex-l-x . Exponential generating function for other entries is ex as there are no restrictions. Hence, the exponential generating function in question is j(x) = e4x — e3x —xe3x. Theanswer to the question is n\ times the coefficient at xn in j(x), which is nK—r —r nl nl on—1 n—— =4n-3n-n3n-1. □ Incremental Voronoi Diagram Input: The set of points X = {vi,..., vn} in the plane, not all collinear. Output: The DCEL of VD(X) and the search graph G. (1) Initialization. Consider the first two points X0 = {vi,v2J and create the DCEL for VD(Xo) with two regions. Create the acyclic directed graph G (just root and two leaves). (2) Main cycle. Repeat until there are no new points z e X: • localize the VR(y) hit by z (by the search in G) • perform the path walk finding the boundary of the new region VR(z) in VD(X) • update the DCEL for VD (X) and the acyclic directed search graph G. This algorithm is easy to implement. 
It produces directly a search structure for finding the proper Voronoi regions of given points. Unfortunately, it is very far from optimal in both aspects - the worst running time is 0{n2), and the worst depth of the search acyclic graph is 0{n). If this is treated as a randomized incremental algorithm, the expected values is better, but not optimal either. Below is a useful modification via triangulations. 13.3.8. Delaunay triangulation. One remarkable feature of the Voronoi diagrams should not remain unnoticed. Right after the definition of the Voronoi diagram, an important fact was mentioned. The vertices of the planar graph VD(X) are centers of circles containing at least three points of X, and no other points of X are inside of the circle. If the dual graph to VD(X) (see 13.1.22 for the definition) is considered, then this is again a tessellation of the plane into convex regions. It is called the Delaunay tessellation DT(X). In the generic case, the degrees of all vertices in VD(X) are 3 (i.e. no four points of X lie on one circle). This is the Delaunay triangulation.13 Notice that it easy to turn any Delaunay tesselation into a triangulation by adding the necessary edges to triangulate the convex regions with 13Although the name sounds French, Boris Nikolaevich Delone (1890 - 1990) was a Russian mathematician using the French transcription of his name in his publications. His name is associated with the triangulation because of his important work on this from 1934. 982 CHAPTER 13. COMBINATORIAL METHODS, GRAPHS, AND ALGORITHMS more edges. Any of these refined tesselations is called the Dalaynay triangulation associated to the VD(X). In general, a planar graph T is called a triangulation of its vertices X c R2, \X\ = n, if all its bounded faces have just 3 vertices. It is easy to see that each triangulation T has r = 2ti—2—k triangles and v = 3n—3—k edges, where k is the number of vertices in the CH(X). 
By the Euler formula (13.1.20) n—u+t+1 = 2 (there is an unbounded face on top of all the triangles). Now, every triangle has 3 edges, while there are k edges around the unbounded face. It follows that 3r + k = 2v. It remains to solve the two linear equations for r and v. The triangulations are extremely useful in numerical mathematics and in computer graphics as the typical background mesh for processing of approximate values of functions. Of course, there are many triangulations on a given set and one of the qualitative requests is to aim at triangles as close to the equilateral triangles as possible. This could be phrased as the goal to maximize the minimal angles inside the triangles. A practical way to do this is to write the angle vector of the triangulation ■A(T) = (7l,72,---,73r), where 7, are the angles of all the triangles in T sorted by their value, 7j < jk for all j < k. A triangulation T on X is said to be angle optimal, if A(T) > A(Tr) for all triangulations V on the same set of vertices X, in the lexicographic ordering. In particular, an angle optimal triangulation achieves the maximum over the minimal angles of the triangles. Surprisingly, there is a very simple (though not very effective) procedure to produce (one of) the angle optimal triangulations. Consider any two adjacent triangles and check the six angle sequences of their interior angles. If the current position of the diagonal edge provides the worse sequence, flip it. See the diagram. The flip is necessary if and only if one of the vertices outside the diagonal is inside the circle drawn through the remaining three vertices. Since each such flipping of an edge inside of a tringula-tion T definitively increases the angle vector, the following algorithm must stop and achieve an angle optimal triangulation: 983 CHAPTER 13. COMBINATORIAL METHODS, GRAPHS, AND ALGORITHMS Edge Flipping Delaunay Input: Any triangulation T of points X in the plane. 
Output: An angle minimal triangulation T of the same set X. (1) Main cycle. Repeat until there are no edges to flip: • Find an edge which should be flipped and flip it. Theorem. A triangulation T on a set of points X in the plane R2 is angle optimal if and only if it is a Delaunay triangulation associated to the Voronoi diagram VD(X). Proof. Consider any Delaunay triangulation T associated to VD(X) and one of the vertices p of VD(X). Let v1,..., Vk be all the points of X lying on the circle determining p. Fix an edge with two neighbouring endpoints on a circle. All triangles with the third vertex on the circle above the edge share the same angle. A simple check now verifies that different ways of triangulating the same region of VD(X) with more than 3 boundary edges lead always to the same angle vector. In particular, there are no flips at all necessary in the above algorithm if one starts with the Delaunay trinagulation T. Hence the angle optimal triangulation is arrived at. In order to prove the other implication, recall the comments on the diagram above. All triangles in the angle optimal triangulations T have the following two properties: (1) the circle drawn through their three vertices do not include any other point in its interior: (2) the circle having any of their edges as diameter does not have any other point in its interior. Consider the dual graph G of T and consider its realization in the plane by drawing the vertices as the centers of the circles drawn through the vertices of individual triangles, while the edges are the segments joining them. If there are not more than 3 points on any of those circles, then G = VD(X) is obtained. In the degenerate situations, all the triangles sharing the circle produce the same vertex in the plane and some of the relevant edges degenerate completely. Identify those collapsing elements in G, to get the right VD(X). □ 13.3.9. Incremental Delaunay. 
We return to the general idea for the Voronoi diagram, namely to design an algorithm which constructs both VD(X) and DT(X) and which behaves very well in its randomized implementation. The idea is straightforward. Use the incremental approach as with the Voronoi diagrams for refining the consecutive Delaunay triangulations, employing the flipping of edges method. By looking at the diagram, the Voronoi algorithm is easily modified. Care must be taken of three different cases for the new points - hitting the unbounded face, hitting one of the internal triangles, hitting one of the edges. 984 CHAPTER 13. COMBINATORIAL METHODS, GRAPHS, AND ALGORITHMS Incremental Delaunay Triangulation Input: The set of points X = {vi,..., vn} in the plane, not all collinear. Output: The DCEL of both DT(X) and the search graph G for this triangulation. (1) Initialization. Consider the first three points Xq = {vi,v2, v3}. Create the DCEL for DT(X0) with two regions, and create the CH(Xo) (the connected edge list). Create the acyclic directed graph G (just root and two leaves). (2) Main cycle. Repeat until there are no new points z e X: • Localize the face A inDT(Xk) hit by z (by the search in G) • if z is in the unbounded face, then - add the new triangles A1,..., Ai to DT(Xk) by joining z to visible edges in CH(Xk) - update the CH(X). • if z hits a (bounded) triangle A, then split it into the three new triangles A1,A2,A3. • if z hits an edge e, then split the adjacent bounded triangles into A1,..., A4 (only two, if an edge in CH{Xk)) is hit). • Create a queue Q of not yet checked modified triangles and repeat as long Q is not empty: - take the first A from the queue Q, look for its neighbour not in Q and not yet checked, and flip the edge if necessary; - if an edge is flipped, put the newly modified triangles into Q Detailed analysis of the algorithm is omitted. It is almost obvious that the algorithm is correct. 
It is only necessary to prove that the proposed version of the edge flipping update ensures that after each addition of the new point z in the k-th step, the correct Delaunay triangulation of X_k arises. Once an edge is flipped, it is not necessary to consider it at any later time. Finally, if the Voronoi diagram is needed instead, it can be obtained from DT(X) in linear time. Obviously the search structures can be used directly. Surprisingly enough, it turns out that the expected number of total flips necessary over the whole run is of size O(n log n). Hence the algorithm achieves the perfect expected time O(n log n). Detailed analysis of this beautiful example of results in computational geometry can be found in section 9.4 of the book by Berg et al., cf. the link on page 971.
13.3.10. The beach line Voronoi. The Voronoi algorithm provides a perfect example of the sweep line paradigm, where the extra structure to keep track of the events has to be quite smart. Imagine the horizontal line L (parallel to the x axis) flowing in the top-down direction and meeting the points in X = {v_1, ..., v_n}. Of course, the part of VD(X) involving all points above the current position of L cannot be drawn yet, since it depends also on the points below the line. It is better to look at the part R_L of the plane,
R_L = {p ∈ R^2 : dist(p, L) ≥ dist(p, v_i) for some v_i ∈ X above L}.
This is exactly the part of R^2 which can be tesselated into the Voronoi diagram with the information collected at the current position of L. Obviously, R_L is bounded by a continuous curve consisting of parts of parabolas, since for one point v_i this is the case, and the conditions for the individual points combine. Call the boundary of R_L the beach line B_L. The vertices on B_L draw the VD(X) when L is moving.
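The linear-time conversion from DT(X) to the Voronoi diagram mentioned above amounts to computing, for every Delaunay triangle, its circumcenter: by the duality discussed before, these circumcenters are exactly the vertices of VD(X). A small exact sketch (the helper name is ours):

```python
from fractions import Fraction

# Circumcenter of the triangle a, b, c, i.e. the Voronoi vertex dual
# to a Delaunay triangle.  Exact rational arithmetic avoids rounding.
def circumcenter(a, b, c):
    ax, ay = map(Fraction, a)
    bx, by = map(Fraction, b)
    cx, cy = map(Fraction, c)
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax * ax + ay * ay) * (by - cy) + (bx * bx + by * by) * (cy - ay)
          + (cx * cx + cy * cy) * (ay - by)) / d
    uy = ((ax * ax + ay * ay) * (cx - bx) + (bx * bx + by * by) * (ax - cx)
          + (cx * cx + cy * cy) * (bx - ax)) / d
    return ux, uy

print(circumcenter((0, 0), (1, 0), (0, 1)))   # the center (1/2, 1/2)
```

One pass over the triangles of the DCEL, calling this once per triangle and joining the centers of adjacent triangles, yields VD(X).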
Since the Voronoi diagram consists of line segments, we do not even compute the parabolas; we only take care of the arrangement of the still active parts of parabolas in the beach line, as determined by the individual points. New parts of the beach line arise when the line L meets one of the points. Add all the points to an ordered list in the obvious lexicographic order. Call them the point events. An active arc in the beach line disappears when the line L meets the bottom of the circle drawn through three points determining a vertex in the Voronoi diagram. Such an event is called a circle event. Both types of events are illustrated in the diagram above. There is a striking difference between them. The point events always initiate a new arc and start "drawing" two edges of the Voronoi diagram. They also initiate previously unknown circle events. (This is mostly called the Fortune Voronoi algorithm - not because this is such a lucky construction, but because the algorithm was published by Steven Fortune of the Bell Laboratories in 1986.)
The circle events might disappear without creating a genuine vertex in VD(X). Look at the diagram at the point event s. The new s, r, q circle event is encountered there. But this would not create a vertex in the diagram if there were a next point u somewhere close enough to the indicated vertex. One finds this out as soon as such a point event u is met. Such "ineffectively disappearing" circle events are called false alarms. On the contrary, the p, q, r circle event shown in the diagram gives rise to the indicated vertex. Summarizing, the emerging circle events must be inserted properly into the ordered queue of events and handled properly at each of the point events. Further details are not considered here. When implemented properly, this algorithm runs in the optimal O(n log n) time and O(n) storage. See again the above mentioned book by Berg et al. (section 7) for details.
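The arcs of the beach line can be written down explicitly: for a site v = (s_x, s_y) above the sweep line y = l_y, the points p = (x, y) with dist(p, v) = dist(p, L) satisfy a quadratic equation whose solution is the parabola below (the function name and sample values are ours):

```python
import math

def beach_arc(site, ly, x):
    """y-coordinate of the parabola of points equidistant from `site`
    and the horizontal sweep line y = ly, evaluated at abscissa x.
    Derived from (x - sx)^2 + (y - sy)^2 = (y - ly)^2."""
    sx, sy = site
    return ((x - sx) ** 2 + sy ** 2 - ly ** 2) / (2 * (sy - ly))

site, ly = (0.0, 2.0), 0.0
for x in (-3.0, -1.0, 0.0, 0.5, 2.0):
    y = beach_arc(site, ly, x)
    d_site = math.hypot(x - site[0], y - site[1])
    assert abs(d_site - abs(y - ly)) < 1e-12   # equidistant from site and line
print("ok")
```

The actual algorithm never evaluates these parabolas for drawing; it only maintains the ordering of the arcs and the breakpoints between them, which is what traces out the diagram.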
13.3.11. Geometric transformations. Various geometric transformations of the basic ambient space can often help to transform one problem into another one. This is illustrated in a beautiful construction relating the convex hulls and the Voronoi diagrams. Of course, transformations which behave well on lines and planes and preserve incidences should be thought of first. The affine and projective transformations behave well in this respect, as in the fourth chapter. Introduce a more interesting one - the spherical inversion. In the plane R^2, consider the unit circle x^2 + y^2 = 1. For arbitrary v = (x, y) ≠ (0, 0) define
ρ(v) = (1 / ||v||^2) v.
Clearly ρ is a bijection on R^2 \ {(0, 0)}. The geometric meaning of such a transform is clear from the formula, see the diagram. The "general" point v is sent to a point on the same line through the origin, but with reciprocal size. The unit circle is the set of all fixed points. The same principle works in all dimensions, so we may equally well (and more interestingly) consider v ∈ R^3 in the sequel. Next follows the crucial property of ρ.
Lemma. The mapping ρ maps the spheres and planes in R^3 onto spheres and planes. The image of a sphere is a plane if and only if the sphere contains the origin.
Proof. Consider a sphere C with the center c and radius r. The equation for its general points p reads ||p − c||^2 = r^2. By drawing a few images as in the diagram above, it is easily guessed that the image will be a sphere with the center s = (1/(||c||^2 − r^2)) c (i.e. again on the same line through the origin). Now consider q = ρ(p) and compute (using 2 p·c = ||p||^2 + ||c||^2 − r^2, which follows from the latter equation):
||q − s||^2 = || p/||p||^2 − c/(||c||^2 − r^2) ||^2
= 1/||p||^2 − (||p||^2 + ||c||^2 − r^2)/(||p||^2 (||c||^2 − r^2)) + ||c||^2/(||c||^2 − r^2)^2
= r^2 / (||c||^2 − r^2)^2.
The latter computation assumes ||c|| ≠ r. Fix the center c and consider radii r approaching ||c|| from below or above.
Then the image is a sphere with the center s moving along the fixed line through the origin and with a fast growing radius. In the limit position, the plane is obtained, as requested. (Check the computation directly yourself, if in any doubt.) □
The continuity of ρ has got important consequences. Consider a general plane γ (not containing the origin). The inversion ρ maps one of the half-spaces determined by γ to the interior of the image sphere. The other half-space maps to the unbounded complement of the sphere. The latter is of course the half-space containing the origin.
The efficient link between the Voronoi diagrams and convex hulls can now be explained. Assume a set of points X = {v_1, ..., v_n} in the plane is given. View them as points in the plane z = 1 in R^3, i.e. add the same coordinate z = 1 to all the points (x, y) in X. For simplicity, assume that no three of them are collinear and no four of them lie on the same circle. The spherical inversion ρ maps the entire plane z = 1 to the sphere S with center c = (0, 0, 1/2) and radius 1/2. Write w_1, ..., w_n for the images w_i = ρ(v_i).
Now, consider CH(Y) for the set of the images Y = {w_1, ..., w_n}. This is a convex polytope with all vertices on the sphere S. All its faces generate planes not containing the origin (this is due to the assumption that no three points of X are collinear). Split the faces of CH(Y) into those "visible" from the origin and the "invisible" ones. In the latter case, all points of Y are on the same side of the plane γ generated by the face as the origin. This implies that all the other points v_i = ρ(w_i) are outside of the image sphere ρ(γ). In particular, there are no points of X inside the intersection of ρ(γ) with the plane z = 1. This is the defining condition for obtaining one of the vertices of the Voronoi diagram. Since the map ρ preserves incidences, the entire DCEL for VD(X) is easily reconstructed from the DCEL of CH(Y) and vice versa.
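The properties of ρ used in this construction are easy to verify numerically: ρ is an involution, the circle with center (1, 0) and radius 1 (which passes through the origin) maps onto the line x = 1/2, and the plane z = 1 maps onto the sphere S with center (0, 0, 1/2) and radius 1/2. A small check with arbitrary sample points:

```python
import math

def rho(v):
    n2 = sum(t * t for t in v)          # ||v||^2
    return tuple(t / n2 for t in v)

# Involution and circle-through-origin -> line, in the plane.
for k in range(12):
    if k == 6:                          # t = pi would give the origin
        continue
    t = 2 * math.pi * k / 12
    p = (1 + math.cos(t), math.sin(t))  # circle with center (1, 0), radius 1
    q = rho(p)
    assert abs(q[0] - 0.5) < 1e-12      # image lies on the line x = 1/2
    assert all(abs(a - b) < 1e-12 for a, b in zip(rho(q), p))

# Plane z = 1 -> sphere with center (0, 0, 1/2) and radius 1/2, in space.
for p in [(0.3, -1.2, 1.0), (5.0, 2.0, 1.0), (0.0, 0.0, 1.0)]:
    q = rho(p)
    d = math.sqrt(q[0] ** 2 + q[1] ** 2 + (q[2] - 0.5) ** 2)
    assert abs(d - 0.5) < 1e-12
print("ok")
```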
This resembles the construction of the dual graph, i.e. the Delaunay triangulation DT(X) from the Voronoi diagram, with a further geometric transformation in the background.
Last, but not least, the faces of CH(Y) visible from the origin are worth mentioning too. For the same reason as above, all the points of Y appear on the other side from the origin, and so all the points in X are inside the image sphere. This means that the diagram of furthest points, instead of the Voronoi diagram of the closest ones, is obtained. This is a very useful tool in several areas of mathematics, see some of the exercises (??) for further illustration.
4. Remarks on more advanced combinatorial calculations
13.4.1. Generating functions. The worlds of discrete and continuous mathematics meet all the time. There are already many instances of useful interactions. With some slight exaggeration, we can claim that all results in analysis were achieved by an appropriate reduction of the continuous tasks to some combinatorial problem (for instance, integration of rational functions is reduced to partial fraction decomposition). In the opposite direction, we demonstrate how handy continuous methods can be. We begin with a simple combinatorial question:
There are four 1-crown coins, five 2-crown coins, and three 5-crown coins at our disposal. Suppose we want to buy a bottle of coke which costs 22 crowns. In how many ways can we pay the exact amount of money with the given coins?
We are looking for integers i, j, k such that i + j + k = 22 and i ∈ {0, 1, 2, 3, 4}, j ∈ {0, 2, 4, 6, 8, 10}, k ∈ {0, 5, 10, 15}. Consider the product of polynomials (over the real numbers, for instance)
(x^0 + x^1 + x^2 + x^3 + x^4)(x^0 + x^2 + x^4 + x^6 + x^8 + x^10)(x^0 + x^5 + x^10 + x^15).
It should be clear that the number of solutions equals the coefficient at x^22 in the resulting polynomial.
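The product can be multiplied out mechanically, with each polynomial stored as its list of coefficients; reading off the coefficient at x^22 answers the question:

```python
# The coin-counting product, carried out literally on coefficient lists.
def poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

ones = [1, 1, 1, 1, 1]                                 # values 0..4 by 1-crowns
twos = [1 if v % 2 == 0 else 0 for v in range(11)]     # values 0, 2, ..., 10
fives = [1 if v % 5 == 0 else 0 for v in range(16)]    # values 0, 5, 10, 15
product = poly_mul(poly_mul(ones, twos), fives)
print(product[22])   # prints 4
```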
This corresponds to the four possibilities of choosing the values i, j, k: 3·5 + 3·2 + 1·1, 3·5 + 2·2 + 3·1, 2·5 + 5·2 + 2·1, and 2·5 + 4·2 + 4·1.
This simple example deserves more attention. The coefficients of the particular polynomials represent sequences of numbers, referring to how many times we can achieve the given value with one type of coins only. Work with infinite sequences to avoid a prior bound on how many available values there can be. Encode the possibilities in infinite sequences
(1, 1, 1, 1, 1, 0, 0, ...) 1-crowns
(1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, ...) 2-crowns
(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, ...) 5-crowns.
Each such sequence with only finitely many non-zero terms can be assigned a polynomial. The solution of the problem is given by the product of these polynomials, as noted before. This is an instance of a general procedure for handling sequences effectively.
Generating function of a sequence
Definition. An (ordinary) generating function for an infinite sequence a = (a_0, a_1, a_2, ...) is a (formal) power series
a(x) = a_0 + a_1 x + a_2 x^2 + ··· = Σ_{i=0}^∞ a_i x^i.
The values a_i are considered in some fixed field K, normally the rational numbers, real numbers, or complex numbers.
In practice, there are several standard ways of defining and using generating functions:
- to find an explicit formula for the n-th term of a sequence;
- to derive new recurrent relations between values (although generating functions are often based on recurrent formulae themselves);
- for calculation of means or other statistical dependencies (for instance, the average time complexity of an algorithm);
- to prove miscellaneous combinatorial identities;
- to find an approximate formula or the asymptotic behaviour when the exact formula is too hard to get.
We shall see examples of some of these.
13.4.2. Operations with generating functions.
Several basic operations with sequences correspond to simple operations over power series (which can be easily proved by performing the relevant operation with the power series):
• Componentwise, the sum (a_i + b_i) of the sequences corresponds to the sum a(x) + b(x) of the generating functions.
• Multiplication (α · a_i) of all terms by a given scalar α corresponds to the same multiplication α · a(x) of the generating function.
• Multiplication of the generating function a(x) by a monomial x^k corresponds to shifting the sequence k places to the right and filling the first k places with zeros.
• In order to shift the sequence k places to the left (i.e. omit the first k terms), subtract the polynomial b_k(x) corresponding to the sequence (a_0, ..., a_{k−1}, 0, ...) from a(x), and then divide the generating function by the expression x^k.
• Substitution of a polynomial f(x) for x leads to a specific combination of the terms of the original sequence. It can be expressed easily for f(x) = αx, which corresponds to multiplication of the k-th term of the sequence by the scalar value α^k. The substitution f(x) = x^n inserts n − 1 zeros between each pair of adjacent terms.
The first and second rules express the fact that the assignment of the generating function to a sequence is an isomorphism of the two vector spaces (over the field in question).
There are other important operations which often appear when working with generating functions:
• Differentiation with respect to x: the function a'(x) generates the sequence (a_1, 2a_2, 3a_3, ...); the term at index k is (k + 1) a_{k+1} (i.e. the power series is differentiated term by term).
• Integration: the function ∫_0^x a(t) dt generates the sequence (0, a_0, (1/2)a_1, (1/3)a_2, (1/4)a_3, ...); for k ≥ 1, the term at index k is equal to (1/k) a_{k−1} (clearly, differentiation of the corresponding power series term by term leads back to the original function a(x)).
• Product of power series: the product a(x)b(x) is the generating function for the sequence (c_0, c_1, c_2, ...), where
c_k = Σ_{i=0}^k a_i b_{k−i},
i.e. the terms of the product are, up to the terms of higher order, the same as in the product (a_0 + a_1 x + a_2 x^2 + ··· + a_k x^k)(b_0 + b_1 x + b_2 x^2 + ··· + b_k x^k). The sequence (c_n) is also called the convolution of the sequences (a_n), (b_n).
13.4.3. More links to continuous analysis. There are useful examples of generating functions. Most of them are seen when working with power series in the third part of chapter six. Perhaps the reader recognizes the generating function given by the geometric series:
a(x) = 1/(1 − x) = 1 + x + x^2 + ...,
which corresponds to the constant sequence (1, 1, 1, ...). From the sixth chapter, this power series converges for x ∈ (−1, 1) and equals 1/(1 − x). It works the other way round as well: expand this function into its Taylor series at the point 0, and the original series is obtained. This "encoding" of a sequence into a function and then decoding it back is the key idea in both the theory and practice of generating functions.
Generally, consider any sequence a_i with |a_n|^{1/n} bounded. Then there is a neighbourhood of zero on which its generating function converges (see page 340). For example, an easy check shows that this happens whenever |a_n| = O(n^k) with a constant exponent k > 0. On this neighbourhood, the generating function can be worked with as with an ordinary function. In particular, one can add, multiply, compose, differentiate, and integrate them. All the equalities obtained carry over to the relevant sequences.
Recall several very useful basic power series and their sums:
1/(1 − x) = Σ_{n≥0} x^n,
ln(1 + x) = Σ_{n≥1} (−1)^{n−1} x^n / n,
ln(1/(1 − x)) = Σ_{n≥1} x^n / n,
e^x = Σ_{n≥0} x^n / n!,
sin x = Σ_{n≥0} (−1)^n x^{2n+1} / (2n + 1)!,
cos x = Σ_{n≥0} (−1)^n x^{2n} / (2n)!.
13.4.4. Binomial theorem. Recall the standard finite binomial formula
(a + b)^r = a^r (1 + c)^r = a^r Σ_{k=0}^r (r choose k) c^k,
where r ∈ N, a ≠ 0, b ∈ C, and c = b/a. Even if the power r is not a natural number, the Taylor series of (1 + x)^r can still be computed.
This yields the following generalization:
Generalized binomial theorem
Theorem. For any r ∈ R and k ∈ N, write
(r choose k) = r(r − 1)(r − 2)···(r − k + 1) / k!
(in particular (r choose 0) = 1, having the empty product divided by 1 in the latter formula). The power series expansion
(1 + x)^r = Σ_{k≥0} (r choose k) x^k
converges on a neighbourhood of zero, for each r ∈ R.
The latter formula is called the generalized binomial theorem. In particular, the function 1/(1 − x)^n, n ∈ N, can be expanded into the series
1/(1 − x)^n = Σ_{k≥0} (k + n − 1 choose n − 1) x^k.
Proof. The theorem is obvious if r ∈ N, since it is then the finite binomial formula. So assume r is not a natural number, and thus zero is never obtained when evaluating (r choose k). First, differentiate the function a(x) = (1 + x)^r and evaluate all the derivatives at x = 0. Obviously
a^(k)(0) = r(r − 1)···(r − k + 1) (1 + x)^{r−k} |_{x=0} = r(r − 1)···(r − k + 1),
which provides the coefficients a_k = (r choose k) of the series. In 5.4.5 there are several simple tests to decide about the convergence of a number series. The ratio test helps here:
| a_{k+1} x^{k+1} / (a_k x^k) | = | (r − k)/(k + 1) | · |x| → |x| as k → ∞.
By the ratio test, the radius of convergence is 1 for all r ∉ N. The generalized binomial formula for negative integers is a straightforward consequence. Substituting −x for the argument just kills the signs appearing in the generalized binomial coefficients. □
13.4.5. Examples. The formulae with r a negative integer are very useful in practice. The simplest one is the geometric series with r = −1. Write down two more of them:
1/(1 − x)^2 = Σ_{n≥0} (n + 1) x^n,
1/(1 − x)^3 = Σ_{n≥0} (n + 2 choose 2) x^n.
The same results can be obtained by consecutive convolutions. Indeed, for the generating function a(x) of a sequence (a_0, a_1, a_2, ...), the product (1/(1 − x)) a(x) is the generating function for the sequence of all the partial sums (a_0, a_0 + a_1, a_0 + a_1 + a_2, ...). For instance,
(1/(1 − x)) ln(1/(1 − x))
is the generating function of the harmonic numbers H_n = 1 + 1/2 + ··· + 1/n.
13.4.6. Difference equations.
Typically, the generating functions can be very useful if the sequences are defined by relations between their terms. An instructive example of such an application is the complete discussion of the solutions of linear difference equations with constant coefficients. These are examined in the second part of chapter one, see 1.2.4. Back there, a formula is derived for first-order equations, and the uniqueness and existence of the solution is justified only after "guessing" the solution. Now, it can be truly derived.
First, sort out the well-known example of the Fibonacci sequence, given by the recurrence
F_{n+2} = F_n + F_{n+1},  F_0 = 0,  F_1 = 1,
and write F(x) for the (yet unknown) generating function of this sequence. We want to compute F(x) and so obtain an explicit expression for the n-th Fibonacci number.
The defining equality can be expressed in terms of F(x) using our operations for shifting the terms of the sequence. Indeed, xF(x) corresponds to the sequence (0, F_0, F_1, F_2, ...), and x^2 F(x) to (0, 0, F_0, F_1, ...). Therefore, the generating function
G(x) = F(x) − xF(x) − x^2 F(x)
represents the sequence (F_0, F_1 − F_0, 0, 0, ...), since the coefficient at x^n vanishes for n ≥ 2 by the recurrence. Substitute the values F_0 = 0, F_1 = 1 (the initial condition). Obviously G(x) = x, and hence
(1 − x − x^2) F(x) = x.
F(x) is a rational function. It can be rewritten as a linear combination of simple rational functions. This is helpful, since a linear combination of generating functions corresponds to the same combination of the sequences. Rational functions can be decomposed into partial fractions, see 6.2.7. Using this procedure, we obtain a decomposition for 1/(1 − x − x^2). Namely, write
1/(1 − x − x^2) = A/(x − x_1) + B/(x − x_2) = a/(1 − λ_1 x) + b/(1 − λ_2 x),
where A, B are suitable (generally complex) constants, and x_1, x_2 are the roots of the polynomial in the denominator.
The ultimate constants a, b, λ_1, and λ_2 can be obtained by a simple rearrangement of the particular fractions. This leads to the general form of the generating function
F(x) = Σ_{n=0}^∞ (a λ_1^n + b λ_2^n) x^n,
and so the general solution of the recurrence is known as well. In the present case, the roots of the quadratic polynomial in the denominator are (−1 ± √5)/2, hence λ_{1,2} = (1 ± √5)/2. The partial fraction decomposition equality gives
x = a (1 − ((1 − √5)/2) x) + b (1 − ((1 + √5)/2) x),
and so a = −b = 1/√5. Finally, the requested solution is obtained:
F_n = (1/√5) ((1 + √5)/2)^n − (1/√5) ((1 − √5)/2)^n.
Compare this procedure to the approach in 3.2.2 and 3.B.1. This expression, full of irrational numbers, is an integer. The base of the second summand is (1 − √5)/2 ≈ −0.618, so its value is negligible for large n. Hence F_n can be computed by evaluating just the first summand and rounding to the nearest integer.
Of course, the same procedure can be applied to general k-th order homogeneous linear difference equations. Consider the recurrence
F_{n+k} = α_0 F_n + ··· + α_{k−1} F_{n+k−1}.
The generating function for the resulting sequence is
F(x) = g(x) / (1 − α_{k−1} x − ··· − α_0 x^k),
where the polynomial g(x) of degree at most k − 1 is determined by the chosen initial conditions. Using partial fraction decomposition, the general result follows as in subsection 3.2.4.
13.4.7. The general method. Power series are a much stronger tool for solving recurrences. The point is that one is not restricted to linearity and homogeneity. Using the following general approach, recurrences that seem intractable at first sight can quite often be managed. The first steps are just algorithmic, while the final solution of the equation on the generating function may need very diverse approaches. In order to be able to write down the necessary equations efficiently, adopt the convention of the logical predicate [S(n)] which is attached before the expression it should govern.
Simply multiply by the coefficient 1 if S(n) is true, and by zero otherwise. For instance, the equation
F_n = F_{n−2} + F_{n−1} + [n = 1]·1 + [n = 0]·1
defines the above Fibonacci recurrence with the initial conditions F_0 = 1 and F_1 = 2.
Method to resolve recurrences
Recurrent definitions of sequences (a_0, a_1, ...) may be solved in the following 4 steps:
(1) Write the complete dependence between the terms in the sequence as a single equation expressing a_n in terms of terms with smaller indices. This universal formula must hold for all n ∈ N (supposing a_{−1} = a_{−2} = ··· = 0).
(2) Multiply both sides of the equation by x^n, and sum the resulting expressions over all n ∈ N. One of the summands is Σ_{n≥0} a_n x^n, which is the generating function A(x) for the sequence. Rearrange the other summands so that they contain only the term A(x) and some other polynomial expressions.
(3) Solve the resulting equation with respect to A(x) explicitly.
(4) Expand the function A(x) into a power series. Its coefficients at x^n are the requested values a_n.
As an example, consider a second order linear difference equation with constant coefficients, but with a non-homogeneous right hand side. The recurrence is
a_n = 5a_{n−1} − 6a_{n−2} − n
with the initial conditions a_0 = 0, a_1 = 1. The individual steps of the latter procedure are as follows:
Step 1. The universal equation is clear, up to the initial conditions. First check n = 0, which yields no extra term, but then n = 1 enforces the extra value 2 to be added. Hence,
a_n = 5a_{n−1} − 6a_{n−2} − n + [n = 1]·2.
Step 2.
Σ_{n≥0} a_n x^n = 5x Σ_{n≥0} a_{n−1} x^{n−1} − 6x^2 Σ_{n≥0} a_{n−2} x^{n−2} − Σ_{n≥0} n x^n + 2x.
Next, one of the terms is nearly the power series for 1/(1 − x)^2. Thus remove one x there in order to get the equality for A(x) in the required form (ignore the negative values of indices, since all a_{−1}, a_{−2}, ... vanish by assumption):
A(x) = 5x A(x) − 6x^2 A(x) − x/(1 − x)^2 + 2x.
Step 3.
Factor the polynomial 1 − 5x + 6x^2 = (1 − 2x)(1 − 3x), corresponding to the roots 2 and 3 of the characteristic polynomial. An elementary calculation yields
A(x) = (2x^3 − 4x^2 + x) / ((1 − 2x)(1 − 3x)(1 − x)^2).
Step 4. Partial fraction decomposition directly leads to the result
A(x) = −(1/4)·1/(1 − 3x) + 2·1/(1 − 2x) − (1/2)·1/(1 − x)^2 − (5/4)·1/(1 − x).
This corresponds to the solution
a_n = −(1/4)·3^n + 2^{n+1} − n/2 − 7/4.
The first eight terms of the sequence are 0, 1, 3, 6, 8, −1, −59, −296.
13.4.8. Plane binary trees and Catalan numbers. The next application of the generating functions answers the question about the number b_n of non-isomorphic plane binary trees on n vertices (cf. 13.1.18 for plane trees). Treat these trees in the form of the root (of a subtree) together with the pair [the left binary subtree, the right binary subtree]. Examine the initial values of n, namely b_0 = 1, b_1 = 1, b_2 = 2, b_3 = 5. It is more or less obvious that for n ≥ 1, the sequence b_n satisfies the recurrent formula
b_n = b_0 b_{n−1} + b_1 b_{n−2} + ··· + b_{n−1} b_0,
and this is actually close to a convolution of two equal sequences. Rearrange the expression so that it holds for all n ∈ N_0:
b_n = Σ_{0 ≤ k < n} b_k b_{n−k−1} + [n = 0]·1.
Multiplying by x^n and summing over all n turns the convolution into a product, and the equation for the generating function B(x) is
B(x) = x B(x)^2 + 1.
Solving this quadratic equation yields B(x) = (1 ± √(1 − 4x)) / (2x). The limit of (1 + √(1 − 4x)) / (2x) as x → 0+ is ∞, but the generating function of our sequence must have the value b_0 = 1 at 0. Hence
B(x) = (1 − √(1 − 4x)) / (2x).
In the last step, expand B(x) into a power series. The expansion can be obtained using the generalized binomial theorem:
(1 − 4x)^{1/2} = Σ_{k≥0} (1/2 choose k) (−4x)^k.
Dividing 1 − √(1 − 4x) by the expression 2x leads to
B(x) = Σ_{n≥0} (−1/2) (1/2 choose n + 1) (−4)^{n+1} x^n.
Substituting into the definition of the generalized binomial coefficients, a straightforward check shows
(−1/2) (1/2 choose n + 1) (−4)^{n+1} = (1/(n + 1)) (2n choose n),
which yields a final, much neater, formula for the coefficients. We conclude that the number of plane binary trees on n vertices equals
b_n = (1/(n + 1)) (2n choose n).
These are known as the Catalan numbers. They occur surprisingly often:
• the number of well-parenthesized words of length 2n, i.e.
words consisting of n opening and n closing parentheses such that no prefix of the word contains more closing parentheses than opening ones;
• this also corresponds to the number of ways an unsupplied vending machine can accept n 5-crown coins and n 10-crown coins for 5-crown orders so that it can always give the change (hence the probability that a random ordering is satisfactory can also be found);
• the number of monotonic paths from [0, 0] to [n, n] along the sides of the unit squares of the grid such that the path does not cross the diagonal;
• the number of triangulations of a convex (n + 2)-gon.
The intuitive reason why they appear so often is that they come from the expansion of the square root within B(x), and quadratic relations appear often in the real world.
13.4.9. Quicksort analysis. The next task is to determine the expected number of comparisons made by Quicksort, a well-known algorithm for sorting a (finite) sequence of elements. This is the following divide and conquer type of algorithm:
Procedure Qsort
Input: A (non-sorted) list of elements L = (L[0], ..., L[n]).
Output: The sorted list L with the same elements.
(1) If L is empty, then return the empty list ().
(2) Divide phase. Create a sublist L_1 by going through the remaining elements of L and keeping only the elements x with x < L[0], while putting the other elements into the list L_2.
(3) Conquer phase. Combine the lists L = Qsort(L_1) + (L[0]) + Qsort(L_2) and return the list L.
We analyze how many comparisons are needed. Assume that all possible orderings of the list L to be sorted are distributed uniformly. The following parameters are crucial:
• The number of comparisons in the divide phase is n − 1.
• The assumption of uniformity ensures that the probability of L[0] being the k-th greatest element of the sequence is 1/n.
• The sizes of the sublists to be sorted in the conquer phase are k − 1 and n − k.
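The Qsort procedure above can be written out directly; the comparison counter is our addition for the analysis, not part of the procedure itself:

```python
comparisons = 0

def qsort(L):
    """The divide-and-conquer procedure from the text, counting the
    n - 1 comparisons against the pivot L[0] in each divide phase."""
    global comparisons
    if not L:
        return []
    pivot, rest = L[0], L[1:]
    comparisons += len(rest)               # n - 1 comparisons with the pivot
    L1 = [x for x in rest if x < pivot]    # divide phase
    L2 = [x for x in rest if x >= pivot]
    return qsort(L1) + [pivot] + qsort(L2) # conquer phase

print(qsort([3, 1, 2]), comparisons)       # -> [1, 2, 3] 3
```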
There is the following recurrent formula for the expected number of comparisons C_n:
(1) C_n = n − 1 + Σ_{k=1}^n (1/n)(C_{k−1} + C_{n−k}).
One could work through the steps of the general method directly, but the symmetry of the two summands allows a rewrite of (1), multiplying by n at the same time:
(2) n C_n = n(n − 1) + 2 Σ_{k=1}^n C_{k−1}.
In the first step, care is needed concerning n = 0. In the defining recurrence (1), n = 0 is not treated at all (since the equation does not make sense there). So the convention must be extended to include C_0 = 0 in the computation. Then the equation (2) defines C_1 = 0 properly. It is not necessary to add any terms in view of the initial conditions. Next, multiply both sides by x^n and sum:
Σ_{n≥0} n C_n x^n = Σ_{n≥0} n(n − 1) x^n + 2 Σ_{n≥0} Σ_{k=1}^n C_{k−1} x^n.
All the terms look familiar. The left hand side is the derivative of the generating function C(x) = Σ_{n≥0} C_n x^n with one x removed. The first term on the right is the series for 1/(1 − x)^3, up to a constant and a shift of powers by 2. Finally, the last term is the convolution with 1/(1 − x), up to one x and the coefficient 2. Hence the equation
x C'(x) = 2x^2/(1 − x)^3 + 2x C(x)/(1 − x).
The third step is straightforward, see 8.3.3. Divide by x to obtain the linear differential equation
C'(x) − (2/(1 − x)) C(x) = 2x/(1 − x)^3.
The corresponding integrating factor is e^{−∫ 2/(1−x) dx} = (1 − x)^2. Hence
((1 − x)^2 C(x))' = 2x/(1 − x),
and finally
C(x) = 2 ( (1/(1 − x)^2) ln(1/(1 − x)) − x/(1 − x)^2 ).
The first term in the bracket corresponds to the convolution of two known sequences, so it contributes to C_n by
Σ_{k=1}^n (1/k)(n − k + 1) = (n + 1) Σ_{k=1}^n 1/k − n = (n + 1) H_n − n = (n + 1)(H_{n+1} − 1),
where H_n are the harmonic numbers. The result is
C_n = 2(n + 1)(H_{n+1} − 1) − 2n.
Notice that in 13.1.10, the very same recurrence is solved by different (more direct and simpler) tricks, without any differential equations involved. Since the harmonic numbers H_n are easily approximated by ln n = ∫_1^n (1/x) dx, the analysis shows that the estimated time cost of quicksort is O(n log n).
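The recurrence and the derived closed form can be compared exactly with rational arithmetic (the helper names are ours):

```python
from fractions import Fraction

# Recurrence (2) for the expected number of comparisons versus the
# closed form C_n = 2(n+1)(H_{n+1} - 1) - 2n.
def C_rec(N):
    C = [Fraction(0)]                      # C_0 = 0 by convention
    for n in range(1, N):
        C.append(n - 1 + Fraction(2, n) * sum(C))
    return C

def C_closed(n):
    H = sum(Fraction(1, k) for k in range(1, n + 2))   # H_{n+1}
    return 2 * (n + 1) * (H - 1) - 2 * n

assert C_rec(20) == [C_closed(n) for n in range(20)]
print(C_closed(3))   # prints 8/3
```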
But it is easy to see that the worst case time is O(n^2) (in this version, this happens if the list is already properly ordered - then L_1 is always empty and the depth of the recursion is linear).
13.4.10. Exponential generating functions. Another approach to generating functions is to take the exponential e^x = Σ_{n≥0} (1/n!) x^n as the power series corresponding to the constant sequence (1, 1, ...). In general, this leads to the exponential generating functions
A(x) = Σ_{n≥0} a_n x^n / n!.
Here are a few elementary examples:
(1, 1, 1, ...) has the e.g.f. e^x,
(1, 1, 2, 6, 24, ...), i.e. a_n = n!, has the e.g.f. 1/(1 − x),
(0, 1, 1, 2, 6, 24, ...) has the e.g.f. ln(1/(1 − x)).
The slight modification of the definition (just the extra coefficients 1/n!) is responsible for a very different behaviour, compared to the ordinary generating functions. The elementary operations are:
• Multiplication of A(x) by x yields the e.g.f. of the sequence with terms b_n = n a_{n−1}.
• Differentiation of A(x) shifts the sequence to the left.
• Integration of A(x) shifts the sequence to the right.
• The product of functions A(x) and B(x) corresponds to the sequence with terms c_n = Σ_k (n choose k) a_k b_{n−k}, the binomial convolution of a_n and b_n.
As before, the exponential generating functions might become useful when resolving recurrences. Here is a simple example. Define the sequence by the initial conditions g_0 = 0, g_1 = 1 and the formula
g_n = −2n g_{n−1} + Σ_{k=0}^n (n choose k) g_k g_{n−k}.
At the first glance, seeing the binomial convolution suggests trying the exponential version. Write G(x) for the corresponding power series and proceed in the usual four steps again.
Step 1. Complete the formula to accommodate the initial conditions:
g_n = −2n g_{n−1} + Σ_{k=0}^n (n choose k) g_k g_{n−k} + [n = 1].
There seems to be a subtle point about g_0 here, because the equation gives g_0 = g_0^2, with the two solutions 0 and 1. The proper choice g_0 = 0 now yields the correct value for g_1, and the right solution G is chosen later.
Step 2.
Multiply by x^n/n! and add over all n, to obtain
G(x) = −2x G(x) + G(x)^2 + x.
Step 3. Now solve the easy quadratic equation, arriving at
G(x) = (1/2)(1 + 2x ± √(1 + 4x^2)).
The evaluation at zero provides g_0. Hence the right choice for g_0 = 0 is the minus sign, and
G(x) = (1 + 2x − √(1 + 4x^2)) / 2.
Step 4. Apply the generalized binomial theorem to expand G(x) into a power series, see 13.4.8:
√(1 + 4x^2) = 1 + Σ_{k≥1} (2k − 2 choose k − 1) · (−1)^{k−1} · (2/k) · x^{2k}.
Further, since G(x) = Σ_{n≥0} g_n x^n / n!, we read off g_{2k+1} = 0 for k ≥ 1, and
g_{2k} = (−1)^k · (2k)! · (1/k)(2k − 2 choose k − 1) = (−1)^k · (2k)! · C_{k−1},
where C_n is the n-th Catalan number.
13.4.11. Cayley's formula. We conclude this chapter with a more complicated example. Cayley's formula computes the number of trees (i.e. graphs with unique paths between all pairs of vertices) on n given vertices, κ(K_n) = n^{n−2}. The notation refers to the equivalent formulation: find the number of all spanning trees in the complete graph K_n. Equivalently, in how many ways can a tree be realized on n vertices with the vertices labeled? For example, already the path P_n can be realized in n!/2 ways, so there must be very many of them.
This result is proved with the help of the exponential generating functions. Write T_n = κ(K_n) for the unknown values. It is easily shown that T_1 = T_2 = 1, T_3 = 3, T_4 = 16. For instance, consider trees on 4 vertices. Out of the (6 choose 3) = 20 potential graphs with exactly three edges, those where the edges form a triangle must not be counted. There are (4 choose 3) = 4 of them. In the diagram, there are four different possibilities, and each of them can be rotated into another three, hence the solution is 16.
The recurrent formula can be obtained by fixing one of the vertices and adding together the possibilities for all available degrees of this vertex. This suggests looking rather at the number R_n of the rooted trees.
It is clear that R_n = n T_n, because there are n possibilities to place the root at each of the trees. Also, one can work with one fixed ordering of the vertices in K_n and multiply the result by n! in the end. In this way, go through the possible degrees m of the first vertex and for each m find the different possibilities for the sizes k_1, ..., k_m of the corresponding subtrees. Obviously k_1 + ... + k_m = n - 1, all k_i > 0, and since the labeling of all vertices is fixed, all the orders of the subtrees must be considered as equivalent. Multiply the contribution by \frac{1}{m!}, and similarly for each of the possibilities of the subtrees. The recurrence formula is

R_n = n \sum_{m\ge 1} \frac{1}{m!} \sum_{k_1+\dots+k_m=n-1} \binom{n-1}{k_1,\dots,k_m} R_{k_1} \cdots R_{k_m}.

Of course, R_0 = 0, R_1 = 1 and, already using the formula, R_2 = 2R_1 = 2. Next, R_3 = 3R_2 + 3R_1^2 = 9, R_4 = 4R_3 + 12R_1R_2 + 4R_1^3 = 64, all as expected. The first step of the standard procedure is accomplished.

Next, write R(x) = \sum_{n\ge 0} R_n \frac{x^n}{n!}. The inner sum in the recurrence is exactly the coefficient at x^{n-1} in the m-th power of the series R(x), multiplied by (n-1)!. Therefore

\frac{R_n}{n!} = [x^{n-1}] \sum_{m\ge 0} \frac{R(x)^m}{m!},

and hence we have the required equation on R:

R(x) = x e^{R(x)}.

There are several ways of solving such functional equations. Here is one such tool, without proof.

Theorem (Lagrange inverse formula). Consider an analytic function f, f(0) = 0 and f'(0) \ne 0. Then there (locally) is the analytic inverse of f, i.e. w = g(z) = \sum_{n\ge 1} g_n \frac{z^n}{n!} and z = f(g(z)). Moreover, for all n > 0,

g_n = \lim_{w\to 0} \frac{d^{n-1}}{dw^{n-1}} \left(\frac{w}{f(w)}\right)^{n}.

In this case, solve the equation x = \frac{R}{e^{R}}, so that we may apply the latter theorem with g = R and f(w) = \frac{w}{e^w}. It follows that

[x^n] R(x) = \frac{1}{n} [w^{n-1}] \left(\frac{w}{w/e^w}\right)^{n} = \frac{1}{n} [w^{n-1}] e^{nw} = \frac{1}{n} \cdot \frac{n^{n-1}}{(n-1)!}.

In particular, R_n = n^{n-1} and so T_n = \frac{R_n}{n} = n^{n-2}.

J. Additional exercises to the whole chapter

13.J.1.
Determine the number of edges that must be added into i) the cycle graph C_n on n vertices, ii) the complete bipartite graph K_{m,n}, in order to obtain a complete graph. ○

13.J.2. Let the vertices of K_6 be labeled 1, 2, ..., 6 and let every edge {i, j} be assigned the integer ((i + j) mod 3) + 1. How many maximum spanning trees are there in this graph? ○

13.J.3. Let the vertices of K_7 be labeled 1, 2, ..., 7 and let every edge {i, j} be assigned the integer ((i + j) mod 3) + 1. How many maximum spanning trees are there in this graph? ○

13.J.4. Let the vertices of K_5 be labeled 1, 2, ..., 5 and let every edge {i, j} be assigned the integer: 1 if i + j is odd; 2 if i + j is even. How many maximum spanning trees are there in this graph? ○

13.J.5. Let the vertices of K_5 be labeled 1, 2, ..., 5 and let every edge {i, j} be assigned the integer: 1 if i + j is odd; 2 if i + j is even. How many minimum spanning trees are there in this graph? ○

13.J.6. Let the vertices of K_6 be labeled 1, 2, ..., 6 and let every edge {i, j} be assigned the integer: 1 if i + j leaves remainder 1 upon division by 3; 2 if i + j leaves remainder 2 upon division by 3; 3 if i + j is divisible by 3. How many minimum spanning trees are there in this graph? ○

13.J.7. Let the vertices of K_6 be labeled 1, 2, ..., 6 and let every edge {i, j} be assigned the integer: 1 if i + j leaves remainder 1 upon division by 3; 2 if i + j leaves remainder 2 upon division by 3; 3 if i + j is divisible by 3. How many maximum spanning trees are there in this graph? ○

13.J.8. Icosian Game: find a Hamiltonian cycle in the graph consisting of the vertices and edges of the regular dodecahedron. ○
Solution. See Wikipedia⁴. □

13.J.9. Does there exist a Hamiltonian cycle in the Petersen graph? ○
Solution. No (however, when any one of the vertices is removed, the resulting graph is already Hamiltonian). This can be shown by enumerating all 3-regular Hamiltonian graphs on 10 vertices and finding a cycle of length less than 5 in each of them. □
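The non-existence claimed in 13.J.9 can also be confirmed by sheer brute force over all vertex orderings, rather than by the enumeration argument of the solution. A Python sketch (runs in a few seconds):

```python
from itertools import permutations

# Petersen graph: outer 5-cycle 0..4, spokes i--(i+5), inner pentagram 5..9
edges = {(i, (i + 1) % 5) for i in range(5)}
edges |= {(i, i + 5) for i in range(5)}
edges |= {(5 + i, 5 + (i + 2) % 5) for i in range(5)}
adj = [set() for _ in range(10)]
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def has_hamiltonian_cycle():
    # fix vertex 0 as the starting point to factor out cyclic rotations
    for rest in permutations(range(1, 10)):
        walk = (0,) + rest
        if walk[-1] in adj[0] and all(walk[i + 1] in adj[walk[i]]
                                      for i in range(9)):
            return True
    return False

print(has_hamiltonian_cycle())  # False
```

Removing any single vertex and rerunning the same search on the remaining 9 vertices would, as the solution states, find a Hamiltonian cycle in each case.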
13.J.10. If G = (V, E) is Hamiltonian and ∅ ≠ W ⊆ V, then G \ W has at most |W| connected components. Give an example of a graph where the converse does not hold. ○

13.J.11. Find a maximum flow and the corresponding minimum cut in the following weighted directed graph (the diagram of the network is not reproduced here). ○

⁴Wikipedia, Icosian game, http://en.wikipedia.org/wiki/Icosian_game (as of Aug. 8, 2013, 13:24 GMT).

13.J.15. Find the generating functions of the following sequences:
i) (1, 2, 1, 4, 1, 8, 1, 16, ...)
ii) (1, 1, 0, 1, 1, 0, 1, 1, ...)
iii) (1, -1, 2, -2, 3, -3, 4, -4, ...) ○

Solution.
i) (1, 2, 1, 4, 1, 8, 1, 16, ...) = (1, 0, 1, 0, ...) + (0, 2, 0, 4, 0, 8, ...). Thus, we find the generating functions for each sequence separately. As for the first one, consider the sequence (1, 1, 1, 1, ...). It is generated by the function \frac{1}{1-x}; the zeros can be inserted by substituting x^2 for x, which gives \frac{1}{1-x^2}. As for the second sequence, we proceed similarly, starting with (1, 2, 4, 8, 16, ...), generated by \frac{1}{1-2x}; then multiplying by two, inserting zeros, and finally shifting to the right by multiplying by x gives \frac{2x}{1-2x^2}. Altogether, the generating function is \frac{1}{1-x^2} + \frac{2x}{1-2x^2}.
ii) (1, 1, 0, 1, 1, 0, 1, 1, ...) = (1, 0, 0, 1, 0, 0, 1, ...) + (0, 1, 0, 0, 1, 0, 0, ...), so the generating function is \frac{1}{1-x^3} + \frac{x}{1-x^3} = \frac{1+x}{1-x^3}.
iii) \frac{1}{(1-x^2)^2} - \frac{x}{(1-x^2)^2} = \frac{1-x}{(1-x^2)^2}. □

13.J.16. Find the coefficient at x^{17} in (x^3 + x^4 + x^5 + ...)^3. ○
Solution. (x^3 + x^4 + x^5 + ...)^3 = \frac{x^9}{(1-x)^3} = x^9 \cdot \frac{1}{(1-x)^3}. We are thus looking for the coefficient at x^8 in \frac{1}{(1-x)^3}. This is equal to \binom{10}{2}, i.e. 45. □

13.J.17. There are 30 red, 40 blue, and 50 white balls in a box (balls of the same color are indistinguishable). In how many ways can we pick up 70 balls from the box? ○
Solution. Clearly, the number of possibilities is equal to the coefficient at x^{70} in the expression

(1 + x + ... + x^{30})(1 + x + ... + x^{40})(1 + x + ... + x^{50}).

Mere rearrangements lead to

(1 + x + ... + x^{30})(1 + x + ... + x^{40})(1 + x + ... + x^{50}) = \frac{1}{(1-x)^3} (1 - x^{31})(1 - x^{41})(1 - x^{51}).
Applying the generalized binomial theorem, we obtain the solution \binom{72}{2} - \binom{41}{2} - \binom{31}{2} - \binom{21}{2}. □

13.J.18. What is the probability that a roll of 12 dice results in the sum of 30? Hint: express the number of possibilities when the sum is 30; consider (x + x^2 + x^3 + x^4 + x^5 + x^6)^{12}. ○
Solution. The resulting probability is the ratio of the number of favorable cases to the number of all cases. Clearly, the latter is 6^{12}. Now, let us compute the number of favorable cases. Consider the expression (x + x^2 + x^3 + x^4 + x^5 + x^6)^{12}. Then, the number of favorable cases is the coefficient at x^{30}. We have:

(x + x^2 + x^3 + x^4 + x^5 + x^6)^{12} = x^{12} \left(\frac{1 - x^6}{1 - x}\right)^{12}.

Therefore, we are interested in the coefficient at x^{18} in

\left(\frac{1 - x^6}{1 - x}\right)^{12} = (1 - 12x^6 + 66x^{12} - 220x^{18} + \dots) \cdot \frac{1}{(1-x)^{12}}.

It follows from the generalized binomial theorem that the number of favorable cases is

\binom{29}{11} - 12\binom{23}{11} + 66\binom{17}{11} - 220. □

13.J.19. A fruit grower wants to plant 25 new trees, having four species at his disposal. However, his wife insists that there be at most 1 walnut, at most 10 apples, at least 6 cherries, and at least 8 plums. In how many ways can he fulfill his beloved's wishes? Hint: we are interested in the coefficient at x^{25} in the expression (1 + x)(1 + x + ... + x^{10})(x^6 + x^7 + ...)(x^8 + x^9 + ...). ○
Solution.

(1 + x)(1 + x + ... + x^{10})(x^6 + x^7 + ...)(x^8 + x^9 + ...) = \frac{x^{14}}{(1-x)^4} (1 - x^2)(1 - x^{11}).

Therefore, we are looking for the coefficient at x^{11} in (1 - x^2 - x^{11} + x^{13}) \cdot \frac{1}{(1-x)^4}, which is equal to \binom{14}{3} - \binom{12}{3} - \binom{3}{3}. □

13.J.20. Express the general term of the sequences defined by the following recurrences:
i) a_1 = 3, a_2 = 5, a_{n+2} = 4a_{n+1} - 3a_n for n = 1, 2, 3, ....
ii) a_0 = 0, a_1 = 1, a_{n+2} = 2a_{n+1} - 4a_n for n = 0, 1, 2, 3, .... ○
Solution.
i) a_n = 2 + 3^{n-1}.
ii) a_n = \frac{1}{2\sqrt{-3}}\left((1 + \sqrt{-3})^n - (1 - \sqrt{-3})^n\right). □

13.J.21. Solve the recurrence where each term of the sequence (a_0, a_1, a_2, ...) is equal to the arithmetic mean of the preceding two terms. ○
Solution. a_n = k\left(-\frac{1}{2}\right)^n + l. □
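The coefficient computations in 13.J.16 through 13.J.18 can all be cross-checked by multiplying out truncated polynomials. A Python sketch (polymul is our helper name):

```python
from math import comb
from functools import reduce

def polymul(p, q, cap):
    """Product of two coefficient lists, truncated at degree cap."""
    r = [0] * (cap + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if a and b and i + j <= cap:
                r[i + j] += a * b
    return r

# 13.J.16: [x^17] (x^3 + x^4 + ...)^3 -- degrees above 17 never contribute
cube = [0, 0, 0] + [1] * 15
assert reduce(lambda p, q: polymul(p, q, 17), [cube] * 3)[17] == comb(10, 2) == 45

# 13.J.17: [x^70] (1 + ... + x^30)(1 + ... + x^40)(1 + ... + x^50)
balls = polymul(polymul([1] * 31, [1] * 41, 70), [1] * 51, 70)
assert balls[70] == comb(72, 2) - comb(41, 2) - comb(31, 2) - comb(21, 2)

# 13.J.18: [x^30] (x + ... + x^6)^12, the favourable dice rolls
die = [0] + [1] * 6
rolls = reduce(lambda p, q: polymul(p, q, 30), [die] * 12)
assert rolls[30] == comb(29, 11) - 12 * comb(23, 11) + 66 * comb(17, 11) - 220

print("all three coefficient computations agree")
```

Truncating every product at the target degree keeps the computation small while leaving the wanted coefficient exact.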
13.J.22. Solve the recurrence a_{n+2} = \sqrt{a_{n+1} a_n} with the initial conditions a_0 = 2, a_1 = 8. Hint: create a new sequence b_n = \log_2 a_n. ○

13.J.23. Solve the recurrence given by a_{n+1} = \sum_{k\ge 0} \binom{n}{k} a_k. Hint: multiply both sides by \frac{x^n}{n!} and sum over all n. ○

... 3|V| - 6 < |E|, i.e., 24 < 35.

13.B.15. i) Yes. This follows immediately from the Kuratowski theorem (K_5 has 10 edges and K_{3,3} has 9). ii) No. Consider K_5 or K_{3,3}. iii) No. There are many counterexamples, for instance K_{3,3} with another vertex and an edge leading to it. iv) No. Consider K_5. v) No. Consider K_{3,3}. vi) The same as (ii). vii) No. Consider C_n. viii) No. Consider K_5. ix) No. Consider C_n.

13.B.17. The first code does not represent a tree (it has a proper prefix with the same number of zeros and ones). There is a tree corresponding to the second code.

13.C.4. The procedure is incorrect. As a counterexample, consider a cycle graph with one edge of length two and all other edges of length one.

13.C.5. Applying any algorithm for finding a minimum spanning tree, we find out that the wanted length is 12154 (the spanning tree consists of the edges LPe, LP, LNY, PeT, MCNY).

13.D.5. We find a maximum flow of size 15 and the cut [1,6], [1,3], [2,4], [2,3] of the same capacity.

13.D.7. We know from the theory and the result of the above exercise that the minimum capacity of a cut is 9. There are more maximum flows in the network. For instance, we can set f(a) = 2, f(b) = 4, f(c) = 1, f(h) = 1, f(j) = 4, f(f) = 2, f(i) = 7, and f(v) = 0 for all other edges v of the graph.

13.E.11. \frac{4! \cdot 4!}{7!} = \frac{4}{35}.

13.E.12. |f.

13.I.4. i) We know from the exercise of subsection 13.4.3 that the generating function of the sequence (1, 2, 3, 4, ...) is \frac{1}{(1-x)^2}.
ii) Since we have (by the previous exercise as well) that \frac{x}{(1-x)^2} is the o.g.f. of (0, 1, 2, 3, ...), we have, for the derivative of this function,

\left(\frac{x}{(1-x)^2}\right)' = \frac{1+x}{(1-x)^3}, the o.g.f. of (1 \cdot 1, 2 \cdot 2, 3 \cdot 3, ...).

Let us emphasize that this problem could also be solved using the fact that \frac{1}{(1-x)^3} is the o.g.f. of the sequence \binom{n+2}{2}.
iii) We have: \frac{1}{1-x} is the o.g.f. of (1, 1, 1, 1, ...), and \frac{1}{1-2x} is the o.g.f. of
(1, 2, 4, 8, ...). Inserting zeros, \frac{1}{1-2x^2} is the o.g.f. of (1, 0, 2, 0, 4, 0, ...), and \frac{x}{1-2x^2} is the o.g.f. of (0, 1, 0, 2, 0, 4, ...), whence we get the result: \frac{1+x}{1-2x^2} is the o.g.f. of (1, 1, 2, 2, 4, 4, 8, 8, ...).
iv) We know from the above that f(x) = \frac{1+x}{(1-x)^3} is the o.g.f. of (1^2, 2^2, 3^2, ...), hence \frac{f(x) - (1 + 4x)}{x^2} is the o.g.f. of (3^2, 4^2, 5^2, ...). Substituting 2x^3 for x, we obtain \frac{f(2x^3) - (1 + 8x^3)}{4x^6}, the o.g.f. of (9, 0, 0, 2 \cdot 16, 0, 0, 4 \cdot 25, ...).
v) If we denote the result of the previous problem as F(x), then the result of this one is F(x) - x^2 F(x) + ....

13.I.11. \frac{x}{1 - 3x + x^2}.

13.J.1. i) The complete graph on n vertices has \frac{n(n-1)}{2} edges, while the cycle graph on n vertices has n edges. Therefore, \frac{n(n-1)}{2} - n edges must be added to the cycle graph. ii) Similarly as above, we get the result \frac{(m+n)(m+n-1)}{2} - m \cdot n.

13.J.2. There are five edges whose value is 3: four of them lie on the cycle 23562 and the remaining one is the edge 14. Therefore, they form a disconnected subgraph of the complete graph, so the spanning tree must contain at least one edge of value 2. Thus, the total weight of a maximum spanning tree is at most 4 \cdot 3 + 2 = 14. And indeed, there exist spanning trees with this weight. We select all the edges of value 3 except for one that lies on the mentioned cycle and connect the resulting components 2356 and 14 with any edge of value 2. There are four such edges. Altogether, there are 4 \cdot 4 = 16 maximum spanning trees.

13.J.3. The edges of value 1 form a subgraph with two connected components, namely {1, 2, 4, 5, 7} and {3, 6}. Further, there are six edges of value 2 that lead between these two components. Therefore, the total weight of a minimum spanning tree is 5 \cdot 1 + 2 = 7. Moreover, there are exactly three cycles in the former component, each of length 4, and each of the 6 edges of this component belongs to exactly two of the three cycles. In order to obtain a tree from this component, we must omit two edges, which can be done in 6 \cdot 4/2 ways. Altogether, we get 12 \cdot 6 = 72 minimum spanning trees.

13.J.4.
18.

13.J.5. 12.

13.J.6. 16.

13.J.7. 16.

13.J.11. The minimum cut is given by the set {Z, A, E}. Its value is 32.

13.J.12. The minimum cut is given by the set {B, D, S}. Its value is 40.

13.J.13. The minimum cut is given by the set {F, S, D}. Its value is 29.

13.J.14. The minimum cut is given by the set {F, S}. Its value is 39.

Index

LU-decomposition, 172 ε-neighborhood, 264 ε-net, 457 k-times (continuously) differentiable, 481 k-combination, 13 k-edge-connected, 634 A-linear, 107 k-permutation, 13 k-permutations, 15 k-vertex-connected, 633 (Riemann) measurable, 373 Taylor expansion with a remainder, 345 characteristic polynomial of the mapping, 111 commutative group, 5 commutative rings, 5 degree of nilpotency, 163 factor vector space, 166 generated, 204 isomorphism, 99 linear combination, 93 linearly dependent, 93 linearly independent, 93 matrix for changing the basis, 102 matrix of the mapping f, 101 normal, 161 normal matrices, 162 parallelepiped, 209 root subspace, 165 scalars, 6 Sylvester criterion, 228 angle between vector subspaces, 215 A binary relation, 37 a critical point of order k, 349 a Laurent series centered at x_0, 383 a representative, 40 Abel's theorem, 383 absolute value, 260 absolutely convergent, 292 absorbing, 152 addition inverse, 74 adjacency list, 629 adjacency matrix, 629 adjacent, 622 adjoint mapping, 159 adjoint matrix, 159 admissible vector x, 136 affine combinations of points, 207 affine coordinate system, 204 affine coordinates, 26 affine frame, 204 affine hull, 205 affine map, 211 affine mappings of the plane, 29 affine subspace, 204 algebra, 249 algebraic complement, 89 Algebraic multiplicity, 115, 163 algebraically adjoint matrix, 91 algorithm ElGamal, 614 alternating series, 294 an asymptote with a slope, 351 an inflective point, 350 analytic formula, 223 analytic function, 347 angle on M, 518 integral operators, 438 interior, 447 interior point, 264 interior point of a subset, 456 interpolation polynomial, 251 interval
[x_0, x_1], 257 invariant subspace, 112 inverse, 77 inverse Fourier transform, 439 inverse function, 282 inverse matrix to the rotation matrix, 32 inverse relation, 39 inversion in permutation σ, 85 invertible matrix, 77 isolated point, 265, 456 isometry, 453 isomorphism, 625 Jacobi symbol, 600 Jacobi theorem, 228 Jacobian matrix of the mapping, 488 Jarník's algorithm, 652 Jordan blocks, 164 Jordan curve theorem, 645 Jordan decomposition, 164 Jordan measure, 374 k-combinations, 15 k-combinations with repetitions, 15 kernel of linear mapping, 99 kernel of the integral operator l, 438 Kronecker delta, 75 Kruskal's algorithm, 651 Kuratowski theorem, 646 l'Hospital's rule, 285 Lagrange algorithm, 226 Lagrange interpolation polynomial, 253 Lagrange's mean value theorem, 284 Laplace expansion, 90 Laplace transform, 443 law of cosines, 215 law of inertia, 227 law of quadratic reciprocity, 597, 598 leading principal minors, 89 leading principal submatrices, 89 leaf, 640 least common multiple, 567 left-sided limit, 268 Legendre polynomials, 418 Legendre symbol, 595 Leibniz criterion, 294 Leibniz rule, 280 length of a curve, 374 Leslie model for population growth, 146 level sets, 498 limes superior, 294 limit, 261, 267 limit point, 456 limit point of the set A, 263 limit points of a subset A ⊂ X, 447 line segment, 208 Linear algebra, 24 linear approximation, 255 linear combinations, 82 linear combinations of vectors, 24 linear difference equation of first order, 9 linear form η on U, 514 linear forms, 103 linear functionals, 435 linear mapping, 28 linear mapping (homomorphism), 99 linear programming problem, 134 linear restrictions, 134 linearly dependent, 82 linearly independent, 82 Lipschitz continuous, 459 Lipschitz continuity, 493 local parametrization of the manifold, 515 locally finite cover by parametrizations, 518 logarithmic function with base a, 276 logarithmic order of magnitude, 547 loop, 622 low pass filter, 428 lower bound, 259 lower Riemann sum, 366 Lucas's
test, 609 Möbius function, 579 Möbius inversion formula, 579 Malthusian population growth, 9 mapping, 7 mapping from a set A to the set B, 37 Markov chain, 151 Markov process, 151 mathematical analysis, 249 mathematical induction, 10, 14 matrices, 28 maximum, 485 mean value, 374 member of the determinant, 84 Menger's theorem, 634 Mersenne primes, 573 method of Lagrange multipliers, 501 metric, 445 metric on the graph, 635 metric space, 445 minimum, 485 minimum excluded value, 661 minimum spanning tree, 651 Minkowski inequality, 449 minor, 89 minor complement, 89 modules over rings, 73 Monte Carlo methods, 24 morphism, 625 multidimensional interval, 503 multiple, 566 multiplicative function, 580 mutually perpendicular, 161 natural logarithm, 276 natural spline, 257 negative definite, 228 negative semidefinite, 228 negatively definite, 487 negatively semidefinite, 487 neighborhood of a point, 264 Newton integral, 359 nilpotent, 163 Nim, 658 non-homogeneous linear difference equations, 143 Norm, 148 norm, 155, 445 norm of the partition, 365 normal space, 500 normal vector, 498 normalised, 104, 153 normalized vectors, 31 normed vector space, 445 nowhere dense, 451 number π, 297 number of solutions of a congruence, 588 objective function, 134 odd, 85 One-sided derivatives, 277 one-to-one, 38 onto, 37 open, 446 open ε-neighbourhood, 446 open cover, 265, 456 open intervals, 264 open set, 264 order of a modulo m, 582 order of an integer modulo m, 582 order of magnitude, 547 ordered field, 259 ordered trees, 643 ordering, 39 orientation, 36, 218 orientation of the manifold, 517 oriented euclidean (point) space, 218 oriented manifold with boundary, 522 oriented manifolds, 518 oriented vector space, 218 origin of the affine coordinate system, 204 orthogonal, 104, 153 orthogonal basis, 104 orthogonal complement, 105, 154 orthogonal group, 157 orthogonal mapping, 112 orthogonal matrices, 113, 157 orthogonal mother wavelet, 427 orthogonal system of functions, 419
orthogonally diagonalisable, 161 orthonormal basis, 153 orthonormal system of functions, 419 orthonormalised basis, 104 oscillates, 291 osculating circle, 353 outdegree deg_ v, 627 outer product, 220 outgoing, 622 pairwise coprime, 570 parametric description, 26, 205 parametrized by the length, 356 parent, 642 Parity of permutation, 85 Parseval equality, 156 Parseval's theorem, 420 partial derivatives of order k, 481 partial derivatives of the function f, 476 partial sums, 291 particular solution, 133 partition, 40 Pascal triangle, 15 path, 623, 625 path graph of length n, 623 path of length n, 625 perfect numbers, 573 periodic, 298 periodic function, 422 permutation, 12 permutation of the set X, 84 permutation with repetitions, 15 perpendicular, 31, 104, 161 perpendicular projection, 105 Perron-Frobenius theory, 147 Petersen graph, 624 phase frequency, 425 Picard's approximation, 532 planar graph, 645 plane trees, 643 Pocklington-Lehmer, 609 points, 203 polar basis, 226 polynomial order of magnitude, 547 polynomials, 8 positive definite, 227 positive direction, 32 positive matrix, 147 positive semidefinite, 228 positively definite, 163, 487 positively semidefinite, 163, 487 power function x^a, 275 power mean with exponent r, 288 power residue, 594 power series, 295 power series centered at x_0, 300 predecessor, 642 preimage, 38 primality witness, 609 prime, 570 Primitive matrix, 147 primitive root, 584 principal matrices, 89 principal minors, 89 principle of inclusion-exclusion, 20 private key, 612, 614 probability function, 17 projection, 105 projective maps, 231 Projective plane P_2, 229 projective quadric, 235 projective transformations, 231 projectivization of a vector space, 230 proper, 267, 277 proper rational functions, 363 pseudoinverse matrix, 175 pseudoprime, 605 public key, 612, 614 pullback of the form η by φ, 515 QR decomposition, 175 quadratic forms, 223 quadrics, 223 Rabin cryptosystem, 613 radius of convergence, 295 range, 7 rank of the
matrix, 82 rank of the quadratic form, 224 ratio of points, 212 rational functions, 274 rays, 208 real-valued functions of a real variable, 249 recurrence relation, 9 reduced residue system, 581 reflection through a line, 33 reflexive, 39 regular collineations, 231 regular square matrix, 77 residual capacity, 655 Riccati equation, 529 Riemann integral, 365 Riemann measurable, 504 Riemann measure of the set, 373 Riemann sum, 365 Riemann-Stieltjes integral sum, 386 right-continuous or left-continuous, 272 right-sided limit, 268 Rolle's theorem, 284 root of the polynomial, 251 root of the tree, 641 root vector, 165 rooted trees, 641 rotation or curl of the vector field, 524 rows of the matrix, 74 RSA, 612 Sarrus rule, 84 sample space, 17 scalar functions, 7 Scalar product, 104 scalar product, 31, 74, 153 scale, 36 second-order partial derivatives, 481 self-adjoint, 160 self-adjoint matrices, 160 semiaxes, 225 semipath, 655 separated variables, 370 sequence a_n converges to a, 268 sequentially continuous, 369 series of functions, 295 set of solutions, 588 shift of the plane, 25 signature of a quadratic form, 227 simplex, 208, 209 Simpson's rule, 386 sine Fourier series, 427 singular values of the matrix, 173 sink, 654 size of a flow, 654 size of the vector, 153 smooth, 342 solution, 525 solvable, 136 source, 654 spanning subgraph, 626 spanning tree, 649 spectral radius of matrix A, 148 Spectrum of linear mapping, 115 spectrum of linear mapping, 163 Sprague-Grundy function, 661 Sprague-Grundy theorem, 662 square matrix, 75 square wave function, 425 Standard affine space A_n, 202 standard basis of K^n, 98 standard maximalisation problem, 134 standard minimisation problem, 134 standard unitary space, 153 stationary point of the function, 485 stationary points, 344, 501 Steinitz exchange theorem, 97 Steinitz's theorem, 648 stochastic matrices, 152 stochastically independent, 20, 21 strategy, 658 strict extremum, 485 subdeterminant, 89 subgraphs, 626
submatrix of the matrix A, 89 subspaces, 208 successor, 642 sum of impartial games, 661 sum of subspaces, 95 supremum, 259 surface and volume of a solid of revolution, 376 surjective, 37 symmetric, 39, 86 symmetric bilinear form, 108 symmetric mappings, 160 symmetric matrices, 160 symmetrization, 622 tail of the edge, 622 tangent hyperplane, 480 tangent line, 255 tangent line to the curve c, 476 tangent plane, 480 tangent space, 500 tangent space TU, 514 tangent vector, 476, 513 Taylor expansion with a remainder, 345 Taylor polynomial of k-th degree, 345 the backward difference, 358 the central difference, 358 the class of functions C^k(A), 342 the curvature of the curve, 356 the differential of function f, 352 the differentiation of the second order, 358 The domain of the relation, 37 The Euclidean plane, 30 the existence of a neutral element, 5 the existence of a unit element, 5 the existence of an inverse element, 5 the forward difference, 358 the Frenet frame, 356 The fundamental theorem of algebra, 343 the graph of a function, 38 the indefinite integral, 358 the integral mean value theorem, 369 the lower Riemann integral, 369 the main normal, 356 the primitive function, 358 the second derivative, 342 the sources and sinks of the network, 653 the uniform continuity, 369 the upper Riemann integral, 369 the Weierstrass test, 382 topology, 264, 447 topology of the complex plane, 264 topology of the metric spaces, 447 topology of the real line, 264 torsion of the curve, 356 totally bounded, 457 Trace of mapping, 111 trace of matrix, 111 trail, 625 transformation, 489 transient, 152 transitive, 39 translation, 25, 203 transpose, 86 transposition, 84 trapezoidal rule, 385 tree, 640 triangle, 208, 623 triangle inequality, 155 trigonometric functions, 297 unbounded, 264 undirected graph, 622 uniform continuity, 368 uniformly bounded, 460 uniformly Cauchy, 380 unit decomposition subordinate to a locally finite cover, 519 unit matrix, 32, 75 unitary group, 157
Unitary isomorphism, 154 unitary mapping, 154 unitary matrices, 157 Unitary space, 153 universal formula, 667 unsaturated, 655 upper bound, 259 upper Riemann sum, 366 Vandermonde determinant, 252 variation, 387 vector, 72 vector field, 541 vector field X, 513 vector field X along the curve M, 513 vector functions, 354 vector functions of one real variable, 354 vector of restrictions, 134 Vector space, 92 vector subspace, 94 vectors, 24 vertices, 622 walk, 625 walk of length n, 625 wavelet mother function, 427 weak connectedness, 633 weakly connected, 639 weight, 636 zero curvature, 352 zero matrix, 74 zero measure, 388 zero vector, 24

Based on the earlier textbook: Matematika drsně a svižně, Jan Slovák, Martin Panák, Michal Bulant and collective, published by Masarykova univerzita in 2013. 1st edition, 2013, 500 copies. Typography, LaTeX and more: Tomáš Janoušek. Print: Tiskárna Knopp, Černčice 24, 549 01 Nové Město nad Metují