Brief Introduction to Differential Geometry and General Relativity for Cosmologists

Abstract: This is review material on differential geometry and general relativity for students of the lecture course Cosmology.

Contents

1. Definition of basic concepts
2. Manifolds
3. Curvature
4. Gravitation

1. Definition of basic concepts

This material is devoted to the introduction of the basic ideas and concepts that are needed for an understanding of Cosmology. We begin with the concept of a vector in general relativity. We assign to each point p in space-time the set of all possible vectors located at that point; this set is known as the tangent space at p, or $T_p$. We have to emphasize that these vectors are located at a single point. (Mathematically, the set of all the tangent spaces of a manifold M is called the tangent bundle, $T(M)$.)

[Figure: the tangent space $T_p$ attached to a point p of the manifold M.]

In other words we think of $T_p$ as an abstract vector space for each point in space-time. A (real) vector space is a collection of objects ("vectors") which, roughly speaking, can be added together and multiplied by real numbers in a linear way. Thus, for any two vectors V and W and real numbers a and b, we have

$$(a+b)(V+W) = aV + bV + aW + bW\ . \qquad (1.1)$$

Every vector space has an origin, i.e. a zero vector which functions as an identity element under vector addition. It is important to stress that a vector is a well-defined geometric object, as is a vector field, defined as a set of vectors with exactly one at each point in space-time. However, in most physical applications it is useful to decompose vectors into components with respect to some set of basis vectors. Recall that a basis is any set of vectors which both spans the vector space (any vector is a linear combination of basis vectors) and is linearly independent (no vector in the basis is a linear combination of the other basis vectors). For any given vector space there will be an infinite number of legitimate bases, but each basis will consist of the same number of vectors, known as the dimension of the space. Let us presume that at each tangent space we set up a basis of four vectors $e_\mu$, with $\mu \in \{0, 1, 2, 3\}$ as usual. Then any abstract vector A can be written as a linear combination of basis vectors:

$$A = A^\mu e_\mu\ . \qquad (1.2)$$

The coefficients $A^\mu$ are the components of the vector A. The vector itself is an abstract geometrical entity, while the components are just the coefficients of the basis vectors in some convenient basis. A standard example of a vector in space-time is the tangent vector to a curve. A parameterized curve or path through space-time is specified by the coordinates as a function of the parameter, $x^\mu(\lambda)$. The tangent vector $V(\lambda)$ has components

$$V^\mu = \frac{dx^\mu}{d\lambda}\ . \qquad (1.3)$$

Then we denote the entire vector as $V = V^\mu e_\mu$.

Once we have set up the tangent space we can define an associated vector space known as the dual vector space. The dual space is usually denoted by an asterisk, so that the dual space to the tangent space $T_p$ is called the cotangent space and denoted $T_p^*$. The dual space is the space of all linear maps from the original vector space to the real numbers. In other words, if $\omega \in T_p^*$ is a dual vector, then it acts as a map $\omega: T_p \to \mathbb{R}$ such that

$$\omega(aV + bW) = a\,\omega(V) + b\,\omega(W) \in \mathbb{R}\ , \qquad (1.4)$$

where V, W are vectors and a, b are real numbers. These maps form a vector space themselves; thus, if ω and η are dual vectors, we have

$$(a\omega + b\eta)(V) = a\,\omega(V) + b\,\eta(V)\ . \qquad (1.5)$$
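As a quick computational illustration of the tangent vector (1.3) (not part of the original derivation), here is a minimal sympy sketch; the curve used is an invented example:

```python
import sympy as sp

# Tangent vector to a parameterized curve x^mu(lambda), as in (1.3).
# The curve chosen here is purely illustrative.
lam = sp.symbols('lambda')
x = [lam, sp.cos(lam), sp.sin(lam), 0]     # x^mu(lambda), mu = 0..3

V = [sp.diff(comp, lam) for comp in x]     # V^mu = dx^mu / dlambda
print(V)   # [1, -sin(lambda), cos(lambda), 0]
```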
To proceed we introduce a set of basis dual vectors $\theta^\nu$ by demanding

$$\theta^\nu(e_\mu) = \delta^\nu_\mu\ . \qquad (1.6)$$

Then every dual vector can be written in terms of its components, which we label with lower indices:

$$\omega = \omega_\mu \theta^\mu\ . \qquad (1.7)$$

In an alternative terminology, elements of $T_p$ (what we have called vectors) are referred to as contravariant vectors, and elements of $T_p^*$ (what we have called dual vectors) are referred to as covariant vectors. Another name for dual vectors is one-forms, a somewhat mysterious designation which will become clearer soon.

The component notation leads to a simple way of writing the action of a dual vector on a vector:

$$\omega(V) = \omega_\mu V^\nu\,\theta^\mu(e_\nu) = \omega_\mu V^\nu \delta^\mu_\nu = \omega_\mu V^\mu \in \mathbb{R}\ . \qquad (1.8)$$

The form of (1.8) also suggests that we can think of vectors as linear maps on dual vectors, by defining

$$V(\omega) \equiv \omega(V) = \omega_\mu V^\mu\ . \qquad (1.9)$$

Therefore, the dual space to the dual vector space is the original vector space itself. (The set of all cotangent spaces over M is the cotangent bundle, $T^*(M)$.) The action of a dual vector field on a vector field is then not a single number, but a scalar (or just "function") on space-time. In space-time the simplest example of a dual vector is the gradient of a scalar function, the set of partial derivatives with respect to the space-time coordinates, which we denote by "d":

$$\mathrm{d}\phi = \frac{\partial\phi}{\partial x^\mu}\,\theta^\mu\ . \qquad (1.10)$$

A straightforward generalization of vectors and dual vectors is the notion of a tensor. Just as a dual vector is a linear map from vectors to $\mathbb{R}$, a tensor T of type (or rank) (k, l) is a multilinear map from a collection of dual vectors and vectors to $\mathbb{R}$:

$$T: \underbrace{T_p^* \times \cdots \times T_p^*}_{k\ \text{times}} \times \underbrace{T_p \times \cdots \times T_p}_{l\ \text{times}} \to \mathbb{R}\ . \qquad (1.11)$$

Here "×" denotes the Cartesian product, so that for example $T_p \times T_p$ is the space of ordered pairs of vectors. Multilinearity means that the tensor acts linearly in each of its arguments; for instance, for a tensor of type (1, 1), we have

$$T(a\omega + b\eta,\, cV + dW) = ac\,T(\omega, V) + ad\,T(\omega, W) + bc\,T(\eta, V) + bd\,T(\eta, W)\ . \qquad (1.12)$$

From this point of view, a scalar is a type (0, 0) tensor, a vector is a type (1, 0) tensor, and a dual vector is a type (0, 1) tensor.

The space of all tensors of a fixed type (k, l) forms a vector space; they can be added together and multiplied by real numbers. In order to construct a basis for this space, we need to define a new operation known as the tensor product, denoted by ⊗. If T is a (k, l) tensor and S is an (m, n) tensor, we define a (k + m, l + n) tensor T ⊗ S by

$$T \otimes S(\omega^{(1)}, \ldots, \omega^{(k)}, \ldots, \omega^{(k+m)}, V^{(1)}, \ldots, V^{(l)}, \ldots, V^{(l+n)})$$
$$= T(\omega^{(1)}, \ldots, \omega^{(k)}, V^{(1)}, \ldots, V^{(l)})\, S(\omega^{(k+1)}, \ldots, \omega^{(k+m)}, V^{(l+1)}, \ldots, V^{(l+n)})\ . \qquad (1.13)$$

In other words, first act T on the appropriate set of dual vectors and vectors, then act S on the remainder, and then multiply the answers. Note that, in general, $T \otimes S \neq S \otimes T$. Using these rules it is straightforward to construct a basis for the space of all (k, l) tensors: we simply take tensor products of basis vectors and dual vectors. This basis consists of all tensors of the form

$$e_{\mu_1} \otimes \cdots \otimes e_{\mu_k} \otimes \theta^{\nu_1} \otimes \cdots \otimes \theta^{\nu_l}\ . \qquad (1.14)$$

In component notation we then write an arbitrary tensor as

$$T = T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}\; e_{\mu_1} \otimes \cdots \otimes e_{\mu_k} \otimes \theta^{\nu_1} \otimes \cdots \otimes \theta^{\nu_l}\ . \qquad (1.15)$$

The order of the indices is obviously important, since the tensor need not act in the same way on its various arguments.
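The component formulas above are easy to experiment with numerically. The following sketch, with invented component values, evaluates the contraction (1.8) and a simple tensor product of two vectors as in (1.13):

```python
import numpy as np

# Index gymnastics in components, with illustrative numerical values.
omega = np.array([0.5, 0.0, 3.0, 2.0])   # omega_mu
V = np.array([1.0, 2.0, 0.0, -1.0])      # V^mu
W = np.array([0.0, 1.0, 1.0, 0.0])       # W^mu

print(np.einsum('m,m->', omega, V))      # omega_mu V^mu, a real number (1.8)
T = np.einsum('m,n->mn', V, W)           # (V ⊗ W)^{mu nu}, a (2,0) tensor
print(np.allclose(T, np.outer(V, W)))    # the tensor product of two vectors
                                         # is just the outer product
```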
Now let's turn to some examples of tensors. The most familiar example of a (0, 2) tensor in flat Minkowski space-time is the metric, $\eta_{\mu\nu}$. The action of the metric on two vectors is so useful that it gets its own name, the inner product (or dot product):

$$\eta(V, W) = \eta_{\mu\nu} V^\mu W^\nu = V \cdot W\ . \qquad (1.16)$$

The norm of a vector is defined to be the inner product of the vector with itself; unlike in Euclidean space, this number is not positive definite. When $\eta_{\mu\nu} V^\mu V^\nu < 0$ we call $V^\mu$ time-like; for $\eta_{\mu\nu} V^\mu V^\nu = 0$, $V^\mu$ is null or light-like; and for $\eta_{\mu\nu} V^\mu V^\nu > 0$ we call $V^\mu$ space-like.

Another tensor is the Kronecker delta $\delta^\mu_\nu$, of type (1, 1), whose components you already know. Related to this and the metric is the inverse metric $\eta^{\mu\nu}$, a type (2, 0) tensor defined as the inverse of the metric:

$$\eta^{\mu\nu}\eta_{\nu\rho} = \eta_{\rho\nu}\eta^{\nu\mu} = \delta^\mu_\rho\ . \qquad (1.17)$$

There is also the Levi-Civita tensor, a (0, 4) tensor:

$$\epsilon_{\mu\nu\rho\sigma} = \begin{cases} +1 & \text{if } \mu\nu\rho\sigma \text{ is an even permutation of } 0123\ , \\ -1 & \text{if } \mu\nu\rho\sigma \text{ is an odd permutation of } 0123\ , \\ 0 & \text{otherwise}\ . \end{cases} \qquad (1.18)$$

Here, a "permutation of 0123" is an ordering of the numbers 0, 1, 2, 3 which can be obtained by starting with 0123 and exchanging pairs of digits; an even permutation is obtained by an even number of such exchanges, and an odd permutation by an odd number. Thus, for example, $\epsilon_{0321} = -1$.

With some examples in hand we can now be a little more systematic about some properties of tensors. First consider the operation of contraction, which turns a (k, l) tensor into a (k − 1, l − 1) tensor. Contraction is defined as the sum over one upper and one lower index:

$$S^{\mu\rho}{}_\sigma = T^{\mu\nu\rho}{}_{\sigma\nu}\ . \qquad (1.19)$$

It is important to stress that we can only contract an upper index with a lower index (as opposed to two indices of the same type). It is also important that the order of the indices matters, so that you can get different tensors by contracting in different ways; thus,

$$T^{\mu\nu\rho}{}_{\sigma\nu} \neq T^{\mu\rho\nu}{}_{\sigma\nu} \qquad (1.20)$$

in general.

The metric and inverse metric can be used to raise and lower indices on tensors. That is, given a tensor $T^{\alpha\beta}{}_{\gamma\delta}$, we can use the metric to define new tensors which we choose to denote by the same letter T:

$$T^{\alpha\beta\mu}{}_\delta = \eta^{\mu\gamma}\, T^{\alpha\beta}{}_{\gamma\delta}\ , \qquad T_\mu{}^\beta{}_{\gamma\delta} = \eta_{\mu\alpha}\, T^{\alpha\beta}{}_{\gamma\delta}\ , \qquad T_{\mu\nu}{}^{\rho\sigma} = \eta_{\mu\alpha}\eta_{\nu\beta}\eta^{\rho\gamma}\eta^{\sigma\delta}\, T^{\alpha\beta}{}_{\gamma\delta}\ . \qquad (1.21)$$

Again, it is important that summing does not change the position of an index relative to the other indices, and also that "free" indices (which are not summed over) must be the same on both sides of an equation, while "dummy" indices (which are summed over) only appear on one side. As an example, we can turn vectors and dual vectors into each other by raising and lowering indices:

$$V_\mu = \eta_{\mu\nu} V^\nu\ , \qquad \omega^\mu = \eta^{\mu\nu}\omega_\nu\ . \qquad (1.22)$$

Further, we refer to a tensor as symmetric in any of its indices if it is unchanged under exchange of those indices. Thus, if

$$S_{\mu\nu\rho} = S_{\nu\mu\rho}\ , \qquad (1.23)$$

we say that $S_{\mu\nu\rho}$ is symmetric in its first two indices, while if

$$S_{\mu\nu\rho} = S_{\mu\rho\nu} = S_{\rho\mu\nu} = S_{\nu\mu\rho} = S_{\nu\rho\mu} = S_{\rho\nu\mu}\ , \qquad (1.24)$$

we say that $S_{\mu\nu\rho}$ is symmetric in all three of its indices. Similarly, a tensor is antisymmetric (or "skew-symmetric") in any of its indices if it changes sign when those indices are exchanged; thus,

$$A_{\mu\nu\rho} = -A_{\rho\nu\mu} \qquad (1.25)$$

means that $A_{\mu\nu\rho}$ is antisymmetric in its first and third indices (or just "antisymmetric in µ and ρ"). If a tensor is (anti-)symmetric in all of its indices, we refer to it as simply (anti-)symmetric (sometimes with the redundant modifier "completely"). As examples, the metric $\eta_{\mu\nu}$ and the inverse metric $\eta^{\mu\nu}$ are symmetric, while the Levi-Civita tensor $\epsilon_{\mu\nu\rho\sigma}$ and the electromagnetic field strength tensor $F_{\mu\nu}$ are antisymmetric.
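Several of these operations fit into a short numerical sketch (component values invented for illustration): lowering an index with $\eta_{\mu\nu}$ as in (1.22), and classifying a vector by the sign of its norm:

```python
import numpy as np

# Raising/lowering with the Minkowski metric (signature -+++) and
# classifying a vector by the sign of eta_{mu nu} V^mu V^nu.
eta = np.diag([-1.0, 1.0, 1.0, 1.0])      # eta_{mu nu}

def classify(V):
    norm = np.einsum('mn,m,n->', eta, V, V)
    return 'timelike' if norm < 0 else ('null' if norm == 0 else 'spacelike')

V = np.array([2.0, 1.0, 0.0, 0.0])        # an illustrative vector
V_lower = eta @ V                          # V_mu = eta_{mu nu} V^nu, eq. (1.22)
print(V_lower, classify(V))                # [-2.  1.  0.  0.] timelike
```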
Given any tensor, we can symmetrize (or antisymmetrize) any number of its upper or lower indices. Symmetrization is defined as the sum over all permutations of the relevant indices, divided by the number of terms:

$$T_{(\mu_1\mu_2\cdots\mu_n)\rho}{}^\sigma = \frac{1}{n!}\left(T_{\mu_1\mu_2\cdots\mu_n\rho}{}^\sigma + \text{sum over permutations of indices } \mu_1\cdots\mu_n\right)\ , \qquad (1.26)$$

while antisymmetrization comes from the alternating sum:

$$T_{[\mu_1\mu_2\cdots\mu_n]\rho}{}^\sigma = \frac{1}{n!}\left(T_{\mu_1\mu_2\cdots\mu_n\rho}{}^\sigma + \text{alternating sum over permutations of indices } \mu_1\cdots\mu_n\right)\ , \qquad (1.27)$$

where by "alternating sum" we mean that permutations which are the result of an odd number of exchanges are given a minus sign; thus

$$T_{[\mu\nu\rho]\sigma} = \frac{1}{6}\left(T_{\mu\nu\rho\sigma} - T_{\mu\rho\nu\sigma} + T_{\rho\mu\nu\sigma} - T_{\nu\mu\rho\sigma} + T_{\nu\rho\mu\sigma} - T_{\rho\nu\mu\sigma}\right)\ . \qquad (1.28)$$

Notice that round/square brackets denote symmetrization/antisymmetrization.

There is a special class of tensors that play an important role in physics. These tensors are known as differential forms (or just "forms"). A differential p-form is a (0, p) tensor which is completely antisymmetric. Thus, scalars are automatically 0-forms, and dual vectors are automatically one-forms. We also have the 2-form $F_{\mu\nu}$ and the 4-form $\epsilon_{\mu\nu\rho\sigma}$. The space of all p-forms is denoted $\Lambda^p$, and the space of all p-form fields over a manifold M is denoted $\Lambda^p(M)$. The number of linearly independent p-forms on an n-dimensional vector space is $n!/(p!(n-p)!)$. So at a point on a 4-dimensional space-time there is one linearly independent 0-form, four 1-forms, six 2-forms, four 3-forms, and one 4-form. There are no p-forms for p > n, since all of the components will automatically be zero by antisymmetry.

Given a p-form A and a q-form B, we can form a (p + q)-form known as the wedge product A ∧ B by taking the antisymmetrized tensor product:

$$(A \wedge B)_{\mu_1\cdots\mu_{p+q}} = \frac{(p+q)!}{p!\,q!}\, A_{[\mu_1\cdots\mu_p} B_{\mu_{p+1}\cdots\mu_{p+q}]}\ . \qquad (1.29)$$

For example the wedge product of two 1-forms is

$$(A \wedge B)_{\mu\nu} = 2A_{[\mu}B_{\nu]} = A_\mu B_\nu - A_\nu B_\mu\ . \qquad (1.30)$$

Using this definition we obtain an important relation,

$$A \wedge B = (-1)^{pq}\, B \wedge A\ . \qquad (1.31)$$

The exterior derivative "d" allows us to differentiate p-form fields to obtain (p + 1)-form fields. It is defined as an appropriately normalized antisymmetrized partial derivative:

$$(\mathrm{d}A)_{\mu_1\cdots\mu_{p+1}} = (p+1)\,\partial_{[\mu_1} A_{\mu_2\cdots\mu_{p+1}]}\ . \qquad (1.32)$$

The reason why the exterior derivative deserves special attention is that it is a tensor, even in curved space-times, unlike its cousin the partial derivative. Another interesting fact about exterior differentiation is that, for any form A,

$$\mathrm{d}(\mathrm{d}A) = 0\ , \qquad (1.33)$$

which is often written $\mathrm{d}^2 = 0$. This identity is a consequence of the definition of d and the fact that partial derivatives commute, $\partial_\alpha\partial_\beta = \partial_\beta\partial_\alpha$ (acting on anything).

Let us now introduce another operation on differential forms known as Hodge duality. We define the "Hodge star operator" on an n-dimensional manifold as a map from p-forms to (n − p)-forms,

$$(*A)_{\mu_1\cdots\mu_{n-p}} = \frac{1}{p!}\,\epsilon^{\nu_1\cdots\nu_p}{}_{\mu_1\cdots\mu_{n-p}}\, A_{\nu_1\cdots\nu_p}\ , \qquad (1.34)$$

mapping A to "A dual". Unlike our other operations on forms, the Hodge dual depends on the metric of the manifold (which should be obvious, since we had to raise some indices on the Levi-Civita tensor in order to define (1.34)). Applying the Hodge star twice returns either plus or minus the original form:

$$**A = (-1)^{s+p(n-p)}\, A\ , \qquad (1.35)$$

where s is the number of minus signs among the eigenvalues of the metric (for Minkowski space, s = 1). It is important to stress that Hodge duality is only defined on a manifold equipped with a metric structure.
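As a small check of the form algebra, the following sketch builds the wedge product of two one-forms, (1.30), from invented components and confirms that the result is antisymmetric:

```python
import numpy as np

# Wedge product (1.30) of two one-forms with illustrative components:
# (A ^ B)_{mu nu} = A_mu B_nu - A_nu B_mu.
A = np.array([1.0, 0.0, 2.0, 0.0])
B = np.array([0.0, 1.0, 0.0, 3.0])

AB = np.einsum('m,n->mn', A, B) - np.einsum('n,m->mn', A, B)
print(np.allclose(AB, -AB.T))   # True: completely antisymmetric, as a 2-form must be
```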
Let us now give a brief review of physics in Minkowski space-time that will be useful in what follows. Consider the world-line of a single particle. This is specified by a map $\mathbb{R} \to M$, where M is the manifold representing space-time; we usually think of the path as a parameterized curve $x^\mu(\lambda)$. Note that the tangent vector to this path is $dx^\mu/d\lambda$ (which depends on the parameterization). The path is characterized by the norm of its tangent vector: if the tangent vector is timelike/null/spacelike at some parameter value λ, we say that the path is timelike/null/spacelike at that point.

[Figure: timelike, null, and spacelike directions in the (t, x) plane, together with the tangent vector $dx^\mu/d\lambda$ of a curve $x^\mu(\lambda)$.]

A fundamental object of the theory is the line element, or infinitesimal interval:

$$ds^2 = \eta_{\mu\nu}\, dx^\mu dx^\nu\ . \qquad (1.36)$$

From this definition it is tempting to take the square root and integrate along a path to obtain a finite interval. But since $ds^2$ need not be positive, we define different procedures for different cases. For space-like paths we define the path length

$$\Delta s = \int \sqrt{\eta_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}}\; d\lambda\ , \qquad (1.37)$$

where the integral is taken over the path. For null paths the interval is zero, so no extra formula is required. For time-like paths we define the proper time

$$\Delta\tau = \int \sqrt{-\eta_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}}\; d\lambda\ , \qquad (1.38)$$

which will be positive.

Let's move from the consideration of paths in general to the paths of massive particles (which will always be time-like). Since the proper time is measured by a clock traveling on a time-like world-line, it is convenient to use τ as the parameter along the path. That is, we use (1.38) to compute τ(λ), which (if λ is a good parameter in the first place) we can invert to obtain λ(τ), after which we can think of the path as $x^\mu(\tau)$. The tangent vector in this parameterization is known as the four-velocity, $U^\mu$:

$$U^\mu = \frac{dx^\mu}{d\tau}\ . \qquad (1.39)$$

Since $d\tau^2 = -\eta_{\mu\nu}\, dx^\mu dx^\nu$, as follows from the invariance of the line element, the four-velocity is automatically normalized:

$$\eta_{\mu\nu} U^\mu U^\nu = -1\ . \qquad (1.40)$$

(The norm will always be negative, since we are only defining it for time-like trajectories. You could define an analogous vector for space-like paths as well; null paths give some extra problems since the norm is zero.) In the rest frame of a particle, its four-velocity has components $U^\mu = (1, 0, 0, 0)$.
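Here is a numerical sketch of the proper-time integral (1.38) along a hypothetical timelike path; it illustrates that a moving clock accumulates less proper time than coordinate time:

```python
import numpy as np

# Proper time (1.38) for the invented timelike path
# x^0 = lambda, x^1 = 0.5*sin(lambda), x^2 = x^3 = 0 in Minkowski space.
lam = np.linspace(0.0, 10.0, 100001)
dxdlam = np.array([np.ones_like(lam), 0.5 * np.cos(lam),
                   np.zeros_like(lam), np.zeros_like(lam)])
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

integrand = np.sqrt(-np.einsum('mn,mi,ni->i', eta, dxdlam, dxdlam))
dtau = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(lam)  # trapezoid rule
print(dtau.sum())   # Delta tau < 10: less than the elapsed coordinate time
```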
A related vector is the energy-momentum four-vector, defined by

$$p^\mu = m U^\mu\ , \qquad (1.41)$$

where m is the mass of the particle. The mass is a fixed quantity independent of inertial frame: what you may be used to thinking of as the "rest mass." Although $p^\mu$ provides a complete description of the energy and momentum of a particle, for extended systems it is necessary to go further and define the energy-momentum tensor (sometimes called the stress-energy tensor), $T^{\mu\nu}$. This is a symmetric (2, 0) tensor which tells us all we need to know about the energy-like aspects of a system: energy density, pressure, stress, and so forth. A general definition of $T^{\mu\nu}$ is "the flux of four-momentum $p^\mu$ across a surface of constant $x^\nu$."

In more detail, consider a general form of matter that can be characterized as a fluid — a continuum of matter described by macroscopic quantities such as temperature, pressure, entropy, viscosity, etc. In general relativity essentially all interesting types of matter can be modeled as perfect fluids, from stars to electromagnetic fields to the entire universe. A perfect fluid can be defined as one with no heat conduction and no viscosity, or alternatively as a fluid which looks isotropic in its rest frame; these two viewpoints turn out to be equivalent. For our purposes we can think of a perfect fluid as one which may be completely characterized by its pressure and density.

The simplest example of a perfect fluid is dust. Dust is defined as a collection of particles at rest with respect to each other, or alternatively as a perfect fluid with zero pressure. Since by definition all particles have an equal velocity in any fixed inertial frame, we can imagine a "four-velocity field" $U^\mu(x)$ defined all over space-time. (Indeed, its components are the same at each point.) Define the number-flux four-vector to be

$$N^\mu = n U^\mu\ , \qquad (1.42)$$

where n is the number density of the particles as measured in their rest frame. Then $N^0$ is the number density of particles as measured in any other frame, while $N^i$ is the flux of particles in the $x^i$ direction. Now imagine that each of the particles has the same mass m. Then in the rest frame the energy density of the dust is given by

$$\rho = nm\ . \qquad (1.43)$$

It is important to stress that ρ only measures the energy density in the rest frame; we also want to know the energy density in other frames. To proceed, note that both n and m are 0-components of four-vectors in the rest frame: $N^\mu = (n, 0, 0, 0)$ and $p^\mu = (m, 0, 0, 0)$. Therefore ρ is the µ = 0, ν = 0 component of the tensor $p \otimes N$ as measured in the rest frame. It is then natural to define the energy-momentum tensor for dust as

$$T^{\mu\nu}_{\rm dust} = p^\mu N^\nu = nm\, U^\mu U^\nu = \rho\, U^\mu U^\nu\ , \qquad (1.44)$$

where ρ is defined as the energy density in the rest frame.

Let us now return to the definition of a "perfect" fluid. The natural definition is that it is matter which is "isotropic in its rest frame." This in turn means that $T^{\mu\nu}$ is diagonal — there is no net flux of any component of momentum in an orthogonal direction. Furthermore, due to the isotropy of the perfect fluid in its rest frame, the nonzero space-like components must all be equal, $T^{11} = T^{22} = T^{33}$. This implies that there are only two independent numbers, $T^{00}$ and one of the $T^{ii}$. It is convenient to call the first of these the energy density ρ, and the second the pressure p. The energy-momentum tensor of a perfect fluid therefore takes the following form in its rest frame:

$$T^{\mu\nu} = \begin{pmatrix} \rho & 0 & 0 & 0 \\ 0 & p & 0 & 0 \\ 0 & 0 & p & 0 \\ 0 & 0 & 0 & p \end{pmatrix}\ . \qquad (1.45)$$

We would like, of course, a formula which is good in any frame; in other words we have to write it in a covariant manner. For dust we had $T^{\mu\nu} = \rho U^\mu U^\nu$, so we might begin by guessing $(\rho + p) U^\mu U^\nu$, which gives, using the fact that in the rest frame $U^\mu = (1, 0, 0, 0)$,

$$\begin{pmatrix} \rho + p & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}\ . \qquad (1.46)$$

To get the answer we want we must therefore add

$$\begin{pmatrix} -p & 0 & 0 & 0 \\ 0 & p & 0 & 0 \\ 0 & 0 & p & 0 \\ 0 & 0 & 0 & p \end{pmatrix}\ . \qquad (1.47)$$

This matrix has an obvious covariant generalization, namely $p\,\eta^{\mu\nu}$. Thus, the general form of the energy-momentum tensor for a perfect fluid is

$$T^{\mu\nu} = (\rho + p)\, U^\mu U^\nu + p\,\eta^{\mu\nu}\ . \qquad (1.48)$$

This is an important formula for applications such as stellar structure and cosmology. Further examples of energy-momentum tensors are those of electromagnetism and of scalar field theory; we will see their form in the main text.

It is important to stress that $T^{\mu\nu}$ is conserved, which means vanishing of the "divergence":

$$\partial_\mu T^{\mu\nu} = 0\ . \qquad (1.49)$$

This is a set of four equations, one for each value of ν. The ν = 0 equation corresponds to conservation of energy, while $\partial_\mu T^{\mu k} = 0$ expresses conservation of the k-th component of the momentum.
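As a sanity check of (1.48), the following sketch constructs the perfect-fluid energy-momentum tensor for invented values of ρ and p and confirms that in the rest frame it reduces to the diagonal form (1.45):

```python
import numpy as np

# Perfect-fluid T^{mu nu} = (rho + p) U^mu U^nu + p eta^{mu nu}, eq. (1.48).
eta_inv = np.diag([-1.0, 1.0, 1.0, 1.0])   # eta^{mu nu}
rho, p = 1.0, 0.3                          # illustrative values
U = np.array([1.0, 0.0, 0.0, 0.0])         # rest-frame four-velocity

T = (rho + p) * np.einsum('m,n->mn', U, U) + p * eta_inv
print(np.allclose(T, np.diag([rho, p, p, p])))   # True: the rest-frame form (1.45)
```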
2. Manifolds

In this section we generalize the notion of flat space to the case of curved space-time. In order to properly understand how this works we have to learn a bit about the mathematics of curved spaces. As the first step we study the notion of a manifold. Note that we will work in n dimensions.

A manifold (or sometimes "differentiable manifold") is a very important concept in mathematics and physics. One is certainly familiar with the properties of n-dimensional Euclidean space, $\mathbb{R}^n$, the set of n-tuples $(x^1, \ldots, x^n)$. We can imagine a manifold as a space which may be curved and have a complicated topology, but which in local regions looks just like $\mathbb{R}^n$. (Here by "looks like" we do not mean that the metric is the same, but only basic notions of analysis like open sets, functions, and coordinates.) In other words we can imagine that the entire manifold is constructed by smoothly sewing together these local regions.

Let us now present a more rigorous definition of a manifold. The most elementary notion is that of a map between two sets. (We assume you know what a set is.) If we have two sets M and N, a map $\phi: M \to N$ is a relationship which assigns to each element of M exactly one element of N; a map is a simple generalization of a function.

[Figure: a map φ from a set M to a set N.]

Given two maps $\phi: A \to B$ and $\psi: B \to C$, we define the composition $\psi \circ \phi: A \to C$ by the operation $(\psi \circ \phi)(a) = \psi(\phi(a))$. So $a \in A$, $\phi(a) \in B$, and thus $(\psi \circ \phi)(a) \in C$.

[Figure: the composition ψ ∘ φ of maps φ: A → B and ψ: B → C.]

A map φ is called one-to-one (or "injective") if each element of N has at most one element of M mapped into it, and onto (or "surjective") if each element of N has at least one element of M mapped into it. (If you think about it, a better name for "one-to-one" would be "two-to-two".) The set M is known as the domain of the map φ, and the set of points in N which M gets mapped into is called the image of φ. For some subset $U \subset N$, the set of elements of M which get mapped to U is called the preimage of U under φ, or $\phi^{-1}(U)$. A map which is both one-to-one and onto is known as invertible (or "bijective"). In this case we can define the inverse map $\phi^{-1}: N \to M$ by $(\phi^{-1} \circ \phi)(a) = a$. (Note that the same symbol $\phi^{-1}$ is used for both the preimage and the inverse map, even though the former is always defined and the latter is only defined in some special cases.)

[Figure: an invertible map φ: M → N and its inverse $\phi^{-1}$: N → M.]

Let us now introduce the notion of continuity of a map between topological spaces (and thus manifolds). Luckily the precise mathematical definition is not needed here; an intuitive understanding of continuity and differentiability of maps $\phi: \mathbb{R}^m \to \mathbb{R}^n$ between Euclidean spaces will suffice. A map from $\mathbb{R}^m$ to $\mathbb{R}^n$ takes an m-tuple $(x^1, x^2, \ldots, x^m)$ to an n-tuple $(y^1, y^2, \ldots, y^n)$, and can therefore be thought of as a collection of n functions $\phi^i$ of m variables:

$$y^1 = \phi^1(x^1, x^2, \ldots, x^m)\ ,$$
$$y^2 = \phi^2(x^1, x^2, \ldots, x^m)\ ,$$
$$\cdots$$
$$y^n = \phi^n(x^1, x^2, \ldots, x^m)\ . \qquad (2.1)$$

We will refer to any one of these functions as $C^p$ if it is continuous and p-times differentiable, and refer to the entire map $\phi: \mathbb{R}^m \to \mathbb{R}^n$ as $C^p$ if each of its component functions is at least $C^p$. Thus a $C^0$ map is continuous but not necessarily differentiable, while a $C^\infty$ map is continuous and can be differentiated as many times as you like. $C^\infty$ maps are sometimes called smooth, and we will consider only these. We call two sets M and N diffeomorphic if there exists a $C^\infty$ map $\phi: M \to N$ with a $C^\infty$ inverse $\phi^{-1}: N \to M$; the map φ is then called a diffeomorphism.

For further purposes we recall the chain rule.
Let us presume that we have maps $f: \mathbb{R}^m \to \mathbb{R}^n$ and $g: \mathbb{R}^n \to \mathbb{R}^l$, and their composition $(g \circ f): \mathbb{R}^m \to \mathbb{R}^l$.

[Figure: the composition g ∘ f of maps f: $\mathbb{R}^m \to \mathbb{R}^n$ and g: $\mathbb{R}^n \to \mathbb{R}^l$.]

We can represent each space in terms of coordinates: $x^a$ on $\mathbb{R}^m$, $y^b$ on $\mathbb{R}^n$, and $z^c$ on $\mathbb{R}^l$, where the indices range over the appropriate values. The chain rule relates the partial derivatives of the composition to the partial derivatives of the individual maps:

$$\frac{\partial}{\partial x^a}(g \circ f)^c = \sum_b \frac{\partial f^b}{\partial x^a}\frac{\partial g^c}{\partial y^b}\ . \qquad (2.2)$$

This relation is usually written as

$$\frac{\partial}{\partial x^a} = \sum_b \frac{\partial y^b}{\partial x^a}\frac{\partial}{\partial y^b}\ . \qquad (2.3)$$

Note that for m = n the determinant of the matrix $\partial y^b/\partial x^a$ is called the Jacobian of the map, and the map is invertible whenever the Jacobian is nonzero.

Using these well-known definitions we can proceed to the definition of a manifold. In order to do this we first have to define the notion of an open set, on which we can put coordinate systems, and then sew the open sets together in an appropriate way. We start with the notion of an open ball, which is the set of all points x in $\mathbb{R}^n$ such that $|x - y| < r$ for some fixed $y \in \mathbb{R}^n$ and $r \in \mathbb{R}$, where $|x - y| = \left[\sum_i (x^i - y^i)^2\right]^{1/2}$. Note that this is a strict inequality — the open ball is the interior of an n-sphere of radius r centered at y.

[Figure: an open ball of radius r centered at y.]

An open set in $\mathbb{R}^n$ is a set constructed from an arbitrary (maybe infinite) union of open balls. In other words, $V \subset \mathbb{R}^n$ is open if, for any $y \in V$, there is an open ball centered at y which is completely inside V. Roughly speaking, an open set is the interior of some (n−1)-dimensional closed surface (or the union of several such interiors). By defining a notion of open sets, we have equipped $\mathbb{R}^n$ with a topology — in this case, the "standard metric topology."

A chart or coordinate system consists of a subset U of a set M, along with a one-to-one map $\phi: U \to \mathbb{R}^n$, such that the image $\phi(U)$ is open in $\mathbb{R}^n$. (Any map is onto its image, so the map $\phi: U \to \phi(U)$ is invertible.) We then say that U is an open set in M. (We have thus induced a topology on M, although we will not explore this.)

[Figure: a chart φ mapping an open set U ⊂ M to an open set φ(U) ⊂ $\mathbb{R}^n$.]

A $C^\infty$ atlas is an indexed collection of charts $\{(U_\alpha, \phi_\alpha)\}$ which satisfies two conditions:

1. The union of the $U_\alpha$ is equal to M; in other words, the $U_\alpha$ cover M.

2. The charts are smoothly sewn together. More precisely, if two charts overlap, $U_\alpha \cap U_\beta \neq \emptyset$, then the map $(\phi_\alpha \circ \phi_\beta^{-1})$ takes points in $\phi_\beta(U_\alpha \cap U_\beta) \subset \mathbb{R}^n$ onto $\phi_\alpha(U_\alpha \cap U_\beta) \subset \mathbb{R}^n$, and all of these maps must be $C^\infty$ where they are defined.

[Figure: two overlapping charts $(U_\alpha, \phi_\alpha)$ and $(U_\beta, \phi_\beta)$; the transition maps $\phi_\alpha \circ \phi_\beta^{-1}$ and $\phi_\beta \circ \phi_\alpha^{-1}$ are only defined on the images of the overlap region, and must be smooth there.]

Finally, a $C^\infty$ n-dimensional manifold (or n-manifold for short) is simply a set M along with a "maximal atlas," one that contains every possible compatible chart. The requirement that the atlas be maximal is so that two equivalent spaces equipped with different atlases don't count as different manifolds. This definition captures in formal terms our notion of a set that looks locally like $\mathbb{R}^n$. Note that this definition does not rely on an embedding of the manifold in some higher-dimensional Euclidean space; the manifold has an existence independent of any embedding. In other words, there is no reason to believe, for example, that four-dimensional space-time is stuck in some larger space.

It is important to stress the necessity of charts and atlases: many manifolds cannot be covered with a single coordinate system.
The fact that manifolds look locally like $\mathbb{R}^n$, which is most clearly seen from the construction of coordinate charts, introduces the possibility of analysis on manifolds, for example differentiation and integration. Let us consider two manifolds M and N of dimensions m and n, with coordinate charts φ on M and ψ on N, and imagine we have a function $f: M \to N$.

[Figure: a map f: M → N together with charts φ on M and ψ on N; the composition $\psi \circ f \circ \phi^{-1}$ maps $\mathbb{R}^m$ to $\mathbb{R}^n$.]

Since M and N are spaces with no a priori notion of differentiation, we cannot nonchalantly differentiate the map f, since we don't know what such an operation means. But the coordinate charts allow us to construct the map $(\psi \circ f \circ \phi^{-1}): \mathbb{R}^m \to \mathbb{R}^n$. By definition this is just a map between Euclidean spaces, and all of the concepts of advanced calculus apply. For example f, thought of as an N-valued function on M, can be differentiated to obtain $\partial f/\partial x^\mu$, where the $x^\mu$ are coordinates on $\mathbb{R}^m$. More precisely,

$$\frac{\partial f}{\partial x^\mu} \equiv \frac{\partial}{\partial x^\mu}\left(\psi \circ f \circ \phi^{-1}\right)(x^\mu)\ . \qquad (2.4)$$

For most practical purposes this shorthand notation will be the appropriate one.

Now we can proceed to the construction of various kinds of structure on manifolds. As the first step we begin with vectors and tangent spaces. Remember the notion of a tangent space — the set of all vectors at a single point in space-time. The natural way to introduce the tangent space is to use our intuitive knowledge that there are objects called "tangent vectors to curves" which belong in the tangent space. We can then consider the set of all parameterized curves through p — that is, the space of all (nondegenerate) maps $\gamma: \mathbb{R} \to M$ such that p is in the image of γ — and it is natural to define the tangent space as simply the space of all tangent vectors to these curves at the point p.

In some coordinate system $x^\mu$, any curve through p defines an element of $\mathbb{R}^n$ specified by the n real numbers $dx^\mu/d\lambda$ (where λ is the parameter along the curve); but this map is clearly coordinate-dependent, which is not what we want. To find a coordinate-independent formulation we proceed as follows. We define $\mathcal{F}$ to be the space of all smooth functions on M (that is, $C^\infty$ maps $f: M \to \mathbb{R}$). Each curve through p defines an operator on this space, the directional derivative, which maps $f \to \frac{df}{d\lambda}$ (at p). Then we claim that the tangent space $T_p$ can be identified with the space of directional derivative operators along curves through p. It can be shown that the space of directional derivatives is a vector space, and that it is the vector space we want (it has the same dimensionality as M, yields a natural idea of a vector pointing along a certain direction, and so on).

In fact, let us find a basis for this space. Consider again a coordinate chart with coordinates $x^\mu$. Then there is an obvious set of n directional derivatives at p, namely the partial derivatives $\partial_\mu$ at p.

[Figure: curves through p along the coordinate directions $x^1$ and $x^2$, whose tangent vectors are the partial derivatives $\partial_1$ and $\partial_2$.]

The partial derivative operators $\{\partial_\mu\}$ at p form a basis for the tangent space $T_p$: any directional derivative can be decomposed into a sum of real numbers times partial derivatives. To see this, consider an n-manifold M, a coordinate chart $\phi: M \to \mathbb{R}^n$, a curve $\gamma: \mathbb{R} \to M$, and a function $f: M \to \mathbb{R}$. If λ is the parameter along γ, we want to expand the vector/operator $\frac{d}{d\lambda}$ in terms of the partials $\partial_\mu$. Using the chain rule (2.2), we have

$$\frac{d}{d\lambda} f = \frac{d}{d\lambda}(f \circ \gamma) = \frac{d}{d\lambda}\left[(f \circ \phi^{-1}) \circ (\phi \circ \gamma)\right] = \frac{d(\phi \circ \gamma)^\mu}{d\lambda}\,\frac{\partial(f \circ \phi^{-1})}{\partial x^\mu} = \frac{dx^\mu}{d\lambda}\,\partial_\mu f\ . \qquad (2.5)$$
The first equality simply takes the informal expression on the left-hand side and rewrites it as an honest derivative of the function $(f \circ \gamma): \mathbb{R} \to \mathbb{R}$. The second comes from the definition of the inverse map $\phi^{-1}$ (and the associativity of composition). The third is the formal chain rule (2.2), and the last is a return to the informal notation of the start. Since the function f was arbitrary, we have

$$\frac{d}{d\lambda} = \frac{dx^\mu}{d\lambda}\,\partial_\mu\ . \qquad (2.6)$$

Thus, the partials $\{\partial_\mu\}$ do indeed represent a good basis for the vector space of directional derivatives, which we can therefore safely identify with the tangent space.

[Figure: the maps $\gamma: \mathbb{R} \to M$, $\phi: M \to \mathbb{R}^n$, and $f: M \to \mathbb{R}$ used in the decomposition above.]

This particular basis ($\hat e_{(\mu)} = \partial_\mu$) is known as a coordinate basis for $T_p$. There is no reason why we are limited to coordinate bases when we consider tangent vectors; it is sometimes more convenient, for example, to use orthonormal bases of some sort. However, the coordinate basis is very simple and natural, and we will use it almost exclusively throughout the course.

One of the advantages of this abstract definition of vectors is that the transformation law is immediate. Since the basis vectors are $\hat e_{(\mu)} = \partial_\mu$, the basis vectors in some new coordinate system $x^{\mu'}$ are given by the chain rule (2.3) as

$$\partial_{\mu'} = \frac{\partial x^\mu}{\partial x^{\mu'}}\,\partial_\mu\ . \qquad (2.7)$$

We can get the transformation law for vector components by the same technique used in flat space, demanding that the vector $V = V^\mu \partial_\mu$ be unchanged by a change of basis. We have

$$V^\mu \partial_\mu = V^{\mu'} \partial_{\mu'} = V^{\mu'} \frac{\partial x^\mu}{\partial x^{\mu'}}\,\partial_\mu\ , \qquad (2.8)$$

and hence (since the matrix $\partial x^{\mu'}/\partial x^\mu$ is the inverse of the matrix $\partial x^\mu/\partial x^{\mu'}$),

$$V^{\mu'} = \frac{\partial x^{\mu'}}{\partial x^\mu}\, V^\mu\ . \qquad (2.9)$$

Since the basis vectors are usually not written explicitly, the rule (2.9) for transforming components is what we call the "vector transformation law."

As the next step we study the transformation properties of one-forms. Once again, the cotangent space $T_p^*$ is the set of linear maps $\omega: T_p \to \mathbb{R}$. The canonical example of a one-form is the gradient of a function f, denoted df. It turns out that the gradients of the coordinate functions $x^\mu$ provide a natural basis for the cotangent space:

$$\mathrm{d}x^\mu(\partial_\nu) = \frac{\partial x^\mu}{\partial x^\nu} = \delta^\mu_\nu\ . \qquad (2.10)$$

Therefore the gradients $\{\mathrm{d}x^\mu\}$ are an appropriate set of basis one-forms; an arbitrary one-form is expanded into components as $\omega = \omega_\mu\,\mathrm{d}x^\mu$. The transformation properties of basis dual vectors and components follow from what is by now the usual procedure. We obtain, for basis one-forms,

$$\mathrm{d}x^{\mu'} = \frac{\partial x^{\mu'}}{\partial x^\mu}\,\mathrm{d}x^\mu\ , \qquad (2.11)$$

and for components,

$$\omega_{\mu'} = \frac{\partial x^\mu}{\partial x^{\mu'}}\,\omega_\mu\ . \qquad (2.12)$$

We will usually write the components $\omega_\mu$ when we speak about a one-form ω.

The transformation law for general tensors follows this same pattern of replacing the Lorentz transformation matrix used in flat space with a matrix representing more general coordinate transformations. A (k, l) tensor T can be expanded

$$T = T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}\;\partial_{\mu_1} \otimes \cdots \otimes \partial_{\mu_k} \otimes \mathrm{d}x^{\nu_1} \otimes \cdots \otimes \mathrm{d}x^{\nu_l}\ , \qquad (2.13)$$

and under a coordinate transformation the components change according to

$$T^{\mu_1'\cdots\mu_k'}{}_{\nu_1'\cdots\nu_l'} = \frac{\partial x^{\mu_1'}}{\partial x^{\mu_1}} \cdots \frac{\partial x^{\mu_k'}}{\partial x^{\mu_k}}\, \frac{\partial x^{\nu_1}}{\partial x^{\nu_1'}} \cdots \frac{\partial x^{\nu_l}}{\partial x^{\nu_l'}}\; T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}\ . \qquad (2.14)$$

Let us demonstrate with the following example how to transform tensors in practice. Consider a symmetric (0, 2) tensor S on a 2-dimensional manifold, whose components in a coordinate system $(x^1 = x, x^2 = y)$ are given by

$$S_{\mu\nu} = \begin{pmatrix} x & 0 \\ 0 & 1 \end{pmatrix}\ , \qquad (2.15)$$

or equivalently

$$S = S_{\mu\nu}\,(\mathrm{d}x^\mu \otimes \mathrm{d}x^\nu) = x(\mathrm{d}x)^2 + (\mathrm{d}y)^2\ , \qquad (2.16)$$

where in the last line the tensor product symbols are suppressed for brevity. Now consider the new coordinates

$$x' = x^{1/3}\ , \qquad y' = e^{x+y}\ . \qquad (2.17)$$
This leads directly to

$$x = (x')^3\ , \qquad y = \ln(y') - (x')^3\ ,$$
$$\mathrm{d}x = 3(x')^2\,\mathrm{d}x'\ , \qquad \mathrm{d}y = \frac{1}{y'}\,\mathrm{d}y' - 3(x')^2\,\mathrm{d}x'\ . \qquad (2.18)$$

We insert these results into the expression above, remembering that tensor products do not commute, so $\mathrm{d}x'\,\mathrm{d}y' \neq \mathrm{d}y'\,\mathrm{d}x'$, and we obtain

$$S = 9(x')^4\left[1 + (x')^3\right](\mathrm{d}x')^2 - 3\frac{(x')^2}{y'}\left(\mathrm{d}x'\,\mathrm{d}y' + \mathrm{d}y'\,\mathrm{d}x'\right) + \frac{1}{(y')^2}(\mathrm{d}y')^2\ , \qquad (2.20)$$

or

$$S_{\mu'\nu'} = \begin{pmatrix} 9(x')^4\left[1 + (x')^3\right] & -3\frac{(x')^2}{y'} \\ -3\frac{(x')^2}{y'} & \frac{1}{(y')^2} \end{pmatrix}\ . \qquad (2.21)$$

Notice that it is still symmetric. We did not use the transformation law (2.14) directly, but doing so would have yielded the same result, as you can check.
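This check is easy to carry out with a computer algebra system. In the sketch below, xp and yp stand for the primed coordinates, and the transformation law (2.14) for a (0, 2) tensor is applied as a matrix product with the Jacobian matrix $\partial x^\mu/\partial x^{\mu'}$:

```python
import sympy as sp

# Verify (2.15)-(2.21): transform S_{mu nu} = diag(x, 1) to the primed
# coordinates of (2.17). xp, yp denote x', y'.
xp, yp = sp.symbols('xp yp', positive=True)
x = xp**3                       # old coordinates via (2.18)
y = sp.log(yp) - xp**3

S_old = sp.Matrix([[x, 0], [0, 1]])
J = sp.Matrix([[sp.diff(x, xp), sp.diff(x, yp)],
               [sp.diff(y, xp), sp.diff(y, yp)]])   # dx^mu / dx^{mu'}

# (0,2) transformation law (2.14) as a matrix product: S' = J^T S J
S_new = sp.expand(J.T * S_old * J)
print(S_new)   # matches the matrix in (2.21)
```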
As the next step we comment on the exterior derivative d on a manifold. The exterior derivative operator d produces an antisymmetric (0, p + 1) tensor when acting on a p-form. So the exterior derivative is a legitimate tensor operator; it is not, however, an adequate substitute for the partial derivative, since it is only defined on forms.

The metric tensor in curved space is denoted $g_{\mu\nu}$ (while $\eta_{\mu\nu}$ is reserved specifically for the Minkowski metric). $g_{\mu\nu}$ is a symmetric (0, 2) tensor that is also nondegenerate, meaning that the determinant $g = |g_{\mu\nu}|$ does not vanish. This allows us to define the inverse metric $g^{\mu\nu}$ via

$$g^{\mu\nu} g_{\nu\sigma} = \delta^\mu_\sigma\ . \qquad (2.22)$$

The symmetry of $g_{\mu\nu}$ implies that $g^{\mu\nu}$ is also symmetric. The metric and its inverse may be used to raise and lower indices on tensors. The natural object directly related to the metric tensor is the line element,

$$ds^2 = g_{\mu\nu}\,\mathrm{d}x^\mu\,\mathrm{d}x^\nu\ . \qquad (2.23)$$

For example, we know that the Euclidean line element in a three-dimensional space with Cartesian coordinates is

$$ds^2 = (\mathrm{d}x)^2 + (\mathrm{d}y)^2 + (\mathrm{d}z)^2\ . \qquad (2.24)$$

We can now change to any coordinate system we choose. For example, in spherical coordinates we have

$$x = r\sin\theta\cos\phi\ , \qquad y = r\sin\theta\sin\phi\ , \qquad z = r\cos\theta\ , \qquad (2.25)$$

which leads directly to

$$ds^2 = \mathrm{d}r^2 + r^2\,\mathrm{d}\theta^2 + r^2\sin^2\theta\,\mathrm{d}\phi^2\ . \qquad (2.26)$$

Obviously the components of the metric look different than those in Cartesian coordinates, but all of the properties of the space remain unaltered.

A good example of a space with curvature is the two-sphere, which can be thought of as the locus of points in $\mathbb{R}^3$ at distance 1 from the origin. The metric in the (θ, φ) coordinate system comes from setting r = 1 and dr = 0 in (2.26):

$$ds^2 = \mathrm{d}\theta^2 + \sin^2\theta\,\mathrm{d}\phi^2\ . \qquad (2.27)$$

The metric tensor contains all the information we need to describe the curvature of the manifold. In Minkowski space we can choose coordinates in which the components of the metric are constant; but it should be clear that the existence of curvature is more subtle than having the metric depend on the coordinates, since in the example above we showed that the metric of flat Euclidean space in spherical coordinates is a function of r and θ. Later we shall see that constancy of the metric components is sufficient for a space to be flat, and in fact there always exists a coordinate system on any flat space in which the metric is constant.

A useful characterization of the metric is obtained by putting $g_{\mu\nu}$ into its canonical form. In this form the metric components become

$$g_{\mu\nu} = \mathrm{diag}\,(-1, -1, \ldots, -1, +1, +1, \ldots, +1, 0, 0, \ldots, 0)\ , \qquad (2.28)$$

where "diag" means a diagonal matrix with the given elements. If n is the dimension of the manifold, s is the number of +1's in the canonical form, and t is the number of −1's, then s − t is the signature of the metric (the difference in the number of plus and minus signs), and s + t is the rank of the metric (the number of nonzero eigenvalues). If the metric is continuous, the rank and signature of the metric tensor field are the same at every point, and if the metric is nondegenerate the rank is equal to the dimension n. We will always deal with continuous, nondegenerate metrics. If all of the signs are positive (t = 0) the metric is called Euclidean or Riemannian (or just "positive definite"), while if there is a single minus (t = 1) it is called Lorentzian or pseudo-Riemannian, and any metric with some +1's and some −1's is called "indefinite." (So the word "Euclidean" sometimes means that the space is flat, and sometimes doesn't, but always means that the canonical form is strictly positive; the terminology is unfortunate but standard.) The space-times of interest in general relativity have Lorentzian metrics. It can be shown that it is always possible to put the metric into canonical form at some point $p \in M$, but in general this will only be possible at that single point, not in any neighborhood of p.

We now define the Levi-Civita symbol $\tilde\epsilon_{\mu_1\mu_2\cdots\mu_n}$ — an object with n indices which has the components specified in (1.18) in any coordinate system. It is called a "symbol," of course, because it is not a tensor; it is defined not to change under coordinate transformations. On the other hand, we can define the Levi-Civita tensor as

$$\epsilon_{\mu_1\mu_2\cdots\mu_n} = \sqrt{|g|}\;\tilde\epsilon_{\mu_1\mu_2\cdots\mu_n}\ . \qquad (2.29)$$

It is this tensor which is used in the definition of the Hodge dual, (1.34), which is otherwise unchanged when generalized to arbitrary manifolds. Since this is a genuine tensor, we can raise indices, etc.

One final appearance of tensor densities is in integration on manifolds. In ordinary calculus on $\mathbb{R}^n$ the volume element $d^n x$ picks up a factor of the Jacobian under change of coordinates:

$$d^n x' = \left|\frac{\partial x^{\mu'}}{\partial x^\mu}\right| d^n x\ . \qquad (2.30)$$

There is actually a beautiful explanation of this formula from the point of view of differential forms, which arises from the following fact: on an n-dimensional manifold, the integrand is properly understood as an n-form. To see how this works, we should make the identification

$$d^n x \leftrightarrow \mathrm{d}x^0 \wedge \cdots \wedge \mathrm{d}x^{n-1}\ . \qquad (2.31)$$

The expression on the right-hand side can be misleading, because it looks like a tensor (an n-form, actually) but is really a density. Certainly if we have two functions f and g on M, then df and dg are one-forms, and df ∧ dg is a two-form. To see what is going on, let us see how (2.31) changes under coordinate transformations. First notice that the definition of the wedge product allows us to write

$$\mathrm{d}x^0 \wedge \cdots \wedge \mathrm{d}x^{n-1} = \frac{1}{n!}\,\tilde\epsilon_{\mu_1\cdots\mu_n}\,\mathrm{d}x^{\mu_1} \wedge \cdots \wedge \mathrm{d}x^{\mu_n}\ , \qquad (2.32)$$

since both the wedge product and the Levi-Civita symbol are completely antisymmetric. Under a coordinate transformation $\tilde\epsilon_{\mu_1\cdots\mu_n}$ stays the same while the one-forms change according to their transformation rules, leading to

$$\tilde\epsilon_{\mu_1\cdots\mu_n}\,\mathrm{d}x^{\mu_1} \wedge \cdots \wedge \mathrm{d}x^{\mu_n} = \tilde\epsilon_{\mu_1\cdots\mu_n}\,\frac{\partial x^{\mu_1}}{\partial x^{\mu_1'}} \cdots \frac{\partial x^{\mu_n}}{\partial x^{\mu_n'}}\;\mathrm{d}x^{\mu_1'} \wedge \cdots \wedge \mathrm{d}x^{\mu_n'} = \left|\frac{\partial x^\mu}{\partial x^{\mu'}}\right| \tilde\epsilon_{\mu_1'\cdots\mu_n'}\,\mathrm{d}x^{\mu_1'} \wedge \cdots \wedge \mathrm{d}x^{\mu_n'}\ . \qquad (2.33)$$

Multiplying by the Jacobian on both sides recovers (2.30). It is clear that the naive volume element $d^n x$ transforms as a density, not a tensor, but it is straightforward to construct an invariant volume element by multiplying by $\sqrt{|g|}$:

$$\sqrt{|g'|}\;\mathrm{d}x^{0'} \wedge \cdots \wedge \mathrm{d}x^{(n-1)'} = \sqrt{|g|}\;\mathrm{d}x^0 \wedge \cdots \wedge \mathrm{d}x^{n-1}\ , \qquad (2.34)$$

which is of course just $(n!)^{-1}\,\epsilon_{\mu_1\cdots\mu_n}\,\mathrm{d}x^{\mu_1} \wedge \cdots \wedge \mathrm{d}x^{\mu_n}$. In the interest of simplicity we will usually write the volume element as $\sqrt{|g|}\, d^n x$, rather than as the explicit wedge product $\sqrt{|g|}\;\mathrm{d}x^0 \wedge \cdots \wedge \mathrm{d}x^{n-1}$; it will be enough to keep in mind that it is supposed to be an n-form.
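As a small concrete check, the following sketch evaluates |g| for flat $\mathbb{R}^3$ in the spherical coordinates (2.25), recovering the familiar measure:

```python
import sympy as sp

# Invariant volume element for flat R^3 in spherical coordinates:
# the metric (2.26) is diag(1, r^2, r^2 sin^2(theta)).
r, theta, phi = sp.symbols('r theta phi', positive=True)
g = sp.diag(1, r**2, r**2 * sp.sin(theta)**2)

print(g.det())   # r**4*sin(theta)**2, so sqrt(|g|) d^3x = r**2 sin(theta) dr dtheta dphi
```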
We now introduce Stokes's theorem. Imagine that we have an n-manifold M with boundary ∂M, and an (n−1)-form ω on M. (We haven't discussed manifolds with boundaries, but the idea is obvious; M could for instance be the interior of an (n−1)-dimensional closed surface ∂M.) Then dω is an n-form, which can be integrated over M, while ω itself can be integrated over ∂M. Stokes's theorem is then

$$\int_M \mathrm{d}\omega = \int_{\partial M} \omega\ . \qquad (2.35)$$

You can convince yourself that different special cases of this theorem include not only the fundamental theorem of calculus, but also the theorems of Green, Gauss, and Stokes, familiar from vector calculus in three dimensions.

Now we introduce a few extra mathematical techniques. Let us discuss how maps between two manifolds M and N carry tensor fields from one manifold to the other. We therefore consider two manifolds M and N, possibly of different dimension, with coordinate systems $x^\mu$ and $y^\alpha$, respectively. We imagine that we have a map $\phi: M \to N$ and a function $f: N \to \mathbb{R}$. It is obvious that we can compose φ with f to construct a map $(f \circ \phi): M \to \mathbb{R}$, which is simply a function on M. Such a construction is sufficiently useful that it gets its own name; we define the pullback of f by φ, denoted $\phi^* f$, by

$$\phi^* f = (f \circ \phi)\ . \qquad (2.36)$$

[Figure: the map φ: M → N, a function f: N → $\mathbb{R}$, and the pullback $\phi^* f = f \circ \phi$ on M.]

The name makes sense, since we think of $\phi^*$ as "pulling back" the function f from N to M. We can pull functions back, but we cannot push them forward: if we have a function $g: M \to \mathbb{R}$, there is no way we can compose g with φ to create a function on N, since the arrows don't fit together correctly. But recall that a vector can be thought of as a derivative operator that maps smooth functions to real numbers. This allows us to define the pushforward of a vector: if V(p) is a vector at a point p on M, we define the pushforward vector $\phi_* V$ at the point φ(p) on N by giving its action on functions on N:

$$(\phi_* V)(f) = V(\phi^* f)\ . \qquad (2.37)$$

So to push forward a vector field we say "the action of $\phi_* V$ on any function is simply the action of V on the pullback of that function."

Let us now give a more concrete description. We know that a basis for vectors on M is given by the set of partial derivatives $\partial_\mu = \frac{\partial}{\partial x^\mu}$, and a basis on N is given by $\partial_\alpha = \frac{\partial}{\partial y^\alpha}$. Therefore we would like to relate the components of $V = V^\mu \partial_\mu$ to those of $(\phi_* V) = (\phi_* V)^\alpha \partial_\alpha$. We can find the sought-after relation by applying the pushed-forward vector to a test function and using the chain rule (2.3):

$$(\phi_* V)^\alpha \partial_\alpha f = V^\mu \partial_\mu(\phi^* f) = V^\mu \partial_\mu(f \circ \phi) = V^\mu \frac{\partial y^\alpha}{\partial x^\mu}\,\partial_\alpha f\ . \qquad (2.38)$$

This result shows that the pushforward operation $\phi_*$ can be considered as a matrix operator, $(\phi_* V)^\alpha = (\phi_*)^\alpha{}_\mu V^\mu$, with the matrix given by

$$(\phi_*)^\alpha{}_\mu = \frac{\partial y^\alpha}{\partial x^\mu}\ . \qquad (2.39)$$

The behavior of a vector under a pushforward is thus similar in form to the vector transformation law under a change of coordinates. In general, however, when µ and α take different ranges of values, there is no reason for the matrix $\partial y^\alpha/\partial x^\mu$ to be invertible.

Let us now discuss the transformation properties of one-forms. Since one-forms are dual to vectors, it is natural to expect that one-forms can be pulled back (but not, in general, pushed forward). Indeed, we remember that one-forms are linear maps from vectors to the real numbers. The pullback $\phi^*\omega$ of a one-form ω on N can therefore be defined by its action on a vector V on M, by equating it with the action of ω itself on the pushforward of V:

$$(\phi^*\omega)(V) = \omega(\phi_* V)\ . \qquad (2.40)$$
Once again, there is a matrix description of the pullback operator on forms, $(\phi^*\omega)_\mu = (\phi^*)_\mu{}^\alpha\,\omega_\alpha$, which we can derive using the chain rule. It is given by

$$(\phi^*)_\mu{}^\alpha = \frac{\partial y^\alpha}{\partial x^\mu}\ . \qquad (2.41)$$

That is, it is the same matrix as the pushforward (2.39).

Let us now consider a (0, l) tensor — one with l lower indices and no upper ones. Recall that it is a linear map from the direct product of l vectors to $\mathbb{R}$. It is then natural to pull back not only one-forms, but tensors with an arbitrary number of lower indices. The definition is simply the action of the original tensor on the pushed-forward vectors:

$$(\phi^* T)(V^{(1)}, V^{(2)}, \ldots, V^{(l)}) = T(\phi_* V^{(1)}, \phi_* V^{(2)}, \ldots, \phi_* V^{(l)})\ , \qquad (2.42)$$

where $T_{\alpha_1\cdots\alpha_l}$ is a (0, l) tensor on N. We can similarly push forward any (k, 0) tensor $S^{\mu_1\cdots\mu_k}$ by acting it on pulled-back one-forms:

$$(\phi_* S)(\omega^{(1)}, \omega^{(2)}, \ldots, \omega^{(k)}) = S(\phi^*\omega^{(1)}, \phi^*\omega^{(2)}, \ldots, \phi^*\omega^{(k)})\ . \qquad (2.43)$$

Fortunately, the matrix representations of the pushforward (2.39) and pullback (2.41) extend to the higher-rank tensors simply by assigning one matrix to each index; thus, for the pullback of a (0, l) tensor, we have

$$(\phi^* T)_{\mu_1\cdots\mu_l} = \frac{\partial y^{\alpha_1}}{\partial x^{\mu_1}} \cdots \frac{\partial y^{\alpha_l}}{\partial x^{\mu_l}}\, T_{\alpha_1\cdots\alpha_l}\ , \qquad (2.44)$$

while for the pushforward of a (k, 0) tensor we have

$$(\phi_* S)^{\alpha_1\cdots\alpha_k} = \frac{\partial y^{\alpha_1}}{\partial x^{\mu_1}} \cdots \frac{\partial y^{\alpha_k}}{\partial x^{\mu_k}}\, S^{\mu_1\cdots\mu_k}\ . \qquad (2.45)$$

Our complete picture is therefore:

[Figure: under a map φ: M → N, (0, l) tensors are pulled back from N to M by $\phi^*$, while (k, 0) tensors are pushed forward from M to N by $\phi_*$.]

It is important to stress that tensors with both upper and lower indices can generally be neither pushed forward nor pulled back. On the other hand, if φ is invertible (and both φ and $\phi^{-1}$ are smooth, which we always implicitly assume), then it defines a diffeomorphism between M and N. In this case M and N are the same abstract manifold. The advantage of a diffeomorphism is that we can use both φ and $\phi^{-1}$ to move tensors from M to N; this allows us to define the pushforward and pullback of arbitrary tensors. Specifically, for a (k, l) tensor field $T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}$ on M, we define the pushforward by

$$(\phi_* T)(\omega^{(1)}, \ldots, \omega^{(k)}, V^{(1)}, \ldots, V^{(l)}) = T(\phi^*\omega^{(1)}, \ldots, \phi^*\omega^{(k)}, [\phi^{-1}]_* V^{(1)}, \ldots, [\phi^{-1}]_* V^{(l)})\ , \qquad (2.46)$$

where the $\omega^{(i)}$ are one-forms on N and the $V^{(i)}$ are vectors on N. In components this becomes

$$(\phi_* T)^{\alpha_1\cdots\alpha_k}{}_{\beta_1\cdots\beta_l} = \frac{\partial y^{\alpha_1}}{\partial x^{\mu_1}} \cdots \frac{\partial y^{\alpha_k}}{\partial x^{\mu_k}}\, \frac{\partial x^{\nu_1}}{\partial y^{\beta_1}} \cdots \frac{\partial x^{\nu_l}}{\partial y^{\beta_l}}\; T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}\ . \qquad (2.47)$$

Since φ is invertible, the inverse matrix $\partial x^\nu/\partial y^\beta$ is well defined.

Now we are ready to explain the relationship between diffeomorphisms and coordinate transformations: we can interpret diffeomorphisms as "active" coordinate transformations, while traditional coordinate transformations are "passive." Since a diffeomorphism allows us to pull back and push forward arbitrary tensors, it provides another way of comparing tensors at different points on a manifold. Given a diffeomorphism $\phi: M \to M$ and a tensor field $T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}(x)$, we can form the difference between the value of the tensor at some point p and $\phi^*[T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}(\phi(p))]$, its value at φ(p) pulled back to p. This suggests that we can define another kind of derivative operator on tensor fields, one which characterizes the rate of change of the tensor as it changes under the diffeomorphism. In order to do this we have to introduce a one-parameter family of diffeomorphisms, $\phi_t$. This family can be thought of as a smooth map $\mathbb{R} \times M \to M$, such that for each $t \in \mathbb{R}$, $\phi_t$ is a diffeomorphism and $\phi_s \circ \phi_t = \phi_{s+t}$. Note that this last condition implies that $\phi_0$ is the identity map.
It can be shown that one-parameter families of diffeomorphisms arise from vector fields (and vice versa). Let us study what happens to a point p under the entire family $\phi_t$: it is clear that it describes a curve in M. The same is true of every point on M, and these curves fill the manifold (although there can be degeneracies where the diffeomorphisms have fixed points). We can then define a vector field $V^\mu(x)$ as the set of tangent vectors to each of these curves at every point, evaluated at t = 0.

We can also proceed in the opposite direction and define a one-parameter family of diffeomorphisms from any vector field. Given a vector field $V^\mu(x)$, we define the integral curves of the vector field to be those curves $x^\mu(t)$ which solve

$$\frac{dx^\mu}{dt} = V^\mu\ . \qquad (2.48)$$

This equation should be interpreted in the opposite sense from our usual way — here we are given the vectors, from which we define the curves. The diffeomorphisms $\phi_t$ represent "flow down the integral curves," and the associated vector field is called the generator of the diffeomorphism.

Given a vector field $V^\mu(x)$, and hence a family of diffeomorphisms parameterized by t, we can ask how fast a tensor changes as we travel down the integral curves. For each t we can define this change as

$$\Delta_t T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}(p) = \phi_t^*[T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}(\phi_t(p))] - T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}(p)\ . \qquad (2.49)$$

Note that both terms on the right-hand side are tensors at p.

[Figure: the tensor T at $\phi_t(p)$ is pulled back along the integral curve $x^\mu(t)$ to the point p, where it is compared with T(p).]

We then define the Lie derivative of the tensor along the vector field as

$$\pounds_V T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l} = \lim_{t\to 0} \frac{\Delta_t T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}}{t}\ . \qquad (2.50)$$

The Lie derivative is a map from (k, l) tensor fields to (k, l) tensor fields, which is manifestly independent of coordinates. This derivative is clearly linear,

$$\pounds_V(aT + bS) = a\pounds_V T + b\pounds_V S\ , \qquad (2.51)$$

and obeys the Leibniz rule,

$$\pounds_V(T \otimes S) = (\pounds_V T) \otimes S + T \otimes (\pounds_V S)\ , \qquad (2.52)$$

where S and T are tensors and a and b are constants. It is important to stress that the definition of the Lie derivative does not depend on the metric structure of the manifold. On functions it reduces to the ordinary directional derivative,

$$\pounds_V f = V(f) = V^\mu \partial_\mu f\ . \qquad (2.53)$$

It can be shown that in components the Lie derivative of a vector field takes the form

$$\pounds_V U^\mu = [V, U]^\mu\ , \qquad (2.54)$$

where

$$[V, U]^\mu = V^\nu \partial_\nu U^\mu - U^\nu \partial_\nu V^\mu\ . \qquad (2.55)$$

From this definition we immediately see that $\pounds_V U = -\pounds_U V$. It is because of (2.54) that the commutator is sometimes called the "Lie bracket."

To derive the action of $\pounds_V$ on a one-form $\omega_\mu$, begin by considering the action on the scalar $\omega_\mu U^\mu$ for an arbitrary vector field $U^\mu$. First use the fact that the Lie derivative with respect to a vector field reduces to the action of the vector itself when applied to a scalar:

$$\pounds_V(\omega_\mu U^\mu) = V(\omega_\mu U^\mu) = V^\nu \partial_\nu(\omega_\mu U^\mu) = V^\nu(\partial_\nu\omega_\mu)U^\mu + V^\nu\omega_\mu(\partial_\nu U^\mu)\ . \qquad (2.56)$$

Then use the Leibniz rule on the original scalar:

$$\pounds_V(\omega_\mu U^\mu) = (\pounds_V\omega)_\mu U^\mu + \omega_\mu(\pounds_V U)^\mu = (\pounds_V\omega)_\mu U^\mu + \omega_\mu V^\nu \partial_\nu U^\mu - \omega_\mu U^\nu \partial_\nu V^\mu\ . \qquad (2.57)$$

Setting these expressions equal to each other and requiring that equality hold for arbitrary $U^\mu$, we see that

$$\pounds_V \omega_\mu = V^\nu \partial_\nu \omega_\mu + (\partial_\mu V^\nu)\,\omega_\nu\ . \qquad (2.58)$$

By a similar procedure we can define the Lie derivative of an arbitrary tensor field. The answer can be written

$$\pounds_V T^{\mu_1\mu_2\cdots\mu_k}{}_{\nu_1\nu_2\cdots\nu_l} = V^\sigma \partial_\sigma T^{\mu_1\mu_2\cdots\mu_k}{}_{\nu_1\nu_2\cdots\nu_l}$$
$$- (\partial_\lambda V^{\mu_1})\, T^{\lambda\mu_2\cdots\mu_k}{}_{\nu_1\nu_2\cdots\nu_l} - (\partial_\lambda V^{\mu_2})\, T^{\mu_1\lambda\cdots\mu_k}{}_{\nu_1\nu_2\cdots\nu_l} - \cdots$$
$$+ (\partial_{\nu_1} V^\lambda)\, T^{\mu_1\mu_2\cdots\mu_k}{}_{\lambda\nu_2\cdots\nu_l} + (\partial_{\nu_2} V^\lambda)\, T^{\mu_1\mu_2\cdots\mu_k}{}_{\nu_1\lambda\cdots\nu_l} + \cdots\ . \qquad (2.59)$$
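The component formula (2.54)-(2.55) can be checked directly; the sketch below computes the Lie bracket of two invented vector fields on the plane:

```python
import sympy as sp

# Lie derivative of a vector field as the commutator (2.54)-(2.55),
# for two illustrative fields V and U in two dimensions.
x, y = sp.symbols('x y')
coords = [x, y]
V = [-y, x]               # a rotation field
U = [x**2, sp.sin(y)]     # an arbitrary test field

bracket = [sp.simplify(sum(
    V[n] * sp.diff(U[m], coords[n]) - U[n] * sp.diff(V[m], coords[n])
    for n in range(2))) for m in range(2)]
print(bracket)            # components of (L_V U)^mu = [V, U]^mu
```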
Let us now discuss another aspect of diffeomorphisms: there is one more use to which we will put the machinery we have set up in this section, namely symmetries of tensors. We say that a diffeomorphism φ is a symmetry of some tensor T if the tensor is invariant after being pulled back under φ:

$$\phi^* T = T\ . \qquad (2.60)$$

Although symmetries may be discrete (for example, invariance under reflections), it is more common to have a one-parameter family of symmetries $\phi_t$. If the family is generated by a vector field $V^\mu(x)$, then (2.60) together with the definition (2.50) leads to

$$\pounds_V T = 0\ . \qquad (2.61)$$

It can be shown that if T is symmetric under some one-parameter family of diffeomorphisms, it is possible to find a coordinate system in which the components of T are all independent of one of the coordinates (the integral-curve coordinate of the vector field). The most important symmetries are those of the metric, for which

$$\phi^* g_{\mu\nu} = g_{\mu\nu}\ . \qquad (2.62)$$

A diffeomorphism of this type is called an isometry. If a one-parameter family of isometries is generated by a vector field $V^\mu(x)$, then $V^\mu$ is known as a Killing vector field. The condition that $V^\mu$ be a Killing vector is thus

$$\pounds_V g_{\mu\nu} = 0\ , \qquad (2.63)$$

or, using the relation between the Lie derivative and the (metric-compatible) connection,

$$\nabla_{(\mu} V_{\nu)} = 0\ . \qquad (2.64)$$

This last version is Killing's equation. If a space-time has a Killing vector, then we can find a coordinate system in which the metric is independent of one of the coordinates.

It is important to stress that Killing vectors imply conserved quantities associated with the motion of free particles. To see this, consider the motion of a free particle: $x^\mu(\lambda)$ is a geodesic with tangent vector $U^\mu = \frac{dx^\mu}{d\lambda}$. Let $V^\mu$ be a Killing vector. Then we have

$$U^\nu \nabla_\nu(V_\mu U^\mu) = U^\nu U^\mu \nabla_\nu V_\mu + V_\mu U^\nu \nabla_\nu U^\mu = 0\ , \qquad (2.65)$$

where the first term vanishes by Killing's equation and the second because $x^\mu(\lambda)$ is a geodesic. Consequently the quantity $V_\mu U^\mu$ is conserved along the particle's world-line. More physically, by definition the metric is unchanging along the direction of the Killing vector; a free particle therefore feels no "forces" in this direction, and the component of its momentum in that direction is consequently conserved.

For later purposes we also define the concept of a space with maximal symmetry. A maximally symmetric space is one which possesses the largest possible number of Killing vectors, which on an n-dimensional manifold is n(n+1)/2.

It may not be simple to actually solve Killing's equation in any given space-time, but it is frequently possible to write down some Killing vectors by inspection. For example, in $\mathbb{R}^2$ with metric $ds^2 = dx^2 + dy^2$, independence of the metric components with respect to x and y immediately yields two Killing vectors:

$$X^\mu = (1, 0)\ , \qquad Y^\mu = (0, 1)\ . \qquad (2.66)$$

These clearly represent the two translations. The one rotation would correspond to the vector $R = \partial/\partial\theta$ if we were in polar coordinates; in Cartesian coordinates this becomes

$$R^\mu = (-y, x)\ . \qquad (2.67)$$
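As a check, the sketch below verifies the Killing condition (2.63) for the rotational vector field (2.67) in the flat plane, using the Lie-derivative formula (2.59) specialized to a (0, 2) tensor:

```python
import sympy as sp

# Check that R^mu = (-y, x) satisfies L_R g = 0 for ds^2 = dx^2 + dy^2.
# For a (0,2) tensor, (2.59) reads:
#   (L_V g)_{mn} = V^s d_s g_{mn} + (d_m V^s) g_{sn} + (d_n V^s) g_{ms}.
x, y = sp.symbols('x y')
coords = [x, y]
g = sp.eye(2)           # flat metric, g_{mn} = diag(1, 1)
V = [-y, x]             # the rotational vector field (2.67)

Lg = sp.zeros(2, 2)
for m in range(2):
    for n in range(2):
        Lg[m, n] = (sum(V[s] * sp.diff(g[m, n], coords[s]) for s in range(2))
                    + sum(sp.diff(V[s], coords[m]) * g[s, n] for s in range(2))
                    + sum(sp.diff(V[s], coords[n]) * g[m, s] for s in range(2)))
print(sp.simplify(Lg))  # zero matrix: R is indeed a Killing vector
```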
3. Curvature

We know that on any manifold we can define functions, take their derivatives, consider parameterized paths, set up tensors, and so on. Other concepts, such as the volume of a region or the length of a path, require some additional piece of structure, namely the introduction of a metric. We will now show how the existence of a metric implies a certain connection, whose curvature may be thought of as that of the metric.

The connection must be introduced when we address the problem that the partial derivative is not a good tensor operator. Our goal is to introduce the covariant derivative, that is, an operator which reduces to the partial derivative in flat space with Cartesian coordinates, but transforms as a tensor on an arbitrary manifold.

In flat space in Cartesian coordinates, the partial derivative operator $\partial_\mu$ is a map from (k, l) tensor fields to (k, l+1) tensor fields, which acts linearly on its arguments and obeys the Leibniz rule on tensor products. All of this continues to be true in the more general situation we now consider, but the map provided by the partial derivative depends on the coordinate system used. We would therefore like to define a covariant derivative operator ∇ that performs the functions of the partial derivative, but in a way independent of coordinates. We therefore require that ∇ be a map from (k, l) tensor fields to (k, l + 1) tensor fields with these two properties:

1. linearity: $\nabla(T + S) = \nabla T + \nabla S$;
2. Leibniz (product) rule: $\nabla(T \otimes S) = (\nabla T) \otimes S + T \otimes (\nabla S)$.

It can be shown that ∇ takes the form

$$\nabla_\mu V^\nu = \partial_\mu V^\nu + \Gamma^\nu_{\mu\lambda} V^\lambda\ , \qquad (3.1)$$

where the $\Gamma^\nu_{\mu\lambda}$ are known as the connection coefficients (for each µ, $(\Gamma_\mu)^\nu{}_\lambda$ is an n × n matrix, where n is the dimensionality of the manifold). Further, we demand that $\nabla_\mu V^\nu$ transform as a (1, 1) tensor,

$$\nabla_{\mu'} V^{\nu'} = \frac{\partial x^\mu}{\partial x^{\mu'}}\,\frac{\partial x^{\nu'}}{\partial x^\nu}\,\nabla_\mu V^\nu\ . \qquad (3.2)$$

This requirement implies the following transformation rule for the connection coefficients:

$$\Gamma^{\nu'}_{\mu'\lambda'} = \frac{\partial x^\mu}{\partial x^{\mu'}}\,\frac{\partial x^\lambda}{\partial x^{\lambda'}}\,\frac{\partial x^{\nu'}}{\partial x^\nu}\,\Gamma^\nu_{\mu\lambda} - \frac{\partial x^\mu}{\partial x^{\mu'}}\,\frac{\partial x^\lambda}{\partial x^{\lambda'}}\,\frac{\partial^2 x^{\nu'}}{\partial x^\mu \partial x^\lambda}\ , \qquad (3.3)$$

which of course is not the tensor transformation law; the second term on the right spoils it. The covariant derivative of a one-form takes the form

$$\nabla_\mu \omega_\nu = \partial_\mu \omega_\nu - \Gamma^\lambda_{\mu\nu}\,\omega_\lambda\ . \qquad (3.4)$$

It is then clear that the connection coefficients encode all of the information necessary to take the covariant derivative of a tensor of arbitrary rank. The formula is quite straightforward: for each upper index you introduce a term with a single +Γ, and for each lower index a term with a single −Γ:

$$\nabla_\sigma T^{\mu_1\mu_2\cdots\mu_k}{}_{\nu_1\nu_2\cdots\nu_l} = \partial_\sigma T^{\mu_1\mu_2\cdots\mu_k}{}_{\nu_1\nu_2\cdots\nu_l}$$
$$+ \Gamma^{\mu_1}_{\sigma\lambda}\, T^{\lambda\mu_2\cdots\mu_k}{}_{\nu_1\nu_2\cdots\nu_l} + \Gamma^{\mu_2}_{\sigma\lambda}\, T^{\mu_1\lambda\cdots\mu_k}{}_{\nu_1\nu_2\cdots\nu_l} + \cdots$$
$$- \Gamma^\lambda_{\sigma\nu_1}\, T^{\mu_1\mu_2\cdots\mu_k}{}_{\lambda\nu_2\cdots\nu_l} - \Gamma^\lambda_{\sigma\nu_2}\, T^{\mu_1\mu_2\cdots\mu_k}{}_{\nu_1\lambda\cdots\nu_l} - \cdots\ . \qquad (3.5)$$

This is the general expression for the covariant derivative.

It turns out that in order to define a unique connection on a manifold with a metric $g_{\mu\nu}$ we have to impose two additional properties:

• torsion-free: $\Gamma^\lambda_{\mu\nu} = \Gamma^\lambda_{(\mu\nu)}$;
• metric compatibility: $\nabla_\rho g_{\mu\nu} = 0$.

These requirements imply that we can express the connection coefficients as functions of the metric:

$$\Gamma^\sigma_{\mu\nu} = \frac{1}{2} g^{\sigma\rho}\left(\partial_\mu g_{\nu\rho} + \partial_\nu g_{\rho\mu} - \partial_\rho g_{\mu\nu}\right)\ . \qquad (3.6)$$

This connection, derived from the metric, is the one on which conventional general relativity is based. It is known as the Christoffel connection or the Levi-Civita connection.

Let us mention once again that the exterior derivative is a well-defined tensor in the absence of any connection. Moreover, if we use a symmetric (torsion-free) connection, the exterior derivative (defined to be the antisymmetrized partial derivative) happens to be equal to the antisymmetrized covariant derivative:

$$\nabla_{[\mu}\omega_{\nu]} = \partial_{[\mu}\omega_{\nu]} - \Gamma^\lambda_{[\mu\nu]}\,\omega_\lambda = \partial_{[\mu}\omega_{\nu]}\ . \qquad (3.7)$$
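Formula (3.6) is easy to transcribe into a computer algebra system. As a sketch, we apply it to the flat plane in polar coordinates — a flat metric that nevertheless has nonvanishing connection coefficients:

```python
import sympy as sp

# Connection coefficients from the metric, eq. (3.6), applied to the
# flat plane in polar coordinates: ds^2 = dr^2 + r^2 dtheta^2.
r, th = sp.symbols('r theta', positive=True)
coords = [r, th]
g = sp.diag(1, r**2)
ginv = g.inv()
dim = 2

def christoffel(sig, mu, nu):
    return sp.simplify(sp.Rational(1, 2) * sum(
        ginv[sig, rho] * (sp.diff(g[nu, rho], coords[mu])
                          + sp.diff(g[rho, mu], coords[nu])
                          - sp.diff(g[mu, nu], coords[rho]))
        for rho in range(dim)))

print(christoffel(0, 1, 1))   # Gamma^r_{theta theta} = -r
print(christoffel(1, 0, 1))   # Gamma^theta_{r theta} = 1/r
```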
The crucial difference between flat and curved spaces is that, in a curved space, the result of parallel transporting a vector from one point to another will depend on the path taken between the points. More precisely, given a curve x^µ(λ), we define the covariant derivative along the path to be the operator

D/dλ = (dx^µ/dλ) ∇_µ . (3.8)

We then define parallel transport of the tensor T along the path x^µ(λ) to be the requirement that, along the path,

(D/dλ) T^{µ1µ2···µk}_{ν1ν2···νl} ≡ (dx^σ/dλ) ∇_σ T^{µ1µ2···µk}_{ν1ν2···νl} = 0 . (3.9)

This is a well-defined tensor equation, since both the tangent vector dx^µ/dλ and the covariant derivative ∇T are tensors. It is known as the equation of parallel transport. For a vector it takes the form

(d/dλ) V^µ + Γ^µ_{σρ} (dx^σ/dλ) V^ρ = 0 . (3.10)

It is clear that the notion of parallel transport depends on the connection, and different connections lead to different answers. Since we consider a connection that is metric-compatible, the metric is always parallel transported:

(D/dλ) g_{µν} = (dx^σ/dλ) ∇_σ g_{µν} = 0 . (3.11)

It follows that the inner product of two parallel-transported vectors is preserved. Indeed, if V^µ and W^ν are parallel-transported along a curve x^σ(λ), we have

(D/dλ)(g_{µν} V^µ W^ν) = ((D/dλ) g_{µν}) V^µ W^ν + g_{µν} ((D/dλ) V^µ) W^ν + g_{µν} V^µ (D/dλ) W^ν = 0 . (3.12)

This means that parallel transport with respect to a metric-compatible connection preserves the norm of vectors, the sense of orthogonality, and so on.

Now we are going to discuss geodesics. To begin with, recall that the tangent vector to a path x^µ(λ) is dx^µ/dλ. The condition that it be parallel transported is thus

(D/dλ)(dx^µ/dλ) = 0 , (3.13)

or alternatively

d²x^µ/dλ² + Γ^µ_{ρσ} (dx^ρ/dλ)(dx^σ/dλ) = 0 . (3.14)

This is the familiar geodesic equation. It is important to stress that geodesics in general relativity are the paths followed by unaccelerated particles; we can think of the geodesic equation as the generalization of Newton's law f = ma to the case f = 0. In fact it can be shown that the equation of motion for a particle of mass m and charge q in an electromagnetic field F_{µν} takes the form

d²x^µ/dτ² + Γ^µ_{ρσ} (dx^ρ/dτ)(dx^σ/dτ) = (q/m) F^µ_ν (dx^ν/dτ) , (3.15)

with the Lorentz force playing the role of f/m on the right-hand side. An important property of geodesics in a spacetime with Lorentzian metric is that the character (timelike/null/spacelike) of the geodesic (relative to a metric-compatible connection) never changes. This follows from the fact that parallel transport preserves inner products, since the character of the curve is determined by the inner product of the tangent vector with itself. There are also null geodesics, which satisfy the same equation, except that proper time cannot be used as a parameter (some set of allowed parameters will exist, related to each other by linear transformations).

Geodesics also have an interesting application: they can be used to map the tangent space at a point p to a local neighborhood of p. To begin with, notice that any geodesic x^µ(λ) which passes through the point p can be specified by its behavior at p. We parameterize the geodesic by λ and choose the parameter to vanish at p, λ(p) = 0. Then the tangent vector at p is

dx^µ/dλ (λ = 0) = k^µ , (3.16)

where k^µ is some vector at p (an element of T_p). There will then be a point on the manifold M which lies on this geodesic where the parameter has the value λ = 1.
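Note that (3.16), together with the starting point p, is precisely the initial data for the second-order geodesic equation (3.14), so the geodesic through p with tangent k^µ can be found by solving an ordinary initial-value problem. A minimal numerical sketch on the unit two-sphere (using the connection coefficients (3.33) quoted in the next subsection, and assuming scipy is available):

    import numpy as np
    from scipy.integrate import solve_ivp

    def geodesic_rhs(lam, state):
        # state = (theta, phi, dtheta/dlam, dphi/dlam) on the unit 2-sphere.
        # Geodesic equation (3.14) with Gamma^theta_{phi phi} = -sin*cos
        # and Gamma^phi_{theta phi} = cot(theta).
        th, ph, dth, dph = state
        return [dth, dph,
                np.sin(th) * np.cos(th) * dph**2,
                -2.0 * dth * dph * np.cos(th) / np.sin(th)]

    # Start at p = (theta, phi) = (pi/2, 0) with tangent k = (0, 1);
    # the resulting geodesic should be the equator, a great circle.
    sol = solve_ivp(geodesic_rhs, [0.0, 1.0], [np.pi / 2, 0.0, 0.0, 1.0],
                    rtol=1e-10, atol=1e-12)
    print(sol.y[0, -1], sol.y[1, -1])   # theta stays at pi/2, phi reaches 1.0

The point reached at λ = 1 is exactly the point that the exponential map, defined next, assigns to the tangent vector k^µ.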
We define the exponential map at p, exp_p : T_p → M, via

exp_p(k^µ) = x^ν(λ = 1) , (3.17)

where x^ν(λ) solves the geodesic equation subject to (3.16). [Figure: the exponential map sends k^µ ∈ T_p to the point x^ν(λ = 1) ∈ M reached by the geodesic.] For tangent vectors k^µ near the zero vector, this map will be well-defined, and in fact invertible. In the neighborhood of p on which the map is valid, the tangent vectors themselves thus define a coordinate system on the manifold. In this coordinate system, any geodesic through p is expressed as

x^µ(λ) = λk^µ , (3.18)

for some appropriate vector k^µ.

Now we will study curvature, which is encoded in the Riemann tensor, a (1, 3) tensor antisymmetric in its last two indices:

R^ρ_{σµν} = −R^ρ_{σνµ} . (3.19)

The Riemann tensor measures the failure of covariant derivatives to commute. Indeed, for a vector field V^ρ we have

[∇_µ, ∇_ν] V^ρ = ∇_µ ∇_ν V^ρ − ∇_ν ∇_µ V^ρ
= ∂_µ(∇_ν V^ρ) − Γ^λ_{µν} ∇_λ V^ρ + Γ^ρ_{µσ} ∇_ν V^σ − (µ ↔ ν)
= ∂_µ ∂_ν V^ρ + (∂_µ Γ^ρ_{νσ}) V^σ + Γ^ρ_{νσ} ∂_µ V^σ − Γ^λ_{µν} ∂_λ V^ρ − Γ^λ_{µν} Γ^ρ_{λσ} V^σ + Γ^ρ_{µσ} ∂_ν V^σ + Γ^ρ_{µσ} Γ^σ_{νλ} V^λ − (µ ↔ ν)
= (∂_µ Γ^ρ_{νσ} − ∂_ν Γ^ρ_{µσ} + Γ^ρ_{µλ} Γ^λ_{νσ} − Γ^ρ_{νλ} Γ^λ_{µσ}) V^σ − 2Γ^λ_{[µν]} ∇_λ V^ρ . (3.20)

The coefficient of the last term is simply the torsion tensor, and hence we write

[∇_µ, ∇_ν] V^ρ = R^ρ_{σµν} V^σ − T_{µν}^λ ∇_λ V^ρ , (3.21)

where the Riemann and torsion tensors are identified as

R^ρ_{σµν} = ∂_µ Γ^ρ_{νσ} − ∂_ν Γ^ρ_{µσ} + Γ^ρ_{µλ} Γ^λ_{νσ} − Γ^ρ_{νλ} Γ^λ_{µσ} (3.22)

and

T^λ_{µν} = Γ^λ_{µν} − Γ^λ_{νµ} . (3.23)

In GR we are mainly interested in the Christoffel connection. In this case the connection is derived from the metric, and the associated curvature may be thought of as that of the metric itself. Let us show this in the following way. If we are in some coordinate system such that ∂_σ g_{µν} = 0 (everywhere, not just at a point), then Γ^ρ_{µν} = 0 and ∂_σ Γ^ρ_{µν} = 0; thus R^ρ_{σµν} = 0 by (3.22). Since this is a tensor equation, if it is true in one coordinate system it must be true in any coordinate system. Therefore, the vanishing of the Riemann tensor is a necessary condition for it to be possible to find coordinates in which the components of g_{µν} are constant everywhere.

Note that the Riemann tensor obeys the Bianchi identity:

∇_[λ R_{ρσ]µν} = 0 . (3.24)

This identity is closely related to the Jacobi identity, since it basically expresses

[[∇_λ, ∇_ρ], ∇_σ] + [[∇_ρ, ∇_σ], ∇_λ] + [[∇_σ, ∇_λ], ∇_ρ] = 0 . (3.25)

It is frequently useful to consider contractions of the Riemann tensor. First we form the contraction known as the Ricci tensor:

R_{µν} = R^λ_{µλν} . (3.26)

The Ricci tensor associated with the Christoffel connection is symmetric,

R_{µν} = R_{νµ} , (3.27)

as a consequence of the various symmetries of the Riemann tensor. Using the metric, we can take a further contraction to form the Ricci scalar:

R = R^µ_µ = g^{µν} R_{µν} . (3.28)

Another identity related to the Riemann tensor is

∇^µ R_{ρµ} = (1/2) ∇_ρ R . (3.29)

Let us define the Einstein tensor as

G_{µν} = R_{µν} − (1/2) R g_{µν} , (3.30)

which obeys

∇^µ G_{µν} = 0 . (3.31)

The Einstein tensor is very important in general relativity.

Let us illustrate the main ideas and notation on a simple example: the two-sphere, with metric

ds² = a² (dθ² + sin²θ dφ²) , (3.32)

where a is the radius of the sphere (thought of as embedded in R³). It is a simple exercise to calculate the connection coefficients from the metric above; we obtain

Γ^θ_{φφ} = −sin θ cos θ , Γ^φ_{θφ} = Γ^φ_{φθ} = cot θ , (3.33)

and also

R^θ_{φθφ} = ∂_θ Γ^θ_{φφ} − ∂_φ Γ^θ_{θφ} + Γ^θ_{θλ} Γ^λ_{φφ} − Γ^θ_{φλ} Γ^λ_{θφ}
= (sin²θ − cos²θ) − (0) + (0) − (−sin θ cos θ)(cot θ)
= sin²θ . (3.34)
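This hand computation, together with the contractions performed next, can be verified symbolically. A minimal sympy sketch for the metric (3.32) (the helper functions are ours, written directly from (3.6), (3.22), (3.26) and (3.28)):

    import sympy as sp

    theta, phi, a = sp.symbols('theta phi a', positive=True)
    coords = [theta, phi]
    dim = 2
    g = sp.Matrix([[a**2, 0], [0, a**2 * sp.sin(theta)**2]])   # eq. (3.32)
    ginv = g.inv()

    def christoffel(s, m, n):
        # Eq. (3.6).
        return sp.simplify(sum(ginv[s, p] * (sp.diff(g[n, p], coords[m])
                                             + sp.diff(g[p, m], coords[n])
                                             - sp.diff(g[m, n], coords[p]))
                               for p in range(dim)) / 2)

    G = [[[christoffel(s, m, n) for n in range(dim)]
          for m in range(dim)] for s in range(dim)]

    def riemann(r, s, m, n):
        # Eq. (3.22): R^r_{smn}.
        return sp.simplify(sp.diff(G[r][n][s], coords[m])
                           - sp.diff(G[r][m][s], coords[n])
                           + sum(G[r][m][l] * G[l][n][s]
                                 - G[r][n][l] * G[l][m][s]
                                 for l in range(dim)))

    # Ricci tensor (3.26) and Ricci scalar (3.28).
    Ric = sp.Matrix(dim, dim, lambda m, n:
                    sp.simplify(sum(riemann(l, m, l, n) for l in range(dim))))
    R = sp.simplify(sum(ginv[m, n] * Ric[m, n]
                        for m in range(dim) for n in range(dim)))
    print(riemann(0, 1, 0, 1))   # sin(theta)**2, as in (3.34)
    print(R)                     # 2/a**2, as in (3.36) below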
It is easy to check that all of the components of the Riemann tensor either vanish or are related to this one by symmetry. Computing the Ricci tensor, we obtain

R_{θθ} = g^{φφ} R_{φθφθ} = 1 ,
R_{θφ} = R_{φθ} = 0 ,
R_{φφ} = g^{θθ} R_{θφθφ} = sin²θ . (3.35)

Consequently the Ricci scalar is

R = g^{θθ} R_{θθ} + g^{φφ} R_{φφ} = 2/a² . (3.36)

We see that the Ricci scalar, which for a two-dimensional manifold completely characterizes the curvature, is constant over the two-sphere. This reflects the fact that the two-sphere is a maximally symmetric manifold². In any number of dimensions, the curvature of a maximally symmetric space satisfies (for some constant a)

R_{ρσµν} = a^{−2} (g_{ρµ} g_{σν} − g_{ρν} g_{σµ}) . (3.37)

The two-sphere is an example of a "positively curved" space, one where the Ricci scalar is positive; the figure below illustrates the contrast with a "negatively curved" space.

² The precise definition of this notion was given at the end of the previous section.

[Figure: a surface of positive curvature (sphere-like) next to a surface of negative curvature (saddle-like).]

4. Gravitation

The main idea of the general theory of relativity is that spacetime should be described as a curved manifold. In other words, Einstein's famous insight is that gravity is a manifestation of spacetime curvature. Let us again introduce the Einstein tensor

G_{µν} = R_{µν} − (1/2) R g_{µν} , (4.1)

which always obeys ∇^µ G_{µν} = 0. The Einstein equation then takes the form

G_{µν} = κ T_{µν} . (4.2)

Note that the right-hand side is a covariant expression of the energy and momentum density in the form of a symmetric and conserved (0, 2) tensor, while the left-hand side is a symmetric and conserved (0, 2) tensor constructed from the metric and its first and second derivatives. Contracting both sides of (4.2) with g^{µν} gives, in four dimensions, R − (4/2)R = κT, i.e.

R = −κT , (4.3)

and using this we can rewrite (4.2) as

R_{µν} = κ (T_{µν} − (1/2) T g_{µν}) . (4.4)

This is the same equation, just written slightly differently. Einstein's equations may be thought of as second-order differential equations for the metric tensor field g_{µν}. There are ten independent equations (since both sides are symmetric two-index tensors), which seems to be exactly right for the ten unknown functions of the metric components. However, the Bianchi identity ∇^µ G_{µν} = 0 represents four constraints on the functions R_{µν}, so there are only six truly independent equations in (4.4). In fact this is appropriate: if a metric is a solution to Einstein's equation in one coordinate system x^µ, it should also be a solution in any other coordinate system x^{µ'}. This means that there are four unphysical degrees of freedom in g_{µν} (represented by the four functions x^{µ'}(x^µ)), and we should expect Einstein's equations to constrain only the six coordinate-independent degrees of freedom. It is important to stress that, as differential equations, these are extremely complicated: the Ricci scalar and tensor are contractions of the Riemann tensor, which involves derivatives and products of the Christoffel symbols, which in turn involve the inverse metric and derivatives of the metric. Furthermore, the energy-momentum tensor T_{µν} will generally involve the metric as well. The equations are also nonlinear, so that two known solutions cannot be superposed to find a third. It is therefore very difficult to solve Einstein's equations in any sort of generality, and in order to solve them we have to make some simplifying assumptions.
The most popular sort of simplifying assumption is that the metric has a significant degree of symmetry, and we will talk later about how symmetries of the metric make life easier. Now we demonstrate how Einstein's equations can be derived from an action principle. The action should be the integral over spacetime of a Lagrange density ("Lagrangian" for short, although strictly speaking the Lagrangian is the integral over space of the Lagrange density):

S_H = ∫ dⁿx L_H . (4.5)

The Lagrange density is a tensor density, which can be written as √−g times a scalar. What scalars can we make out of the metric? Since we know that the metric can be set equal to its canonical form and its first derivatives set to zero at any one point, any nontrivial scalar must involve at least second derivatives of the metric. The Riemann tensor is of course made from second derivatives of the metric, and we argued earlier that the only independent scalar we could construct from the Riemann tensor was the Ricci scalar R. What we did not show, but is nevertheless true, is that any nontrivial tensor made from the metric and its first and second derivatives can be expressed in terms of the metric and the Riemann tensor. Therefore, the only independent scalar constructed from the metric which is no higher than second order in its derivatives is the Ricci scalar. Hilbert reasoned that this was therefore the simplest possible choice for a Lagrangian, and proposed

L_H = √−g R . (4.6)

The equations of motion should come from varying the action with respect to the metric. In fact let us consider variations with respect to the inverse metric g^{µν}, which are slightly easier but give an equivalent set of equations. Using R = g^{µν} R_{µν}, in general we will have

δS = ∫ dⁿx [√−g g^{µν} δR_{µν} + √−g R_{µν} δg^{µν} + R δ√−g] = (δS)₁ + (δS)₂ + (δS)₃ . (4.7)

The second term (δS)₂ is already in the form of some expression times δg^{µν}; let us examine the others more closely. Recall that the Ricci tensor is the contraction of the Riemann tensor, which is given by

R^ρ_{µλν} = ∂_λ Γ^ρ_{νµ} + Γ^ρ_{λσ} Γ^σ_{νµ} − (λ ↔ ν) . (4.8)

We perform the variation of the Riemann tensor by first varying the connection coefficients and then substituting into this expression. Since δΓ^ρ_{νµ}, being the difference of two connections, is a tensor, after some calculation we find the variation of the Riemann tensor in the form

δR^ρ_{µλν} = ∇_λ(δΓ^ρ_{νµ}) − ∇_ν(δΓ^ρ_{λµ}) . (4.9)

Therefore, the contribution of the first term in (4.7) to δS can be written

(δS)₁ = ∫ dⁿx √−g g^{µν} [∇_λ(δΓ^λ_{νµ}) − ∇_ν(δΓ^λ_{λµ})] = ∫ dⁿx √−g ∇_σ [g^{µν}(δΓ^σ_{µν}) − g^{µσ}(δΓ^λ_{λµ})] , (4.10)

where we have used metric compatibility and relabeled dummy indices. The integral above is the integral, with respect to the natural volume element, of the covariant divergence of a vector; by Stokes's theorem this is equal to a boundary contribution at infinity, which we can set to zero by making the variation vanish at infinity. Therefore this term does not contribute to the total variation. In order to calculate the (δS)₃ term we use the variation of the determinant (Jacobi's formula δg = −g g_{µν} δg^{µν} gives)

δ(g^{−1}) = (1/g) g_{µν} δg^{µν} , (4.11)

and consequently

δ√−g = −(1/2) √−g g_{µν} δg^{µν} . (4.12)

Returning to (4.7), and remembering that (δS)₁ does not contribute, we find

δS = ∫ dⁿx √−g [R_{µν} − (1/2) R g_{µν}] δg^{µν} . (4.13)

This should vanish for arbitrary variations, and consequently we derive Einstein's equations in vacuum:

(1/√−g) δS/δg^{µν} = R_{µν} − (1/2) R g_{µν} = 0 . (4.14)

We would, however, also like to obtain the non-vacuum field equations.
In other words, we consider an action of the form

S = (1/8πG) S_H + S_M , (4.15)

where S_M is the action for matter, and we have presciently normalized the gravitational action (although the proper normalization is somewhat convention-dependent). Following through the same procedure as above leads to

(1/√−g) δS/δg^{µν} = (1/8πG) [R_{µν} − (1/2) R g_{µν}] + (1/√−g) δS_M/δg^{µν} = 0 , (4.16)

and we recover Einstein's equations if we set

T_{µν} = −(1/√−g) δS_M/δg^{µν} . (4.17)

In fact (4.17) turns out to be the best way to define a symmetric energy-momentum tensor. We are mainly interested in the existence of solutions to Einstein's equations in the presence of "realistic" sources of energy and momentum. The most common property demanded of T_{µν} is that it represent positive energy densities: no negative masses are allowed. In a locally inertial frame this requirement can be written as ρ = T_{00} ≥ 0. In coordinate-independent notation it reads

T_{µν} V^µ V^ν ≥ 0 , for all timelike vectors V^µ . (4.18)

This is known as the Weak Energy Condition, or WEC. It seems like a reasonable requirement, but it is in fact very restrictive: it is straightforward to write down classical field theories which violate the WEC, and almost impossible to invent a quantum field theory which obeys it. Nevertheless, it is legitimate to assume that the WEC holds in most cases and is violated only under extreme conditions. (There are also stronger energy conditions, but they are even less true than the WEC, and we won't dwell on them.)

We continue the study of Einstein's equations by discussing the possibility of introducing a cosmological constant, which we add to the conventional Hilbert action. We therefore consider an action given by

S = ∫ dⁿx √−g (R − 2Λ) , (4.19)

where Λ is some constant. The resulting field equations are

R_{µν} − (1/2) R g_{µν} + Λ g_{µν} = 0 , (4.20)

and of course there would be an energy-momentum tensor on the right-hand side if we had included an action for matter. Λ is the cosmological constant. In order to find its meaning it is convenient to move the additional term in (4.20) to the right-hand side and think of it as a kind of energy-momentum tensor, with T_{µν} = −Λ g_{µν} (it is automatically conserved by metric compatibility). Then Λ can be interpreted as the "energy density of the vacuum," a source of energy and momentum that is present even in the absence of matter fields. This interpretation is important because quantum field theory predicts that the vacuum should have some sort of energy and momentum. In ordinary quantum mechanics, a harmonic oscillator with frequency ω and minimum classical energy E₀ = 0 acquires, upon quantization, a ground state with energy E₀ = (1/2)ℏω. A quantized field can be thought of as a collection of an infinite number of harmonic oscillators, and each mode contributes to the ground state energy. The result is of course infinite, and must be appropriately regularized, for example by introducing a cutoff at high frequencies. The final vacuum energy, which is the regularized sum of the energies of the ground state oscillations of all the fields of the theory, has no good reason to be zero and in fact would be expected to have a natural scale

Λ ∼ m_P⁴ , (4.21)

where the Planck mass m_P is approximately 10^19 GeV, or 10^−5 grams. Observations of the universe on large scales allow us to constrain the actual value of Λ, which turns out to be smaller than (4.21) by at least a factor of 10^120.
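A rough back-of-the-envelope version of this comparison (a sketch: the observed vacuum energy scale of order 10^−3 eV is an assumed round number, and all factors of order unity are dropped):

    import math

    # Energy densities in natural units scale as (energy scale)^4.
    m_planck = 1e19    # Planck mass, ~10^19 GeV, as in (4.21)
    lam_obs = 1e-12    # observed vacuum scale, ~10^-3 eV = 10^-12 GeV (assumed)

    ratio = (m_planck / lam_obs) ** 4
    print(f"theory/observation ~ 10^{math.log10(ratio):.0f}")   # ~ 10^124

With these round numbers the mismatch comes out near 10^124; more careful choices of the scales involved give values in the 10^120 range, which is why the discrepancy is usually quoted as at least 10^120.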
This is the largest known discrepancy between a theoretical estimate and an observational constraint in physics, and it convinces many people that the "cosmological constant problem" is one of the most important unsolved problems today. On the other hand, the observations do not tell us that Λ is strictly zero, and in fact they allow values that can have important consequences for the evolution of the universe.