Brief Introduction to Differential Geometry and General Relativity for Cosmologists

Abstract: This is review material on differential geometry and general relativity for students of the lecture course Cosmology.

Contents

1. Definition of basic concepts
2. Manifolds
3. Curvature
4. Gravitation

1. Definition of basic concepts

This material is devoted to the introduction of the basic ideas and concepts that are needed for an understanding of Cosmology. We begin with the concept of a vector in general relativity. We assign to each point p in space-time the set of all possible vectors located at that point; this set is known as the tangent space at p, or $T_p$. We have to emphasize that these vectors are located at a single point. (Mathematically, the set of all the tangent spaces of a manifold M is called the tangent bundle, $T(M)$.)

[Figure: the tangent space $T_p$ attached to a point p of the manifold M.]

In other words we think of $T_p$ as an abstract vector space for each point in space-time. A (real) vector space is a collection of objects ("vectors") which, roughly speaking, can be added together and multiplied by real numbers in a linear way. Thus, for any two vectors V and W and real numbers a and b, we have

$$(a+b)(V+W) = aV + bV + aW + bW\ . \qquad (1.1)$$

Every vector space has an origin, i.e. a zero vector which functions as an identity element under vector addition. It is important to stress that a vector is a well-defined geometric object, as is a vector field, defined as a set of vectors with exactly one at each point in space-time. However, in most physical applications it is useful to decompose vectors into components with respect to some set of basis vectors. Recall that a basis is any set of vectors which both spans the vector space (any vector is a linear combination of basis vectors) and is linearly independent (no vector in the basis is a linear combination of the other basis vectors). For any given vector space there will be an infinite number of legitimate bases, but each basis will consist of the same number of vectors, known as the dimension of the space. Let us presume that at each tangent space we set up a basis of four vectors $e_\mu$, with $\mu \in \{0, 1, 2, 3\}$ as usual. Then any abstract vector A can be written as a linear combination of basis vectors:

$$A = A^\mu e_\mu\ . \qquad (1.2)$$

The coefficients $A^\mu$ are the components of the vector A. The vector itself is an abstract geometrical entity, while the components are just the coefficients of the basis vectors in some convenient basis. A standard example of a vector in space-time is the tangent vector to a curve. A parameterized curve or path through space-time is specified by the coordinates as a function of the parameter, $x^\mu(\lambda)$. The tangent vector $V(\lambda)$ has components

$$V^\mu = \frac{dx^\mu}{d\lambda}\ . \qquad (1.3)$$

Then we denote the entire vector as $V = V^\mu e_\mu$.

Once we have set up the tangent space we can define an associated vector space known as the dual vector space. The dual space is usually denoted by an asterisk, so that the dual space to the tangent space $T_p$ is called the cotangent space and denoted $T_p^*$. The dual space is the space of all linear maps from the original vector space to the real numbers. In other words, if $\omega \in T_p^*$ is a dual vector, then it acts as a map $\omega: T_p \to \mathbb{R}$ such that

$$\omega(aV + bW) = a\,\omega(V) + b\,\omega(W) \in \mathbb{R}\ , \qquad (1.4)$$

where V, W are vectors and a, b are real numbers. These maps form a vector space themselves; thus, if ω and η are dual vectors, we have

$$(a\omega + b\eta)(V) = a\,\omega(V) + b\,\eta(V)\ . \qquad (1.5)$$
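As a quick computational illustration of the tangent vector (1.3) (not part of the original derivation), here is a minimal sympy sketch; the curve used is an invented example:

```python
import sympy as sp

# Tangent vector to a parameterized curve x^mu(lambda), as in (1.3).
# The curve chosen here is purely illustrative.
lam = sp.symbols('lambda')
x = [lam, sp.cos(lam), sp.sin(lam), 0]     # x^mu(lambda), mu = 0..3

V = [sp.diff(comp, lam) for comp in x]     # V^mu = dx^mu / dlambda
print(V)   # [1, -sin(lambda), cos(lambda), 0]
```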
To proceed we introduce a set of basis dual vectors $\theta^\nu$ by demanding

$$\theta^\nu(e_\mu) = \delta^\nu_\mu\ . \qquad (1.6)$$

Then every dual vector can be written in terms of its components, which we label with lower indices:

$$\omega = \omega_\mu \theta^\mu\ . \qquad (1.7)$$

In an alternative terminology, elements of $T_p$ (what we have called vectors) are referred to as contravariant vectors, and elements of $T_p^*$ (what we have called dual vectors) are referred to as covariant vectors. Another name for dual vectors is one-forms, a somewhat mysterious designation which will become clearer soon.

The component notation leads to a simple way of writing the action of a dual vector on a vector:

$$\omega(V) = \omega_\mu V^\nu\,\theta^\mu(e_\nu) = \omega_\mu V^\nu \delta^\mu_\nu = \omega_\mu V^\mu \in \mathbb{R}\ . \qquad (1.8)$$

The form of (1.8) also suggests that we can think of vectors as linear maps on dual vectors, by defining

$$V(\omega) \equiv \omega(V) = \omega_\mu V^\mu\ . \qquad (1.9)$$

Therefore, the dual space to the dual vector space is the original vector space itself. (The set of all cotangent spaces over M is the cotangent bundle, $T^*(M)$.) The action of a dual vector field on a vector field is then not a single number, but a scalar (or just "function") on space-time. In space-time the simplest example of a dual vector is the gradient of a scalar function, the set of partial derivatives with respect to the space-time coordinates, which we denote by "d":

$$\mathrm{d}\phi = \frac{\partial\phi}{\partial x^\mu}\,\theta^\mu\ . \qquad (1.10)$$

A straightforward generalization of vectors and dual vectors is the notion of a tensor. Just as a dual vector is a linear map from vectors to $\mathbb{R}$, a tensor T of type (or rank) (k, l) is a multilinear map from a collection of dual vectors and vectors to $\mathbb{R}$:

$$T: \underbrace{T_p^* \times \cdots \times T_p^*}_{k\ \text{times}} \times \underbrace{T_p \times \cdots \times T_p}_{l\ \text{times}} \to \mathbb{R}\ . \qquad (1.11)$$

Here "×" denotes the Cartesian product, so that for example $T_p \times T_p$ is the space of ordered pairs of vectors. Multilinearity means that the tensor acts linearly in each of its arguments; for instance, for a tensor of type (1, 1), we have

$$T(a\omega + b\eta,\, cV + dW) = ac\,T(\omega, V) + ad\,T(\omega, W) + bc\,T(\eta, V) + bd\,T(\eta, W)\ . \qquad (1.12)$$

From this point of view, a scalar is a type (0, 0) tensor, a vector is a type (1, 0) tensor, and a dual vector is a type (0, 1) tensor.

The space of all tensors of a fixed type (k, l) forms a vector space; they can be added together and multiplied by real numbers. In order to construct a basis for this space, we need to define a new operation known as the tensor product, denoted by ⊗. If T is a (k, l) tensor and S is an (m, n) tensor, we define a (k + m, l + n) tensor T ⊗ S by

$$T \otimes S(\omega^{(1)}, \ldots, \omega^{(k)}, \ldots, \omega^{(k+m)}, V^{(1)}, \ldots, V^{(l)}, \ldots, V^{(l+n)})$$
$$= T(\omega^{(1)}, \ldots, \omega^{(k)}, V^{(1)}, \ldots, V^{(l)})\, S(\omega^{(k+1)}, \ldots, \omega^{(k+m)}, V^{(l+1)}, \ldots, V^{(l+n)})\ . \qquad (1.13)$$

In other words, first act T on the appropriate set of dual vectors and vectors, then act S on the remainder, and then multiply the answers. Note that, in general, $T \otimes S \neq S \otimes T$. Using these rules it is straightforward to construct a basis for the space of all (k, l) tensors: we simply take tensor products of basis vectors and dual vectors. This basis consists of all tensors of the form

$$e_{\mu_1} \otimes \cdots \otimes e_{\mu_k} \otimes \theta^{\nu_1} \otimes \cdots \otimes \theta^{\nu_l}\ . \qquad (1.14)$$

In component notation we then write an arbitrary tensor as

$$T = T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}\; e_{\mu_1} \otimes \cdots \otimes e_{\mu_k} \otimes \theta^{\nu_1} \otimes \cdots \otimes \theta^{\nu_l}\ . \qquad (1.15)$$

The order of the indices is obviously important, since the tensor need not act in the same way on its various arguments.
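The component formulas above are easy to experiment with numerically. The following sketch, with invented component values, evaluates the contraction (1.8) and a simple tensor product of two vectors as in (1.13):

```python
import numpy as np

# Index gymnastics in components, with illustrative numerical values.
omega = np.array([0.5, 0.0, 3.0, 2.0])   # omega_mu
V = np.array([1.0, 2.0, 0.0, -1.0])      # V^mu
W = np.array([0.0, 1.0, 1.0, 0.0])       # W^mu

print(np.einsum('m,m->', omega, V))      # omega_mu V^mu, a real number (1.8)
T = np.einsum('m,n->mn', V, W)           # (V ⊗ W)^{mu nu}, a (2,0) tensor
print(np.allclose(T, np.outer(V, W)))    # the tensor product of two vectors
                                         # is just the outer product
```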
Now let's turn to some examples of tensors. The most familiar example of a (0, 2) tensor in flat Minkowski space-time is the metric, $\eta_{\mu\nu}$. The action of the metric on two vectors is so useful that it gets its own name, the inner product (or dot product):

$$\eta(V, W) = \eta_{\mu\nu} V^\mu W^\nu = V \cdot W\ . \qquad (1.16)$$

The norm of a vector is defined to be the inner product of the vector with itself; unlike in Euclidean space, this number is not positive definite. When $\eta_{\mu\nu} V^\mu V^\nu < 0$ we call $V^\mu$ time-like; for $\eta_{\mu\nu} V^\mu V^\nu = 0$, $V^\mu$ is null or light-like; and for $\eta_{\mu\nu} V^\mu V^\nu > 0$ we call $V^\mu$ space-like.

Another tensor is the Kronecker delta $\delta^\mu_\nu$, of type (1, 1), whose components you already know. Related to this and the metric is the inverse metric $\eta^{\mu\nu}$, a type (2, 0) tensor defined as the inverse of the metric:

$$\eta^{\mu\nu}\eta_{\nu\rho} = \eta_{\rho\nu}\eta^{\nu\mu} = \delta^\mu_\rho\ . \qquad (1.17)$$

There is also the Levi-Civita tensor, a (0, 4) tensor:

$$\epsilon_{\mu\nu\rho\sigma} = \begin{cases} +1 & \text{if } \mu\nu\rho\sigma \text{ is an even permutation of } 0123\ , \\ -1 & \text{if } \mu\nu\rho\sigma \text{ is an odd permutation of } 0123\ , \\ 0 & \text{otherwise}\ . \end{cases} \qquad (1.18)$$

Here, a "permutation of 0123" is an ordering of the numbers 0, 1, 2, 3 which can be obtained by starting with 0123 and exchanging pairs of digits; an even permutation is obtained by an even number of such exchanges, and an odd permutation by an odd number. Thus, for example, $\epsilon_{0321} = -1$.

With some examples in hand we can now be a little more systematic about some properties of tensors. First consider the operation of contraction, which turns a (k, l) tensor into a (k − 1, l − 1) tensor. Contraction is defined as the sum over one upper and one lower index:

$$S^{\mu\rho}{}_\sigma = T^{\mu\nu\rho}{}_{\sigma\nu}\ . \qquad (1.19)$$

It is important to stress that we can only contract an upper index with a lower index (as opposed to two indices of the same type). It is also important that the order of the indices matters, so that you can get different tensors by contracting in different ways; thus,

$$T^{\mu\nu\rho}{}_{\sigma\nu} \neq T^{\mu\rho\nu}{}_{\sigma\nu} \qquad (1.20)$$

in general.

The metric and inverse metric can be used to raise and lower indices on tensors. That is, given a tensor $T^{\alpha\beta}{}_{\gamma\delta}$, we can use the metric to define new tensors which we choose to denote by the same letter T:

$$T^{\alpha\beta\mu}{}_\delta = \eta^{\mu\gamma}\, T^{\alpha\beta}{}_{\gamma\delta}\ , \qquad T_\mu{}^\beta{}_{\gamma\delta} = \eta_{\mu\alpha}\, T^{\alpha\beta}{}_{\gamma\delta}\ , \qquad T_{\mu\nu}{}^{\rho\sigma} = \eta_{\mu\alpha}\eta_{\nu\beta}\eta^{\rho\gamma}\eta^{\sigma\delta}\, T^{\alpha\beta}{}_{\gamma\delta}\ . \qquad (1.21)$$

Again, it is important that summing does not change the position of an index relative to the other indices, and also that "free" indices (which are not summed over) must be the same on both sides of an equation, while "dummy" indices (which are summed over) only appear on one side. As an example, we can turn vectors and dual vectors into each other by raising and lowering indices:

$$V_\mu = \eta_{\mu\nu} V^\nu\ , \qquad \omega^\mu = \eta^{\mu\nu}\omega_\nu\ . \qquad (1.22)$$

Further, we refer to a tensor as symmetric in any of its indices if it is unchanged under exchange of those indices. Thus, if

$$S_{\mu\nu\rho} = S_{\nu\mu\rho}\ , \qquad (1.23)$$

we say that $S_{\mu\nu\rho}$ is symmetric in its first two indices, while if

$$S_{\mu\nu\rho} = S_{\mu\rho\nu} = S_{\rho\mu\nu} = S_{\nu\mu\rho} = S_{\nu\rho\mu} = S_{\rho\nu\mu}\ , \qquad (1.24)$$

we say that $S_{\mu\nu\rho}$ is symmetric in all three of its indices. Similarly, a tensor is antisymmetric (or "skew-symmetric") in any of its indices if it changes sign when those indices are exchanged; thus,

$$A_{\mu\nu\rho} = -A_{\rho\nu\mu} \qquad (1.25)$$

means that $A_{\mu\nu\rho}$ is antisymmetric in its first and third indices (or just "antisymmetric in µ and ρ"). If a tensor is (anti-)symmetric in all of its indices, we refer to it as simply (anti-)symmetric (sometimes with the redundant modifier "completely"). As examples, the metric $\eta_{\mu\nu}$ and the inverse metric $\eta^{\mu\nu}$ are symmetric, while the Levi-Civita tensor $\epsilon_{\mu\nu\rho\sigma}$ and the electromagnetic field strength tensor $F_{\mu\nu}$ are antisymmetric.
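Several of these operations fit into a short numerical sketch (component values invented for illustration): lowering an index with $\eta_{\mu\nu}$ as in (1.22), and classifying a vector by the sign of its norm:

```python
import numpy as np

# Raising/lowering with the Minkowski metric (signature -+++) and
# classifying a vector by the sign of eta_{mu nu} V^mu V^nu.
eta = np.diag([-1.0, 1.0, 1.0, 1.0])      # eta_{mu nu}

def classify(V):
    norm = np.einsum('mn,m,n->', eta, V, V)
    return 'timelike' if norm < 0 else ('null' if norm == 0 else 'spacelike')

V = np.array([2.0, 1.0, 0.0, 0.0])        # an illustrative vector
V_lower = eta @ V                          # V_mu = eta_{mu nu} V^nu, eq. (1.22)
print(V_lower, classify(V))                # [-2.  1.  0.  0.] timelike
```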
Given any tensor, we can symmetrize (or antisymmetrize) any number of its upper or lower indices. Symmetrization is defined as the sum over all permutations of the relevant indices, divided by the number of terms:

$$T_{(\mu_1\mu_2\cdots\mu_n)\rho}{}^\sigma = \frac{1}{n!}\left(T_{\mu_1\mu_2\cdots\mu_n\rho}{}^\sigma + \text{sum over permutations of indices } \mu_1\cdots\mu_n\right)\ , \qquad (1.26)$$

while antisymmetrization comes from the alternating sum:

$$T_{[\mu_1\mu_2\cdots\mu_n]\rho}{}^\sigma = \frac{1}{n!}\left(T_{\mu_1\mu_2\cdots\mu_n\rho}{}^\sigma + \text{alternating sum over permutations of indices } \mu_1\cdots\mu_n\right)\ , \qquad (1.27)$$

where by "alternating sum" we mean that permutations which are the result of an odd number of exchanges are given a minus sign; thus

$$T_{[\mu\nu\rho]\sigma} = \frac{1}{6}\left(T_{\mu\nu\rho\sigma} - T_{\mu\rho\nu\sigma} + T_{\rho\mu\nu\sigma} - T_{\nu\mu\rho\sigma} + T_{\nu\rho\mu\sigma} - T_{\rho\nu\mu\sigma}\right)\ . \qquad (1.28)$$

Notice that round/square brackets denote symmetrization/antisymmetrization.

There is a special class of tensors that play an important role in physics. These tensors are known as differential forms (or just "forms"). A differential p-form is a (0, p) tensor which is completely antisymmetric. Thus, scalars are automatically 0-forms, and dual vectors are automatically one-forms. We also have the 2-form $F_{\mu\nu}$ and the 4-form $\epsilon_{\mu\nu\rho\sigma}$. The space of all p-forms is denoted $\Lambda^p$, and the space of all p-form fields over a manifold M is denoted $\Lambda^p(M)$. The number of linearly independent p-forms on an n-dimensional vector space is $n!/(p!(n-p)!)$. So at a point on a 4-dimensional space-time there is one linearly independent 0-form, four 1-forms, six 2-forms, four 3-forms, and one 4-form. There are no p-forms for p > n, since all of the components will automatically be zero by antisymmetry.

Given a p-form A and a q-form B, we can form a (p + q)-form known as the wedge product A ∧ B by taking the antisymmetrized tensor product:

$$(A \wedge B)_{\mu_1\cdots\mu_{p+q}} = \frac{(p+q)!}{p!\,q!}\, A_{[\mu_1\cdots\mu_p} B_{\mu_{p+1}\cdots\mu_{p+q}]}\ . \qquad (1.29)$$

For example the wedge product of two 1-forms is

$$(A \wedge B)_{\mu\nu} = 2A_{[\mu}B_{\nu]} = A_\mu B_\nu - A_\nu B_\mu\ . \qquad (1.30)$$

Using this definition we obtain an important relation,

$$A \wedge B = (-1)^{pq}\, B \wedge A\ . \qquad (1.31)$$

The exterior derivative "d" allows us to differentiate p-form fields to obtain (p + 1)-form fields. It is defined as an appropriately normalized antisymmetrized partial derivative:

$$(\mathrm{d}A)_{\mu_1\cdots\mu_{p+1}} = (p+1)\,\partial_{[\mu_1} A_{\mu_2\cdots\mu_{p+1}]}\ . \qquad (1.32)$$

The reason why the exterior derivative deserves special attention is that it is a tensor, even in curved space-times, unlike its cousin the partial derivative. Another interesting fact about exterior differentiation is that, for any form A,

$$\mathrm{d}(\mathrm{d}A) = 0\ , \qquad (1.33)$$

which is often written $\mathrm{d}^2 = 0$. This identity is a consequence of the definition of d and the fact that partial derivatives commute, $\partial_\alpha\partial_\beta = \partial_\beta\partial_\alpha$ (acting on anything).

Let us now introduce another operation on differential forms known as Hodge duality. We define the "Hodge star operator" on an n-dimensional manifold as a map from p-forms to (n − p)-forms,

$$(*A)_{\mu_1\cdots\mu_{n-p}} = \frac{1}{p!}\,\epsilon^{\nu_1\cdots\nu_p}{}_{\mu_1\cdots\mu_{n-p}}\, A_{\nu_1\cdots\nu_p}\ , \qquad (1.34)$$

mapping A to "A dual". Unlike our other operations on forms, the Hodge dual depends on the metric of the manifold (which should be obvious, since we had to raise some indices on the Levi-Civita tensor in order to define (1.34)). Applying the Hodge star twice returns either plus or minus the original form:

$$**A = (-1)^{s+p(n-p)}\, A\ , \qquad (1.35)$$

where s is the number of minus signs among the eigenvalues of the metric (for Minkowski space, s = 1). It is important to stress that Hodge duality is only defined on a manifold equipped with a metric structure.
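As a small check of the form algebra, the following sketch builds the wedge product of two one-forms, (1.30), from invented components and confirms that the result is antisymmetric:

```python
import numpy as np

# Wedge product (1.30) of two one-forms with illustrative components:
# (A ^ B)_{mu nu} = A_mu B_nu - A_nu B_mu.
A = np.array([1.0, 0.0, 2.0, 0.0])
B = np.array([0.0, 1.0, 0.0, 3.0])

AB = np.einsum('m,n->mn', A, B) - np.einsum('n,m->mn', A, B)
print(np.allclose(AB, -AB.T))   # True: completely antisymmetric, as a 2-form must be
```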
Let us now give a brief review of physics in Minkowski space-time that will be useful in what follows. Consider the world-line of a single particle. This is specified by a map $\mathbb{R} \to M$, where M is the manifold representing space-time; we usually think of the path as a parameterized curve $x^\mu(\lambda)$. Note that the tangent vector to this path is $dx^\mu/d\lambda$ (which depends on the parameterization). The path is characterized by the norm of its tangent vector: if the tangent vector is timelike/null/spacelike at some parameter value λ, we say that the path is timelike/null/spacelike at that point.

[Figure: timelike, null, and spacelike directions in the (t, x) plane, together with the tangent vector $dx^\mu/d\lambda$ of a curve $x^\mu(\lambda)$.]

A fundamental object of the theory is the line element, or infinitesimal interval:

$$ds^2 = \eta_{\mu\nu}\, dx^\mu dx^\nu\ . \qquad (1.36)$$

From this definition it is tempting to take the square root and integrate along a path to obtain a finite interval. But since $ds^2$ need not be positive, we define different procedures for different cases. For space-like paths we define the path length

$$\Delta s = \int \sqrt{\eta_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}}\; d\lambda\ , \qquad (1.37)$$

where the integral is taken over the path. For null paths the interval is zero, so no extra formula is required. For time-like paths we define the proper time

$$\Delta\tau = \int \sqrt{-\eta_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}}\; d\lambda\ , \qquad (1.38)$$

which will be positive.

Let's move from the consideration of paths in general to the paths of massive particles (which will always be time-like). Since the proper time is measured by a clock traveling on a time-like world-line, it is convenient to use τ as the parameter along the path. That is, we use (1.38) to compute τ(λ), which (if λ is a good parameter in the first place) we can invert to obtain λ(τ), after which we can think of the path as $x^\mu(\tau)$. The tangent vector in this parameterization is known as the four-velocity, $U^\mu$:

$$U^\mu = \frac{dx^\mu}{d\tau}\ . \qquad (1.39)$$

Since $d\tau^2 = -\eta_{\mu\nu}\, dx^\mu dx^\nu$, as follows from the invariance of the line element, the four-velocity is automatically normalized:

$$\eta_{\mu\nu} U^\mu U^\nu = -1\ . \qquad (1.40)$$

(The norm will always be negative, since we are only defining it for time-like trajectories. You could define an analogous vector for space-like paths as well; null paths give some extra problems since the norm is zero.) In the rest frame of a particle, its four-velocity has components $U^\mu = (1, 0, 0, 0)$.
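Here is a numerical sketch of the proper-time integral (1.38) along a hypothetical timelike path; it illustrates that a moving clock accumulates less proper time than coordinate time:

```python
import numpy as np

# Proper time (1.38) for the invented timelike path
# x^0 = lambda, x^1 = 0.5*sin(lambda), x^2 = x^3 = 0 in Minkowski space.
lam = np.linspace(0.0, 10.0, 100001)
dxdlam = np.array([np.ones_like(lam), 0.5 * np.cos(lam),
                   np.zeros_like(lam), np.zeros_like(lam)])
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

integrand = np.sqrt(-np.einsum('mn,mi,ni->i', eta, dxdlam, dxdlam))
dtau = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(lam)  # trapezoid rule
print(dtau.sum())   # Delta tau < 10: less than the elapsed coordinate time
```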
A related vector is the energy-momentum four-vector, defined by

$$p^\mu = m U^\mu\ , \qquad (1.41)$$

where m is the mass of the particle. The mass is a fixed quantity independent of inertial frame: what you may be used to thinking of as the "rest mass." Although $p^\mu$ provides a complete description of the energy and momentum of a particle, for extended systems it is necessary to go further and define the energy-momentum tensor (sometimes called the stress-energy tensor), $T^{\mu\nu}$. This is a symmetric (2, 0) tensor which tells us all we need to know about the energy-like aspects of a system: energy density, pressure, stress, and so forth. A general definition of $T^{\mu\nu}$ is "the flux of four-momentum $p^\mu$ across a surface of constant $x^\nu$."

In more detail, consider a general form of matter that can be characterized as a fluid — a continuum of matter described by macroscopic quantities such as temperature, pressure, entropy, viscosity, etc. In general relativity essentially all interesting types of matter can be modeled as perfect fluids, from stars to electromagnetic fields to the entire universe. A perfect fluid can be defined as one with no heat conduction and no viscosity, or alternatively as a fluid which looks isotropic in its rest frame; these two viewpoints turn out to be equivalent. For our purposes we can think of a perfect fluid as one which may be completely characterized by its pressure and density.

The simplest example of a perfect fluid is dust. Dust is defined as a collection of particles at rest with respect to each other, or alternatively as a perfect fluid with zero pressure. Since by definition all particles have an equal velocity in any fixed inertial frame, we can imagine a "four-velocity field" $U^\mu(x)$ defined all over space-time. (Indeed, its components are the same at each point.) Define the number-flux four-vector to be

$$N^\mu = n U^\mu\ , \qquad (1.42)$$

where n is the number density of the particles as measured in their rest frame. Then $N^0$ is the number density of particles as measured in any other frame, while $N^i$ is the flux of particles in the $x^i$ direction. Now imagine that each of the particles has the same mass m. Then in the rest frame the energy density of the dust is given by

$$\rho = nm\ . \qquad (1.43)$$

It is important to stress that ρ only measures the energy density in the rest frame; we also want to know the energy density in other frames. To proceed, note that both n and m are 0-components of four-vectors in the rest frame: $N^\mu = (n, 0, 0, 0)$ and $p^\mu = (m, 0, 0, 0)$. Therefore ρ is the µ = 0, ν = 0 component of the tensor $p \otimes N$ as measured in the rest frame. It is then natural to define the energy-momentum tensor for dust as

$$T^{\mu\nu}_{\rm dust} = p^\mu N^\nu = nm\, U^\mu U^\nu = \rho\, U^\mu U^\nu\ , \qquad (1.44)$$

where ρ is defined as the energy density in the rest frame.

Let us now return to the definition of a "perfect" fluid. The natural definition is that it is matter which is "isotropic in its rest frame." This in turn means that $T^{\mu\nu}$ is diagonal — there is no net flux of any component of momentum in an orthogonal direction. Furthermore, due to the isotropy of the perfect fluid in its rest frame, the nonzero space-like components must all be equal, $T^{11} = T^{22} = T^{33}$. This implies that there are only two independent numbers, $T^{00}$ and one of the $T^{ii}$. It is convenient to call the first of these the energy density ρ, and the second the pressure p. The energy-momentum tensor of a perfect fluid therefore takes the following form in its rest frame:

$$T^{\mu\nu} = \begin{pmatrix} \rho & 0 & 0 & 0 \\ 0 & p & 0 & 0 \\ 0 & 0 & p & 0 \\ 0 & 0 & 0 & p \end{pmatrix}\ . \qquad (1.45)$$

We would like, of course, a formula which is good in any frame; in other words we have to write it in a covariant manner. For dust we had $T^{\mu\nu} = \rho U^\mu U^\nu$, so we might begin by guessing $(\rho + p) U^\mu U^\nu$, which gives, using the fact that in the rest frame $U^\mu = (1, 0, 0, 0)$,

$$\begin{pmatrix} \rho + p & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}\ . \qquad (1.46)$$

To get the answer we want we must therefore add

$$\begin{pmatrix} -p & 0 & 0 & 0 \\ 0 & p & 0 & 0 \\ 0 & 0 & p & 0 \\ 0 & 0 & 0 & p \end{pmatrix}\ . \qquad (1.47)$$

This matrix has an obvious covariant generalization, namely $p\,\eta^{\mu\nu}$. Thus, the general form of the energy-momentum tensor for a perfect fluid is

$$T^{\mu\nu} = (\rho + p)\, U^\mu U^\nu + p\,\eta^{\mu\nu}\ . \qquad (1.48)$$

This is an important formula for applications such as stellar structure and cosmology. Further examples of energy-momentum tensors are those of electromagnetism and of scalar field theory; we will see their form in the main text.

It is important to stress that $T^{\mu\nu}$ is conserved, which means vanishing of the "divergence":

$$\partial_\mu T^{\mu\nu} = 0\ . \qquad (1.49)$$

This is a set of four equations, one for each value of ν. The ν = 0 equation corresponds to conservation of energy, while $\partial_\mu T^{\mu k} = 0$ expresses conservation of the k-th component of the momentum.
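As a sanity check of (1.48), the following sketch constructs the perfect-fluid energy-momentum tensor for invented values of ρ and p and confirms that in the rest frame it reduces to the diagonal form (1.45):

```python
import numpy as np

# Perfect-fluid T^{mu nu} = (rho + p) U^mu U^nu + p eta^{mu nu}, eq. (1.48).
eta_inv = np.diag([-1.0, 1.0, 1.0, 1.0])   # eta^{mu nu}
rho, p = 1.0, 0.3                          # illustrative values
U = np.array([1.0, 0.0, 0.0, 0.0])         # rest-frame four-velocity

T = (rho + p) * np.einsum('m,n->mn', U, U) + p * eta_inv
print(np.allclose(T, np.diag([rho, p, p, p])))   # True: the rest-frame form (1.45)
```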
2. Manifolds

In this section we generalize the notion of flat space to the case of curved space-time. In order to properly understand how this works we have to learn a bit about the mathematics of curved spaces. As the first step we study the notion of a manifold. Note that we will work in n dimensions.

A manifold (or sometimes "differentiable manifold") is a very important concept in mathematics and physics. One is certainly familiar with the properties of n-dimensional Euclidean space, $\mathbb{R}^n$, the set of n-tuples $(x^1, \ldots, x^n)$. We can imagine a manifold as a space which may be curved and have a complicated topology, but which in local regions looks just like $\mathbb{R}^n$. (Here by "looks like" we do not mean that the metric is the same, but only basic notions of analysis like open sets, functions, and coordinates.) In other words we can imagine that the entire manifold is constructed by smoothly sewing together these local regions.

Let us now present a more rigorous definition of a manifold. The most elementary notion is that of a map between two sets. (We assume you know what a set is.) If we have two sets M and N, a map $\phi: M \to N$ is a relationship which assigns to each element of M exactly one element of N; a map is a simple generalization of a function.

[Figure: a map φ from a set M to a set N.]

Given two maps $\phi: A \to B$ and $\psi: B \to C$, we define the composition $\psi \circ \phi: A \to C$ by the operation $(\psi \circ \phi)(a) = \psi(\phi(a))$. So $a \in A$, $\phi(a) \in B$, and thus $(\psi \circ \phi)(a) \in C$.

[Figure: the composition ψ ∘ φ of maps φ: A → B and ψ: B → C.]

A map φ is called one-to-one (or "injective") if each element of N has at most one element of M mapped into it, and onto (or "surjective") if each element of N has at least one element of M mapped into it. (If you think about it, a better name for "one-to-one" would be "two-to-two".) The set M is known as the domain of the map φ, and the set of points in N which M gets mapped into is called the image of φ. For some subset $U \subset N$, the set of elements of M which get mapped to U is called the preimage of U under φ, or $\phi^{-1}(U)$. A map which is both one-to-one and onto is known as invertible (or "bijective"). In this case we can define the inverse map $\phi^{-1}: N \to M$ by $(\phi^{-1} \circ \phi)(a) = a$. (Note that the same symbol $\phi^{-1}$ is used for both the preimage and the inverse map, even though the former is always defined and the latter is only defined in some special cases.)

[Figure: an invertible map φ: M → N and its inverse $\phi^{-1}$: N → M.]

Let us now introduce the notion of continuity of a map between topological spaces (and thus manifolds). Luckily the precise mathematical definition is not needed here; an intuitive understanding of continuity and differentiability of maps $\phi: \mathbb{R}^m \to \mathbb{R}^n$ between Euclidean spaces will suffice. A map from $\mathbb{R}^m$ to $\mathbb{R}^n$ takes an m-tuple $(x^1, x^2, \ldots, x^m)$ to an n-tuple $(y^1, y^2, \ldots, y^n)$, and can therefore be thought of as a collection of n functions $\phi^i$ of m variables:

$$y^1 = \phi^1(x^1, x^2, \ldots, x^m)\ ,$$
$$y^2 = \phi^2(x^1, x^2, \ldots, x^m)\ ,$$
$$\cdots$$
$$y^n = \phi^n(x^1, x^2, \ldots, x^m)\ . \qquad (2.1)$$

We will refer to any one of these functions as $C^p$ if it is continuous and p-times differentiable, and refer to the entire map $\phi: \mathbb{R}^m \to \mathbb{R}^n$ as $C^p$ if each of its component functions is at least $C^p$. Thus a $C^0$ map is continuous but not necessarily differentiable, while a $C^\infty$ map is continuous and can be differentiated as many times as you like. $C^\infty$ maps are sometimes called smooth, and we will consider only these. We call two sets M and N diffeomorphic if there exists a $C^\infty$ map $\phi: M \to N$ with a $C^\infty$ inverse $\phi^{-1}: N \to M$; the map φ is then called a diffeomorphism.

For further purposes we recall the chain rule.
Let us presume that we have maps $f: \mathbb{R}^m \to \mathbb{R}^n$ and $g: \mathbb{R}^n \to \mathbb{R}^l$, and their composition $(g \circ f): \mathbb{R}^m \to \mathbb{R}^l$.

[Figure: the composition g ∘ f of maps f: $\mathbb{R}^m \to \mathbb{R}^n$ and g: $\mathbb{R}^n \to \mathbb{R}^l$.]

We can represent each space in terms of coordinates: $x^a$ on $\mathbb{R}^m$, $y^b$ on $\mathbb{R}^n$, and $z^c$ on $\mathbb{R}^l$, where the indices range over the appropriate values. The chain rule relates the partial derivatives of the composition to the partial derivatives of the individual maps:

$$\frac{\partial}{\partial x^a}(g \circ f)^c = \sum_b \frac{\partial f^b}{\partial x^a}\frac{\partial g^c}{\partial y^b}\ . \qquad (2.2)$$

This relation is usually written as

$$\frac{\partial}{\partial x^a} = \sum_b \frac{\partial y^b}{\partial x^a}\frac{\partial}{\partial y^b}\ . \qquad (2.3)$$

Note that for m = n the determinant of the matrix $\partial y^b/\partial x^a$ is called the Jacobian of the map, and the map is invertible whenever the Jacobian is nonzero.

Using these well-known definitions we can proceed to the definition of a manifold. In order to do this we first have to define the notion of an open set, on which we can put coordinate systems, and then sew the open sets together in an appropriate way. We start with the notion of an open ball, which is the set of all points x in $\mathbb{R}^n$ such that $|x - y| < r$ for some fixed $y \in \mathbb{R}^n$ and $r \in \mathbb{R}$, where $|x - y| = \left[\sum_i (x^i - y^i)^2\right]^{1/2}$. Note that this is a strict inequality — the open ball is the interior of an n-sphere of radius r centered at y.

[Figure: an open ball of radius r centered at y.]

An open set in $\mathbb{R}^n$ is a set constructed from an arbitrary (maybe infinite) union of open balls. In other words, $V \subset \mathbb{R}^n$ is open if, for any $y \in V$, there is an open ball centered at y which is completely inside V. Roughly speaking, an open set is the interior of some (n−1)-dimensional closed surface (or the union of several such interiors). By defining a notion of open sets, we have equipped $\mathbb{R}^n$ with a topology — in this case, the "standard metric topology."

A chart or coordinate system consists of a subset U of a set M, along with a one-to-one map $\phi: U \to \mathbb{R}^n$, such that the image $\phi(U)$ is open in $\mathbb{R}^n$. (Any map is onto its image, so the map $\phi: U \to \phi(U)$ is invertible.) We then say that U is an open set in M. (We have thus induced a topology on M, although we will not explore this.)

[Figure: a chart φ mapping an open set U ⊂ M to an open set φ(U) ⊂ $\mathbb{R}^n$.]

A $C^\infty$ atlas is an indexed collection of charts $\{(U_\alpha, \phi_\alpha)\}$ which satisfies two conditions:

1. The union of the $U_\alpha$ is equal to M; in other words, the $U_\alpha$ cover M.

2. The charts are smoothly sewn together. More precisely, if two charts overlap, $U_\alpha \cap U_\beta \neq \emptyset$, then the map $(\phi_\alpha \circ \phi_\beta^{-1})$ takes points in $\phi_\beta(U_\alpha \cap U_\beta) \subset \mathbb{R}^n$ onto $\phi_\alpha(U_\alpha \cap U_\beta) \subset \mathbb{R}^n$, and all of these maps must be $C^\infty$ where they are defined.

[Figure: two overlapping charts $(U_\alpha, \phi_\alpha)$ and $(U_\beta, \phi_\beta)$; the transition maps $\phi_\alpha \circ \phi_\beta^{-1}$ and $\phi_\beta \circ \phi_\alpha^{-1}$ are only defined on the images of the overlap region, and must be smooth there.]

Finally, a $C^\infty$ n-dimensional manifold (or n-manifold for short) is simply a set M along with a "maximal atlas," one that contains every possible compatible chart. The requirement that the atlas be maximal is so that two equivalent spaces equipped with different atlases don't count as different manifolds. This definition captures in formal terms our notion of a set that looks locally like $\mathbb{R}^n$. Note that this definition does not rely on an embedding of the manifold in some higher-dimensional Euclidean space; the manifold has an existence independent of any embedding. In other words, there is no reason to believe, for example, that four-dimensional space-time is stuck in some larger space.

It is important to stress the necessity of charts and atlases: many manifolds cannot be covered with a single coordinate system.
The fact that manifolds look locally like $\mathbb{R}^n$, which is most clearly seen from the construction of coordinate charts, introduces the possibility of analysis on manifolds, for example differentiation and integration. Let us consider two manifolds M and N of dimensions m and n, with coordinate charts φ on M and ψ on N, and imagine we have a function $f: M \to N$.

[Figure: a map f: M → N together with charts φ on M and ψ on N; the composition $\psi \circ f \circ \phi^{-1}$ maps $\mathbb{R}^m$ to $\mathbb{R}^n$.]

Since M and N are spaces with no a priori notion of differentiation, we cannot nonchalantly differentiate the map f, since we don't know what such an operation means. But the coordinate charts allow us to construct the map $(\psi \circ f \circ \phi^{-1}): \mathbb{R}^m \to \mathbb{R}^n$. By definition this is just a map between Euclidean spaces, and all of the concepts of advanced calculus apply. For example f, thought of as an N-valued function on M, can be differentiated to obtain $\partial f/\partial x^\mu$, where the $x^\mu$ are coordinates on $\mathbb{R}^m$. More precisely,

$$\frac{\partial f}{\partial x^\mu} \equiv \frac{\partial}{\partial x^\mu}\left(\psi \circ f \circ \phi^{-1}\right)(x^\mu)\ . \qquad (2.4)$$

For most practical purposes this shorthand notation will be the appropriate one.

Now we can proceed to the construction of various kinds of structure on manifolds. As the first step we begin with vectors and tangent spaces. Remember the notion of a tangent space — the set of all vectors at a single point in space-time. The natural way to introduce the tangent space is to use our intuitive knowledge that there are objects called "tangent vectors to curves" which belong in the tangent space. We can then consider the set of all parameterized curves through p — that is, the space of all (nondegenerate) maps $\gamma: \mathbb{R} \to M$ such that p is in the image of γ — and it is natural to define the tangent space as simply the space of all tangent vectors to these curves at the point p.

In some coordinate system $x^\mu$, any curve through p defines an element of $\mathbb{R}^n$ specified by the n real numbers $dx^\mu/d\lambda$ (where λ is the parameter along the curve); but this map is clearly coordinate-dependent, which is not what we want. To find a coordinate-independent formulation we proceed as follows. We define $\mathcal{F}$ to be the space of all smooth functions on M (that is, $C^\infty$ maps $f: M \to \mathbb{R}$). Each curve through p defines an operator on this space, the directional derivative, which maps $f \to \frac{df}{d\lambda}$ (at p). Then we claim that the tangent space $T_p$ can be identified with the space of directional derivative operators along curves through p. It can be shown that the space of directional derivatives is a vector space, and that it is the vector space we want (it has the same dimensionality as M, yields a natural idea of a vector pointing along a certain direction, and so on).

In fact, let us find a basis for this space. Consider again a coordinate chart with coordinates $x^\mu$. Then there is an obvious set of n directional derivatives at p, namely the partial derivatives $\partial_\mu$ at p.

[Figure: curves through p along the coordinate directions $x^1$ and $x^2$, whose tangent vectors are the partial derivatives $\partial_1$ and $\partial_2$.]

The partial derivative operators $\{\partial_\mu\}$ at p form a basis for the tangent space $T_p$: any directional derivative can be decomposed into a sum of real numbers times partial derivatives. To see this, consider an n-manifold M, a coordinate chart $\phi: M \to \mathbb{R}^n$, a curve $\gamma: \mathbb{R} \to M$, and a function $f: M \to \mathbb{R}$. If λ is the parameter along γ, we want to expand the vector/operator $\frac{d}{d\lambda}$ in terms of the partials $\partial_\mu$. Using the chain rule (2.2), we have

$$\frac{d}{d\lambda} f = \frac{d}{d\lambda}(f \circ \gamma) = \frac{d}{d\lambda}\left[(f \circ \phi^{-1}) \circ (\phi \circ \gamma)\right] = \frac{d(\phi \circ \gamma)^\mu}{d\lambda}\,\frac{\partial(f \circ \phi^{-1})}{\partial x^\mu} = \frac{dx^\mu}{d\lambda}\,\partial_\mu f\ . \qquad (2.5)$$
The first equality simply takes the informal expression on the left-hand side and rewrites it as an honest derivative of the function $(f \circ \gamma): \mathbb{R} \to \mathbb{R}$. The second comes from the definition of the inverse map $\phi^{-1}$ (and the associativity of composition). The third is the formal chain rule (2.2), and the last is a return to the informal notation of the start. Since the function f was arbitrary, we have

$$\frac{d}{d\lambda} = \frac{dx^\mu}{d\lambda}\,\partial_\mu\ . \qquad (2.6)$$

Thus, the partials $\{\partial_\mu\}$ do indeed represent a good basis for the vector space of directional derivatives, which we can therefore safely identify with the tangent space.

[Figure: the maps $\gamma: \mathbb{R} \to M$, $\phi: M \to \mathbb{R}^n$, and $f: M \to \mathbb{R}$ used in the decomposition above.]

This particular basis ($\hat e_{(\mu)} = \partial_\mu$) is known as a coordinate basis for $T_p$. There is no reason why we are limited to coordinate bases when we consider tangent vectors; it is sometimes more convenient, for example, to use orthonormal bases of some sort. However, the coordinate basis is very simple and natural, and we will use it almost exclusively throughout the course.

One of the advantages of this abstract definition of vectors is that the transformation law is immediate. Since the basis vectors are $\hat e_{(\mu)} = \partial_\mu$, the basis vectors in some new coordinate system $x^{\mu'}$ are given by the chain rule (2.3) as

$$\partial_{\mu'} = \frac{\partial x^\mu}{\partial x^{\mu'}}\,\partial_\mu\ . \qquad (2.7)$$

We can get the transformation law for vector components by the same technique used in flat space, demanding that the vector $V = V^\mu \partial_\mu$ be unchanged by a change of basis. We have

$$V^\mu \partial_\mu = V^{\mu'} \partial_{\mu'} = V^{\mu'} \frac{\partial x^\mu}{\partial x^{\mu'}}\,\partial_\mu\ , \qquad (2.8)$$

and hence (since the matrix $\partial x^{\mu'}/\partial x^\mu$ is the inverse of the matrix $\partial x^\mu/\partial x^{\mu'}$),

$$V^{\mu'} = \frac{\partial x^{\mu'}}{\partial x^\mu}\, V^\mu\ . \qquad (2.9)$$

Since the basis vectors are usually not written explicitly, the rule (2.9) for transforming components is what we call the "vector transformation law."

As the next step we study the transformation properties of one-forms. Once again, the cotangent space $T_p^*$ is the set of linear maps $\omega: T_p \to \mathbb{R}$. The canonical example of a one-form is the gradient of a function f, denoted df. It turns out that the gradients of the coordinate functions $x^\mu$ provide a natural basis for the cotangent space:

$$\mathrm{d}x^\mu(\partial_\nu) = \frac{\partial x^\mu}{\partial x^\nu} = \delta^\mu_\nu\ . \qquad (2.10)$$

Therefore the gradients $\{\mathrm{d}x^\mu\}$ are an appropriate set of basis one-forms; an arbitrary one-form is expanded into components as $\omega = \omega_\mu\,\mathrm{d}x^\mu$. The transformation properties of basis dual vectors and components follow from what is by now the usual procedure. We obtain, for basis one-forms,

$$\mathrm{d}x^{\mu'} = \frac{\partial x^{\mu'}}{\partial x^\mu}\,\mathrm{d}x^\mu\ , \qquad (2.11)$$

and for components,

$$\omega_{\mu'} = \frac{\partial x^\mu}{\partial x^{\mu'}}\,\omega_\mu\ . \qquad (2.12)$$

We will usually write the components $\omega_\mu$ when we speak about a one-form ω.

The transformation law for general tensors follows this same pattern of replacing the Lorentz transformation matrix used in flat space with a matrix representing more general coordinate transformations. A (k, l) tensor T can be expanded

$$T = T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}\;\partial_{\mu_1} \otimes \cdots \otimes \partial_{\mu_k} \otimes \mathrm{d}x^{\nu_1} \otimes \cdots \otimes \mathrm{d}x^{\nu_l}\ , \qquad (2.13)$$

and under a coordinate transformation the components change according to

$$T^{\mu_1'\cdots\mu_k'}{}_{\nu_1'\cdots\nu_l'} = \frac{\partial x^{\mu_1'}}{\partial x^{\mu_1}} \cdots \frac{\partial x^{\mu_k'}}{\partial x^{\mu_k}}\, \frac{\partial x^{\nu_1}}{\partial x^{\nu_1'}} \cdots \frac{\partial x^{\nu_l}}{\partial x^{\nu_l'}}\; T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}\ . \qquad (2.14)$$

Let us demonstrate with the following example how to transform tensors in practice. Consider a symmetric (0, 2) tensor S on a 2-dimensional manifold, whose components in a coordinate system $(x^1 = x, x^2 = y)$ are given by

$$S_{\mu\nu} = \begin{pmatrix} x & 0 \\ 0 & 1 \end{pmatrix}\ , \qquad (2.15)$$

or equivalently

$$S = S_{\mu\nu}\,(\mathrm{d}x^\mu \otimes \mathrm{d}x^\nu) = x(\mathrm{d}x)^2 + (\mathrm{d}y)^2\ , \qquad (2.16)$$

where in the last line the tensor product symbols are suppressed for brevity. Now consider the new coordinates

$$x' = x^{1/3}\ , \qquad y' = e^{x+y}\ . \qquad (2.17)$$
This leads directly to

$$x = (x')^3\ , \qquad y = \ln(y') - (x')^3\ ,$$
$$\mathrm{d}x = 3(x')^2\,\mathrm{d}x'\ , \qquad \mathrm{d}y = \frac{1}{y'}\,\mathrm{d}y' - 3(x')^2\,\mathrm{d}x'\ . \qquad (2.18)$$

We insert these results into the expression above, remembering that tensor products do not commute, so $\mathrm{d}x'\,\mathrm{d}y' \neq \mathrm{d}y'\,\mathrm{d}x'$, and we obtain

$$S = 9(x')^4\left[1 + (x')^3\right](\mathrm{d}x')^2 - 3\frac{(x')^2}{y'}\left(\mathrm{d}x'\,\mathrm{d}y' + \mathrm{d}y'\,\mathrm{d}x'\right) + \frac{1}{(y')^2}(\mathrm{d}y')^2\ , \qquad (2.20)$$

or

$$S_{\mu'\nu'} = \begin{pmatrix} 9(x')^4\left[1 + (x')^3\right] & -3\frac{(x')^2}{y'} \\ -3\frac{(x')^2}{y'} & \frac{1}{(y')^2} \end{pmatrix}\ . \qquad (2.21)$$

Notice that it is still symmetric. We did not use the transformation law (2.14) directly, but doing so would have yielded the same result, as you can check.
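This check is easy to carry out with a computer algebra system. In the sketch below, xp and yp stand for the primed coordinates, and the transformation law (2.14) for a (0, 2) tensor is applied as a matrix product with the Jacobian matrix $\partial x^\mu/\partial x^{\mu'}$:

```python
import sympy as sp

# Verify (2.15)-(2.21): transform S_{mu nu} = diag(x, 1) to the primed
# coordinates of (2.17). xp, yp denote x', y'.
xp, yp = sp.symbols('xp yp', positive=True)
x = xp**3                       # old coordinates via (2.18)
y = sp.log(yp) - xp**3

S_old = sp.Matrix([[x, 0], [0, 1]])
J = sp.Matrix([[sp.diff(x, xp), sp.diff(x, yp)],
               [sp.diff(y, xp), sp.diff(y, yp)]])   # dx^mu / dx^{mu'}

# (0,2) transformation law (2.14) as a matrix product: S' = J^T S J
S_new = sp.expand(J.T * S_old * J)
print(S_new)   # matches the matrix in (2.21)
```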
As the next step we comment on the exterior derivative d on a manifold. The exterior derivative operator d produces an antisymmetric (0, p + 1) tensor when acting on a p-form. So the exterior derivative is a legitimate tensor operator; it is not, however, an adequate substitute for the partial derivative, since it is only defined on forms.

The metric tensor in curved space is denoted $g_{\mu\nu}$ (while $\eta_{\mu\nu}$ is reserved specifically for the Minkowski metric). $g_{\mu\nu}$ is a symmetric (0, 2) tensor that is also nondegenerate, meaning that the determinant $g = |g_{\mu\nu}|$ does not vanish. This allows us to define the inverse metric $g^{\mu\nu}$ via

$$g^{\mu\nu} g_{\nu\sigma} = \delta^\mu_\sigma\ . \qquad (2.22)$$

The symmetry of $g_{\mu\nu}$ implies that $g^{\mu\nu}$ is also symmetric. The metric and its inverse may be used to raise and lower indices on tensors. The natural object directly related to the metric tensor is the line element,

$$ds^2 = g_{\mu\nu}\,\mathrm{d}x^\mu\,\mathrm{d}x^\nu\ . \qquad (2.23)$$

For example, we know that the Euclidean line element in a three-dimensional space with Cartesian coordinates is

$$ds^2 = (\mathrm{d}x)^2 + (\mathrm{d}y)^2 + (\mathrm{d}z)^2\ . \qquad (2.24)$$

We can now change to any coordinate system we choose. For example, in spherical coordinates we have

$$x = r\sin\theta\cos\phi\ , \qquad y = r\sin\theta\sin\phi\ , \qquad z = r\cos\theta\ , \qquad (2.25)$$

which leads directly to

$$ds^2 = \mathrm{d}r^2 + r^2\,\mathrm{d}\theta^2 + r^2\sin^2\theta\,\mathrm{d}\phi^2\ . \qquad (2.26)$$

Obviously the components of the metric look different than those in Cartesian coordinates, but all of the properties of the space remain unaltered.

A good example of a space with curvature is the two-sphere, which can be thought of as the locus of points in $\mathbb{R}^3$ at distance 1 from the origin. The metric in the (θ, φ) coordinate system comes from setting r = 1 and dr = 0 in (2.26):

$$ds^2 = \mathrm{d}\theta^2 + \sin^2\theta\,\mathrm{d}\phi^2\ . \qquad (2.27)$$

The metric tensor contains all the information we need to describe the curvature of the manifold. In Minkowski space we can choose coordinates in which the components of the metric are constant; but it should be clear that the existence of curvature is more subtle than having the metric depend on the coordinates, since in the example above we showed that the metric of flat Euclidean space in spherical coordinates is a function of r and θ. Later we shall see that constancy of the metric components is sufficient for a space to be flat, and in fact there always exists a coordinate system on any flat space in which the metric is constant.

A useful characterization of the metric is obtained by putting $g_{\mu\nu}$ into its canonical form. In this form the metric components become

$$g_{\mu\nu} = \mathrm{diag}\,(-1, -1, \ldots, -1, +1, +1, \ldots, +1, 0, 0, \ldots, 0)\ , \qquad (2.28)$$

where "diag" means a diagonal matrix with the given elements. If n is the dimension of the manifold, s is the number of +1's in the canonical form, and t is the number of −1's, then s − t is the signature of the metric (the difference in the number of plus and minus signs), and s + t is the rank of the metric (the number of nonzero eigenvalues). If the metric is continuous, the rank and signature of the metric tensor field are the same at every point, and if the metric is nondegenerate the rank is equal to the dimension n. We will always deal with continuous, nondegenerate metrics. If all of the signs are positive (t = 0) the metric is called Euclidean or Riemannian (or just "positive definite"), while if there is a single minus (t = 1) it is called Lorentzian or pseudo-Riemannian, and any metric with some +1's and some −1's is called "indefinite." (So the word "Euclidean" sometimes means that the space is flat, and sometimes doesn't, but always means that the canonical form is strictly positive; the terminology is unfortunate but standard.) The space-times of interest in general relativity have Lorentzian metrics. It can be shown that it is always possible to put the metric into canonical form at some point $p \in M$, but in general this will only be possible at that single point, not in any neighborhood of p.

We now define the Levi-Civita symbol $\tilde\epsilon_{\mu_1\mu_2\cdots\mu_n}$ — an object with n indices which has the components specified in (1.18) in any coordinate system. It is called a "symbol," of course, because it is not a tensor; it is defined not to change under coordinate transformations. On the other hand, we can define the Levi-Civita tensor as

$$\epsilon_{\mu_1\mu_2\cdots\mu_n} = \sqrt{|g|}\;\tilde\epsilon_{\mu_1\mu_2\cdots\mu_n}\ . \qquad (2.29)$$

It is this tensor which is used in the definition of the Hodge dual, (1.34), which is otherwise unchanged when generalized to arbitrary manifolds. Since this is a genuine tensor, we can raise indices, etc.

One final appearance of tensor densities is in integration on manifolds. In ordinary calculus on $\mathbb{R}^n$ the volume element $d^n x$ picks up a factor of the Jacobian under change of coordinates:

$$d^n x' = \left|\frac{\partial x^{\mu'}}{\partial x^\mu}\right| d^n x\ . \qquad (2.30)$$

There is actually a beautiful explanation of this formula from the point of view of differential forms, which arises from the following fact: on an n-dimensional manifold, the integrand is properly understood as an n-form. To see how this works, we should make the identification

$$d^n x \leftrightarrow \mathrm{d}x^0 \wedge \cdots \wedge \mathrm{d}x^{n-1}\ . \qquad (2.31)$$

The expression on the right-hand side can be misleading, because it looks like a tensor (an n-form, actually) but is really a density. Certainly if we have two functions f and g on M, then df and dg are one-forms, and df ∧ dg is a two-form. To see what is going on, let us see how (2.31) changes under coordinate transformations. First notice that the definition of the wedge product allows us to write

$$\mathrm{d}x^0 \wedge \cdots \wedge \mathrm{d}x^{n-1} = \frac{1}{n!}\,\tilde\epsilon_{\mu_1\cdots\mu_n}\,\mathrm{d}x^{\mu_1} \wedge \cdots \wedge \mathrm{d}x^{\mu_n}\ , \qquad (2.32)$$

since both the wedge product and the Levi-Civita symbol are completely antisymmetric. Under a coordinate transformation $\tilde\epsilon_{\mu_1\cdots\mu_n}$ stays the same while the one-forms change according to their transformation rules, leading to

$$\tilde\epsilon_{\mu_1\cdots\mu_n}\,\mathrm{d}x^{\mu_1} \wedge \cdots \wedge \mathrm{d}x^{\mu_n} = \tilde\epsilon_{\mu_1\cdots\mu_n}\,\frac{\partial x^{\mu_1}}{\partial x^{\mu_1'}} \cdots \frac{\partial x^{\mu_n}}{\partial x^{\mu_n'}}\;\mathrm{d}x^{\mu_1'} \wedge \cdots \wedge \mathrm{d}x^{\mu_n'} = \left|\frac{\partial x^\mu}{\partial x^{\mu'}}\right| \tilde\epsilon_{\mu_1'\cdots\mu_n'}\,\mathrm{d}x^{\mu_1'} \wedge \cdots \wedge \mathrm{d}x^{\mu_n'}\ . \qquad (2.33)$$

Multiplying by the Jacobian on both sides recovers (2.30). It is clear that the naive volume element $d^n x$ transforms as a density, not a tensor, but it is straightforward to construct an invariant volume element by multiplying by $\sqrt{|g|}$:

$$\sqrt{|g'|}\;\mathrm{d}x^{0'} \wedge \cdots \wedge \mathrm{d}x^{(n-1)'} = \sqrt{|g|}\;\mathrm{d}x^0 \wedge \cdots \wedge \mathrm{d}x^{n-1}\ , \qquad (2.34)$$

which is of course just $(n!)^{-1}\,\epsilon_{\mu_1\cdots\mu_n}\,\mathrm{d}x^{\mu_1} \wedge \cdots \wedge \mathrm{d}x^{\mu_n}$. In the interest of simplicity we will usually write the volume element as $\sqrt{|g|}\, d^n x$, rather than as the explicit wedge product $\sqrt{|g|}\;\mathrm{d}x^0 \wedge \cdots \wedge \mathrm{d}x^{n-1}$; it will be enough to keep in mind that it is supposed to be an n-form.
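As a small concrete check, the following sketch evaluates |g| for flat $\mathbb{R}^3$ in the spherical coordinates (2.25), recovering the familiar measure:

```python
import sympy as sp

# Invariant volume element for flat R^3 in spherical coordinates:
# the metric (2.26) is diag(1, r^2, r^2 sin^2(theta)).
r, theta, phi = sp.symbols('r theta phi', positive=True)
g = sp.diag(1, r**2, r**2 * sp.sin(theta)**2)

print(g.det())   # r**4*sin(theta)**2, so sqrt(|g|) d^3x = r**2 sin(theta) dr dtheta dphi
```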
We now introduce Stokes's theorem. Imagine that we have an n-manifold M with boundary ∂M, and an (n−1)-form ω on M. (We haven't discussed manifolds with boundaries, but the idea is obvious; M could for instance be the interior of an (n−1)-dimensional closed surface ∂M.) Then dω is an n-form, which can be integrated over M, while ω itself can be integrated over ∂M. Stokes's theorem is then

$$\int_M \mathrm{d}\omega = \int_{\partial M} \omega\ . \qquad (2.35)$$

You can convince yourself that different special cases of this theorem include not only the fundamental theorem of calculus, but also the theorems of Green, Gauss, and Stokes, familiar from vector calculus in three dimensions.

Now we introduce a few extra mathematical techniques. Let us discuss how maps between two manifolds M and N carry tensor fields from one manifold to the other. We therefore consider two manifolds M and N, possibly of different dimension, with coordinate systems $x^\mu$ and $y^\alpha$, respectively. We imagine that we have a map $\phi: M \to N$ and a function $f: N \to \mathbb{R}$. It is obvious that we can compose φ with f to construct a map $(f \circ \phi): M \to \mathbb{R}$, which is simply a function on M. Such a construction is sufficiently useful that it gets its own name; we define the pullback of f by φ, denoted $\phi^* f$, by

$$\phi^* f = (f \circ \phi)\ . \qquad (2.36)$$

[Figure: the map φ: M → N, a function f: N → $\mathbb{R}$, and the pullback $\phi^* f = f \circ \phi$ on M.]

The name makes sense, since we think of $\phi^*$ as "pulling back" the function f from N to M. We can pull functions back, but we cannot push them forward: if we have a function $g: M \to \mathbb{R}$, there is no way we can compose g with φ to create a function on N, since the arrows don't fit together correctly. But recall that a vector can be thought of as a derivative operator that maps smooth functions to real numbers. This allows us to define the pushforward of a vector: if V(p) is a vector at a point p on M, we define the pushforward vector $\phi_* V$ at the point φ(p) on N by giving its action on functions on N:

$$(\phi_* V)(f) = V(\phi^* f)\ . \qquad (2.37)$$

So to push forward a vector field we say "the action of $\phi_* V$ on any function is simply the action of V on the pullback of that function."

Let us now give a more concrete description. We know that a basis for vectors on M is given by the set of partial derivatives $\partial_\mu = \frac{\partial}{\partial x^\mu}$, and a basis on N is given by $\partial_\alpha = \frac{\partial}{\partial y^\alpha}$. Therefore we would like to relate the components of $V = V^\mu \partial_\mu$ to those of $(\phi_* V) = (\phi_* V)^\alpha \partial_\alpha$. We can find the sought-after relation by applying the pushed-forward vector to a test function and using the chain rule (2.3):

$$(\phi_* V)^\alpha \partial_\alpha f = V^\mu \partial_\mu(\phi^* f) = V^\mu \partial_\mu(f \circ \phi) = V^\mu \frac{\partial y^\alpha}{\partial x^\mu}\,\partial_\alpha f\ . \qquad (2.38)$$

This result shows that the pushforward operation $\phi_*$ can be considered as a matrix operator, $(\phi_* V)^\alpha = (\phi_*)^\alpha{}_\mu V^\mu$, with the matrix given by

$$(\phi_*)^\alpha{}_\mu = \frac{\partial y^\alpha}{\partial x^\mu}\ . \qquad (2.39)$$

The behavior of a vector under a pushforward is thus similar in form to the vector transformation law under a change of coordinates. In general, however, when µ and α take different ranges of values, there is no reason for the matrix $\partial y^\alpha/\partial x^\mu$ to be invertible.

Let us now discuss the transformation properties of one-forms. Since one-forms are dual to vectors, it is natural to expect that one-forms can be pulled back (but not, in general, pushed forward). Indeed, we remember that one-forms are linear maps from vectors to the real numbers. The pullback $\phi^*\omega$ of a one-form ω on N can therefore be defined by its action on a vector V on M, by equating it with the action of ω itself on the pushforward of V:

$$(\phi^*\omega)(V) = \omega(\phi_* V)\ . \qquad (2.40)$$
Once again, there is a matrix description of the pullback operator on forms, $(\phi^*\omega)_\mu = (\phi^*)_\mu{}^\alpha\,\omega_\alpha$, which we can derive using the chain rule. It is given by

$$(\phi^*)_\mu{}^\alpha = \frac{\partial y^\alpha}{\partial x^\mu}\ . \qquad (2.41)$$

That is, it is the same matrix as the pushforward (2.39).

Let us now consider a (0, l) tensor — one with l lower indices and no upper ones. Recall that it is a linear map from the direct product of l vectors to $\mathbb{R}$. It is then natural to pull back not only one-forms, but tensors with an arbitrary number of lower indices. The definition is simply the action of the original tensor on the pushed-forward vectors:

$$(\phi^* T)(V^{(1)}, V^{(2)}, \ldots, V^{(l)}) = T(\phi_* V^{(1)}, \phi_* V^{(2)}, \ldots, \phi_* V^{(l)})\ , \qquad (2.42)$$

where $T_{\alpha_1\cdots\alpha_l}$ is a (0, l) tensor on N. We can similarly push forward any (k, 0) tensor $S^{\mu_1\cdots\mu_k}$ by acting it on pulled-back one-forms:

$$(\phi_* S)(\omega^{(1)}, \omega^{(2)}, \ldots, \omega^{(k)}) = S(\phi^*\omega^{(1)}, \phi^*\omega^{(2)}, \ldots, \phi^*\omega^{(k)})\ . \qquad (2.43)$$

Fortunately, the matrix representations of the pushforward (2.39) and pullback (2.41) extend to the higher-rank tensors simply by assigning one matrix to each index; thus, for the pullback of a (0, l) tensor, we have

$$(\phi^* T)_{\mu_1\cdots\mu_l} = \frac{\partial y^{\alpha_1}}{\partial x^{\mu_1}} \cdots \frac{\partial y^{\alpha_l}}{\partial x^{\mu_l}}\, T_{\alpha_1\cdots\alpha_l}\ , \qquad (2.44)$$

while for the pushforward of a (k, 0) tensor we have

$$(\phi_* S)^{\alpha_1\cdots\alpha_k} = \frac{\partial y^{\alpha_1}}{\partial x^{\mu_1}} \cdots \frac{\partial y^{\alpha_k}}{\partial x^{\mu_k}}\, S^{\mu_1\cdots\mu_k}\ . \qquad (2.45)$$

Our complete picture is therefore:

[Figure: under a map φ: M → N, (0, l) tensors are pulled back from N to M by $\phi^*$, while (k, 0) tensors are pushed forward from M to N by $\phi_*$.]

It is important to stress that tensors with both upper and lower indices can generally be neither pushed forward nor pulled back. On the other hand, if φ is invertible (and both φ and $\phi^{-1}$ are smooth, which we always implicitly assume), then it defines a diffeomorphism between M and N. In this case M and N are the same abstract manifold. The advantage of a diffeomorphism is that we can use both φ and $\phi^{-1}$ to move tensors from M to N; this allows us to define the pushforward and pullback of arbitrary tensors. Specifically, for a (k, l) tensor field $T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}$ on M, we define the pushforward by

$$(\phi_* T)(\omega^{(1)}, \ldots, \omega^{(k)}, V^{(1)}, \ldots, V^{(l)}) = T(\phi^*\omega^{(1)}, \ldots, \phi^*\omega^{(k)}, [\phi^{-1}]_* V^{(1)}, \ldots, [\phi^{-1}]_* V^{(l)})\ , \qquad (2.46)$$

where the $\omega^{(i)}$ are one-forms on N and the $V^{(i)}$ are vectors on N. In components this becomes

$$(\phi_* T)^{\alpha_1\cdots\alpha_k}{}_{\beta_1\cdots\beta_l} = \frac{\partial y^{\alpha_1}}{\partial x^{\mu_1}} \cdots \frac{\partial y^{\alpha_k}}{\partial x^{\mu_k}}\, \frac{\partial x^{\nu_1}}{\partial y^{\beta_1}} \cdots \frac{\partial x^{\nu_l}}{\partial y^{\beta_l}}\; T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}\ . \qquad (2.47)$$

Since φ is invertible, the inverse matrix $\partial x^\nu/\partial y^\beta$ is well defined.

Now we are ready to explain the relationship between diffeomorphisms and coordinate transformations: we can interpret diffeomorphisms as "active" coordinate transformations, while traditional coordinate transformations are "passive." Since a diffeomorphism allows us to pull back and push forward arbitrary tensors, it provides another way of comparing tensors at different points on a manifold. Given a diffeomorphism $\phi: M \to M$ and a tensor field $T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}(x)$, we can form the difference between the value of the tensor at some point p and $\phi^*[T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}(\phi(p))]$, its value at φ(p) pulled back to p. This suggests that we can define another kind of derivative operator on tensor fields, one which characterizes the rate of change of the tensor as it changes under the diffeomorphism. In order to do this we have to introduce a one-parameter family of diffeomorphisms, $\phi_t$. This family can be thought of as a smooth map $\mathbb{R} \times M \to M$, such that for each $t \in \mathbb{R}$, $\phi_t$ is a diffeomorphism and $\phi_s \circ \phi_t = \phi_{s+t}$. Note that this last condition implies that $\phi_0$ is the identity map.
It can be shown that one-parameter families of diffeomorphisms arise from vector fields (and vice versa). Let us study what happens to a point p under the entire family $\phi_t$: it is clear that it describes a curve in M. The same is true of every point on M, and these curves fill the manifold (although there can be degeneracies where the diffeomorphisms have fixed points). We can then define a vector field $V^\mu(x)$ as the set of tangent vectors to each of these curves at every point, evaluated at t = 0.

We can also proceed in the opposite direction and define a one-parameter family of diffeomorphisms from any vector field. Given a vector field $V^\mu(x)$, we define the integral curves of the vector field to be those curves $x^\mu(t)$ which solve

$$\frac{dx^\mu}{dt} = V^\mu\ . \qquad (2.48)$$

This equation should be interpreted in the opposite sense from our usual way — here we are given the vectors, from which we define the curves. The diffeomorphisms $\phi_t$ represent "flow down the integral curves," and the associated vector field is called the generator of the diffeomorphism.

Given a vector field $V^\mu(x)$, and hence a family of diffeomorphisms parameterized by t, we can ask how fast a tensor changes as we travel down the integral curves. For each t we can define this change as

$$\Delta_t T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}(p) = \phi_t^*[T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}(\phi_t(p))] - T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}(p)\ . \qquad (2.49)$$

Note that both terms on the right-hand side are tensors at p.

[Figure: the tensor T at $\phi_t(p)$ is pulled back along the integral curve $x^\mu(t)$ to the point p, where it is compared with T(p).]

We then define the Lie derivative of the tensor along the vector field as

$$\pounds_V T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l} = \lim_{t\to 0} \frac{\Delta_t T^{\mu_1\cdots\mu_k}{}_{\nu_1\cdots\nu_l}}{t}\ . \qquad (2.50)$$

The Lie derivative is a map from (k, l) tensor fields to (k, l) tensor fields, which is manifestly independent of coordinates. This derivative is clearly linear,

$$\pounds_V(aT + bS) = a\pounds_V T + b\pounds_V S\ , \qquad (2.51)$$

and obeys the Leibniz rule,

$$\pounds_V(T \otimes S) = (\pounds_V T) \otimes S + T \otimes (\pounds_V S)\ , \qquad (2.52)$$

where S and T are tensors and a and b are constants. It is important to stress that the definition of the Lie derivative does not depend on the metric structure of the manifold. On functions it reduces to the ordinary directional derivative,

$$\pounds_V f = V(f) = V^\mu \partial_\mu f\ . \qquad (2.53)$$

It can be shown that in components the Lie derivative of a vector field takes the form

$$\pounds_V U^\mu = [V, U]^\mu\ , \qquad (2.54)$$

where

$$[V, U]^\mu = V^\nu \partial_\nu U^\mu - U^\nu \partial_\nu V^\mu\ . \qquad (2.55)$$

From this definition we immediately see that $\pounds_V U = -\pounds_U V$. It is because of (2.54) that the commutator is sometimes called the "Lie bracket."

To derive the action of $\pounds_V$ on a one-form $\omega_\mu$, begin by considering the action on the scalar $\omega_\mu U^\mu$ for an arbitrary vector field $U^\mu$. First use the fact that the Lie derivative with respect to a vector field reduces to the action of the vector itself when applied to a scalar:

$$\pounds_V(\omega_\mu U^\mu) = V(\omega_\mu U^\mu) = V^\nu \partial_\nu(\omega_\mu U^\mu) = V^\nu(\partial_\nu\omega_\mu)U^\mu + V^\nu\omega_\mu(\partial_\nu U^\mu)\ . \qquad (2.56)$$

Then use the Leibniz rule on the original scalar:

$$\pounds_V(\omega_\mu U^\mu) = (\pounds_V\omega)_\mu U^\mu + \omega_\mu(\pounds_V U)^\mu = (\pounds_V\omega)_\mu U^\mu + \omega_\mu V^\nu \partial_\nu U^\mu - \omega_\mu U^\nu \partial_\nu V^\mu\ . \qquad (2.57)$$

Setting these expressions equal to each other and requiring that equality hold for arbitrary $U^\mu$, we see that

$$\pounds_V \omega_\mu = V^\nu \partial_\nu \omega_\mu + (\partial_\mu V^\nu)\,\omega_\nu\ . \qquad (2.58)$$

By a similar procedure we can define the Lie derivative of an arbitrary tensor field. The answer can be written

$$\pounds_V T^{\mu_1\mu_2\cdots\mu_k}{}_{\nu_1\nu_2\cdots\nu_l} = V^\sigma \partial_\sigma T^{\mu_1\mu_2\cdots\mu_k}{}_{\nu_1\nu_2\cdots\nu_l}$$
$$- (\partial_\lambda V^{\mu_1})\, T^{\lambda\mu_2\cdots\mu_k}{}_{\nu_1\nu_2\cdots\nu_l} - (\partial_\lambda V^{\mu_2})\, T^{\mu_1\lambda\cdots\mu_k}{}_{\nu_1\nu_2\cdots\nu_l} - \cdots$$
$$+ (\partial_{\nu_1} V^\lambda)\, T^{\mu_1\mu_2\cdots\mu_k}{}_{\lambda\nu_2\cdots\nu_l} + (\partial_{\nu_2} V^\lambda)\, T^{\mu_1\mu_2\cdots\mu_k}{}_{\nu_1\lambda\cdots\nu_l} + \cdots\ . \qquad (2.59)$$
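The component formula (2.54)-(2.55) can be checked directly; the sketch below computes the Lie bracket of two invented vector fields on the plane:

```python
import sympy as sp

# Lie derivative of a vector field as the commutator (2.54)-(2.55),
# for two illustrative fields V and U in two dimensions.
x, y = sp.symbols('x y')
coords = [x, y]
V = [-y, x]               # a rotation field
U = [x**2, sp.sin(y)]     # an arbitrary test field

bracket = [sp.simplify(sum(
    V[n] * sp.diff(U[m], coords[n]) - U[n] * sp.diff(V[m], coords[n])
    for n in range(2))) for m in range(2)]
print(bracket)            # components of (L_V U)^mu = [V, U]^mu
```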
Let us now discuss another aspect of diffeomorphisms: there is one more use to which we will put the machinery we have set up in this section, namely symmetries of tensors. We say that a diffeomorphism φ is a symmetry of some tensor T if the tensor is invariant after being pulled back under φ:

$$\phi^* T = T\ . \qquad (2.60)$$

Although symmetries may be discrete (for example, invariance under reflections), it is more common to have a one-parameter family of symmetries $\phi_t$. If the family is generated by a vector field $V^\mu(x)$, then (2.60) together with the definition (2.50) leads to

$$\pounds_V T = 0\ . \qquad (2.61)$$

It can be shown that if T is symmetric under some one-parameter family of diffeomorphisms, it is possible to find a coordinate system in which the components of T are all independent of one of the coordinates (the integral-curve coordinate of the vector field). The most important symmetries are those of the metric, for which

$$\phi^* g_{\mu\nu} = g_{\mu\nu}\ . \qquad (2.62)$$

A diffeomorphism of this type is called an isometry. If a one-parameter family of isometries is generated by a vector field $V^\mu(x)$, then $V^\mu$ is known as a Killing vector field. The condition that $V^\mu$ be a Killing vector is thus

$$\pounds_V g_{\mu\nu} = 0\ , \qquad (2.63)$$

or, using the relation between the Lie derivative and the (metric-compatible) connection,

$$\nabla_{(\mu} V_{\nu)} = 0\ . \qquad (2.64)$$

This last version is Killing's equation. If a space-time has a Killing vector, then we can find a coordinate system in which the metric is independent of one of the coordinates.

It is important to stress that Killing vectors imply conserved quantities associated with the motion of free particles. To see this, consider the motion of a free particle: $x^\mu(\lambda)$ is a geodesic with tangent vector $U^\mu = \frac{dx^\mu}{d\lambda}$. Let $V^\mu$ be a Killing vector. Then we have

$$U^\nu \nabla_\nu(V_\mu U^\mu) = U^\nu U^\mu \nabla_\nu V_\mu + V_\mu U^\nu \nabla_\nu U^\mu = 0\ , \qquad (2.65)$$

where the first term vanishes by Killing's equation and the second because $x^\mu(\lambda)$ is a geodesic. Consequently the quantity $V_\mu U^\mu$ is conserved along the particle's world-line. More physically, by definition the metric is unchanging along the direction of the Killing vector; a free particle therefore feels no "forces" in this direction, and the component of its momentum in that direction is consequently conserved.

For later purposes we also define the concept of a space with maximal symmetry. A maximally symmetric space is one which possesses the largest possible number of Killing vectors, which on an n-dimensional manifold is n(n+1)/2.

It may not be simple to actually solve Killing's equation in any given space-time, but it is frequently possible to write down some Killing vectors by inspection. For example, in $\mathbb{R}^2$ with metric $ds^2 = dx^2 + dy^2$, independence of the metric components with respect to x and y immediately yields two Killing vectors:

$$X^\mu = (1, 0)\ , \qquad Y^\mu = (0, 1)\ . \qquad (2.66)$$

These clearly represent the two translations. The one rotation would correspond to the vector $R = \partial/\partial\theta$ if we were in polar coordinates; in Cartesian coordinates this becomes

$$R^\mu = (-y, x)\ . \qquad (2.67)$$
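As a check, the sketch below verifies the Killing condition (2.63) for the rotational vector field (2.67) in the flat plane, using the Lie-derivative formula (2.59) specialized to a (0, 2) tensor:

```python
import sympy as sp

# Check that R^mu = (-y, x) satisfies L_R g = 0 for ds^2 = dx^2 + dy^2.
# For a (0,2) tensor, (2.59) reads:
#   (L_V g)_{mn} = V^s d_s g_{mn} + (d_m V^s) g_{sn} + (d_n V^s) g_{ms}.
x, y = sp.symbols('x y')
coords = [x, y]
g = sp.eye(2)           # flat metric, g_{mn} = diag(1, 1)
V = [-y, x]             # the rotational vector field (2.67)

Lg = sp.zeros(2, 2)
for m in range(2):
    for n in range(2):
        Lg[m, n] = (sum(V[s] * sp.diff(g[m, n], coords[s]) for s in range(2))
                    + sum(sp.diff(V[s], coords[m]) * g[s, n] for s in range(2))
                    + sum(sp.diff(V[s], coords[n]) * g[m, s] for s in range(2)))
print(sp.simplify(Lg))  # zero matrix: R is indeed a Killing vector
```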
3. Curvature

We know that on any manifold we can define functions, take their derivatives, consider parameterized paths, set up tensors, and so on. Other concepts, such as the volume of a region or the length of a path, require some additional piece of structure, namely the introduction of a metric. We will now show how the existence of a metric implies a certain connection, whose curvature may be thought of as that of the metric.

The connection must be introduced when we address the problem that the partial derivative is not a good tensor operator. Our goal is to introduce the covariant derivative, that is, an operator which reduces to the partial derivative in flat space with Cartesian coordinates, but transforms as a tensor on an arbitrary manifold.

In flat space in Cartesian coordinates, the partial derivative operator $\partial_\mu$ is a map from (k, l) tensor fields to (k, l+1) tensor fields, which acts linearly on its arguments and obeys the Leibniz rule on tensor products. All of this continues to be true in the more general situation we now consider, but the map provided by the partial derivative depends on the coordinate system used. We would therefore like to define a covariant derivative operator ∇ that performs the functions of the partial derivative, but in a way independent of coordinates. We therefore require that ∇ be a map from (k, l) tensor fields to (k, l + 1) tensor fields with these two properties:

1. linearity: $\nabla(T + S) = \nabla T + \nabla S$;
2. Leibniz (product) rule: $\nabla(T \otimes S) = (\nabla T) \otimes S + T \otimes (\nabla S)$.

It can be shown that ∇ takes the form

$$\nabla_\mu V^\nu = \partial_\mu V^\nu + \Gamma^\nu_{\mu\lambda} V^\lambda\ , \qquad (3.1)$$

where the $\Gamma^\nu_{\mu\lambda}$ are known as the connection coefficients (for each µ, $(\Gamma_\mu)^\nu{}_\lambda$ is an n × n matrix, where n is the dimensionality of the manifold). Further, we demand that $\nabla_\mu V^\nu$ transform as a (1, 1) tensor,

$$\nabla_{\mu'} V^{\nu'} = \frac{\partial x^\mu}{\partial x^{\mu'}}\,\frac{\partial x^{\nu'}}{\partial x^\nu}\,\nabla_\mu V^\nu\ . \qquad (3.2)$$

This requirement implies the following transformation rule for the connection coefficients:

$$\Gamma^{\nu'}_{\mu'\lambda'} = \frac{\partial x^\mu}{\partial x^{\mu'}}\,\frac{\partial x^\lambda}{\partial x^{\lambda'}}\,\frac{\partial x^{\nu'}}{\partial x^\nu}\,\Gamma^\nu_{\mu\lambda} - \frac{\partial x^\mu}{\partial x^{\mu'}}\,\frac{\partial x^\lambda}{\partial x^{\lambda'}}\,\frac{\partial^2 x^{\nu'}}{\partial x^\mu \partial x^\lambda}\ , \qquad (3.3)$$

which of course is not the tensor transformation law; the second term on the right spoils it. The covariant derivative of a one-form takes the form

$$\nabla_\mu \omega_\nu = \partial_\mu \omega_\nu - \Gamma^\lambda_{\mu\nu}\,\omega_\lambda\ . \qquad (3.4)$$

It is then clear that the connection coefficients encode all of the information necessary to take the covariant derivative of a tensor of arbitrary rank. The formula is quite straightforward: for each upper index you introduce a term with a single +Γ, and for each lower index a term with a single −Γ:

$$\nabla_\sigma T^{\mu_1\mu_2\cdots\mu_k}{}_{\nu_1\nu_2\cdots\nu_l} = \partial_\sigma T^{\mu_1\mu_2\cdots\mu_k}{}_{\nu_1\nu_2\cdots\nu_l}$$
$$+ \Gamma^{\mu_1}_{\sigma\lambda}\, T^{\lambda\mu_2\cdots\mu_k}{}_{\nu_1\nu_2\cdots\nu_l} + \Gamma^{\mu_2}_{\sigma\lambda}\, T^{\mu_1\lambda\cdots\mu_k}{}_{\nu_1\nu_2\cdots\nu_l} + \cdots$$
$$- \Gamma^\lambda_{\sigma\nu_1}\, T^{\mu_1\mu_2\cdots\mu_k}{}_{\lambda\nu_2\cdots\nu_l} - \Gamma^\lambda_{\sigma\nu_2}\, T^{\mu_1\mu_2\cdots\mu_k}{}_{\nu_1\lambda\cdots\nu_l} - \cdots\ . \qquad (3.5)$$

This is the general expression for the covariant derivative.

It turns out that in order to define a unique connection on a manifold with a metric $g_{\mu\nu}$ we have to impose two additional properties:

• torsion-free: $\Gamma^\lambda_{\mu\nu} = \Gamma^\lambda_{(\mu\nu)}$;
• metric compatibility: $\nabla_\rho g_{\mu\nu} = 0$.

These requirements imply that we can express the connection coefficients as functions of the metric:

$$\Gamma^\sigma_{\mu\nu} = \frac{1}{2} g^{\sigma\rho}\left(\partial_\mu g_{\nu\rho} + \partial_\nu g_{\rho\mu} - \partial_\rho g_{\mu\nu}\right)\ . \qquad (3.6)$$

This connection, derived from the metric, is the one on which conventional general relativity is based. It is known as the Christoffel connection or the Levi-Civita connection.

Let us mention once again that the exterior derivative is a well-defined tensor in the absence of any connection. Moreover, if we use a symmetric (torsion-free) connection, the exterior derivative (defined to be the antisymmetrized partial derivative) happens to be equal to the antisymmetrized covariant derivative:

$$\nabla_{[\mu}\omega_{\nu]} = \partial_{[\mu}\omega_{\nu]} - \Gamma^\lambda_{[\mu\nu]}\,\omega_\lambda = \partial_{[\mu}\omega_{\nu]}\ . \qquad (3.7)$$
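Formula (3.6) is easy to transcribe into a computer algebra system. As a sketch, we apply it to the flat plane in polar coordinates — a flat metric that nevertheless has nonvanishing connection coefficients:

```python
import sympy as sp

# Connection coefficients from the metric, eq. (3.6), applied to the
# flat plane in polar coordinates: ds^2 = dr^2 + r^2 dtheta^2.
r, th = sp.symbols('r theta', positive=True)
coords = [r, th]
g = sp.diag(1, r**2)
ginv = g.inv()
dim = 2

def christoffel(sig, mu, nu):
    return sp.simplify(sp.Rational(1, 2) * sum(
        ginv[sig, rho] * (sp.diff(g[nu, rho], coords[mu])
                          + sp.diff(g[rho, mu], coords[nu])
                          - sp.diff(g[mu, nu], coords[rho]))
        for rho in range(dim)))

print(christoffel(0, 1, 1))   # Gamma^r_{theta theta} = -r
print(christoffel(1, 0, 1))   # Gamma^theta_{r theta} = 1/r
```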
The crucial difference between flat and curved spaces is that, in a curved space, the result of parallel transporting a vector from one point to another will depend on the path taken between the points. More precisely, given a curve x^µ(λ), we define the covariant derivative along the path to be the operator

D/dλ = (dx^µ/dλ) ∇_µ . (3.8)

We then define parallel transport of the tensor T along the path x^µ(λ) to be the requirement that, along the path,

(D/dλ) T^{µ1µ2···µk}_{ν1ν2···νl} ≡ (dx^σ/dλ) ∇_σ T^{µ1µ2···µk}_{ν1ν2···νl} = 0 . (3.9)

This is a well-defined tensor equation, since both the tangent vector dx^µ/dλ and the covariant derivative ∇T are tensors. It is known as the equation of parallel transport. For a vector it takes the form

(d/dλ) V^µ + Γ^µ_{σρ} (dx^σ/dλ) V^ρ = 0 . (3.10)

It is clear that the notion of parallel transport depends on the connection, and different connections lead to different answers. Since we consider a connection that is metric-compatible, the metric is always parallel transported:

(D/dλ) g_{µν} = (dx^σ/dλ) ∇_σ g_{µν} = 0 . (3.11)

It follows that the inner product of two parallel-transported vectors is preserved. Indeed, if V^µ and W^ν are parallel-transported along a curve x^σ(λ), we have

(D/dλ)(g_{µν} V^µ W^ν) = ((D/dλ) g_{µν}) V^µ W^ν + g_{µν} ((D/dλ) V^µ) W^ν + g_{µν} V^µ (D/dλ) W^ν = 0 . (3.12)

This means that parallel transport with respect to a metric-compatible connection preserves the norm of vectors, the sense of orthogonality, and so on.

Now we are going to discuss geodesics. To begin with, recall that the tangent vector to a path x^µ(λ) is dx^µ/dλ. The condition that it be parallel transported is thus

(D/dλ)(dx^µ/dλ) = 0 , (3.13)

or alternatively

d²x^µ/dλ² + Γ^µ_{ρσ} (dx^ρ/dλ)(dx^σ/dλ) = 0 . (3.14)

This is the familiar geodesic equation. It is important to stress that geodesics in general relativity are the paths followed by unaccelerated particles; we can think of the geodesic equation as the generalization of Newton's law f = ma to the case f = 0. In fact it can be shown that the equation of motion for a particle of mass m and charge q in an electromagnetic field F_{µν} takes the form

d²x^µ/dτ² + Γ^µ_{ρσ} (dx^ρ/dτ)(dx^σ/dτ) = (q/m) F^µ_ν (dx^ν/dτ) , (3.15)

with the Lorentz force playing the role of f/m on the right-hand side. An important property of geodesics in a spacetime with Lorentzian metric is that the character (timelike/null/spacelike) of the geodesic (relative to a metric-compatible connection) never changes. This follows from the fact that parallel transport preserves inner products, since the character of the curve is determined by the inner product of the tangent vector with itself. There are also null geodesics, which satisfy the same equation, except that proper time cannot be used as a parameter (some set of allowed parameters will exist, related to each other by linear transformations).

Geodesics also have an interesting application: they can be used to map the tangent space at a point p to a local neighborhood of p. To begin with, notice that any geodesic x^µ(λ) which passes through the point p can be specified by its behavior at p. We parameterize the geodesic by λ and choose the parameter to vanish at p, λ(p) = 0. Then the tangent vector at p is

dx^µ/dλ (λ = 0) = k^µ , (3.16)

where k^µ is some vector at p (an element of T_p). There will then be a point on the manifold M which lies on this geodesic where the parameter has the value λ = 1.
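Note that (3.16), together with the starting point p, is precisely the initial data for the second-order geodesic equation (3.14), so the geodesic through p with tangent k^µ can be found by solving an ordinary initial-value problem. A minimal numerical sketch on the unit two-sphere (using the connection coefficients (3.33) quoted in the next subsection, and assuming scipy is available):

    import numpy as np
    from scipy.integrate import solve_ivp

    def geodesic_rhs(lam, state):
        # state = (theta, phi, dtheta/dlam, dphi/dlam) on the unit 2-sphere.
        # Geodesic equation (3.14) with Gamma^theta_{phi phi} = -sin*cos
        # and Gamma^phi_{theta phi} = cot(theta).
        th, ph, dth, dph = state
        return [dth, dph,
                np.sin(th) * np.cos(th) * dph**2,
                -2.0 * dth * dph * np.cos(th) / np.sin(th)]

    # Start at p = (theta, phi) = (pi/2, 0) with tangent k = (0, 1);
    # the resulting geodesic should be the equator, a great circle.
    sol = solve_ivp(geodesic_rhs, [0.0, 1.0], [np.pi / 2, 0.0, 0.0, 1.0],
                    rtol=1e-10, atol=1e-12)
    print(sol.y[0, -1], sol.y[1, -1])   # theta stays at pi/2, phi reaches 1.0

The point reached at λ = 1 is exactly the point that the exponential map, defined next, assigns to the tangent vector k^µ.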
We define the exponential map at p, exp_p : T_p → M, via

exp_p(k^µ) = x^ν(λ = 1) , (3.17)

where x^ν(λ) solves the geodesic equation subject to (3.16). [Figure: the exponential map sends k^µ ∈ T_p to the point x^ν(λ = 1) ∈ M reached by the geodesic.] For tangent vectors k^µ near the zero vector, this map will be well-defined, and in fact invertible. In the neighborhood of p on which the map is valid, the tangent vectors themselves thus define a coordinate system on the manifold. In this coordinate system, any geodesic through p is expressed as

x^µ(λ) = λk^µ , (3.18)

for some appropriate vector k^µ.

Now we will study curvature, which is encoded in the Riemann tensor, a (1, 3) tensor antisymmetric in its last two indices:

R^ρ_{σµν} = −R^ρ_{σνµ} . (3.19)

The Riemann tensor measures the failure of covariant derivatives to commute. Indeed, for a vector field V^ρ we have

[∇_µ, ∇_ν] V^ρ = ∇_µ ∇_ν V^ρ − ∇_ν ∇_µ V^ρ
= ∂_µ(∇_ν V^ρ) − Γ^λ_{µν} ∇_λ V^ρ + Γ^ρ_{µσ} ∇_ν V^σ − (µ ↔ ν)
= ∂_µ ∂_ν V^ρ + (∂_µ Γ^ρ_{νσ}) V^σ + Γ^ρ_{νσ} ∂_µ V^σ − Γ^λ_{µν} ∂_λ V^ρ − Γ^λ_{µν} Γ^ρ_{λσ} V^σ + Γ^ρ_{µσ} ∂_ν V^σ + Γ^ρ_{µσ} Γ^σ_{νλ} V^λ − (µ ↔ ν)
= (∂_µ Γ^ρ_{νσ} − ∂_ν Γ^ρ_{µσ} + Γ^ρ_{µλ} Γ^λ_{νσ} − Γ^ρ_{νλ} Γ^λ_{µσ}) V^σ − 2Γ^λ_{[µν]} ∇_λ V^ρ . (3.20)

The coefficient of the last term is simply the torsion tensor, and hence we write

[∇_µ, ∇_ν] V^ρ = R^ρ_{σµν} V^σ − T_{µν}^λ ∇_λ V^ρ , (3.21)

where the Riemann and torsion tensors are identified as

R^ρ_{σµν} = ∂_µ Γ^ρ_{νσ} − ∂_ν Γ^ρ_{µσ} + Γ^ρ_{µλ} Γ^λ_{νσ} − Γ^ρ_{νλ} Γ^λ_{µσ} (3.22)

and

T^λ_{µν} = Γ^λ_{µν} − Γ^λ_{νµ} . (3.23)

In GR we are mainly interested in the Christoffel connection. In this case the connection is derived from the metric, and the associated curvature may be thought of as that of the metric itself. Let us show this in the following way. If we are in some coordinate system such that ∂_σ g_{µν} = 0 (everywhere, not just at a point), then Γ^ρ_{µν} = 0 and ∂_σ Γ^ρ_{µν} = 0; thus R^ρ_{σµν} = 0 by (3.22). Since this is a tensor equation, if it is true in one coordinate system it must be true in any coordinate system. Therefore, the vanishing of the Riemann tensor is a necessary condition for it to be possible to find coordinates in which the components of g_{µν} are constant everywhere.

Note that the Riemann tensor obeys the Bianchi identity:

∇_[λ R_{ρσ]µν} = 0 . (3.24)

This identity is closely related to the Jacobi identity, since it basically expresses

[[∇_λ, ∇_ρ], ∇_σ] + [[∇_ρ, ∇_σ], ∇_λ] + [[∇_σ, ∇_λ], ∇_ρ] = 0 . (3.25)

It is frequently useful to consider contractions of the Riemann tensor. First we form the contraction known as the Ricci tensor:

R_{µν} = R^λ_{µλν} . (3.26)

The Ricci tensor associated with the Christoffel connection is symmetric,

R_{µν} = R_{νµ} , (3.27)

as a consequence of the various symmetries of the Riemann tensor. Using the metric, we can take a further contraction to form the Ricci scalar:

R = R^µ_µ = g^{µν} R_{µν} . (3.28)

Another identity related to the Riemann tensor is

∇^µ R_{ρµ} = (1/2) ∇_ρ R . (3.29)

Let us define the Einstein tensor as

G_{µν} = R_{µν} − (1/2) R g_{µν} , (3.30)

which obeys

∇^µ G_{µν} = 0 . (3.31)

The Einstein tensor is very important in general relativity.

Let us illustrate the main ideas and notation on a simple example: the two-sphere, with metric

ds² = a² (dθ² + sin²θ dφ²) , (3.32)

where a is the radius of the sphere (thought of as embedded in R³). It is a simple exercise to calculate the connection coefficients from the metric above; we obtain

Γ^θ_{φφ} = −sin θ cos θ , Γ^φ_{θφ} = Γ^φ_{φθ} = cot θ , (3.33)

and also

R^θ_{φθφ} = ∂_θ Γ^θ_{φφ} − ∂_φ Γ^θ_{θφ} + Γ^θ_{θλ} Γ^λ_{φφ} − Γ^θ_{φλ} Γ^λ_{θφ}
= (sin²θ − cos²θ) − (0) + (0) − (−sin θ cos θ)(cot θ)
= sin²θ . (3.34)
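This hand computation, together with the contractions performed next, can be verified symbolically. A minimal sympy sketch for the metric (3.32) (the helper functions are ours, written directly from (3.6), (3.22), (3.26) and (3.28)):

    import sympy as sp

    theta, phi, a = sp.symbols('theta phi a', positive=True)
    coords = [theta, phi]
    dim = 2
    g = sp.Matrix([[a**2, 0], [0, a**2 * sp.sin(theta)**2]])   # eq. (3.32)
    ginv = g.inv()

    def christoffel(s, m, n):
        # Eq. (3.6).
        return sp.simplify(sum(ginv[s, p] * (sp.diff(g[n, p], coords[m])
                                             + sp.diff(g[p, m], coords[n])
                                             - sp.diff(g[m, n], coords[p]))
                               for p in range(dim)) / 2)

    G = [[[christoffel(s, m, n) for n in range(dim)]
          for m in range(dim)] for s in range(dim)]

    def riemann(r, s, m, n):
        # Eq. (3.22): R^r_{smn}.
        return sp.simplify(sp.diff(G[r][n][s], coords[m])
                           - sp.diff(G[r][m][s], coords[n])
                           + sum(G[r][m][l] * G[l][n][s]
                                 - G[r][n][l] * G[l][m][s]
                                 for l in range(dim)))

    # Ricci tensor (3.26) and Ricci scalar (3.28).
    Ric = sp.Matrix(dim, dim, lambda m, n:
                    sp.simplify(sum(riemann(l, m, l, n) for l in range(dim))))
    R = sp.simplify(sum(ginv[m, n] * Ric[m, n]
                        for m in range(dim) for n in range(dim)))
    print(riemann(0, 1, 0, 1))   # sin(theta)**2, as in (3.34)
    print(R)                     # 2/a**2, as in (3.36) below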
It is easy to check that all of the components of the Riemann tensor either vanish or are related to this one by symmetry. Computing the Ricci tensor, we obtain

R_{θθ} = g^{φφ} R_{φθφθ} = 1 ,
R_{θφ} = R_{φθ} = 0 ,
R_{φφ} = g^{θθ} R_{θφθφ} = sin²θ . (3.35)

Consequently the Ricci scalar is

R = g^{θθ} R_{θθ} + g^{φφ} R_{φφ} = 2/a² . (3.36)

We see that the Ricci scalar, which for a two-dimensional manifold completely characterizes the curvature, is constant over the two-sphere. This reflects the fact that the two-sphere is a maximally symmetric manifold². In any number of dimensions, the curvature of a maximally symmetric space satisfies (for some constant a)

R_{ρσµν} = a^{−2} (g_{ρµ} g_{σν} − g_{ρν} g_{σµ}) . (3.37)

The two-sphere is an example of a "positively curved" space, one where the Ricci scalar is positive; the figure below illustrates the contrast with a "negatively curved" space.

² The precise definition of this notion was given at the end of the previous section.

[Figure: a surface of positive curvature (sphere-like) next to a surface of negative curvature (saddle-like).]

4. Gravitation

The main idea of the general theory of relativity is that spacetime should be described as a curved manifold. In other words, Einstein's famous insight is that gravity is a manifestation of spacetime curvature. Let us again introduce the Einstein tensor

G_{µν} = R_{µν} − (1/2) R g_{µν} , (4.1)

which always obeys ∇^µ G_{µν} = 0. The Einstein equation then takes the form

G_{µν} = κ T_{µν} . (4.2)

Note that the right-hand side is a covariant expression of the energy and momentum density in the form of a symmetric and conserved (0, 2) tensor, while the left-hand side is a symmetric and conserved (0, 2) tensor constructed from the metric and its first and second derivatives. Contracting both sides of (4.2) with g^{µν} gives, in four dimensions, R − (4/2)R = κT, i.e.

R = −κT , (4.3)

and using this we can rewrite (4.2) as

R_{µν} = κ (T_{µν} − (1/2) T g_{µν}) . (4.4)

This is the same equation, just written slightly differently. Einstein's equations may be thought of as second-order differential equations for the metric tensor field g_{µν}. There are ten independent equations (since both sides are symmetric two-index tensors), which seems to be exactly right for the ten unknown functions of the metric components. However, the Bianchi identity ∇^µ G_{µν} = 0 represents four constraints on the functions R_{µν}, so there are only six truly independent equations in (4.4). In fact this is appropriate: if a metric is a solution to Einstein's equation in one coordinate system x^µ, it should also be a solution in any other coordinate system x^{µ'}. This means that there are four unphysical degrees of freedom in g_{µν} (represented by the four functions x^{µ'}(x^µ)), and we should expect Einstein's equations to constrain only the six coordinate-independent degrees of freedom. It is important to stress that, as differential equations, these are extremely complicated: the Ricci scalar and tensor are contractions of the Riemann tensor, which involves derivatives and products of the Christoffel symbols, which in turn involve the inverse metric and derivatives of the metric. Furthermore, the energy-momentum tensor T_{µν} will generally involve the metric as well. The equations are also nonlinear, so that two known solutions cannot be superposed to find a third. It is therefore very difficult to solve Einstein's equations in any sort of generality, and in order to solve them we have to make some simplifying assumptions.
The most popular sort of simplifying assumption is that the metric has a significant degree of symmetry, and we will talk later about how symmetries of the metric make life easier. Now we demonstrate how Einstein's equations can be derived from an action principle. The action should be the integral over spacetime of a Lagrange density ("Lagrangian" for short, although strictly speaking the Lagrangian is the integral over space of the Lagrange density):

S_H = ∫ dⁿx L_H . (4.5)

The Lagrange density is a tensor density, which can be written as √−g times a scalar. What scalars can we make out of the metric? Since we know that the metric can be set equal to its canonical form and its first derivatives set to zero at any one point, any nontrivial scalar must involve at least second derivatives of the metric. The Riemann tensor is of course made from second derivatives of the metric, and we argued earlier that the only independent scalar we could construct from the Riemann tensor was the Ricci scalar R. What we did not show, but is nevertheless true, is that any nontrivial tensor made from the metric and its first and second derivatives can be expressed in terms of the metric and the Riemann tensor. Therefore, the only independent scalar constructed from the metric which is no higher than second order in its derivatives is the Ricci scalar. Hilbert reasoned that this was therefore the simplest possible choice for a Lagrangian, and proposed

L_H = √−g R . (4.6)

The equations of motion should come from varying the action with respect to the metric. In fact let us consider variations with respect to the inverse metric g^{µν}, which are slightly easier but give an equivalent set of equations. Using R = g^{µν} R_{µν}, in general we will have

δS = ∫ dⁿx [√−g g^{µν} δR_{µν} + √−g R_{µν} δg^{µν} + R δ√−g] = (δS)₁ + (δS)₂ + (δS)₃ . (4.7)

The second term (δS)₂ is already in the form of some expression times δg^{µν}; let us examine the others more closely. Recall that the Ricci tensor is the contraction of the Riemann tensor, which is given by

R^ρ_{µλν} = ∂_λ Γ^ρ_{νµ} + Γ^ρ_{λσ} Γ^σ_{νµ} − (λ ↔ ν) . (4.8)

We perform the variation of the Riemann tensor by first varying the connection coefficients and then substituting into this expression. Since δΓ^ρ_{νµ}, being the difference of two connections, is a tensor, after some calculation we find the variation of the Riemann tensor in the form

δR^ρ_{µλν} = ∇_λ(δΓ^ρ_{νµ}) − ∇_ν(δΓ^ρ_{λµ}) . (4.9)

Therefore, the contribution of the first term in (4.7) to δS can be written

(δS)₁ = ∫ dⁿx √−g g^{µν} [∇_λ(δΓ^λ_{νµ}) − ∇_ν(δΓ^λ_{λµ})] = ∫ dⁿx √−g ∇_σ [g^{µν}(δΓ^σ_{µν}) − g^{µσ}(δΓ^λ_{λµ})] , (4.10)

where we have used metric compatibility and relabeled dummy indices. The integral above is the integral, with respect to the natural volume element, of the covariant divergence of a vector; by Stokes's theorem this is equal to a boundary contribution at infinity, which we can set to zero by making the variation vanish at infinity. Therefore this term does not contribute to the total variation. In order to calculate the (δS)₃ term we use the variation of the determinant (Jacobi's formula δg = −g g_{µν} δg^{µν} gives)

δ(g^{−1}) = (1/g) g_{µν} δg^{µν} , (4.11)

and consequently

δ√−g = −(1/2) √−g g_{µν} δg^{µν} . (4.12)

Returning to (4.7), and remembering that (δS)₁ does not contribute, we find

δS = ∫ dⁿx √−g [R_{µν} − (1/2) R g_{µν}] δg^{µν} . (4.13)

This should vanish for arbitrary variations, and consequently we derive Einstein's equations in vacuum:

(1/√−g) δS/δg^{µν} = R_{µν} − (1/2) R g_{µν} = 0 . (4.14)

We would, however, also like to obtain the non-vacuum field equations.
In other words, we consider an action of the form

S = (1/8πG) S_H + S_M , (4.15)

where S_M is the action for matter, and we have presciently normalized the gravitational action (although the proper normalization is somewhat convention-dependent). Following through the same procedure as above leads to

(1/√−g) δS/δg^{µν} = (1/8πG) [R_{µν} − (1/2) R g_{µν}] + (1/√−g) δS_M/δg^{µν} = 0 , (4.16)

and we recover Einstein's equations if we set

T_{µν} = −(1/√−g) δS_M/δg^{µν} . (4.17)

In fact (4.17) turns out to be the best way to define a symmetric energy-momentum tensor. We are mainly interested in the existence of solutions to Einstein's equations in the presence of "realistic" sources of energy and momentum. The most common property demanded of T_{µν} is that it represent positive energy densities: no negative masses are allowed. In a locally inertial frame this requirement can be written as ρ = T_{00} ≥ 0. In coordinate-independent notation it reads

T_{µν} V^µ V^ν ≥ 0 , for all timelike vectors V^µ . (4.18)

This is known as the Weak Energy Condition, or WEC. It seems like a reasonable requirement, but it is in fact very restrictive: it is straightforward to write down classical field theories which violate the WEC, and almost impossible to invent a quantum field theory which obeys it. Nevertheless, it is legitimate to assume that the WEC holds in most cases and is violated only under extreme conditions. (There are also stronger energy conditions, but they are even less true than the WEC, and we won't dwell on them.)

We continue the study of Einstein's equations by discussing the possibility of introducing a cosmological constant, which we add to the conventional Hilbert action. We therefore consider an action given by

S = ∫ dⁿx √−g (R − 2Λ) , (4.19)

where Λ is some constant. The resulting field equations are

R_{µν} − (1/2) R g_{µν} + Λ g_{µν} = 0 , (4.20)

and of course there would be an energy-momentum tensor on the right-hand side if we had included an action for matter. Λ is the cosmological constant. In order to find its meaning it is convenient to move the additional term in (4.20) to the right-hand side and think of it as a kind of energy-momentum tensor, with T_{µν} = −Λ g_{µν} (it is automatically conserved by metric compatibility). Then Λ can be interpreted as the "energy density of the vacuum," a source of energy and momentum that is present even in the absence of matter fields. This interpretation is important because quantum field theory predicts that the vacuum should have some sort of energy and momentum. In ordinary quantum mechanics, a harmonic oscillator with frequency ω and minimum classical energy E₀ = 0 acquires, upon quantization, a ground state with energy E₀ = (1/2)ℏω. A quantized field can be thought of as a collection of an infinite number of harmonic oscillators, and each mode contributes to the ground state energy. The result is of course infinite, and must be appropriately regularized, for example by introducing a cutoff at high frequencies. The final vacuum energy, which is the regularized sum of the energies of the ground state oscillations of all the fields of the theory, has no good reason to be zero and in fact would be expected to have a natural scale

Λ ∼ m_P⁴ , (4.21)

where the Planck mass m_P is approximately 10^19 GeV, or 10^−5 grams. Observations of the universe on large scales allow us to constrain the actual value of Λ, which turns out to be smaller than (4.21) by at least a factor of 10^120.
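A rough back-of-the-envelope version of this comparison (a sketch: the observed vacuum energy scale of order 10^−3 eV is an assumed round number, and all factors of order unity are dropped):

    import math

    # Energy densities in natural units scale as (energy scale)^4.
    m_planck = 1e19    # Planck mass, ~10^19 GeV, as in (4.21)
    lam_obs = 1e-12    # observed vacuum scale, ~10^-3 eV = 10^-12 GeV (assumed)

    ratio = (m_planck / lam_obs) ** 4
    print(f"theory/observation ~ 10^{math.log10(ratio):.0f}")   # ~ 10^124

With these round numbers the mismatch comes out near 10^124; more careful choices of the scales involved give values in the 10^120 range, which is why the discrepancy is usually quoted as at least 10^120.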
This is the largest known discrepancy between a theoretical estimate and an observational constraint in physics, and it convinces many people that the "cosmological constant problem" is one of the most important unsolved problems today. On the other hand, the observations do not tell us that Λ is strictly zero, and in fact they allow values that can have important consequences for the evolution of the universe.