Linear Models in Statistics I
Lecture notes
Andrea Kraus
Autumn 2018
Contents
1 Motivation
What we are doing and why
2 Univariate normal distribution
Definition
Properties
Related distributions
3 Multivariate normal distribution
Definition
Properties
Related distributions
1 Motivation: What we are doing and why
Linear model
Yᵢ = β₀ + β₁xᵢ,₁ + … + βₖxᵢ,ₖ + εᵢ,  i ∈ {1, …, n}
Yᵢ: outcome, response, output, dependent variable
• random variable; we observe a realization yᵢ
• (Czech: odezva, závisle proměnná, regresand)
xᵢ,₁, …, xᵢ,ₖ: covariates, predictors, explanatory variables, inputs, independent variables
• given, known
• (Czech: nezávisle proměnné, regresory)
β₀, …, βₖ: coefficients
• unknown, fixed; what we want to estimate
• (Czech: regresní koeficienty)
εᵢ: random error
• random variable, unobserved
εᵢ i.i.d. ∼ (0, σ²), i ∈ {1, …, n}:
• E εᵢ = 0: no systematic errors
• Var εᵢ = σ²: same precision
we often assume that εᵢ i.i.d. ∼ N(0, σ²), i ∈ {1, …, n}
Linear model in matrix form
Yᵢ = β₀ + β₁xᵢ,₁ + … + βₖxᵢ,ₖ + εᵢ,  i ∈ {1, …, n}
matrix notation:
$$
\underbrace{\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}}_{Y}
=
\underbrace{\begin{pmatrix}
1 & x_{1,1} & x_{1,2} & \dots & x_{1,k} \\
1 & x_{2,1} & x_{2,2} & \dots & x_{2,k} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n,1} & x_{n,2} & \dots & x_{n,k}
\end{pmatrix}}_{X}
\times
\underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}}_{\beta}
+
\underbrace{\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}}_{\varepsilon}
$$
linear model in matrix form: Y = Xβ + ε, ε ∼ (0, σ²I), and often ε ∼ N(0, σ²I)
X: design matrix
• (Czech: regresní matice, matice plánu)
let p = k + 1; then Y = Xβ + ε with dimensions Y: n×1, X: n×p, β: p×1, ε: n×1
we assume that n > p (and often think of n → ∞ with p fixed)
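As a concrete illustration of Y = Xβ + ε (a sketch, not part of the original slides; all numerical values are illustrative and numpy is assumed), the following simulates data from a small linear model and recovers β by least squares:

```python
# Simulate Y = X beta + eps with iid N(0, sigma^2) errors and recover beta.
import numpy as np

rng = np.random.default_rng(42)
n, k = 200, 2                          # n observations, k covariates; p = k + 1
X = np.column_stack([np.ones(n),       # first column of X: intercept
                     rng.uniform(0, 10, size=(n, k))])
beta = np.array([1.0, 0.5, -2.0])      # the fixed, "unknown" coefficients
sigma = 1.5
eps = rng.normal(0.0, sigma, size=n)   # iid N(0, sigma^2) random errors
Y = X @ beta + eps                     # the observed responses

beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)  # least-squares estimate
print(beta_hat)                        # close to (1.0, 0.5, -2.0)
```

The least-squares step is only meant to show that β is recoverable from (Y, X); the estimator itself is studied later in the course.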
Linear model for fev data
question: association between the FEV (forced expiratory volume) [l] and Smoking,
corrected for Age [years], Height [cm] and Gender
data (n = 654):
  FEV    Age  Height  Gender  Smoking
  1.708    9   144.8  Female  Non
  1.724    8   171.5  Female  Non
  1.720    7   138.4  Female  Non
  1.558    9   134.6  Male    Non
  ...    ...     ...  ...     ...
  3.727   15   172.7  Male    Current
  2.853   18   152.4  Female  Non
  2.795   16   160.0  Female  Current
  3.211   15   168.9  Female  Non
FEVᵢ = β₀ + β₁ × Ageᵢ + β₂ × Heightᵢ + β₃ × Genderᵢ + β₄ × Smokingᵢ + εᵢ
model: Y = Xβ + ε, concretely:
$$
\begin{pmatrix}
1.708 \\ 1.724 \\ 1.720 \\ 1.558 \\ \vdots \\ 3.727 \\ 2.853 \\ 2.795 \\ 3.211
\end{pmatrix}
=
\begin{pmatrix}
1 & 9 & 144.8 & 0 & 0 \\
1 & 8 & 171.5 & 0 & 0 \\
1 & 7 & 138.4 & 0 & 0 \\
1 & 9 & 134.6 & 1 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
1 & 15 & 172.7 & 1 & 1 \\
1 & 18 & 152.4 & 0 & 0 \\
1 & 16 & 160.0 & 0 & 1 \\
1 & 15 & 168.9 & 0 & 0
\end{pmatrix}
\times
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_4 \end{pmatrix}
+
\begin{pmatrix}
\varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \vdots \\
\varepsilon_{651} \\ \varepsilon_{652} \\ \varepsilon_{653} \\ \varepsilon_{654}
\end{pmatrix}
$$
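The 0/1 columns of X come from coding the categorical variables. A small sketch (assuming numpy; the arrays simply reproduce the first four rows above, with Gender coded 1 for Male and Smoking coded 1 for Current):

```python
# Build the first four rows of the fev design matrix from the raw variables.
import numpy as np

age     = np.array([9, 8, 7, 9])
height  = np.array([144.8, 171.5, 138.4, 134.6])
gender  = np.array(["Female", "Female", "Female", "Male"])
smoking = np.array(["Non", "Non", "Non", "Non"])

X = np.column_stack([
    np.ones(len(age)),                     # intercept column
    age,
    height,
    (gender == "Male").astype(float),      # dummy variable: 1 = Male
    (smoking == "Current").astype(float),  # dummy variable: 1 = Current smoker
])
print(X)    # matches the first four rows of X displayed above
```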
Linear model for bloodpress data
question: association between the mean arterial blood pressure and
age [years], weight [kg], body surface area [m²], stress,
duration of hypertension [years], basal pulse [beats/min]
data (n = 20):
  MAP  Age  Weight   BSA  DoH  Pulse  Stress
  105   47    85.4  1.75  5.1     63      33
  115   49    94.2  2.10  3.8     70      14
  ...  ...     ...   ...  ...    ...     ...
  110   48    90.5  1.88  9.0     71      99
  122   56    95.7  2.09  7.0     75      99
MAPᵢ = β₀ + β₁ × Ageᵢ + β₂ × Weightᵢ + β₃ × BSAᵢ + β₄ × DoHᵢ + β₅ × Pulseᵢ + β₆ × Stressᵢ + εᵢ
model: Y = Xβ + ε, concretely:
$$
\begin{pmatrix} 105 \\ 115 \\ \vdots \\ 110 \\ 122 \end{pmatrix}
=
\begin{pmatrix}
1 & 47 & 85.4 & 1.75 & 5.1 & 63 & 33 \\
1 & 49 & 94.2 & 2.10 & 3.8 & 70 & 14 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
1 & 48 & 90.5 & 1.88 & 9.0 & 71 & 99 \\
1 & 56 & 95.7 & 2.09 & 7.0 & 75 & 99
\end{pmatrix}
\times
\begin{pmatrix} \beta_0 \\ \vdots \\ \beta_6 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_{19} \\ \varepsilon_{20} \end{pmatrix}
$$
Normal distribution in a linear model
model: Y = Xβ + ε
assumptions of the normal linear model:
• X fixed and known
• β fixed, unknown
• ε ∼ N(0, σ²I)
⇒ Y ∼ N(Xβ, σ²I)
estimators of β and σ² are functions of Y
test statistics concerning β and σ² are functions of Y
⇒ to make inference in the normal linear model, we need to study
• the multivariate normal distribution N(µ, Σ)
• distributions of functions of N(µ, Σ)
2 Univariate normal distribution
Normal distribution N(µ, σ²)
let µ ∈ ℝ and σ² > 0
density: f(x) = 1/√(2πσ²) · exp{−(x − µ)²/(2σ²)}
for the standard normal distribution (µ = 0, σ² = 1):
[figure: density f(x) of N(0, 1) on x ∈ (−4, 4); bell curve peaking at f(0) ≈ 0.4]
if σ² = 0, then X = µ a.s.
Properties of N(µ, σ²): µ ∈ ℝ, σ² > 0
Let X ∼ N(µ, σ²). Then E X = µ and Var X = σ².
Let a, b ∈ ℝ, X ∼ N(µ, σ²). Then aX + b ∼ N(aµ + b, a²σ²).
Let X ∼ N(µ, σ²) and Z = (X − µ)/σ. Then Z ∼ N(0, 1).
If X ∼ N(µ, σ²), then X =ᵈ µ + σZ (equality in distribution), where Z ∼ N(0, 1).
Let aᵢ, bᵢ ∈ ℝ and Xᵢ independent ∼ N(µᵢ, σᵢ²) for i ∈ {1, …, n}.
Then ∑ᵢ₌₁ⁿ (aᵢXᵢ + bᵢ) ∼ N(∑ᵢ₌₁ⁿ (aᵢµᵢ + bᵢ), ∑ᵢ₌₁ⁿ aᵢ²σᵢ²).
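A quick Monte Carlo check of the transformation rule aX + b ∼ N(aµ + b, a²σ²); a sketch with illustrative numbers, assuming numpy:

```python
# Compare sample moments of aX + b with the claimed parameters.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, a, b = 2.0, 3.0, -1.5, 4.0
X = rng.normal(mu, sigma, size=100_000)
Y = a * X + b

print(Y.mean(), a * mu + b)          # sample mean vs a*mu + b
print(Y.var(),  a**2 * sigma**2)     # sample variance vs a^2 * sigma^2
```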
χ²(n) distribution
let Z ∼ N(0, 1); then Z² ∼ χ²(1)
let Zᵢ independent ∼ N(0, 1) for i ∈ {1, …, n}; then X = ∑ᵢ₌₁ⁿ Zᵢ² ∼ χ²(n)
density:
[figure: densities of the χ²(1), χ²(5), χ²(10) and χ²(20) distributions]
E X = n, Var X = 2n
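A sketch of the defining construction (assuming numpy): sums of n squared iid N(0, 1) variables should indeed have mean n and variance 2n.

```python
# Each row of Z gives one chi^2(n) draw as the sum of n squared N(0,1)'s.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 5, 100_000
Z = rng.standard_normal((reps, n))
X = (Z ** 2).sum(axis=1)             # reps draws from chi^2(n)

print(X.mean(), n)                   # ~ n = 5
print(X.var(), 2 * n)                # ~ 2n = 10
```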
Student’s t-distribution
let Z ∼ N(0, 1) and X ∼ χ²(n), Z ⊥⊥ X; then T = Z/√(X/n) ∼ t(n)
density:
[figure: densities of the t(1), t(3) and t(5) distributions]
E T = 0 for n > 1, Var T = n/(n − 2) for n > 2
Fisher–Snedecor distribution
let X₁ ∼ χ²(n₁) and X₂ ∼ χ²(n₂), X₁ ⊥⊥ X₂; then F = (X₁/n₁)/(X₂/n₂) ∼ F(n₁, n₂)
density:
[figure: densities of the F(1, 5), F(5, 1), F(5, 5) and F(5, 10) distributions]
E F = n2/(n2 − 2) for n2 > 2
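Both ratio constructions can be checked the same way; a sketch (assuming numpy and scipy) that builds t(n) and F(n₁, n₂) variables from independent normal and χ² draws and compares them with the reference distributions:

```python
# Construct t and F variables from their definitions and test the fit.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
reps, n, n1, n2 = 100_000, 5, 5, 10
Z  = rng.standard_normal(reps)        # N(0,1), independent of the chi^2 draws
X  = rng.chisquare(n,  size=reps)     # chi^2(n)
X1 = rng.chisquare(n1, size=reps)     # chi^2(n1)
X2 = rng.chisquare(n2, size=reps)     # chi^2(n2), independent of X1

T = Z / np.sqrt(X / n)                # should follow t(n)
F = (X1 / n1) / (X2 / n2)             # should follow F(n1, n2)

# Kolmogorov-Smirnov p-values; typically large when the fit is correct.
print(stats.kstest(T, stats.t(df=n).cdf).pvalue)
print(stats.kstest(F, stats.f(dfn=n1, dfd=n2).cdf).pvalue)
```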
3 Multivariate normal distribution
Multivariate normal distribution N(µ, Σ)
µ ∈ ℝⁿ, Σ an n × n positive semidefinite matrix
Definition
A random vector X: (Ω, A) → (ℝⁿ, B(ℝⁿ)) has the multivariate normal distribution N(µ, Σ) if and only if aᵀX ∼ N(aᵀµ, aᵀΣa) for every a ∈ ℝⁿ.
if rank(Σ) = n, then N(µ, Σ) is non-degenerate and has density
  f(x) = 1/√((2π)ⁿ det(Σ)) · exp{−½ (x − µ)ᵀΣ⁻¹(x − µ)}
if rank(Σ) = r < n, then N(µ, Σ) is degenerate:
• it a.s. “lives” in a subspace of ℝⁿ of dimension r
• it has no density w.r.t. the Lebesgue measure on B(ℝⁿ)
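The definition can be probed empirically: for any fixed a, the projection aᵀX of a multivariate normal sample should be univariate normal with mean aᵀµ and variance aᵀΣa. A sketch with illustrative µ, Σ and a (assuming numpy and scipy):

```python
# Check that a'X is N(a'mu, a'Sigma a) for one particular a.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])   # symmetric positive definite
a = np.array([0.7, -1.0, 2.0])

X = rng.multivariate_normal(mu, Sigma, size=100_000)
proj = X @ a                          # realizations of a'X

print(proj.mean(), a @ mu)            # ~ a'mu
print(proj.var(),  a @ Sigma @ a)     # ~ a'Sigma a
z = (proj - a @ mu) / np.sqrt(a @ Sigma @ a)
print(stats.kstest(z, stats.norm.cdf).pvalue)   # consistent with N(0,1)
```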
Properties of N(µ, Σ)
µ ∈ ℝⁿ, Σ an n × n symmetric positive semidefinite matrix
Theorem (MVN 1). Let X ∼ N(µ, Σ). Then E X = µ and Var X = Σ.
Theorem (MVN 2). Let Z₁, …, Zₙ i.i.d. ∼ N(0, 1) and Z = (Z₁, …, Zₙ)ᵀ. Then Z ∼ N(0, I).
Theorem (MVN 3). Let X ∼ N(µ, Σ), let A be an m × n real matrix and b ∈ ℝᵐ. Then AX + b ∼ N(Aµ + b, AΣAᵀ).
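MVN 3 lends itself to a direct simulation check: the sample mean and covariance of AX + b should approach Aµ + b and AΣAᵀ. A sketch with illustrative matrices (assuming numpy):

```python
# Empirical mean/covariance of the affine image AX + b.
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.array([[1.0,  0.0, 2.0],
              [0.0, -1.0, 1.0]])      # m x n with m = 2, n = 3
b = np.array([10.0, -5.0])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T + b                       # each row: A x + b

print(Y.mean(axis=0), A @ mu + b)     # ~ A mu + b
print(np.cov(Y.T))                    # ~ A Sigma A'
print(A @ Sigma @ A.T)
```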
Proof of MVN 1
let eᵢ = (0, 0, …, 0, 1, 0, …, 0)ᵀ be the vector with 1 in the ith position and 0 elsewhere
by the definition: eᵢᵀX ∼ N(eᵢᵀµ, eᵢᵀΣeᵢ), which is Xᵢ ∼ N(µᵢ, σᵢ,ᵢ)
so E Xᵢ = µᵢ and Var Xᵢ = σᵢ,ᵢ
let eᵢ,ⱼ = (0, …, 0, 1, 0, …, 0, 1, 0, …, 0)ᵀ be the vector with 1 in the ith and the jth positions and 0 elsewhere
by the definition: eᵢ,ⱼᵀX ∼ N(eᵢ,ⱼᵀµ, eᵢ,ⱼᵀΣeᵢ,ⱼ),
which is Xᵢ + Xⱼ ∼ N(µᵢ + µⱼ, σᵢ,ᵢ + σᵢ,ⱼ + σⱼ,ᵢ + σⱼ,ⱼ)
so Var(Xᵢ + Xⱼ) = σᵢ,ᵢ + σᵢ,ⱼ + σⱼ,ᵢ + σⱼ,ⱼ
but also Var(Xᵢ + Xⱼ) = Var Xᵢ + 2 Cov(Xᵢ, Xⱼ) + Var Xⱼ, and σᵢ,ⱼ = σⱼ,ᵢ
hence Cov(Xᵢ, Xⱼ) = σᵢ,ⱼ
Proof of MVN 2
we verify that Z satisfies the definition of N(0, I)
let a ∈ ℝⁿ; then aᵀZ = ∑ᵢ₌₁ⁿ aᵢZᵢ
recall that a sum of independent normals is normal:
  ∑ᵢ₌₁ⁿ aᵢZᵢ ∼ N(∑ᵢ₌₁ⁿ aᵢ × 0, ∑ᵢ₌₁ⁿ aᵢ² × 1) = N(0, ∑ᵢ₌₁ⁿ aᵢ²) = N(aᵀ0, aᵀIa)
Proof of MVN 3
we have X (n×1), A (m×n) and b (m×1), so Y = AX + b is m×1
we verify that the definition of N(Aµ + b, AΣAᵀ) holds for Y:
let a ∈ ℝᵐ
aᵀY = aᵀ(AX + b) = (aᵀA)X + aᵀb
denote ã = Aᵀa (n×1)
X ∼ N(µ, Σ), so ãᵀX ∼ N(ãᵀµ, ãᵀΣã), which is aᵀAX ∼ N(aᵀAµ, aᵀAΣAᵀa)
now, aᵀb is a constant, so aᵀb ∼ N(aᵀb, 0), and aᵀb is independent of aᵀAX
recall that a sum of independent univariate normals is normal,
so aᵀAX + aᵀb ∼ N(aᵀAµ + aᵀb, aᵀAΣAᵀa + 0)
Non-degenerate N(µ, Σ) seen through N(0, I)
µ ∈ ℝⁿ, Σ an n × n symmetric positive definite matrix, rank(Σ) = n
spectral decomposition: Σ = UΛUᵀ with λ₁ ≥ λ₂ ≥ … ≥ λₙ > 0
Σ = UΛUᵀ = (UΛ^{1/2})(Λ^{1/2}Uᵀ) = Σ^{1/2}(Σ^{1/2})ᵀ, where Σ^{1/2} = UΛ^{1/2}
Let Z = Σ^{−1/2}(X − µ) = Λ^{−1/2}Uᵀ(X − µ). Then Z ∼ N(0, I) (n-dimensional).
  MVN 3: Let X ∼ N(µ, Σ), A an m × n real matrix and b ∈ ℝᵐ. Then AX + b ∼ N(Aµ + b, AΣAᵀ).
  Σ^{−1/2}µ − Σ^{−1/2}µ = 0
  Σ^{−1/2}Σ(Σ^{−1/2})ᵀ = Λ^{−1/2}UᵀUΛUᵀUΛ^{−1/2} = I
If X ∼ N(µ, Σ), then X =ᵈ µ + Σ^{1/2}Z, where Z ∼ N(0, I) (n-dimensional).
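The representation X =ᵈ µ + Σ^{1/2}Z is also a practical recipe for sampling; a sketch (assuming numpy) that builds Σ^{1/2} = UΛ^{1/2} from the spectral decomposition:

```python
# Sample N(mu, Sigma) by pushing N(0, I) draws through Sigma^{1/2}.
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

lam, U = np.linalg.eigh(Sigma)            # Sigma = U diag(lam) U'
sqrt_Sigma = U @ np.diag(np.sqrt(lam))    # Sigma^{1/2} = U Lambda^{1/2}

Z = rng.standard_normal((100_000, 2))     # rows: iid N(0, I) draws
X = mu + Z @ sqrt_Sigma.T                 # each row: mu + Sigma^{1/2} z

print(X.mean(axis=0))                     # ~ mu
print(np.cov(X.T))                        # ~ Sigma
```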
Degenerate N(µ, Σ) seen through N(0, I)
µ ∈ ℝⁿ, Σ an n × n symmetric positive semidefinite matrix
suppose that rank(Σ) = r < n
spectral decomposition: Σ = UΛUᵀ with λ₁ ≥ … ≥ λᵣ > 0 and λᵣ₊₁ = … = λₙ = 0
Σ = UΛUᵀ = Uₙ×ᵣ Λᵣ×ᵣ Uₙ×ᵣᵀ, where Uₙ×ᵣ = (u₁ | u₂ | … | uᵣ) and Λᵣ×ᵣ = diag{λ₁, λ₂, …, λᵣ}
  = (Uₙ×ᵣ Λᵣ×ᵣ^{1/2})(Λᵣ×ᵣ^{1/2} Uₙ×ᵣᵀ) = Σ^{1/2}(Σ^{1/2})ᵀ, where Σ^{1/2} = Uₙ×ᵣ Λᵣ×ᵣ^{1/2}
Let Z = (Σ^{1/2})⁺(X − µ) = Λᵣ×ᵣ^{−1/2} Uₙ×ᵣᵀ(X − µ). Then Z ∼ N(0, I) (r-dimensional).
If X ∼ N(µ, Σ), then X =ᵈ µ + Σ^{1/2}Z, where Z ∼ N(0, I) (r-dimensional).
X a.s. “lives” in a subspace of ℝⁿ of dimension r
Density of non-degenerate N(µ, Σ)
µ ∈ ℝⁿ, Σ an n × n symmetric positive definite matrix
Theorem (MVN 4). Let X ∼ N(µ, Σ) with rank(Σ) = n. Then X has a density f(x) w.r.t. the Lebesgue measure on B(ℝⁿ), and
  f(x) = 1/√((2π)ⁿ det(Σ)) · exp{−½ (x − µ)ᵀΣ⁻¹(x − µ)}.
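A sketch comparing the formula of MVN 4 with a library implementation at a few points (assuming numpy and scipy; µ and Σ are illustrative):

```python
# Evaluate the MVN 4 density directly and via scipy.stats.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)
n = len(mu)

def f(x):
    """Density of N(mu, Sigma), written exactly as in MVN 4."""
    q = (x - mu) @ Sigma_inv @ (x - mu)       # quadratic form
    return np.exp(-0.5 * q) / np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))

rv = multivariate_normal(mean=mu, cov=Sigma)
for x in [np.array([0.0, 0.0]), np.array([1.0, -2.0]), np.array([2.5, -1.0])]:
    print(f(x), rv.pdf(x))                    # the two values agree
```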
Proof of MVN 4
recall the density of Z ∼ N(0, 1): f(z) = 1/√(2π) · exp{−z²/2}
consider a random vector Z = (Z₁, …, Zₙ)ᵀ with Zᵢ i.i.d. ∼ N(0, 1)
by independence, the joint density of Z is
  f(z) = ∏ᵢ₌₁ⁿ 1/√(2π) · exp{−zᵢ²/2} = 1/√((2π)ⁿ) · exp{−½ ∑ᵢ₌₁ⁿ zᵢ²}
(recall that by MVN 2, Z ∼ N(0, I), and note that f(z) = 1/√((2π)ⁿ det(I)) · exp{−½ zᵀI⁻¹z})
now if X ∼ N(µ, Σ), then X =ᵈ µ + Σ^{1/2}Z, where Σ^{1/2} = UΛ^{1/2} and Σ = UΛUᵀ,
so the density of X can be derived from that of Z
Proof of MVN 4 ctd. (extract from last year)
12. Transformations of random vectors (from M3121 Pravděpodobnost a statistika I; translated from Czech)
The univariate formula f_Y(y) = f_X(h⁻¹(y)) |dh⁻¹(y)/dy| extends easily to the multivariate case via the substitution theorem for multivariate integrals. We first recall a few basic notions.
Consider a mapping h: ℝⁿ → ℝⁿ, h(x) = (h₁(x), …, hₙ(x))ᵀ; that is, h₁, …, hₙ are functions of the variables x₁, …, xₙ.
Assume that the partial derivatives ∂hᵢ(x₁, …, xₙ)/∂xⱼ (i, j = 1, …, n) exist. The matrix of these partial derivatives is called the Jacobi matrix, and the Jacobian is its determinant:
$$
D_h(x) = \det\frac{\partial h}{\partial x^\top}
= \det\begin{pmatrix}
\frac{\partial h_1}{\partial x_1} & \cdots & \frac{\partial h_1}{\partial x_n} \\
\vdots & & \vdots \\
\frac{\partial h_n}{\partial x_1} & \cdots & \frac{\partial h_n}{\partial x_n}
\end{pmatrix}
$$
Write y = h(x), i.e., y₁ = h₁(x), …, yₙ = hₙ(x), and recall the definition of a regular mapping.
Definition 12.1. A mapping h: ℝⁿ → ℝⁿ is regular on a set M ⊆ ℝⁿ if and only if
(1) M is an open set,
(2) the functions h₁, …, hₙ have continuous first partial derivatives on M,
(3) the Jacobian is nonzero for every x ∈ M, i.e., D_h(x) ≠ 0.
Recall that h is injective on M if for x₁, x₂ ∈ M with x₁ ≠ x₂ we have h(x₁) ≠ h(x₂).
Theorem 12.2 (Substitution theorem). Let h map an open set P ⊆ ℝⁿ onto Q ⊆ ℝⁿ, and let h be regular and injective with Jacobian D_h. Let M ⊂ Q be a Borel set and H: ℝⁿ → ℝ a measurable real function. Then
  ∫_M H(y) dy = ∫_{h⁻¹(M)} H(h(x)) |D_h(x)| dx.
Proof: Jarník, V.: Integrální počet I, II, NČSAV, Praha, 1955.
An immediate consequence of this theorem is the following.
Theorem 12.3 (Density of a transformed random vector). Let the random vector X = (X₁, …, Xₙ)ᵀ have density f_X(x), x ∈ ℝⁿ. Let h be a mapping of ℝⁿ into ℝⁿ that is regular and injective on an open set G, which it maps onto h(G), and for which ∫_G f_X(x) dx = 1. Let h⁻¹ be the inverse mapping of h. Then the random vector Y = h(X) has density
  f_Y(y) = f_X(h⁻¹(y)) |D_{h⁻¹}(y)| for y ∈ h(G), and f_Y(y) = 0 otherwise.
Proof (beginning). Clearly
  1 = ∫_G f_X(x) dx = P(X ∈ G) = P(h(X) ∈ h(G)) ⇒ P(h(X) ∉ h(G)) = 0.
Proof of MVN 4 ctd.
apply the transformation theorem with h: ℝⁿ → ℝⁿ, h(x) = µ + Σ^{1/2}x
then h⁻¹(y) = Σ^{−1/2}(y − µ) = Λ^{−1/2}Uᵀ(y − µ) and
  |det D_{h⁻¹}(y)| = |det(Λ^{−1/2}Uᵀ)| = |det(UΛ^{−1/2})| = √(det{UΛ^{−1/2}Λ^{−1/2}Uᵀ}) = √(det{Σ⁻¹}) = 1/√(det{Σ})
so
  f(x) = 1/√((2π)ⁿ det(Σ)) · exp{−½ (x − µ)ᵀ(Σ^{−1/2})ᵀΣ^{−1/2}(x − µ)}
       = 1/√((2π)ⁿ det(Σ)) · exp{−½ (x − µ)ᵀUΛ^{−1/2}Λ^{−1/2}Uᵀ(x − µ)}
       = 1/√((2π)ⁿ det(Σ)) · exp{−½ (x − µ)ᵀΣ⁻¹(x − µ)}
Density of non-degenerate N(µ, Σ): level sets
  f(x) = 1/√((2π)ⁿ det(Σ)) · exp{−½ (x − µ)ᵀΣ⁻¹(x − µ)}
Σ: square symmetric positive definite matrix
spectral decomposition: Σ = UΛUᵀ with λ₁ ≥ λ₂ ≥ … ≥ λₙ > 0, so Σ⁻¹ = UΛ⁻¹Uᵀ
the quadratic form (x − µ)ᵀΣ⁻¹(x − µ) can be written as
  (x − µ)ᵀUΛ⁻¹Uᵀ(x − µ) = {Uᵀ(x − µ)}ᵀΛ⁻¹{Uᵀ(x − µ)}
level sets of f(x), I_c = {x ∈ ℝⁿ; f(x) = c} for c > 0:
• ellipsoids centred at µ
• directions of the principal axes: u₁, …, uₙ (the columns of U)
• lengths of the principal semi-axes: √(dλ₁), …, √(dλₙ), for a constant d > 0 determined by c
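The geometry can be read off numerically; a sketch (assuming numpy, with an illustrative 2 × 2 Σ) that extracts the axis directions and the √λᵢ proportions of the semi-axes:

```python
# Principal axes of the elliptical level sets of a bivariate normal density.
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
lam, U = np.linalg.eigh(Sigma)   # eigenvalues ascending, eigenvectors in columns

print("principal-axis directions (columns):")
print(U)
print("semi-axis lengths are proportional to:", np.sqrt(lam))
```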
Non-degenerate bivariate normal distribution
[figure: contour plots of the densities of N((0, 0)ᵀ, diag(1, 1)), N((−1, 2)ᵀ, diag(1, 1)) and N((0, 0)ᵀ, diag(2, 2))]
Non-degenerate bivariate normal distribution
[figure: contour plots of the densities of N((0, 0)ᵀ, (1, 0; 0, 1)), N((0, 0)ᵀ, (1, 0.2; 0.2, 1)) and N((0, 0)ᵀ, (1, −0.8; −0.8, 1))]
Characteristic function (reminder)
Definition (Characteristic function of a random variable). Let X be a random variable. The function ψ_X: ℝ → ℂ defined by ψ_X(t) = E exp{itX}, t ∈ ℝ, is the characteristic function of X.
Definition (Characteristic function of a random vector). Let X be an n-dimensional random vector. The function ψ_X: ℝⁿ → ℂ defined by ψ_X(t) = E exp{i tᵀX}, t ∈ ℝⁿ, is the characteristic function of X.
note that ψ_X(t) = E exp{i tᵀX} = E exp{i × 1 × tᵀX} = ψ_{tᵀX}(1)
Properties of the characteristic function (reminder)
Theorem (ChF 1). Let X be an n-dimensional random vector and X₁ and X₂ its subvectors such that X = (X₁ᵀ, X₂ᵀ)ᵀ. Then X₁ ⊥⊥ X₂ iff ψ_X(t) = ψ_{X₁}(t₁) × ψ_{X₂}(t₂) for every t = (t₁ᵀ, t₂ᵀ)ᵀ ∈ ℝⁿ.
  a proof can be found in Petr Lachout: Teorie pravděpodobnosti (1998), Nakladatelství Univerzity Karlovy
Theorem (ChF 2). Let X ∼ N(µ, σ²). Then ψ_X(t) = exp{itµ − ½σ²t²}.
Characteristic function of N(µ, Σ)
µ ∈ ℝⁿ, Σ an n × n symmetric positive semidefinite matrix
Theorem (MVN 5). Let X ∼ N(µ, Σ). Then ψ_X(t) = exp{i tᵀµ − ½ tᵀΣt}.
Proof of MVN 5
we need to compute ψ_X(t) = E exp{i tᵀX} = ψ_{tᵀX}(1)
by the definition of the multivariate normal distribution, tᵀX ∼ N(tᵀµ, tᵀΣt)
  ChF 2: Let X ∼ N(µ, σ²). Then ψ_X(t) = exp{itµ − ½σ²t²}.
hence ψ_X(t) = ψ_{tᵀX}(1) = exp{i × 1 × tᵀµ − ½ tᵀΣt × 1²} = exp{i tᵀµ − ½ tᵀΣt}
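MVN 5 can be checked by Monte Carlo: the empirical mean of exp{i tᵀX} should approximate exp{i tᵀµ − ½ tᵀΣt}. A sketch with illustrative values (assuming numpy):

```python
# Empirical vs theoretical characteristic function at one point t.
import numpy as np

rng = np.random.default_rng(6)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
t = np.array([0.3, -0.4])

X = rng.multivariate_normal(mu, Sigma, size=500_000)
empirical   = np.exp(1j * (X @ t)).mean()
theoretical = np.exp(1j * (t @ mu) - 0.5 * (t @ Sigma @ t))
print(empirical, theoretical)      # two nearly equal complex numbers
```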
Subvectors of N(µ, Σ)
µ ∈ ℝⁿ, Σ an n × n symmetric positive semidefinite matrix
Theorem (MVN 6). Let X ∼ N(µ, Σ) and let k ∈ {1, …, n}. Then
$$
\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_k \end{pmatrix}
\sim N\left(
\begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_k \end{pmatrix},
\begin{pmatrix}
\sigma_{1,1} & \sigma_{1,2} & \dots & \sigma_{1,k} \\
\sigma_{2,1} & \sigma_{2,2} & \dots & \sigma_{2,k} \\
\vdots & \vdots & & \vdots \\
\sigma_{k,1} & \sigma_{k,2} & \dots & \sigma_{k,k}
\end{pmatrix}
\right).
$$
an analogous statement is true for any subvector of X
the converse is not true
Proof of MVN 6
set A = (I_{k×k} | 0_{k×(n−k)}) and b = 0_{k×1}
then (X₁, …, Xₖ)ᵀ = AX + b
  MVN 3: Let X ∼ N(µ, Σ), A an m × n real matrix and b ∈ ℝᵐ. Then AX + b ∼ N(Aµ + b, AΣAᵀ).
Aµ + 0 = (µ₁, …, µₖ)ᵀ
$$
A \Sigma A^\top
= \left(\, I_{k\times k} \mid 0_{k\times(n-k)} \,\right)
\begin{pmatrix}
\Sigma_{1:k,\,1:k} & \Sigma_{1:k,\,k+1:n} \\
\Sigma_{k+1:n,\,1:k} & \Sigma_{k+1:n,\,k+1:n}
\end{pmatrix}
\begin{pmatrix} I_{k\times k} \\ 0_{(n-k)\times k} \end{pmatrix}
= \Sigma_{1:k,\,1:k}
$$
(In)dependence in N(µ, Σ)
µ ∈ ℝⁿ, Σ an n × n symmetric positive semidefinite matrix
Theorem (MVN 7). Let X ∼ N(µ, Σ) and let k ∈ {1, …, n − 1}. Denote X₁ = (X₁, …, Xₖ)ᵀ and X₂ = (Xₖ₊₁, …, Xₙ)ᵀ, so that X₁ ∼ N(µ₁, Σ₁,₁) and X₂ ∼ N(µ₂, Σ₂,₂). If
$$ \Sigma = \begin{pmatrix} \Sigma_{1,1} & 0 \\ 0 & \Sigma_{2,2} \end{pmatrix}, $$
then X₁ ⊥⊥ X₂.
Corollary. For real matrices A and B with n columns, AX ⊥⊥ BX iff AΣBᵀ = 0.
Proof of MVN 7
write t = (t₁ᵀ, t₂ᵀ)ᵀ, t₁ ∈ ℝᵏ, t₂ ∈ ℝⁿ⁻ᵏ
recall that
  MVN 5: Let X ∼ N(µ, Σ). Then ψ_X(t) = exp{i tᵀµ − ½ tᵀΣt}.
and compute
  ψ_X(t) = exp{i tᵀµ − ½ tᵀΣt}
         = exp{i t₁ᵀµ₁ + i t₂ᵀµ₂ − ½ t₁ᵀΣ₁,₁t₁ − ½ t₂ᵀΣ₂,₂t₂}
         = ψ_{X₁}(t₁) ψ_{X₂}(t₂)
this implies the independence of X₁ and X₂ by
  ChF 1: Let X be an n-dimensional random vector and X₁ and X₂ its subvectors such that X = (X₁ᵀ, X₂ᵀ)ᵀ. Then X₁ ⊥⊥ X₂ iff ψ_X(t) = ψ_{X₁}(t₁) × ψ_{X₂}(t₂) for every t = (t₁ᵀ, t₂ᵀ)ᵀ ∈ ℝⁿ.
Proof of the corollary
the corollary follows from the multivariate normality of ((AX)ᵀ, (BX)ᵀ)ᵀ with Cov(AX, BX) = AΣBᵀ:
by MVN 3,
$$
\begin{pmatrix} A \\ B \end{pmatrix} X
= \begin{pmatrix} AX \\ BX \end{pmatrix}
\sim N\left( \,\cdot\,,
\begin{pmatrix}
A\Sigma A^\top & A\Sigma B^\top \\
B\Sigma A^\top & B\Sigma B^\top
\end{pmatrix}
\right)
$$
“⇒”: independence implies zero covariance
“⇐”: follows from MVN 7
Quadratic forms
Let X ∼ N(µ, Σ), µ ∈ ℝⁿ, Σ an n × n symmetric positive semidefinite matrix.
Theorem (QF 1). Let Z ∼ N(0, I). Then ZᵀZ ∼ χ²(n).
Theorem (QF 2). Let X ∼ N(µ, Σ) with rank(Σ) = n. Then (X − µ)ᵀΣ⁻¹(X − µ) ∼ χ²(n).
Theorem (QF 3). Let X ∼ N(µ, Σ) with rank(Σ) = r < n. Then (X − µ)ᵀΣ⁺(X − µ) ∼ χ²(r).
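QF 2 is the distributional statement behind squared Mahalanobis distances; a simulation sketch (assuming numpy and scipy, with illustrative µ and Σ):

```python
# Squared Mahalanobis distances of N(mu, Sigma) samples should be chi^2(n).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
Sigma_inv = np.linalg.inv(Sigma)

X = rng.multivariate_normal(mu, Sigma, size=100_000)
D = X - mu
Q = np.einsum("ij,jk,ik->i", D, Sigma_inv, D)   # (x-mu)' Sigma^{-1} (x-mu)

print(Q.mean())                                      # ~ n = 3
print(stats.kstest(Q, stats.chi2(df=3).cdf).pvalue)  # typically large
```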
Proof of QF 1
obviously ZᵀZ = ∑ᵢ₌₁ⁿ Zᵢ² ∼ χ²(n)
Proof of QF 2
let Σ = UΛUᵀ, so Σ⁻¹ = UΛ⁻¹Uᵀ
define Z = Λ^{−1/2}Uᵀ(X − µ); we see that Z ∼ Nₙ(0, I)
therefore (X − µ)ᵀΣ⁻¹(X − µ) = ZᵀZ ∼ χ²(n)
Proof of QF 3
let Σ = Uₙ×ᵣ Λᵣ×ᵣ Uₙ×ᵣᵀ, so Σ⁺ = Uₙ×ᵣ Λᵣ×ᵣ⁻¹ Uₙ×ᵣᵀ
define Z = Λᵣ×ᵣ^{−1/2} Uₙ×ᵣᵀ(X − µ); we see that Z ∼ Nᵣ(0, I)
therefore (X − µ)ᵀΣ⁺(X − µ) = ZᵀZ ∼ χ²(r)
Quadratic forms ctd.
Let X ∼ N(µ, Σ), µ ∈ ℝⁿ, Σ an n × n symmetric positive semidefinite matrix.
Theorem (QF 4). Let Z ∼ N(0, I) and let P be an n × n orthogonal projection matrix of rank r. Then ZᵀPZ ∼ χ²(r).
Proof of QF 4
P can be written as P = Uₙ×ᵣ Uₙ×ᵣᵀ, where the columns of Uₙ×ᵣ are orthonormal
then Uₙ×ᵣᵀZ ∼ Nᵣ(0, I)
therefore ZᵀPZ = (Uₙ×ᵣᵀZ)ᵀ(Uₙ×ᵣᵀZ) ∼ χ²(r)
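Quadratic forms of exactly this type reappear in linear-model inference. A closing sketch of QF 4 (assuming numpy and scipy; the projection here is onto the column space of a random full-rank matrix, an illustrative choice):

```python
# Z'PZ for an orthogonal projection P of rank r should follow chi^2(r).
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, r, reps = 6, 2, 100_000
A = rng.standard_normal((n, r))              # a.s. full column rank
P = A @ np.linalg.inv(A.T @ A) @ A.T         # projection onto col(A), rank r

Z = rng.standard_normal((reps, n))           # rows: N(0, I) draws
Q = np.einsum("ij,jk,ik->i", Z, P, Z)        # Z' P Z for each row

print(Q.mean())                              # ~ r = 2
print(stats.kstest(Q, stats.chi2(df=r).cdf).pvalue)
```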