Linear Models in Statistics I
Lecture notes
Andrea Kraus
Autumn 2018

Outline
1 Motivation: What we are doing and why
2 Univariate normal distribution: Definition, Properties, Related distributions
3 Multivariate normal distribution: Definition, Properties, Related distributions


Motivation: What we are doing and why

Linear model

Y_i = β0 + β1 x_{i,1} + … + βk x_{i,k} + ε_i,  i ∈ {1, …, n}

Y_i: outcome, response, output, dependent variable
• a random variable; we observe a realization y_i
• (Czech: odezva, závisle proměnná, regresand)

x_{i,1}, …, x_{i,k}: covariates, predictors, explanatory variables, inputs, independent variables
• given, known
• (Czech: nezávisle proměnné, regresory)

β0, …, βk: coefficients
• unknown, fixed; these we want to estimate
• (Czech: regresní koeficienty)

ε_i: random error
• a random variable, unobserved

ε_i iid ∼ (0, σ²), i ∈ {1, …, n}:
• E ε_i = 0: no systematic errors
• Var ε_i = σ²: the same precision for all observations

We often assume that ε_i iid ∼ N(0, σ²), i ∈ {1, …, n}.

Linear model in matrix form

Y_i = β0 + β1 x_{i,1} + … + βk x_{i,k} + ε_i,  i ∈ {1, …, n}

In matrix notation:

\[
\underbrace{\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}}_{Y}
=
\underbrace{\begin{pmatrix}
1 & x_{1,1} & x_{1,2} & \cdots & x_{1,k} \\
1 & x_{2,1} & x_{2,2} & \cdots & x_{2,k} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n,1} & x_{n,2} & \cdots & x_{n,k}
\end{pmatrix}}_{X}
\times
\underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}}_{\beta}
+
\underbrace{\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}}_{\varepsilon}
\]

Linear model in matrix form: Y = Xβ + ε, ε ∼ (0, σ²I), and often ε ∼ N(0, σ²I).

X: design matrix (Czech: regresní matice, matice plánu).

Let p = k + 1. Then Y (n×1) = X (n×p) β (p×1) + ε (n×1). We assume that n > p (and often think about n → ∞ with p fixed).

Linear model for the fev data

Question: the association between FEV [l] and Smoking, corrected for Age [years], Height [cm] and Gender.

Data (n = 654):

FEV    Age  Height  Gender  Smoking
1.708    9  144.8   Female  Non
1.724    8  171.5   Female  Non
1.720    7  138.4   Female  Non
1.558    9  134.6   Male    Non
…        …  …       …       …
3.727   15  172.7   Male    Current
2.853   18  152.4   Female  Non
2.795   16  160.0   Female  Current
3.211   15  168.9   Female  Non

FEV_i = β0 + β1 × Age_i + β2 × Height_i + β3 × Gender_i + β4 × Smoking_i + ε_i
(Gender coded 1 for Male, Smoking coded 1 for Current)

Model: Y = Xβ + ε

\[
\begin{pmatrix} 1.708 \\ 1.724 \\ 1.720 \\ 1.558 \\ \vdots \\ 3.727 \\ 2.853 \\ 2.795 \\ 3.211 \end{pmatrix}
=
\begin{pmatrix}
1 & 9 & 144.8 & 0 & 0 \\
1 & 8 & 171.5 & 0 & 0 \\
1 & 7 & 138.4 & 0 & 0 \\
1 & 9 & 134.6 & 1 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
1 & 15 & 172.7 & 1 & 1 \\
1 & 18 & 152.4 & 0 & 0 \\
1 & 16 & 160.0 & 0 & 1 \\
1 & 15 & 168.9 & 0 & 0
\end{pmatrix}
\times
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_4 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \vdots \\ \varepsilon_{651} \\ \varepsilon_{652} \\ \varepsilon_{653} \\ \varepsilon_{654} \end{pmatrix}
\]
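To make the matrix formulation concrete, here is a minimal numerical sketch, not part of the original slides, of setting up a design matrix and computing the least-squares coefficients. It assumes numpy; the data are simulated stand-ins for the fev data, and the seed, coefficient values and variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for the fev data: n = 654 observations, k = 4 covariates.
n = 654
age = rng.integers(3, 20, size=n).astype(float)      # years
height = rng.normal(150.0, 15.0, size=n)             # cm
gender = rng.integers(0, 2, size=n).astype(float)    # 1 = Male
smoking = rng.integers(0, 2, size=n).astype(float)   # 1 = Current

# Design matrix X (n x p, p = k + 1): first column of ones for the intercept.
X = np.column_stack([np.ones(n), age, height, gender, smoking])
beta = np.array([-4.5, 0.06, 0.04, 0.1, -0.2])       # "true" coefficients (invented)
eps = rng.normal(0.0, 0.4, size=n)                   # eps ~ N(0, sigma^2 I)
Y = X @ beta + eps                                   # Y = X beta + eps

# Least-squares estimate: minimizes ||Y - Xb||^2 over b.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.round(beta_hat, 3))                         # close to beta
```

The column of ones carries the intercept β0; with n > p and X of full column rank, the least-squares problem has a unique solution.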
Linear model for the bloodpress data

Question: the association between the mean arterial blood pressure and age [years], weight [kg], body surface area [m²], stress, duration of hypertension [years] and basal pulse [beats/min].

Data (n = 20):

MAP  Age  Weight  BSA   DoH  Pulse  Stress
105   47  85.4    1.75  5.1   63    33
115   49  94.2    2.10  3.8   70    14
…     …   …       …     …     …     …
110   48  90.5    1.88  9.0   71    99
122   56  95.7    2.09  7.0   75    99

MAP_i = β0 + β1 × Age_i + β2 × Weight_i + β3 × BSA_i + β4 × DoH_i + β5 × Pulse_i + β6 × Stress_i + ε_i

Model: Y = Xβ + ε

\[
\begin{pmatrix} 105 \\ 115 \\ \vdots \\ 110 \\ 122 \end{pmatrix}
=
\begin{pmatrix}
1 & 47 & 85.4 & 1.75 & 5.1 & 63 & 33 \\
1 & 49 & 94.2 & 2.10 & 3.8 & 70 & 14 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
1 & 48 & 90.5 & 1.88 & 9.0 & 71 & 99 \\
1 & 56 & 95.7 & 2.09 & 7.0 & 75 & 99
\end{pmatrix}
\times
\begin{pmatrix} \beta_0 \\ \vdots \\ \beta_6 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_{19} \\ \varepsilon_{20} \end{pmatrix}
\]

Normal distribution in a linear model

Model: Y = Xβ + ε.

Assumptions of the normal linear model:
• X fixed and known,
• β fixed, unknown,
• ε ∼ N(0, σ²I),
⇒ Y ∼ N(Xβ, σ²I).

Estimators of β and σ² are functions of Y, and so are test statistics concerning β and σ².
⇒ To make inference in the normal linear model, we need to study the multivariate normal distribution N(µ, Σ) and the distributions of functions of N(µ, Σ).


Univariate normal distribution

Definition: normal distribution N(µ, σ²)

Let µ ∈ R and σ² > 0. The density is

f(x) = (1/√(2πσ²)) exp{−(x − µ)²/(2σ²)}.

[Figure: density f(x) of the standard normal distribution (µ = 0, σ² = 1).]

If σ² = 0, then X = µ a.s.

Properties of N(µ, σ²): µ ∈ R, σ² > 0

• Let X ∼ N(µ, σ²). Then E X = µ and Var X = σ².
• Let a, b ∈ R and X ∼ N(µ, σ²). Then aX + b ∼ N(aµ + b, a²σ²).
• Let X ∼ N(µ, σ²) and Z = (X − µ)/σ. Then Z ∼ N(0, 1).
• If X ∼ N(µ, σ²), then X =d µ + σZ (equality in distribution), where Z ∼ N(0, 1).
• Let a_i, b_i ∈ R and X_i independent ∼ N(µ_i, σ²_i) for i ∈ {1, …, n}. Then
  Σ_{i=1}^n (a_i X_i + b_i) ∼ N( Σ_{i=1}^n (a_i µ_i + b_i), Σ_{i=1}^n a²_i σ²_i ).

Related distributions: the χ²(n) distribution

Let Z ∼ N(0, 1). Then Z² ∼ χ²(1).
Let Z_i independent ∼ N(0, 1) for i ∈ {1, …, n}. Then X = Σ_{i=1}^n Z²_i ∼ χ²(n).
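As a quick sanity check of the χ²(n) construction, the following sketch (not in the original slides; it assumes numpy and scipy, with an invented seed and sample size) sums squared standard normals and compares the result with the χ²(5) distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 5                                   # degrees of freedom

# X = Z_1^2 + ... + Z_n^2 with Z_i iid N(0,1) should be chi^2(n).
Z = rng.standard_normal((100_000, n))
X = (Z**2).sum(axis=1)

print(X.mean(), X.var())                # approx. n and 2n (here 5 and 10)
# Kolmogorov-Smirnov comparison against the chi^2(n) distribution:
print(stats.kstest(X, stats.chi2(df=n).cdf))
```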
[Figure: densities of χ²(1), χ²(5), χ²(10) and χ²(20).]

E X = n, Var X = 2n.

Student's t-distribution

Let Z ∼ N(0, 1) and X ∼ χ²(n) with Z ⊥⊥ X. Then

T = Z / √(X/n) ∼ t(n).

[Figure: densities of t(1), t(3) and t(5).]

E T = 0 for n > 1; Var T = n/(n − 2) for n > 2.

Fisher–Snedecor distribution

Let X_1 ∼ χ²(n_1) and X_2 ∼ χ²(n_2) with X_1 ⊥⊥ X_2. Then

F = (X_1/n_1) / (X_2/n_2) ∼ F(n_1, n_2).

[Figure: densities of F(1, 5), F(5, 1), F(5, 5) and F(5, 10).]

E F = n_2/(n_2 − 2) for n_2 > 2.


Multivariate normal distribution

Definition: multivariate normal distribution N(µ, Σ)

µ ∈ R^n, Σ an n×n positive semidefinite matrix.

Definition. A random vector X: (Ω, A) → (R^n, B(R^n)) has the multivariate normal distribution N(µ, Σ) if and only if a⊤X ∼ N(a⊤µ, a⊤Σa) for every a ∈ R^n.

If rank(Σ) = n, then N(µ, Σ) is non-degenerate and has the density

f(x) = 1/√((2π)^n det(Σ)) · exp{−½ (x − µ)⊤Σ⁻¹(x − µ)}.

If rank(Σ) = r < n, then N(µ, Σ) is degenerate: it a.s. "lives" in a subspace of R^n of dimension r and has no density w.r.t. the Lebesgue measure on B(R^n).

Properties of N(µ, Σ)

µ ∈ R^n, Σ an n×n symmetric positive semidefinite matrix.

Theorem (MVN 1). Let X ∼ N(µ, Σ). Then E X = µ and Var X = Σ.

Theorem (MVN 2). Let Z_1, …, Z_n iid ∼ N(0, 1) and Z = (Z_1, …, Z_n)⊤. Then Z ∼ N(0, I).

Theorem (MVN 3). Let X ∼ N(µ, Σ), let A be an m×n real matrix and b ∈ R^m. Then AX + b ∼ N(Aµ + b, AΣA⊤).

Proof of MVN 1.
Let e_i = (0, 0, …, 0, 1, 0, …, 0)⊤ be the vector with 1 in the ith position and 0 elsewhere. By the definition, e_i⊤X ∼ N(e_i⊤µ, e_i⊤Σe_i), which says X_i ∼ N(µ_i, σ_{i,i}); so E X_i = µ_i and Var X_i = σ_{i,i}.
Let e_{i,j} = (0, 0, …, 0, 1, 0, …, 0, 1, 0, …, 0)⊤ be the vector with 1 in the ith and the jth positions and 0 elsewhere. By the definition, e_{i,j}⊤X ∼ N(e_{i,j}⊤µ, e_{i,j}⊤Σe_{i,j}), which says X_i + X_j ∼ N(µ_i + µ_j, σ_{i,i} + σ_{i,j} + σ_{j,i} + σ_{j,j}); so Var(X_i + X_j) = σ_{i,i} + σ_{i,j} + σ_{j,i} + σ_{j,j}. But also Var(X_i + X_j) = Var X_i + 2 Cov(X_i, X_j) + Var X_j, and σ_{i,j} = σ_{j,i}; hence Cov(X_i, X_j) = σ_{i,j}.
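A simulation sketch of MVN 1 (not in the original slides; it assumes numpy, and µ, Σ and the seed are invented): the empirical mean and covariance of a large sample from N(µ, Σ) approximate µ and Σ.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.0],
                  [0.6, 1.0, -0.3],
                  [0.0, -0.3, 0.5]])   # symmetric positive definite

# Draw a large sample from N(mu, Sigma) and compare the empirical
# moments with the parameters, as MVN 1 asserts.
X = rng.multivariate_normal(mu, Sigma, size=200_000)
print(np.round(X.mean(axis=0), 2))               # approx. mu
print(np.round(np.cov(X, rowvar=False), 2))      # approx. Sigma
```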
Proof of MVN 2.
We verify that Z satisfies the definition of N(0, I). Let a ∈ R^n; then a⊤Z = Σ_{i=1}^n a_i Z_i. Recall that a sum of independent normals is normal:

Σ_{i=1}^n a_i Z_i ∼ N( Σ_{i=1}^n a_i × 0, Σ_{i=1}^n a²_i × 1 ) = N( 0, Σ_{i=1}^n a²_i ) = N(a⊤0, a⊤Ia).

Proof of MVN 3.
We have X (n×1), A (m×n) and b (m×1), so Y = AX + b is m×1. We verify that the definition of N(Aµ + b, AΣA⊤) holds for Y. Let a ∈ R^m; then a⊤Y = a⊤(AX + b) = (A⊤a)⊤X + a⊤b. Denote ã = A⊤a (n×1). Since X ∼ N(µ, Σ), we have ã⊤X ∼ N(ã⊤µ, ã⊤Σã), which says a⊤AX ∼ N(a⊤Aµ, a⊤AΣA⊤a). Now, a⊤b is a constant, so a⊤b ∼ N(a⊤b, 0), and a⊤b is independent of a⊤AX. A sum of independent univariate normals is normal, so a⊤AX + a⊤b ∼ N(a⊤Aµ + a⊤b, a⊤AΣA⊤a + 0).

Non-degenerate N(µ, Σ) seen through N(0, I)

µ ∈ R^n, Σ an n×n symmetric positive definite matrix, rank(Σ) = n.

Spectral decomposition: Σ = UΛU⊤ with λ_1 ≥ λ_2 ≥ … ≥ λ_n > 0, so that

Σ = UΛU⊤ = (UΛ^{1/2})(Λ^{1/2}U⊤) = Σ^{1/2}(Σ^{1/2})⊤,  where Σ^{1/2} = UΛ^{1/2}.

Let Z = (Σ^{1/2})⁻¹(X − µ) = Λ^{-1/2}U⊤(X − µ). Then Z ∼ N(0, I) (n-dimensional).

Indeed, by MVN 3 (with A = Λ^{-1/2}U⊤ and b = −Λ^{-1/2}U⊤µ): the mean is Λ^{-1/2}U⊤µ − Λ^{-1/2}U⊤µ = 0 and the variance is
Λ^{-1/2}U⊤ Σ UΛ^{-1/2} = Λ^{-1/2}U⊤ UΛU⊤ UΛ^{-1/2} = I.

If X ∼ N(µ, Σ), then X =d µ + Σ^{1/2}Z, where Z ∼ N(0, I) (n-dimensional).

Degenerate N(µ, Σ) seen through N(0, I)

µ ∈ R^n, Σ an n×n symmetric positive semidefinite matrix; suppose that rank(Σ) = r < n.

Spectral decomposition: Σ = UΛU⊤ with λ_1 ≥ λ_2 ≥ … ≥ λ_r > 0 and λ_{r+1} = λ_{r+2} = … = λ_n = 0. Writing U_{n×r} = (u_1 | u_2 | … | u_r) for the eigenvectors of the positive eigenvalues and Λ_{r×r} = diag{λ_1, λ_2, …, λ_r},

Σ = U_{n×r} Λ_{r×r} U_{n×r}⊤ = (U_{n×r} Λ_{r×r}^{1/2})(Λ_{r×r}^{1/2} U_{n×r}⊤) = Σ^{1/2}(Σ^{1/2})⊤.

Let Z = (Σ^{1/2})⁺(X − µ) = Λ_{r×r}^{-1/2} U_{n×r}⊤ (X − µ). Then Z ∼ N(0, I) (r-dimensional).

If X ∼ N(µ, Σ), then X =d µ + Σ^{1/2}Z, where Z ∼ N(0, I) (r-dimensional).
X a.s. "lives" in a subspace of R^n of dimension r.
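A small numerical illustration of the two displays above (not in the original slides; it assumes numpy, and µ, Σ and the seed are invented). It whitens a sample via Z = Λ^{-1/2}U⊤(X − µ) and reconstructs it via µ + Σ^{1/2}Z with Σ^{1/2} = UΛ^{1/2}.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Spectral decomposition Sigma = U Lambda U^T.
# (np.linalg.eigh returns ascending eigenvalues; the ordering is irrelevant here.)
lam, U = np.linalg.eigh(Sigma)

X = rng.multivariate_normal(mu, Sigma, size=100_000)

# Whitening: Z = Lambda^{-1/2} U^T (X - mu) should be N(0, I).
Z = (X - mu) @ U @ np.diag(1.0 / np.sqrt(lam))
print(np.round(np.cov(Z, rowvar=False), 2))        # approx. identity

# Reconstruction: mu + Sigma^{1/2} Z, with Sigma^{1/2} = U Lambda^{1/2},
# inverts the whitening map exactly.
X2 = mu + Z @ np.diag(np.sqrt(lam)) @ U.T
print(np.allclose(X2, X))                          # True
```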
Density of non-degenerate N(µ, Σ)

µ ∈ R^n, Σ an n×n symmetric positive definite matrix.

Theorem (MVN 4). Let X ∼ N(µ, Σ) with rank(Σ) = n. Then X has a density f(x) w.r.t. the Lebesgue measure on B(R^n), and

f(x) = 1/√((2π)^n det(Σ)) · exp{−½ (x − µ)⊤Σ⁻¹(x − µ)}.

Proof of MVN 4.
Recall the density of Z ∼ N(0, 1): f(z) = (1/√(2π)) exp{−z²/2}. Consider the random vector Z = (Z_1, …, Z_n)⊤ with Z_i iid ∼ N(0, 1). By independence, the joint density of Z is

f(z) = Π_{i=1}^n (1/√(2π)) exp{−z²_i/2} = 1/√((2π)^n) · exp{−½ Σ_{i=1}^n z²_i}.

(Recall that by MVN 2, Z ∼ N(0, I), and note that f(z) = 1/√((2π)^n det(I)) · exp{−½ z⊤I⁻¹z}.)

Now if X ∼ N(µ, Σ), then X =d µ + Σ^{1/2}Z, where Σ^{1/2} = UΛ^{1/2} and Σ = UΛU⊤, so the density of X can be derived from that of Z.

Proof of MVN 4 ctd. (extract from last year's course notes)

12. Transformations of random vectors. The formula f_Y(y) = f_X(h⁻¹(y)) |d h⁻¹(y)/dy| can be extended to the multivariate case via the theorem on substitution in multivariate integrals, so we first recall a few basic notions.

Consider a mapping h: R^n → R^n, h(x) = (h_1(x), …, h_n(x))⊤; that is, h_1, …, h_n are functions of the variables x_1, …, x_n. Suppose that the partial derivatives ∂h_i(x_1, …, x_n)/∂x_j (i, j = 1, …, n) exist. The matrix of these partial derivatives is called the Jacobi matrix, and the Jacobian is its determinant,

Dh(x) = det(∂h/∂x⊤) = det( (∂h_i/∂x_j)_{i,j=1}^n ).

Write y = h(x), i.e. y_1 = h_1(x), …, y_n = h_n(x), and recall the definition of a regular mapping.

Definition 12.1. A mapping h: R^n → R^n is regular on a set M ⊆ R^n if and only if (1) M is an open set, (2) the functions h_1, …, h_n have continuous first partial derivatives on M, and (3) the Jacobian is non-zero for every x ∈ M, i.e. Dh(x) ≠ 0.

Recall that h is injective on M if for x_1, x_2 ∈ M such that x_1 ≠ x_2 we have h(x_1) ≠ h(x_2).

Theorem 12.2 (Substitution theorem). Let h be a mapping of an open set P ⊆ R^n onto Q ⊆ R^n, and let h be regular and injective with Jacobian Dh. Let M ⊂ Q be a Borel set and let H: R^n → R be a measurable real function. Then

∫_M H(y) dy = ∫_{h⁻¹(M)} H(h(x)) |Dh(x)| dx.   (3.12.3)

Proof: Jarník, V.: Integrální počet I, II, NČSAV, Praha, 1955.

An immediate consequence of this theorem is the following.

Theorem 12.3 (Density of a transformed random vector). Let the random vector X = (X_1, …, X_n)⊤ have density f_X(x), x ∈ R^n. Let h be a mapping of R^n into R^n that is regular and injective on an open set G which it maps onto h(G) and for which ∫_G f_X(x) dx = 1. Let h⁻¹ be the inverse mapping of h. Then the random vector Y = h(X) has the density

f_Y(y) = f_X(h⁻¹(y)) |Dh⁻¹(y)| for y ∈ h(G), and f_Y(y) = 0 otherwise.   (3.12.4)

Proof (beginning). Clearly 1 = ∫_G f_X(x) dx = P(X ∈ G) = P(h(X) ∈ h(G)), so P(h(X) ∉ h(G)) = 0. …
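Before returning to the proof of MVN 4, here is a one-dimensional instance of formula (3.12.4), checked numerically (not in the original notes; it assumes scipy, and the map h(x) = eˣ is chosen purely for illustration): transforming X ∼ N(0, 1) by h yields the log-normal density.

```python
import numpy as np
from scipy import stats

# One-dimensional instance of (3.12.4): X ~ N(0,1), h(x) = exp(x).
# h is regular and injective on R, h^{-1}(y) = log y, |Dh^{-1}(y)| = 1/y.
y = np.linspace(0.1, 5.0, 50)
f_Y = stats.norm.pdf(np.log(y)) / y

# The resulting density is the log-normal density, available directly in scipy:
print(np.allclose(f_Y, stats.lognorm.pdf(y, s=1.0)))   # True
```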
Proof of MVN 4 ctd.
Take h: R^n → R^n, h(x) = µ + Σ^{1/2}x. Then h⁻¹(y) = (Σ^{1/2})⁻¹(y − µ) = Λ^{-1/2}U⊤(y − µ), and

{det Dh⁻¹(y)}² = det(Λ^{-1/2}U⊤) det(UΛ^{-1/2}) = det(UΛ^{-1/2}Λ^{-1/2}U⊤) = det(Σ⁻¹) = 1/det(Σ),

so |det Dh⁻¹(y)| = 1/√(det(Σ)). Hence

f(x) = 1/√((2π)^n det(Σ)) · exp{−½ h⁻¹(x)⊤h⁻¹(x)}
     = 1/√((2π)^n det(Σ)) · exp{−½ (x − µ)⊤UΛ^{-1/2}Λ^{-1/2}U⊤(x − µ)}
     = 1/√((2π)^n det(Σ)) · exp{−½ (x − µ)⊤Σ⁻¹(x − µ)}.

Density of non-degenerate N(µ, Σ): level sets

f(x) = 1/√((2π)^n det(Σ)) · exp{−½ (x − µ)⊤Σ⁻¹(x − µ)}

Σ: square symmetric positive definite matrix; spectral decomposition Σ = UΛU⊤ with λ_1 ≥ λ_2 ≥ … ≥ λ_n > 0; Σ⁻¹ = UΛ⁻¹U⊤.

The quadratic form (x − µ)⊤Σ⁻¹(x − µ) can be written as

(x − µ)⊤UΛ⁻¹U⊤(x − µ) = {U⊤(x − µ)}⊤ Λ⁻¹ {U⊤(x − µ)}.

Level sets of f(x), I_c = {x ∈ R^n; f(x) = c} for c > 0: ellipsoids centred at µ; directions of the principal axes: u_1, …, u_n; lengths of the principal semi-axes: √(dλ_1), …, √(dλ_n), where the constant d > 0 depends on c.

Non-degenerate bivariate normal distribution

[Figures: contour plots of bivariate normal densities. First slide: N((0, 0)⊤, I) compared with N((−1, 2)⊤, I) (shifted centre) and N((0, 0)⊤, 2I) (larger spread). Second slide: N((0, 0)⊤, I) compared with the correlated cases with off-diagonal entries 0.2 and −0.8 (elliptical contours tilted along and against the diagonal).]

Characteristic function (reminder)

Definition (characteristic function of a random variable). Let X be a random variable. The function ψ_X: R → C defined by ψ_X(t) = E exp{itX}, t ∈ R, is the characteristic function of X.

Definition (characteristic function of a random vector). Let X be an n-dimensional random vector. The function ψ_X: R^n → C defined by ψ_X(t) = E exp{i t⊤X}, t ∈ R^n, is the characteristic function of X.

Note that ψ_X(t) = E exp{i t⊤X} = E exp{i × 1 × t⊤X} = ψ_{t⊤X}(1).

Properties of the characteristic function (reminder)

Theorem (ChF 1). Let X be an n-dimensional random vector and let X_1 and X_2 be its subvectors such that X = (X_1⊤, X_2⊤)⊤. Then X_1 ⊥⊥ X_2 iff ψ_X(t) = ψ_{X_1}(t_1) × ψ_{X_2}(t_2) for every t = (t_1⊤, t_2⊤)⊤ ∈ R^n.
(A proof can be found in Petr Lachout: Teorie pravděpodobnosti (1998), Nakladatelství Univerzity Karlovy.)

Theorem (ChF 2). Let X ∼ N(µ, σ²). Then ψ_X(t) = exp{itµ − ½σ²t²}.
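The following sketch (not in the original slides; it assumes numpy, with invented parameter values and seed) compares the empirical characteristic function of a normal sample with the closed form from ChF 2.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma = 1.5, 2.0
X = rng.normal(mu, sigma, size=500_000)

# Empirical characteristic function at a few points t, compared with
# the closed form exp(i t mu - sigma^2 t^2 / 2) from ChF 2.
for t in (0.2, 0.5, 1.0):
    emp = np.exp(1j * t * X).mean()
    theo = np.exp(1j * t * mu - 0.5 * sigma**2 * t**2)
    print(t, np.round(emp, 3), np.round(theo, 3))
```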
Characteristic function of N(µ, Σ)

µ ∈ R^n, Σ an n×n symmetric positive semidefinite matrix.

Theorem (MVN 5). Let X ∼ N(µ, Σ). Then ψ_X(t) = exp{i t⊤µ − ½ t⊤Σt}.

Proof of MVN 5.
We need to compute ψ_X(t) = E exp{i t⊤X} = ψ_{t⊤X}(1). By the definition of the multivariate normal distribution, t⊤X ∼ N(t⊤µ, t⊤Σt). By ChF 2 (evaluated at 1), ψ_X(t) = exp{i t⊤µ − ½ t⊤Σt}.

Subvectors of N(µ, Σ)

µ ∈ R^n, Σ an n×n symmetric positive semidefinite matrix.

Theorem (MVN 6). Let X ∼ N(µ, Σ) and let k ∈ {1, …, n}. Then

\[
\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_k \end{pmatrix}
\sim N\left(
\begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_k \end{pmatrix},
\begin{pmatrix}
\sigma_{1,1} & \sigma_{1,2} & \cdots & \sigma_{1,k} \\
\sigma_{2,1} & \sigma_{2,2} & \cdots & \sigma_{2,k} \\
\vdots & \vdots & & \vdots \\
\sigma_{k,1} & \sigma_{k,2} & \cdots & \sigma_{k,k}
\end{pmatrix}
\right).
\]

An analogous statement holds for any subvector of X. The converse is not true: a vector whose subvectors are normal need not be jointly normal.

Proof of MVN 6.
Set A = (I_{k×k} | 0_{k×(n−k)}) and b = 0_{k×1}; then (X_1, …, X_k)⊤ = AX + b. By MVN 3, AX + b ∼ N(Aµ + b, AΣA⊤). Here Aµ + 0 = (µ_1, …, µ_k)⊤ and

\[
A\Sigma A^{\top} = \left( I_{k\times k} \mid 0_{k\times(n-k)} \right)
\begin{pmatrix} \Sigma_{1:k,\,1:k} & \Sigma_{1:k,\,k+1:n} \\ \Sigma_{k+1:n,\,1:k} & \Sigma_{k+1:n,\,k+1:n} \end{pmatrix}
\begin{pmatrix} I_{k\times k} \\ 0_{(n-k)\times k} \end{pmatrix}
= \Sigma_{1:k,\,1:k}.
\]

(In)dependence in N(µ, Σ)

µ ∈ R^n, Σ an n×n symmetric positive semidefinite matrix.

Theorem (MVN 7). Let X ∼ N(µ, Σ) and let k ∈ {1, …, n−1}. Denote X_1 = (X_1, …, X_k)⊤ and X_2 = (X_{k+1}, …, X_n)⊤, so that X_1 ∼ N(µ_1, Σ_{1,1}) and X_2 ∼ N(µ_2, Σ_{2,2}). If

\[
\Sigma = \begin{pmatrix} \Sigma_{1,1} & 0 \\ 0 & \Sigma_{2,2} \end{pmatrix},
\]

then X_1 ⊥⊥ X_2.

Corollary. AX ⊥⊥ BX iff AΣB⊤ = 0.

Proof of MVN 7.
Write t = (t_1⊤, t_2⊤)⊤ with t_1 ∈ R^k and t_2 ∈ R^{n−k}. Recall MVN 5: if X ∼ N(µ, Σ), then ψ_X(t) = exp{i t⊤µ − ½ t⊤Σt}. Compute

ψ_X(t) = exp{i t⊤µ − ½ t⊤Σt} = exp{i t_1⊤µ_1 + i t_2⊤µ_2 − ½ t_1⊤Σ_{1,1}t_1 − ½ t_2⊤Σ_{2,2}t_2} = ψ_{X_1}(t_1) ψ_{X_2}(t_2).

This implies the independence of X_1 and X_2 by ChF 1.

Proof of the corollary.
The corollary follows from the multivariate normality of ((AX)⊤, (BX)⊤)⊤ with Cov(AX, BX) = AΣB⊤: by MVN 3,

\[
\begin{pmatrix} A \\ B \end{pmatrix} X
= \begin{pmatrix} AX \\ BX \end{pmatrix}
\sim N\left( \cdot\,, \begin{pmatrix} A\Sigma A^{\top} & A\Sigma B^{\top} \\ B\Sigma A^{\top} & B\Sigma B^{\top} \end{pmatrix} \right).
\]

"⇒": independence implies zero covariance. "⇐": follows from MVN 7.
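A simulation sketch of MVN 7 (not in the original slides; it assumes numpy, with an invented block-diagonal Σ and seed): with Σ_{1,2} = 0, probabilities of joint events factorize, as independence requires.

```python
import numpy as np

rng = np.random.default_rng(5)
# Block-diagonal Sigma: Sigma_{1,2} = 0, so MVN 7 gives X1 independent of X2.
Sigma = np.array([[1.0, 0.5, 0.0],
                  [0.5, 2.0, 0.0],
                  [0.0, 0.0, 1.5]])
X = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
X1, X2 = X[:, :2], X[:, 2]

# Independence implies factorization of joint probabilities:
pA = (X1[:, 0] + X1[:, 1] > 0).mean()
pB = (X2 > 1).mean()
pAB = ((X1[:, 0] + X1[:, 1] > 0) & (X2 > 1)).mean()
print(np.round(pAB, 3), np.round(pA * pB, 3))     # approximately equal
```

Note that within the first block, X_1 and X_2 are correlated (Σ_{1,2}-block entry 0.5); only the cross-block covariance needs to vanish.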
Quadratic forms

Let X ∼ N(µ, Σ), µ ∈ R^n, Σ an n×n symmetric positive semidefinite matrix.

Theorem (QF 1). Let Z ∼ N(0, I). Then Z⊤Z ∼ χ²(n).

Theorem (QF 2). Let X ∼ N(µ, Σ) with rank(Σ) = n. Then (X − µ)⊤Σ⁻¹(X − µ) ∼ χ²(n).

Theorem (QF 3). Let X ∼ N(µ, Σ) with rank(Σ) = r < n. Then (X − µ)⊤Σ⁺(X − µ) ∼ χ²(r).

Proof of QF 1.
Obviously Z⊤Z = Σ_{i=1}^n Z²_i ∼ χ²(n).

Proof of QF 2.
Let Σ = UΛU⊤, so Σ⁻¹ = UΛ⁻¹U⊤. Define Z = Λ^{-1/2}U⊤(X − µ); we have seen that Z ∼ N_n(0, I). Therefore

(X − µ)⊤Σ⁻¹(X − µ) = Z⊤Z ∼ χ²(n).

Proof of QF 3.
Let Σ = U_{n×r}Λ_{r×r}U_{n×r}⊤, so Σ⁺ = U_{n×r}Λ_{r×r}⁻¹U_{n×r}⊤. Define Z = Λ_{r×r}^{-1/2}U_{n×r}⊤(X − µ); we have seen that Z ∼ N_r(0, I). Therefore

(X − µ)⊤Σ⁺(X − µ) = Z⊤Z ∼ χ²(r).

Quadratic forms ctd.

Theorem (QF 4). Let Z ∼ N(0, I) and let P be an n×n orthogonal projection matrix of rank r. Then Z⊤PZ ∼ χ²(r).

Proof of QF 4.
P can be written as P = U_{n×r}U_{n×r}⊤ with U_{n×r}⊤U_{n×r} = I_{r×r}. Then U_{n×r}⊤Z ∼ N_r(0, I); therefore

Z⊤PZ = (U_{n×r}⊤Z)⊤(U_{n×r}⊤Z) ∼ χ²(r).
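Finally, a numerical illustration of QF 2 (not in the original slides; it assumes numpy and scipy, with an invented µ, Σ and seed): the quadratic form (X − µ)⊤Σ⁻¹(X − µ) of samples from a non-degenerate three-dimensional normal behaves like χ²(3).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
X = rng.multivariate_normal(mu, Sigma, size=100_000)

# QF 2: (X - mu)^T Sigma^{-1} (X - mu) should follow chi^2(3).
D = X - mu
Q = np.einsum('ij,jk,ik->i', D, np.linalg.inv(Sigma), D)
print(Q.mean(), Q.var())                      # approx. 3 and 6
print(stats.kstest(Q, stats.chi2(df=3).cdf))  # no evidence against chi^2(3)
```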