Statistical Inference I and II
Probabilistic and Statistical Models

Stanislav Katina
Institute of Mathematics and Statistics, Masaryk University
Honorary Research Fellow, The University of Glasgow

November 13, 2018

Random variable, random vector, data, individuals

random variable and random vector
– a random variable $X$ is a function from a sample space (the set of all possible outcomes) to the set of real numbers, $X: \mathcal{Y} \to \mathbb{R}$
– a 2-dimensional random vector $(X_1, X_2)^T: \mathcal{Y} \to \mathbb{R}^2$
– a $k$-dimensional random vector $(X_1, X_2, \ldots, X_k)^T: \mathcal{Y} \to \mathbb{R}^k$

data – data vector and data matrix – the elements of the vector and the rows of the matrix are measured on individuals (statistical units)
– data as realisations of $X$ – an $n$-dimensional vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$, where $n$ is the sample size
– data as realisations of $(X_1, X_2)^T$ – an $(n \times 2)$-dimensional matrix with rows $(x_{i1}, x_{i2})^T$, $i = 1, 2, \ldots, n$, and columns $\mathbf{x}_1$ and $\mathbf{x}_2$
– data as realisations of $(X_1, X_2, \ldots, X_k)^T$ – an $(n \times k)$-dimensional matrix with rows $(x_{i1}, x_{i2}, \ldots, x_{ik})^T$, $i = 1, 2, \ldots, n$, and columns $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k$

Model

– based on probabilistic sampling principles; the individuals are sampled from a population
– attribute – a specific value of a variable measured with a certain precision; data are measured on individuals
– descriptive statistics – describing and summarising data
– inferential statistics (statistical inference) – inferring (drawing conclusions) about a random variable based on a model fitted to data
– $\mathcal{F}$ is a set of models (probabilistic or statistical)
– $X$ is characterised by a model $F(\cdot)$, $F \in \mathcal{F}$
– $(X_1, X_2)^T$ is characterised by a model $F^{(2)}(\cdot)$, $F \in \mathcal{F}$
– $(X_1, X_2, \ldots, X_k)^T$ is characterised by a model $F^{(k)}(\cdot)$, $F \in \mathcal{F}$
– parameter – a numerical quantity that characterises a model – a one-dimensional parameter $\theta$, or a $k$-dimensional vector of parameters $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_k)^T$
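The data conventions above can be made concrete in R; a minimal sketch (all variable names and values are illustrative only):

set.seed(1)
n  <- 10
x1 <- rnorm(n, mean = 170, sd = 10)   # e.g. height of n individuals
x2 <- rnorm(n, mean = 70, sd = 8)     # e.g. weight of the same individuals
X  <- cbind(x1, x2)                   # (n x 2) data matrix, rows = individuals
dim(X)    # 10 2
X[3, ]    # realisation (x_31, x_32)^T measured on individual 3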
Distribution function, probability and density function

useful assumption – $X_i$, $i = 1, 2, \ldots, n$, are independent, identically distributed random variables

distribution function
– discrete random variable: $F_X(x) = \Pr(X \le x) = \sum_{i:\, x_i \le x} \Pr(X = x_i)$, where $\sum_{i=1}^{k(\infty)} p_i = 1$ and $\Pr(X = x_i) = p_i = f_X(x_i) = f(x_i)$ for all $x_i$; $p_i$ is the probability mass function, $\{x_i, p_i\}_{i=1}^{k(\infty)}$, $k \in \mathbb{N}^+$
– continuous random variable: $F_X(x) = \int_{-\infty}^{x} f(t)\,dt$, $f(x) \ge 0$, where $\int_{-\infty}^{\infty} f(x)\,dx = 1$ and $f_X(x) = f(x) = \frac{\partial}{\partial x} F_X(x)$ is the density function

Parametric and non-parametric model

– $\Theta$ is a parametric space; the support of $F(\cdot; \theta)$ is $\mathcal{Y}_\theta \subseteq \mathbb{R}^n$ (the smallest set on which the distribution function is defined); the sample space is $\mathcal{Y} = \cup_{\theta \in \Theta}\, \mathcal{Y}_\theta$
– $\mathcal{F}$ as a parametric set of distribution functions: $\mathcal{F} = \{F(\cdot; \boldsymbol{\theta}): \boldsymbol{\theta} \in \Theta \subseteq \mathbb{R}^k\}$
– $\mathcal{F}$ as a parametric set of probability or density functions: $\mathcal{F} = \{f(\cdot; \boldsymbol{\theta}): \boldsymbol{\theta} \in \Theta \subseteq \mathbb{R}^k\}$
– $\mathcal{F}$ as a non-parametric set: $\mathcal{F} = \{\text{the set of all density functions}\}$; alternatively, probability or distribution functions can be used

Reading of mathematical notation

– the term "probability model" is often reduced to "distribution"
– "Random variable $X$ is distributed as $F(x)$" or "random variable $X$ is characterised by the distribution $F(x)$"; notation $X \sim F_X(x)$; the symbol "$\sim$" can also mean "asymptotically", "for sufficiently large $n$" (the notation $X \sim f_X(x)$ is used very rarely)
– "Random variable $X$ is distributed as random variable $Y$" or "random variables $X$ and $Y$ are identically distributed" (notation $X \sim Y$ or $F_X(x) \sim F_Y(y)$)
– the term "statistical model" is often reduced to "model" (usually referred to as a causal statistical model or a model of causal dependence)
– "$Y$ depends on $X$", where $X$ is the independent variable and $Y$ is the dependent variable (notation $Y|X$)

Reading of mathematical notation

– "$X$ is normally distributed with parameters $\mu$ and $\sigma^2$", notation $X \sim N(\mu, \sigma^2)$, where $\boldsymbol{\theta} = (\mu, \sigma^2)^T$
– "$\mathbf{X} = (X_1, X_2)^T$ is characterised by the bivariate normal distribution with parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$ and $\rho$", notation $\mathbf{X} \sim N_2(\boldsymbol{\mu}, \Sigma)$, where $\boldsymbol{\theta} = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)^T$
– "$\mathbf{X} = (X_1, X_2, \ldots, X_k)^T$ is characterised by the multivariate normal distribution with parameters $\mu_1, \ldots, \mu_k$, $\sigma_1^2, \ldots, \sigma_k^2$ and $\rho_{1,2}, \ldots, \rho_{k-1,k}$", notation $\mathbf{X} \sim N_k(\boldsymbol{\mu}, \Sigma)$, where $\boldsymbol{\theta} = (\mu_1, \ldots, \mu_k, \sigma_1^2, \ldots, \sigma_k^2, \rho_{1,2}, \ldots, \rho_{k-1,k})^T$
– "$X$ is binomially distributed with parameter $p$", notation $X \sim \mathrm{Bin}(N, p)$, where $\theta = p$
– "$X$ is characterised by the Poisson distribution with parameter $\lambda$", notation $X \sim \mathrm{Poiss}(\lambda)$, where $\theta = \lambda$
– "$\mathbf{X} = (X_1, X_2, \ldots, X_k)^T$ is multinomially distributed with parameter $\mathbf{p}$", notation $\mathbf{X} \sim \mathrm{Mult}_k(N, \mathbf{p})$, where $\boldsymbol{\theta} = \mathbf{p}$
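Both definitions of $F_X$ can be checked numerically in R; a minimal sketch (the binomial and normal examples are illustrative only):

x <- 0:10
p <- dbinom(x, size = 10, prob = 0.3)       # probability mass function p_i
sum(p)                                      # equals 1
all.equal(cumsum(p), pbinom(x, 10, 0.3))    # F_X(x) as the cumulative sum of p_i

# continuous case: F_X(x) as the integral of the density
F0 <- integrate(dnorm, lower = -Inf, upper = 1.5)$value
all.equal(F0, pnorm(1.5))                   # TRUE up to numerical error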
Measures of normal distribution

"$X$ is normally distributed with parameters $\mu$ and $\sigma^2$", notation $X \sim N(\mu, \sigma^2)$, where $\boldsymbol{\theta} = (\mu, \sigma^2)^T$

– random variable $Z$ ($Z$-transformation): $\Pr(Z < x_{1-\alpha}) = 1 - \alpha$, where $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$ and $x_{1-\alpha}$ is the $(1-\alpha)$-quantile
– rule "90–95–99": $\Pr(a \le X \le b) = 1 - \alpha$, where $1 - \alpha = 0.90$, $0.95$ and $0.99$, $a = \mu - x_{1-\alpha/2}\,\sigma$ and $b = \mu + x_{1-\alpha/2}\,\sigma$
– rule "68.27–95.45–99.73": $\Pr(a \le X < b) = \Pr(X < b) - \Pr(X < a) = F_X(b) - F_X(a)$, where $a = \mu - k\sigma$, $b = \mu + k\sigma$, $k = 1, 2$ and $3$

Approximation of binomial distribution by normal distribution

Definition (approximation of binomial distribution by normal distribution)
If random variable $X$ is binomially distributed with parameter $p$, $X \sim \mathrm{Bin}(N, p)$, where $\theta = p$, and if $Np > 5$ and $Nq > 5$, where $q = 1 - p$, then the distribution of $X$ can be approximated by the normal distribution, $X \sim N(Np, Npq)$, where $\boldsymbol{\theta} = (Np, Npq)^T$.

Table: Examples of minimal $N$ for fixed $p$
p   0.1   0.2   0.3   0.4   0.5
q   0.9   0.8   0.7   0.6   0.5
N   51    26    17    13    11

Definition (approximation of binomial distribution by normal distribution, Hald condition)
If random variable $X$ is binomially distributed with parameter $p$, $X \sim \mathrm{Bin}(N, p)$, where $\theta = p$, and if $Npq > 9$ (Hald condition), where $q = 1 - p$, then the distribution of $X$ can be approximated by the normal distribution, $X \sim N(Np, Npq)$, where $\boldsymbol{\theta} = (Np, Npq)^T$.

Table: Examples of minimal $N$ for fixed $p$
p       0.01   0.02   0.05   0.10   0.15   0.20   0.30   0.40   0.50
1 - p   0.99   0.98   0.95   0.90   0.85   0.80   0.70   0.60   0.50
N       910    460    190    100    71     57     43     38     36

Example
Let $\Pr(\text{male}) = 0.515$ and $\Pr(\text{female}) = 0.485$. Let $X$ be the frequency of males and $Y$ the frequency of females. Assuming $X \sim \mathrm{Bin}(N, p)$, calculate (a) $\Pr(X \le 3)$ if $N = 5$, (b) $\Pr(X \le 5)$ if $N = 10$, and (c) $\Pr(X \le 25)$ if $N = 50$. Compare the results with the normal approximation $X \sim N(Np, Npq)$.

Solution
(a) $E[X] = Np = 5 \times 0.515 = 2.575$, $E[Y] = 5 \times 0.485 = 2.425$; exactly, $\Pr(X \le 3) = \sum_{k \le 3} \binom{5}{k} 0.515^k\, 0.485^{5-k} = 0.793$; under the approximation $N(5 \times 0.515,\, 5 \times 0.515 \times 0.485)$, $\Pr(X \le 3) = 0.648$.
(b) $E[X] = 10 \times 0.515 = 5.15$, $E[Y] = 10 \times 0.485 = 4.85$; exactly, $\Pr(X \le 5) = \sum_{k \le 5} \binom{10}{k} 0.515^k\, 0.485^{10-k} = 0.586$; under $N(10 \times 0.515,\, 10 \times 0.515 \times 0.485)$, $\Pr(X \le 5) = 0.462$.
(c) $E[X] = 50 \times 0.515 = 25.75$, $E[Y] = 50 \times 0.485 = 24.25$; exactly, $\Pr(X \le 25) = \sum_{k \le 25} \binom{50}{k} 0.515^k\, 0.485^{50-k} = 0.471$; under $N(50 \times 0.515,\, 50 \times 0.515 \times 0.485)$, $\Pr(X \le 25) = 0.416$.
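The exact probabilities and their normal approximations in the solution can be reproduced in R; a short sketch:

p <- 0.515; q <- 1 - p
N <- c(5, 10, 50); x <- c(3, 5, 25)
exact  <- pbinom(x, size = N, prob = p)                 # 0.793 0.586 0.471
approx <- pnorm(x, mean = N * p, sd = sqrt(N * p * q))  # 0.648 0.462 0.416
round(cbind(N, x, exact, approx), 3)

A continuity correction (using x + 0.5 in pnorm()) would bring the approximation closer to the exact values.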
Figure: Probability function (first row) and distribution function (second row) of the binomial distribution superimposed by the normal distribution ($p = 0.515$; $N = 5, 10$ and $50$)

Figure: Probability function (first row) and distribution function (second row) of the binomial distribution superimposed by the normal distribution ($p = 0.1$; $N = 5, 10$ and $50$)

Binomial distribution

Example (number of boys)
The number of boys $X$ in families with $N$ children is binomially distributed, i.e. $X \sim \mathrm{Bin}(N, p)$, where $N = 12$ and the number of families is $M = 6115$ (Geissler 1889). Question: Calculate the theoretical frequencies $m_{n,E}$. You know that $\hat{p} = \frac{\sum_{n=0}^{N} n\, m_{n,O}}{NM} = 0.5192$ (a weighted average: the total number of boys, $\sum_n n\, m_{n,O}$, divided by the total number of children, $NM$).

Table: Observed and theoretical frequencies ($m_{n,O}$ and $m_{n,E}$) of families with $n$ boys (O = observed, E = expected, theoretical)
n        0   1    2    3    4    5     6     7     8    9    10   11   12
m_{n,O}  3   24   104  286  670  1033  1343  1112  829  478  181  45   7
m_{n,E}  1   12   72   259  628  1085  1367  1266  854  410  133  26   2

Figure: Histograms of observed and expected frequencies of the number of boys in families with 12 children

Figure: Comparison of observed and expected frequencies (differences $m_{n,O} - m_{n,E}$)
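The theoretical frequencies follow from $m_{n,E} = M \binom{12}{n} \hat{p}^n (1 - \hat{p})^{12-n}$; a short R sketch:

M <- 6115; p.hat <- 0.5192
n <- 0:12
m.E <- M * dbinom(n, size = 12, prob = p.hat)   # expected frequencies
round(m.E)   # 1 12 72 259 628 1085 1367 1266 854 410 133 26 2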
Multinomial distribution

Example (number of individuals with certain blood type)
The numbers of individuals $\mathbf{X} = (X_1, X_2, X_3, X_4)^T$ with the four blood groups are multinomially distributed following the Hardy-Weinberg equilibrium, i.e. $\mathbf{X} \sim \mathrm{Mult}_4(N, \mathbf{p})$, where $N = 500$ (Katina et al. 2015). Question: Calculate the theoretical frequencies $n_{j,E}$.

Table: Observed and theoretical frequencies of the blood groups
attributes (groups)   0     A     B    AB
n_{j,O}               209   184   81   26
n_{j,E}               210   183   80   27

Figure: Comparison of observed and expected frequencies of the four blood types

Product-multinomial distribution

Example (number of individuals with certain blood type)
Let $\mathbf{X} = (\mathbf{X}_1, \mathbf{X}_2)^T$, where $\mathbf{X}_1 = (X_{11}, X_{12}, X_{13}, X_{14})^T$ is the number of individuals in Košice (Slovakia) with each blood group and $\mathbf{X}_2 = (X_{21}, X_{22}, X_{23}, X_{24})^T$ is the number of individuals in Prague (Czech Republic) with each blood group. $\mathbf{X}$ is product-multinomially distributed, i.e. $\mathbf{X} \sim \mathrm{ProdMult}_2(\mathbf{N}, \mathbf{p})$, where $\mathbf{N} = (N_1, N_2)^T$, $N_1 = 400$ and $N_2 = 500$ (Katina et al. 2015). Calculate the theoretical frequencies $n_{kj,E}$. Question: What are the probabilities of having a particular blood group in Prague and in Košice?

Table: Observed frequencies of the particular blood groups
attributes (groups)          0     A     B    AB
n_{1j,O} = n_{Košice,j,O}    138   147   84   31
n_{2j,O} = n_{Prague,j,O}    209   184   81   26

Figure: Barplots of the four blood types in Košice and Prague, with relative frequencies by city (default palette)
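The city-specific probabilities shown in the barplots are the row-wise relative frequencies $\hat{p}_{j|k} = n_{kj}/N_k$; a short R sketch:

x <- rbind(Kosice = c(138, 147, 84, 31),
           Prague = c(209, 184, 81, 26))
colnames(x) <- c("0", "A", "B", "AB")
rowSums(x)                            # N_1 = 400, N_2 = 500
round(prop.table(x, margin = 1), 4)   # estimated p_{j|k}; each row sums to 1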
Multinomial distribution

Example (number of individuals with certain eye and hair colour)
Let $\mathbf{X} = (X_1, X_2, \ldots, X_{12})^T$ be the random vector of the numbers of individuals classified by eye colour (with levels blue Bl, green Gr, brown Br) and hair colour (with levels blond Blo, light-brown LB, black Bla, red R), where $X_1$ means Bl-Blo, $X_2$ Bl-LB, $X_3$ Bl-Bla, $X_4$ Bl-R, $X_5$ Gr-Blo, $X_6$ Gr-LB, $X_7$ Gr-Bla, $X_8$ Gr-R, $X_9$ Br-Blo, $X_{10}$ Br-LB, $X_{11}$ Br-Bla and $X_{12}$ Br-R. Let $\mathbf{X} \sim \mathrm{Mult}_{12}(N, \mathbf{p})$, where $N = 6800$ (Yule and Kendall 1950). Question: Calculate the probabilities of having (1) a particular eye and hair colour, (2) a particular hair colour conditional on eye colour, (3) a particular eye colour conditional on hair colour. (A sketch follows below.)

Table: 3 × 4 contingency table of frequencies $n_j$
            Blo    LB     Bla    R     row sums
Bl          1768   807    189    47    2811
Gr          946    1387   746    53    3132
Br          115    438    288    16    857
column sums 2829   2632   1223   116   6800

Figure: Barplots of eye and hair colour – hair colour conditional on eye colour (left), eye colour conditional on hair colour (right) (default palette)

Figure: Barplots of eye and hair colour (blue palette)

Figure: Barplots of eye and hair colour (spectral palette)
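The three sets of probabilities asked for in the example are relative frequencies of the table with respect to different margins; a minimal R sketch:

x <- matrix(c(1768, 807, 189, 47,
              946, 1387, 746, 53,
              115, 438, 288, 16),
            nrow = 3, byrow = TRUE,
            dimnames = list(eyes = c("blue", "green", "brown"),
                            hair = c("blond", "light-brown", "black", "red")))
round(prop.table(x), 4)               # (1) joint probabilities p_j
round(prop.table(x, margin = 1), 4)   # (2) hair | eye  (rows sum to 1)
round(prop.table(x, margin = 2), 4)   # (3) eye | hair  (columns sum to 1)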
Multinomial distribution

Example (number of individuals with certain socioeconomic status, political philosophy and political affiliation)
The numbers of individuals $X_1, \ldots, X_8$ classified by socioeconomic status, political philosophy and political affiliation are multinomially distributed, i.e. $\mathbf{X} = (X_1, \ldots, X_8)^T \sim \mathrm{Mult}_8(N, \mathbf{p})$, with realisations $\mathbf{x} = (x_1, x_2, \ldots, x_8)^T$ and $N = 500$ (Christensen 1990, modified). Question: Calculate the probabilities of having a particular socioeconomic status, political philosophy and political affiliation.

Notation: (1) socioeconomic status (high – H, low – Lo), (2) political philosophy (democrat – D, republican – R) and (3) political affiliation (conservative – C, liberal – Li). Then $X_1$ (H-D-C), $X_2$ (H-D-Li), $X_3$ (H-R-C), $X_4$ (H-R-Li), $X_5$ (Lo-D-C), $X_6$ (Lo-D-Li), $X_7$ (Lo-R-C) and $X_8$ (Lo-R-Li).

Table: 2 × 4 contingency table of frequencies $x_j$
     D-C   D-Li   R-C   R-Li
H    60    60     60    20
Lo   90    90     90    30

Figure: Barplots of socioeconomic status, political philosophy and affiliation (blue palette)

Poisson distribution

Example (Poisson distribution; killing by horse kicks)
The data were published by the Russian economist Ladislaus Bortkiewicz in his book Das Gesetz der kleinen Zahlen (The Law of Small Numbers) in 1898. Let $X$ be the number of annual deaths (killed by horse kicks) in a corps of the Prussian army within one year (von Bortkiewicz 1898; 10 different army corps, over the 20 years 1875–1894). Let $n$ be the number of annual deaths and $m_{n,O}$ the number of corps-years with $n$ deaths, $M = \sum_n m_{n,O} = 10 \times 20 = 200$. Then $X \sim \mathrm{Poiss}(\lambda)$, where $\hat{\lambda} = \frac{\sum_n n\, m_{n,O}}{\sum_n m_{n,O}} = 0.61$ (a weighted average: the total number of deaths divided by the number of corps-years, $M$). Question: Calculate the theoretical frequencies $m_{n,E}$.

Table: Observed and theoretical frequencies ($m_{n,O}$ and $m_{n,E}$) of corps of soldiers with $n$ annual deaths (killed by horse kicks) over 20 years
n        0     1    2    3   4   ≥ 5
m_{n,O}  109   65   22   3   1   0
m_{n,E}  109   66   20   4   1   0

Figure: Comparison of observed and expected frequencies of the numbers of annual deaths

Poisson distribution

Example (Poisson distribution; accidents in the factories)
Let $X$ be the number of accidents of a worker in munition factories in England during the First World War (Greenwood and Yule 1920), $n$ the number of accidents, $m_{n,O}$ the number of workers with $n$ accidents, $M = \sum_n m_{n,O} = 647$. Then $X \sim \mathrm{Poiss}(\lambda)$, where $\hat{\lambda} = \frac{\sum_n n\, m_{n,O}}{\sum_n m_{n,O}} = 0.47$ (a weighted average: the total number of accidents divided by the number of workers, $M$). Question: Calculate the theoretical frequencies $m_{n,E}$.

Table: Observed and theoretical frequencies ($m_{n,O}$ and $m_{n,E}$) of workers with $n$ accidents
n        0     1     2    3    4   ≥ 5
m_{n,O}  447   132   42   21   3   2
m_{n,E}  406   189   44   7    1   0

Figure: Comparison of observed and expected frequencies of the numbers of accidents
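For both Poisson examples the expected counts are $m_{n,E} = M\, e^{-\hat\lambda}\hat\lambda^n/n!$, with the last cell taken as the tail $\Pr(X \ge 5)$; a short R sketch for the horse-kick data (the analogous code with M <- 647 and lambda <- 0.47 gives the accident fit, up to rounding of $\hat\lambda$):

M <- 200; lambda <- 0.61
p <- c(dpois(0:4, lambda), ppois(4, lambda, lower.tail = FALSE))  # Pr(X >= 5) as tail
round(M * p)   # 109 66 20 4 1 0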
Negative binomial distribution

Example (negative binomial distribution; accidents in the factories)
Let $X$ be the number of accidents of a worker in munition factories in England during the First World War (Greenwood and Yule 1920), $n$ the number of accidents, $m_{n,O}$ the number of workers with $n$ accidents, $M = \sum_n m_{n,O} = 647$. Question: Calculate the theoretical frequencies $m_{n,E}$.

Table: Observed and theoretical frequencies ($m_{n,O}$ and $m_{n,E}$) of workers with $n$ accidents
n        0     1     2    3    4   ≥ 5
m_{n,O}  447   132   42   21   3   2
m_{n,E}  446   134   44   15   5   3

Figure: Comparison of observed and expected frequencies of the numbers of accidents

Zero-inflated Poisson (ZIP) distribution

Example (ZIP distribution; number of movements of a foetal lamb)
Let $X$ be the number of movements of a foetal lamb in 240 five-second periods (Leroux and Puterman 1992), $n$ the number of movements, $m_{n,O}$ the number of periods with $n$ movements. Question: Calculate the theoretical frequencies $m_{n,E}$ using the Poisson and the ZIP distribution.

Table: Observed and theoretical frequencies ($m_{n,O}$ and $m_{n,E}$) of five-second periods with $n$ movements
n                  0     1    2    3   4   5   6   7
m_{n,O}            182   41   12   2   2   0   0   1
m_{n,E} (Poisson)  168   60   11   1   0   0   0   0
m_{n,E} (ZIP)      182   37   16   4   1   0   0   0

Figure: Comparison of observed and expected frequencies – Poisson (left), ZIP (right)

Formulations of hypotheses about probability distributions

1. binomial distribution – example – number of boys: Is the distribution of the number of boys in families with 12 children binomial? Is the probability of having a boy in a family equal to 0.5?
2. multinomial distribution – example – number of individuals with certain eye and hair colour: Are the rows and columns of a contingency table independent? Are the frequencies of individuals with a certain eye colour (blue, green, brown) independent of hair colour (blond, light-brown, black, red)?
3. product-multinomial distribution: Are the vectors of frequencies the same in each row? Are the vectors of frequencies independent of the row index?
   – example – number of individuals with certain socioeconomic status, political philosophy and affiliation – Are the vectors of frequencies of individuals (D-Li, D-C, R-Li, R-C) the same for each level of socioeconomic status (high and low)?
   – example – blood groups – Is the distribution of the blood groups (0, A, B, AB) the same in Prague and Košice?
4. Poisson distribution:
   – example – killing by horse kicks – Is the distribution of the number of corps of soldiers with $n$ annual deaths (killed by horse kicks) Poisson?
   – example – accidents in the factories – Is the distribution of the number of workers having an accident Poisson?

Assignments in R

Assignment – number of boys:
1. Draw the probability mass function of the number of boys in families with 12 children.
2. What are the probabilities of having $n$ boys in a family ($n = 1, 2, \ldots, 12$)? What is the probability of having eight or more boys in the family? What is the probability of having five to seven boys in the family? (A sketch follows below.)
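A minimal sketch of this assignment, using the estimate $\hat p = 0.5192$ from the Geissler data:

p.hat <- 0.5192
n <- 0:12
pr <- dbinom(n, size = 12, prob = p.hat)       # Pr(X = n)
barplot(pr, names.arg = n,
        xlab = "number of boys", ylab = "probability")  # pmf
sum(dbinom(8:12, 12, p.hat))                   # Pr(X >= 8)
pbinom(7, 12, p.hat) - pbinom(4, 12, p.hat)    # Pr(5 <= X <= 7)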
Assignment – killing by horse kicks:
1. Draw the probability mass function of the number of corps with $n$ annual deaths (killed by horse kicks).
2. What are the probabilities of having $n$ annual deaths ($n = 0, 1, 2, 3, 4, 5+$)? What is the probability of having one or fewer annual deaths?

Assignment – accidents in the factories:
1. Draw the probability mass function of the number of workers having an accident.
2. What are the probabilities of having $n$ accidents ($n = 0, 1, 2, 3, 4, 5+$)? What is the probability of having two or more accidents?

Assignments in R

Assignment – number of boys: Calculate $\hat{p}$ (the estimated probability of having a boy in a family) and $\widehat{\mathrm{Var}}[\hat{p}]$ (the estimated variance of that probability).
Assignment – killing by horse kicks: Calculate $\hat{\lambda}$ (the estimated mean number of annual deaths) and $\widehat{\mathrm{Var}}[\hat{\lambda}]$ (the estimated variance of that mean).
Assignment – accidents in the factories: Calculate $\hat{\lambda}$ (the estimated mean number of accidents in the factories) and $\widehat{\mathrm{Var}}[\hat{\lambda}]$.

Assignments in R

Assignment – blood groups: For Prague and Košice, calculate $\hat{\mathbf{p}}$ (the probabilities of having a certain blood group in each city) and $\widehat{\mathrm{Var}}[\hat{\mathbf{p}}]$ (the covariance matrix of those probabilities).
Assignment – eye and hair colour: Calculate $\hat{\mathbf{p}}$ (the probabilities of having a certain eye and hair colour) and $\widehat{\mathrm{Var}}[\hat{\mathbf{p}}]$ (the covariance matrix of those probabilities). (A sketch for the multinomial case follows below.)
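For a multinomial sample, $\hat{\mathbf p} = \mathbf x / N$ and $\widehat{\mathrm{Var}}[\hat{\mathbf p}] = (\mathrm{diag}(\hat{\mathbf p}) - \hat{\mathbf p}\hat{\mathbf p}^T)/N$; a minimal sketch for the Košice blood groups (the same code applies to Prague or to the 12 eye-hair cells):

x <- c(138, 147, 84, 31)    # Kosice blood groups 0, A, B, AB
N <- sum(x)                 # 400
p.hat <- x / N
V.hat <- (diag(p.hat) - p.hat %*% t(p.hat)) / N   # estimated covariance matrix
round(p.hat, 4)
round(V.hat, 6)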
Types of contingency tables – multinomial distribution

1 × J contingency table of frequencies:
  outcome 1   outcome 2   ...   outcome J | sum
  x_1         x_2         ...   x_J       | N

1 × J contingency table of probabilities:
  p_1         p_2         ...   p_J       | 1

2 × J contingency table of frequencies:
  row 1: x_11   x_12   ...   x_1J | N_1
  row 2: x_21   x_22   ...   x_2J | N_2

2 × J contingency table of probabilities:
  row 1: p_11   p_12   ...   p_1J | p_1• = 1
  row 2: p_21   p_22   ...   p_2J | p_2• = 1

K × J contingency table of frequencies:
  row k: x_k1   x_k2   ...   x_kJ | N_k,   k = 1, 2, ..., K

K × J contingency table of probabilities:
  row k: p_k1   p_k2   ...   p_kJ | p_k• = 1,   k = 1, 2, ..., K

Types of contingency tables – product-multinomial distribution

1 × J contingency tables of frequencies and probabilities (≈ multinomial distribution): as above.

2 × J contingency table of frequencies (≈ multinomial distribution):
  group 1: x_11   x_12   ...   x_1J | N_1
  group 2: x_21   x_22   ...   x_2J | N_2

2 × J contingency table of probabilities:
  group 1: p_{1|1}   p_{2|1}   ...   p_{J|1} | 1
  group 2: p_{1|2}   p_{2|2}   ...   p_{J|2} | 1

K × J contingency table of frequencies (≈ multinomial distribution):
  group k: x_k1   x_k2   ...   x_kJ | N_k,   k = 1, 2, ..., K

K × J contingency table of probabilities:
  group k: p_{1|k}   p_{2|k}   ...   p_{J|k} | 1,   k = 1, 2, ..., K

Data structure for 1 × J contingency table – multinomial distribution

Each individual contributes one row of indicators (exactly one 1 among the J columns):
           outcome 1   outcome 2   ...   outcome J | sum
  x_1      1           0           ...   0         | 1
  x_2      0           1           ...   0         | 1
  ...
  x_N      1           0           ...   0         | 1
  sum = x  x_1         x_2         ...   x_J       | N

– the sum of each row is one; the sum of all row sums is N
– the sum of each column is x_j, where j = 1, 2, ..., J; the sum of all x_j, j = 1, 2, ..., J, is N
– x = n

Data structure for K × J contingency table – (product-)multinomial distribution

Within group k (k = 1, 2, ..., K), each of the N_k individuals contributes one indicator row:
             outcome 1   outcome 2   ...   outcome J | sum
  x_k1       1           0           ...   0         | 1
  x_k2       0           1           ...   0         | 1
  ...
  x_k,N_k    1           0           ...   0         | 1
  sum = x_k  x_k1        x_k2        ...   x_kJ      | N_k

– the sum of each row is one; the sum of all row sums is N_k
– the sum of each column is x_kj, where j = 1, 2, ..., J; the sum of all x_kj, j = 1, 2, ..., J, is N_k
– x_k = n_k, where k = 1, 2, ..., K
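Collapsing the individual-level indicator rows to the contingency table is a column sum, or a single call to table() when the outcomes are factor-coded; a minimal sketch with simulated data (all names illustrative):

set.seed(2)
J <- 4; N <- 20
outcome <- sample(paste0("outcome", 1:J), N, replace = TRUE,
                  prob = c(0.4, 0.3, 0.2, 0.1))
Z <- model.matrix(~ outcome - 1)   # N x J indicator matrix, one 1 per row
colSums(Z)                         # the 1 x J table of frequencies x_j
table(outcome)                     # the same table obtained directly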
(Univariate) normal distribution

Definition (normal distribution)
Random variable $X$ is normally distributed with parameters $\mu$ and $\sigma^2$, i.e. $X \sim N(\mu, \sigma^2)$, where $\boldsymbol{\theta} = (\mu, \sigma^2)^T$, if its density is defined as
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, $x \in \mathbb{R}$, $\sigma > 0$.

Definition (standardised normal distribution)
Random variable $X$ is normally distributed with parameters $\mu = 0$ and $\sigma^2 = 1$, i.e. $X \sim N(0, 1)$, where $\boldsymbol{\theta} = (0, 1)^T$, if its density is defined as
$\varphi(x) = f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$, $x \in \mathbb{R}$.
The parameter $\mu$ is called the mean of $X$ and $\sigma^2$ the variance of $X$.

Bivariate normal distribution

Definition (bivariate normal distribution)
Random vector $(X, Y)^T$ is normally distributed with parameters $\boldsymbol{\mu}$ and $\Sigma$, i.e. $(X, Y)^T \sim N_2(\boldsymbol{\mu}, \Sigma)$, where $\boldsymbol{\mu} = (\mu_1, \mu_2)^T$, $\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$, $\boldsymbol{\theta} = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)^T$, $(x, y)^T \in \mathbb{R}^2$, $\mu_j \in \mathbb{R}$, $\sigma_j^2 > 0$, $j = 1, 2$, $\rho \in (-1, 1)$; the density is defined as
$f(x, y) = \frac{1}{A} \exp\left\{ -\frac{1}{B} \left[ \frac{(x-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \frac{(y-\mu_2)^2}{\sigma_2^2} \right] \right\}$,
where $A = 2\pi\sqrt{\sigma_1^2\sigma_2^2(1 - \rho^2)}$ and $B = 2(1 - \rho^2)$.

Standardised bivariate normal distribution

Definition (bivariate standardised normal distribution)
Random vector $(X, Y)^T$ is normally distributed with parameters $\boldsymbol{\mu}$ and $\Sigma$, i.e. $(X, Y)^T \sim N_2(\boldsymbol{\mu}, \Sigma)$, where $\boldsymbol{\mu} = (0, 0)^T$, $\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$, $\boldsymbol{\theta} = (0, 0, 1, 1, \rho)^T$, $(x, y)^T \in \mathbb{R}^2$, $\rho \in (-1, 1)$; the density is defined as
$f(x, y) = \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left\{ -\frac{x^2 - 2\rho xy + y^2}{2(1 - \rho^2)} \right\}$.

Standardised bivariate and multivariate normal distribution

Let $x = x_1$, $y = x_2$ and $\mathbf{x} = (x_1, x_2)^T$. Then the density of the standardised bivariate normal distribution can be rewritten in matrix form:
$f(\mathbf{x}) = \frac{1}{2\pi(\det\Sigma)^{1/2}} \exp\left\{ -\frac{1}{2}\, \mathbf{x}^T \Sigma^{-1} \mathbf{x} \right\}$.
Let $(X_1, X_2, \ldots, X_k)^T \sim N_k(\boldsymbol{\mu}, \Sigma)$ and let $\mathbf{x}$ be a $k$-dimensional vector; then the density is equal to
$f(\mathbf{x}) = \frac{1}{(2\pi)^{k/2}(\det\Sigma)^{1/2}} \exp\left\{ -\frac{1}{2}\, (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\}$.
Marginal distributions: for the multivariate normal distribution, $X_j \sim N(\mu_j, \sigma_j^2)$, $j = 1, 2, \ldots, k$; for the standardised case, $X_j \sim N(0, 1)$, $j = 1, 2, \ldots, k$.

Bivariate normal distribution – simulation

Simulation of pseudo-random numbers from the bivariate normal distribution:
1. let $X_1 \sim N(0, 1)$ and $X_2 \sim N(0, 1)$ be independent
2. then $(Y_1, Y_2)^T \sim N_2(\boldsymbol{\mu}, \Sigma)$, where $Y_1 = \sigma_1 X_1 + \mu_1$ and $Y_2 = \sigma_2(\rho X_1 + \sqrt{1 - \rho^2}\, X_2) + \mu_2$

Example
Simulate pseudo-random numbers from the bivariate normal distribution, where $\boldsymbol{\theta} = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)^T$:
(a) $\mu_1 = 0$, $\mu_2 = 0$, $\sigma_1 = 1$, $\sigma_2 = 1$, $\rho = 0$; (1) $n = 50$ and (2) $n = 1000$;
(b) $\mu_1 = 0$, $\mu_2 = 0$, $\sigma_1 = 1$, $\sigma_2 = 1$, $\rho = 0.5$; (1) $n = 50$ and (2) $n = 1000$;
(c) $\mu_1 = 0$, $\mu_2 = 0$, $\sigma_1 = 1$, $\sigma_2 = 1.2$, $\rho = 0.5$; (1) $n = 50$ and (2) $n = 1000$.
(A sketch follows below.)

Figure: Joint density of three different bivariate normal distributions (column by column); contour plots superimposed by image plots (first row), 3D surface plots (second row); simulation study

Figure: Joint density of three different bivariate normal distributions (column by column); $n = 50$ (first row), $n = 1000$ (second row); contour plots superimposed by image plots; simulation study

Mixture of two univariate and bivariate normal distributions

The mixture of two univariate normal distributions is defined as follows:
$pN(\mu_1, \sigma_1^2) + (1 - p)N(\mu_2, \sigma_2^2)$, where $\boldsymbol{\theta} = (p, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2)^T$.
The mixture of two bivariate normal distributions is defined as follows:
$pN_2(\boldsymbol{\mu}_1, \Sigma_1) + (1 - p)N_2(\boldsymbol{\mu}_2, \Sigma_2)$, where $\boldsymbol{\theta} = (p, \mu_{11}, \mu_{12}, \sigma_{11}^2, \sigma_{12}^2, \rho_1, \mu_{21}, \mu_{22}, \sigma_{21}^2, \sigma_{22}^2, \rho_2)^T$.
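A minimal sketch of the simulation algorithm above, for case (b) of the example (mvtnorm::rmvnorm would give the same result directly):

set.seed(3)
n <- 1000
mu1 <- 0; mu2 <- 0; s1 <- 1; s2 <- 1; rho <- 0.5   # case (b)
x1 <- rnorm(n); x2 <- rnorm(n)                     # independent N(0, 1)
y1 <- s1 * x1 + mu1
y2 <- s2 * (rho * x1 + sqrt(1 - rho^2) * x2) + mu2
cor(y1, y2)                       # close to rho = 0.5
plot(y1, y2, pch = 19, cex = 0.4)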
Different normal models – skewed, mesokurtic, platykurtic and leptokurtic

Figure: Densities of different normal and skewed normal distributions (first row; skewed normal indicated as "sN", e.g. N(0,1), t(0,1,df=10), sN(0,1), st(0,1,df=10)), densities of different bivariate skewed normal distributions (second row)

The mixture of two bivariate normal distributions

Figure: Joint density of a bivariate normal distribution (left), density of the mixture of two bivariate normal distributions (middle), bivariate kernel density estimate superimposed by the density of the mixture of two bivariate normal distributions (right); simulation study (contour plots superimposed by image plots)

Mixture of two univariate normal distributions

To express the binormal distribution formally, let $B_i$ be (unobserved) iid Bernoulli($p$) random variables, $p \in (0, 1)$. If $B_i = 1$, then $X_i$ is observed from the $N(\mu_1, \sigma_1^2)$ distribution; otherwise it is observed from $N(\mu_2, \sigma_2^2)$. Thus, the distribution of $X_i$ given $B_i$ is
$X_i \mid (B_i = b_i) \sim \begin{cases} N(\mu_1, \sigma_1^2), & \text{if } b_i = 1, \\ N(\mu_2, \sigma_2^2), & \text{if } b_i = 0. \end{cases}$
The joint density of $(X_i, B_i)$ is therefore given by
$f(x_i, b_i, \boldsymbol{\theta}) = f(x_i \mid b_i, \boldsymbol{\theta}) \Pr(B_i = b_i, p) = \begin{cases} \frac{p}{\sqrt{2\pi}\sigma_1} \exp\left(-\frac{(x_i - \mu_1)^2}{2\sigma_1^2}\right), & \text{if } b_i = 1, \\ \frac{1-p}{\sqrt{2\pi}\sigma_2} \exp\left(-\frac{(x_i - \mu_2)^2}{2\sigma_2^2}\right), & \text{if } b_i = 0, \end{cases}$
where $\boldsymbol{\theta} = (p, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2)^T$, from which the marginal density of $X_i$ is obtained as
$f(x_i, \boldsymbol{\theta}) = \sum_{b_i \in \{0, 1\}} f(x_i, b_i, \boldsymbol{\theta}) = f(x_i, 0, \boldsymbol{\theta}) + f(x_i, 1, \boldsymbol{\theta})$.
The binormal density function is a linear combination of the density functions of the $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ distributions.

Figure: Mixture of two normal densities – data faithful (histogram of waiting time in minutes, in absolute scale and with the fitted mixture density)
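A minimal sketch of the latent-Bernoulli construction and of the resulting mixture density; the parameter values are illustrative, not the fitted values for the faithful data:

dmix <- function(x, p, mu1, s1, mu2, s2)
  p * dnorm(x, mu1, s1) + (1 - p) * dnorm(x, mu2, s2)

set.seed(4)
n <- 500
p <- 0.35; mu1 <- 55; s1 <- 6; mu2 <- 80; s2 <- 6   # illustrative values
b <- rbinom(n, 1, p)                                # latent Bernoulli B_i
x <- ifelse(b == 1, rnorm(n, mu1, s1), rnorm(n, mu2, s2))
hist(x, freq = FALSE, breaks = 30)
curve(dmix(x, p, mu1, s1, mu2, s2), add = TRUE)     # marginal (mixture) density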
Binomial distribution

Jacob Bernoulli (1655–1705) – one of the founding fathers of probability theory.

Definition (binomial distribution)
Let $N$ be the number of independent identical (random) Bernoulli trials $X_i$, where $X_i = 1$ is a success (the event occurred) and $X_i = 0$ is a failure (the event did not occur), $i = 1, 2, \ldots, N$. Then the probability of success is $\Pr(X_i = 1) = p$ and the probability of failure is $\Pr(X_i = 0) = 1 - p$. The number of successes is $X = \sum_{i=1}^{N} X_i$. The probability that the random variable $X$ equals $x = n$ (the realisation) is defined as
$\Pr(X = x) = \binom{N}{x} p^x (1 - p)^{N-x}$, for $x = 0, 1, 2, \ldots, N$.
The expected value of $X$ is defined as $E[X] = \sum_{x=0}^{N} x \Pr(X = x) = \sum_{x=0}^{N} x \binom{N}{x} p^x (1 - p)^{N-x} = Np$.
The variance of $X$ is defined as $\mathrm{Var}[X] = \sum_{x=0}^{N} (x - E[X])^2 \Pr(X = x) = \sum_{x=0}^{N} (x - Np)^2 \binom{N}{x} p^x (1 - p)^{N-x} = Np(1 - p)$.

Binomial distribution

Reading: Random variable $X$ is binomially distributed with parameters $N$ and $p$, where $\theta = p$. Notation: $X \sim \mathrm{Bin}(N, p)$, $\theta = p$.
Do we need to change it? YES. Why? Due to generalisation. Equivalently, $\mathbf{X} \sim \mathrm{Bin}(N, p, 1 - p)$, where $\mathbf{X} = (X_1, X_2)^T$, $\boldsymbol{\theta} = (p, 1 - p)^T$, $X_1$ is the number of successes, $X_2 = N - X_1$ is the number of failures, $X_1 \sim \mathrm{Bin}(N, p)$ and $X_2 \sim \mathrm{Bin}(N, 1 - p)$. Then $E[X_1] = Np$, $E[X_2] = N(1 - p)$, $\mathrm{Var}[X_2] = Np(1 - p) = \mathrm{Var}[X_1]$, $\mathrm{Cov}[X_1, X_2] = -Np(1 - p)$, and $\mathrm{Cor}[X_1, X_2] = -1$ independently of $p$. Finally, $\mathbf{n} = (n_1, n_2)^T$ and $\mathbf{p} = (p_1, p_2)^T$, $p_1 = p$ and $p_2 = 1 - p$. Then $\boldsymbol{\theta} = \mathbf{p}$.

Random sampling from a population of size Npop

If each selection from a population of size $N_{pop}$ is returned to the population, i.e. the sampling is with replacement, then, for each selection, the probability of selecting an individual with a given characteristic is $p = M/N_{pop}$, where $M$ is the number of individuals with the given characteristic ($M$ means "marked"), and the proportion can now be treated as a probability. Since the selections or "trials" are mutually independent and the number of trials $N$ is fixed, the number of outcomes $X$ having the given characteristic in the sample has a binomial distribution, denoted by $\mathrm{Bin}(N, p)$.

Binomial distribution

Definition (binomial distribution)
If a random sample of size $N$ is taken from a population of size $N_{pop}$ with replacement and $X$ is the number of individuals with a given characteristic in the sample, then $X$ has a binomial distribution with probability mass function defined as
$\Pr(X = x) = \binom{N}{x} p^x (1 - p)^{N-x}$, where $x = 0, 1, 2, \ldots, N$.
The expected value of $X$ is $E[X] = Np$ and the variance of $X$ is $\mathrm{Var}[X] = Np(1 - p)$.

Random sampling from a population of size Npop

If we remove an individual chosen at random from the population of size $N_{pop}$ and choose a second individual at random from the remainder, then the probability of getting an individual with the given characteristic ($M$ means "marked") is $(M - 1)/(N_{pop} - 1)$ if the first individual had this characteristic and $M/(N_{pop} - 1)$ if it did not. This is called sampling without replacement, and the probability of choosing an individual with the given characteristic changes with each selection. The number of outcomes $X$ having the given characteristic then has a hypergeometric distribution, denoted by $\mathrm{HypGeom}(N, p)$.

Hypergeometric distribution

Definition (hypergeometric distribution)
If a random sample of size $N$ is taken from a population of size $N_{pop}$ without replacement and $X$ is the number of individuals with a given characteristic in the sample, then $X$ has a hypergeometric distribution with probability mass function defined as
$\Pr(X = x) = \binom{M}{x}\binom{N_{pop} - M}{N - x} \Big/ \binom{N_{pop}}{N}$,
where $\max\{N + M - N_{pop}, 0\} \le x \le \min\{M, N\}$, but we usually have $x = 0, 1, 2, \ldots, N$. The expected value of $X$ is $E[X] = Np$. The variance of $X$ is $\mathrm{Var}[X] = Np(1 - p)\,r$, where $r = \frac{N_{pop} - N}{N_{pop} - 1} = 1 - \frac{N - 1}{N_{pop} - 1} > 1 - f_s$, and $f_s = N/N_{pop}$ is the sampling fraction. ($f_s$ can generally be neglected if $f_s < 0.1$, or preferably $f_s < 0.05$, and we can then set $r = 1$.)
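A short R check that the binomial approximates the hypergeometric when the sampling fraction is small (population values illustrative):

Npop <- 10000; M <- 3000; N <- 50    # f_s = 0.005, p = 0.3
p <- M / Npop
x <- 0:N
hyper <- dhyper(x, m = M, n = Npop - M, k = N)   # sampling without replacement
binom <- dbinom(x, size = N, prob = p)           # sampling with replacement
max(abs(hyper - binom))                          # small: the approximation is good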
Hypergeometric distribution

We see then that if $f_s$ can be ignored, we can approximate sampling without replacement by sampling with replacement, and approximate the hypergeometric distribution by the binomial distribution.

Multinomial distribution

Definition (multinomial distribution)
Let $N$ be the number of independent identical (random) trials, in each of which one of $J \ge 2$ distinct possible outcomes occurs, where $X_{ji} = 1$ is a success (outcome $j$ occurred) and $X_{ji} = 0$ is a failure (outcome $j$ did not occur), $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, J$. The numbers of successes are $X_j = \sum_{i=1}^{N} X_{ji}$, with $N = \sum_{j=1}^{J} X_j$. Then the probability of success of the $j$-th outcome in the $i$-th trial is $\Pr(X_{ji} = 1) = p_j$ (the cell probabilities) and the probability of failure of the $j$-th outcome is $\Pr(X_{ji} = 0) = 1 - p_j$. Let $\mathbf{X} = (X_1, X_2, \ldots, X_J)^T$. The probability that the random variables $X_j$ equal $x_j = n_j$ is defined as
$\Pr(X_1 = x_1, \ldots, X_J = x_J) = \frac{N!}{\prod_j x_j!} \prod_{j=1}^{J} p_j^{x_j}$.

Multinomial distribution

The expected value of $\mathbf{X}$ is the vector defined as $E[\mathbf{X}] = N\mathbf{p}$. The covariance matrix of $\mathbf{X}$ is defined as
$\mathrm{Var}[\mathbf{X}] = N\left(\mathrm{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^T\right)$, where $(\mathrm{Var}[\mathbf{X}])_{ij} = \begin{cases} Np_j(1 - p_j) & \text{if } i = j, \\ -Np_ip_j & \text{if } i \ne j. \end{cases}$
The marginal distributions are binomial, i.e. $X_j \sim \mathrm{Bin}(N, p_j)$. Then
$E[X_j] = Np_j$, $\mathrm{Var}[X_j] = Np_j(1 - p_j)$,
$\mathrm{Cov}[X_i, X_j] = -Np_ip_j$, $\mathrm{Cor}[X_i, X_j] = -p_ip_j \big/ \sqrt{p_i(1 - p_i)\,p_j(1 - p_j)}$.

Multinomial distribution

Reading: Random vector $\mathbf{X}$ is multinomially distributed with parameters $N$ and $\mathbf{p}$, where $\boldsymbol{\theta} = \mathbf{p}$. Notation: $\mathbf{X} \sim \mathrm{Mult}_J(N, \mathbf{p})$. If $J = 2$, then $\mathrm{Bin}(N, p) \approx \mathrm{Mult}_2(N, \mathbf{p})$. The realisation of one trial, $\mathbf{x}_{ji}$, could be $(1, 0, \ldots, 0)^T$ or $(0, 1, \ldots, 0)^T$, etc.

Example (number of individuals with certain blood type)
The numbers of individuals $\mathbf{X} = (X_1, X_2, X_3, X_4)^T$ with the four blood groups are multinomially distributed following the Hardy-Weinberg equilibrium, i.e. $\mathbf{X} \sim \mathrm{Mult}_4(N, \mathbf{p})$, where $N = 500$ (Katina et al. 2015). Calculate the theoretical frequencies $n_{j,E}$.

Table: Observed and theoretical frequencies of the blood groups
attributes (groups)   0     A     B    AB
n_{j,O}               209   184   81   26
n_{j,E}               210   183   80   27
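The moment formulas can be verified by simulation; a small sketch (the probabilities anticipate the example below):

set.seed(5)
N <- 500
p <- c(0.12, 0.12, 0.12, 0.04, 0.18, 0.18, 0.18, 0.06)
X <- rmultinom(10000, size = N, prob = p)   # J x 10000 matrix of samples
rowMeans(X)                  # close to N * p
cov(t(X))[1, 4]              # close to -N * p[1] * p[4] = -2.4
N * (diag(p) - p %*% t(p))   # theoretical covariance matrix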
Multinomial distribution

Example (number of individuals with certain socioeconomic status, political philosophy and political affiliation)
The numbers of individuals $X_1, \ldots, X_8$ classified by socioeconomic status, political philosophy and political affiliation are multinomially distributed, i.e. $\mathbf{X} = (X_1, \ldots, X_8)^T \sim \mathrm{Mult}_8(N, \mathbf{p})$, where $\mathbf{p} = (p_1, p_2, \ldots, p_8)^T$ and $N = 500$ (Christensen 1990, modified). Calculate (a) $\mathrm{Var}[X_1]$, (b) $\mathrm{Var}[X_4]$, (c) $\mathrm{Cov}[X_1, X_4]$ and (d) $\mathrm{Cor}[X_1, X_4]$.

Table: 2 × 4 contingency table of probabilities $p_j$
       D-C    D-Li   R-C    R-Li   total
H      0.12   0.12   0.12   0.04   0.4
Lo     0.18   0.18   0.18   0.06   0.6
total  0.30   0.30   0.30   0.10   1.0

Notation: (1) socioeconomic status (high – H, low – Lo), (2) political philosophy (democrat – D, republican – R) and (3) political affiliation (conservative – C, liberal – Li). Then $X_1$ (H-D-C), $X_2$ (H-D-Li), $X_3$ (H-R-C), $X_4$ (H-R-Li), $X_5$ (Lo-D-C), $X_6$ (Lo-D-Li), $X_7$ (Lo-R-C) and $X_8$ (Lo-R-Li).

Solution:
$\mathrm{Var}[X_1] = 500 \times 0.12 \times (1 - 0.12) = 52.8$
$\mathrm{Var}[X_4] = 500 \times 0.04 \times (1 - 0.04) = 19.2$
$\mathrm{Cov}[X_1, X_4] = -500 \times 0.12 \times 0.04 = -2.4$
$\mathrm{Cor}[X_1, X_4] = -2.4/\sqrt{52.8 \times 19.2} = -0.075$

What are the expected frequencies?

Table: 2 × 4 contingency table of expected frequencies $Np_j$
     D-C   D-Li   R-C   R-Li
H    60    60     60    20
Lo   90    90     90    30

Multi-hypergeometric distribution

Definition (multi-hypergeometric distribution)
Suppose we have $k$ subpopulations of sizes $M_j$, where $j = 1, 2, \ldots, k$, and $\sum_{j=1}^{k} M_j = N_{pop}$, the total population size. Let $p_j = M_j/N_{pop}$. A simple random sample of size $N$ is taken from the population, yielding $X_j$ from the $j$-th subpopulation. $\mathbf{X} = (X_1, X_2, \ldots, X_k)^T$ has the multi-hypergeometric distribution. The joint probability mass function of $\mathbf{X}$ is defined as
$f(\mathbf{x}) = \Pr(\mathbf{X} = \mathbf{x}) = \prod_{j=1}^{k} \binom{M_j}{x_j} \Big/ \binom{N_{pop}}{N}$,
where $0 \le x_j \le \min\{M_j, N\}$ and $N = \sum_{j=1}^{k} x_j$.

Multi-hypergeometric distribution

Since we can add the subpopulations together, we see that the marginal distribution of $X_j$ is also hypergeometric, with the two subpopulations $M_j$ and $N_{pop} - M_j$, namely
$f_j(x_j) = \binom{M_j}{x_j}\binom{N_{pop} - M_j}{N - x_j} \Big/ \binom{N_{pop}}{N}$.
In a similar fashion we see that the probability function of $X_1 + X_2$ is again hypergeometric, namely
$f_{12}(x_1 + x_2) = \binom{M_1 + M_2}{x_1 + x_2}\binom{N_{pop} - M_1 - M_2}{N - x_1 - x_2} \Big/ \binom{N_{pop}}{N}$.
Additionally, $\mathrm{Var}[X_j] = Np_j(1 - p_j)\,r$, where $r = (N_{pop} - N)/(N_{pop} - 1)$, and $\mathrm{Var}[X_1 + X_2] = Nr(p_1 + p_2)(1 - p_1 - p_2)$. Finally, the covariance of $X_1$ and $X_2$ is equal to
$\mathrm{Cov}[X_1, X_2] = \frac{1}{2}\left(\mathrm{Var}[X_1 + X_2] - \mathrm{Var}[X_1] - \mathrm{Var}[X_2]\right) = -rNp_1p_2$.
We then find that, with $q_j = 1 - p_j$,
$\mathrm{Var}[X_1 - X_2] = \mathrm{Var}[X_1] + \mathrm{Var}[X_2] - 2\mathrm{Cov}[X_1, X_2] = rN\left[p_1q_1 + p_2q_2 + 2p_1p_2\right] = rN\left[p_1 + p_2 - (p_1 - p_2)^2\right]$.
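The joint pmf from the definition above can be coded directly; a minimal sketch (subpopulation sizes illustrative), with the marginal checked against dhyper():

dmhyper <- function(x, M) {
  # joint pmf of the multi-hypergeometric distribution, N = sum(x)
  prod(choose(M, x)) / choose(sum(M), sum(x))
}
M <- c(50, 30, 20)        # subpopulation sizes, Npop = 100
dmhyper(c(5, 3, 2), M)    # a sample of size N = 10

# the marginal of X_1 agrees with the (univariate) hypergeometric pmf
marg_x1 <- function(x1) sum(sapply(0:(10 - x1), function(x2)
  dmhyper(c(x1, x2, 10 - x1 - x2), M)))
all.equal(marg_x1(5), dhyper(5, m = 50, n = 50, k = 10))   # TRUE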
Multi-hypergeometric distribution and two dependent proportions

Suppose we have a population of $N_{pop}$ people and a sample of size $N$ is chosen at random without replacement. Each selected person is asked two questions, to each of which they answer yes (1) or no (2), so that $p_{12}$ is the proportion answering yes to the first question and no to the second, $p_{11}$ is the proportion answering yes to both questions, and so forth. Then the proportion answering yes to the first question is $p_1 = p_{11} + p_{12}$ and the proportion answering yes to the second question is $p_2 = p_{11} + p_{21}$. Let $X_{ij}$ ($i, j = 1, 2$) be the number observed in the sample in the category with probability $p_{ij}$, let $X_1 = X_{11} + X_{12}$ be the number answering yes to the first question, and let $X_2 = X_{11} + X_{21}$ be the number answering yes to the second question. The interest is to compare $p_1$ and $p_2$, but $p_{12}$ is often ignored (and $p_{21}$ as well).

The four variables $X_{ij}$ have a multi-hypergeometric distribution, and
$\frac{X_1}{N} - \frac{X_2}{N} = \frac{X_1 - X_2}{N} = \frac{X_{12} - X_{21}}{N} = \frac{X_{12}}{N} - \frac{X_{21}}{N}$,
$E\left[\frac{X_1}{N} - \frac{X_2}{N}\right] = p_1 - p_2 = p_{12} - p_{21}$.
Finally, with $q_{ij} = 1 - p_{ij}$ and using $p_{12} - p_{21} = p_1 - p_2$,
$\mathrm{Var}\left[\frac{X_1}{N} - \frac{X_2}{N}\right] = \frac{1}{N^2}\mathrm{Var}[X_{12} - X_{21}] = \frac{1}{N^2}\left(\mathrm{Var}[X_{12}] + \mathrm{Var}[X_{21}] - 2\mathrm{Cov}[X_{12}, X_{21}]\right) = r\frac{1}{N}\left[p_{12}q_{12} + p_{21}q_{21} + 2p_{12}p_{21}\right] = r\frac{1}{N}\left[p_{12} + p_{21} - (p_1 - p_2)^2\right]$.

Multi-hypergeometric distribution vs multinomial distribution

If we can approximate sampling without replacement by sampling with replacement, we can set $r = 1$ above, and the multi-hypergeometric distribution can be replaced by the multinomial distribution. The multinomial distribution also arises when we have $N$ fixed Bernoulli-type trials but with $k$ possible outcomes rather than just two, as with the binomial distribution.

Product-multinomial distribution

Definition (product-multinomial distribution)
Let $N_k$ be the number of independent identical (random) trials in group $k$, in each of which one of $J \ge 2$ distinct possible outcomes occurs, where $X_{kji} = 1$ is a success (the event occurred) and $X_{kji} = 0$ is a failure (the event did not occur), $i = 1, 2, \ldots, N_k$, $k = 1, 2, \ldots, K$, $j = 1, 2, \ldots, J$. The numbers of successes are $X_{kj} = \sum_{i=1}^{N_k} X_{kji}$, with $\sum_{k=1}^{K} N_k = N$. Then the probability of success of the $kj$-th outcome in the $i$-th trial is $\Pr(X_{kji} = 1) = p_{kj}$ (the cell probabilities) and the probability of failure of the $kj$-th outcome in the $i$-th trial is $\Pr(X_{kji} = 0) = 1 - p_{kj}$. Let $\mathbf{X}_k = (X_{k1}, X_{k2}, \ldots, X_{kJ})^T$ be multinomially distributed with parameters $N_k$ and $\mathbf{p}_k$, i.e. $\mathbf{X}_k \sim \mathrm{Mult}_J(N_k, \mathbf{p}_k)$, where $\boldsymbol{\theta}_k = \mathbf{p}_k$ and $\mathbf{p}_k = (p_{k1}, p_{k2}, \ldots, p_{kJ})^T$. Let the realisations of $\mathbf{X}_k$ be $\mathbf{x}_k$; then $x_{kj} = n_{kj}$ and $\mathbf{n}_k = (n_{k1}, n_{k2}, \ldots, n_{kJ})^T$. Additionally, the $\mathbf{X}_k$ are independent.

Product-multinomial distribution

The probability that the random variables $X_{kj}$ equal $x_{kj} = n_{kj}$ (for all $j$ and $k$) is defined as
$\Pr(X_{kj} = x_{kj}, \forall k, j) = \prod_{k=1}^{K} \Pr(X_{kj} = x_{kj}, \forall j)$.
The probability that the random variables $X_{kj}$ equal $x_{kj} = n_{kj}$ (for all $j$) is defined as
$\Pr(X_{kj} = x_{kj}, \forall j) = \left(N_k! \Big/ \prod_{j=1}^{J} x_{kj}!\right) \prod_{j=1}^{J} p_{kj}^{x_{kj}}$.
Then
$\Pr(X_{kj} = x_{kj}, \forall k, j) = \prod_{k=1}^{K} \left[\left(N_k! \Big/ \prod_{j=1}^{J} x_{kj}!\right) \prod_{j=1}^{J} p_{kj}^{x_{kj}}\right]$.

Product-multinomial distribution

Reading: The random matrix $\mathbf{X}$ is product-multinomially distributed with parameters $\mathbf{N} = (N_1, N_2, \ldots, N_K)^T$ and $\mathbf{p}$ with rows $\mathbf{p}_k$, where $\boldsymbol{\theta}_k = \mathbf{p}_k$, $k = 1, 2, \ldots, K$. Notation: $\mathbf{X} \sim \mathrm{ProdMult}_K(\mathbf{N}, \mathbf{p})$. If $K = 1$, then $\mathrm{Mult}_J(N, \mathbf{p}) \approx \mathrm{ProdMult}_1(N, \mathbf{p})$. The realisation of one trial, $\mathbf{x}_{kij}$, could be $(1, 0, \ldots, 0)^T$ or $(0, 1, \ldots, 0)^T$, etc. The expected frequencies are equal to $N_kp_{kj}$; within each $\mathbf{X}_k$, the variances $\mathrm{Var}[X_{kj}]$, covariances $\mathrm{Cov}[X_{kj}, X_{ki}]$ and correlations $\mathrm{Cor}[X_{kj}, X_{ki}]$ are calculated as for the multinomial distribution; between the $\mathbf{X}_k$ (e.g. $\mathrm{Cov}[\mathbf{X}_1, \mathbf{X}_2]$), they are zero due to the independence of the $\mathbf{X}_k$.
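Because the groups are independent multinomials, the joint pmf is a product of dmultinom() values; a short sketch using the frequencies and probabilities of the example that follows:

x1 <- c(60, 60, 60, 20); p1 <- c(0.3, 0.3, 0.3, 0.1)   # group H,  N_1 = 200
x2 <- c(90, 90, 90, 30); p2 <- c(0.3, 0.3, 0.3, 0.1)   # group Lo, N_2 = 300
dmultinom(x1, prob = p1) * dmultinom(x2, prob = p2)    # product-multinomial pmf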
Product-multinomial distribution

Example (number of individuals with certain socioeconomic status, political philosophy and political affiliation)
The numbers of individuals $\mathbf{X} = (\mathbf{X}_1, \mathbf{X}_2)^T$ classified by socioeconomic status, political philosophy and political affiliation are product-multinomially distributed, i.e. $\mathbf{X} \sim \mathrm{ProdMult}_2(\mathbf{N}, \mathbf{p})$, where $\mathbf{X}_1 = (X_{11}, X_{12}, X_{13}, X_{14})^T$ are the numbers of individuals with high socioeconomic status, $\mathbf{X}_2 = (X_{21}, X_{22}, X_{23}, X_{24})^T$ the numbers of individuals with low socioeconomic status, $\mathbf{p}_k = (p_{1|k}, p_{2|k}, \ldots, p_{J|k})^T$, $p_{kj} = p_{j|k} = n_{kj}/N_k$, $\mathbf{N} = (N_1, N_2)^T$, $N_1 = 200$, $N_2 = 300$ (Christensen 1990, modified). Calculate (a) the probabilities $p_{j|k}$, (b) the expected frequencies, (c) $\mathrm{Var}[X_{4|1}]$, (d) $\mathrm{Cov}[X_{1|2}, X_{4|2}]$ and (e) $\mathrm{Cov}[X_{1|1}, X_{4|2}]$.

Notation: (1) socioeconomic status (high – H, low – Lo), (2) political philosophy (democrat – D, republican – R) and (3) political affiliation (conservative – C, liberal – Li). Then $X_1$ (H-D-C), $X_2$ (H-D-Li), $X_3$ (H-R-C), $X_4$ (H-R-Li), $X_5$ (Lo-D-C), $X_6$ (Lo-D-Li), $X_7$ (Lo-R-C) and $X_8$ (Lo-R-Li).

Solution:

Table: 2 × 4 contingency table of probabilities $p_{j|k}$
     D-C   D-Li   R-C   R-Li   total
H    0.3   0.3    0.3   0.1    1.0
Lo   0.3   0.3    0.3   0.1    1.0

Table: 2 × 4 contingency table of frequencies $n_{kj}$
     D-C   D-Li   R-C   R-Li   total
H    60    60     60    20     200
Lo   90    90     90    30     300

$\mathrm{Var}[X_{4|1}] = 200 \times 0.1 \times (1 - 0.1) = 18$
$\mathrm{Cov}[X_{1|2}, X_{4|2}] = -300 \times 0.3 \times 0.1 = -9$
$\mathrm{Cov}[X_{1|1}, X_{4|2}] = 0$, due to the independence of $\mathbf{X}_1$ and $\mathbf{X}_2$.

Poisson distribution

Definition (Poisson distribution)
Let $X$ be a random variable characterised by the Poisson distribution, i.e. $X \sim \mathrm{Poiss}(\lambda)$, where $\theta = \lambda$. Then
$\Pr(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$, $x = 0, 1, \ldots$,
where $x = n$ is the realisation of $X$. Then $E[X] = \lambda$ and $\mathrm{Var}[X] = \lambda$.
The binomial distribution can be approximated by the Poisson distribution if $N \to \infty$, $p \to 0$ and $\lambda_N = Np \to \lambda$, where $X \sim \mathrm{Poiss}(\lambda)$. The Poisson distribution function is related to the $\chi^2$ distribution by $\Pr(X \le y) = \Pr(\chi^2_{2(1+y)} > 2\lambda)$, where $X \sim \mathrm{Poiss}(\lambda)$.

Poisson distribution

Example (Poisson distribution; number of car accidents per week)
With 50 million people driving a car independently in Italy next week and a probability of a road traffic death of 0.000002 (the death rate), the number of deaths $X$ is distributed binomially, i.e. $\mathrm{Bin}(50\,\text{mil}, 0.000002)$, or approximately $\mathrm{Poiss}(50\,\text{mil} \times 0.000002) = \mathrm{Poiss}(100)$.

Example (Poisson distribution; three types of accidents)
Let $n_1$ be the number of car crash deaths, $n_2$ the number of airplane crash deaths and $n_3$ the number of train crash deaths in Italy next week. Then the Poisson model with parameters $\lambda_1$, $\lambda_2$ and $\lambda_3$ for independent Poisson random variables $X_1$, $X_2$ and $X_3$ gives $X_1 + X_2 + X_3 \sim \mathrm{Poiss}(\lambda_1 + \lambda_2 + \lambda_3)$. Generalising this example, we get $X_1 + X_2 + \ldots + X_J \sim \mathrm{Poiss}(\lambda_1 + \lambda_2 + \ldots + \lambda_J)$.

Poisson distribution

The multinomial distribution is connected to the Poisson distribution by conditioning:
$(X_1, X_2, \ldots, X_J) \mid N \sim \mathrm{Mult}_J(N, p_1, p_2, \ldots, p_J)$,
where $N = \sum_j X_j$ and $p_j = \lambda_j / \sum_j \lambda_j$, $j = 1, 2, \ldots, J$. If $X_j$, $j = 1, 2, \ldots, J$, are independent, $X_j \sim \mathrm{Poiss}(\lambda_j)$ with $E[X_j] = \lambda_j$, then the conditional probability that all $X_j = x_j$, fixing (conditioning on) $N = \sum_j X_j$, is equal to
$\Pr\left(\mathbf{X} = \mathbf{x} \,\Big|\, \sum_j X_j = N\right) = \frac{\Pr(X_1 = x_1, X_2 = x_2, \ldots, X_J = x_J)}{\Pr(\sum_j X_j = N)} = \frac{\prod_j \frac{\lambda_j^{x_j} e^{-\lambda_j}}{x_j!}}{\frac{\lambda^N e^{-\lambda}}{N!}} = \frac{N!}{\prod_j x_j!} \prod_j \left(\frac{\lambda_j}{\lambda}\right)^{x_j}$,
where $\lambda = \sum_j \lambda_j$ and $p_j = \lambda_j/\lambda$.
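A numerical check of this conditioning identity; a small sketch with illustrative rates:

lambda <- c(2, 3, 5)
x <- c(1, 4, 5); N <- sum(x)
lhs <- prod(dpois(x, lambda)) / dpois(N, sum(lambda))  # Pr(X = x | sum = N)
rhs <- dmultinom(x, prob = lambda / sum(lambda))       # multinomial pmf
all.equal(lhs, rhs)   # TRUE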
Cumulative distribution function and density

Definition (cumulative distribution function)
Let $X$ be a random variable. The cumulative distribution function of $X$ is defined as $F_X(x) = \Pr(X \le x)$ for all $x \in \mathbb{R}$, where $\mathbb{R}$ is called the domain and $[0, 1]$ the counterdomain.

Properties of the cumulative distribution function:
1. $F_X(-\infty) = \lim_{x \to -\infty} F_X(x) = 0$ and $F_X(\infty) = \lim_{x \to \infty} F_X(x) = 1$.
2. $F_X(x)$ is a monotone, nondecreasing function, i.e. $F_X(a) \le F_X(b)$ for $a < b$.
3. $F_X(x)$ is right continuous in each argument, i.e. $\lim_{0 < h \to 0} F_X(x + h) = F_X(x)$.

Conditional discrete cumulative distribution function

Definition (conditional discrete cumulative distribution function)
Let $X$ and $Y$ be jointly discrete random variables. The discrete cumulative distribution function of $Y$ given $X = x$ is defined to be $F_{Y|X}(y|x) = \Pr[Y \le y \mid X = x]$ for all $x$ with $f_X(x) > 0$.
Remark: $F_{Y|X}(y|x) = \sum_{j:\, y_j \le y} f_{Y|X}(y_j|x)$.

Conditional continuous density function and cumulative distribution function

Definition (conditional continuous density function)
Let $X$ and $Y$ be jointly continuous random variables with joint continuous density function $f_{XY}(x, y)$. The conditional continuous density function of $Y$ given $X = x$ is defined as
$f_{Y|X}(y|x) = \frac{f_{XY}(x, y)}{f_X(x)}$, if $f_X(x) > 0$.

Definition (conditional continuous cumulative distribution function)
Let $X$ and $Y$ be jointly continuous random variables. The conditional continuous cumulative distribution function of $Y$ given $X = x$ is defined as $F_{Y|X}(y|x) = \Pr[Y \le y \mid X = x]$ for all $x$ with $f_X(x) > 0$.
Remark: $F_{Y|X}(y|x) = \int_{-\infty}^{y} f_{Y|X}(u|x)\,du$.

Conditional, joint and marginal distributions

We can also write the following:
$\int_{-\infty}^{\infty} f_{Y|X}(y|x)\,dy = \int_{-\infty}^{\infty} \frac{f_{XY}(x, y)}{f_X(x)}\,dy = \frac{1}{f_X(x)} \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy = \frac{f_X(x)}{f_X(x)} = 1$.

Example (joint normal density)
Prove that the function
$f_{XY}(x, y) = \frac{1}{A} \exp\left\{-\frac{1}{B}\left[\left(\frac{x - \mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x - \mu_X}{\sigma_X}\right)\left(\frac{y - \mu_Y}{\sigma_Y}\right) + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2\right]\right\}$,
where $A = 2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}$ and $B = 2(1 - \rho^2)$, has the following property: $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x, y)\,dx\,dy = 1$. To simplify the integral, substitute $u = (x - \mu_X)/\sigma_X$ and $v = (y - \mu_Y)/\sigma_Y$, and then $w = \frac{u - \rho v}{\sqrt{1 - \rho^2}}$ with $dw = \frac{du}{\sqrt{1 - \rho^2}}$.

Marginal normal density

Theorem (marginal normal density)
If $(X, Y)^T$ has a bivariate normal distribution, then the marginal distributions of $X$ and $Y$ are univariate normal distributions, i.e. $X$ is normally distributed with mean $\mu_X$ and variance $\sigma_X^2$, and $Y$ is normally distributed with mean $\mu_Y$ and variance $\sigma_Y^2$.

Example (marginal normal density)
Prove the above theorem, e.g. for $f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy$, substituting $v = (y - \mu_Y)/\sigma_Y$.

Conditional normal density

Theorem (conditional normal density)
If the random vector $(X, Y)^T$ has a bivariate normal distribution, then the conditional distribution of $Y$ given $X = x$ is normal with mean $\mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)$ and variance $\sigma_Y^2(1 - \rho^2)$, and density
$f_{Y|X}(y|x) = \frac{1}{\sqrt{2\pi}\,\sigma_Y\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2\sigma_Y^2(1 - \rho^2)}\left(y - \mu_Y - \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)\right)^2\right\}$.

Example (conditional normal density)
Prove the above theorem using the joint and marginal normal densities, i.e. prove that $f_{Y|X}(y|x) = \frac{f_{XY}(x, y)}{f_X(x)}$.
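A quick numerical check of the identity $f_{Y|X}(y|x) = f_{XY}(x, y)/f_X(x)$ for the bivariate normal; a minimal sketch with illustrative parameters:

muX <- 1; muY <- 2; sX <- 1.5; sY <- 0.8; rho <- 0.6
fXY <- function(x, y) {                        # bivariate normal density
  u <- (x - muX)/sX; v <- (y - muY)/sY
  exp(-(u^2 - 2*rho*u*v + v^2)/(2*(1 - rho^2))) /
    (2*pi*sX*sY*sqrt(1 - rho^2))
}
x <- 0.5; y <- 2.4
lhs <- fXY(x, y) / dnorm(x, muX, sX)           # f_XY / f_X
rhs <- dnorm(y, mean = muY + rho*sY/sX*(x - muX),
             sd = sY*sqrt(1 - rho^2))          # conditional normal density
all.equal(lhs, rhs)   # TRUE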
Stochastic independence

Definition (stochastic independence)
Let $(X_1, X_2, \ldots, X_k)^T$ be a $k$-dimensional random vector. $X_1, X_2, \ldots, X_k$ are defined to be stochastically independent if and only if
$F_{X_1, \ldots, X_k}(x_1, \ldots, x_k) = \prod_{j=1}^{k} F_{X_j}(x_j)$ for all $x_1, \ldots, x_k$.

Definition (stochastic independence)
Let $(X_1, X_2, \ldots, X_k)^T$ be a $k$-dimensional random vector. $X_1, X_2, \ldots, X_k$ are defined to be stochastically independent if and only if
$f_{X_1, \ldots, X_k}(x_1, \ldots, x_k) = \prod_{j=1}^{k} f_{X_j}(x_j)$ for all $x_1, \ldots, x_k$.
Remark: Often the word "stochastically" is omitted.

Assignments in R

Assignment – number of individuals with certain socioeconomic status, political philosophy and affiliation:
1. What is the number of all 2 × 4 contingency tables with $N = 50$? By the stars-and-bars argument, the number of non-negative integer tables with $k = 8$ cells summing to $n = 50$ is
$\binom{n + k - 1}{k - 1} = \binom{57}{7} = \binom{57}{50} = 264385836$.

choose(57, 7)
choose(57, 50)

2. What is the probability of getting the following 2 × 4 contingency table?
     D-C   D-Li   R-C   R-Li
H    5     7      6     4
Lo   8     7      10    3

$\Pr(X_1 = x_1, X_2 = x_2, \ldots, X_8 = x_8) = \frac{50!}{5!\,7!\,6!\,4!\,8!\,7!\,10!\,3!}\, 0.12^5\, 0.12^7\, 0.12^6\, 0.04^4\, 0.18^8\, 0.18^7\, 0.18^{10}\, 0.06^3 = 2.332506 \times 10^{-6}$

n <- c(5,7,6,4,8,7,10,3)
p <- c(.12,.12,.12,.04,.18,.18,.18,.06)
dmultinom(x=n, prob=p) # 2.332506e-06

3. What is the most probable 2 × 4 contingency table and what is the probability of getting it?
     D-C   D-Li   R-C   R-Li
H    6     6      6     2
Lo   9     9      9     3

$\Pr(X_1 = x_1, X_2 = x_2, \ldots, X_8 = x_8) = \frac{50!}{6!\,6!\,6!\,2!\,9!\,9!\,9!\,3!}\, 0.12^6\, 0.12^6\, 0.12^6\, 0.04^2\, 0.18^9\, 0.18^9\, 0.18^9\, 0.06^3 = 1.020471 \times 10^{-5}$,
which is 4.375× more probable than the table in (2).

n <- c(6,6,6,2,9,9,9,3)
p <- c(.12,.12,.12,.04,.18,.18,.18,.06)
dmultinom(x=n, prob=p) # 1.020471e-05

4. Draw the probability mass function over the possible 2 × 4 contingency tables with $N = 50$.

Distributions for circular data – uniform and wrapped normal distribution

Example (histogram on a circle, rose diagram)
A wind rose is a graphic tool used by meteorologists to give a succinct view of how wind speed and direction are typically distributed at a particular location. In statistics, it is a bivariate histogram. Visualise in a wind rose the wind speed $X_s$ in m/s (for reference, 1 m/s = 3.6 km/h) and the wind direction $X_d$ in degrees of simulated data:
(A) $X_d \sim \mathrm{Unif}(a, b)$, where $a = 0$ and $b = 360$; $X_s \sim \mathrm{Gamma}(\lambda, k)$, where $\lambda = 50$ and $k = 1$ (note $\mathrm{Gamma}(\lambda, 1) \approx \mathrm{Exp}(\lambda)$); $n = 1000$.
(B) $X_d \sim \mathrm{WN}(\mu, \rho)$, where $\mu = 0$ and $\rho = \exp(-\sigma^2/2)$, $\sigma = 0.5$; $X_s \sim \mathrm{Gamma}(\lambda, k)$, where $\lambda = 50$ and $k = 1$; $n = 1000$.
Use library(circular) and the function windrose(). To visualise wind speed, use also the function topo.colors(k). Be careful with the colour scaling of the $k$ ordered intervals of wind speed. Visualise also rose diagrams, the data and the averages of wind direction (the latter when appropriate) and compare them with the wind rose (orientation, scaling, etc.).
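A minimal sketch of part (A), assuming the circular package interface named in the assignment (circular() and windrose()); the wrapped-normal directions of part (B) could be drawn analogously with rwrappednormal() from the same package:

library(circular)
set.seed(6)
n  <- 1000
xd <- circular(runif(n, 0, 360), units = "degrees")  # directions, Unif(0, 360)
xs <- rgamma(n, shape = 1, rate = 50)                # speeds, Gamma(50, 1) = Exp(50)
windrose(x = xd, y = xs)                             # wind rose of direction and speed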
Distributions for circular data – uniform and wrapped normal distribution

[Figure: simulated wind directions plotted on a circle with compass axes N, E, S, W; left panel: uniform distribution, right panel: wrapped normal distribution.]
Distributions for circular data – uniform and wrapped normal distribution

[Figure: rose diagrams of wind direction; left panel: uniform distribution, right panel: wrapped normal distribution (frequency rings at 0.2, 0.4, 0.6, 0.8).]

Poisson process and marked Poisson process

A common phenomenon is the arrival or occurrence of an event at a time t independently of the times of previous occurrences; events on nonoverlapping time intervals are mutually independent. In addition, the average rate of arrivals is constant. The Poisson probability mass function (pmf) is a good model for the number of arrivals in an interval of length t, and in general we call such a process a Poisson process. Typical applications include the occurrence of earthquakes. As we increase the rate, the pmf looks more and more like a normal distribution. We are interested in determining:
- the pmf of the number of arrivals in a time interval of length t,
- the probability density function (pdf) of the arrival time of the kth occurrence (e.g. k = 0, k = 1, k > 1), and
- the pdf of the time interval between arrivals of successive occurrences (the interarrival time).

Poisson process, marked and compound Poisson process

Note: This process refers to arrivals on a continuous line. For many applications this line is time, but for others it may be a spatial domain of dimension one, e.g. a transect along an ecosystem, the midline of a river, or a road.
It is also of interest to attach some quantity (a mark) to the occurrence of the event at time t. For earthquakes, this quantity may be intensity, magnitude or energy. For rain events, the quantity may be rainfall intensity. Associating a quantity $y_i$ with the time $t_i$, we obtain a marked Poisson process. We assume that the random variable describing the quantity is independent of the random variable describing the arrival times. The sum of all marks for arrivals occurring in the interval t is called a compound Poisson process.

Marked Poisson process, generalised gamma family of distributions

As an example, think about modelling rainfall for every day of a month. Whether a day is rainy (wet) is decided by a Poisson process, and the mark is the amount of rain for that day if it is a wet day. The frequency distribution of rainfall on rainy days at a site determines the amount of rain once a day is selected as wet (Richardson and Nicks, 1990). The daily rainfall distribution is skewed toward low values, and it varies from month to month according to climatic records. The most typical distributions for the rainfall amount are: exponential and Weibull, gamma and generalised gamma, and skewed normal, log-normal and log-logistic. A simulation sketch is given below.
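A minimal sketch of such a rainfall simulation in base R; the wet-day rate and the gamma mark parameters are illustrative assumptions, not values from the slides.

set.seed(107)
days <- 30
lambda <- 10/30 # assumed arrival rate: about 10 wet days per 30-day month
wet <- rpois(days, lambda) > 0 # occurrence: a day is wet if at least one arrival falls on it
# marks: daily rainfall amount (cm/day) on wet days, here gamma-distributed
# (any of the distributions listed above could be used instead)
rain <- ifelse(wet, rgamma(days, shape = 0.8, scale = 1.2), 0)
sum(rain) # the compound Poisson process: total monthly rainfall
plot(1:days, rain, type = "h", xlab = "day", ylab = "rain (cm/day)",
     main = paste("rain days =", sum(wet)))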
Note: In general, most of these distributions belong to the generalised gamma family or related distributions.

Simulation of marked Poisson process – rainfall

[Figure: amount of rain per day (cm/day) during a 30-day period, nine simulation runs with 7 to 15 rain days each – marked Poisson process simulation.]

Simulation of marked Poisson process – rainfall

[Figure: relative-frequency histograms of the daily rainfall amounts (cm/day) for the same nine simulation runs – marked Poisson process simulation.]

Simulation of marked Poisson process – rainfall

[Figure: daily rainfall amount – simulated relative-frequency histograms for the exponential, Weibull, gamma and skewed normal distributions, number of days n = 1000.]
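The last comparison can be sketched along the following lines; all parameter values are illustrative assumptions, and the skewed normal draw uses rsn() from the sn package (assumed installed).

set.seed(109)
n <- 1000
x.exp <- rexp(n, rate = 0.25)                  # exponential
x.wei <- rweibull(n, shape = 1.5, scale = 10)  # Weibull
x.gam <- rgamma(n, shape = 0.8, scale = 4)     # gamma
x.sn <- as.numeric(sn::rsn(n, xi = 15, omega = 8, alpha = 3)) # skewed normal
par(mfrow = c(2, 2))
hist(x.exp, freq = FALSE, main = "exponential distribution", xlab = "rain (cm/day)")
hist(x.wei, freq = FALSE, main = "Weibull distribution", xlab = "rain (cm/day)")
hist(x.gam, freq = FALSE, main = "gamma distribution", xlab = "rain (cm/day)")
hist(x.sn, freq = FALSE, main = "skewed normal distribution", xlab = "rain (cm/day)")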